Raw Scihub Database

This database contains one week’s worth of downloads from Sci-Hub, a free download service for scientific papers.

The week covered is the first week of September 2015. The table has 15 columns and contains over 1.1M downloads from around 160K distinct pseudonymized IP addresses.

The pseudonymized IP addresses are used as the unique identifier (uid).

The data can be explored with an SQL client at https://db001.gda-score.org/ (server ‘PostgreSQL’, database ‘raw_scihub’, table ‘downloads’).
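
A minimal Python sketch for checking the figures above. It assumes the PostgreSQL server is reachable at host db001.gda-score.org on the default port 5432 and that the pseudonymized IP address is stored in a column named `uid`; the host, port, column name and the `connect`/`summarize` helpers are assumptions, and no credentials are documented here. `summarize` accepts any DB-API 2.0 connection, so psycopg2 is only needed for the live connection.

```python
"""Sketch: summarize the raw_scihub 'downloads' table.

Assumptions (not from the GDA Score docs): host name, port,
credentials, and that the pseudonymized IP column is named 'uid'.
"""

# Counts total downloads and distinct pseudonymized users in one query.
SUMMARY_SQL = "SELECT count(*), count(DISTINCT uid) FROM downloads"


def summarize(conn):
    """Return (num_downloads, num_distinct_uids) via any DB-API 2.0 connection."""
    cur = conn.cursor()
    try:
        cur.execute(SUMMARY_SQL)
        downloads, uids = cur.fetchone()
    finally:
        cur.close()
    return downloads, uids


def connect(user, password):
    """Open a PostgreSQL connection (hypothetical host/port; requires psycopg2)."""
    import psycopg2  # imported lazily so summarize() can be used without it

    return psycopg2.connect(
        host="db001.gda-score.org",  # assumed from the URL above
        port=5432,                   # PostgreSQL default port; an assumption
        dbname="raw_scihub",
        user=user,
        password=password,
    )
```

Example usage, with your own credentials: `print(summarize(connect("user", "password")))`. If the column assumption holds, the full table should report over 1.1M downloads and around 160K distinct uids.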

Other databases

The GDA Score project offers a number of real databases that can be used to test and measure anonymization methods.

Raw USA Census Database

This database is taken from the US Census of 2013. 


Pseudonymization, Column Suppression

The column-suppressed pseudonymized tables are generated by deleting every column that contains Personally Identifiable Information (PII).
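
A minimal sketch of column suppression, assuming each row is held as a Python dict. The column names and the choice of which columns count as PII are hypothetical and for illustration only.

```python
from typing import Iterable

Row = dict[str, object]


def suppress_columns(rows: Iterable[Row], pii_columns: set[str]) -> list[Row]:
    """Return copies of the rows with every PII column removed.

    The input rows are not modified; which columns are PII is supplied by the caller.
    """
    return [
        {col: val for col, val in row.items() if col not in pii_columns}
        for row in rows
    ]


if __name__ == "__main__":
    # Hypothetical table: 'name' and 'email' are treated as PII.
    table = [
        {"uid": "u1", "name": "Alice", "email": "a@example.org", "age": 34},
        {"uid": "u2", "name": "Bob", "email": "b@example.org", "age": 41},
    ]
    print(suppress_columns(table, {"name", "email"}))
    # → [{'uid': 'u1', 'age': 34}, {'uid': 'u2', 'age': 41}]
```

The deletion itself is trivial; the substantive decision is which columns to list as PII. All values in the remaining columns are kept exactly as they were.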


Pseudonymization, with K-anonymity

These pseudonymized tables are generated by applying k-anonymity to the columns that contain Personally Identifiable Information (PII).
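
The text does not say which k-anonymity algorithm was used. The sketch below uses one common approach: generalize quasi-identifier values, then suppress rows whose equivalence class still has fewer than k members. It is not necessarily how the GDA Score tables were built; the column names (`zip`, `age`), the `age_decade` generalizer and the value of k are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence

Row = dict[str, object]


def qi_key(row: Row, quasi_identifiers: Sequence[str]) -> tuple:
    """The combination of quasi-identifier values that defines a row's equivalence class."""
    return tuple(row[c] for c in quasi_identifiers)


def is_k_anonymous(rows: list[Row], quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    counts = Counter(qi_key(r, quasi_identifiers) for r in rows)
    return all(n >= k for n in counts.values())


def k_anonymize(
    rows: list[Row],
    quasi_identifiers: Sequence[str],
    k: int,
    generalizers: dict[str, Callable[[object], object]],
) -> list[Row]:
    """Generalize the given columns, then drop rows whose class is still smaller than k."""
    generalized = [
        {c: generalizers.get(c, lambda x: x)(v) for c, v in r.items()} for r in rows
    ]
    counts = Counter(qi_key(r, quasi_identifiers) for r in generalized)
    return [r for r in generalized if counts[qi_key(r, quasi_identifiers)] >= k]


def age_decade(age: object) -> str:
    """Hypothetical generalizer: replace an exact age with its decade range."""
    lo = int(age) // 10 * 10
    return f"{lo}-{lo + 9}"


if __name__ == "__main__":
    table = [
        {"uid": "u1", "zip": "10115", "age": 34},
        {"uid": "u2", "zip": "10115", "age": 37},
        {"uid": "u3", "zip": "10115", "age": 52},
    ]
    # u3 is alone in the 50-59 class, so it is suppressed for k = 2.
    print(k_anonymize(table, ("zip", "age"), 2, {"age": age_decade}))
```

Generalization keeps more rows than suppression alone, at the cost of coarser values; a real deployment would choose the generalization hierarchy per column.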
