Databases

The GDA Score project offers several publicly released databases of real data that can be used to test and measure anonymization methods.

See https://www.gda-score.org/notes-on-our-databases/ for more information on how the databases were made.

All of the data can be viewed through an SQL client connected to db001.gda-score.org. We also provide a quick guide to the SQL client.

K-anonymized location data

This table holds K-anonymized versions of the trip time and location columns of the NYC taxi dataset.

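One common way to K-anonymize time and location columns is generalization: coarsening values so that more records share them. Below is a minimal Python sketch of that step, assuming 15-minute time buckets and a 0.01-degree coordinate grid. The function names and bucket sizes are hypothetical and do not describe how the GDA Score table was actually built.

```python
import math
from datetime import datetime


def generalize_time(ts: datetime, minutes: int = 15) -> datetime:
    """Round a timestamp down to the start of its `minutes`-wide bucket.

    The 15-minute default is an illustrative assumption.
    """
    bucket = (ts.minute // minutes) * minutes
    return ts.replace(minute=bucket, second=0, microsecond=0)


def generalize_coord(value: float, step: float = 0.01) -> float:
    """Snap a latitude or longitude to the lower edge of a grid cell.

    The 0.01-degree grid is an illustrative assumption. Rounding to 6 places
    removes floating-point noise from the multiplication.
    """
    return round(math.floor(value / step) * step, 6)


if __name__ == "__main__":
    pickup = datetime(2013, 1, 8, 8, 47, 12)
    print(generalize_time(pickup))      # 2013-01-08 08:45:00
    print(generalize_coord(-73.98765))  # -73.99
```

Coarsening alone does not guarantee K-anonymity. The generalized values still have to be grouped and checked so that every combination occurs at least K times, with rows in smaller groups generalized further or suppressed.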

Raw NYC Taxi Database

This database contains four hours of New York City taxi rides (Jan. 8, 2013, from 8 AM to noon).


Raw Scihub Database

This database contains one week’s worth of downloads from the Sci-Hub scientific papers free download system. 


Raw USA Census Database

This database is taken from the US Census of 2013. 


Raw Czech Banking Data

This dataset contains a set of banking transactions and other data from a Czech bank.


Pseudonymization, Column Suppression

The column-suppressed pseudonymized tables are generated by deleting the columns that contain Personally Identifiable Information (PII).

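Column suppression amounts to keeping every column except the PII ones. A minimal Python sketch, assuming each row is a dictionary keyed by column name. The PII column names below are hypothetical placeholders, since the actual PII columns of each database are not listed here.

```python
from typing import Iterable

# Hypothetical PII column names, for illustration only.
PII_COLUMNS = {"name", "ssn", "email"}


def suppress_columns(rows: list[dict], pii: Iterable[str] = PII_COLUMNS) -> list[dict]:
    """Return copies of `rows` with every PII column removed.

    The input rows are left unmodified.
    """
    drop = set(pii)
    return [{col: val for col, val in row.items() if col not in drop} for row in rows]


if __name__ == "__main__":
    rows = [{"name": "Alice", "email": "a@example.com", "age": 34, "zip": "10001"}]
    print(suppress_columns(rows))  # [{'age': 34, 'zip': '10001'}]
```

In SQL the equivalent is simply a `SELECT` that lists only the non-PII columns.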

Pseudonymization, K-anonymity

These pseudonymized tables are generated by applying K-anonymity to the columns that contain Personally Identifiable Information (PII).

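Applying K-anonymity to a chosen set of columns means that every combination of values in those columns must be shared by at least K rows. The Python sketch below enforces this in the simplest possible way, by dropping rows whose combination is too rare. It is an illustration under that assumption, not the procedure used to build the GDA Score tables. Practical K-anonymizers usually generalize values first so that fewer rows are lost. `enforce_k_anonymity` and the example columns are hypothetical.

```python
from collections import Counter


def enforce_k_anonymity(rows: list[dict], columns: list[str], k: int) -> list[dict]:
    """Keep only rows whose values in `columns` occur in at least k rows.

    This is pure row suppression: rows in equivalence classes smaller than k
    are removed, and no values are generalized.
    """

    def key(row: dict) -> tuple:
        return tuple(row[c] for c in columns)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] >= k]


if __name__ == "__main__":
    rows = [
        {"zip": "10001", "age": 34, "fare": 12.5},
        {"zip": "10001", "age": 34, "fare": 8.0},
        {"zip": "10002", "age": 51, "fare": 20.0},
    ]
    # The lone ("10002", 51) row is dropped for k = 2.
    print(enforce_k_anonymity(rows, ["zip", "age"], k=2))
```

Columns not listed in `columns` (here `fare`) are left untouched, which mirrors applying K-anonymity only to the PII columns.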

K-anonymization

The K-anonymized tables are generated by applying K-anonymity to all columns.

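Whether a table is K-anonymous over all of its columns can be checked by counting how often each complete combination of values occurs. A minimal Python sketch, assuming rows are dictionaries that share the same keys. `is_k_anonymous` is a hypothetical helper, not part of any GDA Score tooling.

```python
from collections import Counter
from typing import Optional


def is_k_anonymous(rows: list[dict], k: int, columns: Optional[list[str]] = None) -> bool:
    """Return True if every combination of values in `columns` appears at least k times.

    When `columns` is None, all columns of the first row are used, which is
    the all-column case. An empty table is treated as trivially K-anonymous.
    """
    if not rows:
        return True
    cols = columns if columns is not None else list(rows[0].keys())
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    return min(counts.values()) >= k


if __name__ == "__main__":
    rows = [{"a": 1, "b": 2}, {"a": 1, "b": 2}, {"a": 3, "b": 4}]
    print(is_k_anonymous(rows, 2))  # False: the (3, 4) row is unique
```

Against the SQL server, the same check is a `GROUP BY` over the columns with `HAVING count(*) < K`. The table is K-anonymous if that query returns no rows.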