Raw USA Census Database

This database is taken from the US Census of 2013.

This dataset is already anonymized by the US Census Bureau through sampling, aggregation, and other means. It contains 120 columns and represents 250K individuals.

Column definitions can be found via this page: https://usa.ipums.org/usa-action/variables/group. We added a per-row identifier as the unique user ID (uid). Each uid represents one individual.

The data can be explored with the SQL client at https://db001.gda-score.org/. Select server ‘PostgreSQL’, database ‘raw_census’, and table ‘persons’.
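As a first query against the ‘persons’ table, one might check that each uid really does appear only once. The sketch below is a minimal illustration, not the project's own tooling. It uses only the table name `persons` and the column `uid` from the description above. Everything else is an assumption: the `age` column, the sample rows, and the in-memory SQLite table that stands in for the real PostgreSQL server, whose connection details are not given here.

```python
import sqlite3

# Uses only names given in the text: table `persons`, column `uid`.
ROW_AND_UID_COUNT = (
    "SELECT COUNT(*) AS n_rows, COUNT(DISTINCT uid) AS n_uids FROM persons"
)


def uid_is_unique(conn):
    """Return True if every row in `persons` has a distinct uid."""
    cur = conn.cursor()
    cur.execute(ROW_AND_UID_COUNT)
    n_rows, n_uids = cur.fetchone()
    return n_rows == n_uids


def make_demo_connection():
    """Build a tiny in-memory stand-in for `persons` (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    # `age` is a hypothetical column, included only to make the table non-trivial.
    conn.execute("CREATE TABLE persons (uid INTEGER, age INTEGER)")
    conn.executemany(
        "INSERT INTO persons VALUES (?, ?)",
        [(1, 34), (2, 58), (3, 21)],
    )
    return conn


if __name__ == "__main__":
    demo = make_demo_connection()
    print(uid_is_unique(demo))
```

The same query text can be pasted into the web SQL client, since it uses only standard SQL aggregate functions.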

Other databases

The GDA Score project offers a number of real databases that can be used to test and measure anonymization methods.

Raw Czech Banking Data

This dataset contains a set of banking transactions and other data from a Czech bank.

Raw Scihub Database

This database contains one week’s worth of downloads from Sci-Hub, a free download system for scientific papers.

Pseudonymization, Column Suppression

The column-suppressed pseudonymized tables are generated by simply deleting the columns that contain Personally Identifiable Information (PII).
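Column suppression as described here can be sketched in a few lines. This is an illustrative example only, not the project's actual procedure. The column names (`name`, `street`, `birth_year`) and the sample rows are hypothetical, as is the choice of which columns count as PII.

```python
def suppress_columns(rows, pii_columns):
    """Return copies of the rows with the listed PII columns deleted."""
    drop = set(pii_columns)
    return [
        {key: value for key, value in row.items() if key not in drop}
        for row in rows
    ]


# Hypothetical records; the non-PII columns and the uid are left untouched.
SAMPLE_ROWS = [
    {"uid": 1, "name": "A. Example", "street": "1 Main St", "birth_year": 1980},
    {"uid": 2, "name": "B. Example", "street": "2 Side St", "birth_year": 1975},
]

if __name__ == "__main__":
    for row in suppress_columns(SAMPLE_ROWS, ["name", "street"]):
        print(row)
```

Note that the remaining columns are unchanged, so deleting the PII columns alone does not prevent re-identification through combinations of the values that stay in the table.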
