Raw NYC Taxi Database

This database contains four hours of New York City taxi rides, from 8 AM to noon on January 8, 2013.

It holds more than 95,000 rides by more than 11,000 drivers, stored in 29 columns.

The hack column, which identifies the driver, is used as the unique user ID (uid).

Links to documents describing most of the column definitions can be found at http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml.

The data can be explored through the SQL client at https://db001.gda-score.org/ using server ‘PostgreSQL’, database ‘raw_taxi’, and table ‘rides’.

Other databases

The GDA Score project offers a number of real databases that can be used to test and measure anonymization methods.

K-anonymized location data

This table holds the K-anonymized data for the trip time and location columns of the taxi dataset.
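One common way to K-anonymize time and location is to generalize each value to a coarser bucket and then suppress any record whose bucket is shared by fewer than K records. This is a minimal sketch of that generalization-plus-suppression approach, not necessarily the method used to build this table. It assumes a hypothetical row layout of (pickup time, latitude, longitude), hourly time buckets, and coordinates rounded to two decimal places; `generalize` and `k_anonymize` are illustrative names.

```python
from collections import Counter
from datetime import datetime

# Hypothetical row layout (an assumption, not the table's schema): (pickup_time, latitude, longitude)
Row = tuple[datetime, float, float]
Generalized = tuple[str, float, float]


def generalize(row: Row, decimals: int = 2) -> Generalized:
    """Coarsen one trip: truncate the time to the hour and round the coordinates."""
    pickup_time, lat, lon = row
    hour = pickup_time.replace(minute=0, second=0, microsecond=0)
    return (hour.isoformat(), round(lat, decimals), round(lon, decimals))


def k_anonymize(rows: list[Row], k: int, decimals: int = 2) -> list[Generalized]:
    """Generalize every row, then suppress rows whose generalized value occurs fewer than k times."""
    generalized = [generalize(r, decimals) for r in rows]
    counts = Counter(generalized)
    return [g for g in generalized if counts[g] >= k]


if __name__ == "__main__":
    trips = [
        (datetime(2013, 1, 8, 8, 5), 40.7512, -73.9871),
        (datetime(2013, 1, 8, 8, 40), 40.7538, -73.9859),
        (datetime(2013, 1, 8, 11, 15), 40.6413, -73.7781),
    ]
    # The first two trips share an hour and grid cell; the third is alone and is suppressed.
    for g in k_anonymize(trips, k=2):
        print(g)
```

Coarser buckets (fewer decimals, longer time windows) make larger groups and suppress fewer records, at the cost of less precise data.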

Pseudonymization, with K-anonymity

These pseudonymized tables are generated by applying K-anonymity to columns that contain Personally Identifiable Information (PII).
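A table is K-anonymous when every combination of values in its quasi-identifying columns appears in at least K rows. As a minimal sketch, that property can be checked by grouping rows on those columns and testing each group's size. The list-of-dicts table, the column names `zip` and `age`, and the function name `is_k_anonymous` are hypothetical; this only verifies the property and does not show how the project's tables were generated.

```python
from collections import Counter
from typing import Any


def is_k_anonymous(rows: list[dict[str, Any]], quasi_identifiers: list[str], k: int) -> bool:
    """Return True if every combination of quasi-identifier values occurs in at least k rows."""
    group_sizes = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return all(size >= k for size in group_sizes.values())


if __name__ == "__main__":
    # Hypothetical, already-generalized records; "zip" and "age" are the quasi-identifiers.
    records = [
        {"zip": "100**", "age": "30-39", "income": 52000},
        {"zip": "100**", "age": "30-39", "income": 61000},
        {"zip": "112**", "age": "20-29", "income": 47000},
    ]
    print(is_k_anonymous(records, ["zip", "age"], k=2))      # False: the "112**" group has one row
    print(is_k_anonymous(records[:2], ["zip", "age"], k=2))  # True
```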

Raw USA Census Database

This database is taken from the US Census of 2013. 