Raw Czech Banking Data

This dataset contains banking transactions and other data from a Czech bank.

The data has been pseudonymized and slightly anonymized. It was originally used for a machine learning competition.

It has seven tables. The transactions table, for example, contains over 1.2M transactions across 5,300 customers. The unique user identifier (uid) represents one customer; note that some customers share an account.
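Because some customers share an account, counting customers and counting accounts can give different answers. The sketch below illustrates this with Python's built-in sqlite3 module on a few made-up rows. The column names uid and account_id, and all values, are hypothetical stand-ins, not the documented schema of the real transactions table.

```python
import sqlite3

# Made-up illustrative rows: (uid, account_id, amount).
# Column names are hypothetical, not the real raw_banking schema.
ROWS = [
    (1, 100, 250.0),  # customer 1 uses account 100
    (2, 100, -40.0),  # customer 2 shares account 100 with customer 1
    (2, 100, 75.0),
    (3, 200, 10.0),   # customer 3 has their own account 200
]


def build_db() -> sqlite3.Connection:
    """Create an in-memory toy transactions table filled with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (uid INTEGER, account_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", ROWS)
    return conn


def distinct_counts(conn: sqlite3.Connection) -> tuple:
    """Return (number of distinct customers, number of distinct accounts)."""
    return conn.execute(
        "SELECT COUNT(DISTINCT uid), COUNT(DISTINCT account_id) FROM transactions"
    ).fetchone()


def transactions_per_customer(conn: sqlite3.Connection) -> dict:
    """Map each uid to the number of transaction rows attributed to it."""
    cursor = conn.execute(
        "SELECT uid, COUNT(*) FROM transactions GROUP BY uid ORDER BY uid"
    )
    return dict(cursor)
```

On this toy data there are three customers but only two accounts, so per-customer statistics need to group by uid rather than by account.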

The data can be explored with the SQL client at https://db001.gda-score.org/ using server ‘PostgreSQL’, database ‘raw_banking’, and tables ‘accounts’, ‘cards’, ‘clients’, ‘disp’, ‘loans’, ‘orders’, and ‘transactions’.
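For readers who prefer a programmatic client, here is a minimal Python sketch that counts the rows in each listed table. It rests on assumptions the page does not state: that db001.gda-score.org also accepts direct PostgreSQL connections on the default port 5432, that the psycopg2 driver is installed, and that you have credentials. The user name and password are placeholders.

```python
# Assumptions (not stated on this page): the host accepts direct PostgreSQL
# connections on the default port 5432, psycopg2 is installed, and you have
# your own credentials.

SERVER_HOST = "db001.gda-score.org"  # host taken from the URL above
DATABASE = "raw_banking"
TABLES = ("accounts", "cards", "clients", "disp", "loans", "orders", "transactions")


def connection_params(user: str, password: str, port: int = 5432) -> dict:
    """Build keyword arguments for psycopg2.connect (port 5432 is assumed)."""
    return {
        "host": SERVER_HOST,
        "port": port,
        "dbname": DATABASE,
        "user": user,
        "password": password,
    }


def row_count_query(table: str) -> str:
    """Return a COUNT(*) query, accepting only the table names listed above."""
    if table not in TABLES:
        raise ValueError(f"unknown table: {table}")
    return f"SELECT COUNT(*) FROM {table}"


def count_rows(user: str, password: str) -> dict:
    """Connect to raw_banking and return {table: row count} for every table."""
    import psycopg2  # imported lazily so the helpers above work without the driver

    counts = {}
    conn = psycopg2.connect(**connection_params(user, password))
    try:
        with conn.cursor() as cur:
            for table in TABLES:
                cur.execute(row_count_query(table))
                counts[table] = cur.fetchone()[0]
    finally:
        conn.close()  # always release the connection
    return counts
```

Calling count_rows with your own credentials would return a dictionary keyed by table name. The table whitelist in row_count_query keeps the interpolated table name safe, since SQL identifiers cannot be passed as query parameters.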

Other databases

The GDA Score project offers a number of real databases that can be used to test and measure anonymization methods.

Raw USA Census Database

This database is taken from the US Census of 2013. 

Raw NYC Taxi Database

This database contains four hours of New York City taxi rides (from Jan. 8, 2013, 8AM to noon).

Raw Scihub Database

This database contains one week’s worth of downloads from Sci-Hub, a free download service for scientific papers.