Quick guide to writing attacks

February 6, 2019

Author: Paul Francis, MPI-SWS

The Open GDA Score Project provides a software toolkit for building attacks and automatically computing a GDA Score. Example attacks can be found at https://github.com/gda-score/attacks. This article describes how to use the toolkit to build attacks.

Recommended Background Reading

This article describes the design of the GDA score: https://www.gda-score.org/what-is-a-gda-score/

This article shows how to read a GDA score diagram: https://www.gda-score.org/a-brief-guide-score-diagrams/

Resources

Existing attacks may be found at: https://github.com/gda-score/attacks. See the README for installation instructions.

API documentation may be found at: https://gda-score.github.io

Basic Attack Score Components

As described in https://www.gda-score.org/what-is-a-gda-score/, the GDA Attack Score has 5 components:

Susceptibility: how susceptible the various dataset attributes are to attack (how much of the data can be attacked),

Confidence Improvement and Claim Probability: the accuracy of what is learned relative to how much of the susceptible data the attacker chooses to attack,

Prior Knowledge: the amount and type of prior knowledge needed by the attacker (what the attacker knows about the protected data or related external data), and

Work: the amount of “work” needed to do the attack (i.e. the number of queries).

Most of these components are computed by the class gdaAttack() (susceptibility is currently an exception, as described below), but the API must be used properly for the scores to be correct. Note in particular that gdaAttack() does not enforce correct usage: an attack designer can manipulate the API into producing any score he or she wishes.

General Exploration of Data

The gdaAttack() API allows one to explore the data without the exploration affecting the score. The most general way to do this is through the askExplore()/getExplore() API calls, which execute SQL queries and return the answers. This interface is useful for any program that wishes to explore data; for instance, it is used by the software that measures utility.
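To make the ask/get pattern concrete, here is a toy stand-in written in plain Python. It is not the gdascore implementation: the class ToyExplorer, its method names, and the reply dictionary with an "answer" key are all hypothetical, and an in-memory SQLite table plays the role of the database. The point is only that a query is submitted with one call and its answer collected with another.

```python
import sqlite3


class ToyExplorer:
    """Hypothetical stand-in for the askExplore()/getExplore() pattern.

    Not the gdascore implementation: queries run immediately against an
    in-memory SQLite database, and replies are queued so that get_explore()
    returns them in the order they were asked.
    """

    def __init__(self, conn):
        self.conn = conn
        self.pending = []  # replies waiting to be collected

    def ask_explore(self, sql):
        rows = self.conn.execute(sql).fetchall()
        self.pending.append({"sql": sql, "answer": rows})

    def get_explore(self):
        return self.pending.pop(0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (uid INTEGER, gender TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(1, "m", "Berlin"), (2, "f", "Berlin"), (3, "m", "Paris")],
)

explorer = ToyExplorer(conn)
# In the real API, exploration queries do not affect the score;
# this toy simply has no scoring at all.
explorer.ask_explore(
    "SELECT gender, count(*) FROM accounts GROUP BY gender ORDER BY gender"
)
reply = explorer.get_explore()
print(reply["answer"])  # [('f', 1), ('m', 2)]
```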

The gdaAttack() API also provides a number of convenience functions for data exploration. Often in the context of an attack, it is reasonable to assume that there is general public knowledge of certain aspects of a dataset, and that this knowledge should not be regarded as special prior knowledge associated with the attack.

For instance, it is reasonable to assume that the name of the table, the names of its columns, and the column types are commonly known. The API provides the following helper functions to obtain this information:

    getAttackTableName()
    getColNames()
    getColNamesAndTypes()
    getTableNames()
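The information these helpers expose is simply the schema. As a rough analogue only, the sketch below reads the same kind of information from a toy SQLite table; the functions table_names() and col_names_and_types() are hypothetical, and the [[name, type], ...] return shape is our own choice rather than a description of what the gdascore helpers return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (uid INTEGER, gender TEXT, birthdate TEXT)")


def table_names(conn):
    """Names of all tables in the toy database (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def col_names_and_types(conn, table):
    """[[name, type], ...] for one table (hypothetical helper)."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({table})")
    return [[row[1], row[2]] for row in info]


print(table_names(conn))  # ['accounts']
print(col_names_and_types(conn, "accounts"))
```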

It is also reasonable to assume common public knowledge of certain values in a column, specifically values that are likely to pertain to many users. The gdaAttack() API provides a function that returns these values for the named column:

    getPublicColValues()
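One way to read "values that are likely to pertain to many users" is as a threshold on the number of distinct users sharing a value. The sketch below implements that reading; the function public_col_values() and the min_users threshold are hypothetical and are not the rule getPublicColValues() actually applies.

```python
from collections import defaultdict


def public_col_values(rows, col, min_users):
    """Values of `col` shared by at least `min_users` distinct users.

    Hypothetical criterion for illustration; the threshold is an assumption.
    """
    users_per_value = defaultdict(set)
    for row in rows:
        users_per_value[row[col]].add(row["uid"])
    return sorted(v for v, users in users_per_value.items() if len(users) >= min_users)


rows = [
    {"uid": 1, "city": "Berlin"},
    {"uid": 2, "city": "Berlin"},
    {"uid": 2, "city": "Berlin"},  # same user twice: counted once
    {"uid": 3, "city": "Paris"},
    {"uid": 4, "city": "Berlin"},
    {"uid": 5, "city": "Lyon"},
]
print(public_col_values(rows, "city", min_users=2))  # ['Berlin']
```

Counting distinct users rather than rows matters: a value repeated across many rows of a single user says nothing about how widely it is known.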

Public Database for Linkability Attacks

Linkability attacks assume that there is a public database that contains data that may be linked to a protected database. This public database can be explored with the askExplore()/getExplore() interface, or browsed with the SQL client at https://db001.gda-score.org/.

Prior Knowledge

If an attack requires specific prior knowledge (beyond the common public knowledge just mentioned), then that knowledge should be obtained using either the getPriorKnowledge() interface (preferred) or the askKnowledge()/getKnowledge() interface. getPriorKnowledge() allows prior knowledge to be requested according to a number of criteria, for instance whether the known rows or users are randomly selected or are selected according to specific values of a given column. The getPriorKnowledge() interface is required by the Diffix bounty program. By contrast, askKnowledge()/getKnowledge() returns prior knowledge according to a SQL query.

Both interfaces record the number of cells (column values) that are returned by all of the calls. Note that if the same data is requested more than once, the interface will over-count.
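A cell is one column value in one returned row, so an answer of r rows and c columns counts as r × c cells. The hypothetical CellCounter below mimics the bookkeeping described above, including the over-count when the same data is requested twice; it is an illustration, not gdascore code.

```python
class CellCounter:
    """Hypothetical tally of prior-knowledge cells.

    Every returned cell is counted, so requesting the same data twice
    counts it twice, as described for the real interfaces.
    """

    def __init__(self):
        self.cells = 0

    def record(self, answer):
        # One cell per column value in each returned row.
        self.cells += sum(len(row) for row in answer)


known = [(1, "m", "Berlin"), (2, "f", "Paris")]  # 2 rows x 3 columns

counter = CellCounter()
counter.record(known)
print(counter.cells)  # 6
counter.record(known)  # the same rows requested again
print(counter.cells)  # 12: over-counted, although nothing new was learned
```

An attack that needs the same prior knowledge in several places can avoid inflating its score by requesting it once and keeping its own copy.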

Work

Work is computed through the askAttack()/getAttack() interface. As with the ask/get interfaces mentioned above, askAttack()/getAttack() executes the supplied SQL query, in this case on the anonymized database (as specified by the anonDb parameter), and returns the answer.

askAttack()/getAttack() records the number of cells (column values) that are returned by all of the queries.
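Because work is measured in returned cells, the shape of each attack query matters: an aggregate returning one number costs far less than a query returning many rows and columns. The sketch below shows this with a hypothetical ToyAttackRunner, in which an in-memory SQLite table stands in for the anonymized database; the class and its attributes are assumptions for illustration only.

```python
import sqlite3


class ToyAttackRunner:
    """Hypothetical analogue of askAttack()/getAttack() bookkeeping.

    Runs each query on SQLite (standing in for the anonymized database)
    and adds the number of returned cells to `work`.
    """

    def __init__(self, conn):
        self.conn = conn
        self.work = 0
        self.queries = 0

    def attack(self, sql):
        answer = self.conn.execute(sql).fetchall()
        self.queries += 1
        self.work += sum(len(row) for row in answer)
        return answer


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (uid INTEGER, gender TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(i, "m" if i % 2 else "f", "Berlin") for i in range(10)],
)

runner = ToyAttackRunner(conn)
runner.attack("SELECT count(*) FROM accounts WHERE gender = 'm'")  # 1 row x 1 column
runner.attack("SELECT gender, count(*) FROM accounts GROUP BY gender")  # 2 rows x 2 columns
print(runner.queries, runner.work)  # 2 5
```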

Confidence Improvement and Claim Probability

Confidence improvement and claim probability are computed with the askClaim()/getClaim() interface.

Recall from https://www.gda-score.org/what-is-a-gda-score/ that there are three criteria for measuring anonymity: singling out, inference, and linkability. Each of these criteria has an associated claim:

Singling-out: There is a single user with attributes A, B, C, …

Linkability: A given set of one or more users in a known dataset are also in the protected dataset

Inference: All users with attributes A, B, C, … also have attribute X
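The singling-out and linkability claims above can be phrased as simple checks over data. In the sketch below, singling_out_holds() and linkability_holds() are hypothetical helpers; the linkability check assumes, for simplicity, that users carry the same identifier in both datasets, which real linkage does not require. The inference claim is sketched further below, where the text describes how it is checked.

```python
def singling_out_holds(rows, attrs):
    """True if exactly one distinct user has all attribute values in `attrs`."""
    users = {r["uid"] for r in rows if all(r[c] == v for c, v in attrs.items())}
    return len(users) == 1


def linkability_holds(protected_uids, linked_uids):
    """True if every user in the known set is also in the protected set.

    Assumes a shared user identifier across datasets (simplification).
    """
    return set(linked_uids) <= set(protected_uids)


rows = [
    {"uid": 1, "zip": "10117", "age": 40},
    {"uid": 1, "zip": "10117", "age": 40},  # same user, second row
    {"uid": 2, "zip": "10117", "age": 35},
    {"uid": 3, "zip": "80331", "age": 40},
]
print(singling_out_holds(rows, {"zip": "10117", "age": 40}))  # True
print(singling_out_holds(rows, {"age": 40}))                  # False
print(linkability_holds([1, 2, 3], [1, 3]))                    # True
```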

Currently askClaim()/getClaim() only supports equality claims, for instance column = value. It does not support non-equality claims (column != value) or inequality claims (column <= value or column BETWEEN val1 AND val2). These will be added in the future as required.

For all three criteria, the askClaim()/getClaim() interface requires a set of column = value equalities (where each such equality is an attribute). The equalities are conveyed in the spec data structure, which is a list of column/value pairs. askClaim() uses this spec to generate the queries needed to determine whether the claim is correct.
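As an illustration of turning column/value pairs into a query, the sketch below builds a parameterized SELECT from a spec. The spec layout used here ({'known': [...], 'guess': [...]}, each entry a {'col': ..., 'val': ...} pair) and the function spec_to_sql() are assumptions for illustration; consult the API documentation at https://gda-score.github.io for the actual spec format.

```python
def spec_to_sql(table, spec):
    """Build a parameterized query that selects rows matching the known
    equalities and returns the guessed columns.

    The spec layout is assumed for illustration. Column names are pasted
    into the SQL text, so they must be trusted identifiers.
    """
    known = spec.get("known", [])
    guess = spec.get("guess", [])
    select_cols = ", ".join(g["col"] for g in guess) or "*"
    sql = f"SELECT {select_cols} FROM {table}"
    if known:
        sql += " WHERE " + " AND ".join(f"{k['col']} = ?" for k in known)
    params = [k["val"] for k in known]  # values are bound, not pasted
    return sql, params


spec = {
    "known": [{"col": "zip", "val": "10117"}, {"col": "birth_year", "val": 1970}],
    "guess": [{"col": "gender", "val": "m"}],
}
sql, params = spec_to_sql("accounts", spec)
print(sql)     # SELECT gender FROM accounts WHERE zip = ? AND birth_year = ?
print(params)  # ['10117', 1970]
```

Only equality conditions are generated, which matches the current restriction to equality claims.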

The equalities in the spec are labeled as either known or guess. For the inference claim, attributes A, B, C, … are labeled known, while X is labeled guess. askClaim() queries the raw database for rows with attributes A, B, C, …, and then checks the answer to see whether attribute X holds for all returned rows.
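The inference check just described can be sketched over a list of rows standing in for the raw database. The function check_inference_claim() is hypothetical, and treating a claim that matches no rows as false is our own assumption.

```python
def check_inference_claim(rows, known, guess):
    """True if every row matching all `known` equalities also satisfies
    the `guess` equality, and at least one row matches (assumption)."""
    matching = [r for r in rows if all(r[c] == v for c, v in known.items())]
    gcol, gval = guess
    return bool(matching) and all(r[gcol] == gval for r in matching)


raw = [
    {"zip": "10117", "age": 40, "gender": "m"},
    {"zip": "10117", "age": 40, "gender": "m"},
    {"zip": "10117", "age": 35, "gender": "f"},
]
print(check_inference_claim(raw, {"zip": "10117", "age": 40}, ("gender", "m")))  # True
print(check_inference_claim(raw, {"zip": "10117"}, ("gender", "m")))             # False
```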

The known and guess spec labels are also used to compute confidence improvement for the singling-out and inference criteria. For instance, suppose the attacker claims that the gender of a user is male. If 50% of all users are male, then confidence improvement is the extent to which the attacker improves over a statistical guess of 50%. To compute confidence improvement, gdaAttack() needs to know that the attacker is trying to guess that gender is male, so that it can measure the statistical probability of gender being male. In this case, the gender equality is labeled guess, and any other equalities in the claim are labeled known.
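A worked version of the gender example: if the attacker's claims are correct with precision C and a blind guess is correct with probability S, one common way to express the improvement is (C − S) / (1 − S), which is 0 when the attacker does no better than guessing and 1 when every claim is correct. This normalization and the helper functions below are assumptions for illustration; the exact formula used by gdaAttack() is defined in the GDA score documentation.

```python
def statistical_guess(values, guessed):
    """Probability of being right by guessing `guessed` blindly."""
    return values.count(guessed) / len(values)


def confidence_improvement(correct, claims, stat_prob):
    """(C - S) / (1 - S): 0 means no better than guessing, 1 means always
    right. Assumed normalization; undefined when stat_prob == 1."""
    precision = correct / claims
    return (precision - stat_prob) / (1 - stat_prob)


genders = ["m"] * 50 + ["f"] * 50
s = statistical_guess(genders, "m")  # 50% of users are male
ci = confidence_improvement(correct=9, claims=10, stat_prob=s)
print(s, round(ci, 3))  # 0.5 0.8
```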

Claim probability is also computed using the askClaim()/getClaim() interface. Recall from https://www.gda-score.org/what-is-a-gda-score/ that an attacker can improve confidence by only making claims where the attacker has high confidence of being correct. While this increases confidence for the attacker, it also means that the attacker learns about fewer users (lower claim probability). To measure claim probability, the askClaim()/getClaim() interface requires an indication as to whether the claim should be counted towards the confidence improvement score or not. This is done by setting claim = True or claim = False.
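The trade-off between confidence and claim probability can be seen in a small tally. In the hypothetical ClaimTally below, every call is a chance to claim, only claim=True calls count towards precision, and claim probability is taken to be claims divided by chances (an assumed definition for illustration). In the real API, correctness is checked against the raw database; here it is simply supplied.

```python
class ClaimTally:
    """Hypothetical bookkeeping for claim=True / claim=False calls."""

    def __init__(self):
        self.chances = 0
        self.claims = 0
        self.correct = 0

    def record(self, claim, is_correct):
        self.chances += 1
        if claim:  # only actual claims affect precision
            self.claims += 1
            self.correct += int(is_correct)

    def claim_probability(self):
        return self.claims / self.chances if self.chances else 0.0

    def precision(self):
        return self.correct / self.claims if self.claims else 0.0


# (attacker's own confidence, whether the claim would be correct)
candidates = [(0.95, True), (0.9, True), (0.6, False), (0.55, True), (0.85, True)]
tally = ClaimTally()
for confidence, is_correct in candidates:
    tally.record(claim=confidence >= 0.8, is_correct=is_correct)
print(tally.claim_probability(), tally.precision())  # 0.6 1.0
```

Raising the 0.8 threshold would keep precision high while lowering claim probability, which is exactly the behavior the claim flag lets the score account for.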

Susceptibility

Currently gdaAttack() does not automatically measure susceptibility. In principle it could do so by forcing the attack to target all susceptible columns and then measuring which columns were attacked, but currently it does not.

By default, all columns are labeled as fully susceptible to attack. If this is not the case, the assignColumnSusceptibility() call in the gdaScores() method can be used to assign a different column susceptibility score.

Viewing Score Graphs

Once you’ve created a score (a JSON file), you can view a graphical representation of it by dragging and dropping the file onto https://www.gda-score.org/preview-graph/. The credentials for this service are username: gdascore, password: previewgraph.
