
About The Client
The client was a start-up company focusing on business data and documents. Their primary product was a secure web app which allowed businesses to liaise with service industry professionals, such as solicitors, accountancy firms and business management companies. The web app allowed companies and their service providers to create, edit and share various documents necessary for general company operation.
Project Brief
This short report details two potential issues with the storage of user transactions in the system. An issue is considered to be any aspect of the system that would allow a transaction other than rent or income (an “other” transaction) to be linked to an individual. It is understood that rent and income transactions that have been approved by a user may be linked to that user, but that all others must not be, although they still need to be stored in the system for other purposes. As such, these “other” transactions must be anonymised so that they cannot be linked to the individual. The technical document details various methods by which these transactions will be anonymised, so the purpose of this report is to identify other ways in which the transactions could potentially be de-anonymised, as well as ways to prevent this.
Vulnerabilities Discovered
Correlating Transactions Using Dates
Imagine that user A stores the following income and rent transactions: –
- 2021-03-01 – IN – £500 – Wages
- 2021-03-02 – OUT – £350 – Rent
- 2021-04-01 – IN – £500 – Wages
- 2021-04-02 – OUT – £350 – Rent
- 2021-05-01 – IN – £600 – Wages
- 2021-05-02 – OUT – £350 – Rent
- 2021-06-01 – IN – £600 – Wages
- 2021-06-02 – OUT – £350 – Rent
As per the technical document, the transactions have been shifted temporally so that the first transaction appears to be on the 1st of March 2021. We can see that after the second income from “Wages” the user got a pay rise of £100.
Now imagine that the following “other” transactions are stored and not linked directly to the user: –
- 2021-03-02 – OUT – £100 – Gambling
- 2021-04-02 – OUT – £100 – Gambling
- 2021-05-02 – OUT – £150 – Gambling
- 2021-06-02 – OUT – £150 – Gambling
First of all, if all transactions for a user are shifted temporally by the same offset, the “Gambling” transactions here correspond to the income transactions for that user, indicating that the user may gamble the day after being paid. Secondly, the increase in spending on gambling corresponds to the increase in income from wages.
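The correlation described above can be sketched in a few lines. This is only an illustration of the attack, not part of the client's system: the function names `match_by_lag` and `rises_together` are hypothetical, and the data simply mirrors the example transactions for user A.

```python
from datetime import date, timedelta

# Hypothetical data mirroring the example: user A's linked income
# transactions and the unlinked "other" gambling transactions.
income = [
    (date(2021, 3, 1), 500),
    (date(2021, 4, 1), 500),
    (date(2021, 5, 1), 600),
    (date(2021, 6, 1), 600),
]
gambling = [
    (date(2021, 3, 2), 100),
    (date(2021, 4, 2), 100),
    (date(2021, 5, 2), 150),
    (date(2021, 6, 2), 150),
]


def match_by_lag(linked, unlinked, lag_days=1):
    """Pair each linked amount with unlinked amounts dated lag_days later."""
    by_date = {}
    for when, amount in unlinked:
        by_date.setdefault(when, []).append(amount)
    pairs = []
    for when, amount in linked:
        for other in by_date.get(when + timedelta(days=lag_days), []):
            pairs.append((amount, other))
    return pairs


def rises_together(pairs):
    """True if the unlinked amount never falls while the linked amount holds or rises."""
    return all(
        b2 >= b1
        for (a1, b1), (a2, b2) in zip(pairs, pairs[1:])
        if a2 >= a1
    )


pairs = match_by_lag(income, gambling)
print(pairs)                  # every payday has a gambling transaction one day later
print(rises_together(pairs))  # and the amounts move in the same direction
```

Both signals hold for the example data, which is exactly what makes the shared offset a concern.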
This dataset of course would be much larger in reality as transactions from other users would be included, all starting from 2021-03-01, for example: –
2021-03-01 OUT £ 50 Shopping
2021-03-01 OUT £ 20 Drinks
2021-03-02 OUT £100 Gambling
2021-03-02 OUT £ 32 Shopping
2021-03-02 OUT £120 Gambling
2021-03-02 OUT £ 10 Parking
. . . . . . . . . . . .
2021-04-01 OUT £ 40 Shopping
2021-04-01 OUT £ 21 Drinks
2021-04-02 OUT £ 50 Shopping
2021-04-02 OUT £100 Gambling
2021-04-02 OUT £  8 Parking
. . . . . . . . . . . .
2021-05-01 OUT £ 60 Travel
2021-05-01 OUT £ 30 Shopping
2021-05-02 OUT £ 22 Drinks
2021-05-02 OUT £150 Gambling
2021-05-02 OUT £ 55 Shopping
. . . . . . . . . . . .
2021-06-01 OUT £ 38 Shopping
2021-06-01 OUT £ 25 Drinks
2021-06-02 OUT £ 75 Gambling
2021-06-02 OUT £150 Gambling
2021-06-02 OUT £ 12 Parking
2021-06-02 OUT £ 42 Shopping
This means that we can’t say for certain that user A is a gambler, but we can still make educated guesses, especially as gambling transactions consistently appear the day after the user’s income and also change to reflect changes in income. Such guesses can also be wrong: the “Shopping” transactions here might relate to another user, but because they also coincide with user A’s income dates we might wrongly assume that they belong to user A.
Imagine that for one month user A is paid one day later. If the guessed gambling transaction for that month also shifts by one day then this is another sign that the transactions can be matched.
While nothing here gives a completely accurate prediction, each additional transaction collected provides another hint, so the accuracy of the estimate increases over time. The potential flaw is that, while a small amount of data is of course bad for anonymity, a growing amount is not purely an improvement either, as it provides a stronger basis for accurate statistical analysis.
This is all assuming that both income/rent and other transactions are all shifted temporally in the same way. If the other transactions are shifted independently then this potential correlation is either lost or made much more difficult, which is of course a good thing.
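As a minimal sketch of independent shifting, assuming transactions are held as `(date, amount, label)` tuples and that offsets are drawn from a cryptographically secure source, the two sets could be moved by separately drawn offsets. The helper names `random_offset` and `shift`, the 365-day range and the sample dates are illustrative assumptions, not details taken from the technical document.

```python
import secrets
from datetime import date, timedelta


def random_offset(max_days=365):
    """Pick a secret offset of between 0 and max_days - 1 days."""
    return timedelta(days=secrets.randbelow(max_days))


def shift(transactions, offset):
    """Return copies of (date, amount, label) tuples moved back by offset."""
    return [(when - offset, amount, label) for when, amount, label in transactions]


# Hypothetical real-world dates before any anonymisation is applied.
linked = [(date(2023, 7, 14), 500, "Wages"), (date(2023, 7, 15), -350, "Rent")]
other = [(date(2023, 7, 15), -100, "Gambling")]

linked_shifted = shift(linked, random_offset())
other_shifted = shift(other, random_offset())  # drawn independently of the first
```

Note that a single offset per set still preserves the spacing between transactions inside that set, so whether a finer-grained shift is needed for the “other” transactions would have to be judged against real data.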
Operating System and Database-Level Vulnerabilities
From a technical standpoint, depending on the database system used, there may also be problems due to OS time-stamping. For example, if adding a user’s rent/income transactions and their other transactions causes the database file to be created or touched, the resulting timestamps could potentially be used to correlate the two additions, allowing both sets to be linked. This of course depends on the database being used and is probably not an issue; however, it needs to be checked.
It is mentioned in the technical document that the addition of transactions is batched. This would be a good solution to this problem so long as multiple transactions from multiple users are written to the final database in batches and the original transactions are then destroyed. Doing so would severely reduce the granularity of creation/update timestamps, meaning that correlation becomes more difficult.
Database systems also sometimes maintain an insertion order, so this could be used to correlate both sets of transactions. Special care would be needed to ensure that records cannot be returned in insertion order as again this would allow undesired correlation. As above, batching and then inserting records from the batch in a random order would help to alleviate this problem.
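The batching and random-order insertion described above could look something like the following. This is a sketch under stated assumptions: SQLite is used only because it ships with Python, and the table name `other_tx`, the function `flush_batch` and the queue layout are hypothetical rather than taken from the client's system.

```python
import random
import sqlite3

# Hypothetical pending queue: (user, date, amount, label) in arrival order.
pending = [
    ("A", "2021-03-02", -100, "Gambling"),
    ("A", "2021-04-02", -100, "Gambling"),
    ("B", "2021-03-01", -50, "Shopping"),
    ("B", "2021-03-01", -20, "Drinks"),
    ("C", "2021-03-02", -10, "Parking"),
]


def flush_batch(conn, batch):
    """Write a multi-user batch in random order, without the user column."""
    rows = [(when, amount, label) for _user, when, amount, label in batch]
    random.SystemRandom().shuffle(rows)  # break the link to arrival order
    with conn:  # one database transaction for the whole batch
        conn.executemany(
            "INSERT INTO other_tx (tx_date, amount, label) VALUES (?, ?, ?)",
            rows,
        )
    batch.clear()  # the in-memory originals should not outlive the batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE other_tx (tx_date TEXT, amount INTEGER, label TEXT)")
flush_batch(conn, pending)
stored = conn.execute("SELECT tx_date, amount, label FROM other_tx").fetchall()
```

Shuffling before the insert means that any implicit insertion order kept by the database no longer reflects the order in which users submitted their transactions. Clearing the list only covers the in-memory copy; any persistent staging area would need its own secure deletion.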
Conclusions and Following Steps
Two potential problems have been identified, together with ways in which these problems could be mitigated. These issues should be considered and handled together with the safeguards already detailed in the technical document in order to avoid the possibility of transactions being linked to individuals. As discussed previously, care should also be taken to remove any date/time information from transaction descriptions, as this would counteract any work done by the system to anonymise transactions based on date. No further weaknesses have been found at this point; however, statistical analysis of actual data might show that flaws still exist. It is recommended that the system be reviewed again once this data becomes available, so that proper statistical analyses can be conducted.
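As an illustration of removing date/time information from descriptions, a simple pattern-based scrub might look like this. The patterns and the function name `scrub_description` are assumptions made for the sketch; real descriptions would need a reviewed and much broader set of formats (month names, two-digit years and so on).

```python
import re

# Illustrative patterns only; not an exhaustive list of date/time formats.
DATE_TIME_PATTERNS = [
    r"\b\d{4}-\d{2}-\d{2}\b",         # 2021-03-02
    r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",   # 02/03/2021
    r"\b\d{1,2}:\d{2}(?::\d{2})?\b",  # 14:05 or 14:05:33
]
_SCRUB = re.compile("|".join(DATE_TIME_PATTERNS))


def scrub_description(text):
    """Remove date/time fragments and collapse the leftover whitespace."""
    return " ".join(_SCRUB.sub(" ", text).split())


print(scrub_description("BETSHOP 02/03/2021 14:05 CARD PAYMENT"))
# prints: BETSHOP CARD PAYMENT
```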