Are you building blockchain applications? Here you will find information about the conflicts between the GDPR and Blockchain and what solutions the industry is applying to solve privacy and data protection issues.
Introduction to GDPR and Blockchain
The GDPR (General Data Protection Regulation) is the European law which provides for the protection of personal data of individuals who are in the European Union, regardless of whether the data are processed by organisations established in the EU or in a third country.
This legal framework for data protection was born with the purpose of granting a higher level of protection to individuals while promoting the free flow of personal data within the EU boundaries and to countries or organisations with an adequate level of data protection.
It is, in addition, a technology-neutral law, which means it applies to any processing of personal data which falls under its scope of application, regardless of the technology involved. This is, however, the source of the conflicts arising between the GDPR and Blockchain: the Regulation was drafted with the idea of protecting citizens against the abusive power of big centralised data silos and did not consider open-access technologies in which nodes distributed across the world maintain the same copy of a database.
The GDPR was not conceived to address issues arising from the processing of personal data by distributed ledger technologies like blockchain.
Types of blockchains
Depending on the permissions that a given blockchain requires from users to allow them to participate, blockchains can be divided into permissioned and permissionless blockchains.
- Permissionless blockchains: they are accessible to any person across the world. Anyone can participate by keeping a local copy of the blockchain, recording transactions or validating blocks.
- Permissioned blockchains: there is a central party which grants access and participation permissions.
In addition, permissionless blockchains are also commonly known as “public blockchains”, and permissioned blockchains as “private blockchains”. However, this classification may be ambiguous, since a permissioned blockchain may be private, when it restricts reading of the blocks, or public, when it is fully transparent.
Whereas both categories have their own risks to privacy and data protection, it is in permissionless blockchains where most conflicts and incompatibilities between the GDPR and Blockchain arise and where more effort is required to develop applications with an appropriate level of data protection built in.
When does the GDPR apply to a blockchain
One of the first problems arising between the GDPR and Blockchain concerns the scope of application of the Regulation.
According to Article 2 of the GDPR, which sets out its material scope, it applies to “the processing of personal data wholly or partly by automated means and to the processing other than by automated means of personal data which form part of a filing system or are intended to form part of a filing system”.
The first item to assess to know whether the GDPR applies to your blockchain is whether or not you process personal data.
Next, you need to verify whether:
- you process personal data in the context of the activities of an establishment in the Union, or
- with no establishment in the Union, you process personal data of data subjects who are in the Union, offering them goods or services or monitoring their behaviour.
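The checks above can be read as a small decision procedure. The following is only a sketch of that reasoning and not legal advice; the `Processing` fields and the function name are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Processing:
    """Hypothetical description of a processing activity (illustrative fields)."""
    involves_personal_data: bool
    established_in_union: bool
    subjects_in_union: bool
    offers_goods_or_services: bool
    monitors_behaviour: bool


def gdpr_likely_applies(p: Processing) -> bool:
    # First question: is personal data processed at all?
    if not p.involves_personal_data:
        return False
    # First limb: processing in the context of an establishment in the Union.
    if p.established_in_union:
        return True
    # Second limb: no establishment, but data subjects in the Union are
    # offered goods or services or have their behaviour monitored.
    return p.subjects_in_union and (p.offers_goods_or_services or p.monitors_behaviour)


node_outside_eu = Processing(True, False, True, False, True)
print(gdpr_likely_applies(node_outside_eu))  # True
```

A real assessment depends on the facts of each case; the function only mirrors the order in which the questions are asked.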
Personal data on blockchains
The GDPR defines personal data in Article 4.1 as “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person”.
There is a category of personal data which is special because it links information to an individual: identifiers. In a blockchain, there are usually two types of identifiers: users’ public keys and their addresses, which are derived by applying a hash algorithm to public keys.
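To make the relationship between a public key and an address concrete, here is a minimal sketch. It assumes a simplified scheme (SHA-256, truncated to 20 bytes) and a random byte string standing in for a real public key; actual blockchains use their own hash functions and encodings, so `toy_address` is a hypothetical helper, not any chain's real algorithm.

```python
import hashlib
import secrets

# Stand-in for a real public key: 33 random bytes (assumption for illustration;
# a real key would be derived from a private key on an elliptic curve).
public_key = secrets.token_bytes(33)


def toy_address(pubkey: bytes) -> str:
    # Hash the public key and keep the first 20 bytes as the address.
    return hashlib.sha256(pubkey).digest()[:20].hex()


address = toy_address(public_key)
print(address)
# The same key always yields the same address, which is what makes it an identifier.
print(toy_address(public_key) == address)  # True
```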
Any information included on a block which refers to an individual should be considered personal data insofar as the data subject might be identified.
Furthermore, take into account that, as highlighted by the WP29 in Opinion 4/2007 on the concept of personal data (WP136), information may refer to an object or a process and only indirectly to an individual, and yet still qualify as personal data.
Can the owner of a public key be identified?
Recital 26 of the GDPR provides some guidance on this:
To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly.
To operate in a blockchain such as Bitcoin or Ethereum, you only need to create a public and private key pair and receive a transaction. You are not required to follow any identification process where you submit your identity or address.
That is the reason why direct identification from blockchain data alone is not possible in most cases. However, Blockchain is not an isolated technology; on the contrary, it is one of several layers that make up a piece of software.
As a result, whereas direct identification might not be possible, in many scenarios owners of public keys are identifiable by indirect means, by linking their blockchain data with personally identifiable information present on other protocols, devices or applications. Arun Devan, in “The Blockchain Technology Stack”, distinguishes between several layers of this stack.
Many organisations which provide blockchain services might have access to other personal data relating to you. For instance, a regulated exchange needs to implement KYC and anti-money laundering procedures and, when you use one of its wallets to order transactions, the exchange is capable of linking your real identity with your public keys.
A second example is e-commerce businesses which accept crypto-payments. A study carried out at Princeton University found that many of these companies install cookies in users’ browsers which collect information about the purchasing data flow, thus making it possible to identify the transaction on the blockchain. Furthermore, in some cases the cookie not only collected data about the purchase but also recorded the specific transaction or, even worse, the purchaser’s public key.
When the purchasing data flow is linked with other data to which an online merchant has access, such as the IP address, real identity or delivery address of the purchaser, it is possible to identify an individual on a blockchain.
In addition, nodes of a blockchain require a network to transmit data and achieve consensus about the state (synchronise). As of today, most permissionless blockchains are designed to use the Internet and TCP/IP protocols, therefore nodes require access to the IP addresses of other nodes to make or answer any requests. If a blockchain application has not incorporated strong privacy safeguards, users will also have access to these IP addresses.
Access by third parties and the use of an open network such as the Internet to transmit and receive communications are just two examples of how the owner of a public key can be revealed, though there are many more.
If you are interested in finding out about the different ways to identify the individual behind a public key, check my thesis “Personal data and anonymity on Bitcoin”, published in Spanish in issue no. 48 of Revista Aranzadi de Derecho y Nuevas Tecnologías.
Is the data not anonymised?
In WP136, the WP29 defined anonymised data as:
Any information relating to a natural person where the person cannot be identified, whether by the data controller or by any other person, taking account of all the means likely reasonably to be used either by the controller or by any other person to identify that individual. “Anonymised data” would therefore be anonymous data that previously referred to an identifiable person, but where that identification is no longer possible.
In order to assess whether an anonymisation technique is robust and re-identification impossible, the technique applied (see WP216) should “prevent all parties from singling out an individual in a dataset, from linking two records within a dataset (or between two separate datasets) and from inferring any information in such dataset”.
Because an anonymisation solution needs to resist singling out, linkability and inference, the threshold is at present so high that most blockchains do record personal data.
On permissionless blockchains nodes need to read and process data in order to validate transactions. Except in those applications where a privacy-focused technique has been embedded within the blockchain, such as Monero’s ring signatures or Zcash’s zk-SNARKs, data will be visible to anyone, and the sole barrier preventing this data from being personal data will be that users participate through public keys and that transactions are encrypted or hashed, which prevents them from appearing in plain text.
However, since (i) public keys single out individuals, (ii) it is possible to link records of the blockchain with other records of the same blockchain or another dataset, and (iii) information can be inferred from blockchain data, in most blockchains data cannot be considered anonymised.
Encryption and hashing are pseudonymisation techniques, not anonymisation techniques: they still allow the pseudonymised information to be linked with other datasets in order to identify a data subject.
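A short sketch shows why a hash does not anonymise. It assumes an unsalted SHA-256 hash of an e-mail address stored on-chain and a second dataset held by a third party; both datasets and the `pseudonymise` helper are made up for this illustration.

```python
import hashlib


def pseudonymise(identifier: str) -> str:
    # Unsalted SHA-256: deterministic, so equal inputs give equal outputs.
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


# Hypothetical on-chain record storing only a hash of an e-mail address.
on_chain_record = {"subject": pseudonymise("alice@example.com"), "amount": 10}

# Hypothetical second dataset held by someone else, e.g. a merchant's customer list.
customer_list = ["bob@example.com", "alice@example.com", "carol@example.com"]

# Hashing each known identifier and comparing re-identifies the data subject.
matches = [c for c in customer_list if pseudonymise(c) == on_chain_record["subject"]]
print(matches)  # ['alice@example.com']
```

Anyone holding a list of candidate identifiers can repeat this comparison, which is the linkability that keeps hashed data within the definition of personal data.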
Territorial scope of the GDPR and Blockchain
One of the main problems between the GDPR and Blockchain concerns the territorial scope of application. Permissionless blockchains are freely accessible to any person from any part of the globe, which means that any individual can set up a node and start processing data from their home.
When that individual is established outside the EU and the blockchain contains personal data, the individual will potentially fall under the role of a controller, thus raising several issues concerning jurisdiction, governing law and data subject rights enforcement.
As foreseen by Article 3 of the GDPR, the Regulation applies to the following processing activities:
- processing of personal data in the context of the activities of an establishment in the Union, regardless of where the means of processing are located; or
- where there is no establishment in the Union, processing of personal data of data subjects who are in the Union, when it relates to offering them goods or services or to monitoring their behaviour in the Union.
Conversely, on permissioned blockchains it is the entity or organisation behind the blockchain who decides which nodes participate in the blockchain and with what roles and permissions (mining, broadcasting transactions, read-only…).
International data transfers on blockchains
A blockchain is a distributed database where every node maintains an identical copy of the information. To operate, nodes need to be synchronised, which requires them to constantly and reciprocally communicate new transactions and blocks.
As established by the GDPR, transfers of personal data outside of the European Economic Area can only be conducted where one of the conditions set forth by Chapter V of the Regulation is met.
Satisfying these conditions should not present excessive difficulties on permissioned blockchains, since the controller can have full control over the location of the network nodes.
On the other hand, permissionless blockchains present a bigger challenge, since personal data may be transferred all across the world without control and the GDPR does not seem to offer an appropriate mechanism to legitimise such transfers.
Controllers and processors in the GDPR and Blockchain
The GDPR defines the data controller as “the natural or legal person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data”.
Usually, it is the same entity that first requests and collects the data, though this is not always the case; the controller is the entity or organisation which determines why the data is collected and how it is going to be processed.
The data processor is “a natural or legal person, public authority, agency or other body which processes personal data on behalf of the controller”.
On a permissioned blockchain there is a central entity or group of entities that operate the blockchain and have the capacity to grant other nodes access or write privileges. This entity or organisation that manages the blockchain is the controller, and the other nodes of the network are processors insofar as they are entities separate from the controller.
On the contrary, determining the data controller or processor on a permissionless blockchain is a much harder task and there is no clear answer yet. In this type of blockchain the processing is conducted jointly by all nodes of the network within the boundaries set in code; nodes participate by broadcasting and validating transactions, without the need for any human interaction.
Whereas automation could suggest that there is no human action behind the processing and therefore no controller, that would not be accurate, since a permissionless blockchain application must first be installed and that requires human action.
Because of the above, it is possible that users behind permissionless blockchain nodes could be deemed controllers, which would raise significant issues between the GDPR and Blockchain, for instance when enforcing the Regulation or facilitating the exercise of data subject rights, as explained below.
Data protection principles of the GDPR and Blockchain
Lawfulness, fairness and transparency
On permissioned blockchains this is quite an easy principle to comply with, since it is enough to collect data subjects’ informed consent. However, what happens on permissionless blockchains?
On these blockchains it might not be as easy, since consent would not be valid in terms of the GDPR. Perhaps processing of this kind would be lawful on the grounds of legitimate interest, provided that complete information about the processing is given to data subjects before they access the software.
In that case, data subjects interested in joining the network would know about the context and consequences for their data when using the blockchain and the associated risks, which would enable them to make an informed decision.
Then it would be possible to argue that data subjects can reasonably expect that their data are processed and transferred in the way blockchains dictate.
Purpose limitation
Here the GDPR and Blockchain are definitely more compatible. On centralised data silos there is the risk that entities process personal data for purposes other than those that motivated the collection of the data. This does not happen on a blockchain.
On a blockchain the data flow of the processing is embedded in the software and there is certainty about the roles nodes have and the purpose for processing the data. The same does not strictly apply, however, to permissioned blockchains, since there are no technical constraints preventing the blockchain operator from altering the protocol and processing the personal data for other purposes.
Data minimisation
This principle is where the GDPR and Blockchain clash most directly. The data minimisation principle means that personal data should be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed.
Blockchains are immutable. Each new block which is added to the blockchain contains a hash of the previous block header, thus if a block is changed, its hash would be different and blocks on top would no longer be valid.
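The chaining of hashes described above can be sketched in a few lines. This is a simplified model that assumes blocks are plain dictionaries hashed as JSON; real blockchains hash a binary block header that commits to the transactions, and the function names here are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    # Hash a canonical JSON encoding of the block.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def build_chain(payloads: list) -> list:
    chain, prev = [], "0" * 64
    for data in payloads:
        block = {"prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain


def first_invalid_index(chain: list):
    # Return the index of the first block whose prev_hash does not match, or None.
    prev = "0" * 64
    for i, block in enumerate(chain):
        if block["prev_hash"] != prev:
            return i
        prev = block_hash(block)
    return None


chain = build_chain(["tx a", "tx b", "tx c"])
print(first_invalid_index(chain))  # None
chain[0]["data"] = "tx a (edited)"
print(first_invalid_index(chain))  # 1
```

Editing the first block leaves that block internally consistent but breaks the link stored in the block after it, so every later block would have to be rebuilt.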
This mechanism is essential to keep the blockchain safe from attacks, yet it is also a point of conflict with the GDPR and especially with the data minimisation principle.
A blockchain will never stop adding new blocks on top of previous blocks, growing in size and thus increasing the amount of personal data it holds.
Storage limitation
Blocks on permissionless blockchains such as Bitcoin or Ethereum do not expire; they exist in perpetuity. There are exceptions, and some data can be erased: for instance, on Bitcoin blocks can be pruned in order to delete transactions that have been fully spent.
However, pruning cannot be performed on all nodes, since in order to validate transactions included in a block nodes require access to a full version of the blockchain.
Regarding public keys, which are also personal data, these are part of block metadata and cannot be erased.
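Smart contracts can also be deleted. On Ethereum, smart contracts can be designed to enable their own self-destruction by executing the function selfdestruct(). A “smart contract suicide” can achieve the erasure of its personal data. Smart contracts can therefore be designed to comply with the principle of storage limitation. Bear in mind, however, that since Ethereum’s Cancun upgrade (EIP-6780) selfdestruct only removes a contract’s code and storage when it is called in the same transaction that created the contract, so this option is now far more limited.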
Accuracy
The principle of accuracy requires that appropriate reasonable measures are taken to ensure personal data are accurate, up to date and not misleading.
Blockchains do not permit erasing a piece of data and replacing it with another; data can only be added by mining new blocks with new transactions which include an additional statement that “amends” the inaccuracy.
On the other hand, modifying the block content would require creating a hard fork, which is a different version of the blockchain that would coexist with the old and inaccurate blockchain.
On permissioned blockchains blocks might be modified without creating a hard fork. However, the further back in the blockchain the inaccuracy lies, the more costly in time and resources amending that data is.
Integrity and confidentiality
Blockchains bring their own risks to information security, and those risks need to be evaluated and treated following an appropriate risk management framework, such as an ISMS based on ISO 27001.
The principle of integrity and confidentiality, or information security, means that personal data should be processed “in a manner that ensures appropriate security of the personal data, including protection against unauthorised or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures”.
To comply with this principle, you need to identify the threats to the integrity, confidentiality and availability of information assets which arise from the use of blockchain technology, to estimate the likelihood and the impact of every threat, to determine the risk by combining both, and to select and implement appropriate controls to treat the risks.
For instance, hashing algorithms are widely used on blockchains, therefore one of the points to review is that those algorithms are not easy to break. Under no circumstances should a blockchain use unsafe algorithms such as MD5, since this may compromise the integrity and confidentiality of the information published on the blockchain and the security of the whole network.
A second threat to information on blockchains is loss of private keys. Losing a private key may pose a threat to availability of the information where no recovery mechanisms have been implemented.
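The assessment steps described in this section (identify threats, estimate likelihood and impact, combine them into a risk, decide on treatment) can be sketched as a small risk register. The 1 to 5 scales, the scores, the threshold and the multiplication rule are assumptions for illustration; your risk management framework may define them differently.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain), an assumed scale
    impact: int      # 1 (negligible) to 5 (severe), an assumed scale

    @property
    def risk(self) -> int:
        # One common convention: risk = likelihood x impact.
        return self.likelihood * self.impact


# Hypothetical register; the scores are made up for illustration.
register = [
    Threat("weak hashing algorithm", 2, 5),
    Threat("loss of private key without recovery", 3, 4),
    Threat("node IP address exposure", 4, 2),
]

TREATMENT_THRESHOLD = 10  # assumed risk appetite

for t in sorted(register, key=lambda t: t.risk, reverse=True):
    action = "treat" if t.risk >= TREATMENT_THRESHOLD else "accept and monitor"
    print(f"{t.name}: {t.risk} -> {action}")
```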
GDPR rights on Blockchain
Another field where the GDPR and Blockchain also enter into conflict concerns the enforceability of the data protection rights granted by the GDPR.
The data controller has the legal obligation of making available reasonable means to facilitate the exercise of data protection rights to data subjects.
However, this may prove especially challenging when personal data is published on a blockchain.
For starters, if we agree on the assumption that all nodes taking part in a permissionless blockchain are controllers (which is in itself a lot to assume since it is not clear), any data subject should be entitled to require any node to fulfil their rights under the GDPR.
This again can be very problematic because data subjects would have a really hard time identifying a node to address their request and, even if they succeed, the individual operating a node would not know which data in the blockchain belongs to the data subject because they only have access to encrypted data or hashes.
Right of access
The right of access empowers data subjects to obtain from you confirmation as to whether or not personal data concerning them are being processed and, where that is the case, a copy of that personal data. Furthermore, they are also entitled to request the following information:
- the purposes of the processing;
- the categories of personal data concerned (health, financial, professional…);
- the recipients or categories of recipient to whom the personal data have been or will be disclosed, in particular recipients in third countries or international organisations;
- the envisaged period for which the personal data will be stored;
- the existence of the right of rectification, erasure, restriction or objection and the right to lodge a complaint with a supervisory authority;
- where the personal data are not collected from the data subject, any available information as to their source;
- the existence of automated decision-making, including profiling, information about the logic involved, the significance and the envisaged consequences.
How can a data controller respond to an access request when they are not in a position to distinguish which of the data published on the blockchain belongs to the data subject?
One possible way to achieve this would be to provide the data subject with a link where they can download the software and thus access the blockchain; however, it remains unclear whether this would be enough to answer an access request.
Right to rectification
The right to rectification allows data subjects to request rectification of their personal data where inaccurate and completion of their personal data where incomplete.
On a blockchain it is impossible to modify the data of blocks which have already been added to the chain, because attempting to do so would change the block’s hash and all blocks mined on top would stop being valid. This resistance is necessary to prevent attacks, yet it is also an issue for the effectiveness of the right to rectification, unless it is accepted that rectification can be achieved by adding supplementary statements.
This is foreseen by Article 16 of the GDPR, which states:
The data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement.
Whereas a blockchain does not enable block data to be rectified, it does allow new blocks to be added with more information, which could include an additional statement completing incomplete information or amending any inaccuracies. It remains to be seen if such a way of executing a rectification request would be enough.
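As a rough illustration of rectification through supplementary statements, the sketch below models the chain as an append-only list. The record layout and the `amends` field are hypothetical; the point is only that readers resolve the latest statement while the original entry stays where it is.

```python
ledger = []  # append-only list standing in for a chain of records


def append(record: dict) -> int:
    ledger.append(record)
    return len(ledger) - 1  # position of the new record


def current_value(record_id: int) -> str:
    # Start from the original record, then apply later supplementary statements in order.
    value = ledger[record_id]["value"]
    for entry in ledger[record_id + 1:]:
        if entry.get("amends") == record_id:
            value = entry["value"]
    return value


original = append({"value": "Jon Smith"})
append({"amends": original, "value": "John Smith"})

print(current_value(original))    # John Smith
print(ledger[original]["value"])  # Jon Smith (the old entry is still stored)
```

The inaccurate value remains readable by anyone who inspects the raw ledger, which is why it is uncertain whether this approach would satisfy a rectification request.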
Right of erasure (to be forgotten)
The right to erasure, or right to be forgotten, entitles data subjects to request the erasure of their personal data where one of the following grounds applies:
- the personal data are no longer necessary in relation to the purposes for which you collected or processed them;
- the processing is based on consent and the data subject has withdrawn consent, unless there is another legal ground for the processing;
- the data subject objects to the processing;
- the personal data have been unlawfully processed (in breach of a law or with no legal grounds);
- the personal data have to be erased for compliance with a statutory obligation;
- the personal data were collected from a child below the age of 16 without authorisation or consent by the holder of parental responsibility over the child.
The immutability of blockchains may also pose a problem to the right of erasure.
As explained above on the subject of the storage limitation principle, there are exceptions to immutability and in some instances data may actually be erased, such as through forks, pruning and smart contract self-destruction.
These solutions however do have their own disadvantages. In some instances they might be too costly to even be on the table and in others the degree of erasure they provide is just not enough.
Another possible way to give effect to the right of erasure, which consists in the destruction of private keys, was first presented (if I am not wrong) in July 2017 in the Blockchain Policy Initiative report Tokens As Novel Asset Class. By destroying the private keys, the information would remain locked inside the blockchain, encrypted ad infinitum with no way to be accessed.
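The idea of erasure by key destruction can be sketched as follows. The example assumes the data is encrypted before being written on-chain and that the key is kept off-chain; it uses a one-time pad purely to stay within the Python standard library, and `key_store` and the function names are hypothetical. A real system would use a vetted encryption library.

```python
import secrets

key_store = {}  # hypothetical off-chain storage for decryption keys


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_for_chain(record_id: str, plaintext: bytes) -> bytes:
    # One-time pad: a random key as long as the message (toy cipher, illustration only).
    key = secrets.token_bytes(len(plaintext))
    key_store[record_id] = key
    return xor_bytes(plaintext, key)


def read_from_chain(record_id: str, ciphertext: bytes):
    key = key_store.get(record_id)
    if key is None:
        return None  # key destroyed: the ciphertext can no longer be read
    return xor_bytes(ciphertext, key)


def destroy_key(record_id: str) -> None:
    key_store.pop(record_id, None)


ciphertext = encrypt_for_chain("tx-1", b"name=Alice")
print(read_from_chain("tx-1", ciphertext))  # b'name=Alice'
destroy_key("tx-1")
print(read_from_chain("tx-1", ciphertext))  # None
```

The ciphertext itself never leaves the chain, so this approach only works if every copy of the key is reliably destroyed.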
Solutions to reconcile the GDPR and Blockchain
Centralised databases and hash pointers
Because of the high number of points of conflict between the GDPR and Blockchain, one of the most widely used solutions is to move transactional data offchain and store it instead on a centralised database. Unlike “normal” blockchains, blocks in this design only contain the header metadata and hash pointers linking to a second database.
What are hash pointers?
Hash pointers are structures widely used on blockchains which consist of:
- a pointer to where the data is stored, and
- a cryptographic hash of the data.
Whereas the pointer is used to retrieve the information, the hash is used to verify that the data has not been tampered with (vid. Hash Pointers and Data Structures, by Huabing Zhao).
With this second-layer solution personal data included on the list of transactions is moved to a centralised database, where the data can be rectified, erased and altered with absolute freedom. However, it should be taken into account that any change on the information stored offchain would cause a change on its hash and therefore the hash pointer published on the blockchain would stop being valid.
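A minimal sketch of this arrangement, assuming an in-memory dictionary in place of the centralised database and a list in place of the blockchain (both hypothetical stand-ins), shows how the pointer and the hash work together and what happens when the offchain data changes:

```python
import hashlib

off_chain_db = {}   # stands in for the centralised database
on_chain = []       # stands in for the list of hash pointers stored in blocks


def store(record_id: str, data: bytes) -> None:
    off_chain_db[record_id] = data
    # The hash pointer: where the data lives plus a hash of its content.
    on_chain.append({"pointer": record_id, "hash": hashlib.sha256(data).hexdigest()})


def verify(pointer: dict) -> bool:
    data = off_chain_db.get(pointer["pointer"])
    return data is not None and hashlib.sha256(data).hexdigest() == pointer["hash"]


store("customer-42", b"Alice, 1 Main Street")
print(verify(on_chain[0]))  # True

off_chain_db["customer-42"] = b"Alice, 2 High Street"   # rectification offchain
print(verify(on_chain[0]))  # False

del off_chain_db["customer-42"]                          # erasure offchain
print(verify(on_chain[0]))  # False
```

After a rectification, a fresh hash pointer would have to be published for the new version. Note also that an unsalted hash of predictable data can still be guessed by trying candidate values, so the hash left on-chain is not automatically free of data protection concerns.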
While this solution certainly helps to reconcile the GDPR and Blockchain, public keys included on block headers are still a problem, since these are necessary for the block validation process and cannot be moved out of the chain.
The use of centralised databases and hash pointers is a widespread practice, yet also a criticised one. While it is an ideal solution for permissioned blockchains, it is not very popular for permissionless blockchains, since turning to a centralised third party to store the data means bringing in a new element of trust, which is precisely what blockchains intend to remove or, more precisely, transfer from private companies to code and mathematics.
Zero knowledge proofs
“Zero-knowledge” proofs allow one party (the prover) to prove to another (the verifier) that a statement is true, without revealing any information beyond the validity of the statement itself.
zk-SNARK, which stands for “Zero-Knowledge Succinct Non-Interactive Argument of Knowledge”, is a form of zero-knowledge cryptography with which it is possible to prove possession of certain information, e.g. a secret key, without revealing that information and without any interaction between the prover and verifier.
zk-SNARKs are used by Zcash to prove that the conditions for a valid transaction have been satisfied without revealing any crucial information about the addresses or values involved.
On Zcash, shielded transactions are fully encrypted on the blockchain and do not reveal the sender address, the receiver address or the value of the transaction.
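zk-SNARKs are too involved to reproduce here, but the underlying idea of proving possession of a secret key without revealing it can be illustrated with a much simpler construction: a Schnorr proof made non-interactive with the Fiat-Shamir heuristic. This is not what Zcash uses, and the group parameters below are toy values chosen so the arithmetic is easy to follow; they offer no security.

```python
import hashlib
import secrets

# Toy group: g = 2 generates a subgroup of prime order q = 11 modulo p = 23.
# These numbers are far too small to be secure; they keep the arithmetic readable.
P, Q, G = 23, 11, 2


def challenge(public_key: int, commitment: int) -> int:
    # Fiat-Shamir: derive the challenge from a hash instead of asking the verifier.
    digest = hashlib.sha256(f"{G}|{public_key}|{commitment}".encode()).digest()
    return int.from_bytes(digest, "big") % Q


def prove(secret_key: int) -> tuple:
    public_key = pow(G, secret_key, P)
    r = secrets.randbelow(Q)               # one-time random nonce
    commitment = pow(G, r, P)
    c = challenge(public_key, commitment)
    response = (r + c * secret_key) % Q
    return public_key, commitment, response


def verify(public_key: int, commitment: int, response: int) -> bool:
    c = challenge(public_key, commitment)
    # g^response must equal commitment * public_key^c (mod p).
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P


secret_key = 7  # known only to the prover
public_key, commitment, response = prove(secret_key)
print(verify(public_key, commitment, response))            # True
print(verify(public_key, commitment, (response + 1) % Q))  # False
```

The verifier only ever sees the public key, the commitment and the response, and checks an equation between them; the secret key itself is never transmitted.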
Data protection impact assessments
Article 35.1 of the GDPR sets forth a requirement to conduct a data protection impact assessment (DPIA) where a type of processing, in particular using new technologies, and taking into account the nature, scope, context and purposes of the processing, is likely to result in a high risk to the rights and freedoms of natural persons.
To assess whether a processing operation “is likely to result in a high risk”, and therefore whether carrying out a DPIA is mandatory, the EDPB sets out 9 criteria that must be evaluated, amongst which is the innovative use or application of new technological or organisational solutions.
Determining whether or not a DPIA is required is something that must be assessed on a case-by-case basis, as in some scenarios it is possible for a single criterion to entail a high risk and thus motivate a DPIA and, in other scenarios, two or more criteria may concur and yet it may be argued that a DPIA is not necessary for that specific situation.
The EDPB estimates that, in most cases, the concurrence of two criteria would motivate the necessity to carry out a DPIA.
When observing the remaining criteria, it is very possible that many blockchain applications, especially those of a public permissionless nature, result in a high risk and require conducting a DPIA; most permissionless blockchains process data on a large scale or prevent data subjects from exercising a right.
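The rule of thumb above (two or more criteria usually point to a DPIA) can be written as a simple first screening. The criterion labels and the function name are hypothetical, and the result is only an indication, since the decision must still be made case by case.

```python
def dpia_indicated(criteria_met: set, threshold: int = 2) -> bool:
    # Rule of thumb from the text: two or more criteria usually call for a DPIA.
    # A single criterion may still be enough in some cases, so a False result
    # here does not settle the question.
    return len(criteria_met) >= threshold


# Criteria mentioned in the text for a typical permissionless blockchain.
permissionless_chain = {
    "innovative_technology",
    "large_scale_processing",
    "prevents_exercise_of_rights",
}

print(dpia_indicated(permissionless_chain))       # True
print(dpia_indicated({"innovative_technology"}))  # False
```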
An assessment that results in a “high risk to the rights and freedoms of natural persons” is one of the situations where you are required to conduct a DPIA, but it is not the only one.
A data protection impact assessment is also mandatory when the processing is one of those listed in Article 35.3 or has been included by the data protection supervisory authority in the list of processing operations that require a DPIA.