The European Patent Office refused to grant a patent on a software-implemented method of de-identifying data to protect privacy. Here are the practical takeaways of decision T 1248/12 (Privacy preserving data mining/CROSSIX) of 12.3.2019 of Technical Board of Appeal 3.5.01:
This European patent application concerns data privacy in a database system.
The processing of privacy-sensitive data, e.g. medical records, is subject to legal restrictions. For example, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) in the USA prevents health care providers from sharing individually identifiable health information with third parties, such as researchers or pharmaceutical companies. Similar data privacy laws exist in Europe and in other jurisdictions.
However, it is possible to share de-identified data, i.e. data that neither identifies an individual nor provides information that could identify one. De-identification is a lossy process, though: because information is removed, it might not be possible to extract certain information from the de-identified data, even information that would not breach individual privacy. The invention seeks to overcome this problem.
Here is how the invention is defined in claim 1 (main request):
Claim 1 (main request)

A Privacy Preserving Data-Mining Protocol, operating between a secure “aggregator” data processor and at least one of “source-entity” data processor, wherein the “aggregator” and the “source-entity” processors are interconnected via an electronic data-communications topology, and the protocol includes the steps of:
A) on the side of the “aggregator” processor:
(i) from a user interface–accepting a query against a plurality of the predetermined attributes and therewith forming a parameter list,
(ii) via the topology–transmitting the parameter list to each of the “source-entity” processors,
(iii) via the topology–receiving a respective file from each of the “source-entity” processors,
(iv) aggregating the plurality of files into a data-warehouse,
(iv[sic]) using the parameter list, extracting query relevant data from the data-warehouse,
(vi) agglomerating the extract, and
(vii) to a user interface–reporting the agglomerated extract; and
B) on the side of each processor of the at least one “source-entity” processors:
(i) accumulating data-items wherein some of the data-items have privacy sensitive micro-data,
(ii) organizing the data-items using the plurality of predetermined attributes,
(iii) via the topology–receiving a parameter list from the “aggregator” processor,
(iv) forming a file by “crunching together” the data-items according to the parameter list,
(v) filtering out portions of the file which characterize details particular to less than a predetermined quantity of micro-data-specific data-items, and
(vi) via the topology–transmitting the file to the “aggregator” processor.
Is it technical?
Claim 1 defines a data mining protocol that operates between an “aggregator” and a number of “source-entities”. The “source entities” correspond to health care providers. The “aggregator” is a trusted, central processor.
According to the appellant, the claimed data-mining protocol is as follows: A user, for example a researcher who wants to get information about a group of people, inputs a query including, for example, the names or IDs of the people in the group. The query (or “parameter list”) is sent via the aggregator to the source entities that store the data. The source entities collect the relevant data into files (the data items are “crunched together”), they de-identify the data to a certain extent, for example by removing addresses, and send the files to the aggregator that aggregates them into a data warehouse. The aggregation further protects privacy by de-identifying the source-entities. The aggregator also extracts query-relevant data from the data warehouse, and presents a condensed (“agglomerated”) extract to the user.
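The de-identification step described above, removing direct identifiers such as addresses before a file leaves the source entity, can be sketched minimally as follows; the attribute names are assumptions, not taken from the application:

```python
# Attributes assumed, for illustration, to be direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address"}


def de_identify(record):
    """Return a copy of the record with direct identifiers stripped out."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


record = {"name": "Alice", "address": "1 Main St", "diagnosis": "flu"}
print(de_identify(record))  # {'diagnosis': 'flu'}
```

This is exactly the "lossy" aspect noted earlier: once the identifiers are dropped, any analysis that needed them (even a privacy-preserving one) is no longer possible on the shared file.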
The board of appeal did not share the appellant's view that de-identifying data is technical. Interestingly, the board drew a distinction between data privacy, which it considered non-technical, and data security:
The Board shares the examining division’s view that de-identifying data, by removing individually identifiable information, and by aggregating data from a plurality of sources, is not technical. It aims to protect data privacy, which is not a technical problem. The problem of data privacy is not synonymous with data security. Data privacy concerns what information to share and not to share (and making sure that only the information that is to be shared is shared), whereas data security is about how to prevent unauthorised access to information.
It is established case law that non-technical features cannot contribute to inventive step. Instead, non-technical features may legitimately be part of the problem to be solved (T 641/00 – Two identities/COMVIK), for example in the form of a requirement specification given to the skilled person to implement.
The board considered a generic data processing system to be the closest prior art, which includes at least a database system corresponding to the source-entities in claim 1. Regarding the distinguishing features of claim 1, the board took the following point of view:
[T]he steps of de-identifying the data at the source and aggregating the results from a plurality of sources is part of the non-technical requirement specification to be implemented. So is the presentation of the result in a condensed form.
The skilled person having been given the task of implementing the requirement specification would provide an “aggregator processor”, because that is what the requirement specification (“aggregate the results”) is telling him to do. The aggregator processor and the database system (source-entities) need to communicate with each other: the source entities need to obtain the query and the aggregator processor needs to obtain the query results. The skilled person would find suitable formats for this. The Board notes that the claims do not specify any particular format beyond the use of a “list” and files. The processing performed by the source-entities (de-identifying) and aggregator (aggregating, extracting and agglomerating), and the presentation of the results to the user, does not go beyond what the requirement specification dictates.
Thus, the skilled person would have arrived at the subject-matter of claim 1 without inventive effort.
For these reasons, the board concluded that claim 1 lacks an inventive step.
You can read the whole decision here: T 1248/12 (Privacy preserving data mining/CROSSIX) of 12.3.2019