On 1 March 2017, the UK Information Commissioner’s Office (ICO) published a paper on big data, artificial intelligence, machine learning and data protection (replacing its earlier paper, published in 2014). Although the paper is described as a “discussion paper”, it makes a number of recommendations that those involved in big data projects would be well advised to incorporate into their projects, and it firmly rejects suggestions that either the existing data protection framework or the GDPR cannot be applied in this context.
The paper works through the implications of big data against the core data protection principles; it then discusses compliance tools that can be used to meet those implications (including a useful analysis of how its current Privacy Impact Assessment Code of Practice is still fit for purpose under the GDPR and for big data projects). It concludes with six key recommendations.
The ICO’s Six Recommendations
The ICO’s six recommendations are that organisations should:
- Carefully consider whether the big data analytics to be undertaken actually require the processing of personal data. Often, this will not be the case; in such circumstances organisations should use appropriate techniques to anonymise the personal data in their dataset(s) before analysis;
- Be transparent about their processing of personal data by using a combination of innovative approaches in order to provide meaningful privacy notices at appropriate stages throughout a big data project. This may include the use of icons, just-in-time notifications and layered privacy notices;
- Embed a privacy impact assessment framework into their big data processing activities to help identify privacy risks and assess the necessity and proportionality of a given project. The privacy impact assessment should involve input from all relevant parties including data analysts, compliance officers, board members and the public;
- Adopt a privacy by design approach in the development and application of their big data analytics. This should include implementing technical and organisational measures to address matters including data security, data minimisation and data segregation;
- Develop ethical principles to help reinforce key data protection principles. Employees in smaller organisations should use these principles as a reference point when working on big data projects. Larger organisations should create ethics boards to help scrutinise projects and assess complex issues arising from big data analytics; and
- Implement innovative techniques to develop auditable machine learning algorithms. Internal and external audits should be undertaken with a view to explaining the rationale behind algorithmic decisions and checking for bias, discrimination and errors.
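As an illustration of the kind of check an audit for bias might include, the following is a minimal Python sketch that compares the rate of favourable decisions across groups. It is not taken from the ICO's paper: the group labels, the `audit_log` data, the function names and the `REVIEW_THRESHOLD` value are all hypothetical, and a real audit would need to look at far more than a single ratio.

```python
from collections import defaultdict

# Illustrative assumption only: the ICO paper does not prescribe a threshold.
REVIEW_THRESHOLD = 0.8


def selection_rates(decisions):
    """Return the share of favourable outcomes for each group.

    `decisions` is an iterable of (group, approved) pairs. Both the
    structure and the group labels are hypothetical.
    """
    totals = defaultdict(int)
    approved = defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        if ok:
            approved[group] += 1
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favourable outcome
    return min(rates.values()) / highest


# Hypothetical log of automated decisions, as (group, approved) pairs.
audit_log = [
    ("A", True), ("A", True), ("A", True), ("A", False),
    ("B", True), ("B", False), ("B", False), ("B", False),
]

rates = selection_rates(audit_log)
ratio = disparate_impact_ratio(rates)
needs_review = ratio < REVIEW_THRESHOLD

print(rates)
print(f"ratio={ratio:.2f}, needs_review={needs_review}")
```

A low ratio does not by itself show discrimination; it is a prompt for the internal or external auditor to examine the rationale behind the decisions more closely.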
Other Interesting Points Raised in the ICO’s Paper
Other points of interest made by the ICO in its paper are set out below.
- Challenges of giving notices / obtaining consent: The paper discusses the difficulty of drafting privacy notices or obtaining consent to cover the “thinking with data” stage, where the project has not yet determined the purpose for which correlations will be used (which makes a specific use notice or consent impossible). The paper envisages updating notices as the project progresses. It suggests that anonymised data might be used at the “thinking with data” / discovery stage and substituted with personalised data at the full scale point, if necessary and once the aims of the analysis are better understood. The organisation would comply with notice and grounds requirements at that point, when it is in a better position to describe the envisaged information flows, the purposes for processing and, where necessary, its legitimate interests.
- Indirect consequences of anonymised analytics: The paper notes the need to take into account the indirect effects on individuals when using correlations (even where the analysis data is anonymised and aggregated). For example, a data set is used to learn about a particular community; the inferences made about the members of the community are then applied to the individuals in the community whose data was originally used to learn about the community. In this situation, although the controller did not use a specific individual’s details to formulate its conclusions, the individual’s anonymised data has allowed the controller to assume something about him or her which may be applied against him/her in the future (albeit indirectly). This complex dilemma prompts the requirement for the use of privacy impact assessments or PIAs (particularly stakeholder consultation) to flush out what the indirect effects might be in any given scenario, and an ethical framework or board to decide how the inferences can be used fairly. There is also emphasis on the backlash organisations will face from their citizens, employees or customers if they are not benign and not transparent about what they are doing.
- Explaining the logic of profiling and wholly automated decision making: The paper highlights the GDPR provisions on profiling and wholly automated decision making and the need to make sure such activities are not used in a discriminatory manner and the logic can be articulated.
- Ethical frameworks: A key area of discussion relates to the importance of ethical approaches and the development of big data ethical principles, which are both addressed to employees within organisations and used as a means of reassuring customers and building a relationship of trust with them. These principles range from litmus tests such as “if the details of the processing were made public, would it strengthen or threaten customer relationships” to fuller sets of principles and councils and boards of ethics.
- Algorithmic transparency: Another key area relates to algorithmic transparency (echoing the GDPR’s requirement to articulate the logic of any profiling). It is particularly important for organisations to understand algorithms developed through artificial intelligence and machine learning and to be able to describe how they work in intelligible terms and then to audit that they are indeed functioning as expected. Here, visualisation features such as bookmark interrelation charts and Venn diagrams, coupled with features such as adjustable sliders to show data subjects the impact of different factors, are cited by the ICO as ways of explaining the logic. There is also discussion of the use of Natural Language Generation where the software itself generates output text that explains why or how a decision was made.
- Opt-out rights, where legitimate interest is relied on: As much big data work will be based on legitimate interests grounds, the ICO quotes the European Data Protection Supervisor’s view that giving individuals opt-out rights should be considered in projects with any potential prejudice.
- Conducting analytics in parallel with main purpose: Although much personal data in big data projects is dual purpose (e.g., required to perform the contract and also being used for analytics purposes), happily, the paper is not prescriptive about using the data for the analytics purpose whilst it is being retained for the contract performance purpose. The paper’s anonymisation examples suggest that such use would be possible for uses compatible with the primary purpose, provided the other principles are also complied with.
- Analytics activities impacting an organisation’s status of processor or controller: Finally, the paper reiterates the ability of data protection authorities to find organisations that have classified themselves as data processors to be controllers or joint controllers with their clients if the organisation has enough freedom to use its expertise to decide what data to collect and how to apply its analytics techniques.
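To make the algorithmic transparency point above more concrete, the following minimal Python sketch shows one way the logic of an automated decision might be described in intelligible terms, by generating a plain-text explanation from each factor's contribution to a score. It is an illustration only and is not drawn from the ICO's paper: the linear scoring model, the `WEIGHTS`, the `THRESHOLD` and the applicant fields are all hypothetical.

```python
# Hypothetical linear scoring model: the weights and threshold are invented
# for illustration and do not come from the ICO's paper.
WEIGHTS = {"income": 0.5, "years_at_address": 0.25, "missed_payments": -1.0}
THRESHOLD = 2.0


def contributions(applicant):
    """Return how much each factor adds to or subtracts from the score."""
    return {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}


def explain(applicant):
    """Generate a plain-text explanation of the decision."""
    parts = contributions(applicant)
    score = sum(parts.values())
    outcome = "approved" if score >= THRESHOLD else "declined"
    # List the factors from most to least influential.
    ranked = sorted(parts.items(), key=lambda item: abs(item[1]), reverse=True)
    reasons = ", ".join(f"{name} ({value:+.2f})" for name, value in ranked)
    return (
        f"Application {outcome}: score {score:.2f} against threshold "
        f"{THRESHOLD:.2f}. Factors by influence: {reasons}."
    )


applicant = {"income": 4.0, "years_at_address": 2.0, "missed_payments": 1.0}
message = explain(applicant)
print(message)

# Changing one factor, as an adjustable slider would, shows its impact.
what_if = explain({**applicant, "missed_payments": 0.0})
print(what_if)
```

Re-running the explanation with one factor changed mimics the adjustable sliders the ICO cites. More complex machine learning models would need dedicated explanation techniques rather than a simple reading of weights.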
This discussion paper pre-empts both the ICO’s own GDPR guidance on profiling and the Article 29 Working Party’s guidance on consent and profiling, both of which are expected in the first half of 2017. The concepts, particularly around the development of ethical principles and algorithmic transparency, are likely to feature in those guidance documents, too. Organisations undertaking analytics that do not amount to big data projects would also be well advised to read this paper to understand the detailed expectations of regulators as to what will or will not be acceptable in relation to analytics generally.