Part 3 - Ensuring there is no Discrimination in the Data and machine learning models

    Jul 10, 2017 10:55:00 AM by Brendan Tierney

    This is the third part of series of blog posts on 'How the EU GDPR will affect the use of Machine Learning'

    The new EU GDPR has some new requirements that will affect what data can be used to ensure there is no discrimination. Additionally, the machine learning models needs to ensure that there is no discrimination with the predictions it will make. There is an underlying assumption that the organisation has the right to use the data about individuals and that this data has been legitimately obtained. The following outlines the areas relating to discrimination:

    • Discrimination based on unfair treatment of an individual based on using certain variables that may be inherently discriminatory. For example, race, gender, etc., and any decisions based on machine learning methods or not, that are based on an individual being part of one or more of these variables. This is particularly challenging for data scientists and it can limit some of the data points that can be included in their data sets.
    • All data mining models need to tested to ensure that there is no discrimination built into them. Although the data scientist has removed any obvious variables that may cause discrimination, the machine learning models may have been able to discover some bias or discrimination based on the patterns it has discovered in the data.
    • In the text preceding the EU GDPR (paragraph 71), details the requirements for data controllers to “implement appropriate technical and organizational measures” that “prevent, inter alia, discriminatory effects” based on sensitive data. Paragraph 71 and Article 22 paragraph 4 addresses discrimination based on profiling (using machine learning and other methods) that uses sensitive data. Care is needed to remove any associated correlated data.
    • If one group of people are under represented in a training data set then, depending on the type of prediction being used, may unknowingly discriminate this group when it comes to making a prediction. The training data sets will need to be carefully partitioned and separate machine learning models built on each partition to ensure that such discrimination does not occur.


    In the next blog post I will look at addressing the issues relating to Article 22 on the right to an explanation on outcomes automated individual decision-making, including profiling using machine learning and other methods.


    Click back to 'How the EU GDPR will affect the use of Machine Learning - Part 1' for links to all the blog posts in this series.


    Tags: Oracle GDPR

    Brendan Tierney

    Written by Brendan Tierney

    Brendan Tierney, Oracle ACE Director, is an independent consultant and lectures on Data Mining and Advanced Databases in the Dublin Institute of Technology in Ireland. He has 22+ years of extensive experience working in the areas of Data Mining, Data Warehousing, Data Architecture and Database Design. Brendan has worked on projects in Ireland, UK, Belgium and USA. Brendan is the editor of the UKOUG Oracle Scene magazine and deputy chair of the OUG Ireland BI SIG. Brendan is a regular speaker at conferences across Europe and the USA and has written technical articles for OTN, Oracle Scene, IOUG SELECT Journal and ODTUG Technical Journal. Brendan has published the following books with Oracle Press book Predictive Analytics using Oracle Data Miner Oracle R Enterprise SQL & PL/SQL from the Experts These books are available on Amazon.