Data-centric research aims to obtain useful insights about the products researchers are creating or designing by collecting usage data from their prototypes. This type of research is usually performed by data scientists, who apply machine learning techniques to improve systems or products that rely on automated decisions. Researchers from other fields, in this case design studies, now also want to perform data-centric research. They must therefore be made aware of the issues, such as bias, that can be introduced when machine learning is used to automate decisions in electronic devices. To address this, we proposed a system that lets users explore bias that is present in a dataset or introduced by machine learning practices.

To detect bias we had to select a bias toolkit; we chose the AIF360 toolkit and used it to build FairData. FairData lets users explore bias in datasets that include machine learning predictions through an iterative process consisting of an exploration stage and a result stage. In the exploration stage, users analyse the dataset by going through its attributes and making small modifications to the data. The result stage then shows the change in bias, expressed through multiple fairness metrics, to support users in becoming aware of bias.

The experiments in this thesis showed that FairData can inform two different groups of participants (computer science and industrial design) about the bias present in a dataset. Although the two groups behaved differently when exploring the dataset, both became aware of the bias by examining the metrics provided by FairData. The results also suggest that the system can support users in reducing this bias, since they can see the effect of the modifications they make to the dataset.
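
As an illustration of the kind of measurement FairData builds on, the minimal sketch below shows how dataset-level bias metrics can be computed with the AIF360 toolkit. The dataframe, the protected attribute `sex`, the group definitions, and the specific metrics are illustrative assumptions for this example; they are not necessarily the exact configuration used in FairData.

```python
# Minimal sketch: computing group fairness metrics with AIF360.
# The data, attribute names, and group definitions are illustrative
# assumptions, not the exact setup used in FairData.
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric

# Hypothetical dataset: 'sex' is the protected attribute (1 = privileged group),
# 'label' is the outcome (1 = favourable decision).
df = pd.DataFrame({
    "sex":   [1, 1, 1, 0, 0, 0, 1, 0],
    "age":   [25, 47, 33, 29, 51, 38, 44, 23],
    "label": [1, 1, 0, 0, 1, 0, 1, 0],
})

dataset = BinaryLabelDataset(
    df=df,
    label_names=["label"],
    protected_attribute_names=["sex"],
    favorable_label=1,
    unfavorable_label=0,
)

metric = BinaryLabelDatasetMetric(
    dataset,
    privileged_groups=[{"sex": 1}],
    unprivileged_groups=[{"sex": 0}],
)

# Two common dataset-level bias metrics:
# - statistical parity difference: P(favourable | unprivileged) - P(favourable | privileged)
# - disparate impact: the ratio of those two probabilities
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())
```

When the dataset also contains model predictions, as in the datasets FairData targets, AIF360's `ClassificationMetric` can be used in the same way to compare the true and predicted labels per group; the metrics shown above are only one possible choice among those the toolkit provides.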

Master Thesis: TU Delft repository