Privacy and Algorithmic Accountability
Big data analytics presents threats to privacy and to related values such as fairness. Our position is that accountability is a key component of the solution space. Over the last decade, my research group has developed foundations and tools for protecting privacy via accountability, along with extensive case studies to validate them.
In this talk, I will focus on the theory and practice of algorithmic accountability and comment on connections with formal methods. Our work is driven by the following question: When algorithmic systems, based on machine learning and related statistical methods, drive decision-making, how can we detect violations, explain decisions, hold entities in the decision-making chain accountable, and institute corrective measures? I will present two recent results in this space.
First, I will describe our work on detection of violations. We have developed the first statistically rigorous methodology for information flow experiments (IFE) to discover personal data use by black-box Web services. Our AdFisher tool implements an augmented version of this methodology to enable discovery of causal effects at scale. Its application resulted in the first study to demonstrate statistically significant evidence of discrimination in online behavioral advertising, more specifically, gender-based discrimination in the targeting of job-related ads. This methodology and class of tools can be used to provide external oversight of big data systems by researchers, regulatory agencies, investigative journalists, and civil liberties groups.
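The core statistical step of an information flow experiment can be illustrated with a permutation test. The sketch below is a simplified illustration, not AdFisher's implementation. It assumes that experimental units (e.g., simulated browser profiles) were randomly assigned to a treatment group and a control group, and that a numeric outcome was recorded for each unit, such as the number of job-related ads shown. It uses a difference in means as the test statistic; AdFisher itself may use a different statistic. Because assignment was random, under the null hypothesis of no information flow the group labels are exchangeable, so reshuffling them samples the statistic's null distribution. The names `permutation_test` and `mean_difference`, the parameters, and the ad counts are hypothetical.

```python
import random
from typing import Sequence


def mean_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Test statistic: mean outcome of group a minus mean outcome of group b."""
    return sum(a) / len(a) - sum(b) / len(b)


def permutation_test(
    treatment: Sequence[float],
    control: Sequence[float],
    num_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided p-value for H0: the treatment has no effect on the outcome.

    Under H0 with random assignment, group labels are exchangeable, so
    reshuffling them samples from the statistic's null distribution.
    """
    rng = random.Random(seed)
    observed = mean_difference(treatment, control)
    pooled = list(treatment) + list(control)
    n_treatment = len(treatment)
    at_least_as_extreme = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)
        permuted = mean_difference(pooled[:n_treatment], pooled[n_treatment:])
        # Small tolerance guards against floating-point noise on ties.
        if permuted >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (at_least_as_extreme + 1) / (num_permutations + 1)


if __name__ == "__main__":
    # Hypothetical counts of job-related ads shown to each simulated agent.
    treatment_ads = [12, 15, 11, 14, 13, 16, 12, 15]
    control_ads = [5, 7, 6, 4, 8, 6, 5, 7]
    print(permutation_test(treatment_ads, control_ads))
```

A small p-value is evidence that the treated attribute flows into the observed outcome. When many outcomes or statistics are tested, a multiple-testing correction would also be needed.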
Second, I will describe our work on algorithmic transparency aimed at explaining decisions by big data systems with machine learning components. We have developed a suite of quantitative input influence (QII) measures that quantify the causal influence of features (e.g., gender, age) on decisions made by a big data system. The QII measures form the basis of transparency reports that explain decisions about individuals (e.g., identifying features that were influential in a specific credit or insurance decision) and groups (e.g., identifying features influential in disparate impact based on gender). The associated methodology can be used to drive the design of transparency mechanisms as well as internal testing and audit of big data systems.
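A minimal sketch of one QII-style measure follows, assuming a classifier that maps a record of named features to a 0/1 decision and a sample population from which feature values can be drawn. It estimates the influence of one feature on an individual's decision as the probability that the decision changes when only that feature is replaced by a value drawn at random from the population, holding the other features fixed. This is a simplified, unary variant written for illustration; the full QII framework also covers sets of features, group outcomes, and aggregation of influence across features, none of which is shown. The function `unary_influence`, the toy classifier, and its features are hypothetical.

```python
import random
from typing import Callable, Dict, List

# A record maps feature names (e.g., "income", "age") to values.
Record = Dict[str, float]


def unary_influence(
    classifier: Callable[[Record], int],
    individual: Record,
    feature: str,
    population: List[Record],
    num_samples: int = 1000,
    seed: int = 0,
) -> float:
    """Estimate how often the decision for `individual` changes when `feature`
    alone is replaced by a value drawn from its distribution in `population`.

    Simplified unary QII-style measure (hypothetical helper for illustration).
    """
    rng = random.Random(seed)
    original_decision = classifier(individual)
    changed = 0
    for _ in range(num_samples):
        # Draw a random record and borrow only the feature under study,
        # which samples from that feature's empirical marginal distribution.
        donor = rng.choice(population)
        intervened = dict(individual)  # copy so the input is not mutated
        intervened[feature] = donor[feature]
        if classifier(intervened) != original_decision:
            changed += 1
    return changed / num_samples


if __name__ == "__main__":
    # Hypothetical toy data and classifier: approve (1) iff income >= 50.
    population = [
        {"income": 20.0, "age": 22.0},
        {"income": 40.0, "age": 35.0},
        {"income": 60.0, "age": 50.0},
        {"income": 80.0, "age": 64.0},
    ]

    def approve(r: Record) -> int:
        return int(r["income"] >= 50)

    applicant = {"income": 70.0, "age": 30.0}
    for name in ("income", "age"):
        # age is ignored by the toy classifier, so its influence is exactly 0.0
        print(name, unary_influence(approve, applicant, name, population))
```

Because the toy classifier ignores age, the estimated influence of age is exactly 0.0, while income receives roughly 0.5, since half of the donor incomes fall below the approval threshold. Drawing the replacement value independently of the individual's other features is the intervention that lets the measure capture a feature's causal effect on the classifier's output rather than its correlation with other inputs.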