Remote Query Execution: A Powerful Way to do Privacy-Protecting Research on Platform Data

Jonathan Stray, Senior Scientist at The Center for Human-Compatible Artificial Intelligence (CHAI), Berkeley, and Brandie Nonnecke, founding director of the CITRIS Policy Lab at UC Berkeley, provide comment to the European Commission on researcher access to platform data under the Digital Services Act.

The recently passed EU Digital Services Act (DSA) includes a provision for external researchers to request access to internal platform data, for the purpose of evaluating certain systemic risks of very large online platforms (including illegal content, threats to elections, effects on mental health, and more). The Act says that user privacy, trade secrets, and data security must be respected, but it doesn’t say how. The European Commission invited public comment to determine how best to administer researchers’ access.

This comment builds upon our UC Berkeley submission, further detailing an approach to enable researcher data access which is simple and powerful, yet protects the rights of users and platforms. It is based on a straightforward idea: send the researcher’s analysis code to the platform data, rather than sending platform data to researchers. The process would work like this:

1. Platforms publish synthetic data sets: fake data with the same format as the real data.
2. Researchers develop their query and analysis code on this synthetic data, then submit their code to the platform for execution.
3. The query can perform arbitrarily complex analysis but returns only aggregated results to the researcher.

There is no standard name for this data access strategy, even though it has been used in many contexts. In…
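
The three-step process described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the record fields (`user_id`, `topic`), the function names, and especially the `MIN_GROUP_SIZE` suppression threshold, which is one assumed way a platform might enforce "aggregated results only" and is not something the comment specifies.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List

# Hypothetical record format shared by the synthetic and the real data.
Record = Dict[str, str]
# A researcher query maps records to aggregate counts keyed by group label.
Query = Callable[[Iterable[Record]], Dict[str, int]]

# Assumed policy: suppress any group smaller than this (not from the source).
MIN_GROUP_SIZE = 3


def make_synthetic_data() -> List[Record]:
    """Step 1: the platform publishes fake records in the real format."""
    return [
        {"user_id": "fake-1", "topic": "elections"},
        {"user_id": "fake-2", "topic": "health"},
        {"user_id": "fake-3", "topic": "elections"},
    ]


def count_posts_by_topic(records: Iterable[Record]) -> Dict[str, int]:
    """Step 2: researcher code, developed and tested on synthetic data."""
    return dict(Counter(record["topic"] for record in records))


def run_on_platform(
    query: Query,
    real_records: Iterable[Record],
    min_group_size: int = MIN_GROUP_SIZE,
) -> Dict[str, int]:
    """Step 3: the platform runs the submitted query on real data and
    releases only integer counts for sufficiently large groups."""
    result = query(real_records)
    released: Dict[str, int] = {}
    for group, count in result.items():
        # Reject anything that is not a plain integer count.
        if isinstance(count, bool) or not isinstance(count, int):
            raise TypeError("query must return integer counts only")
        # Drop small groups that could point to individual users.
        if count >= min_group_size:
            released[group] = count
    return released
```

A researcher would first call `count_posts_by_topic(make_synthetic_data())` locally to debug the analysis, then submit `count_posts_by_topic` to the platform, which would call `run_on_platform` against its internal records. A real deployment would also need to sandbox the submitted code and review its output, which this sketch does not attempt.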