Shining a Light on “Shadow Prompting”

Eryk Salvaggio is a Research Advisor for Emerging Technology at the Siegel Family Endowment.

When you type a prompt into an AI image model, you might expect that what you type has a meaningful impact on what you get back. But we often fail to see the extent to which requests are modified on the back end by the designers of these systems. OpenAI's DALL-E 3, announced this month, connects the image generation model to its Large Language Model, GPT-4. Rather than growing capabilities through ever-larger datasets, this linkage points to a new strategy: combining existing models. It is OpenAI's experiment in deploying one system to refine and constrain what the other can produce.

But what is touted as a tool for safety and user convenience has a dark side. OpenAI has acknowledged that your prompt is taken only as a suggestion: your words are altered before they reach the model, with opaque editorial decisions employed to filter out problematic requests and obscure the model's inherent biases.

The practice should concern anyone who values transparency or user agency. We need openness about these decisions and how they are deployed, especially as AI is integrated more deeply into our newsrooms, social media, and other media infrastructure. The practice will also create friction for independent researchers seeking to understand OpenAI's design choices, model biases, and vulnerabilities.

Shadow Prompting

On the surface, it makes sense to connect a powerful Large Language Model to an…
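The prompt-rewriting pipeline described above can be sketched in a few lines. This is a hypothetical illustration of the pattern, not OpenAI's actual implementation: the blocked terms, the appended qualifiers, and the function names are all invented for demonstration.

```python
# Illustrative sketch of "shadow prompting": an intermediary silently
# rewrites the user's prompt before it reaches the image model.
# All rules and names here are assumptions, not OpenAI's real system.

BLOCKED_TERMS = {"violence"}                   # assumed safety filter
STYLE_SUFFIX = ", in a photorealistic style"   # assumed silent augmentation

def shadow_rewrite(user_prompt: str) -> str:
    """Return the prompt the image model would actually receive.

    The user never sees this intermediate text, which is the core
    transparency concern: the editorial decisions are opaque.
    """
    lowered = user_prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        # Opaque substitution: the request is silently replaced.
        return "a peaceful landscape"
    # Opaque augmentation: qualifiers are appended without the user's input.
    return user_prompt + STYLE_SUFFIX

print(shadow_rewrite("a cat wearing a hat"))
# The image model receives the rewritten text, not the original prompt.
```

In DALL-E 3's API, this rewriting is at least partially visible: responses include a `revised_prompt` field showing what the model was actually given, though the rules that produced it remain undisclosed.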