Platforms are blocking independent researchers from investigating deepfakes
Sora and other AI tools are churning out content at unprecedented speed, but researchers are unable to investigate how that content reaches or affects people.
Hello from Lisbon, where I’m representing Hard Reset at the Web Summit this year. I have so many thoughts marinating about everyone’s blind sprint to adopt AI as well as the friction between human culture and AI, but I’m still here and processing, so I’ll fill you in with a full dispatch later.
I just had a conversation with Brandi Geurkink, executive director at the Coalition for Independent Tech Research. They work with independent researchers across academia, journalism, and civil society to challenge the tech industry’s resistance to research in the public interest.
As we know, reality online—and subsequently offline—is becoming murkier by the second. Trust and safety teams and human content moderators have been gutted or paid pennies, and regulators in America have been unable to bring forward meaningful policy change around privacy, the data accumulation schemes to train AI models, and the proliferation of unsafe content.
And now the floodgates are open. With the recent arrival of OpenAI’s video generation tool Sora, many versions of visual and audio reality are possible, and chaos may reign. A few weeks ago, for instance, a video of a conservative preacher criticizing billionaires went viral on TikTok, only to be identified as a Sora deepfake. The latest iteration, Sora 2, is available in only a few countries, and OpenAI claims it is cracking down on deepfakes following concerns raised by actor Bryan Cranston. But that’s of little solace to journalist Chris Stokel-Walker, who has spent years teaching people how to spot a fake and says that Sora 2 is a “godsend” for people who want to muddy what is true on the web.
One of many things we don’t understand about this content is exactly how it is allowed to show up on people’s feeds and take up space in their minds. In the past, independent researchers had at least some access with which to trace the ripple effects: one study, for instance, found that 50% of internet traffic comes from non-human sources, meaning that what we assume to be true and human online is increasingly in question.
But today, platforms are preventing independent researchers from understanding what happens if and when, for example, someone sees an AI-generated video depicting criminals supporting New York City mayor-elect Zohran Mamdani. Did viewers know it was AI? If they did, how much were they influenced by the AI content as opposed to other video campaigns? How did the speed with which Andrew Cuomo’s campaign produced that video contribute to its potency, virality, or timeliness?
Hard Reset talked to Brandi about this gaping hole in the research ecosystem: why and how platforms are opposing researchers through litigation, how regulation needs to allow them back in to investigate, and what we can learn from their insights about why content spreads and what psychological impact that content may have on users.
Ariella Steinhorn: Who is behind the development of these deepfakes? What are their north stars, their end game? Destruction of democracy, chaos?
Brandi Geurkink: Honestly, I don’t know. But the real question is, how is that content spreading? How does it reach people? It’s valuable to focus on the source, but either way we are going to be flooded.
Understanding deepfakes in a vacuum is not the hardest problem. The hardest question is: what are the impacts on our information ecosystems? To answer it, we need to start with the very social media platforms we’ve been trying so hard to study, and from which we’ve been blocked.
AS: Right. I think about why and how we’re being flooded with content and who’s seeing it–and I recall this anecdote from your Coalition’s report:
“Take, for example, the well-documented finding that outrage-inducing content gets more likes. ‘The question always remains whether that is because that’s people’s psychology, and they just like this content more, or whether it is somehow being pushed by the platforms,’ Lorenz-Spreen says. ‘That kind of data is what we really need if we want to make serious progress in our field.’”
BG: Exactly. I think the conversation right now needs to be about deepfakes: what their widespread presence will mean and what they will do to society. In my mind, it’s the latest chapter in a much longer conversation about how content moves on the internet.
What will the impacts be, and what are they now? In what ways are deepfakes being used to push scams or misinformation? The ability to detect and understand them is necessary to build mitigations. And all of this relies on researchers being able to study the broader ecosystem.
Deepfakes are proliferating on the same platforms we’ve been trying to study, and we need to understand the forensics: where they’re going and how they’re disappearing. We can’t do that if we don’t understand the platforms. But we keep running into the same problem: platforms are making it harder than ever to study them, which makes it harder for researchers to ask the questions they need to ask.
In Europe, at least, platforms now have a legal obligation to provide that access. The European Commission announced the other week that, in its preliminary findings, Meta and TikTok are in violation of the Digital Services Act because they are not providing researchers with access to data. (Editor’s note: If the alleged violations are confirmed, the Commission has the power to fine each company up to 6% of its total annual turnover.)
They understand that ultimately, to really be able to understand this problem, we need independent researchers to probe and investigate.
AS: Were the barriers to researching these platforms ever lower than they are now?
BG: There were times when it was easier to do this kind of research. It was easier to have access to people who held power at these social media companies. But I don’t believe that the companies ever saw the researchers as genuine partners in their missions.
Companies ultimately realized that research poses a risk: it can lead to bad publicity, and regulatory enforcement and litigation have increased as the field of research has developed. Different countries are picking up and implementing serious regulation, and that realization is what has led to these crackdowns.
But it’s always been on the companies’ terms. For example, in 2018, Facebook announced a big partnership called Social Science One, which promised academic researchers access to data to study the impact of social media on democracy and elections. The process dragged on, and eventually Facebook reneged. Voluntary collaborations have left researchers disappointed, because platforms have not delivered.
Take something like Sora, for example, where the potential for harmful deepfakes is now widely recognized. The sheer volume of these videos demands preparedness, because information ecosystems could be flooded. But access is now much more limited, and it has become much harder for people doing public interest research to have a window into it.
AS: How exactly are companies cracking down on researchers?
BG: We’ve seen a few different strategies. In the US there is quite a bit of case law around the Computer Fraud and Abuse Act (CFAA), an anti-hacking law with a long history. We see cases go that route, with federal charges against researchers, as well as tort claims and breach-of-contract claims, because almost all of the social media platforms have some kind of provision in their terms of service saying you can’t scrape data.
But that’s how the most relevant research is done, and almost all the platforms forbid it in their terms of service. Beyond the lawsuits and cease-and-desists, researchers can have their access to platforms revoked entirely. Platforms will change their interfaces in ways that break your scrapers, which makes it hard to do your research.
That’s the weird thing about this moment. Interestingly, the main motivation for breaking scrapers has nothing to do with public interest research; it’s about blocking generative AI companies from scraping data to train large language models. Public interest researchers have become collateral damage in that economic fight.
All of this research methodology relies on underlying access to information. Recognizing researchers’ right to that access is a necessity for society, but there is a concerted force working against it.
AS: What is the biggest risk you foresee with the proliferation of deepfakes, and how platforms are handling them?
BG: There’s a huge problem with platforms’ deepfake detection methods and with how they are applied in practice. Companies can look for watermarks and signatures within a file’s metadata.
It’s one thing for the company to check content on its own platform; it’s another when that content spreads across the internet in ways that strip out or degrade those signals.
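To make that point slightly more concrete, here is a minimal sketch (mine, not Geurkink’s) of what such a provenance check can look like: it shells out to the exiftool utility and flags metadata fields that resemble C2PA “Content Credentials” markers. The keyword list is purely illustrative, not an authoritative set of provenance fields.

```python
# A rough illustration of metadata-based provenance checking.
# Assumes the exiftool command-line utility is installed; the
# PROVENANCE_HINTS list is illustrative, not exhaustive.
import json
import subprocess
import sys

PROVENANCE_HINTS = ("c2pa", "contentcredentials", "jumbf", "claim_generator")

def provenance_signals(path: str) -> list[str]:
    """Return metadata entries that look like provenance markers."""
    raw = subprocess.run(
        ["exiftool", "-j", path],  # -j prints metadata as JSON
        capture_output=True, text=True, check=True,
    ).stdout
    metadata = json.loads(raw)[0]  # exiftool -j returns a one-element list
    hits = []
    for key, value in metadata.items():
        blob = f"{key}={value}".lower()
        if any(hint in blob for hint in PROVENANCE_HINTS):
            hits.append(f"{key}: {value}")
    return hits

if __name__ == "__main__":
    found = provenance_signals(sys.argv[1])
    print("\n".join(found) if found else "No provenance metadata found.")
```

The catch, and this is Geurkink’s point, is that re-encoding, screenshotting, or screen-recording a clip typically strips embedded metadata entirely, so these signals often do not survive once content leaves the platform that generated it.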
For instance, after the Christchurch terrorist attack in New Zealand, certain videos were able to bypass the detection systems that platforms had put in place. The platforms had said they would not allow these videos to spread. But it happened anyway, and it’s going to keep happening.
Research has demonstrated that companies’ internal systems will never be 100% effective, and that’s why we need researchers to detect these things and prevent them from spiraling. The platforms themselves are never going to admit they’re falling short. We need independent people who are well equipped to do that.
AS: What are the ways in which we can actually make things safer?
BG: Regulation needs to restore access, because access is what gives the broader research community insight into the harms. The baseline for any technology should be realignment with the public interest, so there is a check on corporate power. That provides a pathway toward accountability.
Let’s workshop the example of a campaign video being used politically. Who is actually seeing this video, and how are they seeing it? Are people coming across it because it’s recommended into their feeds, or because it’s shared widely in interest groups they belong to? Who are they sharing it with, and what is the impact on those people?
And longer term, what are the impacts of people seeing this content? Do they believe it to be true, and does it change their opinions? If we don’t have that base level of information, we’re going to get bad laws that either ban free expression like satire and comedy, or do nothing at all and just hope that people understand what the content is.
The baseline is that this research should be independent and rooted in the public interest. That’s essential if people are to have better experiences online. We can then ask the questions that matter, which are the ones internet users actually care about.
If we leave it up to the companies, they’re going to be motivated by profit and competitive advantage. Political leaders need to be able to recognize the corporate conditions that enable these harms.
Thanks for reading, stay tuned for the Lisbon dispatch!


