Kelvin Kiogora
Updated on: June 5, 2025
In 2024, a team of researchers ran an online survey to collect opinions from the public, the kind of routine study teams run all the time to gather useful data. But this time, 90% of the responses were garbage: generated by bots, gamed by repeat takers, or submitted just to grab a quick incentive.
And they’re not alone: studies indicate that up to 40% of survey responses can be fraudulent, particularly when incentives are involved.
Why should you care?
We all should, because this data problem has far-reaching implications for fields like cybersecurity, which rely on accurate survey data to assess threats and develop strategies to keep our data safe.
I recently interviewed Stephanie Clapham, Director of Research at brand tracking firm Latana, who told me that one reason lies in how companies approach data sampling.
I decided to look deeper into this, reading all the material I could find and talking to our network of cybersecurity experts.
Here is what I found out about fraud and bias in surveys, and how they can affect our online security.
Zoom Out: The Bigger Problem
We live in a world where big, costly decisions depend on good data.
It’s how brands decide where to spend their (your) millions.
How cybersecurity firms track threats.
How policymakers shape laws.
It's all fueled by data, and the quality of that data is often measured with "fraud scores".
Fraud scores help filter out fake or suspicious survey responses, and they are pretty good at catching obvious signs, like someone answering too quickly or choosing the same option repeatedly. But they often fail to catch more sophisticated fraudsters, letting inaccurate data slip through.
Sometimes they even block honest participants!
Let that sink in… The tools we’re using to verify people’s responses are excluding real, honest humans and letting fakes pass undetected.
This goes far beyond advertising, because the same survey technologies and approaches used to evaluate brand sentiment can be used by cybersecurity platforms to collect user behavior data, crowdsource threat intelligence, and shape how they defend us online.
When that data is broken, so is your firewall.
So what exactly is wrong with how survey data is collected and sampled?
Deep Dive: How Survey Fraud Happens
Online surveys are supposed to collect honest answers from real people.
Real people are busy, so you usually need to incentivize them to take these surveys.
But when you offer money or anything else for a response, you create a magnet for fraudsters and their bots. Not just basic bots built by some nerd kid in his bedroom… These are AI-powered fraud bots that simulate human reading speeds, vary their answer patterns, and even write convincing open-ended responses.
The only way to combat fraud effectively is to tackle it from the very source, which tends to be incentivization.
According to Clapham, no one is tackling that head-on yet, and the data seems to confirm it.
Just as an example, in a 2024 study from Frontiers in Research Metrics and Analytics, researchers found that usable responses dropped from 75% to 10% in just two years due to AI fraud and panel fatigue.
So what’s wrong with how companies are filtering data from surveys?
The Old Tools Don’t Work Anymore
Most survey platforms try to stop fraud using fraud scores: rules that flag suspicious behavior like answering too fast or clicking the same option repeatedly.
But these are band-aids that often backfire, flagging genuine people who just happen to be fast readers or decisive in their answers.
Meanwhile, the bots take note and learn how to mimic human behaviors that can get them good fraud scores.
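To make this concrete, here's a minimal sketch of what a rule-based fraud score can look like. The thresholds and fields are my own illustration, not any real platform's scoring logic:

```python
from dataclasses import dataclass

@dataclass
class Response:
    seconds_taken: float   # total completion time
    answers: list[int]     # selected option index per question
    open_text: str         # free-text answer

def fraud_score(r: Response, min_seconds: float = 60.0) -> int:
    """Return a crude 0-100 suspicion score from simple heuristics."""
    score = 0
    # Speeding: finishing far faster than a human plausibly could
    if r.seconds_taken < min_seconds:
        score += 40
    # Straight-lining: picking the same option on (almost) every question
    if len(set(r.answers)) == 1 and len(r.answers) > 3:
        score += 40
    # Empty or near-empty open-ended answers
    if len(r.open_text.strip()) < 5:
        score += 20
    return score

# A decisive, fast human reader can trip the same rules a bot does
honest_but_fast = Response(seconds_taken=55, answers=[4, 4, 4, 4, 4], open_text="All good.")
print(fraud_score(honest_but_fast))  # 80 -> flagged, despite being genuine
```

Tweak the thresholds all you like: a careful bot will sit comfortably inside them, while an unusually quick human gets flagged.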
And then there’s sampling…
Whether it’s quota sampling, river sampling, or otherwise, the core flaws are the same:
- Bias
- Limited reach
- Poor representativeness
Surveys often rely on “panels” (groups of people who sign up to take surveys regularly).
But guess who signs up for survey panels?
People who like taking surveys!
This is called "sampling bias", and it leads to misleading conclusions because the sample isn't diverse enough. That skews decisions about how to protect our data, and it does the same in other sensitive fields like healthcare and the social sciences.
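Here's a toy simulation of that effect, with every number invented purely for illustration: imagine "survey enthusiasts" adopt password managers at a higher rate than everyone else, and they're also far more likely to join a panel.

```python
import random

random.seed(0)

# Hypothetical population: 20% are "survey enthusiasts", 80% are not.
# Suppose enthusiasts use a password manager at a much higher rate.
def person():
    enthusiast = random.random() < 0.20
    uses_pw_manager = random.random() < (0.70 if enthusiast else 0.30)
    return enthusiast, uses_pw_manager

population = [person() for _ in range(100_000)]

# True rate in the population
true_rate = sum(p[1] for p in population) / len(population)

# Panel sample: enthusiasts are far more likely to opt in
panel = [p for p in population if random.random() < (0.90 if p[0] else 0.05)]
panel_rate = sum(p[1] for p in panel) / len(panel)

print(f"true rate:  {true_rate:.2%}")   # ~38%
print(f"panel rate: {panel_rate:.2%}")  # ~63% -> overstates adoption
```

The panel isn't lying; it's just not the population. Any decision built on that inflated number is built on sand.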
Why This Matters for Cybersecurity
Let’s step back.
Cybersecurity firms, digital privacy advocates, and even governments are starting to crowdsource data more than ever. They want to know:
- How users behave online
- What threats people are seeing
- Which scams are spreading
They generally do this in two ways. The first is a system called opt-in telemetry.
It works like this: when you use their apps, websites, or devices, they ask for your permission to collect information about how you use them. When you click "yes", they start taking note of which features you click on, how long you stay on a page, what kind of device or browser you're using, and so on.
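In code terms, the core idea is simply a consent gate: nothing gets recorded until the user says yes. A minimal sketch of my own, not any particular vendor's SDK:

```python
import time

class Telemetry:
    """Collects usage events only after the user has explicitly opted in."""

    def __init__(self):
        self.consent_given = False
        self.events = []

    def grant_consent(self):
        # Called when the user clicks "yes" on the consent prompt
        self.consent_given = True

    def record(self, event: str, **details):
        # Silently drop everything if consent was never given
        if not self.consent_given:
            return
        self.events.append({"event": event, "time": time.time(), **details})

t = Telemetry()
t.record("page_view", page="/settings")   # dropped: no consent yet
t.grant_consent()
t.record("feature_click", feature="dark_mode", browser="Firefox")
print(len(t.events))  # 1
```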
The second method is user-reported surveys, and it's more straightforward: these are the feedback forms or polls you're asked to fill out, like rating a product, reporting a bug, or answering questions about your online habits.
Both of these data sources, telemetry and surveys, help companies improve their services or products. But they're increasingly also used to detect cybersecurity threats, understand user vulnerabilities, and design digital protections.
But if most of that data is fake or biased, these companies are feeding poisoned inputs into their threat detection systems.
And what happens if cybersecurity companies stop trusting their own data? Do they just surrender?
Maybe Marketers Have Found The Fix?
I work with data all day, and I can't help but wonder whether Latana's unique approach to sampling, although built for marketing, could be a solution for security companies as well.
In short, they threw out panels entirely and started using ad-based sampling—sending surveys out into ad spaces across the internet, the same way you’d target someone with a product ad.
No sign-ups. No incentives.
Just real people encountering a survey in the wild and deciding to participate.
Then they use MRP (Multilevel Regression and Poststratification), a statistical model that adjusts for demographic and behavioral skews in the data.
That’s a mouthful, but here’s the punchline: they say it cuts error rates by up to 90% compared to traditional models.
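To give a rough feel for the "poststratification" half of MRP, here's a toy reweighting example. Real MRP also fits a multilevel regression to estimate each demographic cell's response; the cells, shares, and rates below are invented purely for illustration:

```python
# Toy poststratification: reweight raw survey answers so each demographic
# cell counts in proportion to its real share of the population.

# observed_rate: share of respondents in each cell who said "yes"
# sample_share / population_share: cell's weight in the sample vs. in reality
cells = {
    #  cell               observed_rate  sample_share  population_share
    ("18-29", "urban"):  (0.62,          0.45,         0.20),
    ("18-29", "rural"):  (0.55,          0.05,         0.10),
    ("30+",   "urban"):  (0.40,          0.35,         0.35),
    ("30+",   "rural"):  (0.30,          0.15,         0.35),
}

naive = sum(rate * s_share for rate, s_share, _ in cells.values())
adjusted = sum(rate * p_share for rate, _, p_share in cells.values())

print(f"naive estimate:          {naive:.1%}")     # ~49%, skewed toward young urban respondents
print(f"poststratified estimate: {adjusted:.1%}")  # ~42%, weighted to real population shares
```

The adjusted estimate leans on each group's real share of the population instead of whoever happened to answer.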
So it works for tracking what people think about brands. But could cybersecurity companies also use it to collect data that's harder to fake and build more resilient, fraud-resistant data pipelines?
Takeaway: If You Don’t Trust the Data, You Can’t Trust the Defense
Bad data isn’t just a marketing nuisance. It’s a threat for all of us.
Just like bad intel in the military, bad data in cybersecurity can leave you exposed. It leads to bad decisions, wasted resources, and a false sense of security.
But it doesn’t have to stay that way.
If we can rebuild how we gather and analyze data, starting with honesty, diversity, and smart modeling, we don’t just fix marketing.
We strengthen the foundations of digital security itself.
Sources:
- Frontiers in Research Metrics and Analytics – “AI-powered fraud and the erosion of online survey integrity” (frontiersin.org)
- PeopleMetrics – “What AI Survey Fraud Actually Looks Like” (peoplemetrics.com)
- Greenbook – “The Rising Issue of Bad Data in Online Surveys” (greenbook.org)