On the Internet, Nobody Knows You’re a Bot Participant
When it comes to conducting online user studies, bots have become that one jerk that ruins it for everyone! The Internet allows researchers to conduct user studies online 24 hours a day with nearly anyone in the world, but bots are increasingly “participating” in these studies and corrupting the data. Here are some tips on how researchers can distinguish humans from bots and stop the bad guys from contaminating everyone’s data.
The Good, The Bad, and The Ugly
Bots are all the rage. These little software applications automatically complete online tasks on their creator’s behalf. They can be used for good, such as answering simple questions (“chatbots”) or blocking abusive Twitter accounts (e.g., Block Together), or just for fun (e.g., Mitsuku). They can also be used for evil, such as overwhelming, confusing, or silencing online discussions that a bot’s creator wants to stop.
This post focuses on the ugly side of bots: those that harm online research studies such as surveys and unmoderated usability tests. The beauty of an unmoderated study is that you can collect data from many people at once, cost-efficiently. Unfortunately, that also makes it a perfect opportunity for bots to sneak in.
Online panels + bots
Most web surveys and unmoderated online studies rely on panels of volunteers as participants. People join these panels as a quick and easy way to make money. Unfortunately, these non-probability-based panels are not representative of the broader population (Callegaro, 2014) and are often filled with “speeders” (people who rush through as fast as they can with as little effort as possible) and “cheaters” (people who select answers at random). If you don’t know what you’re doing, you can end up with some pretty bad data (“garbage in, garbage out”). However, if you screen well and meticulously clean your data, you can be reasonably comfortable with the data you collect from online panels.
Add bots to online panels and your data go from bad to worse! “Survey bots” automate randomly answering questions in a web survey to earn a quick buck. They have been around for a while but are now showing up in other types of unmoderated user studies. In a recent unmoderated experiment that Alyssa Vincent-Hill and I conducted, we were shocked by the number of bots in our data set. For example, on one page that required no clicks at all, Alyssa observed that some “participants” registered 20 clicks and two registered over 100! We had to repeatedly ask our vendor to collect more data to replace the bots, and every new batch contained at least one bot.
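The click counts Alyssa spotted suggest a simple automated check. Below is a minimal Python sketch, assuming your platform can export a click count per participant per page and that you know how many clicks each page legitimately needs. The `PageLog` record, the page name, and the `slack` tolerance for stray human clicks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageLog:
    participant_id: str
    page: str
    clicks: int  # clicks recorded for this participant on this page


def flag_click_anomalies(logs, expected_clicks, slack=3):
    """Return sorted IDs of participants who clicked far more than a page needs.

    expected_clicks maps page name -> clicks a human legitimately needs
    (0 for a read-only page). Pages missing from the map are skipped.
    slack tolerates a few stray human clicks before flagging (hypothetical default).
    """
    flagged = set()
    for log in logs:
        if log.page not in expected_clicks:
            continue
        if log.clicks > expected_clicks[log.page] + slack:
            flagged.add(log.participant_id)
    return sorted(flagged)


# Hypothetical data: a read-only instructions page that needs no clicks.
logs = [
    PageLog("p01", "instructions", 1),
    PageLog("p02", "instructions", 20),
    PageLog("p03", "instructions", 104),
]
print(flag_click_anomalies(logs, {"instructions": 0}))  # ['p02', 'p03']
```

A flag here is a reason to look closer, not proof of a bot; an impatient human can also click a lot.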
How to get the best data possible
There are several established best practices you should be using to protect your online research from bots and bad humans alike. We list them below, along with a couple more we have identified. Some are easy to implement, while others are more difficult or may change how your study is conducted.
Stop the bad guys from getting in
Your best defense is a good offense — you want to stop the speeders, cheaters, and bots from ever touching your study.
Use a probability-based panel: In a probability-based sample, every member of the target population has a known, nonzero chance of being selected. Such panels are significantly more expensive, and harder to recruit if your population is difficult to reach; however, data quality will be higher and there is less chance of speeders, cheaters, and bots, because members are recruited into the panel rather than signing themselves up.
CAPTCHA is your friend: Yeah, we hate CAPTCHAs too, but they were designed to stop bots and let only humans in. Add one to the beginning of your study and it should stop the bulk of the bots.
One response per IP address: Block multiple responses from the same IP address. The bad guys can get around this, of course (e.g., by getting a new IP address), but it is an easy fix. Remember that once you record IP addresses, responses are no longer anonymous, and you need to notify respondents that you are collecting them. If you are collecting sensitive data or other personally identifiable information (PII), you will want your participants to be anonymous.
Password protection: Giving each participant a unique password, checked client- or server-side, prevents multiple completions by the same respondent regardless of IP address. It won’t stop a human from speeding, cheating, or using a bot, but they’ll only be able to do it once. The responses may be confidential (i.e., names or other PII are not directly tied to the data), but because you can match responses back to a specific password, they are not anonymous, so inform participants.
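A minimal sketch of the IP check, assuming submissions arrive at a server you control and an in-memory set is acceptable (a real deployment would persist it). `IpGate` is a hypothetical name; Python’s standard `ipaddress` module normalizes addresses so that different spellings of the same IPv6 address count as one. Keep in mind that people sharing a network (a household or an office behind one router) can appear under the same address, so this check may also turn away some legitimate participants.

```python
import ipaddress


class IpGate:
    """Admit at most one study session per IP address (in-memory sketch)."""

    def __init__(self):
        self._seen = set()

    def admit(self, ip_string: str) -> bool:
        # Parsing normalizes the address, so equivalent IPv6 spellings
        # (upper/lower case, zero compression) map to the same key.
        # Raises ValueError for strings that are not valid IP addresses.
        ip = ipaddress.ip_address(ip_string.strip())
        if ip in self._seen:
            return False  # a response from this address already exists
        self._seen.add(ip)
        return True


gate = IpGate()
print(gate.admit("203.0.113.7"))      # True
print(gate.admit("203.0.113.7"))      # False
print(gate.admit("2001:DB8::1"))      # True
print(gate.admit("2001:db8:0:0::1"))  # False
```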
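Here is one way to sketch single-use credentials, assuming you issue one random code per invited participant and mark it used when a completion is recorded. `AccessCodes` is a hypothetical helper built on Python’s standard `secrets` module, and it keeps codes in memory only.

```python
import secrets


class AccessCodes:
    """Hypothetical helper: one random code per invitee, usable for one completion."""

    def __init__(self):
        self._unused = set()

    def issue(self) -> str:
        # token_urlsafe(8) draws 8 random bytes and returns an 11-character URL-safe string.
        code = secrets.token_urlsafe(8)
        self._unused.add(code)
        return code

    def record_completion(self, code: str) -> bool:
        """Accept a completion only for a valid code that has not been used yet."""
        if code in self._unused:
            self._unused.remove(code)
            return True
        return False  # unknown code, or a second attempt with the same code


codes = AccessCodes()
invite = codes.issue()
print(codes.record_completion(invite))          # True
print(codes.record_completion(invite))          # False
print(codes.record_completion("made-up-code"))  # False
```

Because every completion can be traced to a code, the resulting data are confidential at best, not anonymous.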
Throw out the speeders, cheaters, and bots that got in
Nothing is foolproof, so if any of the bad guys got in, you have a second chance to get them out before you ever see their data.
Speed traps: Set a lower bound on how long it would take a reasonable human being to read and complete each question or task. Anyone who goes below that threshold on a few questions should receive a warning dialog (e.g., “You are completing this study too quickly. Please take your time or the session will be closed.”). Alternatively, you can simply end the study at that point with no warning. The benefit of the warning is that, if the participant is a real human being, you give them a chance to provide good-quality data rather than throwing them away and having to recruit someone else. However, a genuine speeder or cheater can choose to progress more slowly without actually putting in more thought, and a bot will simply continue unabated.
Test questions: If your survey or study is more than a couple of pages or a few minutes long, add at least one test question that ends the study if the participant fails it. For example, “What is the first letter of this paragraph?” with the closed-ended responses “A,” “C,” “T,” “X,” and “Z.” It’s an easy question for a human, but a bot clicking at random still has a 20% (1 in 5) chance of picking the right answer. A speeder or cheater has a better chance of getting it right, but you may still catch them.
Accuracy checking: If your study asks questions that have clear right and wrong answers, your survey or testing platform should have a way to end the study when too many of them are answered incorrectly. The risk is that you might throw out genuinely valid data (e.g., if your tasks are too hard for most people to complete successfully). If you do this, make sure you have run a pilot study and are confident about what a reasonable error rate is for honest participants. The alternative is to keep these participants but flag their data for later scrutiny.
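The warn-then-end logic can be sketched as a small state machine. This is a hedged illustration: the per-question minimum times are assumed to come from piloting, and the strike counts (`strikes_to_warn`, `strikes_to_end`) are hypothetical defaults, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class SpeedTrap:
    """Track too-fast answers in one session: warn once, then end the session."""

    min_seconds: dict         # question id -> minimum plausible seconds (from piloting)
    strikes_to_warn: int = 3  # hypothetical: too-fast answers before the warning dialog
    strikes_to_end: int = 5   # hypothetical: too-fast answers before closing the session
    strikes: int = 0
    warned: bool = False

    def record(self, question_id: str, seconds: float) -> str:
        """Return 'ok', 'warn', or 'end' after each answer."""
        if seconds < self.min_seconds[question_id]:
            self.strikes += 1
        if self.strikes >= self.strikes_to_end:
            return "end"
        if self.strikes >= self.strikes_to_warn and not self.warned:
            self.warned = True
            return "warn"
        return "ok"


trap = SpeedTrap(min_seconds={"q1": 4, "q2": 4, "q3": 6, "q4": 6, "q5": 6})
answers = [("q1", 1.0), ("q2", 0.8), ("q3", 1.2), ("q4", 0.9), ("q5", 1.1)]
print([trap.record(q, s) for q, s in answers])  # ['ok', 'ok', 'warn', 'ok', 'end']
```

Setting `strikes_to_warn` equal to `strikes_to_end` gives the no-warning variant, because the session ends before a warning can be issued. A bot that ignores the dialog keeps accumulating strikes and reaches `"end"`.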
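A sketch of the test question as a gate, plus the arithmetic behind the 20% figure. It assumes, hypothetically, that the paragraph in question begins with “T” and that separate checks are answered independently. Under those assumptions a random clicker passes one five-option check with probability 1/5 and two checks with probability (1/5)² = 1/25, or 4%.

```python
from fractions import Fraction

# Hypothetical attention check mirroring the example in the text;
# we assume the paragraph shown to participants begins with "T".
CHECK = {
    "prompt": "What is the first letter of this paragraph?",
    "options": ["A", "C", "T", "X", "Z"],
    "correct": "T",
}


def fails_check(response: str, check: dict = CHECK) -> bool:
    """True means the participant failed and the study should end."""
    return response != check["correct"]


def random_pass_probability(num_options: int, num_checks: int) -> Fraction:
    """Chance that a bot clicking uniformly at random passes every check,
    assuming the checks are independent and each has num_options choices."""
    return Fraction(1, num_options) ** num_checks


print(fails_check("T"), fails_check("X"))                 # False True
print(random_pass_probability(len(CHECK["options"]), 1))  # 1/5
print(random_pass_probability(len(CHECK["options"]), 2))  # 1/25
```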
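A sketch of in-session accuracy checking with both options from the paragraph: end the session, or keep going and flag the data. The cutoff rule (the most errors any honest pilot participant made, plus a margin of one) is an assumed heuristic for illustration, and `AccuracyMonitor` is a hypothetical name.

```python
def max_errors_from_pilot(pilot_error_counts, margin=1):
    """Hypothetical cutoff: the most errors any honest pilot participant made, plus a margin."""
    return max(pilot_error_counts) + margin


class AccuracyMonitor:
    """Count wrong answers in one session and decide whether to end or flag it."""

    def __init__(self, max_errors: int, end_session: bool = True):
        self.max_errors = max_errors
        self.end_session = end_session  # False: keep the participant, flag for review
        self.errors = 0
        self.flagged = False

    def record(self, correct: bool) -> str:
        if not correct:
            self.errors += 1
        if self.errors > self.max_errors:
            if self.end_session:
                return "end"
            self.flagged = True
        return "continue"


pilot = [0, 1, 1, 2, 0, 1, 3, 1, 2, 1]  # hypothetical pilot error counts
cutoff = max_errors_from_pilot(pilot)   # 3 + 1 = 4
monitor = AccuracyMonitor(cutoff)
print([monitor.record(False) for _ in range(5)])
# ['continue', 'continue', 'continue', 'continue', 'end']
```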
Clean the data
There is still a chance that some speeders, cheaters, and bots are in your data set, so take this last opportunity to get rid of any dirty data.
Accuracy checking: If you have a set of questions or tasks with clear right and wrong answers, identify all participants with a higher-than-average error rate. Do their errors seem random, or is there a pattern (e.g., everyone got the same questions wrong)? Throw out those whose errors seem random or for whom you have no other reasonable explanation.
Straight-lining: Look for participants who always choose the same answer on closed-ended questions (usually the first option). If your questions have right and wrong answers, luck may be on their side and they may not stand out as having a high error rate, but a straight-lining check should catch them.
Speeders and slowpokes: If you didn’t stop speeders in their tracks during the study, throw them out now. With a large sample, you should be able to identify participants who are a couple of standard deviations faster or slower than everyone else. You may not care if someone went to a meeting or made a sandwich in the middle of your survey, but if you need participants to complete your study in one sitting, throw out the slowpokes who started watching cat videos instead of paying attention.
Look for patterns in the dropout or quality rate: Your study or survey may be so onerous that even the best-intentioned human beings begin satisficing, getting questions wrong, or quitting altogether, and only the bots make it to the end. Hopefully you ran a pilot study with representative participants before launching the full study. If you didn’t (or if you did but now see patterns you can’t explain), run your study in person and see what happens when participants hit those trouble spots. The problem may be your instrument, not the participants!
Document who you removed and why: It is always good research practice to document your cleaning rules, who you removed from the final data set, and why. If anyone wants to replicate your study or questions the validity of the data, you will need this information. You will also need to know the demographics of the people you removed to see whether your sample is now skewed. If it is, don’t simply reopen the study for anyone to complete; instead, recruit participants who match those demographics so your sample remains balanced.
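The post-hoc review can be sketched as follows, assuming you have a correct/incorrect result per participant per question. “Above average” uses the mean error rate, as in the text. Treating a question as commonly missed when at least half the sample got it wrong (`hard_share=0.5`), and calling a participant’s errors a pattern when every miss falls on such a question, are hypothetical heuristics you would tune to your own study.

```python
def review_error_rates(results, hard_share=0.5):
    """Split above-average-error participants into 'patterned' and 'scattered'.

    results: {participant: {question: True if answered correctly}}
    hard_share: a question counts as commonly missed when at least this share
    of the participants who saw it got it wrong (hypothetical threshold).
    """
    rate = {p: sum(not ok for ok in ans.values()) / len(ans) for p, ans in results.items()}
    average = sum(rate.values()) / len(rate)

    all_questions = {q for ans in results.values() for q in ans}
    hard = set()
    for q in all_questions:
        outcomes = [ans[q] for ans in results.values() if q in ans]
        if sum(not ok for ok in outcomes) / len(outcomes) >= hard_share:
            hard.add(q)

    patterned, scattered = [], []
    for p, ans in results.items():
        if rate[p] <= average:
            continue
        misses = [q for q, ok in ans.items() if not ok]
        # Every miss on a commonly missed question: examine the question first.
        # Misses spread across questions most people got right: removal candidate.
        (patterned if all(q in hard for q in misses) else scattered).append(p)
    return sorted(patterned), sorted(scattered)


question_ids = ["q1", "q2", "q3", "q4"]
missed = {"a": set(), "b": {"q4"}, "c": {"q3", "q4"}, "d": {"q1", "q2", "q3"}}
results = {p: {q: q not in m for q in question_ids} for p, m in missed.items()}
print(review_error_rates(results))  # (['c'], ['d'])
```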
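A straight-lining check takes only a few lines of Python with `collections.Counter`. This sketch assumes answers are recorded as option indices (0 = first option); `min_items` skips participants who answered too few questions to judge, and the hypothetical `min_share` parameter lets you relax “always” to “almost always” if you choose.

```python
from collections import Counter


def straight_liners(responses, min_items=5, min_share=1.0):
    """Flag participants who picked the same option on (almost) every question.

    responses: {participant: [chosen option index per closed-ended question]}
    Returns {participant: repeated option index}; 0 means the first option.
    """
    flagged = {}
    for participant, answers in responses.items():
        if len(answers) < min_items:
            continue  # too few answers to judge
        option, count = Counter(answers).most_common(1)[0]
        if count / len(answers) >= min_share:
            flagged[participant] = option
    return flagged


responses = {
    "p1": [0, 0, 0, 0, 0, 0],
    "p2": [2, 1, 3, 0, 2, 4],
    "p3": [1, 1, 1, 1, 1, 1],
}
print(straight_liners(responses))  # {'p1': 0, 'p3': 1}
```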
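A sketch of the “couple of standard deviations” rule, using the sample standard deviation from Python’s `statistics` module with `k=2` as an assumed cutoff. One caveat: extreme values inflate the standard deviation itself, and completion times tend to be right-skewed, so some researchers apply the rule to log-transformed times or use median-based cutoffs instead.

```python
from statistics import mean, stdev


def time_outliers(durations, k=2.0):
    """Return (speeders, slowpokes): participants more than k sample standard
    deviations below or above the mean completion time.

    durations: {participant: completion time in seconds}; needs at least 2 entries.
    """
    times = list(durations.values())
    mu, sd = mean(times), stdev(times)
    speeders = sorted(p for p, t in durations.items() if t < mu - k * sd)
    slowpokes = sorted(p for p, t in durations.items() if t > mu + k * sd)
    return speeders, slowpokes


# Hypothetical sample: 38 typical sessions of 9 or 11 minutes, plus two outliers.
durations = {f"p{i:02d}": 540 if i % 2 == 0 else 660 for i in range(38)}
durations["fast"] = 90    # 1.5 minutes
durations["slow"] = 1500  # 25 minutes
print(time_outliers(durations))  # (['fast'], ['slow'])
```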
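To look for drop-off patterns, you can count how many participants reached each page and what share quit there. This sketch assumes you know the furthest page each participant reached and that reaching the final page counts as completing the study; the page names are hypothetical. It covers dropout only, though the same per-page breakdown could be applied to error rates.

```python
def dropout_by_page(furthest_page, page_order):
    """Per page: (page, participants who reached it, share who quit on it).

    furthest_page: {participant: last page they reached}. Reaching the final
    page is treated as completing the study, so no one "quits" there.
    """
    index = {page: i for i, page in enumerate(page_order)}
    reached = [index[page] for page in furthest_page.values()]
    last = len(page_order) - 1
    report = []
    for i, page in enumerate(page_order):
        started = sum(r >= i for r in reached)
        quit_here = 0 if i == last else sum(r == i for r in reached)
        report.append((page, started, round(quit_here / started, 2) if started else 0.0))
    return report


pages = ["intro", "task1", "task2", "survey", "thanks"]  # hypothetical page names
outcomes = ["thanks"] * 6 + ["task2"] * 3 + ["intro"]
furthest = {f"p{i}": page for i, page in enumerate(outcomes)}
for row in dropout_by_page(furthest, pages):
    print(row)
# task2 stands out: 3 of the 9 participants who reached it quit there.
```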
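A sketch of a removal log and a before/after demographic comparison. `Removal`, `demographic_shift`, the `age_band` attribute, and the logged reasons are hypothetical; the idea is simply that each removal records the rule and the evidence, and that comparing group shares before and after cleaning shows whether the remaining sample has shifted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Removal:
    participant_id: str
    rule: str    # which cleaning rule removed them
    detail: str  # the evidence, for anyone auditing or replicating the study


def demographic_shift(participants, removed_ids, attribute):
    """Return {group: (share before cleaning, share after cleaning)}, rounded.

    participants: {participant_id: {attribute: value, ...}}
    """
    before = Counter(info[attribute] for info in participants.values())
    after = Counter(
        info[attribute] for pid, info in participants.items() if pid not in removed_ids
    )
    n_before, n_after = sum(before.values()), sum(after.values())
    if n_after == 0:
        raise ValueError("every participant was removed")
    return {
        group: (round(before[group] / n_before, 2), round(after[group] / n_after, 2))
        for group in sorted(before)
    }


bands = ["18-34"] * 5 + ["35-54"] * 3 + ["55+"] * 2
participants = {f"p{i}": {"age_band": band} for i, band in enumerate(bands)}
# Hypothetical removals, each tied to the rule that triggered it.
log = [
    Removal("p0", "speed trap", "finished 40 questions in 90 seconds"),
    Removal("p1", "straight-lining", "chose option 1 on every item"),
    Removal("p2", "test question", "failed the first-letter check"),
]
removed = {r.participant_id for r in log}
print(demographic_shift(participants, removed, "age_band"))
# {'18-34': (0.5, 0.29), '35-54': (0.3, 0.43), '55+': (0.2, 0.29)}
```

In this made-up example all three removals came from the youngest group, so its share drops from 50% to 29%; following the advice above, you would recruit more participants matching that profile rather than reopening the study to anyone.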
About the Book
Understanding Your Users: A Practical Guide to User Research Methods, 2nd Edition is a comprehensive, easy-to-read “how-to” guide to user research methods. You’ll learn about many distinct user research methods, as well as pre- and post-method considerations such as recruiting, facilitating activities or moderating, negotiating with product development teams and customers, and getting your results incorporated into the product. For each method, you’ll learn how to prepare for and conduct the activity and how to analyze and present the data, all in a practical, hands-on way.
Visit elsevier.com and use discount code STC317 at checkout and save up to 30% on your very own copy!
About the Authors
Kathy Baxter is a Principal User Researcher at Salesforce in San Francisco. Previously, she worked at Google for over 10 years as a Staff User Experience Researcher and UX Infrastructure Manager. Prior to 2005, she worked as a Senior User Experience Researcher at eBay and Oracle. Kathy is active in the UX community, volunteering on the EPIC and CHI conference committees, as well as teaching courses and mentoring young girls in STEAM careers. She received her MS in Engineering Psychology and a BS in Applied Psychology from the Georgia Institute of Technology. The second edition of the book she coauthored, Understanding Your Users, was published in May 2015 and was the #1 New Release in HCI & Software & Product Design on Amazon for the first several weeks it was on sale. https://www.linkedin.com/in/kathykbaxter
Catherine Courage’s passion is transforming corporate culture by making customer focus a driver of innovation and change. She leads the DocuSign Customer Experience team, where her group’s mission is to create world-class products and services for customers, partners, and employees. Catherine co-authored Understanding Your Users and is an active writer and speaker on creativity, innovation, and design. She has been featured in Harvard Business Review, The Wall Street Journal, Fast Company, Huffington Post, and TEDx. She has twice been selected by the Silicon Valley Business Journal: in 2011 as one of Silicon Valley’s tech leaders, and in 2013 as one of Silicon Valley’s 100 Most Influential Women. Also in 2013, Catherine made Forbes’ list of “Top 10 Rising Stars at The World’s Most Innovative Companies.” In 2014, the National Diversity Council named her one of the Top 50 Most Powerful Women in Technology. https://www.linkedin.com/in/catherinecourage, @ccourage
Kelly Caine is a researcher and professor working at the intersection of people and technology. She directs the Humans and Technology Lab at Clemson University, where she and her students advocate for users and create easy-to-use, useful technology that meets people’s needs. She co-authored Understanding Your Users, has published dozens of peer-reviewed papers, and is regularly cited by media such as the AP, Washington Post, NPR, and New York Times, making her a sought-after speaker, thinker, and writer on understanding people and their relationship to technology. @kellycaine
This article was originally published on the Salesforce UX website and written in collaboration with Alyssa Vincent-Hill. Thank you, Ian Schoen, raymonst, Emily Witt, and Jenny Williams, for all of your feedback!