It's not just bad bots that need to be managed. A bot management strategy needs to avoid blocking the good bots while mitigating the bad ones.
A bot is a computer program that automates interactions with web properties over the Internet. A "good" bot is any bot that performs useful or helpful tasks without being detrimental to a user's experience on the Internet. Because good bots can share characteristics with malicious bots, the challenge when putting together a bot management strategy is ensuring that good bots aren't blocked along with the bad ones.
There are many kinds of good bots, each designed for different tasks. Some common examples include:

- Search engine crawler bots, which index web pages so that they can appear in search results
- Site monitoring bots, which check websites for availability and performance problems
- Feed bots, which fetch content for news aggregators and feed readers
- Copyright bots, which scan for content that may infringe copyright
- Chatbots, which answer users' questions through simulated conversation
Web properties need to make sure they aren't blocking these kinds of bots as they attempt to filter out malicious bot traffic. It's especially important that search engine web crawler bots don't get blocked, because without them a website can't show up in search results.
Bad bots can steal data, break into user accounts, submit junk data through online forms, and perform other malicious activities. Types of bad bots include credential stuffing bots, content scraping bots, spam bots, and click fraud bots.
Good bot management starts with properly setting up rules in a website's robots.txt file. A robots.txt file is a text file that lives on a web server and specifies the rules for any bots accessing the hosted website or application. These rules define which pages the bots can and can't crawl, which links they should and shouldn't follow, and other requirements for bot behavior.
Good bots will follow these rules. For instance, if a website owner doesn't want a certain page on their site to show up in Google search results, they can write a rule in the robots.txt file, and Google's web crawler bots won't crawl that page. Although the robots.txt file cannot actually enforce these rules, good bots are programmed to look for the file and follow its rules before doing anything else.
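To make this concrete, here is a minimal robots.txt sketch; the paths and sitemap URL are hypothetical:

```
# Hypothetical robots.txt served at https://example.com/robots.txt

# Rules for all bots
User-agent: *
Disallow: /admin/       # ask all bots to stay out of /admin/

# Rules specific to Google's crawler
User-agent: Googlebot
Disallow: /drafts/      # ask Googlebot not to crawl draft pages

# Where compliant crawlers can find the sitemap
Sitemap: https://example.com/sitemap.xml
```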
Bad bots, however, will often either disregard the robots.txt file entirely, or read it to learn which content a website is trying to keep bots away from and then access that content anyway. Managing bots therefore requires a more active approach than simply laying out rules for bot behavior in a robots.txt file.
Think of an allowlist as being like the guest list for an event. If someone who isn't on the guest list tries to enter the event, security personnel will prevent them from entering. Anyone who's on the list can freely enter the event. Such an approach is necessary because uninvited guests may behave badly and ruin the party for everyone else.
For bot management, that's basically how allowlists work. An allowlist is a list of bots that are allowed to access a web property. Typically this works via something called the "user agent," the bot's IP address, or a combination of the two. A user agent is a string of text that identifies the type of user (or bot) to a web server.
By maintaining a list of allowed good bot user agents, such as those belonging to search engines, and then blocking any bots not on the list, a web server can ensure access for good bots.
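As a rough sketch of how a user-agent-based allowlist check might look in application code (the bot tokens and helper function here are illustrative, not any particular product's API):

```python
# Hypothetical allowlist check: allow known good bot user agents,
# block everything else that self-identifies as a bot.

ALLOWED_BOT_AGENTS = {
    "Googlebot",   # Google's search crawler
    "Bingbot",     # Microsoft Bing's search crawler
}

def is_allowed_bot(user_agent: str) -> bool:
    """Return True if the user agent contains an allowlisted bot token."""
    return any(token in user_agent for token in ALLOWED_BOT_AGENTS)

# Example: a real Googlebot user agent string contains the "Googlebot" token
ua = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
      "+http://www.google.com/bot.html)")
print(is_allowed_bot(ua))  # True
```

A check like this is only a first filter, since, as described below, a user agent string is easy to fake.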
Web servers can also have a blocklist of known bad bots.
A blocklist, in the context of networking, is a list of IP addresses, user agents, or other indicators of online identity that are not allowed to access a server, network, or web property. This is a slightly different approach than using an allowlist: a bot management strategy based around blocklisting will block those specific bots and allow all other bots through, while an allowlisting strategy only allows specified bots through and blocks all others.
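The practical difference is the default decision for a bot that matches neither list. A small sketch, again with hypothetical bot names:

```python
# Default-deny (allowlist) vs. default-allow (blocklist), sketched
# with hypothetical sets of user agent tokens.

ALLOWLIST = {"Googlebot", "Bingbot"}       # known good bots
BLOCKLIST = {"BadScraperBot", "SpamBot"}   # known bad bots

def allowlist_policy(user_agent: str) -> bool:
    # Only listed bots get through; everything else is blocked by default.
    return any(token in user_agent for token in ALLOWLIST)

def blocklist_policy(user_agent: str) -> bool:
    # Listed bots are blocked; everything else is allowed by default.
    return not any(token in user_agent for token in BLOCKLIST)

print(allowlist_policy("UnknownBot/1.0"))  # False: blocked by default
print(blocklist_policy("UnknownBot/1.0"))  # True: allowed by default
```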
It is possible for a bad bot to fake its user agent string so that it looks like a good bot, at least initially – just as a thief might use a fake ID card to pretend to be on the guest list and sneak into an event.
Therefore, allowlists of good bots have to be combined with other approaches to detect spoofing, such as behavioral analysis or machine learning. This helps proactively identify both bad bots and unknown good bots, in addition to simply allowing known good bots.
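Behavioral analysis and machine learning are beyond the scope of a short snippet, but one widely used spoofing check is easy to sketch: verify that a request claiming to be Googlebot really comes from Google by reverse-resolving the client IP and then forward-resolving the result to confirm it. A minimal sketch using Python's standard library; the example IP address is hypothetical:

```python
import socket

def verify_googlebot(client_ip: str) -> bool:
    """Check whether an IP claiming to be Googlebot really belongs to Google.

    Reverse-resolve the IP to a hostname, require a Google-owned domain,
    then forward-resolve that hostname and confirm it maps back to the
    same IP (so the reverse DNS record itself isn't spoofed).
    """
    try:
        hostname, _, _ = socket.gethostbyaddr(client_ip)
    except OSError:
        return False  # no reverse DNS record at all

    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False  # reverse DNS points somewhere other than Google

    try:
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False

    return client_ip in forward_ips  # forward lookup must confirm the IP

# Hypothetical usage with a reserved documentation address:
# verify_googlebot("192.0.2.10")  # almost certainly False for this address
```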
A bot manager product allows good bots to access a web property while blocking bad bots. Cloudflare Bot Management uses machine learning and behavioral analysis of traffic across its entire network to detect bad bots while automatically and continually allowlisting good bots. Similar functionality is available for smaller organizations via Super Bot Fight Mode, now included in Cloudflare Pro and Business plans.