Bot management involves identifying and blocking some bots from a website or application, while still allowing access to other bots.
Bot management refers to blocking undesired or malicious Internet bot traffic while still allowing useful bots to access web properties. Bot management accomplishes this by detecting bot activity, discerning between desirable and undesirable bot behavior, and identifying the sources of the undesirable activity.
Bot management is necessary because bots, if left unchecked, can cause massive problems for web properties. Too much bot traffic can put a heavy load on web servers, slowing or denying service to legitimate users (sometimes this takes the form of a DDoS attack). Malicious bots can scrape or download content from a website, steal user credentials, rapidly spread spam content, and perform various other kinds of cyberattacks.
A bot manager is any software product that manages bots. Bot managers should be able to block some bots and allow others through, instead of simply blocking all non-human traffic. If all bots are blocked and Google bots aren't able to index a page, for instance, then that page can't show up in Google search results, resulting in greatly reduced organic traffic to the website.
A good bot manager accomplishes the following goals. It can:
A bot is a computer program that operates on a network. Bots are programmed to perform certain actions automatically. The tasks a bot performs are typically fairly simple, but a bot can repeat them far faster than a human could.
For instance, Google uses bots to constantly crawl webpages and index content for search. It would take an astronomical amount of time for a team of humans to review the content spread out across the Internet, but Google's bots are able to keep Google's search index fairly up-to-date.
As a negative example, spammers use email harvesting bots to collect email addresses from all over the Internet. The bots crawl webpages, look for any text that follows the email address format (text + @ symbol + domain), and save that text to a database. A human could scan webpages for email addresses too, but because these email harvesting bots are automated and only look for text that fits certain parameters, they find email addresses far faster.
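The pattern-matching step described above can be sketched in a few lines of Python. This is a simplified illustration rather than a description of how any particular harvester works: `EMAIL_PATTERN` and `find_email_like_text` are hypothetical names, and the regular expression is deliberately looser than the full email address grammar.

```python
import re

# Assumed, simplified pattern: text + "@" + domain ending in a dotted suffix.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_email_like_text(page_text: str) -> list[str]:
    """Return every substring of page_text that fits the email format."""
    return EMAIL_PATTERN.findall(page_text)


sample = (
    "Contact alice@example.com or bob@example.org. "
    "Or write to carol at example dot com."
)
matches = find_email_like_text(sample)
print(matches)
```

In this sketch the third address is not matched because it is not written in the text + @ symbol + domain format, which is why a bot that only looks for that format would miss it.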
Unlike a human user, a bot typically does not access the Internet through a traditional web browser like Google Chrome or Mozilla Firefox. Rather than operating a mouse or touchscreen and clicking on visual content, a bot is a software program that makes HTTP requests (among other activities), often through what is called a "headless browser": a browser that runs without a graphical interface.
Bots can do essentially any repetitive, non-creative task – anything that can be automated. They can interact with a webpage, fill out and submit forms, click on links, scan (or "crawl") text, and download content. Bots can "watch" videos, post comments, and post, like, or retweet on social media platforms. Some bots can even hold basic conversations with human users – these are known as chatbots.
Amazingly, many sources estimate that roughly half of all Internet traffic is bot traffic. Just as some, but not all, software is malware, some bots are malicious, and some are "good."
Any bot that misuses an online product or service can be considered "bad." Bad bots range from the blatantly malicious, such as bots that try to break into user accounts, to milder forms of resource misuse, such as bots that buy up tickets on an events website.
A bot that performs a needed or helpful service can be considered "good." Customer service chatbots, search engine crawlers, and performance monitoring bots are all examples of good bots. Good bots typically look for and abide by the rules outlined in a website's robots.txt file.
Robots.txt is a file on a web server outlining the rules for bots accessing properties on that server. However, the file itself does not enforce these rules. Essentially, anyone who programs a bot is supposed to follow an honor system and make sure that their bot checks a website's robots.txt file before accessing the website. Malicious bots, of course, typically do not follow this system – hence the need for bot management.
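As a rough sketch of the honor system described above, a well-behaved bot might check the rules before requesting a page. The example uses Python's standard `urllib.robotparser` module; the robots.txt content, the `ExampleBot` user agent, the `may_crawl` helper, and the URLs are all assumptions made for illustration. The rules are parsed from a string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real bot would download the file
# from the site it intends to crawl before making other requests.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(user_agent: str, url: str) -> bool:
    """Return True if the parsed rules permit user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


allowed_home = may_crawl("ExampleBot", "https://www.example.com/index.html")
allowed_private = may_crawl("ExampleBot", "https://www.example.com/private/data.html")
print(allowed_home, allowed_private)
```

Nothing in this check is enforced by the server: a bot that skips the call to `may_crawl` can still request the disallowed page, which is the gap bot management is meant to close.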
If a bot is determined to be bad, it can be redirected to a different page or blocked from accessing a web resource altogether.
Good bots may be added to an allowlist, or a list of allowed bots (the opposite of a blocklist). A bot manager may also distinguish between good and bad bots via further behavioral analysis.
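A minimal sketch of an allowlist check follows, assuming bots are recognized by a token in the User-Agent header. The token sets and the `decide` function are hypothetical. Because the User-Agent header is supplied by the client and can be forged, a check like this would only be a first step before the further behavioral analysis mentioned above.

```python
# Hypothetical lists of lowercase user-agent tokens.
ALLOWED_BOT_TOKENS = {"googlebot", "bingbot"}
BLOCKED_BOT_TOKENS = {"badscraper"}


def decide(user_agent: str) -> str:
    """Return "allow", "block", or "analyze" for a request's User-Agent."""
    ua = user_agent.lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return "allow"
    if any(token in ua for token in BLOCKED_BOT_TOKENS):
        return "block"
    # Unknown clients are neither allowed nor blocked outright; they are
    # handed to further analysis.
    return "analyze"


print(decide("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
print(decide("BadScraper/1.0"))
print(decide("Mozilla/5.0 (X11; Linux x86_64)"))
```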
Another bot management approach is to use the robots.txt file to set up a honeypot. A honeypot is a fake target for bad actors that, when accessed, exposes the bad actor as malicious. In the case of a bot, a honeypot could be a webpage on the site that's forbidden to bots by the robots.txt file. Good bots will read the robots.txt file and avoid that webpage; some bad bots will crawl the webpage. Tracking the IP addresses of the bots that access the honeypot makes it possible to identify and block them.
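The honeypot idea can be sketched as follows, assuming the site's robots.txt disallows a trap path and that each incoming request exposes a client IP address and a URL path. The `HoneypotTracker` class, the `/bot-trap/` path, and the IP addresses (taken from documentation-only ranges) are hypothetical.

```python
# Hypothetical trap path, assumed to be listed under Disallow in robots.txt.
HONEYPOT_PATH = "/bot-trap/"


class HoneypotTracker:
    """Remember which IP addresses have requested the honeypot path."""

    def __init__(self, honeypot_path: str) -> None:
        self.honeypot_path = honeypot_path
        self.flagged_ips: set[str] = set()

    def record_request(self, ip: str, path: str) -> None:
        # Any client that requests the forbidden path is flagged.
        if path.startswith(self.honeypot_path):
            self.flagged_ips.add(ip)

    def is_blocked(self, ip: str) -> bool:
        return ip in self.flagged_ips


tracker = HoneypotTracker(HONEYPOT_PATH)
tracker.record_request("203.0.113.7", "/index.html")
tracker.record_request("198.51.100.4", "/bot-trap/secret.html")
print(tracker.is_blocked("203.0.113.7"), tracker.is_blocked("198.51.100.4"))
```

Since many clients can share one IP address, a production system would likely combine a signal like this with other evidence rather than block on it alone.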
A bot management solution can help stop a variety of attacks:
These other bot activities are not always considered "malicious," but a bot manager should be able to mitigate them regardless:
Cloudflare has the unique ability to collect data from billions of requests flowing through its network per day. With this data, Cloudflare is able to identify likely bot activity with machine learning and behavioral analysis, and can provide the data necessary for creating an effective allowlist of good bots or blocklist of bad bots. Cloudflare also has an extensive IP reputation database. Learn more about Cloudflare Bot Management.