Bot traffic is non-human traffic to a website. While some bot traffic is beneficial, abusive bot traffic can be very disruptive.
Bot traffic describes any non-human traffic to a website or an app. The term bot traffic often carries a negative connotation, but in reality bot traffic isn’t necessarily good or bad; it all depends on the purpose of the bots.
Some bots are essential for useful services such as search engines and digital assistants (e.g. Siri, Alexa). Most companies welcome these sorts of bots on their sites.
Other bots can be malicious, for example those used for the purposes of credential stuffing, data scraping, and launching DDoS attacks. Even some of the more benign ‘bad’ bots, such as unauthorized web crawlers, can be a nuisance because they can disrupt site analytics and generate click fraud.
It is believed that over 40% of all Internet traffic is comprised of bot traffic, and a significant portion of that is malicious bots. This is why so many organizations are looking for ways to manage the bot traffic coming to their sites.
Web engineers can look directly at network requests to their sites and identify likely bot traffic. An integrated web analytics tool, such as Google Analytics or Heap, can also help to detect bot traffic.
The following analytics anomalies are the hallmarks of bot traffic:
As mentioned above, unauthorized bot traffic can impact analytics metrics such as page views, bounce rate, session duration, geolocation of users, and conversions. These deviations in metrics can create a lot of frustration for the site owner; it is very hard to measure the performance of a site that’s being flooded with bot activity. Attempts to improve the site, such as A/B testing and conversion rate optimization, are also crippled by the statistical noise created by bots.
Google Analytics does provide an option to “exclude all hits from known bots and spiders” (spiders are search engine bots that crawl webpages). If the source of the bot traffic can be identified, users can also provide a specific list of IPs to be ignored by Google Analytics.
While these measures will stop some bots from disrupting analytics, they won’t stop all bots. Furthermore, most malicious bots pursue an objective besides disrupting traffic analytics, and these measures do nothing to mitigate harmful bot activity outside of preserving analytics data.
Sending massive amounts of bot traffic is a very common way for attackers to launch a DDoS attack. During some types of DDoS attacks, so much attack traffic is directed at a website that the origin server becomes overloaded, and the site becomes slow or altogether unavailable for legitimate users.
Some websites can be financially crippled by malicious bot traffic, even if their performance is unaffected. Sites that rely on advertising and sites that sell merchandise with limited inventory are particularly vulnerable.
For sites that serve ads, bots that land on the site and click on various elements of the page can trigger fake ad clicks; this is known as click fraud. While this may initially result in a boost in ad revenue, online advertising networks are very good at detecting bot clicks. If they suspect a website is committing click fraud, they will take action, usually in the form of banning that site and its owner from their network. For this reason, owners of sites that host ads need to be ever-wary of bot click fraud.
Sites with limited inventory can be targeted by inventory hoarding bots. As the name suggests, these bots go to e-commerce sites and dump tons of merchandise into their shopping carts, making that merchandise unavailable for purchase by legitimate shoppers. In some cases this can also trigger unnecessary restocking of inventory from a supplier or manufacturer. The inventory hoarding bots never make a purchase; they are simply designed to disrupt the availability of inventory.
The first step to stopping or managing bot traffic to a website is to include a robots.txt file. This is a file that provides instructions for bots crawling the page, and it can be configured to prevent bots from visiting or interacting with a webpage altogether. But it should be noted that only good bots will abide by the rules in robots.txt; it will not prevent malicious bots from crawling a website.
A number of tools can help mitigate abusive bot traffic. A rate limiting solution can detect and prevent bot traffic originating from a single IP address, although this will still overlook a lot of malicious bot traffic. On top of rate limiting, a network engineer can look at a site’s traffic and identify suspicious network requests, providing a list of IP addresses to be blocked by a filtering tool such as a WAF. This is a very labor-intensive process and still only stops a portion of the malicious bot traffic.
Separate from rate limiting and direct engineer intervention, the easiest and most effective way to stop bad bot traffic is with a bot management solution. A bot management solution can leverage intelligence and use behavioral analysis to stop malicious bots before they ever reach a website. For example, Cloudflare Bot Management uses intelligence from over 25,000,000 Internet properties and applies machine learning to proactively identify and stop bot abuse.
After reading this article you will be able to:
What is a Bot?
What is Bot Management?
What is Credential Stuffing?
What is Content Scraping?