What is bot traffic?
Bot traffic describes any non-human traffic to a website or an app. The term bot traffic often carries a negative connotation, but in reality bot traffic isn’t necessarily good or bad; it all depends on the purpose of the bots.
Some bots are essential for useful services such as search engines and digital assistants (e.g. Siri, Alexa). Most companies welcome these sorts of bots on their sites.
Other bots can be malicious, for example those used for the purposes of credential stuffing, data scraping, and launching DDoS attacks. Even some of the more benign ‘bad’ bots, such as unauthorized web crawlers, can be a nuisance because they can disrupt site analytics and generate click fraud.
It is believed that over 40% of all Internet traffic is comprised of bot traffic, and a significant portion of that is malicious bots. This is why so many organizations are looking for ways to manage the bot traffic coming to their sites.
How can bot traffic be identified?
Web engineers can look directly at network requests to their sites and identify likely bot traffic. An integrated web analytics tool, such as Google Analytics or Heap, can also help to detect bot traffic.
The following analytics anomalies are the hallmarks of bot traffic:
- Abnormally high pageviews: If a site undergoes a sudden, unprecedented and unexpected spike in pageviews, it’s likely that there are bots clicking through the site.
- Abnormally high bounce rate: The bounce rate identifies the number of users that come to a single page on a site and then leave the site before clicking anything on the page. An unexpected lift in the bounce rate can be the result of bots being directed at a single page.
- Surprisingly high or low session duration: Session duration, or the amount of time users stay on a website, should remain relatively steady. An unexplained increase in session duration could be an indication of bots browsing the site at an unusually slow rate. Conversely, an unexpected drop in session duration could be the result of bots that are clicking through pages on the site much faster than a human user would.
- Junk conversions: A surge in phony-looking conversions, such as account creations using gibberish email addresses or contact forms submitted with fake names and phone numbers, can be the result of form-filling bots or spam bots.
- Spike in traffic from an unexpected location: A sudden spike in users from one particular region, particularly a region that’s unlikely to have a large number of people who are fluent in the native language of the site, can be an indication of bot traffic.
How can bot traffic hurt analytics?
As mentioned above, unauthorized bot traffic can impact analytics metrics such as page views, bounce rate, session duration, geolocation of users, and conversions. These deviations in metrics can create a lot of frustration for the site owner; it is very hard to measure the performance of a site that’s being flooded with bot activity. Attempts to improve the site, such as A/B testing and conversion rate optimization, are also crippled by the statistical noise created by bots.
How to filter bot traffic from Google Analytics
Google Analytics does provide an option to “exclude all hits from known bots and spiders” (spiders are search engine bots that crawl webpages). If the source of the bot traffic can be identified, users can also provide a specific list of IPs to be ignored by Google Analytics.
While these measures will stop some bots from disrupting analytics, they won’t stop all bots. Furthermore, most malicious bots pursue an objective besides disrupting traffic analytics, and these measures do nothing to mitigate harmful bot activity outside of preserving analytics data.
How can bot traffic hurt performance?
Sending massive amounts of bot traffic is a very common way for attackers to launch a DDoS attack. During some types of DDoS attacks, so much attack traffic is directed at a website that the origin server becomes overloaded, and the site becomes slow or altogether unavailable for legitimate users.
How can bot traffic be bad for business?
Some websites can be financially crippled by malicious bot traffic, even if their performance is unaffected. Sites that rely on advertising and sites that sell merchandise with limited inventory are particularly vulnerable.
For sites that serve ads, bots that land on the site and click on various elements of the page can trigger fake ad clicks; this is known as click fraud. While this may initially result in a boost in ad revenue, online advertising networks are very good at detecting bot clicks. If they suspect a website is committing click fraud, they will take action, usually in the form of banning that site and its owner from their network. For this reason, owners of sites that host ads need to be ever-wary of bot click fraud.
Sites with limited inventory can be targeted by inventory hoarding bots. As the name suggests, these bots go to e-commerce sites and dump tons of merchandise into their shopping carts, making that merchandise unavailable for purchase by legitimate shoppers. In some cases this can also trigger unnecessary restocking of inventory from a supplier or manufacturer. The inventory hoarding bots never make a purchase; they are simply designed to disrupt the availability of inventory.
How can websites manage bot traffic?
The first step to stopping or managing bot traffic to a website is to include a robots.txt file. This is a file that provides instructions for bots crawling the page, and it can be configured to prevent bots from visiting or interacting with a webpage altogether. But it should be noted that only good bots will abide by the rules in robots.txt; it will not prevent malicious bots from crawling a website.
A number of tools can help mitigate abusive bot traffic. A rate limiting solution can detect and prevent bot traffic originating from a single IP address, although this will still overlook a lot of malicious bot traffic. On top of rate limiting, a network engineer can look at a site’s traffic and identify suspicious network requests, providing a list of IP addresses to be blocked by a filtering tool such as a WAF. This is a very labor-intensive process and still only stops a portion of the malicious bot traffic.
Separate from rate limiting and direct engineer intervention, the easiest and most effective way to stop bad bot traffic is with a bot management solution. A bot management solution can leverage intelligence and use behavioral analysis to stop malicious bots before they ever reach a website. For example, Cloudflare Bot Management uses intelligence from over 25,000,000 Internet properties and applies machine learning to proactively identify and stop bot abuse.