What is bot traffic? | How to stop bot traffic

Bot traffic is non-human traffic to a website. While some bot traffic is beneficial, excessive bot traffic can be highly disruptive.

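Learning Objectives

After reading this article you will be able to:

  • Define bot traffic
  • Explain how to identify bot traffic
  • Outline the negative consequences of malicious bots
  • Explain how to stop malicious bot traffic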

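What is bot traffic?

Bot traffic describes any non-human traffic to a website or an app. The term often carries a negative connotation, but bot traffic is not inherently good or bad; it depends on the purpose of the bots.

Some bots are essential to useful services such as search engines and digital assistants (e.g. Siri, Alexa). Most companies welcome these sorts of bots on their sites.

Other bots can be malicious, for example those used for credential stuffing, data scraping, and launching DDoS attacks. Even some of the more benign “bad” bots, such as unauthorized web crawlers, can be harmful because they disrupt site analytics and generate click fraud.

It is believed that over 40% of all Internet traffic is bot traffic, and a significant portion of that is malicious. This is why so many organizations are looking for ways to manage the bot traffic coming to their sites.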

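How can bot traffic be identified?

Web engineers can look directly at network requests to their sites and identify likely bot traffic. An integrated web analytics tool, such as Google Analytics or Heap, can also help to detect bot traffic.

The following analytics anomalies are the hallmarks of bot traffic:

  • Abnormally high pageviews: If a site sees a sudden, unprecedented, and unexpected spike in pageviews, bots are likely clicking through the site.
  • Abnormally high bounce rate: The bounce rate is the share of users who land on a single page of a site and then leave before clicking anything on that page. An unexpected rise in the bounce rate can be the result of bots being directed at a single page.
  • Surprisingly long or short session duration: Session duration, the amount of time users stay on a website, should remain relatively steady. An unexplained increase in session duration could indicate bots browsing the site at an unusually slow rate. Conversely, an unexpected drop could be the result of bots clicking through pages much faster than a human user would.
  • Junk conversions: A surge in phony conversions, such as accounts created with gibberish email addresses or contact forms submitted with fake names and phone numbers, can be the result of form-filling bots or spam bots.
  • Spike in traffic from an unexpected location: A sudden spike in users from one particular region, especially a region that is unlikely to have many people fluent in the site’s native language, can be an indication of bot traffic.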
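
As an illustration of the first signal in the list above, the following is a minimal sketch of how a sudden pageview spike might be flagged. It assumes daily pageview counts have already been exported from an analytics tool into a plain list; the function name `flag_spikes`, the 7-day window, and the threshold of 3 standard deviations are illustrative choices, not part of any analytics product.

```python
from statistics import mean, stdev


def flag_spikes(daily_views, window=7, threshold=3.0):
    """Return the indexes of days whose pageviews exceed the trailing
    `window`-day mean by more than `threshold` standard deviations.

    `window` and `threshold` are hypothetical defaults for illustration.
    """
    flagged = []
    for i in range(window, len(daily_views)):
        history = daily_views[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: any increase counts as a spike.
            if daily_views[i] > mu:
                flagged.append(i)
            continue
        if (daily_views[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up sample data: a steady week followed by an abrupt jump.
views = [1000, 1040, 980, 1010, 995, 1020, 1005, 5200, 1015]
print(flag_spikes(views))  # prints [7]
```

A flagged day is only a prompt for closer inspection: a legitimate event such as a marketing campaign can produce the same spike, so this signal is best read together with the others in the list.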

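How can bot traffic hurt analytics?

As mentioned above, unauthorized bot traffic can skew analytics metrics such as pageviews, bounce rate, session duration, user geolocation, and conversions. These skewed metrics cause site owners a number of problems: it is very hard to measure the performance of a site that is being flooded with bot activity. Attempts to improve the site, such as A/B testing and conversion rate optimization, are also undermined by the statistical noise that bots create.

How to filter bot traffic from Google Analytics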

Google Analytics does provide an option to “exclude all hits from known bots and spiders” (spiders are search engine bots that crawl webpages). If the source of the bot traffic can be identified, users can also provide a specific list of IPs to be ignored by Google Analytics.

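While these measures will stop some bots from disrupting analytics, they won’t stop all bots. Furthermore, most malicious bots pursue an objective other than disrupting traffic analytics, and these measures do nothing to mitigate harmful bot activity beyond preserving analytics data.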
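
The idea behind an IP exclusion list can be sketched in a few lines. This is not the Google Analytics API; it is a hypothetical offline filter over exported hit records, where the record shape (a dict with an `ip` key), the names `EXCLUDED_NETWORKS`, `is_excluded`, and `filter_hits`, and the addresses (taken from ranges reserved for documentation) are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical exclusion list. These ranges are reserved for documentation
# and stand in for addresses identified as bot sources.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_excluded(ip: str) -> bool:
    """Return True if `ip` falls inside any excluded network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in EXCLUDED_NETWORKS)


def filter_hits(hits):
    """Drop hit records whose source IP is on the exclusion list."""
    return [hit for hit in hits if not is_excluded(hit["ip"])]


# Made-up sample records.
hits = [
    {"ip": "203.0.113.9", "page": "/"},
    {"ip": "192.0.2.10", "page": "/pricing"},
    {"ip": "198.51.100.7", "page": "/"},
]
print(filter_hits(hits))  # prints [{'ip': '192.0.2.10', 'page': '/pricing'}]
```

The sketch also shows the limitation noted above: only traffic from sources that have already been identified is removed, and the bots themselves still reach the site.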

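How can bot traffic hurt performance?

Sending massive amounts of bot traffic is a very common way for attackers to launch a DDoS attack. During some types of DDoS attacks, so much attack traffic is directed at a website that the origin server becomes overloaded, and the site becomes slow or altogether unavailable to legitimate users.

How can bot traffic be bad for business?

Some websites can suffer financial losses from malicious bot traffic even if their performance is unaffected. Sites that rely on advertising and sites that sell merchandise with limited inventory are particularly vulnerable.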

For sites that serve ads, bots that land on the site and click on various elements of the page can trigger fake ad clicks; this is known as click fraud. While this may initially result in a boost in ad revenue, online advertising networks are very good at detecting bot clicks. If they suspect a website is committing click fraud, they will take action, usually in the form of banning that site and its owner from their network. For this reason, owners of sites that host ads need to be ever-wary of bot click fraud.

Sites with limited inventory can be targeted by inventory hoarding bots. As the name suggests, these bots go to e-commerce sites and dump tons of merchandise into their shopping carts, making that merchandise unavailable for purchase by legitimate shoppers. In some cases this can also trigger unnecessary restocking of inventory from a supplier or manufacturer. The inventory hoarding bots never make a purchase; they are simply designed to disrupt the availability of inventory.

How can websites manage bot traffic?

The first step to stopping or managing bot traffic to a website is to include a robots.txt file. This is a file that provides instructions for bots crawling the page, and it can be configured to prevent bots from visiting or interacting with a webpage altogether. But it should be noted that only good bots will abide by the rules in robots.txt; it will not prevent malicious bots from crawling a website.
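
To make the mechanism concrete, here is a small sketch that uses Python’s standard `urllib.robotparser` module to evaluate a robots.txt file the way a well-behaved crawler would. The rules, the bot names `BadBot` and `GoodBot`, and the `example.com` URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: one named bot is barred from the whole site,
# and every other bot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler performs this check before requesting each URL.
print(parser.can_fetch("BadBot", "https://example.com/index.html"))     # False
print(parser.can_fetch("GoodBot", "https://example.com/index.html"))    # True
print(parser.can_fetch("GoodBot", "https://example.com/private/data"))  # False
```

Note that the check runs inside the crawler, not on the server. Nothing forces a bot to run it, which is why robots.txt only restrains bots that choose to comply.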

A number of tools can help mitigate abusive bot traffic. A rate limiting solution can detect and prevent bot traffic originating from a single IP address, although this will still overlook a lot of malicious bot traffic. On top of rate limiting, a network engineer can look at a site’s traffic and identify suspicious network requests, providing a list of IP addresses to be blocked by a filtering tool such as a WAF. This is a very labor-intensive process and still only stops a portion of the malicious bot traffic.
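
Per-IP rate limiting can be sketched as a sliding-window counter. This is a minimal in-memory illustration, not how any particular rate limiting product works; the class name `SlidingWindowLimiter`, its parameters, and the example limits are assumptions.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per `window_seconds` for each IP."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # Maps each IP to the timestamps of its recently allowed requests.
        self._hits = defaultdict(deque)

    def allow(self, ip: str, now=None) -> bool:
        """Return True if the request is allowed, False if it is rejected.

        `now` can be passed explicitly to make the behavior testable.
        """
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


# Hypothetical limit: 3 requests per 10 seconds per IP.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10.0)
results = [limiter.allow("203.0.113.9", now=t) for t in (0.0, 1.0, 2.0, 3.0)]
print(results)  # prints [True, True, True, False]
```

Because the counter is keyed by IP address, a botnet that spreads its requests across many addresses keeps each one under the limit, which is the gap in rate limiting described above.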

Separate from rate limiting and direct engineer intervention, the easiest and most effective way to stop bad bot traffic is with a bot management solution. A bot management solution can leverage intelligence and use behavioral analysis to stop malicious bots before they ever reach a website. For example, Cloudflare Bot Management uses intelligence from over 25,000,000 Internet properties and applies machine learning to proactively identify and stop bot abuse. Super Bot Fight Mode, available on Pro and Business plans, offers smaller organizations similar visibility and control over their bot traffic.