Good Bots

We’ve written about bots before. In that previous post, we covered how to block bots that wanted to scrape your content and use it to train LLMs. But many bots are good, and you wouldn’t want to block them.

QUIC.cloud has a bot. Did you know that? When you enable our optimization services, our bot facilitates those services. In fact, if you don’t allow the bot through your firewalls and other access controls to reach your server, we can’t optimize your site.

And then there are third-party bots, which want access to your site for purposes such as indexing your content for search engines or monitoring your site’s uptime. These are bots we automatically consider “friendly,” and we allow them access at the CDN level.

There may be other good bots that you want to allow to access your site, and that is what the CDN’s allowlist is for.

Let’s take a closer look at all of these scenarios.

The QUIC.cloud Bot

First, let’s talk about our own bot: the QUIC.cloud Bot, or qcbot. qcbot functions as a web crawler for all QUIC.cloud Online Services which require data to be fetched from and returned to your website. For example, qcbot retrieves images from your site for Image Optimization, and notifies your site when the updated images are ready to be pulled.

qcbot is used for all QUIC.cloud services except for the CDN.

If you want to take full advantage of our online services, qcbot may need to make multiple requests to your website. These requests come from QUIC.cloud IPs, so be sure you are allowing them through your firewalls!
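
If you filter traffic by IP, one way to sanity-check a request is to compare its source address against the list of IPs QUIC.cloud publishes. Here is a minimal Python sketch of that idea; it assumes you have exported the published IP list to a local file named quic_cloud_ips.txt (one address or CIDR range per line), and that file name is just an example.

import ipaddress

# Load the QUIC.cloud IP list. The file name and one-entry-per-line format
# are assumptions for this sketch; export the list QUIC.cloud publishes.
with open("quic_cloud_ips.txt") as f:
    networks = [ipaddress.ip_network(line.strip(), strict=False)
                for line in f if line.strip()]

def is_quic_cloud_ip(remote_addr: str) -> bool:
    # True if the requesting address falls inside any QUIC.cloud range.
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in networks)

# A firewall or WAF hook could skip blocking when this returns True.
print(is_quic_cloud_ip("203.0.113.10"))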

Recognizing QUIC.cloud Bot

So how will you know when a request is coming from our friendly bot?

The QUIC.cloud bot will always have a user-agent containing qcbot, as in the following examples:

Desktop:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; qcbot/1.0; +http://quic.cloud/bot.html) Chrome/112.0.0.0 Safari/537.36

Mobile:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Mobile Safari/537.36 (compatible; qcbot/1.0; +http://quic.cloud/bot.html)
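
In practice, recognizing qcbot is just a substring check on the User-Agent header. Here is a minimal Python sketch of that check, using one of the sample header values above; keep in mind that user agents can be spoofed, so you may still want to combine this with the IP check described earlier.

def is_qcbot(user_agent: str) -> bool:
    # qcbot always identifies itself with "qcbot" somewhere in the User-Agent.
    return "qcbot" in user_agent.lower()

ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
      "qcbot/1.0; +http://quic.cloud/bot.html) Chrome/112.0.0.0 Safari/537.36")
print(is_qcbot(ua))  # True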

Controlling the QUIC.cloud Bot

The number of requests made by the QUIC.cloud bot to your website depends on the specific Online Services you use. For instance, the Image Optimization service will request image files from your site, while the UCSS (Unique CSS), CCSS (Critical CSS), and VPI (Viewport Images) services will access your pages.

All requests from qcbot are essential for performing the optimization services you enable.

There should be no reason to reduce the number of requests qcbot makes to your site. However, if you wish to do so, the only way is to review the QUIC.cloud services you have enabled in the LiteSpeed Cache plugin, and disable any features you no longer want.

Other Friendly Bots

Let’s look at this from the CDN side now. If you are a QUIC.cloud CDN user, there are some common bots that you wouldn’t want blocked. These include search engines, social media preview generators, website uptime monitors and the like.

We have allowlisted the following bots at the CDN level:

  • Applebot
  • Baidu
  • BetterUptime
  • Bingbot
  • DuckDuckGo
  • Facebook/Instagram/Messenger
  • Googlebot
  • Hetrix
  • Jetpack
  • LinkedinBot
  • Monitis
  • NodePing
  • Paypal
  • Pingdom
  • Pinterestbot
  • Site24x7
  • Stripe
  • Uptime Robot
  • Yandexbot

If you are aware of another friendly bot that would benefit other sites, and you think it should be added to the QUIC.cloud CDN allowlist, you can email support@quic.cloud and let us know the name of the bot.

Allowlisting Your Own Friendly Bots

If you are aware of another friendly bot that is not widely used, perhaps a custom bot you have made yourself or a bot used by your hosting provider, you can add it to your domain’s CDN Allowlist.

In your QUIC.cloud Dashboard, click your domain name, and then navigate to CDN > CDN Config > Security. Scroll down to Access Control > User Agent and find the Allowlist field. Enter one good bot per line; regular expressions are allowed.

User agents listed in this Allowlist will bypass security checks and ignore any configured reCAPTCHA connection limits. They will be allowed 100 visits per 10 seconds per IP to a single node. Please be careful with this setting. Only allowlist a user agent if necessary. It is easy to spoof a user agent in order to bypass site security.

An entry in the Allowlist is considered a match if it is found anywhere in the User-Agent header.
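
As a rough illustration of that matching behavior, the Python sketch below treats each allowlist line as a regular expression and looks for it anywhere in the User-Agent header. The entries shown are purely hypothetical examples, and the CDN’s real implementation may differ in detail.

import re

# Hypothetical allowlist entries: one per line in the Dashboard, regex allowed.
allowlist = [
    "MyCustomBot",
    r"uptime-checker/\d+\.\d+",
]

def is_allowlisted(user_agent: str) -> bool:
    # An entry matches if it is found anywhere in the User-Agent header.
    return any(re.search(pattern, user_agent) for pattern in allowlist)

print(is_allowlisted("Mozilla/5.0 (compatible; MyCustomBot/2.1)"))   # True
print(is_allowlisted("Mozilla/5.0 (compatible; SomeOtherBot/1.0)"))  # False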

Conclusion

The decision to allow bots to crawl your site is yours and yours alone. Many bots are good and help keep your site in working order. Some bots are bad. But the power to control access is in your hands.

If you want to learn more, our documentation may help.
