Written by 9:00 AM News

AI Web Crawlers Emerge as Major Internet Force, New Vercel Study Reveals

A comprehensive study released by Vercel on December 17, 2024, has revealed that AI crawlers have become a significant presence on the internet, with OpenAI’s GPTBot and Anthropic’s Claude generating nearly 1 billion requests combined across Vercel’s network in the past month.

According to the research, OpenAI’s GPTBot led with 569 million requests, while Anthropic’s Claude followed with 370 million. Together, these AI crawlers represent approximately 20% of Googlebot’s 4.5 billion requests during the same period.
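The share quoted above can be checked with simple arithmetic. The short sketch below uses only the request counts reported in this article; the variable names are illustrative and not taken from the Vercel study. On these rounded figures the combined share works out to a little under 21%, in line with the "approximately 20%" cited.

```python
# Request counts as reported in the article (past month, Vercel's network).
gptbot_requests = 569_000_000
claude_requests = 370_000_000
googlebot_requests = 4_500_000_000

# Combined AI crawler volume and its size relative to Googlebot.
combined = gptbot_requests + claude_requests
share_of_googlebot = combined / googlebot_requests

print(f"Combined AI crawler requests: {combined:,}")
print(f"Share of Googlebot volume: {share_of_googlebot:.1%}")
```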

The study uncovered distinct operational patterns among AI crawlers. All measured AI crawlers operate from U.S. data centers, with ChatGPT running from Des Moines, Iowa, and Phoenix, Arizona, while Claude operates from Columbus, Ohio. This contrasts with traditional search engines such as Google, whose crawler operates from seven different U.S. locations.

A key finding reveals that major AI crawlers, including OpenAI, Anthropic, Meta, and ByteDance, currently lack JavaScript rendering capabilities. While these crawlers fetch JavaScript files, they don’t execute them, limiting their ability to process dynamic web content. ChatGPT dedicates 11.50% of its requests to JavaScript files, while Claude allocates 23.84%.
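To illustrate what fetching without executing JavaScript means in practice, here is a minimal sketch using Python's standard-library HTML parser. It is not how any of the named crawlers is implemented, which the study does not describe. The sample page, the `VisibleTextParser` class, and the `static_text` helper are all hypothetical. The sketch reads only the text present in the raw HTML, so content that a script would inject at runtime never appears.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text outside <script> and <style>, without running any code."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or style source.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text a non-rendering fetcher could read from raw HTML."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the price only exists after the script runs in a browser.
SAMPLE_PAGE = """
<html><body>
<h1>Product list</h1>
<div id="app"></div>
<script>
document.getElementById("app").textContent = "Widget A costs $10";
</script>
</body></html>
"""

print(static_text(SAMPLE_PAGE))
```

In this sketch the heading is recovered but the script-injected price is not, which is the kind of gap that server-rendered HTML avoids.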

The research also highlighted significant inefficiencies in AI crawler behavior. Both ChatGPT and Claude spend over 34% of their fetches on 404 error pages, with ChatGPT using an additional 14.36% of fetches following redirects. These rates are notably higher than Googlebot’s 8.22% for 404s and 1.49% for redirects.

The crawlers show distinct content preferences, with ChatGPT prioritizing HTML content (57.70% of fetches) and Claude focusing heavily on images (35.17% of total fetches). This differs from Googlebot’s more balanced approach, which distributes fetches across HTML (31.00%), JSON data (29.34%), and other content types.

“In the coming weeks, we will embark on a major infrastructure initiative to ensure our systems are resilient to an extended outage in any region of any of our cloud providers by adding a layer of indirection under our control in between our applications and our cloud databases,” the report states. “This will allow significantly faster failover.”

Last modified: January 8, 2025