Why the opaque army of little robots that roam the internet and feed AI inherits the blind spots and biases of the web
Why AI systems, which rely on the internet, poorly reflect the diversity of human knowledge
Investigation: As the internet supplies the vast amounts of data AI systems need for training, ChatGPT and the like inherit the web's biases and thus cannot claim to be comprehensive.
They are tiny creatures that we unknowingly share our lives with. Discreet insects drawn to the traces of human activity, swarming in the shadows of our everyday existence. But unlike the spiders that inhabit our gardens and homes, these crawlers – sometimes called "web spiders" – are not made of chitin; their web is built from code, fiber optics and network protocols. These industrious little robots spread across the internet as the web's surveyors, tasked with moving from link to link across the vast digital landscape.
In the great mechanical family of web spiders, not all species have the same job. One of the oldest appeared with the first major search engines and directories: these crawlers, such as Googlebot (Google's crawler), Bingbot (Bing's crawler) and Slurp (Yahoo!'s first crawler), are sent into the wild to catalog and index existing web pages, making them easier for internet users to access.
In recent years, a new generation of crawlers has swept across the internet. Driven by the rise of large language models (LLMs), the programs that power artificial intelligence agents, they do far more than simply index the web. New bots such as GPTBot, ClaudeBot, Meta-ExternalAgent and Bytespider scrape content on a massive scale.
The goal is to sweep through the web, an inexhaustible reservoir of knowledge, to build gigantic corpora of textual data. These corpora are then used to feed and train the LLMs developed by OpenAI, Anthropic and Meta, enabling their respective agents – ChatGPT, Claude and Llama – to generate increasingly plausible responses to user prompts.