Inside the war between genAI and the internet


Fighting back

Cloudflare is now deliberately poisoning large language model (LLM) training data, fighting back against AI companies that take data from websites without permission. (The company provides content delivery network, cybersecurity, DDoS mitigation, and web performance services.)

Here’s the problem Cloudflare is trying to solve: Companies such as OpenAI, Anthropic, and Perplexity have been accused of harvesting data from websites while ignoring the sites’ robots.txt files (originally designed to tell search engine crawlers which files were off-limits for indexing) and taking the data anyway. Beyond those big names, all kinds of smaller, less legitimate companies are also capturing data without the rightful owners’ permission.
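For context, robots.txt is just a plain-text file of crawler rules that compliant bots are expected to check before fetching pages; nothing technically enforces it. The sketch below, using Python’s standard urllib.robotparser, shows how a well-behaved crawler would honor such rules. The site and rules here are hypothetical examples; GPTBot, ClaudeBot, and PerplexityBot are user-agent tokens these companies’ crawlers have publicly identified with.

```python
# Minimal sketch: how a compliant crawler consults robots.txt before fetching.
# The rules and site below are hypothetical; the user-agent tokens are examples.
from urllib.robotparser import RobotFileParser

robots_txt = """
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# A crawler that respects the file skips disallowed URLs.
print(parser.can_fetch("GPTBot", "https://example.com/articles/1"))    # False
print(parser.can_fetch("Googlebot", "https://example.com/articles/1"))  # True
```

A scraper that simply skips this check, which is the behavior site owners are complaining about, fetches the page regardless of what the file says.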

Cloudflare’s solution is a feature, available to all customers, called “AI Labyrinth.” It redirects incoming bots to Cloudflare’s own special-purpose pages, which are filled with large quantities of AI-generated information that is factually accurate but irrelevant to the website the bot set out to scrape.
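To illustrate the general idea (not Cloudflare’s actual implementation, which is proprietary and runs on its edge network), here is a minimal, hypothetical Python sketch: suspected bots, identified crudely here by User-Agent, are routed into an endless set of auto-generated decoy pages that only link to more decoy pages, while ordinary visitors get the real content. In the real feature the decoy text is AI-generated and factually accurate; the placeholder filler below merely stands in for that.

```python
# Hypothetical sketch of a "decoy maze" for scraper bots; not Cloudflare's code.
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

SUSPECT_AGENTS = ("GPTBot", "ClaudeBot", "PerplexityBot")  # example tokens

def decoy_page() -> str:
    # Filler text plus links deeper into the maze, so a crawler keeps walking.
    links = "".join(
        f'<a href="/maze/{random.randint(0, 10**9)}">more</a> ' for _ in range(5)
    )
    return f"<html><body><p>Generic, off-topic filler text.</p>{links}</body></html>"

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        is_bot = any(token in agent for token in SUSPECT_AGENTS)
        if is_bot or self.path.startswith("/maze/"):
            body = decoy_page()          # suspected bot: serve decoy content
        else:
            body = "<html><body><h1>Real article content</h1></body></html>"
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()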

