The development of artificial intelligence requires large amounts of data, and most of it comes from the internet. However, not all data on the internet may be used for training artificial intelligence.

Websites specify which of their data may be collected by automated crawlers through a file called robots.txt. According to a Reuters report, some artificial intelligence developers continue to collect data in violation of the rules in these files. Perplexity, described as a ‘free artificial intelligence search engine,’ is among the most criticized companies in this regard.
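To illustrate how a crawler that respects these rules might check them, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the crawler names ExampleAIBot and OtherBot, and the example.com URLs are hypothetical and are not taken from any company named in this article.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one AI crawler entirely and
# keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before fetching each URL.
ai_allowed = parser.can_fetch("ExampleAIBot", "https://example.com/articles/1")
other_allowed = parser.can_fetch("OtherBot", "https://example.com/articles/1")
other_private = parser.can_fetch("OtherBot", "https://example.com/private/page")

print(ai_allowed, other_allowed, other_private)
```

Nothing in this check is enforced by the website itself: a crawler that never performs it can still download the pages, which is why compliance depends on the goodwill of the crawler's operator.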

OpenAI and Anthropic reportedly do not comply with robots.txt guidelines

The same reportedly applies to other major players such as OpenAI and Anthropic. Reports indicate that these companies ignore the content-blocking rules in robots.txt files. Perplexity CEO Aravind Srinivas has previously denied that the company ignores the protocol and then lies about it.

The robots.txt protocol has been in use since the 1990s and is not legally binding. As a result, solving the problem will likely require updating the protocol and introducing stricter rules.
