The development of artificial intelligence requires large amounts of data, and most of that data comes from the internet. However, not all data on the internet may be used to train artificial intelligence.

Websites use a file called robots.txt to tell automated crawlers which pages they may and may not collect. According to a Reuters report, some artificial intelligence developers continue to collect data in violation of the rules in these files. Perplexity, described as a ‘free artificial intelligence search engine,’ is among the most criticized companies in this regard.
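
To make the robots.txt mechanism concrete, here is a minimal sketch of how a well-behaved crawler might check a site's rules before fetching a page, using Python's standard urllib.robotparser module. The crawler names ExampleAIBot and OtherBot, the example.com URLs, and the rules themselves are hypothetical and chosen only for illustration; they are not taken from any real site or company.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: it asks one named AI crawler to stay away
# from the whole site and asks every other crawler to avoid /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule checker without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The named AI crawler is asked not to fetch anything on the site.
print(parser.can_fetch("ExampleAIBot", "https://example.com/articles/1"))

# Other crawlers may fetch articles but not the /private/ area.
print(parser.can_fetch("OtherBot", "https://example.com/articles/1"))
print(parser.can_fetch("OtherBot", "https://example.com/private/report"))
```

The check only reports what the site has requested. Nothing in the file technically prevents a crawler from skipping this step and downloading the page anyway, which is the behavior the report describes.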

OpenAI and Anthropic reportedly do not comply with robots.txt guidelines

Are Artificial Intelligence Developers Violating Protocol Rules?

The same reportedly applies to other major players such as OpenAI and Anthropic. Reports indicate that these companies ignore the content-blocking directives in robots.txt. Perplexity CEO Aravind Srinivas has previously denied that the company ignores the protocol and then lies about it.

The robots.txt protocol has been in use since the 1990s and is not legally binding. Because compliance is voluntary, solving the problem will likely require updating the protocol and introducing stricter rules.

