The New York Times blocks OpenAI’s web crawler

OpenAI, a San Francisco-based artificial intelligence venture, recently found itself in hot water with The New York Times after deploying web crawler bots to scrape the news outlet’s website for publicly available documents. The Times accused OpenAI of violating its terms of service, leading the AI firm to withdraw and disable its bots.

In a statement, the Times said, “We’re disappointed that OpenAI apparently chose to engage in indiscriminate scraping of our website without contacting us first. All users of our public-facing webpages must observe the Terms of Service and use of our content in a responsible way, and those terms do not permit scraping of our website or using our content without our permission. We’ve taken appropriate measures to block OpenAI access.”

OpenAI, however, disputed the Times’ version of events, claiming it had followed established protocols by using an automated spider to examine public documents. According to OpenAI, the news outlet’s terms of service allow users to “crawl our site for the sole purpose of returning search results for publicly available information.”

“We take intellectual property very seriously. Our bots are fully compliant with the robots.txt file, and follow the tips and practices published on the World Wide Web Consortium’s website for web scraping,” OpenAI wrote on its blog.
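For readers unfamiliar with what "compliant with the robots.txt file" means in practice, here is a minimal sketch of how a crawler might check a site's rules before fetching a page. It uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` and `OtherBot` user-agent names, and the `example.com` URLs are all hypothetical and are not taken from the Times' site or from OpenAI's crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only;
# this is NOT the actual file served by any news outlet.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleBot", "OtherBot"):
        for url in ("https://example.com/news/", "https://example.com/private/x"):
            print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

Note that robots.txt is a voluntary convention: honoring it does not by itself settle whether a site's terms of service permit scraping, which is the point in dispute here.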

Despite OpenAI’s rebuttal, The New York Times was not swayed and kept its block on OpenAI’s crawler in place. As a result, OpenAI has been forced to look elsewhere for public documents, and a precedent has been set: other web crawlers now risk being blocked by the news outlet as well.

It remains to be seen whether The New York Times will take legal action against other web crawlers or whether this was an isolated incident. Nonetheless, OpenAI and similar companies should be aware that scraping the Times’ site for public documents without first obtaining permission, even if legal, might result in a stiff penalty from the news outlet.
