Categories: The Verge

Cloudflare says Perplexity’s AI bots are ‘stealth crawling’ blocked sites

The AI search startup Perplexity is allegedly skirting restrictions meant to stop its AI web crawlers from accessing certain websites, according to a report from Cloudflare. In the report, Cloudflare claims that when Perplexity encounters a block, the startup will conceal its crawling identity “in an attempt to circumvent the website’s preferences.”

The report only adds to concerns about Perplexity vacuuming up content without permission, as the company got caught barging past paywalls and ignoring sites’ robots.txt files last year. At the time, Perplexity CEO Aravind Srinivas blamed the activity on third-party crawlers used by the site.

Now, Cloudflare, one of the world’s biggest internet architecture providers, says it received complaints from customers who claimed that Perplexity’s bots could still access their websites even after the customers stated their preferences in their robots.txt files and created Web Application Firewall (WAF) rules to restrict access by the startup’s AI bots.

To test this, Cloudflare says it created new domains with similar restrictions against Perplexity’s AI scrapers. It found that the startup will first attempt to access the sites while identifying itself by the names of its crawlers: “PerplexityBot” or “Perplexity-User.”
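For readers unfamiliar with how a robots.txt preference works, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The robots.txt contents, the `is_allowed` helper, and the example URL are illustrative assumptions, not the rules Cloudflare or its customers actually used. It shows that rules written against named crawlers only bind a visitor that announces itself under one of those names, and only if that visitor chooses to check the file at all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows the two crawler names cited in the
# article and allows everyone else. Not taken from Cloudflare's test domains.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /

User-agent: Perplexity-User
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt above permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/article"
    # Illustrative browser-style user agent string.
    browser_ua = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    for agent in ("PerplexityBot", "Perplexity-User", browser_ua):
        print(agent[:20], is_allowed(agent, url))
```

The file is advisory: nothing in this check is enforced by the server, which is why site owners in the report also relied on firewall rules.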

But if the website has restrictions against AI scraping, Cloudflare claims Perplexity will change its user agent (the bit of information that tells a website what kind of browser and device a visitor is using, or whether the visitor is a bot) to “impersonate Google Chrome on macOS.” Cloudflare says this “undeclared crawler” uses “rotating” IP addresses that Perplexity doesn’t include on the list of IP addresses used by its bots.
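To illustrate why the behavior Cloudflare describes would slip past a simple block, here is a rough Python sketch of a rule that checks only the declared user agent and a published IP list. Everything in it is an assumption made for illustration: the function names, the user agent strings, and the IP ranges (reserved documentation addresses) are hypothetical, and this is not Cloudflare's WAF rule syntax or detection method.

```python
import ipaddress

# Crawler names from the article, lowercased for matching.
BLOCKED_AGENT_TOKENS = ("perplexitybot", "perplexity-user")

# Placeholder range reserved for documentation (RFC 5737), standing in for
# a crawler operator's published IP list. Not a real Perplexity range.
PUBLISHED_CRAWLER_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_declared_crawler(user_agent: str) -> bool:
    """True if the user agent string names one of the blocked crawlers."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def is_from_published_range(ip: str) -> bool:
    """True if the address falls inside the (placeholder) published list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PUBLISHED_CRAWLER_NETWORKS)


def rule_blocks(user_agent: str, ip: str) -> bool:
    """A naive block rule: match on declared name or published address."""
    return is_declared_crawler(user_agent) or is_from_published_range(ip)


if __name__ == "__main__":
    # Illustrative strings only.
    declared_ua = "Mozilla/5.0 (compatible; PerplexityBot/1.0)"
    browser_ua = (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    )
    print(rule_blocks(declared_ua, "192.0.2.10"))    # declared crawler
    print(rule_blocks(browser_ua, "198.51.100.7"))   # browser-like, unlisted IP
```

A request that presents a browser-style user agent from an address outside the published list matches neither condition, so a rule of this shape would let it through.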

Additionally, Cloudflare claims that Perplexity also switches the autonomous system number (ASN) its requests come from, a number used to identify a group of IP networks controlled by a single operator, to get around blocks. “This activity was observed across tens of thousands of domains and millions of requests per day,” Cloudflare writes.

In a statement to The Verge, Perplexity spokesperson Jesse Dwyer called Cloudflare’s report a “publicity stunt,” adding that “there are a lot of misunderstandings in the blog post.” Cloudflare has since de-listed Perplexity as a verified bot and has rolled out methods to block Perplexity’s “stealth crawling.”

Cloudflare CEO Matthew Prince has been outspoken about AI’s “existential threat” to publishers. Last month, the company started letting websites ask AI companies to pay to crawl their content, and began blocking AI crawlers by default.

Published by
rssfeeds-admin
