Update (May 29, 5:44 pm ET): Google issued a statement, cautioning against assumptions based on “incomplete information.”
What you need to know
- Rand Fishkin of SparkToro received and published documents detailing Google Search’s internal APIs, search ranking factors, and Google’s data collection practices.
- Some leaked information contradicts Google’s public statements about search algorithms and ranking factors.
- The documents were accidentally made public on GitHub from March 27 to May 7 and later indexed by a third-party service.
A massive leak of what seems to be thousands of internal documents offers a rare glimpse into the inner workings of Google Search, suggesting that Google may have been misleading the public about its search engine operations for years.
The documents were handed over to Rand Fishkin of SparkToro, a software company, who then made them public. Fishkin, a seasoned SEO expert with over a decade of experience, says a source gave him 2,500 pages of documents, hoping to debunk the “lies” Google employees had told about how the search algorithm actually works (via The Verge).
The documents spill the beans on internal APIs and break down what impacts search results. These leaked papers give a general sense of what works and what doesn’t for ranking on Google, and they point to the elements that appear to matter.
These leaks cover a wide range of topics, such as Google’s data collection, which sites get a boost for sensitive issues like elections, and how Google treats small websites.
Interestingly, some information conflicts with what Google has publicly said. For example, Google has denied treating subdomains differently in rankings and claimed it doesn’t use click-centric signals for content indexing, yet the leaks suggest otherwise, according to Fishkin.
Other surprises include using a sandbox for new sites, giving sites an “authority score” to bump them up in search results, and more.
Google has yet to respond to Android Central’s request for comment, but we’ll update this article when we hear back.
It looks like Google accidentally made these documents public on GitHub around March 27, and they were taken down by May 7. However, a third-party service indexed them, so they’re still accessible.
Even though these documents reveal potential ranking factors, they don’t specify the importance of each one in the final ranking, as SEO expert Mike King highlighted in his overview.
Earlier this year, Google launched a major Search update that prioritizes “helpful” content. The new algorithms are designed to determine if a webpage is made for search engines or real people.
Update
In an emailed statement to Android Central, a Google representative cautioned the public not to jump to conclusions without all the facts.
“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information,” the spokesperson said. “We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation.”
Google also mentioned that it does not traditionally comment on the specifics of its ranking systems. Sharing such sensitive information could help spammers and bad actors manipulate the results, according to the company.
Search is always changing, and Google says it’s constantly tweaking its systems to provide the best results. The spokesperson added that while Google’s core ranking principles stay the same, individual signals can change often, be dropped, or just be tested and never used.
The search giant also reiterated its commitment to providing accurate information while protecting the integrity of search results. Finally, Google highlighted the potential for misinterpretation of the leaked documents.