Inside Search
The official Google Search blog
Another step to reward high-quality sites
April 24, 2012
(Cross-posted on the Webmaster Central Blog)
Google has said before that search engine optimization, or SEO, can be positive and constructive, and we're not the only ones. Effective search engine optimization can make a site more crawlable and make individual pages more accessible and easier to find. Search engine optimization includes things as simple as keyword research to ensure that the right words are on the page, not just industry jargon that normal people will never type.
“White hat” search engine optimizers often improve the usability of a site, help create great content, or make sites faster, which is good for both users and search engines. Good search engine optimization can also mean good marketing: thinking about creative ways to make a site more compelling, which can help with search engines as well as social media. The net result of making a great site is often greater awareness of that site on the web, which can translate into more people linking to or visiting a site.
The opposite of “white hat” SEO is something called “black hat webspam” (we say “webspam” to distinguish it from email spam). In the pursuit of higher rankings or traffic, a few sites use techniques that don’t benefit users, where the intent is to look for shortcuts or loopholes that would rank pages higher than they deserve to be ranked. We see all sorts of webspam techniques every day, from keyword stuffing to link schemes that attempt to propel sites higher in rankings.
The goal of many of our ranking changes is to help searchers find sites that provide a great user experience and fulfill their information needs. We also want the “good guys” making great sites for users, not just algorithms, to see their effort rewarded. To that end we’ve launched Panda changes that successfully returned higher-quality sites in search results. And earlier this year we launched a page layout algorithm that reduces rankings for sites that don’t make much content available “above the fold.”
In the next few days, we’re launching an important algorithm change targeted at webspam. The change will decrease rankings for sites that we believe are violating Google’s existing quality guidelines. We’ve always targeted webspam in our rankings, and this algorithm represents another improvement in our efforts to reduce webspam and promote high quality content. While we can't divulge specific signals because we don't want to give people a way to game our search results and worsen the experience for users, our advice for webmasters is to focus on creating high quality sites that create a good user experience and employ white hat SEO methods instead of engaging in aggressive webspam tactics.
Here’s an example of a webspam tactic like keyword stuffing taken from a site that will be affected by this change:
Of course, most sites affected by this change aren’t so blatant. Here’s an example of a site with unusual linking patterns that is also affected by this change. Notice that if you try to read the text aloud you’ll discover that the outgoing links are completely unrelated to the actual content, and in fact the page text has been “spun” beyond recognition:
Sites affected by this change might not be easily recognizable as spamming without deep analysis or expertise, but the common thread is that these sites are doing much more than white hat SEO; we believe they are engaging in webspam tactics to manipulate search engine rankings.
The change will go live for all languages at the same time. For context, the initial Panda change affected about 12% of queries to a significant degree; this algorithm affects about 3.1% of queries in English to a degree that a regular user might notice. The change affects roughly 3% of queries in languages such as German, Chinese, and Arabic, but the impact is higher in more heavily-spammed languages. For example, 5% of Polish queries change to a degree that a regular user might notice.
We want people doing white hat search engine optimization (or even no search engine optimization at all) to be free to focus on creating amazing, compelling web sites. As always, we’ll keep our ears open for feedback on ways to iterate and improve our ranking algorithms toward that goal.
Posted by Matt Cutts, Distinguished Engineer
Search quality highlights: 50 changes for March
April 3, 2012
Here’s our latest installment of search quality highlights, with another 50 changes to report for March. We’re starting to get into a groove with these posts, so we’re getting more and more comprehensive as the months go by. New for this month, we’ve published uncut video from our search quality meeting, which gives a great flavor for how these decisions get made.
Here’s the list for March:
Autocomplete with math symbols.
[launch codename "Blackboard", project codename "Suggest"] When we process queries to return predictions in autocomplete, we generally normalize them to match more relevant predictions in our database. This change incorporates several characters that were previously normalized: “+”, “-”, “*”, “/”, “^”, “(”, “)”, and “=”. This should make it easier to search for popular equations, for example [e = mc2] or [y = mx+b].
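As a rough illustration of what "normalizing" a query can mean, here is a minimal Python sketch. It is not how autocomplete is actually implemented; the `normalize` function and its `keep` parameter are hypothetical names chosen for this example. The idea is simply that punctuation is dropped unless it appears in an allow-list, and the allow-list here contains the math symbols named above.

```python
# Characters the post says are no longer stripped from autocomplete queries.
MATH_SYMBOLS = "+-*/^()="

def normalize(query, keep=""):
    """Lowercase the query and replace every character that is not a
    letter, digit, whitespace, or listed in `keep` with a space.
    Hypothetical helper for illustration only."""
    chars = [
        c if (c.isalnum() or c.isspace() or c in keep) else " "
        for c in query.lower()
    ]
    # Collapse runs of whitespace left behind by removed characters.
    return " ".join("".join(chars).split())

print(normalize("E = mc2"))                 # e mc2
print(normalize("E = mc2", MATH_SYMBOLS))   # e = mc2
print(normalize("y = mx+b"))                # y mx b
print(normalize("y = mx+b", MATH_SYMBOLS))  # y = mx+b
```

With the symbols preserved, [y = mx+b] stays distinct from a query made of the bare terms "y", "mx", and "b", so predictions can match the equation itself.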
Improvements to handling of symbols for indexing.
[launch codename "Deep Maroon"] We generally ignore punctuation symbols in queries. Based on analysis of our query stream, we’ve now started to index the following heavily used symbols: “%”, “$”, “\”, “.”, “@”, “#”, and “+”. We’ll continue to index more symbols as usage warrants.
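To see why indexing symbols matters, consider this small Python sketch of a toy inverted index. It is only an illustration under simplifying assumptions: `tokenize`, `build_index`, and the two sample documents are made up for the example, and a production index is far more involved.

```python
from collections import defaultdict

# Symbols the post lists as newly indexed.
INDEXED_SYMBOLS = frozenset("%$\\.@#+")

def tokenize(text, indexed):
    """Split on whitespace; keep letters, digits, and symbols in `indexed`."""
    tokens = []
    for raw in text.lower().split():
        token = "".join(c for c in raw if c.isalnum() or c in indexed)
        if token:
            tokens.append(token)
    return tokens

def build_index(docs, indexed=frozenset()):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text, indexed):
            index[token].add(doc_id)
    return index

# Hypothetical documents for illustration.
docs = {1: "Learning C++ templates", 2: "Learning C# generics"}

plain = build_index(docs)                      # symbols ignored
symbolic = build_index(docs, INDEXED_SYMBOLS)  # symbols indexed

print(sorted(plain["c"]))       # [1, 2]
print(sorted(symbolic["c++"]))  # [1]
print(sorted(symbolic["c#"]))   # [2]
```

When symbols are ignored, both documents collapse onto the token "c" and cannot be told apart. Once "+" and "#" are indexed, each query term points at the document that actually uses it.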
Better scoring of news groupings.
[launch codename "avenger_2"] News results on Google are organized into groups that are about the same story. We have scoring systems to determine the ordering of these groups for a given query. This subtle change slightly improves our scoring system, leading to better ranking of news clusters.
Sitelinks data refresh.
[launch codename "Saralee-76"] Sitelinks (the links that appear beneath some search results and link deeper into the respective site) are generated in part by an offline process that analyzes site structure and other data to determine the most relevant links to show users. We’ve recently updated the data through our offline process. These updates happen frequently (on the order of weeks).
Improvements to autocomplete backends, coverage.
[launch codename "sovereign", project codename "Suggest"] We’ve consolidated systems and reduced the number of backend calls required to prepare autocomplete predictions for your query. The result is more efficient CPU usage and more comprehensive predictions.
Better handling of password changes.
Our general approach is that when you change passwords, you’ll be signed out from your account on all machines. This change ensures that changing your password more consistently signs your account out of Search, everywhere.
Better indexing of profile pages.
[launch codename "Prof-2"] This change improves the comprehensiveness of public profile pages in our index from more than 200 social sites.
UI refresh for News Universal.
[launch codename "Cosmos Newsy", project codename "Cosmos"] We’ve refreshed the design of News Universal results by providing more results from the top cluster, unifying the UI treatment of clusters of different sizes, adding a larger font for the top article, adding larger images (from licensed sources), and adding author information.
Improvements to results for navigational queries.
[launch codename "IceMan5"] A “navigational query” is a search where it looks like the user is looking to navigate to a particular website, such as [New York Times] or [wikipedia.org]. While these searches may seem straightforward, there are still challenges to serving the best results. For example, what if the user doesn’t actually know the right URL? What if the URL they’re searching for seems to be a parked domain (with no content)? This change improves results for this kind of search.
High-quality sites algorithm data update and freshness improvements.
[launch codename "mm", project codename "Panda"] Like many of the changes we make, aspects of our high-quality sites algorithm depend on processing that’s done offline and pushed on a periodic cycle. In the past month, we’ve pushed updated data for “Panda,” as we mentioned in a recent tweet. We’ve also made improvements to keep our database fresher overall.
Live results for UEFA Champions League and KHL.
We’ve added live-updating snippets in our search results for the KHL (Russian Hockey League) and UEFA Champions League, including scores and schedules. Now you can find live results from a variety of sports leagues, including the NFL, NBA, NHL and others.
Tennis search feature.
[launch codename "DoubleFault"] We’ve introduced a new search feature to provide realtime tennis scores at the top of the search results page. Try [maria sharapova] or [sony ericsson open].
More relevant image search results.
[launch codename "Lice"] This change tunes signals we use related to landing page quality for images. This makes it more likely that you’ll find highly relevant images, even if those images are on pages that are lower quality.
Fresher image predictions in all languages.
[launch codename "imagine2", project codename "Suggest"] We recently rolled out a change to surface more relevant image search predictions in autocomplete in English. This improvement extends the update to all languages.
SafeSearch algorithm tuning.
[launch codenames "Fiorentini", "SuperDyn"; project codename "SafeSearch"] This month we rolled out a couple of changes to our SafeSearch algorithm. We’ve updated our classifier to make it smarter and more precise, and we’ve found new ways to make adult content less likely to appear when a user isn't looking for it.
Tweaks to handling of anchor text.
[launch codename "PC"] This month we turned off a classifier related to anchor text (the visible text appearing in links). Our experimental data suggested that other methods of anchor processing had greater success, so turning off this component made our scoring cleaner and more robust.
Simplification to Images Universal codebase.
[launch codename "Galactic Center"] We’ve made some improvements to simplify our codebase for Images Universal and to better utilize improvements in our general web ranking to also provide better image results.
Better application ranking and UI on mobile.
When you search for apps on your phone, you’ll now see richer results with app icons, star ratings, prices, and download buttons arranged to fit well on smaller screens. You’ll also see more relevant ranking of mobile applications based on your device platform, for example Android or iOS.
Improvements to freshness in Video Universal.
[launch codename "graphite", project codename "Freshness"] We’ve improved the freshness of video results to better detect stale videos and return fresh content.
Fewer undesired synonyms.
[project codename "Synonyms"] When you search on Google, we often identify other search terms that might have the same meaning as what you entered in the box (synonyms) and surface results for those terms as well when it might be helpful. This month we tweaked a classifier to prevent unhelpful synonyms from being introduced as content in the results set.
Better handling of queries with both navigational and local intent.
[launch codename "ShieldsUp"] Some queries have both local intent and are very navigational (directed towards a particular website). This change improves the balance of results we show, and helps ensure you’ll find highly relevant navigational results or local results towards the top of the page as appropriate for your query.
Improvements to freshness.
[launch codename "Abacus", project codename "Freshness"] We launched an improvement to freshness late last year that was very helpful, but it cost significant machine resources. At the time we decided to roll out the change only for news-related traffic. This month we rolled it out for all queries.
Improvements to processing for detection of site quality.
[launch codename "Curlup"] We’ve made some improvements to a longstanding system we have to detect site quality. This improvement allows us to get greater confidence in our classifications.
Better interpretation and use of anchor text.
We’ve improved systems we use to interpret and use anchor text, and determine how relevant a given anchor might be for a given query and website.
Better local results and sources in Google News.
[launch codename "barefoot", project codename "news search"] We’re deprecating a signal we had to help people find content from their local country, and we’re building similar logic into other signals we use. The result is more locally relevant Google News results and higher quality sources.
Deprecating signal related to ranking in a news cluster.
[launch codename "decaffeination", project codename "news search"] We’re deprecating a signal that’s no longer improving relevance in Google News. The signal was originally developed to help people find higher quality articles on Google News. (Note: Despite the launch codename, this project has nothing to do with Caffeine, our update to indexing in 2010.)
Fewer “sibling” synonyms.
[launch codename "Gemini", project codename "Synonyms"] One of the main signals we look at to identify synonyms is context. For example, if the word “cat” often appears next to the term “pet” and “furry,” and so does the word “kitten”, our algorithms may guess that “cat” and “kitten” have similar meanings. The problem is that sometimes this method will introduce “synonyms” that actually are different entities in the same category. To continue the example, dogs are also “furry pets” -- so sometimes “dog” may be incorrectly introduced as a synonym for “cat”. We’ve been working for some time to appropriately ferret out these “sibling” synonyms, and our latest system is more maintainable, updatable, debuggable, and extensible to other systems.
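The “sibling” problem is easy to reproduce with a toy model. The Python sketch below is only an illustration, not our synonyms system: it assumes a tiny made-up corpus, treats every other word in a sentence as context, and compares words by the Jaccard overlap of their context sets. The names `contexts`, `jaccard`, and `similarity` are hypothetical.

```python
def contexts(word, sentences):
    """Collect every word that shares a sentence with `word`."""
    ctx = set()
    for sentence in sentences:
        words = sentence.split()
        if word in words:
            ctx.update(w for w in words if w != word)
    return ctx

def jaccard(a, b):
    """Overlap of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Made-up corpus for illustration.
sentences = [
    "cat furry pet", "kitten furry pet", "dog furry pet",
    "cat purrs meows", "kitten purrs meows", "dog barks",
    "car fast engine",
]

def similarity(w1, w2):
    return jaccard(contexts(w1, sentences), contexts(w2, sentences))

print(similarity("cat", "kitten"))  # 1.0
print(similarity("cat", "dog"))     # 0.4
print(similarity("cat", "car"))     # 0.0
```

“Kitten” shares all of its contexts with “cat”, which is the signal we want. But “dog” still scores well above an unrelated word like “car” because it is also a “furry pet”, so a naive similarity threshold could admit it as a synonym. Telling siblings apart takes evidence beyond shared context, which this sketch does not attempt.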
Better synonym accuracy and performance.
[project codename "Synonyms"] We’ve made further improvements to our synonyms system by eliminating duplicate logic. We’ve also found ways to more accurately identify appropriate synonyms in cases where there are multiple synonym candidates with different contexts.
Retrieval system tuning.
[launch codename "emonga", project codename "Optionalization"] We’ve improved systems that identify terms in a query which are not necessarily required to retrieve relevant documents. This will make results more faithful to the original query.
Less aggressive synonyms.
[launch codename "zilong", project codename "Synonyms"] We’ve heard feedback from users that sometimes our algorithms are too aggressive at incorporating search results for other terms. The underlying cause is often our synonym system, which will include results for other terms in many cases. This change makes our synonym system less aggressive in the way it incorporates results for other query terms, putting greater weight on the original user query.
Update to systems relying on geographic data.
[launch codename "Maestro, Maitre"] We have a number of signals that rely on geographic data (similar to the data we surface in Google Earth and Maps). This change updates some of the geographic data we’re using.
Improvements to name detection.
[launch codename "edge", project codename "NameDetector"] We’ve improved a system for detecting names, particularly for celebrity names.
Updates to personalization signals.
[project codename "PSearch"] This change updates signals used to personalize search results.
Improvements to Image Search relevance.
[launch codename "sib"] We’ve updated signals to better promote reasonably sized images on high-quality landing pages.
Remove deprecated signal from site relevance signals.
[launch codename "Freedom"] We’ve removed a deprecated product-focused signal from a site-understanding algorithm.
More precise detection of old pages.
[launch codename "oldn23", project codename "Freshness"] This change improves detection of stale pages in our index by relying on more relevant signals. As a result, fewer stale pages are shown to users.
Tweaks to language detection in autocomplete.
[launch codename “Dejavu”, project codename "Suggest"] In general, autocomplete relies on the display language to determine what language predictions to show. For most languages, we also try to detect the user query language by analyzing the script, and this change extends that behavior to Chinese (Simplified and Traditional), Japanese and Korean. The net effect is that when users forget to turn off their IMEs, they’ll still get English predictions if they start typing English terms.
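Here is a minimal Python sketch of script-based language detection, under loud assumptions: it uses the first word of each character's Unicode name as a stand-in for its script, and the `prediction_language` rule (fall back to English when a CJK-language user types Latin letters) is a simplification invented for this example, not the actual autocomplete logic.

```python
import unicodedata
from collections import Counter

# Display languages covered by this change, as hypothetical locale codes.
CJK_DISPLAY_LANGUAGES = {"zh-CN", "zh-TW", "ja", "ko"}

def dominant_script(text):
    """Return the most common script among the letters in `text`,
    approximated by the first word of each character's Unicode name
    (e.g. LATIN, CJK, HIRAGANA, HANGUL). Returns None if no letters."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        counts[name.split(" ")[0] if name else "UNKNOWN"] += 1
    return counts.most_common(1)[0][0] if counts else None

def prediction_language(query, display_language):
    """Pick the language for predictions: English when a CJK-language
    user types Latin script, otherwise the display language."""
    if display_language in CJK_DISPLAY_LANGUAGES and dominant_script(query) == "LATIN":
        return "en"
    return display_language

print(dominant_script("weather"))            # LATIN
print(dominant_script("天気"))                # CJK
print(prediction_language("weather", "ja"))  # en
print(prediction_language("天気", "ja"))      # ja
```

Note that script alone cannot separate languages that share a script (Chinese and Japanese kanji both report as CJK here), which is why the display language remains the default.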
Improvements in date detection for blog/forum pages.
[launch codename "fibyen", project codename "Dates"] This change improves the algorithm that determines dates for blog and forum pages.
More predictions in autocomplete by live rewriting of query prefixes.
[launch codename "Lombart", project codename "Suggest"] In this change we’re rewriting partial queries on the fly to retrieve more potential matching predictions for the user query. We use synonyms and other features to get the best overall match. Rewritten prefixes can include term re-orderings, term additions, term removals and more.
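To make the idea of prefix rewriting concrete, here is a small Python sketch. It is an illustration only: `rewrite_prefix`, `predict`, the candidate list, and the synonym table are all hypothetical, and it covers just re-orderings, removals, and synonym swaps with simple prefix matching.

```python
from itertools import permutations

def rewrite_prefix(prefix, synonyms):
    """Generate variants of a partial query: term re-orderings,
    single-term removals, and single-term synonym substitutions."""
    terms = prefix.split()
    variants = {" ".join(terms)}
    # Re-orderings (guarded, since permutations grow factorially).
    if len(terms) <= 4:
        variants.update(" ".join(p) for p in permutations(terms))
    # Removals: drop one term at a time, never the only term.
    if len(terms) > 1:
        for i in range(len(terms)):
            variants.add(" ".join(terms[:i] + terms[i + 1:]))
    # Synonym substitutions.
    for i, term in enumerate(terms):
        for alt in synonyms.get(term, ()):
            variants.add(" ".join(terms[:i] + [alt] + terms[i + 1:]))
    return variants

def predict(prefix, candidates, synonyms):
    """Return candidates that start with the prefix or any rewrite of it."""
    variants = rewrite_prefix(prefix, synonyms)
    return sorted(c for c in candidates if any(c.startswith(v) for v in variants))

# Hypothetical prediction database and synonym table.
candidates = ["paris hotel deals", "cheap flights to paris", "inexpensive hotel rome"]
synonyms = {"cheap": ["inexpensive"]}

print(predict("hotel paris", candidates, {}))
# ['paris hotel deals']
print(predict("cheap hotel", candidates, synonyms))
# ['cheap flights to paris', 'inexpensive hotel rome']
```

A plain prefix match on "hotel paris" finds nothing in this list, while the re-ordered variant "paris hotel" does. The second query also shows the cost: dropping a term widens the match to "cheap flights to paris", so a real system has to rank the rewritten matches rather than return them all.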
Expanded sitelinks on mobile.
We’ve launched our expanded sitelinks feature for mobile browsers, providing better organization and presentation of sitelinks in search results.
More accurate short answers.
[project codename "Porky Pig"] We’ve updated the sources behind our short answers feature to rely on data from Freebase. This improves accuracy and makes it easier to fix bugs.
Migration of video advanced search backends.
We’ve migrated some backends used in video advanced search to our main search infrastructure.
+1 button in search for more countries and domains.
This month we’ve internationalized the +1 button on the search results page to additional languages and domains. The +1 button in search makes it easy to share recommendations with the world right from your search results. As we said in our initial blog post, the beauty of +1’s is their relevance: you get the right recommendations (because they come from people who matter to you), at the right time (when you are actually looking for information about that topic) and in the right format (your search results).
Local result UI refresh on tablet.
We’ve updated the user interface of local results on tablets to make them more compact and easier to scan.
And here are a few other changes we’ve blogged about since last time:
Flights to worldwide destinations
Redesigned Search App for Windows 7.5 phones
SSL search around the globe
“Recent” feature on mobile
Full-page themes in iGoogle
March Madness NCAA search feature
3D graphing calculator
Posted by Johanna Wright, Director of Product Management