Originally published on Search Engine Watch (May 2012)

In Google’s recent earnings call, the question was posed, “If you think of the future of Internet search 3 or 4 years out, how important will the social signal be and how important (will) personalization be?”

CEO Larry Page responded by explaining how he might search for one of his friends who had a common name.

“For the first time, the search box isn’t really searching a string…it’s actually searching for that person that I know,” Page said. “Having real feedback from users…is very useful for search…we have a lot of those signals already, but we can always use more…we can always use better relevance and we can always use more data to generate that.”

Page’s response reveals two insights into how Google is thinking about big data:

  • Page’s anecdote is a direct reference to Google’s increasing focus on enhancing the search experience by leveraging semantic content.
  • It provides insight into how Google values the social web: as data – a means to an end; not as an end in and of itself.

What Does Big Data Mean Anyway?

Generally speaking, big data refers to the processing and analysis of large data sets to support better real-world decision-making. Here is what makes big data both timely and unique:

  • Increased data generated by individuals via social media platforms like Facebook and Twitter.
  • Migration of data from local storage to the cloud, where it can be more easily shared and analyzed in aggregate.
  • A growing acceptance of openness and transparency, resulting in increased access to institutional and organizational data.
  • Increased potential for analyzing distinct data, resulting from the growing adoption of data standards.
  • Increased access to infrastructure needed to process large datasets.

Why Does Big Data Matter for SEO?

Big data has always been relevant to search marketers because Google is the original big data company. They have become the institution they are today by analyzing enormous sets of data, making automated inferences, and providing intelligence back to consumers. By studying Google’s methodology and applying their findings, search professionals have been intimately involved with big data for quite some time.

Because Google is a big data company, the future of search will be shaped by the field of big data. Search professionals need to understand this.

Framed in this context, what can search professionals expect?

The Continued Evolution of Content Into Data

Content is published information. For a search engine to make sense of content, a lot of work has to be done. Data, on the other hand, makes analysis easier for search engines and brings them one step closer to an eventual answer.

Already, Google has begun using semantic information to analyze content and structure search results through rich snippets.


Search marketers who haven’t already done so should familiarize themselves with the various schemas that search engines have begun adopting, and start thinking about content in semantic terms.
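To make "thinking about content in semantic terms" concrete, here is a minimal sketch of how a business listing could be described as data using the schema.org vocabulary. The article does not name a specific schema or syntax; this example assumes the schema.org LocalBusiness type expressed as JSON-LD (microdata and RDFa are alternative syntaxes), and the function name and sample values are hypothetical.

```python
import json


def local_business_jsonld(name, street, city, region, rating_value, review_count):
    """Describe a business with schema.org terms and return it as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressRegion": region,
        },
        # Aggregate review data is the kind of field a rich snippet can surface.
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating_value,
            "reviewCount": review_count,
        },
    }
    return json.dumps(data, indent=2)


# Placeholder values, for illustration only.
markup = local_business_jsonld(
    "Example Bakery", "123 Main St", "Springfield", "IL", 4.5, 27
)
print(markup)
```

The resulting string would typically be embedded in the page inside a script element of type application/ld+json, so that the same facts a reader sees as prose are also available to a search engine as structured fields.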

Think Like An Editor – Not a Machine

There is debate over how Panda actually works but little debate over its purpose: to promote quality. As Google continues to tap into big data, they will make increasing use of the vast repository of user-driven data amassed by their portfolio. Between Chrome, Android, DoubleClick and more, Google will be able to make far more accurate inferences to guide the answers they serve to search users. Against that backdrop, basing your strategy around adding a Google+ widget to your site will be the same as doing nothing.

As primary user data becomes more impactful, the onsite optimization playbooks that we’ve carried for years will become dead weight. The journalists and editors who once made a living by engaging and keeping readers will once again become relevant.

In an ironic twist, big data will make search less about machines and more about people.

Big Will Be the New Small

As the pendulum swings from content to data, the logic that once guided our thinking about search will change, and our instinct to limit what we show Google will fade.

To help illustrate this point, let’s say a website contains information on more than 1.8 million businesses. A large percentage of the profile pages that have been set up for these businesses would qualify as original content; however, the total volume of information available on these pages is limited.

By leveraging a local information API (e.g. Yelp, Google Places, CityGrid), this site could supplement a large percentage of those profile pages with review content. Doing so would significantly enhance the user experience, but it would also increase the percentage of duplicate content across the site.
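As a rough sketch of that enrichment step, the code below attaches third-party reviews to existing profile records while keeping track of where each review came from. It makes no real API calls: the Review and BusinessProfile classes, the enrich_profiles function, and the sample records are all hypothetical, and the reviews are assumed to have already been fetched from a provider and grouped by business ID.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Review:
    source: str   # which third-party provider supplied the review
    author: str
    rating: float
    text: str


@dataclass
class BusinessProfile:
    business_id: str
    name: str
    description: str                                     # original on-site content
    reviews: List[Review] = field(default_factory=list)  # appended third-party content


def enrich_profiles(profiles: List[BusinessProfile],
                    reviews_by_id: Dict[str, List[Review]]) -> int:
    """Attach available reviews to each profile; return how many profiles were enriched."""
    enriched = 0
    for profile in profiles:
        reviews = reviews_by_id.get(profile.business_id, [])
        if reviews:
            profile.reviews.extend(reviews)
            enriched += 1
    return enriched


# Placeholder records, for illustration only.
profiles = [
    BusinessProfile("b1", "Example Bakery", "Family-run bakery."),
    BusinessProfile("b2", "Example Garage", "Auto repair shop."),
]
reviews_by_id = {
    "b1": [Review("hypothetical-provider", "A. Reader", 5.0, "Great bread.")],
}

count = enrich_profiles(profiles, reviews_by_id)
print(f"{count} of {len(profiles)} profiles enriched")
```

Keeping the source on every review makes it easy to attribute the content on the page and to tell original material apart from syndicated material later on.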


Based on user-focused metrics like time-on-page and bounce rate, initial testing seems to indicate that users far prefer the version of the page with the appended review content. Yet the decision to layer in duplicate content seems to contradict what we have come to believe about how to structure content.
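For readers who want to run a similar comparison, here is a minimal sketch of how the two metrics could be computed per page variant. It assumes the common definitions (bounce rate as the share of sessions that viewed only one page, time-on-page as a simple average in seconds). The session records, field names, and variant labels are made up for illustration and are not the data from the test described above.

```python
from statistics import mean


def bounce_rate(sessions):
    """Share of sessions in which the visitor viewed exactly one page."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s["pages_viewed"] == 1)
    return bounces / len(sessions)


def avg_time_on_page(sessions):
    """Mean number of seconds spent on the profile page."""
    if not sessions:
        return 0.0
    return mean(s["seconds_on_page"] for s in sessions)


def compare_variants(sessions_by_variant):
    """Compute both metrics for every page variant."""
    return {
        variant: {
            "bounce_rate": bounce_rate(sessions),
            "avg_time_on_page": avg_time_on_page(sessions),
        }
        for variant, sessions in sessions_by_variant.items()
    }


# Invented sample sessions; replace with real analytics exports.
sessions_by_variant = {
    "original": [
        {"pages_viewed": 1, "seconds_on_page": 10},
        {"pages_viewed": 3, "seconds_on_page": 30},
    ],
    "with_reviews": [
        {"pages_viewed": 2, "seconds_on_page": 40},
        {"pages_viewed": 4, "seconds_on_page": 60},
    ],
}

results = compare_variants(sessions_by_variant)
for variant, metrics in results.items():
    print(variant, metrics)
```

A real test would need far more sessions per variant, and ideally a significance check, before drawing any conclusion about which version users prefer.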

This doesn’t mean that content spamming or link farming will somehow become acceptable (it won’t). It means we should spend less time worrying about how to sculpt PageRank and hide duplicate content and more time thinking about how to maximize the user experience and structure the data held within the site properly.

Big data has the potential to change everything in our industry (again). If we wanted something safe and boring, we probably wouldn’t be in this field in the first place.