I’ve been doing a bit of digging into how Google News ranks its results and figured that the World Intellectual Property Organization (WIPO) website was as good a place to start as any. According to the patent description, “In accordance with one implementation consistent with the principles of the invention, a method for ranking results is provided. The method includes receiving a list of links, identifying, for each of the links, a source with which the link is associated, and ranking the list of links based at least in part on a quality of the identified sources.”
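To make the three steps in that quote concrete, here’s a rough sketch in Python. It is not Google’s code. It assumes a link’s “source” is simply its hostname, and it uses a hypothetical `SOURCE_QUALITY` lookup table with made-up scores. The patent says ranking is based only “at least in part” on source quality, so a real system would mix in other signals. This sketch sorts on quality alone.

```python
from urllib.parse import urlparse

# Hypothetical per-source quality scores; the patent does not publish real values.
SOURCE_QUALITY = {
    "example-news.com": 0.9,
    "example-blog.net": 0.4,
}


def source_of(link: str) -> str:
    """Identify the source a link is associated with (assumed here: its hostname)."""
    host = urlparse(link).hostname or ""
    return host.removeprefix("www.")


def rank_links(links: list[str], quality: dict[str, float], default: float = 0.0) -> list[str]:
    """Rank links by the quality of their source, highest first.

    Unknown sources get `default`. Python's sort is stable, so links from
    equally rated sources keep their input order.
    """
    return sorted(links, key=lambda link: quality.get(source_of(link), default), reverse=True)


links = [
    "https://www.example-blog.net/a",
    "https://example-news.com/b",
    "https://unknown.org/c",
]
print(rank_links(links, SOURCE_QUALITY))
```

With the made-up scores above, the `example-news.com` link comes first, then the blog, then the unrated source.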
Quality is correlated with the following variables:
- Number of articles produced by the news source during a given time period;
- Average length of an article from the news source;
- Importance of coverage from the news source;
- Breaking news score;
- Usage patterns;
- Human opinion;
- Circulation statistics;
- Size of the staff associated with the news source;
- Number of news bureaus associated with the news source;
- Number of original named entities the news source produces within a cluster of articles;
- Breadth of coverage;
- International diversity;
- Writing style.
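The patent lists the variables but, at least in the part I read, doesn’t say how they’re combined. So here’s one hypothetical way to fold a handful of them into a single score. Each metric is min-max normalized across sources, and then the normalized values are added up using made-up weights. The field names, the choice of metrics, and the `WEIGHTS` values are all my assumptions.

```python
from dataclasses import dataclass


@dataclass
class SourceMetrics:
    # A subset of the variables listed above; names and units are illustrative.
    articles_per_day: float
    avg_article_length: float
    breaking_news_score: float
    staff_size: float
    num_bureaus: float
    countries_with_traffic: float


# Hypothetical weights (they sum to 1.0); not taken from the patent.
WEIGHTS = {
    "articles_per_day": 0.2,
    "avg_article_length": 0.1,
    "breaking_news_score": 0.3,
    "staff_size": 0.15,
    "num_bureaus": 0.1,
    "countries_with_traffic": 0.15,
}


def quality_scores(sources: dict[str, SourceMetrics]) -> dict[str, float]:
    """Min-max normalize each metric across sources, then take a weighted sum."""
    names = list(sources)
    scores = {name: 0.0 for name in names}
    for field, weight in WEIGHTS.items():
        values = [getattr(sources[n], field) for n in names]
        lo, hi = min(values), max(values)
        span = hi - lo
        for name, value in zip(names, values):
            # If every source has the same value, the metric can't discriminate: use 0.
            normalized = (value - lo) / span if span else 0.0
            scores[name] += weight * normalized
    return scores


scores = quality_scores({
    "big_paper": SourceMetrics(120, 800, 0.9, 500, 20, 150),
    "small_blog": SourceMetrics(3, 400, 0.1, 2, 0, 10),
})
print(scores)
```

A linear weighted sum is just the simplest choice. It also makes it easy to see how much each factor would contribute.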
Some of these were definitely a bit ambiguous, and although it hurt my brain, I’ll attempt to clarify a few of them:
- Importance of coverage – Google measures the size of a story by how many articles were published about it. A source can then be ranked by how many of its stories were about these ‘important’ stories.
- Usage patterns – these can be roughly equated to click-throughs. If a source’s stories are often clicked through, it will be viewed as more relevant.
- Human opinion – Google cites a number of methods for assessing human opinion of a news source. These include primary research through polling or focus groups, secondary research, and the age of the news source as an indicator of public confidence.
- Number of original named entities – Okay, this is a sort of weird one. Google wants to favor sources that do original reporting on a topical story. So within a cluster of articles, it looks for the source that introduces unique named entities, meaning a person, place, or organization.
- International diversity – the number of countries from which the news site receives network traffic.
- Writing style – this can be evaluated with automated tests that measure “correctness, grammar, and reading levels.”
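Two of those explanations, importance of coverage and original named entities, are basically counting exercises over story clusters. Here’s a hypothetical Python sketch of both. It assumes articles have already been grouped into clusters and that each article comes with its set of extracted named entities. The `min_articles` threshold for an “important” story is made up. I also count an entity as “original” only if no other source in the same cluster mentions it, which is my reading of “unique” and not necessarily the patent’s definition.

```python
from collections import Counter, defaultdict

# Hypothetical input: each story cluster is a list of (source, named_entities)
# pairs, one per article. Entity extraction itself is out of scope here.
Article = tuple[str, set[str]]


def important_story_counts(clusters: list[list[Article]], min_articles: int = 3) -> Counter:
    """Per source, count the 'important' stories (clusters with at least
    min_articles articles) it published something about. Threshold is assumed."""
    counts: Counter = Counter()
    for cluster in clusters:
        if len(cluster) >= min_articles:
            for source in {src for src, _ in cluster}:
                counts[source] += 1
    return counts


def original_entity_counts(clusters: list[list[Article]]) -> Counter:
    """Per source, count named entities that no other source in the same cluster mentions."""
    counts: Counter = Counter()
    for cluster in clusters:
        entities_by_source: dict[str, set[str]] = defaultdict(set)
        for src, entities in cluster:
            entities_by_source[src] |= entities
        for src, mine in entities_by_source.items():
            others = set().union(*(e for s, e in entities_by_source.items() if s != src))
            counts[src] += len(mine - others)
    return counts


clusters = [
    [("a", {"Alice", "Paris"}), ("b", {"Paris"}), ("c", {"Paris", "ACME"})],
    [("a", {"Bob"})],
]
print(important_story_counts(clusters))
print(original_entity_counts(clusters))
```

In this toy example, source `b` only repeats “Paris”, which everyone mentions, so it gets no credit for original entities. Source `a` gets credit for “Alice” and “Bob”.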
To me, all this confirms that Google is simply far and away the smartest search engine out there.