Thursday, November 03, 2005

Google's application 20050240580, Personalization of placed content ordering in search results

The first claim states: A method of personalizing placed content, comprising: determining an interest of a user; accessing a user profile associated with the user; identifying a set of placed content that matches the interest of the user; and ordering the set of placed content in accordance with the user profile.

Note that this application is a continuation-in-part: This application is a continuation-in-part of U.S. patent application Ser. No. 10/676,711, filed Sep. 30, 2003, which application is incorporated by reference herein in its entirety.

The '580 application discusses PageRank:

A more detailed description of the PageRank algorithm can be found in the article "The Anatomy of a Large-Scale Hypertextual Search Engine" by S. Brin and L. Page, International World Wide Web Conference, Brisbane, Australia and U.S. Pat. No. 6,285,999, both of which are hereby incorporated by reference as background information.

[0005] An important assumption in the PageRank algorithm is that there is a "random surfer" who starts his web surfing journey at a randomly picked web page and keeps clicking on the links embedded in the web pages, never hitting the "back" button. Eventually, when this random surfer gets bored of the journey, he may re-start a new journey by randomly picking another web page. The probability that the random surfer visits (i.e., views or downloads) a web page depends on the web page's page rank.

[0006] From an end user's perspective, a search engine using the PageRank algorithm treats a search query the same way no matter who submits the query, because the search engine does not ask the user to provide any information that can uniquely identify the user. The only factor that affects the search results is the search query itself, e.g., how many terms are in the query and in what order. The search results are a best fit for the interest of an abstract user, the "random surfer", and they are not adjusted to fit a specific user's preferences or interests.

[0007] In reality, a user like the random surfer never exists. Every user has his own preferences when he submits a query to a search engine. The quality of the search results returned by the engine has to be evaluated by its users' satisfaction. When a user's preferences can be well defined by the query itself, or when the user's preference is similar to the random surfer's preference with respect to a specific query, the user is more likely to be satisfied with the search results. However, if the user's preference is significantly biased by some personal factors that are not clearly reflected in a search query itself, or if the user's preference is quite different from the random user's preference, the search results from the same search engine may be less useful to the user, if not useless.

This particular text does not address the concerns I had in my search

+"patent reform" +2795

wherein the ranking of search results changed in the course of 12 hours (as well as over longer periods) even though this change reflected merely a re-ordering of old web pages (not the introduction of new ones).

Google is right when it says: "The probability that the random surfer visits (i.e., views or downloads) a web page depends on the web page's page rank." If Google is changing the page rank (arbitrarily), then Google is changing the probability of visitation by the random surfer.
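To make the random-surfer statement concrete, here is a minimal sketch of PageRank computed by power iteration on a toy link graph. The function name, the four-page graph, and the iteration count are illustrative assumptions, and the damping factor of 0.85 is the value commonly cited from the Brin and Page paper, not something stated in the '580 application. The resulting numbers approximate the probability that the random surfer is on each page.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate random-surfer visit probabilities by power iteration.

    `links` maps each page to the list of pages it links to.
    Every link target must also appear as a key (hypothetical toy format).
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # surfer starts at a random page

    for _ in range(iterations):
        # With probability (1 - damping) the bored surfer restarts at a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Otherwise the surfer follows one of the page's links at random.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # A page with no links: treat it as a jump to any page.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}

ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sketch, changing the link structure or the damping factor changes every page's visit probability, which is the dependence the quoted sentence describes.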

From SearchEngineJournal:

Google Patent : Organic Results Ranked by User Profiling

Google has filed for an organic search patent which is a bit different than what we’re used to seeing in Google search, and perhaps something we may expect from their AdWords or AdSense divisions. Google has applied for a patent, termed Personalization of placed content ordering in search results, to serve organic search results based on user profiles. Google has also applied for a similar behavioral targeting patent for its advertising network, but this seems to be a first from Google with plans to integrate user profiling into natural search results.

Such profiles are created by Google and gathered from previous queries, web navigation behavior via tracked links and possibly sites visited which serve Google ads, computers with Google Applications installed such as Desktop Search, Google Wi-fi Connection or Sidebar, and personal information which Google identifies which may be “implicitly or explicitly provided by the user.”

This new ranking system, which is a spin-off of PageRank and the current Google ranking algorithm, could be referred to as Profile Rank. What is the difference between this new ranking system and Google Personalized Search? Personalized Search was beta tested by Google users who had opted in to Google profile building, while the new Profile Rank is based upon user profiles built by tracking a user's web habits in and outside of Google Search, even if the user has not opted in to be served personalized results or is not a registered Google Account member.

In the patent application, Google explains that when a search engine generates search results in response to a search query, a listed site which satisfies the query is assigned a query score, QueryScore, in accordance with the search query. This query score is then modulated by the site’s PageRank to generate a generic score, GenericScore, that is expressed as: GenericScore=QueryScore*PageRank.

However, Google states that the GenericScore system may not be relevant enough and proposes a more in-depth Profile Rank (PersonalizedScore): This GenericScore may not appropriately reflect the site’s importance to a particular user if the user’s interests or preferences are dramatically different from that of the random surfer. The relevance of a site to a user can be accurately characterized by a set of profile ranks, based on the correlation between a site’s content and the user’s term-based profile, herein called the TermScore, the correlation between one or more categories associated with a site and the user’s category-based profile, herein called the CategoryScore, and the correlation between the URL and/or host of the site and the user’s link-based profile, herein called the LinkScore. Therefore, the site may be assigned a personalized rank that is a function of both the document’s generic score and the user profile scores. This personalized score can be expressed as: PersonalizedScore=GenericScore*(TermScore+CategoryScore+LinkScore).
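The two formulas can be illustrated with a small sketch. Only the formulas themselves come from the application; the `Site` class, the example URLs, and every score value below are made-up assumptions chosen to show how a personalized ordering can differ from the generic one.

```python
from dataclasses import dataclass


@dataclass
class Site:
    # All fields are hypothetical inputs; the application does not give values.
    url: str
    query_score: float
    page_rank: float
    term_score: float
    category_score: float
    link_score: float


def generic_score(site: Site) -> float:
    # GenericScore = QueryScore * PageRank
    return site.query_score * site.page_rank


def personalized_score(site: Site) -> float:
    # PersonalizedScore = GenericScore * (TermScore + CategoryScore + LinkScore)
    profile_sum = site.term_score + site.category_score + site.link_score
    return generic_score(site) * profile_sum


sites = [
    # Strong generic result that barely matches this user's profile.
    Site("https://example.com/sushi-guide", 0.9, 0.8, 0.1, 0.1, 0.0),
    # Weaker generic result that matches the user's profile well.
    Site("https://example.com/local-sushi", 0.8, 0.5, 0.6, 0.5, 0.4),
]

generic_order = [s.url for s in sorted(sites, key=generic_score, reverse=True)]
personal_order = [s.url for s in sorted(sites, key=personalized_score, reverse=True)]

print("generic:     ", generic_order)
print("personalized:", personal_order)
```

With these invented numbers the two orderings are reversed: the profile scores act as a multiplier on the generic score, so a site with a modest PageRank can outrank a stronger one for a user whose profile matches it.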

Google gives an example of a listing based upon user profiling mixed with information given by the user: a user may choose to offer personal information, including demographic and geographic information associated with the user, such as the user’s age or age range, educational level or range, income level or range, language preferences, marital status, geographic location (e.g., the city, state and country in which the user resides, and possibly also including additional information such as street address, zip code, and telephone area code), cultural background or preferences, or any subset of these.

Compared with other types of personal information such as a user’s favorite sports or movies that are often time varying, this personal information is more static and more difficult to infer from the user’s search queries and search results, but may be crucial in correctly interpreting certain queries submitted by the user.

For example, if a user submits a query containing “Japanese restaurant”, it is very likely that he may be searching for a local Japanese restaurant for dinner. Without knowing the user’s geographical location, it is hard to order the search results so as to bring to the top those items that are most relevant to the user’s true intention. In certain cases, however, it is possible to infer this information. For example, users often select results associated with a specific region corresponding to where they live.

What about shared machines? If one computer is shared by various users with different web behavior, how is Google to define Profile Rank in its organic search results? Google has thought this through:
Sometimes, multiple users may share a machine, e.g., in a public library. These users may have different interests and preferences. In one embodiment, a user may explicitly login to the service so the system knows his identity. Alternatively, different users can be automatically recognized based on the items they access or other characteristics of their access patterns. For example, different users may move the mouse in different ways, type differently, and use different applications and features of those applications. Based on a corpus of events on a client and/or server, it is possible to create a model for identifying users, and for then using that identification to select an appropriate “user” profile. In such circumstances, the “user” may actually be a group of people having somewhat similar computer usage patterns, interests and the like.

Users identified by the way they move a mouse or typing style? Amazing.
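The application does not say what kind of model would do this. As one hedged illustration only, the sketch below picks whichever stored profile is nearest to a session's observed behavior. The two features (typing speed and mouse speed, each assumed to be pre-scaled to the range 0 to 1), the profile names, and the nearest-match rule are all assumptions, not anything the application describes.

```python
import math


def nearest_profile(observation, profiles):
    """Return the name of the stored profile closest to the observed features.

    Uses plain Euclidean distance; features are assumed to share one scale.
    """
    return min(profiles, key=lambda name: math.dist(observation, profiles[name]))


# Hypothetical stored profiles: (typing speed, mouse speed), both scaled to 0..1.
profiles = {
    "fast_typist": (0.9, 0.2),
    "slow_typist": (0.3, 0.8),
}

# Features measured from the current session on the shared machine (made up).
session = (0.8, 0.25)

print(nearest_profile(session, profiles))
```

A match like this would select a "user" profile in the loose sense the application uses, which may in practice be a group of people with similar usage patterns.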

The patent, Personalization of placed content ordering in search results, is pretty detailed and deep. I suggest running over it a couple of times, printing it out and breaking out the highlighter from college, because there is a lot to it and a handful of clues as to the future of Google and its ranking system.

