**Scaling Personalized Web Search**

### Glen Jeh

### [email protected]

### Jennifer Widom

### [email protected]

**Abstract**

Recent web search techniques augment traditional text matching with a global notion of
"importance" based on the linkage structure of the web, such as in Google's PageRank algorithm. For more refined searches, this global notion of importance can be specialized to create
personalized views of importance; for example, importance scores can be biased according
to a user-specified set of initially-interesting pages. Computing and storing all possible personalized views in advance is impractical, as is computing personalized views at query time,
since the computation of each view requires an iterative computation over the web graph. We
present new graph-theoretical results, and a new technique based on these results, that encode
personalized views as *partial vectors*. Partial vectors are shared across multiple personalized
views, and their computation and storage costs scale well with the number of views. Our approach enables incremental computation, so that the construction of personalized views from
partial vectors is practical at query time. We present efficient dynamic programming algorithms for computing partial vectors, an algorithm for constructing personalized views from
partial vectors, and experimental results demonstrating the effectiveness and scalability of our
techniques.

**1** **Introduction and Motivation**

General web search is performed predominantly through text queries to search engines. Because of
the enormous size of the web, text alone is usually not selective enough to limit the number of query
results to a manageable size. The PageRank algorithm [10], among others [8], has been proposed
(and implemented in Google [1]) to exploit the linkage structure of the web to compute global
"importance" scores that can be used to influence the ranking of search results. To encompass different notions of importance for different users and queries, the basic PageRank algorithm can be modified to create "personalized views" of the web, redefining importance according to user preference. For example, a user may wish to specify his bookmarks as a set of preferred pages, so that any query results that are important with respect to his bookmarked pages would be ranked higher. While experimentation with the use of personalized PageRank has shown its utility and
promise [5, 10], the size of the web makes its practical realization extremely difficult. To see why, let us review the intuition behind the PageRank algorithm and its extension for personalization.

The fundamental motivation underlying PageRank is the recursive notion that important pages
are those linked-to by many important pages. A page with only two in-links, for example, may
seem unlikely to be an important page, but it may be important if the two referencing pages are
Yahoo! and Netscape, which themselves are important pages because they have numerous in-links.
One way to formalize this recursive notion is to use the "random surfer" model introduced in [10].
Imagine that trillions of random surfers are browsing the web: if at a certain time step a surfer is
looking at page p, at the next time step he looks at a random out-neighbor of p. As time goes on,
the expected percentage of surfers at each page p converges (under certain conditions) to a limit
r(p) that is independent of the distribution of starting points. Intuitively, this limit is the PageRank
of p, and is taken to be an importance score for p, since it reflects the number of people expected
to be looking at p at any one time.

The PageRank score r(p) reflects a "democratic" importance that has no preference for any
particular pages. In reality, a user may have a set P of preferred pages (such as his bookmarks)
which he considers more interesting. We can account for preferred pages in the random surfer
model by introducing a "teleportation" probability c: at each step, a surfer jumps back to a random
page in P with probability c, and with probability 1 − c continues forth along a hyperlink. The
limit distribution of surfers in this model would favor pages in P, pages linked-to by P, pages
linked-to in turn, etc. We represent this distribution as a *personalized PageRank vector* (PPV)
personalized on the set P. Informally, a PPV is a personalized view of the importance of pages on
the web. Rankings of a user's text-based query results can be biased according to a PPV instead of
the global importance distribution.

Each PPV is of length n, where n is the number of pages on the web. Computing a PPV
naively using a fixed-point iteration requires multiple scans of the web graph [10], which makes
it impossible to carry out online in response to a user query. On the other hand, the set of PPV's for all
preference sets, of which there are 2^{n}, is far too large to compute and store offline. We present
a method for encoding PPV's as partially-computed, shared vectors that are practical to compute
and store offline, and from which PPV's can be computed quickly at query time.

In our approach we restrict preference sets P to subsets of a set of *hub pages* H, selected as
those of greater interest for personalization. In practice, we expect H to be a set of pages with
high PageRank ("important pages"), pages in a human-constructed directory such as Yahoo! or
Open Directory [2], or pages important to a particular enterprise or application. The size of H
can be thought of as the available degree of personalization. We present algorithms that, unlike
previous work [5, 10], scale well with the size of H. Moreover, the same techniques we introduce
can yield approximations on the much broader set of all PPV's, allowing at least some level of
personalization on arbitrary preference sets.

The main contributions of this paper are as follows.

• A method, based on new graph-theoretical results (listed next), of encoding PPV's as partial
quantities, enabling an efficient, scalable computation that can be divided between precomputation time and query time, in a customized fashion according to available resources and
application requirements.

• Three main theorems: The *Linearity Theorem* allows every PPV to be represented as a linear
combination of *basis vectors*, yielding a natural way to construct PPV's from shared components. The *Hubs Theorem* allows basis vectors to be encoded as *partial vectors* and a *hubs
skeleton*, enabling basis vectors themselves to be constructed from common components.
The *Decomposition Theorem* establishes a linear relationship among basis vectors, which is
exploited to minimize redundant computation.

• Several algorithms for computing basis vectors, specializations of these algorithms for computing partial vectors and the hubs skeleton, and an algorithm for constructing PPV's from partial vectors using the hubs skeleton.

• Experimental results on real web data demonstrating the effectiveness and scalability of our techniques.

In Section 2 we introduce the notation used in this paper and formalize personalized PageRank mathematically. Section 3 presents basis vectors, the first step towards encoding PPV’s as shared components. The full encoding is presented in Section 4. Section 5 discusses the computation of partial quantities. Experimental results are presented in Section 6. Related work is discussed in Section 7. Section 8 summarizes the contributions of this paper. Additional material, primarily proofs of theorems, appears in a set of appendices.

**2** **Preliminaries**

Let G = (V, E) denote the web graph, where V is the set of all web pages and E contains a directed
edge ⟨p, q⟩ iff page p links to page q. For a page p, we denote by I(p) and O(p) the set of in-neighbors and out-neighbors of p, respectively. Individual in-neighbors are denoted as I_i(p) (1 ≤
i ≤ |I(p)|), and individual out-neighbors are denoted analogously. For convenience, pages are
numbered from 1 to n, and we refer to a page p and its associated number i interchangeably. For a
vector v, v(p) denotes entry p, the p-th component of v. We always typeset vectors in boldface and
scalars (e.g., v(p)) in normal font. All vectors in this paper are n-dimensional and have nonnegative
entries. They should be thought of as distributions rather than arrows. The *magnitude* of a vector v
is defined to be $\sum_{i=1}^{n} v(i)$ and is written |v|. In this paper, vector magnitudes are always in [0, 1].
In an implementation, a vector may be represented as a list of its nonzero entries, so another useful
measure is the *size* of v, the number of nonzero entries in v.

We generalize the preference set P discussed in Section 1 to a *preference vector* u, where
|u| = 1 and u(p) denotes the amount of preference for page p. For example, a user who wants to
personalize on his bookmarked pages P uniformly would have a u where u(p) = 1/|P| if p ∈ P, and
u(p) = 0 if p ∉ P. We formalize personalized PageRank scoring using matrix-vector equations.
Let A be the matrix corresponding to the web graph G, where A_{ij} = 1/|O(j)| if page j links to
page i, and A_{ij} = 0 otherwise. For simplicity of presentation, we assume that every page has at
least one out-neighbor, as can be enforced by adding self-links to pages without out-links. The
resulting scores can be adjusted to account for the (minor) effects of this modification, as specified
in Appendix C.2.

For a given u, the personalized PageRank equation can be written as

$$v = (1 - c) A v + c u \qquad (1)$$

where c ∈ (0, 1) is the "teleportation" constant discussed in Section 1. Typically c ≈ 0.15, and
experiments have shown that small changes in c have little effect in practice [10]. A solution v to
equation (1) is a steady-state distribution of random surfers under the model discussed in Section
1, where at each step a surfer teleports to page p with probability c · u(p), or moves to a random out-neighbor otherwise [10]. By a theorem of Markov theory, a solution v with |v| = 1 always exists
and is unique [9].^1 The solution v is the *personalized PageRank vector* (PPV) for preference vector
u. If u is the uniform distribution vector u = [1/n, . . . , 1/n], then the corresponding solution v is
the *global PageRank vector* [10], which gives no preference to any pages.
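As a concrete illustration, equation (1) can be solved by straightforward fixed-point iteration. The sketch below is a minimal illustration, not the paper's implementation: the 4-page graph, the iteration count, and the `ppv` helper are assumptions made for the example.

```python
c = 0.15  # the "teleportation" constant, typically around 0.15

# Hypothetical tiny web graph: out_links[j] lists the pages j links to.
# Every page has at least one out-neighbor, as assumed in the text.
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(out_links)

def ppv(u, iters=500):
    """Fixed-point iteration for v = (1 - c) A v + c u (equation (1))."""
    v = list(u)
    for _ in range(iters):
        nxt = [c * u[i] for i in range(n)]
        for j, outs in out_links.items():
            share = (1 - c) * v[j] / len(outs)  # A_ij = 1/|O(j)| for j -> i
            for i in outs:
                nxt[i] += share
        v = nxt
    return v

# Personalize on page 3, i.e., preference vector u = x_3.
v = ppv([0.0, 0.0, 0.0, 1.0])
print(v)  # a distribution: entries are nonnegative and sum to 1
```

Since the iteration contracts by the factor 1 − c per step, a few hundred iterations are far more than needed for double precision on a graph this small.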

For the reader's convenience, Table 1 lists terminology that will be used extensively in the coming sections.

**3** **Basis Vectors**

We present the first step towards encoding PPV’s as shared components. The motivation behind
the encoding is a simple observation about the linearity^{2} of PPV’s, formalized by the following
theorem.

**Theorem (Linearity).** *For any preference vectors u_1 and u_2, if v_1 and v_2 are the two corresponding PPV's, then for any constants α_1, α_2 ≥ 0 such that α_1 + α_2 = 1,*

$$\alpha_1 v_1 + \alpha_2 v_2 = (1 - c) A (\alpha_1 v_1 + \alpha_2 v_2) + c (\alpha_1 u_1 + \alpha_2 u_2) \qquad (2)$$

^1 Specifically, v corresponds to the steady-state distribution of an ergodic, aperiodic Markov chain.

^2 More precisely, the transformation from personalization vectors u to their corresponding solution vectors v is linear.

| Term | Description | Section |
|---|---|---|
| Hub Set H | A subset of web pages. | 1 |
| Preference Set P | Set of pages on which to personalize (restricted in this paper to subsets of H). | 1 |
| Preference Vector u | Preference set with weights. | 2 |
| Personalized PageRank Vector (PPV) | Importance distribution induced by a preference vector. | 2 |
| Basis Vector r_p (or r_i) | PPV for a preference vector with a single nonzero entry at p (or i). | 3 |
| Hub Vector r_p | Basis vector for a hub page p ∈ H. | 3 |
| Partial Vector (r_p − r_p^H) | Used with the hubs skeleton to construct a hub vector. | 4.2 |
| Hubs Skeleton S | Used with partial vectors to construct a hub vector. | 4.3 |
| Web Skeleton | Extension of the hubs skeleton to include pages not in H. | 4.4.3 |
| Partial Quantities | Partial vectors and the hubs and web skeletons. | |
| Intermediate Results | Maintained during iterative computations. | 5.2 |

Table 1: Summary of terms.

Informally, the Linearity Theorem says that the solution to a linear combination of preference
vectors u_1 and u_2 is the same linear combination of the corresponding PPV's v_1 and v_2. The
proof is in Appendix A.

Let x_1, . . . , x_n be the unit vectors in each dimension, so that for each i, x_i has value 1 at entry
i and 0 everywhere else. Let r_i be the PPV corresponding to x_i. Each *basis vector* r_i gives the
distribution of random surfers under the model that at each step, surfers teleport back to page i
with probability c. It can be thought of as representing page i's view of the web, where entry j
of r_i is j's importance in i's view. Note that the global PageRank vector is (1/n)(r_1 + · · · + r_n), the
average of every page's view.

An arbitrary personalization vector u can be written as a weighted sum of the unit vectors x_i:

$$u = \sum_{i=1}^{n} \alpha_i x_i \qquad (3)$$

for some constants α_1, . . . , α_n. By the Linearity Theorem,

$$v = \sum_{i=1}^{n} \alpha_i r_i \qquad (4)$$

is the corresponding PPV, expressed as a linear combination of the basis vectors r_i.

Recall from Section 1 that preference sets (now preference vectors) are restricted to subsets
of a set of hub pages H. If the basis vector r_p (a *hub vector*) for each p ∈ H has been precomputed and stored, then any PPV corresponding to a preference set P of size k (a preference
vector with k nonzero entries) can be computed by adding up the k corresponding hub vectors r_p
with the appropriate weights α_p.

Each hub vector can be computed naively using the fixed-point computation in [10]. However, each fixed-point computation is expensive, requiring multiple scans of the web graph, and the computation time (as well as storage cost) grows linearly with the number of hub vectors |H|. In the next section, we enable a more scalable computation by constructing hub vectors from shared components.
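A small numerical check of this construction confirms that summing weighted basis vectors reproduces the directly computed PPV, per equation (4). This is a sketch under assumed data: the tiny graph, c = 0.15, and the `ppv` power-iteration helper are illustrative, not from the paper.

```python
c = 0.15
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical graph
n = len(out_links)

def ppv(u, iters=500):
    """Fixed-point iteration for v = (1 - c) A v + c u (equation (1))."""
    v = list(u)
    for _ in range(iters):
        nxt = [c * u[i] for i in range(n)]
        for j, outs in out_links.items():
            share = (1 - c) * v[j] / len(outs)
            for i in outs:
                nxt[i] += share
        v = nxt
    return v

# Basis vectors r_i, one per page (for hub pages these are the hub vectors).
r = [ppv([1.0 if j == i else 0.0 for j in range(n)]) for i in range(n)]

# Preference vector u = 0.5*x_0 + 0.5*x_2, i.e., personalize on pages {0, 2}.
alpha = [0.5, 0.0, 0.5, 0.0]
direct = ppv(alpha)                                    # solve equation (1) directly
combined = [sum(alpha[i] * r[i][q] for i in range(n))  # equation (4)
            for q in range(n)]

assert all(abs(a - b) < 1e-9 for a, b in zip(direct, combined))
```

The final assertion is exactly the Linearity Theorem at work: the weighted sum of precomputed basis vectors equals the PPV solved from scratch.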

**4** **Decomposition of Basis Vectors**

In Section 3 we represented PPV's as a linear combination of |H| hub vectors r_p, one for each
p ∈ H. Any PPV based on hub pages can be constructed quickly from the set of precomputed
hub vectors, but computing and storing all hub vectors is impractical. To compute a large number
of hub vectors efficiently, we further decompose them into *partial vectors* and the *hubs skeleton*,
components from which hub vectors can be constructed quickly at query time. The representation
of hub vectors as partial vectors and the hubs skeleton saves both computation time and storage due
to sharing of components among hub vectors. Note, however, that depending on available resources
and application requirements, hub vectors can be constructed offline as well. Thus "query time"
can be thought of more generally as "construction time".

We compute one partial vector for each hub page p, which essentially encodes the part of the
hub vector r_{p} unique to p, so that components shared among hub vectors are not computed and
stored redundantly. The complement to the partial vectors is the hubs skeleton, which succinctly
captures the interrelationships among hub vectors. It is the “blueprint” by which partial vectors are
assembled to form a hub vector, as we will see in Section 4.3.

The mathematical tools used in the formalization of this decomposition are presented next.^{3}

**4.1** **Inverse P-distance**

To formalize the relationship among hub vectors, we relate the personalized PageRank scores
represented by PPV's to *inverse P-distances* in the web graph, a concept based on *expected-f
distances* as introduced in [7].

^3 Note that while the mathematics and computation strategies in this paper are presented in the specific context of the web graph, they are general graph-theoretical results that may be applicable in other scenarios involving stochastic processes, of which PageRank is one example.

Let p, q ∈ V. We define the *inverse P-distance* r′_p(q) from p to q as

$$r'_p(q) = \sum_{t: p \rightsquigarrow q} P[t]\, c (1 - c)^{l(t)} \qquad (5)$$

where the summation is taken over all *tours* t (paths that may contain cycles) starting at p and
ending at q, possibly touching p or q multiple times. For a tour t = ⟨w_1, . . . , w_k⟩, the length l(t) is
k − 1, the number of edges in t. The term P[t], which should be interpreted as "the probability of
traveling t", is defined as $\prod_{i=1}^{k-1} \frac{1}{|O(w_i)|}$, or 1 if l(t) = 0. If there is no tour from p to q, the summation
is taken to be 0.^4 Note that r′_p(q) measures distances inversely: it is higher for nodes q "closer" to
p. As suggested by the notation and proven in Appendix C, r′_p(q) = r_p(q) for all p, q ∈ V, so we
will use r_p(q) to denote both the inverse P-distance and the personalized PageRank score. Thus
PageRank scores can be viewed as an inverse measure of distance.
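The equality r′_p(q) = r_p(q) can be checked numerically. Grouping the tour sum in equation (5) by tour length, the total P[t] mass of length-l tours from p ending at q is the l-step random-walk distribution, so summing c(1 − c)^l times that distribution over l approximates r′_p. The sketch below compares this tour sum against the PPV of equation (1); the tiny graph and the truncation depth L are assumptions for the example.

```python
c = 0.15
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical graph
n = len(out_links)

def inverse_p_distance(p, L=300):
    """Approximate r'_p of equation (5), truncating tours at length L.

    a[q] holds the total P[t] mass of length-l tours from p ending at q;
    each such tour contributes P[t] * c * (1 - c)^l to r'_p(q).
    """
    a = [0.0] * n
    a[p] = 1.0
    r = [0.0] * n
    for l in range(L):
        for q in range(n):
            r[q] += c * (1 - c) ** l * a[q]
        nxt = [0.0] * n
        for j, outs in out_links.items():
            for i in outs:
                nxt[i] += a[j] / len(outs)  # the 1/|O(w_i)| factor of P[t]
        a = nxt
    return r

def ppv(u, iters=500):
    """Fixed-point iteration for v = (1 - c) A v + c u (equation (1))."""
    v = list(u)
    for _ in range(iters):
        nxt = [c * u[i] for i in range(n)]
        for j, outs in out_links.items():
            for i in outs:
                nxt[i] += (1 - c) * v[j] / len(outs)
        v = nxt
    return v

for p in range(n):
    tour_sum = inverse_p_distance(p)
    score = ppv([1.0 if j == p else 0.0 for j in range(n)])
    assert all(abs(a - b) < 1e-9 for a, b in zip(tour_sum, score))
```

Truncating at L = 300 discards a geometric tail of magnitude (1 − c)^300, far below the comparison tolerance.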

Let H ⊆ V be some nonempty set of pages. For p, q ∈ V, we define r_p^H(q) as a restriction of
r_p(q) that considers only tours which pass through some page h ∈ H in equation (5). That is, a
page h ∈ H must occur on t somewhere other than the endpoints. Precisely, r_p^H(q) is written as

$$r_p^H(q) = \sum_{t: p \xrightarrow{H} q} P[t]\, c (1 - c)^{l(t)} \qquad (6)$$

where the notation $t: p \xrightarrow{H} q$ reminds us that t passes through some page in H. Note that t must be of length at least 2. In this paper, H is always the set of hub pages, and p is usually a hub page (until we discuss the web skeleton in Section 4.4.3).

**4.2** **Partial Vectors**

Intuitively, r_p^H(q), defined in equation (6), is the influence of p on q through H. In particular, if
all paths from p to q pass through a page in H, then H separates p and q, and r_p^H(q) = r_p(q). For
well-chosen sets H (discussed in Section 4.4.2), it will be true that r_p(q) − r_p^H(q) = 0 for many
pages p, q. Our strategy is to take advantage of this property by breaking r_p into two components:
(r_p − r_p^H) and r_p^H, using the equation

$$r_p = (r_p - r_p^H) + r_p^H \qquad (7)$$

We first precompute and store the *partial vector* (r_p − r_p^H) instead of the full hub vector r_p. Partial
vectors are cheaper to compute and store than full hub vectors, assuming they are represented as a
list of their nonzero entries. Moreover, the size of each partial vector decreases as |H| increases,
making this approach particularly scalable. We then add r_p^H back at query time to compute the full
hub vector. However, computing and storing r_p^H explicitly could be as expensive as r_p itself. In
the next section we show how to encode r_p^H so it can be computed and stored efficiently.

^4 The definition here of inverse P-distance differs slightly from the concept of expected-f distance in [7], where tours are not allowed to visit q multiple times. Note that general expected-f distances have the form $\sum_t P[t] f(l(t))$; here f(x) = c(1 − c)^x.

**4.3** **Hubs Skeleton**

Let us briefly review where we are: In Section 3 we represented PPV's as linear combinations
of hub vectors r_p, one for each p ∈ H, so that we can construct PPV's quickly at query time if
we have precomputed the hub vectors, a relatively small subset of PPV's. To encode hub vectors
efficiently, in Section 4.2 we said that instead of full hub vectors r_p, we first compute and store
only partial vectors (r_p − r_p^H), which intuitively account only for paths that do not pass through a
page of H (i.e., the distribution is "blocked" by H). Computing and storing the difference vector
r_p^H efficiently is the topic of this section.

It turns out that the vector r_p^H can be expressed in terms of the partial vectors (r_h − r_h^H),
for h ∈ H, as shown by the following theorem. Recall from Section 3 that x_h has value 1 at h and
0 everywhere else.

**Theorem (Hubs).** *For any p ∈ V, H ⊆ V,*

$$r_p^H = \frac{1}{c} \sum_{h \in H} \left( r_p(h) - c\, x_p(h) \right) \left( r_h - r_h^H - c\, x_h \right) \qquad (8)$$

In terms of inverse P-distances (Section 4.1), the Hubs Theorem says roughly that the distance
from page p to any page q ∈ V through H is the distance r_p(h) from p to each h ∈ H times the
distance r_h(q) from h to q, correcting for the paths among hubs by r_h^H(q). The terms c x_p(h) and
c x_h deal with the special cases when p or q is itself in H. The proof, which is quite involved, is in
Appendix D.

The quantity (r_h − r_h^H) appearing on the right-hand side of (8) is exactly the partial vectors
discussed in Section 4.2. Suppose we have computed r_p(H) = {(h, r_p(h)) | h ∈ H} for a hub
page p. Substituting the Hubs Theorem into equation (7), we have the following *Hubs Equation* for
constructing the hub vector r_p from partial vectors:

$$r_p = (r_p - r_p^H) + \frac{1}{c} \sum_{h \in H} \left( r_p(h) - c\, x_p(h) \right) \left( r_h - r_h^H - c\, x_h \right) \qquad (9)$$

This equation is central to the construction of hub vectors from partial vectors.

The set r_p(H) has size at most |H|, much smaller than the full hub vector r_p, which can have
up to n nonzero entries. Furthermore, the contribution of each entry r_p(h) to the sum is no greater
than r_p(h) (and usually much smaller), so that small values of r_p(h) can be omitted with minimal
loss of precision (Section 6). The set S = {r_p(H) | p ∈ H} forms the *hubs skeleton*, giving the
interrelationships among partial vectors.
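To make the Hubs Equation concrete, the sketch below checks it numerically on a tiny hypothetical graph (the graph, the hub set H = {0, 2}, and the helper names are assumptions for the example). Partial vectors are computed directly from their tour interpretation: the walk is cut whenever it would continue past a hub page, so only tours that avoid H internally contribute.

```python
c = 0.15
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical graph
n = len(out_links)
H = {0, 2}                                           # hub pages (assumed)

def basis(p, iters=500):
    """Basis vector r_p via fixed-point iteration on equation (1) with u = x_p."""
    v = [0.0] * n
    v[p] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        nxt[p] = c
        for j, outs in out_links.items():
            for i in outs:
                nxt[i] += (1 - c) * v[j] / len(outs)
        v = nxt
    return v

def partial(p, L=300):
    """Partial vector (r_p - r_p^H): tours not passing through H internally."""
    a = [0.0] * n
    a[p] = 1.0
    r = [0.0] * n
    for l in range(L):
        for q in range(n):
            r[q] += c * (1 - c) ** l * a[q]
        nxt = [0.0] * n
        for j, outs in out_links.items():
            if l > 0 and j in H:  # continuing past j would put a hub inside the tour
                continue
            for i in outs:
                nxt[i] += a[j] / len(outs)
        a = nxt
    return r

full = {p: basis(p) for p in range(n)}
part = {p: partial(p) for p in range(n)}

# Assemble r_p from partial vectors and skeleton entries r_p(h) via equation (9).
for p in range(n):
    assembled = list(part[p])
    for h in H:
        w = full[p][h] - (c if p == h else 0.0)        # r_p(h) - c*x_p(h)
        for q in range(n):
            vec = part[h][q] - (c if q == h else 0.0)  # (r_h - r_h^H - c*x_h)(q)
            assembled[q] += (w / c) * vec
    assert all(abs(a - b) < 1e-8 for a, b in zip(assembled, full[p]))
```

In this toy check the skeleton entries r_p(h) are read off the fully computed r_p; in the paper's setting they are precomputed once per hub page, and only the partial vectors and skeleton are stored.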

An intuitive view of the encoding and construction suggested by the Hubs Equation (9) is
shown in Figure 1. At the top, each partial vector (r_h − r_h^H), including (r_p − r_p^H), is depicted as
a notched triangle labeled h at the tip. The triangle can be thought of as representing paths starting
at h, although, more accurately, it represents the distribution of importance scores computed based
on the paths, as discussed in Section 4.1. A notch in the triangle shows where the computation of
a partial vector "stopped" at another hub page. At the center, a part r_p(H) of the hubs skeleton
is depicted as a tree so the "assembly" of the hub vector can be visualized. The hub vector is
constructed by logically assembling the partial vectors using the corresponding weights in the
hubs skeleton, as shown at the bottom.

Figure 1: Intuitive view of the construction of hub vectors from partial vectors and the hubs skeleton.

**4.4** **Discussion**

**4.4.1** **Summary**

In summary, hub vectors are building blocks for PPV’s corresponding to preference vectors based on hub pages. Partial vectors, together with the hubs skeleton, are building blocks for hub vectors.

Transitively, partial vectors and the hubs skeleton are building blocks for PPV’s: they can be used to construct PPV’s without first materializing hub vectors as an intermediate step (Section 5.4).

Note that for preference vectors based on multiple hub pages, constructing the corresponding PPV from partial vectors directly can result in significant savings versus constructing from hub vectors, since partial vectors are shared across multiple hub vectors.

**4.4.2** **Choice of H**

So far we have made no assumptions about the set of hub pagesH. Not surprisingly, the choice of hub pages can have a significant impact on performance, depending on the location of hub pages within the overall graph structure. In particular, the size of partial vectors is smaller when pages in H have higher PageRank, since high-PageRank pages are on average close to other pages in terms of inverse P-distance (Section 4.1), and the size of the partial vectors is related to the inverse P-distance between hub pages and other pages according to the Hubs Theorem. Our intuition is that high-PageRank pages are generally more interesting for personalization anyway, but in cases where the intended hub pages do not have high PageRank, it may be beneficial to include some high-PageRank pages inH to improve performance. We ran experiments confirming that the size of partial vectors is much smaller using high-PageRank pages as hubs than using random pages.

**4.4.3** **Web Skeleton**

The techniques used in the construction of hub vectors can be extended to enable at least approximate personalization on arbitrary preference vectors that are not necessarily based on H. Suppose
we want to personalize on a page p ∉ H. The Hubs Equation can be used to construct r_p^H from
partial vectors, given that we have computed r_p(H). As discussed in Section 4.3, the cost of computing and storing r_p(H) is orders of magnitude less than r_p. Though r_p^H is only an approximation
to r_p, it may still capture significant personalization information for a properly-chosen hub set H,
as r_p^H can be thought of as a "projection" of r_p onto H. For example, if H contains pages from
Open Directory, r_p^H can capture information about the broad topic of r_p. Exploring the utility of
the *web skeleton* W = {r_p(H) | p ∈ V} is an area of future work.

**5** **Computation**

In Section 4 we presented a way to construct hub vectors from partial vectors (r_p − r_p^H), for
p ∈ H, and the hubs skeleton S = {r_p(H) | p ∈ H}. We also discussed the web skeleton
W = {r_p(H) | p ∈ V}. Computing these partial quantities naively using a fixed-point iteration [10] for each p would scale poorly with the number of hub pages. Here we present scalable
algorithms that compute these quantities efficiently by using dynamic programming to leverage
the interrelationships among them. We also show how PPV's can be constructed from partial vectors and the hubs skeleton at query time. All of our algorithms have the property that they can
be stopped at any time (e.g., when resources are depleted), so that the current "best results" can
be used as an approximation, or the computation can be resumed later for increased precision if
resources permit.

We begin in Section 5.1 by presenting a theorem underlying all of the algorithms presented (as well as the connection between PageRank and inverse P-distance, as shown in Appendix C). In Section 5.2, we present three algorithms, based on this theorem, for computing general basis vectors. The algorithms in Section 5.2 are not meant to be deployed, but are used as foundations for the algorithms in Section 5.3 for computing partial quantities. Section 5.4 discusses the construction of PPV's from partial vectors and the hubs skeleton.

**5.1** **Decomposition Theorem**

Recall the random surfer model of Section 1, instantiated for preference vector u = x_p (for page
p's view of the web). At each step, a surfer s teleports to page p with some probability c. If s is
at p, then at the next step, s with probability 1 − c will be at a random out-neighbor of p. That is,
a fraction (1 − c)/|O(p)| of the time, surfers will be at any given out-neighbor of p one step after
teleporting to p. This behavior is strikingly similar to the model instantiated for preference vector
$u' = \frac{1}{|O(p)|} \sum_{i=1}^{|O(p)|} x_{O_i(p)}$, where surfers teleport directly to each O_i(p) with equal probability
1/|O(p)|. The similarity is formalized by the following theorem.

**Theorem (Decomposition).** *For any p ∈ V,*

$$r_p = \frac{1 - c}{|O(p)|} \sum_{i=1}^{|O(p)|} r_{O_i(p)} + c\, x_p \qquad (10)$$

The Decomposition Theorem says that the basis vector r_p for p is an average of the basis vectors
r_{O_i(p)} for its out-neighbors, plus a compensation factor c x_p. The proof is in Appendix B.

The Decomposition Theorem gives another way to think about PPV's. It says that p's view of
the web (r_p) is the average of the views of its out-neighbors, but with extra importance given to
p itself. That is, pages important in p's view are either p itself, or pages important in the view of
p's out-neighbors, which are themselves "endorsed" by p. In fact, this recursive intuition yields
an equivalent way of formalizing personalized PageRank scoring: basis vectors can be defined as
vectors satisfying the Decomposition Theorem.

While the Decomposition Theorem identifies relationships among basis vectors, a division
of the computation of a basis vector r_p into related subproblems for dynamic programming is
not inherent in the relationships. For example, it is possible to compute some basis vectors first
and then to compute the rest using the former as solved subproblems. However, the presence of
cycles in the graph makes this approach ineffective. Instead, our approach is to consider as a
subproblem the computation of a vector to less precision. For example, having computed r_{O_i(p)} to
a certain precision, we can use the Decomposition Theorem to combine the r_{O_i(p)}'s to compute r_p
to greater precision. This approach has the advantage that precision need not be fixed in advance:
the process can be stopped at any time for the current best answer.
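Numerically, the Decomposition Theorem is easy to confirm on a small example. The sketch below assumes a hypothetical 4-page graph and a `basis` power-iteration helper; it verifies equation (10) entry by entry.

```python
c = 0.15
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical graph
n = len(out_links)

def basis(p, iters=500):
    """Basis vector r_p via fixed-point iteration on equation (1) with u = x_p."""
    v = [0.0] * n
    v[p] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        nxt[p] = c
        for j, outs in out_links.items():
            for i in outs:
                nxt[i] += (1 - c) * v[j] / len(outs)
        v = nxt
    return v

r = [basis(p) for p in range(n)]

# Check equation (10): r_p = ((1-c)/|O(p)|) * sum_i r_{O_i(p)} + c*x_p.
for p in range(n):
    outs = out_links[p]
    rhs = [(1 - c) / len(outs) * sum(r[o][q] for o in outs)
           + (c if q == p else 0.0)
           for q in range(n)]
    assert all(abs(a - b) < 1e-9 for a, b in zip(r[p], rhs))
```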

**5.2** **Algorithms for Computing Basis Vectors**

We present three algorithms in the general context of computing full basis vectors. These algorithms are presented primarily to develop our algorithms for computing partial quantities, presented
in Section 5.3. All three algorithms are iterative fixed-point computations that maintain a set of
*intermediate results* (D_k[∗], E_k[∗]). For each p, D_k[p] is a lower-approximation of r_p on iteration k, i.e., D_k[p](q) ≤ r_p(q) for all q ∈ V. We build solutions D_k[p] (k = 0, 1, 2, . . .) that
are successively better approximations to r_p, and simultaneously compute the error components
E_k[p], where E_k[p] is the "projection" of the vector (r_p − D_k[p]) onto the (actual) basis vectors.
That is, we maintain the invariant that for all k ≥ 0 and all p ∈ V,

$$D_k[p] + \sum_{q \in V} E_k[p](q)\, r_q = r_p \qquad (11)$$

Thus, D_k[p] is a lower-approximation of r_p with error $\left| \sum_{q \in V} E_k[p](q)\, r_q \right| = |E_k[p]|$. We begin
with D_0[p] = 0 and E_0[p] = x_p, so that logically, the approximation is initially 0 and the error
is r_p. To store E_k[p] and D_k[p] efficiently, we can represent them in an implementation as a list
of their nonzero entries. While all three algorithms have in common the use of these intermediate
results, they differ in how they use the Decomposition Theorem to refine intermediate results on
successive iterations.

It is important to note that the algorithms presented in this section and their derivatives in Section 5.3 compute vectors to arbitrary precision; they are not approximations. In practice, the precision desired may vary depending on the application. Our focus is on algorithms that are efficient and scalable with the number of hub vectors, regardless of the precision to which vectors are computed.

**5.2.1** **Basic Dynamic Programming Algorithm**

In the *basic dynamic programming algorithm*, a new basis vector for each page p is computed on
each iteration using the vectors computed for p's out-neighbors on the previous iteration, via the
Decomposition Theorem. On iteration k, we derive (D_{k+1}[p], E_{k+1}[p]) from (D_k[p], E_k[p])
using the equations:

$$D_{k+1}[p] = \frac{1 - c}{|O(p)|} \sum_{i=1}^{|O(p)|} D_k[O_i(p)] + c\, x_p \qquad (12)$$

$$E_{k+1}[p] = \frac{1 - c}{|O(p)|} \sum_{i=1}^{|O(p)|} E_k[O_i(p)] \qquad (13)$$

A proof of the algorithm's correctness is given in Appendix E, where the error |E_k[p]| is shown to be reduced by a factor of 1 − c on each iteration.

Note that although the E_k[∗] values help us to see the correctness of the algorithm, they are
not used here in the computation of D_k[∗] and can be omitted in an implementation (although
they will be used to compute partial quantities in Section 5.3). The sizes of D_k[p] and E_k[p]
grow with the number of iterations, and in the limit they can be up to the size of r_p, which is the
number of pages reachable from p. Intermediate scores (D_k[∗], E_k[∗]) will likely be much larger
than available main memory, and in an implementation (D_k[∗], E_k[∗]) could be read off disk and
(D_{k+1}[∗], E_{k+1}[∗]) written to disk on each iteration. When the data for one iteration has been
computed, data from the previous iteration may be deleted. Specific details of our implementation
are discussed in Section 6.
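A compact sketch of the basic algorithm follows, using dense vectors on a tiny hypothetical graph for clarity (a real implementation would keep sparse nonzero-entry lists on disk, as described above). It maintains D_k[p] and E_k[p] for every page and checks that the error magnitude |E_k[p]| shrinks by the factor 1 − c per iteration.

```python
c = 0.15
out_links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical graph
n = len(out_links)

# Intermediate results of equations (12) and (13).
D = [[0.0] * n for _ in range(n)]                  # D_0[p] = 0
E = [[1.0 if q == p else 0.0 for q in range(n)]    # E_0[p] = x_p
     for p in range(n)]

K = 60
for k in range(K):
    newD, newE = [], []
    for p in range(n):
        outs = out_links[p]
        d = [c if q == p else 0.0 for q in range(n)]  # the c*x_p term
        e = [0.0] * n
        for o in outs:
            for q in range(n):
                d[q] += (1 - c) / len(outs) * D[o][q]  # equation (12)
                e[q] += (1 - c) / len(outs) * E[o][q]  # equation (13)
        newD.append(d)
        newE.append(e)
    D, E = newD, newE

# |E_k[p]| = (1-c)^k exactly, and by invariant (11), |D_k[p]| + |E_k[p]| = |r_p| = 1.
for p in range(n):
    assert abs(sum(E[p]) - (1 - c) ** K) < 1e-12
    assert abs(sum(D[p]) + sum(E[p]) - 1.0) < 1e-9
```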

**5.2.2** **Selective Expansion Algorithm**

The *selective expansion algorithm* is essentially a version of the naive algorithm that can readily
be modified to compute partial vectors, as we will see in Section 5.3.1.

We derive (D_{k+1}[p], E_{k+1}[p]) by "distributing" the error at each page q (that is, E_k[p](q))
to its out-neighbors via the Decomposition Theorem. Precisely, we compute results on iteration k
using the equations:

$$D_{k+1}[p] = D_k[p] + \sum_{q \in Q_k(p)} c\, E_k[p](q)\, x_q \qquad (14)$$

$$E_{k+1}[p] = E_k[p] - \sum_{q \in Q_k(p)} E_k[p](q)\, x_q + \sum_{q \in Q_k(p)} \frac{1 - c}{|O(q)|} \sum_{i=1}^{|O(q)|} E_k[p](q)\, x_{O_i(q)} \qquad (15)$$

for a subset Q_k(p) ⊆ V. If Q_k(p) = V for all k, then the error is reduced by a factor of 1-c on each iteration, as in the basic dynamic programming algorithm. However, it is often useful to choose a selected subset of V as Q_k(p). For example, if Q_k(p) contains the m pages q for which the error E_k[p](q) is highest, then this *top-m* scheme limits the number of expansions and delays the growth in size of the intermediate results while still reducing much of the error. In Section 5.3.1, we will compute the hub vectors by choosing Q_k(p) = H. The correctness of selective expansion is proven in Appendix F.
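A minimal sketch of one selective-expansion step (equations (14) and (15)), including the top-m choice of Q_k(p), might look as follows for a single page p. The names are illustrative, not the authors' code.

```python
# One selective-expansion step for a single page p (equations (14)-(15)).
# D and E are sparse dicts; out_nbrs maps each page to its out-neighbor list.
import heapq

def expand_step(D, E, out_nbrs, c=0.15, m=None):
    """Distribute error mass; m=None means Q_k(p) = V, else the top-m scheme."""
    Q = list(E) if m is None else heapq.nlargest(m, E, key=E.get)
    errs = [(q, E.pop(q)) for q in Q]        # snapshot keeps the step synchronous
    for q, err in errs:
        D[q] = D.get(q, 0.0) + c * err       # equation (14)
        w = (1 - c) * err / len(out_nbrs[q])
        for o in out_nbrs[q]:                # equation (15): push error outward
            E[o] = E.get(o, 0.0) + w
    return D, E

# Approximate r_0 on a 3-page cycle, starting from the error E = x_0.
out = {0: [1], 1: [2], 2: [0]}
D, E = {}, {0: 1.0}
for _ in range(200):
    D, E = expand_step(D, E, out)
```

With Q_k(p) = V (as here), the remaining error after k steps is (1-c)^k; passing a small `m` trades convergence speed for smaller intermediate results, as the top-m discussion above describes.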

**5.2.3** **Repeated Squaring Algorithm**

The *repeated squaring algorithm* is similar to the selective expansion algorithm, except that instead of extending (D_{k+1}[*], E_{k+1}[*]) one step using equations (14) and (15), we compute what are essentially iteration-2k results using the equations:

D_{2k}[p] = D_k[p] + \sum_{q \in Q_k(p)} E_k[p](q) D_k[q]    (16)

E_{2k}[p] = E_k[p] - \sum_{q \in Q_k(p)} E_k[p](q) x_q + \sum_{q \in Q_k(p)} E_k[p](q) E_k[q]    (17)

where Q_k(p) ⊆ V. For now we can assume that Q_k(p) = V for all p; we will set Q_k(p) = H to compute the hubs skeleton in Section 5.3.2. The correctness of these equations is proven in Appendix G, where it is shown that repeated squaring reduces the error much faster than the basic dynamic programming or selective expansion algorithms. If Q_k(p) = V, the error is squared on each iteration, as equation (17) reduces to:

E_{2k}[p] = \sum_{q \in V} E_k[p](q) E_k[q]    (18)

As an alternative to taking Q_k(p) = V, we can also use the top-m scheme of Section 5.2.2.

Note that while all three algorithms presented can be used to compute the set of all basis vectors, they differ in their requirements on the computation of other vectors when computing r_p: the basic dynamic programming algorithm requires the vectors of out-neighbors of p to be computed as well, repeated squaring requires results (D_k[q], E_k[q]) to be computed for q such that E_k[p](q) > 0, and selective expansion computes r_p independently.
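To make the squaring concrete, here is a hedged sketch of equations (16) and (17) with Q_k(p) = V: starting from iteration-1 results, i passes leave error (1-c)^{2^i}. The names and dict-based representation are ours, not the paper's implementation.

```python
# One repeated-squaring pass (equations (16)-(17)) with Q_k(p) = V:
# iteration-k results (D, E) for all pages become iteration-2k results.

def square(D, E):
    D2, E2 = {}, {}
    for p in D:
        d, e = dict(D[p]), dict(E[p])
        for q, eq in E[p].items():
            for page, s in D[q].items():     # equation (16): add E_k[p](q)*D_k[q]
                d[page] = d.get(page, 0.0) + eq * s
            e[q] -= eq                       # equation (17): the -E_k[p](q)*x_q term
            for page, s in E[q].items():     # ... plus the E_k[p](q)*E_k[q] term
                e[page] = e.get(page, 0.0) + eq * s
        D2[p] = d
        E2[p] = {page: v for page, v in e.items() if abs(v) > 1e-15}
    return D2, E2

# Iteration-1 results on a 3-page cycle: D_1[p] = c*x_p, E_1[p] spread over O(p).
out, c = {0: [1], 1: [2], 2: [0]}, 0.15
D = {p: {p: c} for p in out}
E = {p: {q: (1 - c) / len(out[p]) for q in out[p]} for p in out}
for _ in range(7):      # after 7 passes the error is (1-c)^128
    D, E = square(D, E)
```

Seven passes reach the precision that would take 128 iterations of the basic algorithm, which is the speedup the error analysis above promises.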

**5.3** **Computing Partial Quantities**

In Section 5.2 we presented iterative algorithms for computing full basis vectors to arbitrary precision. Here we present modifications to these algorithms to compute the partial quantities:

• Partial vectors (r_p − r_p^H), for p ∈ H.

• The hubs skeleton S = {r_p(H) | p ∈ H} (which can be computed more efficiently by itself than as part of the entire web skeleton).

• The web skeleton W = {r_p(H) | p ∈ V}.

Each partial quantity can be computed in time no greater than its size, which is far less than the size of the hub vectors.

**5.3.1** **Partial Vectors**

Partial vectors can be computed using a simple specialization of the selective expansion algorithm (Section 5.2.2): we take Q_0(p) = V and Q_k(p) = V − H for k > 0, for all p ∈ V. That is, we never "expand" hub pages after the first step, so tours passing through a hub page h ∈ H are never considered. Under this choice of Q_k(p), D_k[p] + c E_k[p] converges to (r_p − r_p^H) for all p ∈ V. Of course, only the intermediate results (D_k[p], E_k[p]) for p ∈ H should be computed. A proof is presented in Appendix H.

This algorithm makes it clear why using high-PageRank pages as hub pages improves performance: from a page p we expect to reach a high-PageRank page q sooner than a random page, so the expansion from p will stop sooner and result in a shorter partial vector.
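The specialization above amounts to a one-line change in the selective-expansion loop: expand all of E's support on the first step, and only non-hub pages thereafter. A sketch under the same toy-graph assumptions (illustrative names, not the authors' code):

```python
# Partial-vector computation (Section 5.3.1): Q_0(p) = V, Q_k(p) = V - H for k > 0.
# D + c*E converges to the partial vector (r_p - r_p^H); error mass parks on hubs.

def partial_vector(p, out_nbrs, hubs, c=0.15, num_iters=100):
    D, E = {}, {p: 1.0}
    for k in range(num_iters):
        Q = list(E) if k == 0 else [q for q in E if q not in hubs]
        errs = [(q, E.pop(q)) for q in Q]
        for q, err in errs:
            D[q] = D.get(q, 0.0) + c * err
            w = (1 - c) * err / len(out_nbrs[q])
            for o in out_nbrs[q]:
                E[o] = E.get(o, 0.0) + w
    return D, E

# On the 3-cycle with hub set {2}, the expansion from page 0 halts at the hub.
D, E = partial_vector(0, {0: [1], 1: [2], 2: [0]}, hubs={2})
```

Here the expansion stops after two steps: D holds the short partial vector, and the residual error rests entirely on the hub page, exactly the behavior the performance argument above relies on.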

**5.3.2** **Hubs Skeleton**

While the hubs skeleton is a subset of the complete web skeleton and can be computed as such using the technique to be presented in Section 5.3.3, it can be computed much faster by itself if we are not interested in the entire web skeleton, or if higher precision is desired for the hubs skeleton than can be computed for the entire web skeleton.

We use a specialization of the repeated squaring algorithm (Section 5.2.3) to compute the hubs skeleton, using the intermediate results from the computation of partial vectors. Suppose (D_k[p], E_k[p]), for k ≥ 1, have been computed by the algorithm of Section 5.3.1, so that \sum_{q \notin H} E_k[p](q) < ε, for some error ε. We apply the repeated squaring algorithm on these results using Q_k(p) = H for all successive iterations. As shown in Appendix I, after i iterations of repeated squaring, the total error |E_i[p]| is bounded by (1−c)^{2^i} + ε/c. Thus, by varying k and i, r_p(H) can be computed to arbitrary precision.

Notice that only the intermediate results (D_k[h], E_k[h]) for h ∈ H are ever needed to update scores for D_k[p], and of the former, only the entries D_k[h](q), E_k[h](q), for q ∈ H, are used to compute D_k[p](q). Since we are only interested in the hub scores D_k[p](q), we can simply drop all non-hub entries from the intermediate results. The running time and storage would then depend only on the size of r_p(H) and not on the length of the entire hub vectors r_p. If the restricted intermediate results fit in main memory, it is possible to defer the computation of the hubs skeleton to query time.

**5.3.3** **Web Skeleton**

To compute the entire web skeleton, we modify the basic dynamic programming algorithm (Section 5.2.1) to compute only the hub scores r_p(H), with corresponding savings in time and memory usage. We restrict the computation by eliminating entries q ∉ H from the intermediate results (D_k[p], E_k[p]), similar to the technique used in computing the hubs skeleton.

The justification for this modification is that the hub score D_{k+1}[p](h) is affected only by the hub scores D_k[*](h) of the previous iteration, so that D_{k+1}[p](h) in the modified algorithm is equal to that in the basic algorithm. Since |H| is likely to be orders of magnitude less than n, the size of the intermediate results is reduced significantly.

**5.4** **Construction of PPV’s**

Finally, let us see how a PPV for preference vector u can be constructed directly from partial vectors and the hubs skeleton using the Hubs Equation. (Construction of a single hub vector is a specialization of the algorithm outlined here.) Let u = α_1 p_1 + · · · + α_z p_z be a preference vector, where p_i ∈ H for 1 ≤ i ≤ z. Let Q ⊆ H, and let

r_u(h) = \sum_{i=1}^{z} α_i (r_{p_i}(h) − c x_{p_i}(h))    (19)

which can be computed from the hubs skeleton. Then the PPV v for u can be constructed as

v = \sum_{i=1}^{z} α_i (r_{p_i} − r_{p_i}^H) + \frac{1}{c} \sum_{\substack{h \in Q \\ r_u(h) > 0}} r_u(h) \left( (r_h − r_h^H) − c x_h \right)    (20)

Both the terms (r_{p_i} − r_{p_i}^H) and (r_h − r_h^H) are partial vectors, which we assume have been precomputed. The term c x_h represents a simple subtraction from (r_h − r_h^H). If Q = H, then (20) represents a full construction of v. However, for some applications, it may suffice to use only parts of the hubs skeleton to compute v to less precision. For example, we can take Q to be the m hubs h for which r_u(h) is highest. Experimentation with this scheme is discussed in Section 6.3. Alternatively, the result can be improved incrementally (e.g., as time permits) by using a small subset Q each time and accumulating the results.
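In code, the construction of equations (19) and (20) reduces to a weighted merge of sparse vectors. A sketch under toy assumptions, with Q = H: `partial[p]` holds (r_p − r_p^H), `skeleton[p]` holds r_p(H), and all names are ours.

```python
# Construct the PPV v for u = sum_i alpha_i * x_{p_i} from precomputed
# partial vectors and the hubs skeleton (equations (19) and (20)), Q = H.

def construct_ppv(prefs, partial, skeleton, c=0.15):
    """prefs: list of (alpha_i, p_i) pairs, each p_i a hub page."""
    r_u = {}                                     # equation (19)
    for alpha, p in prefs:
        for h, s in skeleton[p].items():
            r_u[h] = r_u.get(h, 0.0) + alpha * (s - (c if h == p else 0.0))
    v = {}                                       # equation (20)
    for alpha, p in prefs:
        for page, s in partial[p].items():       # alpha_i * (r_{p_i} - r_{p_i}^H)
            v[page] = v.get(page, 0.0) + alpha * s
    for h, ruh in r_u.items():
        if ruh > 0:
            for page, s in partial[h].items():   # (r_u(h)/c) * (r_h - r_h^H)
                v[page] = v.get(page, 0.0) + (ruh / c) * s
            v[h] = v.get(h, 0.0) - ruh           # the -(r_u(h)/c) * c*x_h term
    return v

# Exact inputs for the 3-page cycle 0 -> 1 -> 2 -> 0 with H = {0, 1, 2}:
c = 0.15
a = c / (1 - (1 - c)**3)                         # r_p(p) in closed form
partial = {p: {p: c, (p + 1) % 3: c * (1 - c)} for p in range(3)}
skeleton = {p: {(p + i) % 3: a * (1 - c)**i for i in range(3)} for p in range(3)}
v = construct_ppv([(1.0, 0)], partial, skeleton)
```

On this toy input, where the exact partial vectors and skeleton are known in closed form, the construction recovers the full basis vector r_0, which is the Q = H case discussed above.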

**6** **Experiments**

We performed experiments using real web data from Stanford's WebBase [6], a crawl of the web containing 120 million pages. Since the iterative computation of PageRank is unaffected by leaf pages (i.e., those with no out-neighbors), they can be removed from the graph and added back in after the computation [10]. After removing leaf pages, the graph consisted of 80 million pages.

Both the web graph and the intermediate results (D_k[*], E_k[*]) were too large to fit in main memory, and a partitioning strategy, based on that presented in [4], was used to divide the computation into portions that can be carried out in memory. Specifically, the set of pages V was partitioned into k arbitrary sets P_1, ..., P_k of equal size (k = 10 in our experiments). The web graph, represented as an edge list E, was partitioned into k chunks E_i (1 ≤ i ≤ k), where E_i contains all edges ⟨p, q⟩ for which p ∈ P_i. Intermediate results D_k[p] and E_k[p] were represented together as a list L_k[p] = ⟨(q_1, d_1, e_1), (q_2, d_2, e_2), ...⟩, where D_k[p](q_z) = d_z and E_k[p](q_z) = e_z for z = 1, 2, .... Only pages q_z for which either d_z > 0 or e_z > 0 were included. The set of intermediate results L_k[*] was partitioned into k^2 chunks L_k^{i,j}[*], so that L_k^{i,j}[p] contains the triples (q_z, d_z, e_z) of L_k[p] for which p ∈ P_i and q_z ∈ P_j. In each of the algorithms for computing partial quantities, only a single column L_k^{*,j}[*] was kept in memory at any one time, and part of the next-iteration results L_{k+1}[*] was computed by successively reading in individual blocks of the graph or intermediate results as appropriate. Each iteration requires only one linear scan of the intermediate results and web graph, except for repeated squaring, which does not use the web graph explicitly.

Figure 2: Average Vector Size vs. Number of Hubs (average sizes of full hub vectors and partial vectors plotted against the number of hubs, 1000 to 100,000, on a log scale)
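The partitioning described above can be sketched as follows; this is purely illustrative (in-memory lists standing in for on-disk chunks, hypothetical names):

```python
# Block layout from Section 6: pages split into k arbitrary partitions; edges
# grouped by source partition; intermediate results L_k[p] = [(q, d, e), ...]
# split into k*k chunks keyed by (partition of p, partition of q).

def build_chunks(pages, edges, results, k):
    part = {p: i % k for i, p in enumerate(pages)}       # arbitrary equal split
    edge_chunks = {i: [] for i in range(k)}
    for p, q in edges:
        edge_chunks[part[p]].append((p, q))              # E_i: edges <p,q>, p in P_i
    result_chunks = {(i, j): {} for i in range(k) for j in range(k)}
    for p, triples in results.items():
        for (q, d, e) in triples:
            if d > 0 or e > 0:                           # store only nonzero entries
                result_chunks[(part[p], part[q])].setdefault(p, []).append((q, d, e))
    return part, edge_chunks, result_chunks

pages = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
results = {0: [(1, 0.2, 0.0), (3, 0.0, 0.1)], 2: [(2, 0.5, 0.0)]}
part, ec, rc = build_chunks(pages, edges, results, k=2)
```

A column of chunks rc[(*, j)] is then what a single pass would load into memory at one time.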

**6.1** **Computing Partial Vectors**

For comparison, we computed both (full) hub vectors and partial vectors for various sizes of H, using the selective expansion algorithm with Q_k(p) = V (full hub vectors) and Q_k(p) = V − H (partial vectors). As discussed in Section 4.4.2, we found the partial vectors approach to be much more effective when H contains high-PageRank pages rather than random pages. In our experiments H ranged from the top 1000 to the top 100,000 pages with the highest PageRank. The constant c was set to 0.15.

To evaluate the performance and scalability of our strategy independently of implementation and platform, we focus on the size of the results rather than computation time, which is linear in the size of the results. Because of the number of trials we had to perform and limitations on resources, we computed results only up to 6 iterations, for |H| up to 100,000. Figure 2 plots the average size of (full) hub vectors and partial vectors (recall that size is the number of nonzero entries), as computed after 6 iterations of the selective expansion algorithm, which for computing full hub vectors is equivalent to the basic dynamic programming algorithm. Note that the x-axis plots |H| in logarithmic scale.

Experiments were run using a 1.4 gigahertz CPU on a machine with 3.5 gigabytes of memory. For |H| = 50,000, the computation of full hub vectors took about 2.8 seconds per vector, and about 0.33 seconds for each partial vector. We were unable to compute full hub vectors for |H| = 100,000 due to the time required, although the average vector size is expected not to vary significantly with |H| for full hub vectors. In Figure 2 we see that the reduction in size from using our technique becomes more significant as |H| increases, suggesting that our technique scales well with |H|.

**6.2** **Computing the Hubs Skeleton**

We computed the hubs skeleton for |H| = 10,000 by running the selective expansion algorithm for 6 iterations using Q_k(p) = V − H, and then running the repeated squaring algorithm for 10 iterations (Section 5.3.2), where Q_k(p) is chosen to be the top 50 entries under the top-m scheme (Section 5.2.2). The average size of the hubs skeleton is 9021 entries. Each iteration of the repeated squaring algorithm took about an hour, a cost that depends only on |H| and is constant with respect to the precision to which the partial vectors are computed.

**6.3** **Constructing Hub Vectors from Partial Vectors**

Next we measured the construction of (full) hub vectors from partial vectors and the hubs skeleton.

Note that in practice we may construct PPV’s directly from partial vectors, as discussed in Section 5.4. However, performance of the construction would depend heavily on the user’s preference vector. We consider hub vector computation because it better measures the performance benefits of our partial vectors approach.

As suggested in Section 4.3, the precision of the hub vectors constructed from partial vectors can be varied at query time according to application and performance demands. That is, instead of using the entire set r_p(H) in the construction of r_p, we can use only the highest m entries, for m ≤ |H|. Figure 3 plots the average size and time required to construct a full hub vector from partial vectors in memory versus m, for |H| = 10,000. Results are averaged over 50 randomly-chosen hub vectors. Note that the x-axis is in logarithmic scale.

Recall from Section 6.1 that the partial vectors from which the hub vectors are constructed were computed using 6 iterations, limiting the precision. Thus, the error values in Figure 3 are roughly 16% (ranging from 0.166 for m = 100 to 0.163 for m = 10,000). Nonetheless, this error is much smaller than that of the iteration-6 full hub vectors computed in Section 6.1, which have error (1−c)^6 = 38%. Note, however, that the size of a vector is a better indicator of precision than the magnitude, since we are usually most interested in the number of pages with nonzero entries in the distribution vector. An iteration-6 full hub vector (from Section 6.1) for page p contains nonzero entries for pages at most 6 links away from p, 93,993 pages on average. In contrast, from Figure 3 we see that a hub vector containing 14 million nonzero entries can be constructed from partial vectors in 6 seconds.

Figure 3: Construction Time and Size vs. Hubs Skeleton Portion (m) (average constructed vector size and average construction time in seconds, plotted against m, 100 to 10,000, on a log scale)

**7** **Related Work**

The use of personalized PageRank to enable personalized web search was first proposed in [10], where it was suggested as a modification of the global PageRank algorithm, which computes a universal notion of importance. The computation of (personalized) PageRank scores was not addressed beyond the naive algorithm.

In [5], personalized PageRank scores were used to enable "topic-sensitive" web search. Specifically, precomputed hub vectors corresponding to broad categories in Open Directory [2] were used to bias importance scores, where the vectors and weights were selected according to the text query.

Experiments in [5] concluded that the use of personalized PageRank scores can improve web search, but the number of hub vectors used was limited to 16 due to the computational requirements, which were not addressed in that work. Scaling the number of hub pages beyond 16 for finer-grained personalization is a direct application of our work.

Another technique for computing web-page importance, HITS, was presented in [8]. In HITS, an iterative computation similar in spirit to PageRank is applied at query time on a subgraph consisting of pages matching a text query and those "nearby". Personalizing based on user-specified web pages (and their linkage structure in the web graph) is not addressed by HITS. Moreover, the number of pages in the subgraphs used by HITS (order of thousands) is much smaller than that we consider in this paper (order of millions), and the computation from scratch at query time makes the HITS approach difficult to scale.

Another algorithm that uses query-dependent importance scores to improve upon a global version of importance was presented in [11]. Like HITS, it first restricts the computation to a subgraph derived from text matching. (Personalizing based on user-specified web pages is not addressed.) Unlike HITS, [11] suggested that importance scores be precomputed offline for every possible text query, but the enormous number of possibilities makes this approach difficult to scale.

The concept of using “hub nodes” in a graph to enable partial computation of solutions to the shortest-path problem was used in [3] in the context of database search. That work deals with searches within databases, and on a scale far smaller than that of the web.

Some system aspects of (global) PageRank computation were addressed in [4]. The disk-based data-partitioning strategy used in the implementation of our algorithm is adopted from that presented therein.

Finally, the concept of inverse P-distance used in this paper is based on the concept of expected-f distance introduced in [7], where it was presented as an intuitive model for a similarity measure in graph structures.

**8** **Summary**

We have addressed the problem of scaling personalized web search:

• We started by identifying a linear relationship that allows personalized PageRank vectors to be expressed as a linear combination of basis vectors. Personalized vectors corresponding to arbitrary preference sets drawn from a hub set H can be constructed quickly from the set of precomputed basis hub vectors, one for each hub h ∈ H.

• We laid the mathematical foundations for constructing hub vectors efficiently by relating personalized PageRank scores to inverse P-distances, an intuitive notion of distance in arbitrary directed graphs. We used this notion of distance to identify interrelationships among basis vectors.

• We presented a method of encoding hub vectors as partial vectors and the hubs skeleton. Redundancy is minimized under this representation: each partial vector for a hub page p represents the part of p's hub vector unique to itself, while the skeleton specifies how partial vectors are assembled into full vectors.

• We presented algorithms for computing basis vectors, and showed how they can be modified to compute partial vectors and the hubs skeleton efficiently.

• We ran experiments on real web data showing the effectiveness of our approach. Results showed that our strategy results in significant resource reduction over full vectors, and scales well with |H|, the degree of personalization.

**9** **Acknowledgment**

The authors thank Taher Haveliwala for many useful discussions and extensive help with implementation.

**References**

[1] http://www.google.com.

[2] http://dmoz.org.

[3] Roy Goldman, Narayanan Shivakumar, Suresh Venkatasubramanian, and Hector Garcia-Molina. Proximity search in databases. In *Proceedings of the Twenty-Fourth International Conference on Very Large Databases*, New York, New York, August 1998.

[4] Taher H. Haveliwala. Efficient computation of PageRank. Technical report, Stanford University Database Group, 1999. http://dbpubs.stanford.edu/pub/1999-31.

[5] Taher H. Haveliwala. Topic-sensitive PageRank. In *Proceedings of the Eleventh International World Wide Web Conference*, Honolulu, Hawaii, May 2002.

[6] Jun Hirai, Sriram Raghavan, Andreas Paepcke, and Hector Garcia-Molina. WebBase: A repository of web pages. In *Proceedings of the Ninth International World Wide Web Conference*, Amsterdam, Netherlands, May 2000. http://www-diglib.stanford.edu/~testbed/doc2/WebBase/.

[7] Glen Jeh and Jennifer Widom. SimRank: A measure of structural-context similarity. In *Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*, Edmonton, Alberta, Canada, July 2002.

[8] Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. In *Proceedings of the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms*, San Francisco, California, January 1998.

[9] Rajeev Motwani and Prabhakar Raghavan. *Randomized Algorithms*. Cambridge University Press, United Kingdom, 1995.

[10] Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford University Database Group, 1998. http://citeseer.nj.nec.com/368196.html.

[11] Matthew Richardson and Pedro Domingos. The intelligent surfer: Probabilistic combination of link and content information in PageRank. In *Proceedings of Advances in Neural Information Processing Systems 14*, Cambridge, Massachusetts, December 2002.

**APPENDIX**

**A** **Proof: Linearity Theorem**

**Theorem (Linearity).** For any preference vectors u_1 and u_2, if v_1 and v_2 are the two corresponding PPV's, then for any constants α_1, α_2 ≥ 0 such that α_1 + α_2 = 1:

α_1 v_1 + α_2 v_2 = (1−c) A (α_1 v_1 + α_2 v_2) + c (α_1 u_1 + α_2 u_2)
**Proof:**

α_{1}v_{1}+α_{2}v_{2} = α_{1}((1−c)Av_{1}+cu_{1}) +α_{2}((1−c)Av_{2}+cu_{2})

= α_{1}(1−c)Av_{1} +α_{1}cu_{1}+α_{2}(1−c)Av_{2}+α_{2}cu_{2}

= (1−c)A(α_{1}v_{1}+α_{2}v_{2}) +c(α_{1}u_{1}+α_{2}u_{2})

**B** **Proof: Decomposition Theorem**

**Theorem (Decomposition).** For any p ∈ V:

r_p = \frac{1−c}{|O(p)|} \sum_{i=1}^{|O(p)|} r_{O_i(p)} + c x_p

**Proof:** First we rewrite equation (1) in an equivalent form. For a given preference vector u, we define the derived matrix A_u as

A_u = (1−c)A + cU    (21)

where U is the n × n matrix with U_{ij} = u_i for all i, j. If we require that |v| = 1, we can write equation (1) as

v = A_u v

Without loss of generality, let the out-neighbors of p be 1, ..., k. Let A_p be the derived matrix corresponding to x_p, and let A_1, ..., A_k be the derived matrices for u = x_1, ..., x_k, respectively. Let U_p and U_1, ..., U_k be the corresponding U's in equation (21).

Let

v_p = \frac{1−c}{k} \sum_{i=1}^{k} r_i + c x_p

Clearly, |v_p| = 1. We need to show that A_p v_p = v_p, in which case v_p = r_p, since PPV's are unique (Section 1). First we have that:

A_p v_p = A_p \left( \frac{1−c}{k} \sum_{i=1}^{k} r_i + c x_p \right) = \frac{1−c}{k} \sum_{i=1}^{k} A_p r_i + c A_p x_p

Using the identity

A_p = A_i − c U_i + c U_p

we have:

A_p v_p = \frac{1−c}{k} \sum_{i=1}^{k} (A_i − c U_i + c U_p) r_i + c A_p x_p

= \frac{1−c}{k} \sum_{i=1}^{k} A_i r_i − \frac{(1−c)c}{k} \sum_{i=1}^{k} U_i r_i + \frac{(1−c)c}{k} \sum_{i=1}^{k} U_p r_i + c A_p x_p

= \frac{1−c}{k} \sum_{i=1}^{k} r_i − \frac{(1−c)c}{k} \sum_{i=1}^{k} x_i + \frac{(1−c)c}{k} \sum_{i=1}^{k} x_p + c A_p x_p

= \frac{1−c}{k} \sum_{i=1}^{k} r_i − \frac{(1−c)c}{k} \sum_{i=1}^{k} x_i + (1−c)c x_p + c((1−c)A + c U_p) x_p

= \frac{1−c}{k} \sum_{i=1}^{k} r_i − \frac{(1−c)c}{k} \sum_{i=1}^{k} x_i + (1−c)c x_p + (1−c)c A x_p + c^2 x_p

= \frac{1−c}{k} \sum_{i=1}^{k} r_i + (1−c)c x_p + c^2 x_p + (1−c)c \left( A x_p − \frac{1}{k} \sum_{i=1}^{k} x_i \right)

Since the out-neighbors of p are 1, ..., k, we have A x_p = \frac{1}{k} \sum_{i=1}^{k} x_i, so the last term vanishes:

= \frac{1−c}{k} \sum_{i=1}^{k} r_i + (1−c)c x_p + c^2 x_p

= \frac{1−c}{k} \sum_{i=1}^{k} r_i + c x_p

= v_p

**C** **Inverse P-distance**

**C.1** **Relation to Personalized PageRank**

The relationship between inverse P-distances and personalized PageRank scores is given by the following theorem.

**Theorem.** For all p, q ∈ V,

r_p(q) = r'_p(q)

**Proof:** Writing the Decomposition Theorem in scalar form for page p, we get a set of n equations, one for each q ∈ V, of the form:

r_p(q) = \frac{1−c}{|O(p)|} \sum_{i=1}^{|O(p)|} r_{O_i(p)}(q)        (if p ≠ q)

r_p(q) = \frac{1−c}{|O(p)|} \sum_{i=1}^{|O(p)|} r_{O_i(p)}(q) + c    (if p = q)