Effective Missing Data Prediction | My Assignment Tutor

Effective Missing Data Prediction for CollaborativeFilteringHao Ma, Irwin King and Michael R. LyuDept. of Computer Science and EngineeringThe Chinese University of Hong KongShatin, N.T., Hong Kong{ hma, king, lyu }@cse.cuhk.edu.hkABSTRACTMemory-based collaborative filtering algorithms have beenwidely adopted in many popular recommender systems, although these approaches all suffer from data sparsity andpoor prediction quality problems. Usually, the user-itemmatrix is quite sparse, which directly leads to inaccurate recommendations. This paper focuses the memory-based collaborative filtering problems on two crucial factors: (1) similarity computation between users or items and (2) missing data prediction algorithms. First, we use the enhancedPearson Correlation Coefficient (PCC) algorithm by addingone parameter which overcomes the potential decrease ofaccuracy when computing the similarity of users or items.Second, we propose an effective missing data prediction algorithm, in which information of both users and items istaken into account. In this algorithm, we set the similaritythreshold for users and items respectively, and the prediction algorithm will determine whether predicting the missingdata or not. We also address how to predict the missing databy employing a combination of user and item information.Finally, empirical studies on dataset MovieLens have shownthat our newly proposed method outperforms other stateof-the-art collaborative filtering algorithms and it is morerobust against data sparsity.Categories and Subject Descriptors: H.3.3 [Information Systems]: Information Search and Retrieval – Information FilteringGeneral Terms: Algorithm, Performance, Experimentation.Keywords: Collaborative Filtering, Recommender System,Data Prediction, Data Sparsity.1. INTRODUCTIONCollaborative filtering is the method which automaticallypredicts the interest of an active user by collecting ratinginformation from other similar users or items, and relatedtechniques have been widely employed in some large, faPermission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission and/or a fee.SIGIR’07, July23–27,2007,Amsterdam,TheNetherlands.Copyright 2007 ACM 978-1-59593-597-7/07/0007 …$5.00.mous commercial systems, such as Amazon1, Ebay2. Theunderlying assumption of collaborative filtering is that theactive user will prefer those items which the similar usersprefer. The research of collaborative filtering started frommemory-based approaches which utilize the entire user-itemdatabase to generate a prediction based on user or item similarity. Two types of memory-based methods have beenstudied: user-based [2, 7, 10, 22] and item-based [5, 12,17]. User-based methods first look for some similar userswho have similar rating styles with the active user and thenemploy the ratings from those similar users to predict theratings for the active user. Item-based methods share thesame idea with user-based methods. The only differenceis user-based methods try to find the similar users for anactive user but item-based methods try to find the similar items for each item. Whether in user-based approachesor in item-based approaches, the computation of similaritybetween users or items is a very critical step. Notable similarity computation algorithms include Pearson CorrelationCoefficient (PCC) [16] and Vector Space Similarity (VSS)algorithm [2].Although memory-based approaches have been widely usedin recommendation systems [12, 16], the problem of inaccurate recommendation results still exists in both user-basedand item-based approaches. The fundamental problem ofmemory-based approaches is the data sparsity of the useritem matrix. Many recent algorithms have been proposedto alleviate the data sparsity problem. In [21], Wang etal. proposed a generative probabilistic framework to exploitmore of the data available in the user-item matrix by fusingall ratings with a predictive value for a recommendation tobe made. Xue et al. [22] proposed a framework for collaborative filtering which combines the strengths of memorybased approaches and model-based approaches by introducing a smoothing-based method, and solved the data sparsityproblem by predicting all the missing data in a user-item matrix. Although the simulation showed that this approach canachieve better performance than other collaborative filteringalgorithms, the cluster-based smoothing algorithm limitedthe diversity of users in each cluster and predicting all themissing data in the user-item matrix could bring negativeinfluence for the recommendation of active users.In this paper, we first use PCC-based significance weighting to compute similarity between users and items, whichovercomes the potential decrease of similarity accuracy. Second, we propose an effective missing data prediction algo-1http://www.amazon.com/.2http://www.half.ebay.com/.SIGIR 2007 Proceedings Session 2: Routing and Filtering39rithm which exploits the information both from users anditems. Moreover, this algorithm will predict the missingdata of a user-item matrix if and only if we think it will bringpositive influence for the recommendation of active usersinstead of predicting every missing data of the user-itemmatrix. The simulation shows our novel approach achievesbetter performance than other state-of-the-art collaborativefiltering approaches.The remainder of this paper is organized as follows. InSection 2, we provide an overview of several major approachesfor collaborative filtering. Section 3 shows the method ofsimilarity computation. The framework of our missing dataprediction and collaborative filtering is introduced in Section 4. The results of an empirical analysis are presented inSection 5, followed by a conclusion in Section 6.2. RELATED WORKIn this section, we review several major approaches forcollaborative filtering. Two types of collaborative filteringapproaches are widely studied: memory-based and modelbased.2.1 Memory-based approachesThe memory-based approaches are the most popular prediction methods and are widely adopted in commercial collaborative filtering systems [12, 16]. The most analyzed examples of memory-based collaborative filtering include userbased approaches [2, 7, 10, 22] and item-based approaches [5,12, 17]. User-based approaches predict the ratings of activeusers based on the ratings of similar users found, and itembased approaches predict the ratings of active users basedon the information of similar items computed. User-basedand item-based approaches often use PCC algorithm [16]and VSS algorithm [2] as the similarity computation methods. PCC-based collaborative filtering generally can achievehigher performance than the other popular algorithm VSS,since it considers the differences of user rating styles.2.2 Model-based ApproachesIn the model-based approaches, training datasets are usedto train a predefined model. Examples of model-based approaches include clustering models [11, 20, 22], aspect models [8, 9, 19] and latent factor model [3]. [11] presentedan algorithm for collaborative filtering based on hierarchicalclustering, which tried to balance robustness and accuracy ofpredictions, especially when little data were available. Authors in [8] proposed an algorithm based on a generalization of probabilistic latent semantic analysis to continuousvalued response variables. The model-based approaches areoften time-consuming to build and update, and cannot coveras diverse a user range as the memory-based approachesdo [22].2.3 Other Related ApproachesIn order to take the advantages of memory-based andmodel-based approaches, hybrid collaborative filtering methods have been studied recently [14, 22]. [1, 4] unified collaborative filtering and content-based filtering, which achievedsignificant improvements over the standard approaches. Atthe same time, in order to solve the data sparsity problem,researchers proposed dimensionality reduction approachesin [15]. The dimensionality-reduction approach addressedthe sparsity problem by deleting unrelated or insignificantusers or items, which would discard some information of theuser-item matrix.3. SIMILARITY COMPUTATIONThis section briefly introduces the similarity computationmethods in traditional user-based and item-based collaborative filtering [2, 5, 7, 17] as well as the method proposed inthis paper. Given a recommendation system consists of Musers and N items, the relationship between users and itemsis denoted by an M ×N matrix, called the user-item matrix.Every entry in this matrix rm,n represents the score value,r, that user m rates an item n, where r ∈ {1, 2, …, rmax}. Ifuser m does not rate the item n, then rm,n = 0.3.1 Pearson Correlation CoefficientUser-based collaborative filtering engaging PCC was usedin a number of recommendation systems [18], since it canbe easily implemented and can achieve high accuracy whencomparing with other similarity computation methods. Inuser-based collaborative filtering, PCC is employed to definethe similarity between two users a and u based on the itemsthey rated in common:Sim(a, u) =!i∈I(a)∩I(u)(ra,i – ra) · (ru,i – ru) “!(ra,i – ra)2 · “!(ru,i – ru)2i∈I(a)∩I(u)i∈I(a)∩I(u) ,(1)where Sim(a, u) denotes the similarity between user a anduser u, and i belongs to the subset of items which user a anduser u both rated. ra,i is the rate user a gave item i, and rarepresents the average rate of user a. From this definition,user similarity Sim(a, u) is ranging from [0, 1], and a largervalue means users a and u are more similar.Item-based methods such as [5, 17] are similar to userbased approaches, and the difference is that item-based methods employ the similarity between the items instead of users.The basic idea in similarity computation between two itemsi and j is to first isolate the users who have rated both ofthese items and then apply a similarity computation technique to determine the similarity Sim(i, j) [17]. The PCCbased similarity computation between two items i and j canbe described as:Sim(i, j) =!u∈U(i)∩U(j)(ru,i – ri) · (ru,j – rj) “!(ru,i – ri)2 · “!(ru,j – rj)2u∈U(i)∩U(j)u∈U(i)∩U(j) ,(2)where Sim(i, j) is the similarity between item i and item j,and u belongs to the subset of users who both rated itemi and item j. ru,i is the rate user u gave item i, and rirepresents the average rate of item i. Like user similarity,item similarity Sim(i, j) is also ranging from [0, 1].3.2 Significance WeightingPCC-based collaborative filtering generally can achievehigher performance than other popular algorithms like VSS [2],since it considers the factor of the differences of user ratingstyles. However PCC will overestimate the similarities ofusers who happen to have rated a few items identically, butmay not have similar overall preferences [13]. Herlocker etSIGIR 2007 Proceedings Session 2: Routing and Filtering40al. [6, 7] proposed to add a correlation significance weighting factor that would devalue similarity weights that werebased on a small number of co-rated items. Herlocker’s latest research work [13] proposed to use the following modifiedsimilarity computation equation:Sim′(a, u) = Max(|Ia ∩Iu|, γ)γ· Sim(a, u). (3)This equation overcomes the problem when only few itemsare rated in common but in case that when |Ia ∩Iu| is muchhigher than γ, the similarity Sim′(a, u) will be larger than 1,and even surpass 2 or 3 in worse cases. We use the followingequation to solve this problem:Sim′(a, u) = Min(|Ia ∩Iu|, γ)γ· Sim(a, u), (4)where |Ia ∩ Iu| is the number of items which user a anduser u rated in common. This change bounds the similaritySim′(a, u) to the interval [0, 1]. Then the similarity betweenitems could be defined as:Sim′(i, j) = Min(|Ui ∩Uj|, δ)δ · Sim(i, j), (5)where |Ui ∩Uj| is the number of users who rated both itemi and item j.4. COLLABORATIVE FILTERINGFRAMEWORKIn pratice, the user-item matrix of commercial recommendation system is very sparse and the density of availableratings is often less than 1% [17]. Sparse matrix directlyleads to the prediction inaccuracy in traditional user-basedor item-based collaborative filtering. Some work appliesdata smoothing methods to fill the missing values of theuser-item matrix. In [22], Xue et al. proposed a clusterbased smoothing method which clusters the users using Kmeans first, and then predicts all the missing data basedon the ratings of Top-N most similar users in the similarclusters. The simulation shows this method could generatebetter results than other collaborative filtering algorithms.But cluster-based method limits the diversity of users ineach cluster, and the clustering results of K-means relies onthe pre-selected K users. Furthermore, if a user does nothave enough similar users, then Top-N algorithm generatesa lot of dissimilar users which definitely will decrease theprediction accuracy of the active users.According to the analysis above, we propose a novel effective missing data prediction algorithm which predicts themissing data when it fits the criteria we set. Otherwise, wewill not predict the missing data and keep the value of themissing data to be zero. As illustrated in Fig. 1(a), beforewe predict the missing data, the user-item matrix is a verysparse matrix and every user only rates few items with ru,i;at the same time, other unrated data are covered with shade.Using this sparse matrix to predict ratings for active usersalways results in giving bad recommendations to the active users. In our approach, we evaluate every shaded block(missing data) using the available information in Fig. 1(a).For every shaded block, if our algorithm achieves confidencein the prediction, then we give this shaded block a predictedrating value r#u,i. Otherwise, we set the value of this missingdata to zero, as seen in Fig. 1(b).Accordingly, the collaborative filtering is simplified intotwo simple questions. The first is “Under what circumstancedoes our algorithm have confidence to predict the shadedblock?” and the second is “How to predict?”. The followingsubsections will answer these two questions.4.1 Similar Neighbors SelectionSimilar neighbors selection is a very important step inpredicting missing data. If selected neighbors are dissimilarwith the current user, then the prediction of missing dataof this user is inaccurate and will finally affect the prediction results of the active users. In order to overcome theflaws of Top-N neighbors selection algorithms, we introducea threshold η. If the similarity between the neighbor and thecurrent user is larger than η, then this neighbor is selectedas the similar user.For every missing data ru,i, a set of similar users S(u)towards user u can be generated according to:S(u) = {ua|Sim′(ua, u) > η, ua = u}, (6)where Sim′(ua, u) is computed using Eq. (4). At the sametime, for every missing data ru,i, a set of similar items S(i)towards item i can be generated according to:S(i) = {ik|Sim′(ik, i) > θ, ik = i}, (7)where θ is the item similarity threshold, and Sim′(ik, i) iscomputed by Eq. (5). The selection of η and θ is an important step since a very big value will always cause theshortage of similar users or items, and a relative small valuewill bring too many similar users or items.According to Eqs.(6) and (7), we define that our algorithmwill lack enough confidence to predict the missing data ru,i ifand only if S(u) = ∅∧ S(i) = ∅, which means that user u doesnot have similar users and item i does not have similar itemseither. Then our algorithm sets the value of this missingdata to zero. Otherwise, it will predict the missing data ru,ifollowing the algorithm described in Subsection Missing Data PredictionUser-based collaborative filtering predicts the missing datausing the ratings of similar users and item-based collaborative filtering predicts the missing data using the ratings ofsimilar items. Actually, although users have their own rating style, if an item is a very popular item and has obtaineda very high average rating from other users, then the active user will have a high probability to give this item agood rating too. Hence, predicting missing data only usinguser-based approaches or only using item-based approacheswill potentially ignore valuable information that will makethe prediction more accurate. We propose to systematicallycombine user-based and item-based approaches, and takeadvantage of user correlations and item correlations in theuser-item matrix.Given the missing data ru,i, according to Eq. (6) andEq. (7), if S(u) = ∅ ∧ S(i) = ∅, the prediction of missingSIGIR 2007 Proceedings Session 2: Routing and Filtering41i1 i2 i3 i4 i5 i6 i7 i8 inu1u 2u 3u 4u 5u m r1,1r1,4r2,2r2,8r3,6r4 ,4r4 ,nr5,3r5,7r6 ,9rm,nrm,2 u 6i9i1 i2 i3 i4 i5 i6 i7 i8 inu1u 2u 3u 4u 5u m r1,1r1,4rˆ1,3rˆ1,6rˆ1,8rˆ1,9r2,2r2,8rˆ2,4rˆ2,5rˆ2,7rˆ2,nr3,6rˆ3,1rˆ3,3rˆ3,4rˆ3,5rˆ3,8rˆ3,9r4 ,4r4 ,nrˆ4 ,1rˆ4 ,2rˆ4 ,5rˆ4 ,6rˆ4 ,7rˆ4 ,9r5,3r5,7rˆ5,1rˆ5,2rˆ5,5rˆ5,8rˆ5,9rˆ5,nr6 ,9rˆ6 ,1rˆ6 ,2rˆ6 ,4rˆ6 ,5rˆ6 ,6rˆ6 ,7rˆ6 ,nrm,2rm,nrˆm,1rˆm,4rˆm,6rˆm,8rˆm,9 u 6i9Figure 1: (a) The user-item matrix (m × n) before missing data prediction. (b) The user-item matrix (m × n)after missing data prediction.data P(ru,i) is defined as:P(ru,i) = λ × (u +!ua∈S(u)Sim′(ua, u) · (rua,i – ua)!ua∈S(u)Sim′(ua, u)) +(1 – λ) × (i +!ik∈S(i)Sim′(ik, i) · (ru,ik – ik)!ik∈S(i)Sim′(ik, i)), (8)where λ is the parameter in the range of [0, 1]. The use ofparameter λ allows us to determine how the prediction relieson user-based prediction and item-based prediction. λ = 1states that P(ru,i) depends completely upon ratings fromuser-based prediction and λ = 0 states that P(ru,i) dependscompletely upon ratings from item-based prediction.In practice, some users do not have similar users and thesimilarities between these users and all other users are lessthan the threshold η. Top-N algorithms will ignore thisproblem and still choose the top n most similar users topredict the missing data. This will definitely decrease theprediction quality of the missing data. In order to predictthe missing data as accurate as possible, in case some usersdo not have similar users, we use the information of similaritems instead of users to predict the missing data, and viceversa, as seen in Eq. (9) and Eq. (10). This consideration inspires us to fully utilize the information of user-item matrixas follows:If S(u) = ∅ ∧ S(i) = ∅, the prediction of missing dataP(ru,i) is defined as:P(ru,i) = u +!ua∈S(u)Sim′(ua, u) · (rua,i – ua)!ua∈S(u)Sim′(ua, u). (9)If S(u) = ∅ ∧ S(i) = ∅, the prediction of missing dataP(ru,i) is defined as:P(ru,i) = i +!ik∈S(i)Sim′(ik, i) · (ru,ik – ik)!ik∈S(i)Sim′(ik, i). (10)The last possibility is given the missing data ru,i, user udoes not have similar users and at the same time, item i alsodoes not have similar items. In this situation, we choose notto predict the missing data; otherwise, it will bring negativeinfluence to the prediction of the missing data ru,i. That is:If S(u) = ∅ ∧ S(i) = ∅, the prediction of missing dataP(ru,i) is defined as:P(ru,i) = 0. (11)This consideration is different from all other existing prediction or smoothing methods. They always try to predictall the missing data in the user-item matrix, which will predict some missing data with bad quality.4.3 Prediction for Active UsersAfter the missing data is predicted in the user-item matrix, the next step is to predict the ratings for the activeusers. The prediction process is almost the same as predicting the missing data, and the only difference is in the casefor a given active user a; namely, if S(a) = ∅ ∧ S(i) = ∅,then predicts the missing data using the following equation:P(ra,i) = λ × ra + (1 – λ) × ri. (12)In other situations, if (1) S(u) = ∅ ∧ S(i) = ∅, (2) S(u) =∅ ∧ S(i) = ∅ or (3) S(u) = ∅ ∧ S(i) = ∅, we use Eq. (8),Eq. (9) and Eq. (10) to predict ra,i, respectively.4.4 Parameter DiscussionThe thresholds γ and δ introduced in Section 3 are employed to avoid overestimating the users similarity and itemssimilarity, when there are only few ratings in common. If weset γ and δ too high, most of the similarities between usersor items need to be multiplied with the significance weight,and it is not the results we expect. However, if we set γ andδ too low, it is also not reasonable because the overestimateproblem still exists. Tuning these parameters is importantto achieving a good prediction results.The thresholds η and θ introduced in Section 4.1 also playan important role in our collaborative filtering algorithm. Ifη and θ are set too high, less missing data need to be predicted; if they are set too low, a lot of missing data needto be predicted. In the case when η = 1 and θ = 1, ourapproach will not predict any missing data, and this algorithm becomes the general collaborative filtering withoutdata smoothing. In the case when η = 0 and θ = 0, our approach will predict all the missing data, and this algorithmSIGIR 2007 Proceedings Session 2: Routing and Filtering42Table 1: The relationship between parameters with other CF approaches LambdaEtaThetaRelated CF Approaches111User-based CF without missing data prediction011Item-based CF without missing data prediction100User-based CF with all the missing data predicted000Item-based CF with all the missing data predicted converges to the Top-N neighbors selection algorithms, except the number N here includes all the neighbors. In orderto simplify our model, we set η = θ in all the simulations.Finally, parameter λ introduced in Section 4.2 is the lastparameter we need to tune, and it is also the most importantone. λ determines how closely the rating prediction relies onuser information or item information. As discussed before,λ = 1 states that P(ru,i) depends completely upon ratingsfrom user-based prediction and λ = 0 states that P(ru,i)depends completely upon ratings from item-based prediction. This physical interpretation also helps us to tune λaccordingly.With the changes of parameters, several other famous collaborative filtering methods become special cases in our approach as illustrated in Table 1.5. EMPIRICAL ANALYSISWe conduct several experiments to measure the recommendation quality of our new approach for collaborative filtering with other methods, and address the experiments asthe following questions: (1) How does our approach comparewith traditional user-based and item-based collaborative filtering methods? (2) What is the performance comparisonbetween our effective missing data prediction approach andother algorithms which predict every missing data? (3) Howdoes significance weighting affect the accuracy of prediction?(4) How do the thresholds η and θ affect the accuracy of prediction? How many missing data are predicted by our algorithm, and what is the comparison of our algorithm withthe algorithms that predict all the missing data or no missing data? (5) How does the parameter λ affect the accuracyof prediction? and (6) How does our approach comparewith the published state-of-the-art collaborative filtering algorithms?In the following, Section 5.3 gives answers to questions 1and 6, Section 5.4 addresses question 2, and Section 5.5describes experiment for the questions 3 to 5.5.1 DatasetTwo datasets from movie rating are applied in our experiments: MovieLens3 and EachMovie4. We only reportthe simulation results of MovieLens due to the space limitation. Similar results can be observed from the EachMovieapplication.MovieLens is a famous Web-based research recommendersystem. It contains 100,000 ratings (1-5 scales) rated by 943users on 1682 movies, and each user at least rated 20 movies.The density of the user-item matrix is:100000943 × 1682= 6.30%.3http://www.cs.umn.edu/Research/GroupLens/.4http://www.research.digital.com/SRC/EachMovie/. It isretired by Hewlett-Packard (HP), but a postprocessed copycan be found on http://guir.berkeley.edu/projects/swami/.Table 2: Statistics of Dataset MovieLens StatisticsUserItemMin. Num. of Ratings201Max. Num. of Ratings737583Avg. Num. of Ratings106.0459.45 Table 3: MAE comparison with other approaches (Asmaller MAE value means a better performance). Training UsersMethodsGiven5Given10Given20EMDP0.7840.7650.755MovieLens 300UPCC0.8380.8140.802IPCC0.8700.8380.813EMDP0.7960.7700.761MovieLens 200UPCC0.8430.8220.807IPCC0.8550.8340.812EMDP0.8110.7780.769MovieLens 100UPCC0.8760.8470.811IPCC0.8900.8500.824 The statistics of dataset MovieLens is summarized in Table 2.We extract a subset of 500 users from the dataset, anddivide it into two parts: select 300 users as the trainingusers (100, 200, 300 users respectively), and the rest 200users as the active (testing) users. As to the active users,we vary the number of rated items provided by the activeusers from 5, 10, to 20, and give the name Given5, Given10and Given20, respectively.5.2 MetricsWe use the Mean Absolute Error (MAE) metrics to measure the prediction quality of our proposed approach withother collaborative filtering methods. MAE is defined as:MAE =$u,i |ru,i – r#u,i|N , (13)where ru,i denotes the rating that user u gave to item i, andr#u,i denotes the rating that user u gave to item i which ispredicted by our approach, and N denotes the number oftested ratings.5.3 ComparisonIn order to show the performance increase of our effectivemissing data prediction (EMDP) algorithm, we compare ouralgorithm with some traditional algorithms: user-based algorithm using PCC (UPCC) and item-based algorithm usingPCC (IPCC). The parameters or thresholds for the experiments are empirically set as follows: λ = 0.7, γ = 30, δ = 25,η = θ = 0.4.In Table 3, we observe that our new approach significantlyimproves the recommendation quality of collaborative filtering, and outperforms UPCC and IPCC consistently.SIGIR 2007 Proceedings Session 2: Routing and Filtering43Table 4: MAE comparison with state-of-the-arts algorithms (A smaller MAE value means a better performance). Num. of Training Users100200300Ratings Given for Active Users510205102051020EMDP0.8070.7690.7650.7930.7600.7510.7880.7540.746SF0.8470.7740.7920.8270.7730.7830.8040.7610.769SCBPCC0.8480.8190.7890.8310.8130.7840.8220.8100.778AM0.9630.9220.8870.8490.8370.8150.8200.8220.796PD0.8490.8170.8080.8360.8150.7920.8270.8150.789PCC0.8740.8360.8180.8590.8290.8130.8490.8410.820 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10.740.750.760.770.780.790.80.810.820.83LambdaMAEEMDP-Given20PEMD-Given20EMDP-Given10PEMD-Given10EMDP-Given5PEMD-Given5Figure 2: MAE Comparison of EMDP and PEMD(A smaller MAE value means a better performance).Next, in order to compare our approach with other stateof-the-arts algorithms, we follow the exact evaluation procedures which were described in [21, 22] by extracting asubset of 500 users with more than 40 ratings. Table 4summarizes our experimental results. We compare with thefollowing algorithms: Similarity Fusion (SF) [21], Smoothing and Cluster-Based PCC (SCBPCC) [22], the AspectModel (AM) [9], Personality Diagnosis (PD) [14] and theuser-based PCC [2]. Our method outperforms all other competitive algorithms in various configurations.5.4 Impact of Missing Data PredictionOur algorithm incorporates the option not to predict themissing data if it does not meet the criteria set in Section4.1 and Section 4.2. In addition, it alleviates the potentialnegative influences from bad prediction on the missing data.To demonstrate the effectiveness of our approach, we firstconduct a set of simulations on our effective missing dataprediction approach. The number of training users is 300,where we set γ = 30, δ = 25, η = θ = 0.5, and vary λfrom zero to one with a step value of 0.05. We then plot thegraph with the ratings of active users of Given5, Given10and Given20, respectively. As to the method in predictingevery missing data (PEMD), we use the same algorithm,and keep the configurations the same as EMDP except forEq. (11). In PEMD, when S(u) = ∅ and S(i) = ∅, wepredict the missing data ru,i using the nearest neighbors ofthe missing data instead of setting the value to zero. Inthis experiment, we set the number of nearest neighborsto 10. The intention of this experiment is to compare theperformance of our EMDP algorithm with PEMD under thesame configurations. In other words, we intend to determinethe effectiveness of our missing data prediction algorithm,and whether our approach is better than the approach whichwill predict every missing data or not.In Fig. 2, the star, up triangle, and diamond in solid linerepresent the EMDP algorithm in Given20, Given10 andGiven5 ratings respectively, and the circle, down triangle,and square in dashed line represent the PEMD algorithm inGiven20, Given10 and Given5 ratings respectively. All thesolid lines are below the respectively comparative dashedlines, indicating our effective missing data prediction algorithm performs better than the algorithm which predict every missing data, and predicting missing data selectively isindeed a more effective method.5.5 Impact of Parameters5.5.1 γ and δ in Significance WeightingSignificance weighting makes the similarity computationmore reasonable in practice and devalues some similaritieswhich look similar but are actually not, and the simulationresults in Fig. 3 shows the significance weighting will promote the collaborative filtering performance.In this experiment, we first evaluate the influence of γ,and select 300 training users, then set λ = 0.7, η = θ = 0.5,δ = 26. We vary the range of γ from 0 to 50 with a step valueof 2. Fig. 3(a),(b),(c) shows how γ affects MAE when givenratings 20, 10, 5 respectively, and Fig. 3(d) shows that thevalue of γ also impacts the density of the user-item matrixin the process of missing data prediction. The density ofthe user-item matrix will decrease according to the increaseof the value of γ. More experiments show that δ has thesame features and impacts on MAE and matrix density asγ; however, we do not include the simulation results due tothe space limitation.5.5.2 Impact of λParameter λ balances the information from users and items.It takes advantages from these two types of collaborative filtering methods. If λ = 1, we only extract information fromusers, and if λ = 0, we only mine valuable information fromitems. In other cases, we fuse information from users anditems to predict the missing data and furthermore, to predict for active users.Fig. 4 shows the impacts of λ on MAE. In this experiment,we test 300 training users, 200 training users and 100 training users and report the experiment results in Fig. 4(a),Fig. 4(b) and Fig. 4(c) respectively. The initial values ofother parameters or thresholds are: η = θ = 0.5, γ = 30,δ = 25.SIGIR 2007 Proceedings Session 2: Routing and Filtering440 10 20 30 40 500.74640.74660.74680.7470.74720.74740.74760.74780.748GammaMAEGiven200 10 20 30 40 500.75380.7540.75420.75440.75460.75480.7550.75520.75540.75560.7558GammaMAEGiven100 10 20 30 40 500.7750.77550.7760.77650.7770.77750.7780.77850.779GammaMAE Given5 0 10 20 30 40 500.860.880.90.920.940.960.981GammaDensity(a) (b) (c) (d)Figure 3: Impact of Gamma on MAE and Matrix Density0 0.2 0.4 0.6 0.8 10.740.760.780.80.820.84LambdaMAETraining Users = 300Given20Given10Given50 0.2 0.4 0.6 0.8 10.740.760.780.80.820.84LambdaMAETraining Users = 200Given20Given10Given50 0.2 0.4 0.6 0.8 10.760.780.80.820.840.86LambdaMAETraining Users = 100Given20Given10Given5(a) (b) (c)Figure 4: Impact of Lambda on MAE0 0.2 0.4 0.6 0.8 10.740.760.780.80.820.84Eta and ThetaMAETraining Users = 300Given20Given10Given50 0.2 0.4 0.6 0.8 and ThetaDensityTraining Users = 3000 0.2 0.4 0.6 0.8 10.740.760.780.80.820.84Eta and ThetaMAETraining Users = 200Given20Given10Given50 0.2 0.4 0.6 0.8 and ThetaDensityTraining Users = 2000 0.2 0.4 0.6 0.8 10.760.780.80.820.84Eta and ThetaMAETraining Users = 100Given20Given10Given50 0.2 0.4 0.6 0.8 and ThetaDensityTraining Users = 100(a) (b) (c)(d) (f) (g)Figure 5: Impact of Eta and Theta on MAE and DensitySIGIR 2007 Proceedings Session 2: Routing and Filtering45Observed from Fig. 4, we draw the conclusion that thevalue of λ impacts the recommendation results significantly,which demonstrates that combining the user-based methodwith the item-based method will greatly improve the recommendation accuracy. Another interesting observation iswhen following the increase of the number of ratings given(from 5 to 10, and from 10 to 20), the value of arg minλ(MAE)of each curve in Fig. 4 shifts from 0.3 to 0.8 smoothly. Thisimplies the information for users is more important thanthat for items if more ratings for active users are given. Onthe other hand, the information for items would be more important if less ratings for active users are available; however,less ratings for active users will lead to more inaccuracy ofthe recommendation results.5.5.3 Impact of η and θη and θ also play a very important role in our collaborative filtering approach. As discussed in Section 4, η and θdirectly determine how many missing data need to be predicted. If η and θ are set too high, most of the missing datacannot be predicted since many users will not have similarusers, and many items will not have similar items either. Onthe other hand, if η and θ are set too low, every user or itemwill obtain too many similar users or items, which causes thecomputation inaccuracy and increases the computing cost.Accordingly, selecting proper values for η and θ is as critical as determining the value for λ. In order to simplify ourmodel, we set η = θ as employed in our experiments.In the next experiment, we select 500 users from MovieLens dataset and extract 300 users for training users andother 200 as the active users. The initial values for everyparameter and threshold are: λ = 0.7, γ = 30, δ = 25. Wevary the values of η and θ from 0 to 1 with a step value of0.05. For each training user set (100, 200, 300 users respectively), we compute the MAE and density of the user-itemmatrix. The results are showed in Fig. 5.As showed in Fig. 5(a), given 300 training users and given20 ratings for every active user, this algorithm will achievethe best performance around η = θ = 0.50, and the relateddensity of user-item matrix in Fig. 5(d) is 92.64% whichshows that 7.36% missing data of this user-item matrix arenot predicted. In this experiment, the number of data thatwas not predicted is 0.0736×500×1000 = 36800. We observethat around η = θ = 0.70, this algorithm already achievesa very good MAE value which is almost the same as thebest MAE values in Fig. 5(b). The related matrix density is29.00%, which illustrates that more than 70% data of useritem matrix are not predicted. Nevertheless, the algorithmcan already achieve satisfactory performance.6. CONCLUSIONSIn this paper, we propose an effective missing data prediction algorithm for collaborative filtering. By judgingwhether a user (an item) has other similar users (items), ourapproach determines whether to predict the missing dataand how to predict the missing data by using information ofusers, items or both. Traditional user-based collaborative filtering and item-based collaborative filtering approaches aretwo subsets of our new approach. Empirical analysis showsthat our proposed EMDP algorithm for collaborative filtering outperforms other state-of-the-art collaborative filteringapproaches.For future work, we plan to conduct more research onthe relationship between user information and item information since our simulations show the algorithm combiningthese two kinds of information generates better performance.Lastly, another research topic worthy of studying is the scalability analysis of our algorithm.7. ACKNOWLEDGMENTSWe thank Mr. Shikui Tu, Ms. Tu Zhou and Mr. Haixuan Yang for many valuable discussions on this topic. Thiswork is fully supported by two grants from the ResearchGrants Council of the Hong Kong Special AdministrativeRegion, China (Project No. CUHK4205/04E and ProjectNo. CUHK4235/04E).8. REFERENCES[1] J. Basilico and T. Hofmann. Unifying collaborative andcontent-based filtering. In Proc. of ICML, 2004.[2] J. S. Breese, D. Heckerman, and C. Kadie. Empirical analysisof predictive algorithms for collaborative filtering. In Proc. ofUAI, 1998.[3] J. Canny. Collaborative filtering with privacy via factoranalysis. In Proc. of SIGIR, 2002.[4] M. Claypool, A. Gokhale, T. Miranda, P. Murnikov, D. Netes,and M. Sartin. Combining content-based and collaborativefilters in an online newspaper. In Proc. of SIGIR, 1999.[5] M. Deshpande and G. Karypis. Item-based top-nrecommendation. ACM Trans. Inf. Syst., 22(1):143–177, 2004.[6] J. Herlocker, J. A. Konstan, and J. Riedl. An empiricalanalysis of design choices in neighborhood-based collaborativefiltering algorithms. Information Retrieval, 5:287–310, 2002.[7] J. L. Herlocker, J. A. Konstan, A. Borchers, and J. Riedl. Analgorithmic framework for performing collaborative filtering. InProc. of SIGIR, 1999.[8] T. Hofmann. Collaborative filtering via gaussian probabilisticlatent semantic analysis. In Proc. of SIGIR, 2003.[9] T. Hofmann. Latent semantic models for collaborativefiltering. ACM Trans. Inf. Syst., 22(1):89–115, 2004.[10] R. Jin, J. Y. Chai, and L. Si. An automatic weighting schemefor collaborative filtering. In Proc. of SIGIR, 2004.[11] A. Kohrs and B. Merialdo. Clustering for collaborative filteringapplications. In Proc. of CIMCA, 1999.[12] G. Linden, B. Smith, and J. York. Amazon.comrecommendations: Item-to-item collaborative filtering. IEEEInternet Computing, pages 76–80, Jan/Feb 2003.[13] M. R. McLaughlin and J. L. Herlocker. A collaborativefiltering algorithm and evaluation metric that accurately modelthe user experience. In Proc. of SIGIR, 2004.[14] D. M. Pennock, E. Horvitz, S. Lawrence, and C. L. Giles.Collaborative filtering by personality diagnosis: A hybridmemory- and model-based approach. In Proc. of UAI, 2000.[15] J. D. M. Rennie and N. Srebro. Fast maximum margin matrixfactorization for collaborative prediction. In Proc. of ICML,2005.[16] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl.Grouplens: An open architecture for collaborative filtering ofnetnews. In Proc. of ACM Conference on ComputerSupported Cooperative Work, 1994.[17] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-basedcollaborative filtering recommendation algorithms. In Proc. ofthe WWW Conference, 2001.[18] U. Shardanand and P. Maes. Social information filtering:Algorithms for automating word of mouth. In Proc. of SIGCHIConference on Human Factors in Computing Systems, 1995.[19] L. Si and R. Jin. Flexible mixture model for collaborativefiltering. In Proc. of ICML, 2003.[20] L. H. Ungar and D. P. Foster. Clustering methods forcollaborative filtering. In Proc. Workshop onRecommendation System at the 15th National Conf. onArtificial Intelligence, 1998.[21] J. Wang, A. P. de Vries, and M. J. Reinders. Unifyinguser-based and item-based collaborative filtering approachesby similarity fusion. In Proc. of SIGIR, 2006.[22] G.-R. Xue, C. Lin, Q. Yang, W. Xi, H.-J. Zeng, Y. Yu, andZ. Chen. Scalable collaborative filtering using cluster-basedsmoothing. In Proc. of SIGIR, 2005.SIGIR 2007 Proceedings Session 2: Routing and Filtering46View publication stats


Leave a Reply

Your email address will not be published. Required fields are marked *