A Collaborative Filtering Recommendation Algorithm Based on Interest Forgetting Curve

Abstract

The collaborative filtering algorithm is one of the most successful technologies used in personalized recommendation systems. However, traditional algorithms focus only on user ratings and do not take changes of user interest into account, which seriously harms recommendation quality. Based on experiment, this paper first explores the law of user interest change, the interest forgetting curve. Then, the most recently rated items are used to represent the user's current interest; for each historically visited item, an integrated data weight is calculated based on the interest forgetting curve and the rating matrix; for each item the user has not scored, a prediction is calculated based on item similarity and the items' integrated data weights. When calculating item similarity, this paper combines item attribute similarity and item score similarity, which is more comprehensive and accurate. The experimental results show that the proposed algorithm provides better recommendation precision and recall.

Keywords: collaborative filtering (CF), similarity, interest forgetting curve

1. Introduction

With the development of information technology, e-commerce has become an integral aspect of doing business. Because of the convenience of the internet, a tremendous amount of product-related information is available to customers at very low cost, which makes it hard for users to choose products according to their preferences. This is called information overload. To address the issue, recommendation systems have been widely used at many large e-commerce sites to suggest products and services to potential customers. For example, companies such as A, N, H, and CDNOW have successfully implemented commercial recommendation systems [1].

Two main technologies are usually adopted in personalized service systems: content-based filtering and collaborative filtering (CF). Content-based filtering methods provide recommendations by comparing the similarity between items and the user's interest in the same feature space [2].
In contrast, collaborative filtering methods offer recommendations to users based on their previously expressed preferences and those of other similar users [3]. Collaborative filtering is one of the most promising recommendation techniques. The traditional collaborative filtering system is user-based. First, it calculates user similarity from the users' historic rating matrix over items; then it ranks the similarities and selects the biggest M users to construct the nearest-neighbor set; then it combines these M nearest neighbors' ratings on the target item, with the similarities as weights, to estimate the user's rating; at last it recommends items by these estimated ratings [4]. However, in the traditional CF algorithm, user interest is considered static. That means ratings produced at different times are weighted equally, and changes in user purchase interest are not taken into consideration. For this reason, the system may recommend unsatisfactory items once a user's interest has changed. To solve this problem, several researchers have brought the time factor into improved CF algorithms.

Zhimin Chen et al. [5] considered the changes of user interest and the credibility of rating data, and gave each user rating a weight by gradual time decrease and credit assessment in the course of user similarity measurement. Xing Chunxiao et al. [6] proposed a time-based data weight and an item-similarity-based data weight to adaptively track the change of user interest. They used the linear function y4(x) = 48.6 - 1.09x as the time weight function, shown as curve y4(x) in figure 1. Zhang Y C et al. [7] put forward the concept of the time window, divided users' rating history into several periods, analyzed users' interest distribution in these periods and quantized every user's interest. They used the exponential function y3(x) = 50 + 50e^(-x) as the time function, shown as curve y3(x) in figure 1. Yu Hong et al. [8] divided user interest into long-term interest and short-term interest and proposed to adapt to and trace the drifting of user interest based on the H. Ebbinghaus forgetting curve, shown as curve y6(x) in figure 1; from y6(x) they obtained the fitted power function y5(x) = 31.8x^(-0.125), shown as curve y5(x) in figure 1.

Generally, recently rated items play a more important role in predicting a user's current interest, while early rated data contribute relatively little to the final recommendation. Practice shows that all of the above improved algorithms enhance recommendation accuracy to some degree. But in all of these works, no matter which kind of time weight function is employed, the function itself is not verified by experiments; it is only an assumption made by the researchers.

The H. Ebbinghaus forgetting law points out that forgetting begins immediately after study, and that the process of forgetting is not uniform: it is fast at first and then slows down gradually; retention and forgetting are a function of time. In the laboratory, Ebbinghaus used meaningless syllables as memory material and drew the experimental data as a curve, now called the H. Ebbinghaus forgetting curve. Later, other researchers repeated the same experiment with other memory materials, such as prose and rhythmic poems. The results show that different memory materials have different forgetting curves: prose is forgotten more slowly than meaningless syllables, and rhythmic poems even more slowly than the other two.

Figure 1. Different forgetting curves

In a collaborative filtering system, the historically visited items represent the user's interest. Because interest is retained, the user tends to choose items similar to the historically visited ones. Because interest is forgotten, the items chosen by the user later become more and more dissimilar to a given historically visited one as time goes on. This is the law of forgetting for "interest", the special memory material. This paper explores the interest forgetting law by experiment, then modifies the traditional collaborative filtering algorithm to take the change of user interest into account while recommending items, so as to improve the system's prediction quality.

2. Items similarity

An accurate similarity calculation method is the foundation of a successful CF recommendation system. The traditional CF algorithm calculates similarity based only on item scores. In our recommendation system, we propose a more comprehensive similarity calculation, from two aspects: item score similarity and item attribute similarity.

2.1. Item score similarity

There are m users U1, U2, ..., Um and n items I1, I2, ..., In in a CF recommendation system. We use r(p,q) to denote user Up's score on item Iq, 1 <= p <= m, 1 <= q <= n. Usually we use an m x n matrix Rm*n to express the user-item score set.

There are many similarity calculation methods, such as cosine similarity, Pearson correlation similarity and conditional probability [6]. This paper chose conditional probability to calculate item score similarity. For items Ii and Ij, considering that similarity is symmetrical, we use the conditional probability that both items are visited by the same users to measure item similarity. The formula for the item score similarity is as below: (1) Where sims(Ii,Ij) is the item score similarity between items Ii and Ij, freq(i) denotes the number of users who rated item Ii, freq(ij) denotes the number of users who rated both item Ii and item Ij, and z is a correction factor that prevents the similarity from becoming too small when freq(i) or freq(j) is too large.

2.2. Item attribute similarity

Generally speaking, a recommendation system must set up a number of attributes to describe items. For example, in the MovieLens experimental dataset, film items carry attributes such as movie title, release date, video release date, IMDb URL, and genre flags such as action, adventure, animation, etc. We choose some of them to construct an attribute set A1, A2, ..., Ak. a(q,t) denotes the value of item Iq on attribute At, 1 <= q <= n, 1 <= t <= k: if item Iq has attribute At, then a(q,t) = 1; otherwise a(q,t) = 0. The formula for the item attribute similarity is as below: (2) Where sima(Ii,Ij) is the item attribute similarity between items Ii and Ij, bt is the weight of attribute At, indicating the importance of At, and iif(a(i,t)=a(j,t),1,0) equals 1 if a(i,t) = a(j,t) and 0 otherwise.

2.3. Item comprehensive similarity

We combine the above item score similarity and item attribute similarity to get the item comprehensive similarity; the computation formula is as follows: (3) Where sim(Ii,Ij) is the item comprehensive similarity between items Ii and Ij, and a is the weight coefficient of the item attribute similarity.

3. Interest forgetting curve

Suppose user Uu visited item Ii at moment T0; this indicates that user Uu had the interest expressed by item Ii. Because this interest is remembered, user Uu tends to select items highly similar to item Ii after moment T0. With the passage of time, because this interest is forgotten, the items selected by user Uu should have lower and lower similarity to item Ii. Suppose user Uu visited item Ij1 at moment T1 = T0 + t1 and item Ij2 at moment T2 = T0 + t2, as shown in figure 2. From a statistical point of view, if t2 > t1 > 0, we should get sim(Ii,Ij1) > sim(Ii,Ij2). We study how sim(Ii,Ij) changes with t by experiment to explore the interest forgetting law.

Figure 2. The order of the user visiting items and item similarity

H. Ebbinghaus pointed out in his forgetting law that forgetting is "fast first, slow later". In the same way, the process of interest forgetting should be "fast first, slow later" too. In order to study the forgetting law of "interest", the special memory material, we applied the "time window" concept put forward by Zhang Y C et al. [7] and built a "short first, long later" time window series, namely dividing the time axis into increasing periods. The length of the first time window is T1 = a, that of the second is T2 = a + q, and each later one increases by a step of q. So the length of the kth time window is Tk = a + (k-1)q; on the time axis it occupies the interval from (k-1)a + (k-1)(k-2)q/2 to ka + k(k-1)q/2, as shown in figure 3.

Figure 3. Time windows

This paper used the MovieLens film rating data supplied by the GroupLens research project (Http:/) as the experimental dataset. The dataset contains 943 users' ratings of 1682 movies; we chose the version with 100,000 rating records for the experiments. We sorted the whole dataset by user ID and rating time, then selected the first 80% as the training set and the last 20% as the test set.

We calculated the comprehensive similarity for all items based on formulas (1), (2) and (3) (here, a = 0.2). For any item pair (Ii,Ij), its similarity is sim(Ii,Ij), and IJ is the set of users who have visited both items. For any user Uk (Uk ∈ IJ), we calculate the time interval between user Uk's visits to the pair, t(Uk,Ii,Ij) = |T(Uk,Ii) - T(Uk,Ij)|, and then, according to the division rule of the time windows, find the corresponding time window TW(Uk,Ii,Ij). For each such item pair (Ii,Ij) and each such user Uk (Uk ∈ IJ), we add the pair's similarity value sim(Ii,Ij) into the corresponding TW(Uk,Ii,Ij), then average the similarity values in each time window and express each average relative to the first time window.

In this way, we got the change curve of sim(Ii,Ij) with t, shown as curve y1(x) in figure 1. Using the software Zgrapher, we obtained the fitting curve of y1(x): y2(x) = 90.27 - 2.04ln(x), shown as curve y2(x) in figure 1. Comparing curves y6(x) and y2(x), we find that the forgetting process of nonsense syllables and that of interest both comply with the "fast first, slow later" rule, but the former is much faster than the latter: after a long time, the retained memory amount of the former stabilizes at about 20%, while that of the latter stabilizes at about 80%.

4. Collaborative filtering algorithm based on interest forgetting curve

The main points of the interest-forgetting-curve-based CF algorithm are as below. In a recommendation system there are m users and n items. For each user Uu there is a historically visited item set u, which represents user Uu's historic interest, and another item set u-T, consisting of the items visited by user Uu in the last period of time T, which represents user Uu's current interest. The algorithm takes the change of user interest into consideration.
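The formula images for (1), (2) and (3) did not survive extraction, so the similarity measures of Section 2 and the fitted forgetting curve can only be sketched under assumptions. The Python sketch below uses one common conditional-probability form for the score similarity, a normalized weighted attribute match for the attribute similarity, the stated blend for the comprehensive similarity, and the fitted curve y2(x) = 90.27 - 2.04ln(x); the exact function forms, parameter defaults and names here are ours, not the paper's.

```python
import math

def score_similarity(freq_i, freq_j, freq_ij, z=0.5):
    """Formula (1), assumed form: conditional-probability score similarity.
    z is the paper's correction factor; here it damps the penalty for
    very frequently rated (popular) items."""
    if freq_i == 0 or freq_j == 0:
        return 0.0
    return freq_ij / (freq_i * freq_j ** z)

def attribute_similarity(attrs_i, attrs_j, weights):
    """Formula (2), assumed normalization: weighted fraction of matching
    binary attribute values a(i,t) == a(j,t), with bt as weights."""
    total = sum(weights)
    matched = sum(w for a, b, w in zip(attrs_i, attrs_j, weights) if a == b)
    return matched / total if total else 0.0

def comprehensive_similarity(sims, sima, alpha=0.2):
    """Formula (3), assumed form: weighted blend, alpha weighting the
    attribute similarity (the experiment uses a = 0.2)."""
    return alpha * sima + (1 - alpha) * sims

def interest_retention(t):
    """Fitted interest forgetting curve y2(x) = 90.27 - 2.04 ln(x),
    read as a retained-interest percentage, clamped to [0, 100]."""
    if t <= 0:
        return 100.0
    return max(0.0, min(100.0, 90.27 - 2.04 * math.log(t)))
```

As in the paper's figure 1, this retention curve decays fast at first and then flattens near 80%, far above the roughly 20% plateau of the Ebbinghaus nonsense-syllable curve.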
For any item Ii (Ii ∈ u), calculate the similarity weight between item Ii and the current interest set u-T from two aspects. First, based on item-item similarity, calculate the average similarity gui between Ii and the items belonging to set u-T. This weight coefficient gui has nothing to do with the time at which user Uu visited item Ii; we name it the similarity-based data weight. Then, calculate the coefficient fui indicating the change of user interest, based on the time at which user Uu visited item Ii and the interest forgetting curve; we name fui the time-based data weight. At last, we compound these two coefficients into the integrated data weight wui between item Ii and item set u-T. After that, for each target item It (It ∉ u), calculate the average of the item similarities between item It and each item Ii (Ii ∈ u), with wui as the weight coefficient, to get the recommendation degree, and recommend items based on the recommendation degree.

4.1. Calculate item similarity and construct the items nearest-neighbor model

In order to produce high-quality recommendations and guarantee real-time requirements, the system does not need to calculate a recommendation degree for every item the user has not rated. Instead, the system calculates item-item similarity off-line, sorts the similarities from high to low, and selects the items with the highest similarity to construct a nearest-neighbor model for each item. Then the system only has to calculate recommendation degrees for items belonging to the nearest-neighbor sets of the target user's historically visited items. In this way, the system workload is reduced and the real-time response performance is improved. The items nearest-neighbor model is shown in table 1.

Table 1. Items nearest-neighbor model

Items | Neighbors and similarity (sorted by similarity, descending)
I1    | I1,1, sim(I1,I1,1); I1,2, sim(I1,I1,2); ...; I1,k, sim(I1,I1,k); ...; I1,nn, sim(I1,I1,nn)
I2    | I2,1, sim(I2,I2,1); I2,2, sim(I2,I2,2); ...; I2,k, sim(I2,I2,k); ...; I2,nn, sim(I2,I2,nn)
...   | ...
In    | In,1, sim(In,In,1); In,2, sim(In,In,2); ...; In,k, sim(In,In,k); ...; In,nn, sim(In,In,nn)

Where In,k is the kth nearest neighbor of item In, and nn is the number of neighbors kept for each item. The algorithm to construct the items nearest-neighbor model is as below.

Algorithm 1.
Step 1: calculate the item score similarity sims(Ii,Ij) between items Ii and Ij;
Step 2: calculate the item attribute similarity sima(Ii,Ij) between items Ii and Ij;
Step 3: calculate the item comprehensive similarity sim(Ii,Ij) between items Ii and Ij;
Step 4: for each item, sort sim(Ii,Ij) from high to low and select the first nn items to construct the nearest-neighbor set.

4.2. Calculate data weights between items and the user's current interest

4.2.1. Similarity-based data weight

Determine the time length T according to the system's actual needs. For each user Uu, select the items visited by Uu during the last T to construct the item set u-T. When the ratings are very sparse, we instead select the n most recently visited items to construct set u-T. We use set u-T to represent user Uu's current interest. For each item Ii (Ii ∈ u) visited by user Uu, calculate the similarity-based data weight gui based on formula (4). (4) Where |u-T| denotes the number of items belonging to set u-T.

4.2.2. Time-based data weight

For user Uu, take the time Tu-last at which user Uu visited the last item as the time benchmark, and let Tui be the time at which user Uu visited item Ii. We calculate item Ii's time-based data weight fui based on the interest forgetting curve and formula (5). (5)

4.2.3. Integrated data weight

For each item Ii (Ii ∈ u) visited by user Uu, we calculate the integrated data weight wui based on formula (6). (6)

4.3. Calculate recommendation degree and recommend items

For any target item It (It ∉ u), based on the similarity between item It and each item Ii (Ii ∈ u) and item Ii's integrated data weight, calculate item It's recommendation degree; sort the recommendation degrees from high to low and select the first N items to construct the top-N recommendation set. We calculate the recommendation degree based on formula (7): (7) Where |u| denotes the number of items in set u. The algorithm to calculate recommendation degrees and construct the recommending set is as below.

Algorithm 2.
Step 1: according to the nearest-neighbor model produced by Algorithm 1, select all neighbors of user Uu's historically visited items Ii (Ii ∈ u) and remove repetitive items to get item set c.
Step 2: remove the items already rated by user Uu from set c to get item set ct, namely ct = c - u. We name ct user Uu's candidate recommending set.
Step 3: for each item It belonging to set ct, calculate the recommendation degree recommend(Uu,It) based on formula (7).
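As with formulas (1)-(3), the images for formulas (4)-(7) are lost, so the data weights and the recommendation degree of Sections 4.2-4.3 can only be sketched under assumptions: gui as the mean similarity to the current-interest set u-T, fui as the fitted curve y2(x) = 90.27 - 2.04ln(x) normalized to (0, 1], wui as the product of the two, and recommend(Uu,It) as the w-weighted average similarity over the visited set u. All function names, and the choice of product and averaging, are illustrative, not the paper's definitive formulas.

```python
import math

def similarity_weight(item, current_set, sim):
    """g_ui, formula (4) (assumed form): average similarity between
    `item` and the items of the current-interest set u-T."""
    if not current_set:
        return 0.0
    return sum(sim(item, j) for j in current_set) / len(current_set)

def time_weight(t_last, t_i):
    """f_ui, formula (5) (assumed form): the interest forgetting curve
    y2(x) = 90.27 - 2.04 ln(x) evaluated at the elapsed time since the
    visit, normalized so a just-visited item gets weight 1.0."""
    dt = max(t_last - t_i, 1.0)  # elapsed time units; at least 1 so ln is defined
    return max(0.0, 90.27 - 2.04 * math.log(dt)) / 90.27

def integrated_weight(g, f):
    """w_ui, formula (6) (assumed form): compound of the two weights."""
    return g * f

def recommendation_degree(target, history, weights, sim):
    """Formula (7) (assumed form): average of sim(It, Ii) over the
    visited set u, weighted by each item's integrated weight w_ui."""
    if not history:
        return 0.0
    return sum(sim(target, i) * weights[i] for i in history) / len(history)
```

Under this sketch, Algorithm 2 would compute recommendation_degree only for the candidate set ct (neighbors of visited items minus the items already rated) and return the top-N items by that score.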