Efficient Recommendation of New Triangular Distance for Multi-channel Feature Vectors

LYU Yalan; ZHANG Hengru; QIN Qin; XU Yuanyuan

doi:10.13718/j.cnki.xdzk.2021.10.003

2021 Volume 43 Issue 10

LYU Yalan, ZHANG Hengru, QIN Qin, et al. Efficient Recommendation of New Triangular Distance for Multi-channel Feature Vectors[J]. Journal of Southwest University Natural Science Edition, 2021, 43(10): 19-28. doi: 10.13718/j.cnki.xdzk.2021.10.003

College of Computer Science, Southwest Petroleum University, Chengdu 610500, China

Corresponding author: ZHANG Hengru
Received Date: 25/05/2021
Available Online: 20/10/2021
MSC: TP391

Abstract

To improve both the accuracy and the efficiency of recommender systems, this paper proposes a new triangular distance recommendation algorithm for multi-channel feature vectors. First, multi-channel feature vectors are extracted from the items' rating matrix. Second, the triangular distance is combined with the Jaccard similarity coefficient to construct a new triangular distance. Finally, this distance is applied in the k-nearest neighbor algorithm to characterize the similarity between two items. Experimental results on four real datasets show that the proposed algorithm runs faster than existing algorithms and performs better on most accuracy metrics.
- multi-channel feature vector
- new triangular distance
- efficient recommendation
- k-nearest neighbor algorithm

References

[1]	EKSTRAND M D. Collaborative Filtering Recommender Systems[J]. Foundations and Trends in Human-Computer Interaction, 2011, 4(2): 81-173. doi: 10.1561/1100000009
[2]	ZHANG S C, LI X L, ZONG M, et al. Efficient kNN Classification with Different Numbers of Nearest Neighbors[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(5): 1774-1785. doi: 10.1109/TNNLS.2017.2673241
[3]	ZHANG S, LIU L X, CHEN Z L, et al. Probabilistic Matrix Factorization with Personalized Differential Privacy[J]. Knowledge-Based Systems, 2019, 183: 104864. doi: 10.1016/j.knosys.2019.07.035
[4]	WEI J, HE J H, CHEN K, et al. Collaborative Filtering and Deep Learning Based Recommendation System for Cold Start Items[J]. Expert Systems With Applications, 2017, 69: 29-39. doi: 10.1016/j.eswa.2016.09.040
[5]	STRANG G. The Discrete Cosine Transform[J]. SIAM Review, 1999, 41(1): 135-147. doi: 10.1137/S0036144598336745
[6]	HU J Y, GAO Z W, PAN W S. Multiangle Social Network Recommendation Algorithms and Similarity Network Evaluation[J]. Journal of Applied Mathematics, 2013, 2013: 248084.
[7]	RICCI F, ROKACH L, SHAPIRA B. Introduction to Recommender Systems Handbook[M]. Springer: Recommender Systems Handbook, 2011: 1-35.
[8]	LI R Z, ZHONG W, ZHU L P. Feature Screening via Distance Correlation Learning[J]. Journal of the American Statistical Association, 2012, 107(499): 1129-1139. doi: 10.1080/01621459.2012.695654
[9]	ZHANG H R, MIN F, ZHANG Z H, et al. Efficient Collaborative Filtering Recommendations with Multi-Channel Feature Vectors[J]. International Journal of Machine Learning and Cybernetics, 2019, 10(5): 1165-1172. doi: 10.1007/s13042-018-0795-8
[10]	ZHANG H R, MIN F, SHI B. Regression-Based Three-Way Recommendation[J]. Information Sciences, 2017, 378: 444-461. doi: 10.1016/j.ins.2016.03.019
[11]	QIAN G, SURAL S, GU Y L, et al. Similarity Between Euclidean and Cosine Angle Distance for Nearest Neighbor Queries[C] // Proceedings of the 2004 ACM Symposium on Applied Computing. New York: ACM Press, 2004: 1232-1237.
[12]	PATRA B K, LAUNONEN R, OLLIKAINEN V, et al. A New Similarity Measure Using Bhattacharyya Coefficient for Collaborative Filtering in Sparse Data[J]. Knowledge-Based Systems, 2015, 82: 163-177. doi: 10.1016/j.knosys.2015.03.001
[13]	CHANG D J, DESOKY A H, MING O Y, et al. Compute Pairwise Manhattan Distance and Pearson Correlation Coefficient of Data Points with GPU[C] // 2009 10th ACIS International Conference on Software Engineering, Artificial Intelligences, Networking and Parallel/Distributed Computing. Daegu, Korea (South): IEEE, 2009: 501-506.
[14]	JIA X Y, LI W W, LIU J Y, et al. Label Distribution Learning by Exploiting Label Correlations[C] // Proceedings of the 32nd AAAI Conference on Artificial Intelligence. Menlo Park, CA: AAAI, 2018: 3310-3317.
[15]	GENG X, HOU P. Pre-Release Prediction of Crowd Opinion on Movies by Label Distribution Learning[C] // IJCAI'15: Proceedings of the 24th International Conference on Artificial Intelligence, 2015: 3511-3517.
[16]	CHA S H. Comprehensive Survey on Distance/Similarity Measures between Probability Density Functions[J]. International Journal of Mathematical Models and Methods in Applied Sciences, 2007, 1(4): 300-307.
[17]	DEZA E, DEZA M M. Distances in Geometry[M] // Dictionary of Distances. Amsterdam: Elsevier, 2006: 62-80.
[18]	COX T, COX M. Multidimensional Scaling[M]. Chapman and Hall/CRC, 2000.
[19]	WANG J, DE VRIES A P, REINDERS M J T. Unified Relevance Models for Rating Prediction in Collaborative Filtering[J]. ACM Transactions on Information Systems, 2008, 26(3): 1-42.
[20]	SARWAR B, KARYPIS G, KONSTAN J, et al. Item-Based Collaborative Filtering Recommendation Algorithms[C] // Proceedings of the Tenth International Conference on World Wide Web. New York: ACM Press, 2001: 285-295.
[21]	WILLMOTT C J, MATSUURA K. Advantages of the Mean Absolute Error (MAE) over the Root Mean Square Error (RMSE) in Assessing Average Model Performance[J]. Climate Research, 2005, 30: 79-82. doi: 10.3354/cr030079
[22]	BERGER A, GUDA S. Threshold Optimization for F Measure of Macro-Averaged Precision and Recall[J]. Pattern Recognition, 2020, 102: 107250. doi: 10.1016/j.patcog.2020.107250

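Recommender systems are currently an effective means of dealing with information overload. Collaborative filtering^[1] is one of the mainstream recommendation algorithms; it uses historical rating data to capture users' preferences for items. Depending on how it is implemented, collaborative filtering can be divided into k-nearest-neighbor-based^[2], matrix-factorization-based^[3], and neural-network-based^[4] algorithms, among others. The k-nearest neighbor approach uses historical ratings to find the k users with similar preferences or the k items with similar attributes^[2]. Distances commonly used to characterize the similarity of users or items include Cosine^[5], PCC (Pearson correlation coefficient)^[6], Jaccard^[7], and CPC (constrained Pearson correlation)^[8]. However, most of these algorithms compute similarity from the global ratings of users or items, which leads to high time complexity and low recommendation efficiency.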

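This paper proposes a new triangular distance recommendation algorithm for multi-channel feature vectors (NTRFC). The input of the algorithm is the multi-channel feature vectors (feature vectors for short) extracted from the original rating matrix, and the new triangular distance is used in the k-nearest neighbor algorithm, which improves recommendation efficiency while maintaining good recommendation accuracy.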

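First, feature vectors are extracted from the original rating matrix; the number of channels equals the number of rating levels in that matrix^[9]. Using them as input effectively reduces the complexity of the algorithm. Suppose the rating matrix has n users, m items, and l rating levels. With the original rating matrix as input, computing similarity has time complexity O(nm), whereas with multi-channel feature vectors as input it is O(lm). Because the number of users n is far larger than the number of rating levels l, O(lm) is far smaller than O(nm). For example, the datasets Amazon (http://snap.stanford.edu/data/web-Amazonlinks.html) and Movielens943u (https://grouplens.org/datasets/movielens/100k/) both use ratings from 1 to 5, so they have 5 channels; that is, the feature vector of each item has length 5.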
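The extraction step can be illustrated with a minimal sketch. It assumes, based on the statement that a feature vector keeps only the number of ratings an item received, that channel c holds the count of users who gave the item rating level c. The function name `extract_feature_vectors` and the toy matrix are hypothetical and not taken from the paper.

```python
from typing import List


def extract_feature_vectors(ratings: List[List[int]], levels: int) -> List[List[int]]:
    """Return one count vector of length `levels` per item (column).

    ratings[i][p] is the rating of user i for item p; 0 means unrated.
    Assumption: channel c-1 counts the users who gave the item rating c.
    """
    n_items = len(ratings[0])
    vectors = [[0] * levels for _ in range(n_items)]
    for row in ratings:
        for p, r in enumerate(row):
            if r > 0:  # skip unrated entries
                vectors[p][r - 1] += 1
    return vectors


# Hypothetical 4-user x 3-item matrix with rating levels 1..5.
toy = [
    [5, 3, 0],
    [4, 0, 1],
    [5, 3, 2],
    [0, 3, 1],
]
vectors = extract_feature_vectors(toy, levels=5)
print(vectors)
```

After this one pass over the matrix, comparing two items only touches l entries per vector instead of n user ratings, which is where the reduction from O(nm) to O(lm) comes from.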

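Second, a new triangular distance is constructed from the feature vectors of two items. This distance combines the triangular distance with the Jaccard coefficient. The reason is that extracting the feature vectors discards the relational information among users, items, and ratings and retains only the counts of the ratings that users gave to an item. The triangular distance alone therefore cannot accurately determine the similarity between items. Because the Jaccard coefficient makes full use of the ratio of the number of co-rated items to the total number of items, it is incorporated to compensate, to some extent, for the lost original rating information.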
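The exact formula of the new triangular distance is not reproduced in this section, so the following is only a sketch of one way such a combination could look. It assumes a triangle-style similarity 1 − ‖a − b‖ / (‖a‖ + ‖b‖) on the two feature vectors, multiplies it by the Jaccard coefficient of the two items' rater sets, and subtracts the product from 1 to obtain a distance. The function names and the multiplicative form are assumptions, not the authors' definition.

```python
import math
from typing import Sequence, Set


def triangle_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed form: 1 - ||a - b|| / (||a|| + ||b||), which lies in [0, 1]."""
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    norms = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return 1.0 - diff / norms if norms > 0 else 0.0


def jaccard(raters_p: Set[int], raters_q: Set[int]) -> float:
    """Share of users who rated both items among users who rated either."""
    union = raters_p | raters_q
    return len(raters_p & raters_q) / len(union) if union else 0.0


def combined_distance(a: Sequence[float], b: Sequence[float],
                      raters_p: Set[int], raters_q: Set[int]) -> float:
    """Hypothetical combination: 1 - triangle similarity * Jaccard."""
    return 1.0 - triangle_similarity(a, b) * jaccard(raters_p, raters_q)


# Identical vectors and identical rater sets give distance 0.
same = combined_distance([0, 0, 0, 1, 2], [0, 0, 0, 1, 2], {0, 1, 2}, {0, 1, 2})
# Disjoint rater sets give the maximum distance 1, whatever the vectors are.
apart = combined_distance([1, 0], [0, 1], {0}, {1})
print(same, apart)
```

The Jaccard factor reintroduces information about which users rated both items, which the count vectors alone no longer carry.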

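Finally, the new triangular distance is used in the k-nearest neighbor algorithm to judge the similarity between two items. The proposed NTRFC algorithm is compared with k-nearest neighbor algorithms based on other distances on four real datasets and is evaluated with six accuracy metrics and running time. The experimental results show that NTRFC runs faster than existing algorithms and outperforms them on most accuracy metrics.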
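How a distance on item feature vectors plugs into k-nearest neighbor prediction can be sketched as follows. The prediction rule used here, a plain average of the target user's ratings on the k nearest items that the user has rated, is an assumption for illustration; the paper's own rule is not given in this section. The Manhattan distance stands in for any distance function, and all names and data are hypothetical.

```python
from typing import Callable, List, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance between two feature vectors."""
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def predict_rating(user_row: Sequence[int], target: int,
                   vectors: List[Sequence[float]],
                   distance: Callable[[Sequence[float], Sequence[float]], float],
                   k: int) -> float:
    """Average the user's ratings on the k rated items closest to `target`."""
    # Candidate neighbours: items the user has rated, excluding the target.
    rated = [p for p, r in enumerate(user_row) if r > 0 and p != target]
    if not rated:
        return 0.0  # no information available for this user
    rated.sort(key=lambda p: distance(vectors[target], vectors[p]))
    neighbours = rated[:k]
    return sum(user_row[p] for p in neighbours) / len(neighbours)


# Hypothetical feature vectors for four items (5 channels each).
vectors = [[0, 0, 0, 1, 2], [0, 0, 3, 0, 0], [3, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
user_row = [0, 3, 1, 4]  # the user has not rated item 0
prediction = predict_rating(user_row, target=0, vectors=vectors,
                            distance=manhattan, k=2)
print(prediction)
```

Because the distance is passed in as a function, the same loop can be reused to compare different distance measures under identical conditions.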

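1. Related Work

This section introduces the definition of a rating system^[10] and several common distances. The notation used in this paper is listed in Table 1.

1.1. Rating System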

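We now review the definition of a rating system^[10]. Let U={u₁, u₂, …, u_n} be the set of users of a recommender system, and let T={t₁, t₂, …, t_m} be the set of items recommended to users. The rating function is then defined as

R: U × T → C,

where R is an n×m rating matrix, R=(r_ip)_n×m, and C is the set of rating levels with which users rate each item, e.g., C={1, 2, 3, 4, 5}.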

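Table 2 gives a rating matrix with 5 users and 6 items. Ratings range from 1 to 5, so the number of channels is 5. A rating reflects how much a user likes an item: the higher the score, the more the user likes the item, and 0 indicates that the user has not rated the item. r_ip denotes the actual rating that user u_i gave to item t_p, and G(t_p, t_q) denotes the set of users who rated both items t_p and t_q. For example, r₁₂=3 means that user u₁ gave item t₂ a rating of 3, and G(t₁, t₂)={u₁, u₄} means that the users who rated both t₁ and t₂ are u₁ and u₄.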
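The notation r_ip and G(t_p, t_q) can be made concrete with a short sketch. Table 2 itself is not reproduced here, so the 5×6 matrix below is hypothetical; it was chosen only so that it agrees with the two facts stated in the text, r₁₂=3 and G(t₁, t₂)={u₁, u₄}. The function name `co_raters` is likewise an assumption.

```python
from typing import List, Set


def co_raters(ratings: List[List[int]], p: int, q: int) -> Set[int]:
    """Return G(t_p, t_q): 1-based indices of users who rated both items.

    p and q are 1-based item indices; a rating of 0 means unrated.
    """
    return {i + 1 for i, row in enumerate(ratings)
            if row[p - 1] > 0 and row[q - 1] > 0}


# Hypothetical rating matrix (rows u1..u5, columns t1..t6), NOT the paper's Table 2.
ratings = [
    [4, 3, 0, 5, 0, 1],  # u1
    [0, 2, 4, 0, 3, 0],  # u2
    [5, 0, 0, 2, 0, 4],  # u3
    [3, 4, 1, 0, 0, 0],  # u4
    [0, 0, 5, 3, 2, 0],  # u5
]

r_12 = ratings[0][1]            # rating of user u1 for item t2
g_12 = co_raters(ratings, 1, 2)  # users who rated both t1 and t2
print(r_12, sorted(g_12))
```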

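1.2. Existing Distances

The k-nearest neighbor algorithm usually computes the distance between users or between items to find the neighbors of a user or an item, and then predicts the user's rating of an item. Table 3 lists nine commonly used distance measures and analyzes their time complexity.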

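In Table 3, the Cosine^[5], ED^[11], PCC^[6], MD^[13], Sørensen^[14-15], Canberra^[16], Lorentzian^[17], and Divergence^[18] distances all have time complexity O(n), whereas the BC^[12] distance has time complexity O(l). Here n denotes the length of the input vector and l denotes the number of rating levels.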
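The O(n) cost of these measures follows from the fact that each one is a single pass over the two input vectors. The sketch below uses the common textbook definitions of five of the listed distances; the formulas in Table 3 are not reproduced in this section and may differ in detail, so these definitions are an assumption.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return float(sum(abs(x - y) for x, y in zip(a, b)))


def sorensen(a: Sequence[float], b: Sequence[float]) -> float:
    # Sum of absolute differences divided by the sum of all entries.
    den = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / den if den else 0.0


def canberra(a: Sequence[float], b: Sequence[float]) -> float:
    # Terms where both entries are zero are skipped to avoid 0/0.
    return sum(abs(x - y) / (abs(x) + abs(y)) for x, y in zip(a, b) if x or y)


def lorentzian(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(math.log(1 + abs(x - y)) for x, y in zip(a, b))


a, b = [0, 3], [4, 0]
print(euclidean(a, b), manhattan(a, b), sorensen(a, b), canberra(a, b), lorentzian(a, b))
```

Each function visits every position once, so its running time grows linearly with the vector length; applied to feature vectors, that length is the number of rating levels l rather than the number of users n.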

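4. Conclusion

This paper proposed an efficient recommendation algorithm based on a new triangular distance for multi-channel feature vectors. Multi-channel feature vectors reduce the time complexity of the algorithm, and the new triangular distance describes the similarity between feature vectors more precisely. Experimental results on four real datasets show that the proposed algorithm outperforms existing algorithms on several metrics.

In future work, we plan to replace the Jaccard coefficient and combine other distance formulas with the triangular distance to construct new distances. We will also consider applying the new triangular distance to explainable recommender systems, with the aim of improving their performance.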
