Feature Selection in Imbalanced Sentiment Classification: A Method Using Lasso-Lars
Abstract: Features in text sentiment analysis are typically high-dimensional and sparse, and Lasso offers a simple and efficient means of feature selection. This paper introduces Lasso regression into imbalanced sentiment classification. Sentiment analysis of e-commerce reviews helps improve product quality and service, and has therefore attracted many researchers. In e-commerce data, positive reviews generally outnumber negative ones. If feature selection is poorly chosen, the classifier tends to overlook negative reviews, yet these are the key to diagnosing problems. The proposed method combines Lasso regression with a support vector machine (SVM) classifier: Lasso regression, which performs variable screening, first filters out unimportant features, and the SVM classifier then extracts sentiment from the remaining ones. In an experiment on review data from a cosmetics brand, a basic sentiment dictionary and a domain sentiment lexicon are used to build the high-dimensional candidate feature set. Comparing G-means, precision, and recall before and after feature selection shows a significant improvement on each measure.
Key words: imbalanced sentiment classification; feature selection; Lasso
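The abstract evaluates feature selection with G-means, precision, and recall rather than plain accuracy. The sketch below illustrates why that matters on imbalanced data. It assumes the common definition of G-mean as the square root of sensitivity times specificity; the source does not spell out its formula. The function name `imbalance_metrics` and the toy labels are hypothetical and are not taken from the paper's experiment.

```python
import math


def imbalance_metrics(y_true, y_pred, minority=1):
    """Precision, recall, G-mean and accuracy, with `minority` as the positive class.

    G-mean is taken here as sqrt(sensitivity * specificity), an assumed
    (commonly used) definition; the paper may define it differently.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)

    # Guard against empty denominators so degenerate classifiers score 0.
    recall = tp / (tp + fn) if tp + fn else 0.0          # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "g_mean": math.sqrt(recall * specificity),
        "accuracy": accuracy,
    }


# Hypothetical toy data: 1 = negative review (minority), 0 = positive review.
y_true = [0] * 8 + [1] * 2

# A classifier that ignores the minority class entirely.
always_majority = [0] * 10

# A classifier that finds one of the two negative reviews,
# at the cost of one false alarm.
finds_some_negatives = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

for name, y_pred in [("always_majority", always_majority),
                     ("finds_some_negatives", finds_some_negatives)]:
    print(name, imbalance_metrics(y_true, y_pred))
```

Both toy classifiers reach the same accuracy, but only the second has a non-zero G-mean, which is the behaviour that makes the measure useful when negative reviews are rare.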