Cost-sensitive Label Distribution Learning for Non-Uniform Distributed Data

FAN Jun; ZHANG Hengru; YU Yifan; MIN Fan

doi:10.13718/j.cnki.xdzk.2024.05.004

2024 Volume 46 Issue 5



FAN Jun, ZHANG Hengru, YU Yifan, et al. Cost-sensitive Label Distribution Learning for Non-Uniform Distributed Data[J]. Journal of Southwest University Natural Science Edition, 2024, 46(5): 40-50. doi: 10.13718/j.cnki.xdzk.2024.05.004



1. College of Computer Science, Southwest Petroleum University, Chengdu 610500, China
2. Lab of Machine Learning, Southwest Petroleum University, Chengdu 610500, China


Corresponding author: ZHANG Hengru
Received Date: 24/05/2023
Available Online: 20/05/2024
MSC: TP391

Abstract

Learning with label ambiguity has recently become a popular topic in machine learning and data mining research. Label distribution learning (LDL) handles label ambiguity by assigning each instance a distribution of description degrees over the labels. Existing LDL methods are designed for uniformly distributed training data. In real-world applications, however, training data are typically non-uniformly distributed. In this paper, we propose a cost-sensitive label distribution learning method (CSLDL) for non-uniformly distributed training data, built on a novel loss function that exploits the density information of instances. First, the range of description degrees is divided evenly into multiple bins, and an empirical density vector is derived for each label by counting the instances in each bin. Second, to restore continuity between bins, the empirical density of each target bin is corrected using neighboring bins. Specifically, the empirical density vector is convolved with a symmetric kernel, so that each bin takes into account nearby bins as well as itself. Finally, a cost matrix is constructed from the corrected density vectors and combined with the Kullback-Leibler (K-L) divergence to handle non-uniformly distributed training data. Experiments on ten real-world datasets against six state-of-the-art algorithms demonstrate the effectiveness and superiority of the proposed algorithm.
Keywords:
- label distribution learning
- label ambiguity
- non-uniformly distributed data
- cost-sensitive
- density of instances

References

[1] CHEN C H, PATEL V M, CHELLAPPA R. Matrix Completion for Resolving Label Ambiguity[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 4110-4118.
[2] GAO B B, XING C, XIE C W, et al. Deep Label Distribution Learning with Label Ambiguity[J]. IEEE Transactions on Image Processing, 2017, 26(6): 2825-2838. doi: 10.1109/TIP.2017.2689998
[3] ZHOU L, CHEN C, LI N. Research on Calendar Spread Arbitrage Based on Machine Learning and Empirical Mode Decomposition[J]. Journal of Southwest University (Natural Science Edition), 2022, 44(1): 148-159. (in Chinese)
[4] HANG L, CHE J, SONG P Y, et al. Prediction of Plant Diseases and Insect Pests Based on Machine Learning and Image Processing Technology[J]. Journal of Southwest University (Natural Science Edition), 2020, 42(1): 134-141. (in Chinese)
[5] COUR T, SAPP B, TASKAR B. Learning from Partial Labels[J]. Journal of Machine Learning Research, 2011, 12: 1501-1536.
[6] TSOUMAKAS G, KATAKIS I. Multi-Label Classification[J]. International Journal of Data Warehousing and Mining, 2007, 3(3): 1-13. doi: 10.4018/jdwm.2007070101
[7] ZHU Y, KWOK J T, ZHOU Z H. Multi-Label Learning with Global and Local Label Correlation[J]. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(6): 1081-1094. doi: 10.1109/TKDE.2017.2785795
[8] ZHANG M L, WU L. Lift: Multi-Label Learning with Label-Specific Features[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 107-120. doi: 10.1109/TPAMI.2014.2339815
[9] GENG X. Label Distribution Learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(7): 1734-1748. doi: 10.1109/TKDE.2016.2545658
[10] JIA X Y, LI W W, LIU J Y, et al. Label Distribution Learning by Exploiting Label Correlations[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA: AAAI Press, 2018: 3310-3317.
[11] ZHENG X, JIA X Y, LI W W. Label Distribution Learning by Exploiting Sample Correlations Locally[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans, Louisiana, USA: AAAI Press, 2018: 4556-4563.
[12] RONG B Y, XU Y Y, LYU Y L, et al. Label Distribution Learning Incorporating Local Label Correlation[J]. Journal of Shandong University (Natural Science), 2022, 57(7): 53-64. (in Chinese)
[13] WANG J, GENG X. Label Distribution Learning by Exploiting Label Distribution Manifold[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(2): 839-852. doi: 10.1109/TNNLS.2021.3103178
[14] KRAWCZYK B. Learning from Imbalanced Data: Open Challenges and Future Directions[J]. Progress in Artificial Intelligence, 2016, 5(4): 221-232. doi: 10.1007/s13748-016-0094-0
[15] REN T T, JIA X Y, LI W W, et al. Label Distribution Learning with Label-Specific Features[C]//Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. California: AAAI Press, 2019: 3318-3324.
[16] CHARTE F, RIVERA A J, DEL JESUS M J, et al. MLSMOTE: Approaching Imbalanced Multilabel Learning through Synthetic Instance Generation[J]. Knowledge-Based Systems, 2015, 89: 385-397.
[17] TORGO L, RIBEIRO R P, PFAHRINGER B, et al. SMOTE for Regression[C]//Portuguese Conference on Artificial Intelligence. Berlin, Heidelberg: Springer, 2013: 378-389.
[18] BRANCO P, TORGO L, RIBEIRO R P. SMOGN: A Pre-processing Approach for Imbalanced Regression[C]//Proceedings of the First International Workshop on Learning with Imbalanced Domains: Theory and Applications. Skopje, Macedonia: Microtome Publishing, 2017: 36-50.
[19] WU G Q, TIAN Y J, LIU D L. Cost-Sensitive Multi-Label Learning with Positive and Negative Label Pairwise Correlations[J]. Neural Networks, 2018, 108: 411-423.
[20] ZHAO X Y, AN Y X, XU N, et al. Continuous Label Distribution Learning[J]. Pattern Recognition, 2023, 133: 109056.
[21] YANG Y Z, ZHA K W, CHEN Y C, et al. Delving into Deep Imbalanced Regression[C]//International Conference on Machine Learning. Virtual Event: Microtome Publishing, 2021: 11842-11851.
[22] HUANG Y T, XU Y Y, ZHANG H R, et al. Label Distribution Learning Incorporating Label Structure Dependency[J]. Journal of Nanjing University (Natural Science), 2020, 56(4): 524-532. (in Chinese)
[23] NOCEDAL J, WRIGHT S J. Numerical Optimization[M]. 2nd ed. New York: Springer, 2006.
[24] REN Y, GENG X. Sense Beauty by Label Distribution Learning[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne, Australia: AAAI Press, 2017: 2648-2654.
[25] NGUYEN T V, LIU S, NI B B, et al. Sense Beauty via Face, Dressing, and/or Voice[C]//Proceedings of the 20th ACM International Conference on Multimedia. Nara, Japan: ACM, 2012: 239-248.
[26] PENG K C, CHEN T, SADOVNIK A, et al. A Mixed Bag of Emotions: Model, Predict, and Transfer Emotion Distributions[C]//2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Boston, MA, USA: IEEE, 2015: 860-868.
[27] YANG J F, SUN M, SUN X X. Learning Visual Sentiment Distributions via Augmented Conditional Probability Neural Network[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2017, 31(1): 224-230.
[28] DALAL N, TRIGGS B. Histograms of Oriented Gradients for Human Detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Diego, CA, USA: IEEE, 2005: 886-893.
[29] STRICKER M A, ORENGO M. Similarity of Color Images[C]//Proc SPIE 2420, Storage and Retrieval for Image and Video Databases Ⅲ. San Jose, CA, United States: SPIE Press, 1995: 381-392.
[30] LIANG L Y, LIN L J, JIN L W, et al. SCUT-FBP5500: a Diverse Benchmark Dataset for Multi-Paradigm Facial Beauty Prediction[C]//2018 24th International Conference on Pattern Recognition (ICPR). Beijing, China: IEEE, 2018: 1598-1603.
[31] LYONS M, AKAMATSU S, KAMACHI M, et al. Coding Facial Expressions with Gabor Wavelets[C]//Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition. Nara, Japan: IEEE, 1998: 200-205.
[32] JIA X Y, LU Y N, ZHANG F W. Label Enhancement by Maintaining Positive and Negative Label Relation[J]. IEEE Transactions on Knowledge and Data Engineering, 2023, 35(2): 1708-1720.
[33] GENG X, YIN C, ZHOU Z H. Facial Age Estimation by Learning from Label Distributions[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(10): 2401-2412.


Figures(4) / Tables(4)

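Label ambiguity^[1-2] is currently a widely studied topic in machine learning and data mining^[3-4]. Single-label learning (SLL)^[5] and multi-label learning (MLL)^[6-8] are two mature paradigms for handling label ambiguity. SLL assumes that each instance is associated with exactly one logical label: the value is 1 if the instance has the label and 0 otherwise. MLL instead assumes that each instance can be associated with a set of logical labels. Although MLL can handle more complex label ambiguity than SLL, both paradigms only answer whether a label belongs to an instance, not how well the label describes it. To address this, Geng^[9] proposed label distribution learning (LDL). LDL assigns each instance a set of description degrees, one per label, that sum to 1. Because it can distinguish how strongly different labels describe an instance, LDL applies to a wider range of scenarios.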

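Existing LDL algorithms improve prediction performance in several ways. LDLLC^[10] expands the feature space with global label correlations. LDL-SCL^[11] and LDL-FLC^[12] exploit local label correlations, and LDL-LDM^[13] exploits local and global label correlations simultaneously. Beyond label correlations, LDL-HR^[14] first learns the label with the highest description degree and then the remaining labels, to avoid objective mismatch. LDLSF^[15] selects specific features for each label instead of sharing all features across labels. However, all of these algorithms assume that the training set is uniformly distributed, and in practice this is often not the case. The training instances associated with a label are usually non-uniformly distributed, as shown in Fig. 1. There, the horizontal axis is the description-degree interval and the vertical axis is the number of instances in that interval; light green marks low-density regions and dark green marks high-density regions. LDL is a data-driven paradigm, so models trained on non-uniform data tend to neglect instances from low-density regions, which leads to prediction errors^[16-18].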

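To handle non-uniformly distributed data, this paper proposes a new cost-sensitive label distribution learning algorithm (Cost-Sensitive Method for Label Distribution Learning, CSLDL). We build a cost matrix^[19] from instance density and use it to modify the conventional K-L divergence objective so that it suits non-uniformly distributed data. First, the set of description degrees is divided into equal-width intervals (bins), and the empirical density of instances in each bin is counted separately for each label. Label distribution data are continuous: adjacent description degrees of the same label have no strict boundary^[20-21]. Partitioning the range into bins, however, breaks the continuity between adjacent bins. To restore it, we correct each bin's empirical density using its neighbors, so that a bin takes into account the empirical densities of its neighboring bins as well as its own. Each neighbor's contribution is weighted by its distance from the target bin, and farther bins receive smaller weights. The correction restores the continuity between adjacent bins and also produces a smoother distribution, as shown in Fig. 1(b). In real life, continuous data tend to change smoothly. In population statistics, for example, the population normally increases or decreases only slightly from one year to the next. The corrected density distribution is likewise smoother and better reflects real-world data, which improves the generalization of the algorithm. Finally, a cost matrix is built from the smoothed density and used to reshape the conventional K-L divergence objective. Instances in high-density regions receive smaller costs, and instances in low-density regions receive larger ones.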

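To validate CSLDL, we conducted extensive experiments on multiple datasets. Compared with six state-of-the-art methods, CSLDL produced the predictions closest to the ground truth. The main contributions of this paper are threefold. ① We propose a cost-sensitive label distribution learning algorithm (CSLDL) that uses instance density to counter, in a more balanced way, the model's bias toward instances in high-density regions. ② By correcting the empirical density of each target bin with its neighbors, we restore the continuity between adjacent bins and obtain a distribution that better matches real-world behavior. ③ Experiments on different datasets demonstrate the effectiveness and superiority of the proposed algorithm.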

1. Related Work

1.1. Notation

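Let $\boldsymbol{X}=[\boldsymbol{x}_1;\boldsymbol{x}_2;\cdots;\boldsymbol{x}_n]\in\mathbb{R}^{n\times q}$ denote the input feature space, $Y=\{y_1,y_2,\cdots,y_o\}$ the label set, and $\boldsymbol{D}=[\boldsymbol{d}_1;\boldsymbol{d}_2;\cdots;\boldsymbol{d}_n]\in[0,1]^{n\times o}$ the label space corresponding to the input space. Here n is the number of instances, q the feature dimension, and o the number of labels. The i-th instance is denoted $\boldsymbol{x}_i\in\boldsymbol{X}$, and the description degree of its j-th label is $d_{ij}$. The description degrees of each instance must satisfy

$$\sum_{j=1}^{o} d_{ij}=1 \tag{1}$$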
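To make the notation concrete, the sketch below stores $\boldsymbol{D}$ as a list of rows, one row per instance and one column per label. It checks the two constraints on description degrees: every degree lies in [0, 1], and every row sums to 1. The function name `check_label_distribution` and the tolerance are illustrative assumptions and do not come from the paper.

```python
from typing import List

Matrix = List[List[float]]


def check_label_distribution(D: Matrix, tol: float = 1e-9) -> bool:
    """True if every row of D (one row per instance, one column per label)
    has all degrees in [0, 1] and sums to 1 within `tol`."""
    for row in D:
        if any(d < 0.0 or d > 1.0 for d in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


# Toy label space with n = 3 instances and o = 3 labels (made-up values).
D = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.3, 0.4],
]
print(check_label_distribution(D))                  # True
print(check_label_distribution([[0.5, 0.6, 0.0]]))  # False: the row sums to 1.1
```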

1.2. Label Distribution Learning Algorithms

Common LDL algorithms are currently designed according to one of three strategies^[22]:

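1) Problem transformation (PT): this strategy converts the LDL problem into a single-label learning problem. Each training instance is transformed into weighted single-label instances whose weights are derived from the description degrees, and the transformed instances can then be handled by single-label learning algorithms. Representative algorithms include PT-Bayes and PT-SVM.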

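2) Algorithm adaptation (AA): this strategy adapts traditional single-label or multi-label learning algorithms so that they can handle label distributions. Representative algorithms include AA-kNN, adapted from k-nearest neighbors, and AA-BP, adapted from back-propagation.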

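3) Specialized algorithms (SA): this strategy designs algorithms around the intrinsic characteristics of LDL. A specialized algorithm usually consists of three modules: an output model, an objective function, and an optimization method. Most existing specialized algorithms use the K-L divergence as the objective function and optimize the model on the assumption that all labels contribute equally. In practice, however, training data are usually non-uniformly distributed, and a model trained with the conventional K-L divergence objective neglects instances from low-density regions. To address this, this paper proposes a cost-sensitive LDL algorithm that uses instance density to balance the model's bias toward instances in high-density regions.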
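The problem-transformation strategy in item 1) can be sketched as follows. The sketch assumes that each instance yields one weighted single-label sample per label, with weight $d_{ij}$. The helper `problem_transformation` is hypothetical, and skipping zero-weight samples is a simplification of our own. A downstream single-label learner that accepts sample weights would consume the resulting triples.

```python
from typing import List, Tuple

Sample = Tuple[List[float], int, float]  # (features, label index, weight)


def problem_transformation(X: List[List[float]], D: List[List[float]]) -> List[Sample]:
    """Turn each (x_i, d_i) into weighted single-label samples (x_i, j, d_ij).

    Zero-weight samples are skipped (a simplification for this sketch).
    """
    samples: List[Sample] = []
    for x, d in zip(X, D):
        for j, weight in enumerate(d):
            if weight > 0.0:
                samples.append((x, j, weight))
    return samples


X = [[1.0, 0.0], [0.0, 1.0]]
D = [[0.6, 0.4, 0.0], [0.2, 0.3, 0.5]]
for sample in problem_transformation(X, D):
    print(sample)
# ([1.0, 0.0], 0, 0.6)
# ([1.0, 0.0], 1, 0.4)
# ([0.0, 1.0], 0, 0.2)
# ([0.0, 1.0], 1, 0.3)
# ([0.0, 1.0], 2, 0.5)
```

Each instance's weights sum to 1, so each original instance still carries a total weight of 1 after the transformation.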

2. Cost-Sensitive Label Distribution Learning for Non-Uniformly Distributed Data

2.1. Cost Matrix

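To balance the conventional model's bias toward instances in high-density regions, we assign smaller costs to instances in high-density regions and larger costs to instances in low-density regions. To do this, the cost is taken as either the reciprocal of the instance density or the reciprocal of its square root; experiments showed that the latter works better. Different labels have different instance distributions, so each label has its own cost vector.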

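First, [0, 1] is divided into m bins $s_1, s_2, \cdots, s_m$ of equal width $\frac{1}{m}$, i.e., $s_{1}=\left[0, \frac{1}{m}\right), s_{2}=\left[\frac{1}{m}, \frac{2}{m}\right), \cdots, s_{m}=\left[1-\frac{1}{m}, 1\right]$. Counting the instance density of label $y_j$ in these bins yields its cost vector

$$\boldsymbol{c}_{j}=\left[c_{j1}, c_{j2}, \cdots, c_{jm}\right] \tag{2}$$

where $c_{jb}$ is the reciprocal of the square root of the instance density of label $y_j$ in bin $s_b$.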
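Below is a minimal sketch of the binning and cost-vector computation for one label. It assumes that the density is the raw instance count per bin. Normalizing the counts by n would only rescale every cost by the same factor $\sqrt{n}$. The helper names are hypothetical, and bins are 0-based in code. Giving an empty bin a cost of 0 is also an assumption: no instance falls in such a bin, so its entry is never looked up when the cost matrix is filled.

```python
import math
from typing import List


def bin_index(d: float, m: int) -> int:
    """0-based index of the bin holding description degree d in [0, 1].

    Bins are [0, 1/m), [1/m, 2/m), ..., [1 - 1/m, 1]; d == 1 goes to the last bin.
    """
    return min(int(d * m), m - 1)


def bin_counts(degrees: List[float], m: int) -> List[int]:
    """Empirical density of one label: number of instances whose degree falls in each bin."""
    counts = [0] * m
    for d in degrees:
        counts[bin_index(d, m)] += 1
    return counts


def cost_vector(counts: List[int]) -> List[float]:
    """c_jb = 1 / sqrt(count in bin b); empty bins get 0.0 (hypothetical choice)."""
    return [1.0 / math.sqrt(c) if c > 0 else 0.0 for c in counts]


# Degrees of one label over 8 instances (toy data), split into m = 4 bins.
degrees_j = [0.05, 0.10, 0.12, 0.20, 0.22, 0.60, 0.70, 1.00]
counts = bin_counts(degrees_j, m=4)
print(counts)               # [5, 0, 2, 1]
print(cost_vector(counts))  # approx. [0.447, 0.0, 0.707, 1.0]
```

Rare degrees (the single instance at 1.00) get the largest cost, and the crowded first bin gets the smallest.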

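Label distribution data have the property that adjacent description degrees of the same label have no strict boundary; the degrees are continuous. Dividing the range into bins, however, introduces boundaries between adjacent bins and thus breaks their continuity. To restore the continuity between adjacent bins, this paper uses a kernel function to extract information from neighboring bins. The resulting smoothed cost of label $y_j$ in bin $s_b$ is denoted $\hat{c}_{jb}$.

Here, h is the number of neighboring bins considered on each side (left and right), and the kernel is symmetric, satisfying $k(i, b)=k(b, i)$ and $\nabla_{i} k(i, b)+\nabla_{b} k(b, i)=0$. Common symmetric kernels include the Gaussian and Laplacian kernels; this paper uses the Laplacian kernel.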
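The neighbor correction can be sketched as a discrete convolution with a Laplacian kernel over bin indices. This sketch follows the abstract's description: it smooths the empirical density first and then takes the reciprocal square root to get the smoothed cost. Three details are assumptions because the text does not fix them: the bandwidth `sigma`, the use of an unnormalized kernel, and skipping neighbors that fall outside the valid bin range. All function names are hypothetical.

```python
import math
from typing import List


def laplacian_kernel(i: int, b: int, sigma: float = 1.0) -> float:
    """Symmetric Laplacian kernel on bin indices: k(i, b) = exp(-|i - b| / sigma)."""
    return math.exp(-abs(i - b) / sigma)


def smooth_density(counts: List[int], h: int, sigma: float = 1.0) -> List[float]:
    """Weight each bin's count by the kernel over h neighbors on each side.

    Neighbors outside the valid bin range are skipped (an assumption about borders).
    """
    m = len(counts)
    smoothed = []
    for b in range(m):
        total = 0.0
        for i in range(max(0, b - h), min(m, b + h + 1)):
            total += laplacian_kernel(i, b, sigma) * counts[i]
        smoothed.append(total)
    return smoothed


def smoothed_cost_vector(counts: List[int], h: int, sigma: float = 1.0) -> List[float]:
    """Smoothed cost per bin: 1 / sqrt(smoothed density), or 0.0 if it is zero."""
    return [1.0 / math.sqrt(s) if s > 0.0 else 0.0 for s in smooth_density(counts, h, sigma)]


counts = [5, 0, 2, 1]  # raw counts per bin for one label (toy data)
print([round(s, 3) for s in smooth_density(counts, h=1)])  # [5.0, 2.575, 2.368, 1.736]
print([round(c, 3) for c in smoothed_cost_vector(counts, h=1)])
```

With h = 1, the empty second bin receives density from both of its neighbors. Its cost therefore becomes finite, and the cost vector varies smoothly from bin to bin instead of jumping.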

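Taking neighbor information into account restores the continuity between bins and also makes the instance distribution smoother and closer to the true distribution, which improves the generalization ability of the algorithm.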

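For computational convenience, the cost vectors and the label space of the training data are combined into an n×o cost matrix

$$\boldsymbol{A}=\left[a_{ij}\right]_{n\times o} \tag{4}$$

where $a_{ij}=\hat{c}_{jb}$, with $s_b$ the bin that contains the j-th description degree $d_{ij}$ of instance $\boldsymbol{x}_i$.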
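Once each label has a cost vector, filling the cost matrix amounts to finding the bin that each $d_{ij}$ falls into and reading that bin's cost. The cost vectors below are toy values, `build_cost_matrix` is a hypothetical helper, and bins are 0-based in code.

```python
from typing import List


def bin_index(d: float, m: int) -> int:
    """0-based bin of description degree d; d == 1 maps to the last bin."""
    return min(int(d * m), m - 1)


def build_cost_matrix(D: List[List[float]],
                      cost_vectors: List[List[float]]) -> List[List[float]]:
    """A[i][j] = cost of the bin containing d_ij, read from label j's cost vector."""
    m = len(cost_vectors[0])
    return [[cost_vectors[j][bin_index(d_ij, m)] for j, d_ij in enumerate(row)]
            for row in D]


# n = 2 instances, o = 2 labels, m = 4 bins; the cost vectors are toy values.
D = [[0.1, 0.9],
     [0.6, 0.4]]
cost_vectors = [[0.5, 1.0, 2.0, 3.0],   # label y_1
                [4.0, 5.0, 6.0, 7.0]]   # label y_2
print(build_cost_matrix(D, cost_vectors))  # [[0.5, 7.0], [2.0, 5.0]]
```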

2.2. Label Distribution Learning

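The cost-sensitive label distribution learning algorithm (CSLDL) follows the specialized-algorithm design strategy and consists of an output model, an objective function, and an optimization method.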

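The description degree of label $y_j$ for instance $\boldsymbol{x}_i$ is expressed as a conditional probability with parametric model $p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})$. This paper uses the maximum entropy model as the output model:

$$p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})=\frac{1}{Z}\exp\left(\sum_{k}\theta_{j,k}x_{ik}\right) \tag{5}$$

where $Z=\sum\limits_{j} \exp \left(\sum\limits_{k} \theta_{j, k} x_{i k}\right)$ is the normalization term that makes the description degrees of all labels of $\boldsymbol{x}_i$ sum to 1; $\boldsymbol{\theta}$ is the coefficient matrix relating features to the label distribution, $\theta_{j,k}$ is an element of $\boldsymbol{\theta}$, and $x_{ik}$ is the k-th feature of $\boldsymbol{x}_i$.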

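To obtain the optimal parameter $\boldsymbol{\theta}$, we construct the following objective function:

$$T(\boldsymbol{\theta})=L(\boldsymbol{\theta})+\lambda\Omega(\boldsymbol{\theta}) \tag{6}$$

where $L(\boldsymbol{\theta})$ is the loss function defined on the training data, $\Omega(\boldsymbol{\theta})$ is a regularization term that constrains the complexity of the model parameters, and $\lambda$ controls the contribution of the regularizer.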

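The loss function measures how far the predicted label distribution is from the true one; common choices include the Euclidean distance and the K-L divergence. In our experiments the K-L divergence worked best:

$$L(\boldsymbol{\theta})=\sum_{i=1}^{n}\sum_{j=1}^{o} d_{ij}\ln\frac{d_{ij}}{p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})} \tag{7}$$

where $d_{ij}$ is the true description degree of label $y_j$ for instance $\boldsymbol{x}_i$ and $p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})$ is the predicted description degree.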
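Below is a direct transcription of the K-L divergence loss summed over instances and labels. It uses the usual convention $0\ln 0 = 0$ for zero description degrees. That convention is an assumption, because the text does not say how such terms are handled. The function name `kl_loss` is hypothetical.

```python
import math
from typing import List


def kl_loss(D: List[List[float]], P: List[List[float]]) -> float:
    """Sum over instances and labels of d_ij * ln(d_ij / p_ij).

    Terms with d_ij == 0 contribute 0 (the usual 0 * ln 0 = 0 convention).
    """
    total = 0.0
    for d_row, p_row in zip(D, P):
        for d, p in zip(d_row, p_row):
            if d > 0.0:
                total += d * math.log(d / p)
    return total


D = [[0.5, 0.5]]
print(kl_loss(D, [[0.5, 0.5]]))  # 0.0: perfect prediction
print(kl_loss(D, [[0.9, 0.1]]))  # approx. 0.511
```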

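The conventional K-L divergence cannot balance the model's bias toward instances in high-density regions, so this paper combines it with the cost matrix. The cost-sensitive loss function is

$$L(\boldsymbol{\theta})=\sum_{i=1}^{n}\sum_{j=1}^{o} a_{ij}\,d_{ij}\ln\frac{d_{ij}}{p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})} \tag{8}$$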

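To prevent overfitting, the model parameter $\boldsymbol{\theta}$ is constrained with the L2 norm, as is common in label distribution learning:

$$\Omega(\boldsymbol{\theta})=\|\boldsymbol{\theta}\|_F^2 \tag{9}$$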

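Substituting Eqs. (8) and (9) into Eq. (6) gives the final objective:

$$T(\boldsymbol{\theta})=\sum_{i=1}^{n}\sum_{j=1}^{o} a_{ij}\,d_{ij}\ln\frac{d_{ij}}{p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})}+\lambda\|\boldsymbol{\theta}\|_F^2 \tag{10}$$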
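Putting the pieces together, the following sketch evaluates the cost-sensitive objective. It assumes that the loss weights each K-L term by $a_{ij}$ and that the L2 regularizer is the squared Frobenius norm of $\boldsymbol{\theta}$. Both forms are our reading of the text, not code confirmed by the authors. With every $a_{ij}=1$ and $\lambda=0$, the objective reduces to the plain K-L loss.

```python
import math
from typing import List

Matrix = List[List[float]]


def predict(theta: Matrix, x: List[float]) -> List[float]:
    """Maximum-entropy model: softmax of the linear scores theta_j . x."""
    scores = [sum(t * v for t, v in zip(row, x)) for row in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(theta: Matrix, X: Matrix, D: Matrix, A: Matrix, lam: float) -> float:
    """Cost-weighted K-L loss + lam * squared Frobenius norm of theta (assumed forms)."""
    loss = 0.0
    for x, d_row, a_row in zip(X, D, A):
        p_row = predict(theta, x)
        for d, a, p in zip(d_row, a_row, p_row):
            if d > 0.0:  # 0 * ln 0 is taken as 0
                loss += a * d * math.log(d / p)
    reg = sum(t * t for row in theta for t in row)
    return loss + lam * reg


theta0 = [[0.0, 0.0], [0.0, 0.0]]  # o = 2 labels, q = 2 features
X = [[1.0, 0.0]]
D = [[0.9, 0.1]]
plain = objective(theta0, X, D, [[1.0, 1.0]], lam=0.0)
weighted = objective(theta0, X, D, [[2.0, 2.0]], lam=0.0)
print(round(plain, 4), round(weighted, 4))  # 0.3681 0.7361
```

Doubling every cost doubles the loss. In the real cost matrix the weights differ per entry, so instances whose degrees fall in sparse bins dominate the loss more than they would under the plain K-L divergence.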

2.3. Optimization

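The objective $T(\boldsymbol{\theta})$ is minimized with the limited-memory quasi-Newton method (L-BFGS)^[23]. The second-order Taylor expansion of $T$ around the current iterate $\boldsymbol{\theta}^{(l)}$ is

$$T(\boldsymbol{\theta}^{(l+1)})\approx T(\boldsymbol{\theta}^{(l)})+\nabla T(\boldsymbol{\theta}^{(l)})^{\mathrm{T}}\varDelta+\frac{1}{2}\varDelta^{\mathrm{T}}H(\boldsymbol{\theta}^{(l)})\varDelta \tag{11}$$

where $\varDelta=\left(\boldsymbol{\theta}^{(l+1)}-\boldsymbol{\theta}^{(l)}\right)$ is the parameter update, $\nabla T\left(\boldsymbol{\theta}^{(l)}\right)$ is the gradient, and $H\left(\boldsymbol{\theta}^{(l)}\right)$ is the Hessian matrix.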

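Minimizing Eq. (11) with respect to $\varDelta$ (assuming the Hessian is positive definite) gives

$$\varDelta^{(l)}=-H^{-1}(\boldsymbol{\theta}^{(l)})\,\nabla T(\boldsymbol{\theta}^{(l)}) \tag{12}$$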
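To see why minimizing the quadratic model yields the Newton direction, take a quadratic objective. For such an objective the second-order Taylor expansion is exact, so a single step with $\varDelta = -H^{-1}\nabla T$ lands exactly on the minimizer. The 2×2 example and the function names below are illustrative only.

```python
from typing import List


def newton_direction_2d(grad: List[float], hess: List[List[float]]) -> List[float]:
    """Delta = -H^{-1} grad for a 2x2 Hessian, using the explicit 2x2 inverse."""
    (a, b), (c, d) = hess
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("Hessian is singular")
    g1, g2 = grad
    return [-(d * g1 - b * g2) / det, -(-c * g1 + a * g2) / det]


def grad_T(t: List[float]) -> List[float]:
    """Gradient of the toy quadratic T(t) = (t1 - 1)^2 + 2 (t2 + 3)^2."""
    return [2.0 * (t[0] - 1.0), 4.0 * (t[1] + 3.0)]


HESS = [[2.0, 0.0], [0.0, 4.0]]  # constant Hessian of the toy quadratic
t0 = [5.0, 5.0]
delta = newton_direction_2d(grad_T(t0), HESS)
t1 = [t0[0] + delta[0], t0[1] + delta[1]]
print(t1)  # [1.0, -3.0], the exact minimizer
```

For the non-quadratic objective $T(\boldsymbol{\theta})$, the expansion holds only locally. This is why the direction is combined with a step size chosen by a line search.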

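In L-BFGS, $\varDelta^{(l)}$ is treated as the search direction, and a step size $\alpha^{(l)}$ that satisfies the Wolfe conditions is chosen. $\boldsymbol{\theta}$ is then updated as

$$\boldsymbol{\theta}^{(l+1)}=\boldsymbol{\theta}^{(l)}+\alpha^{(l)}\varDelta^{(l)} \tag{13}$$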
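The text does not say which line search enforces the Wolfe conditions. One simple option, sketched here as an assumption, is a bisection search for the weak Wolfe conditions, using a sufficient-decrease constant $c_1$ and a curvature constant $c_2$. The defaults $c_1=10^{-4}$ and $c_2=0.9$ are common textbook choices for quasi-Newton methods, not values taken from the paper.

```python
import math
from typing import Callable, List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def weak_wolfe_step(f: Callable[[Vec], float], grad: Callable[[Vec], Vec],
                    x: Vec, d: Vec, c1: float = 1e-4, c2: float = 0.9,
                    max_iter: int = 50) -> float:
    """Bisection search for a step alpha satisfying the weak Wolfe conditions:
    sufficient decrease  f(x + a d) <= f(x) + c1 * a * g0
    curvature            grad(x + a d) . d >= c2 * g0,   with g0 = grad(x) . d < 0.
    """
    f0, g0 = f(x), dot(grad(x), d)
    lo, hi, alpha = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        x_new = [xi + alpha * di for xi, di in zip(x, d)]
        if f(x_new) > f0 + c1 * alpha * g0:
            hi = alpha   # step too long: not enough decrease
        elif dot(grad(x_new), d) < c2 * g0:
            lo = alpha   # step too short: the slope is still steep
        else:
            return alpha
        alpha = (lo + hi) / 2.0 if hi < math.inf else 2.0 * lo
    return alpha


def square(v: Vec) -> float:
    return v[0] ** 2


def square_grad(v: Vec) -> Vec:
    return [2.0 * v[0]]


alpha = weak_wolfe_step(square, square_grad, x=[3.0], d=[-100.0])
print(alpha)  # 0.03125
```

The overly long direction `[-100.0]` is halved until the step no longer overshoots, and the curvature condition then holds immediately.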

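The basic idea of L-BFGS is to avoid computing the inverse Hessian that Newton's method requires. Instead, L-BFGS approximates the inverse Hessian with an iteratively updated matrix $\boldsymbol{B}$:

$$\boldsymbol{B}^{(l+1)}=\left(\boldsymbol{I}-\rho^{(l)}\boldsymbol{s}^{(l)}(\boldsymbol{u}^{(l)})^{\mathrm{T}}\right)\boldsymbol{B}^{(l)}\left(\boldsymbol{I}-\rho^{(l)}\boldsymbol{u}^{(l)}(\boldsymbol{s}^{(l)})^{\mathrm{T}}\right)+\rho^{(l)}\boldsymbol{s}^{(l)}(\boldsymbol{s}^{(l)})^{\mathrm{T}} \tag{14}$$

where $\boldsymbol{I}$ is the identity matrix and the remaining variables are

$$\boldsymbol{s}^{(l)}=\boldsymbol{\theta}^{(l+1)}-\boldsymbol{\theta}^{(l)},\quad \boldsymbol{u}^{(l)}=\nabla T(\boldsymbol{\theta}^{(l+1)})-\nabla T(\boldsymbol{\theta}^{(l)}),\quad \rho^{(l)}=\frac{1}{(\boldsymbol{s}^{(l)})^{\mathrm{T}}\boldsymbol{u}^{(l)}} \tag{15}$$

With $\boldsymbol{B}^{(l)}$ in place of $H^{-1}(\boldsymbol{\theta}^{(l)})$, the search direction becomes $\varDelta^{(l)}=-\boldsymbol{B}^{(l)}\nabla T(\boldsymbol{\theta}^{(l)})$.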
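The inverse-Hessian update can be written out on dense matrices as below. Here $\boldsymbol{s}$ is the parameter change and $\boldsymbol{u}$ is the gradient change between two iterations. This is the full-memory BFGS update. L-BFGS applies the same update implicitly from only the last few $(\boldsymbol{s}, \boldsymbol{u})$ pairs and never stores $\boldsymbol{B}$. The example checks the secant condition $\boldsymbol{B}^{(l+1)}\boldsymbol{u}^{(l)}=\boldsymbol{s}^{(l)}$, which the update satisfies by construction. Helper names are hypothetical.

```python
from typing import List

Vec = List[float]
Mat = List[List[float]]


def outer(u: Vec, v: Vec) -> Mat:
    return [[a * b for b in v] for a in u]


def matmul(P: Mat, Q: Mat) -> Mat:
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def matvec(P: Mat, v: Vec) -> Vec:
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def bfgs_inverse_update(B: Mat, s: Vec, u: Vec) -> Mat:
    """Return (I - rho s u^T) B (I - rho u s^T) + rho s s^T with rho = 1 / (s^T u).

    s: parameter change between iterations; u: gradient change between iterations.
    Requires s^T u > 0 (the curvature condition) so the result stays positive definite.
    """
    n = len(s)
    su = sum(a * b for a, b in zip(s, u))
    if su <= 0.0:
        raise ValueError("curvature condition s^T u > 0 is violated")
    rho = 1.0 / su
    su_outer, us_outer, ss_outer = outer(s, u), outer(u, s), outer(s, s)
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    left = [[eye[i][j] - rho * su_outer[i][j] for j in range(n)] for i in range(n)]
    right = [[eye[i][j] - rho * us_outer[i][j] for j in range(n)] for i in range(n)]
    core = matmul(matmul(left, B), right)
    return [[core[i][j] + rho * ss_outer[i][j] for j in range(n)] for i in range(n)]


# Quadratic with Hessian H = diag(2, 4): a step s changes the gradient by u = H s.
s = [1.0, 1.0]
u = [2.0, 4.0]
B0 = [[1.0, 0.0], [0.0, 1.0]]
B1 = bfgs_inverse_update(B0, s, u)
print(matvec(B1, u))  # secant condition: equals s = [1.0, 1.0] up to rounding
```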

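To optimize $T(\boldsymbol{\theta})$, L-BFGS relies mainly on the first-order gradient:

$$\frac{\partial T(\boldsymbol{\theta})}{\partial\theta_{j,k}}=\sum_{i=1}^{n}\left(p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})\sum_{r=1}^{o}a_{ir}d_{ir}-a_{ij}d_{ij}\right)x_{ik}+2\lambda\theta_{j,k} \tag{16}$$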
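Differentiating the cost-sensitive objective through the softmax gives the gradient that L-BFGS needs. The sketch below implements it under the same assumptions as the objective: cost-weighted K-L terms and a squared-Frobenius regularizer. It then compares the result against central finite differences, a cheap way to catch sign or index errors. All function names and data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def predict(theta: Matrix, x: List[float]) -> List[float]:
    """Softmax of the linear scores theta_j . x (maximum-entropy model)."""
    scores = [sum(t * v for t, v in zip(row, x)) for row in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(theta: Matrix, X: Matrix, D: Matrix, A: Matrix, lam: float) -> float:
    """Cost-weighted K-L loss + lam * squared Frobenius norm (assumed forms)."""
    loss = sum(a * d * math.log(d / p)
               for x, d_row, a_row in zip(X, D, A)
               for d, a, p in zip(d_row, a_row, predict(theta, x)) if d > 0.0)
    return loss + lam * sum(t * t for row in theta for t in row)


def gradient(theta: Matrix, X: Matrix, D: Matrix, A: Matrix, lam: float) -> Matrix:
    """dT/dtheta_jk = sum_i (p_ij * sum_r a_ir d_ir - a_ij d_ij) x_ik + 2 lam theta_jk."""
    o, q = len(theta), len(theta[0])
    G = [[2.0 * lam * theta[j][k] for k in range(q)] for j in range(o)]
    for x, d_row, a_row in zip(X, D, A):
        p = predict(theta, x)
        w = sum(a * d for a, d in zip(a_row, d_row))  # sum_r a_ir d_ir
        for j in range(o):
            coef = p[j] * w - a_row[j] * d_row[j]
            for k in range(q):
                G[j][k] += coef * x[k]
    return G


def finite_difference(theta: Matrix, X: Matrix, D: Matrix, A: Matrix,
                      lam: float, eps: float = 1e-6) -> Matrix:
    """Central differences of the objective, one entry of theta at a time."""
    G = []
    for j, row in enumerate(theta):
        g_row = []
        for k in range(len(row)):
            plus = [r[:] for r in theta]
            minus = [r[:] for r in theta]
            plus[j][k] += eps
            minus[j][k] -= eps
            diff = objective(plus, X, D, A, lam) - objective(minus, X, D, A, lam)
            g_row.append(diff / (2 * eps))
        G.append(g_row)
    return G


theta = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.2]]   # o = 3 labels, q = 2 features
X = [[1.0, 2.0], [0.5, -1.0]]
D = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]
A = [[1.0, 2.0, 0.5], [1.5, 1.0, 3.0]]           # toy cost matrix
analytic = gradient(theta, X, D, A, lam=0.1)
numeric = finite_difference(theta, X, D, A, lam=0.1)
max_err = max(abs(a - b) for ra, rn in zip(analytic, numeric) for a, b in zip(ra, rn))
print(max_err < 1e-6)  # True
```

With all $a_{ij}=1$, the bracketed term reduces to $p(y_j\mid\boldsymbol{x}_i;\boldsymbol{\theta})-d_{ij}$, the familiar gradient of the plain K-L loss under a softmax model.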

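Once the optimal parameter $\boldsymbol{\theta}^{*}$ is obtained, substituting it into Eq. (5) yields the label distribution of an instance.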

4. Conclusion and Future Work

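Label distribution learning has shown strong generalization ability and handles label ambiguity more effectively than single-label and multi-label learning. However, existing LDL algorithms are designed mainly for uniformly distributed data and often overlook the non-uniform distribution that is common in real training data. This paper proposes a cost-sensitive label distribution learning algorithm for non-uniformly distributed data. Experimental results show that CSLDL outperforms most existing LDL algorithms on such data.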

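In future work, we will explore other ways to handle non-uniformly distributed data more effectively. For example, data synthesis and resampling methods could be transferred to label distribution learning, or the original data could be augmented so that they are more easily described by the labels.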
