L-CSNet: Application of a Lightweight Count-Supervised Network Based on Efficient Multi-Scale Dilated Convolution in Wheat Ear Counting

ZHU Yufeng; CHEN Ling; HUANG Jiayan; LI Chuandong; HUANG Tingwen; ZENG Xiaoyang

doi:10.13718/j.cnki.xdzk.2026.04.013

2026 Volume 48 Issue 4

ZHU Yufeng, CHEN Ling, HUANG Jiayan, et al. L-CSNet: Application of a Lightweight Count-Supervised Network Based on Efficient Multi-Scale Dilated Convolution in Wheat Ear Counting[J]. Journal of Southwest University Natural Science Edition, 2026, 48(4): 182-195. doi: 10.13718/j.cnki.xdzk.2026.04.013

1. College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China
2. State Key Laboratory of Integrated Chips and Systems, Fudan University, Shanghai 200433, China
3. Faculty of Computer Science and Control Engineering, Shenzhen University of Advanced Technology, Shenzhen, Guangdong 518055, China

Corresponding author: CHEN Ling
Received Date: 22/01/2026
Available Online: 20/04/2026
CLC number: TP29

Abstract

Current deep learning methods for wheat ear counting rely predominantly on expensive location-level annotations such as bounding boxes or density maps. These annotations require considerable manual effort and are susceptible to annotation noise, which limits their practical use in agriculture. To address this issue, this study proposed a lightweight count-supervised network that counts wheat ears with high precision from image-level count labels alone, without requiring location information. The core innovation lay in the design of an efficient multi-scale dilated convolution (EMDC) module, which replaced traditional computationally expensive structures with parallel dilated convolutions. This enabled efficient extraction of multi-scale features while keeping the number of model parameters low. Systematic experiments on a public wheat ear detection dataset demonstrated that the proposed method significantly outperformed existing location-supervised approaches in terms of both mean absolute error (MAE) and root mean square error (RMSE). With an inference speed of 120 frames per second, the network was well suited to real-time deployment on resource-constrained devices such as unmanned aerial vehicles (UAVs) or mobile sensors.
- wheat ear counting
- count supervision
- multi-scale dilated convolution
- lightweight network
- agricultural automation

References

[1] ASSENG S, GUARIN J R, RAMAN M, et al. Wheat Yield Potential in Controlled-Environment Vertical Farms[J]. Proceedings of the National Academy of Sciences of the United States of America, 2020, 117(32): 19131-19135.
[2] TADESSE W, SANCHEZ-GARCIA M, ASSEFA S G, et al. Genetic Gains in Wheat Breeding and Its Role in Feeding the World[J]. Crop Breeding, Genetics and Genomics, 2019, 1(1): e190005.
[3] WEN H X, LI C D, WANG X P, et al. Software and Hardware Synergy for Accelerated Plant Disease Identification[J]. Applied Soft Computing, 2025, 174: 112926. doi: 10.1016/j.asoc.2025.112926
[4] DONG X Y, ZHAO K J, WANG Q, et al. PlantPAD: A Platform for Large-Scale Image Phenomics Analysis of Disease in Plant Science[J]. Nucleic Acids Research, 2024, 52(D1): D1556-D1568.
[5] PASK A, PIETRAGALLA J, MULLAN D, et al. Physiological Breeding Ⅱ: A Field Guide to Wheat Phenotyping[M]. JING Ruilian, trans. Beijing: Science Press, 2017.
[6] COINTAULT F, GUERIN D, GUILLEMIN J P, et al. In-Field Triticum Aestivum Ear Counting Using Colour-Texture Image Analysis[J]. New Zealand Journal of Crop and Horticultural Science, 2008, 36(2): 117-130. doi: 10.1080/01140670809510227
[7] ALHARBI N, ZHOU J, WANG W. Automatic Counting of Wheat Spikes from Wheat Growth Images[C]//7th International Conference on Pattern Recognition Applications and Methods. Faro: SciTePress-Science and Technology Publications, 2018: 346-355.
[8] LI L, HASSAN M A, YANG S R, et al. Development of Image-Based Wheat Spike Counter through a Faster R-CNN Algorithm and Application for Genetic Studies[J]. The Crop Journal, 2022, 10(5): 1303-1311. doi: 10.1016/j.cj.2022.07.007
[9] LI R F, SUN X H, YANG K, et al. A Lightweight Wheat Ear Counting Model in UAV Images Based on Improved YOLOv8[J]. Frontiers in Plant Science, 2025, 16: 1536017. doi: 10.3389/fpls.2025.1536017
[10] LI X X, ZHANG Z H, WANG J Y, et al. Research on Wheat Spike Phenotype Extraction Based on YOLOv11 and Image Processing[J]. Agriculture, 2025, 15(21): 2295. doi: 10.3390/agriculture15212295
[11] XIONG H P, CAO Z G, LU H, et al. TasselNetv2: In-Field Counting of Wheat Spikes with Context-Augmented Local Regression Networks[J]. Plant Methods, 2019, 15(1): 150. doi: 10.1186/s13007-019-0537-2
[12] KHAKI S, SAFAEI N, PHAM H, et al. WheatNet: A Lightweight Convolutional Neural Network for High-Throughput Image-Based Wheat Head Detection and Counting[J]. Neurocomputing, 2022, 489: 78-89. doi: 10.1016/j.neucom.2022.03.017
[13] WU W, ZHONG X C, LEI C K, et al. Sampling Survey Method of Wheat Ear Number Based on UAV Images and Density Map Regression Algorithm[J]. Remote Sensing, 2023, 15(5): 1280. doi: 10.3390/rs15051280
[14] LECUN Y, BOSER B, DENKER J S, et al. Backpropagation Applied to Handwritten Zip Code Recognition[J]. Neural Computation, 1989, 1(4): 541-551. doi: 10.1162/neco.1989.1.4.541
[15] YANG Y F, LI G R, WU Z, et al. Weakly-Supervised Crowd Counting Learns from Sorting rather than Locations[C]//Computer Vision-ECCV 2020. Cham: Springer, 2020: 1-17.
[16] LIANG D K, CHEN X W, XU W, et al. TransCrowd: Weakly-Supervised Crowd Counting with Transformers[J]. Science China Information Sciences, 2022, 65(6): 160104. doi: 10.1007/s11432-021-3445-y
[17] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is All You Need[J]. Advances in Neural Information Processing Systems, 2017, 30: 6000-6010.
[18] LI Y X, WU X C, WANG Q, et al. CSNet: A Count-Supervised Network via Multiscale MLP-Mixer for Wheat Ear Counting[J]. Plant Phenomics, 2024, 6: 0236. doi: 10.34133/plantphenomics.0236
[19] YU F, KOLTUN V. Multi-scale Context Aggregation by Dilated Convolutions[EB/OL]. (2015-11-23)[2025-12-26]. https://www.semanticscholar.org/venue?name=International%20Conference%20on%20Learning%20Representations.
[20] DAVID E, MADEC S, SADEGHI-TEHRAN P, et al. Global Wheat Head Detection (GWHD) Dataset: A Large and Diverse Dataset of High-Resolution RGB-Labelled Images to Develop and Benchmark Wheat Head Detection Methods[J]. Plant Phenomics, 2020, 2020: 3521852. doi: 10.34133/2020/3521852
[21] DAVID E, SEROUART M, SMITH D, et al. Global Wheat Head Detection 2021: An Improved Dataset for Benchmarking Wheat Head Detection Methods[J]. Plant Phenomics, 2021, 2021: 9846158. doi: 10.34133/2021/9846158
[22] SIMONYAN K, ZISSERMAN A. Very Deep Convolutional Networks for Large-scale Image Recognition[EB/OL]. (2014-09-04)[2025-12-26]. https://www.semanticscholar.org/paper/Very-Deep-Convolutional-Networks-for-Large-Scale-Simonyan-Zisserman/eb42cf88027de515750f230b23b1a057dc782108?p2df.
[23] DONG X Y, WANG Q, HUANG Q D, et al. PDDD-PreTrain: A Series of Commonly Used Pre-Trained Models Support Image-Based Plant Disease Diagnosis[J]. Plant Phenomics, 2023, 5: 0054. doi: 10.34133/plantphenomics.0054
[24] HU J, SHEN L, SUN G. Squeeze-and-Excitation Networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 7132-7141.
[25] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single Shot MultiBox Detector[C]//Computer Vision-ECCV 2016. Cham: Springer, 2016: 21-37.
[26] HE L Q, TODOROVIC S. DESTR: Object Detection with Split Transformer[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2022: 9367-9376.
[27] ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-Image Crowd Counting via Multi-Column Convolutional Neural Network[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2016: 589-597.
[28] LI Y H, ZHANG X F, CHEN D M. CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New York: IEEE Press, 2018: 1091-1100.
[29] HE K M, ZHANG X Y, REN S Q, et al. Deep Residual Learning for Image Recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). New York: IEEE Press, 2016: 770-778.
[30] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE Press, 2019: 1314-1324.
[31] MAHASIN M, DEWI I A. Comparison of CSPDarkNet53, CSPResNeXt-50, and EfficientNet-B0 Backbones on YOLO V4 as Object Detector[J]. International Journal of Engineering, Science and Information Technology, 2022, 2(3): 64-72. doi: 10.52088/ijesty.v2i3.291
[32] LIU Z, LIN Y T, CAO Y, et al. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows[C]//2021 IEEE/CVF International Conference on Computer Vision (ICCV). New York: IEEE Press, 2022: 9992-10002.
[33] SELVARAJU R R, COGSWELL M, DAS A, et al. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization[J]. International Journal of Computer Vision, 2020, 128(2): 336-359. doi: 10.1007/s11263-019-01228-7
[34] SHAN H X, WEI C Y, RAMOS N, et al. Neuromorphic Computing in the Era of Large Models[J]. Artificial Intelligence Science and Engineering, 2025, 1(1): 17-30. doi: 10.23919/AISE.2025.000002

Figures(4) / Tables(7)

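Automation technology is of great significance for improving breeding efficiency and grain yield. As a major global food crop, wheat supplies about 20% of the protein and carbohydrates consumed by humans^[1], and it is also widely used as an industrial raw material, as biofuel, and as animal feed. However, growth in wheat yield has not kept pace with rising societal demand. Data indicate that demand for wheat grows by 1.7% per year, whereas the average annual genetic gain is only 1%^[2]. Automation is now widely applied in agriculture, for example through software-hardware co-design to accelerate crop disease identification^[3], or by replacing manual work with automated statistical analysis of crop phenotypes (such as plant height, color, and ear number)^[4]. These applications reduce labor and time costs and support efficient breeding.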

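Automated counting to screen varieties with superior traits is a core step in wheat breeding. Wheat yield, a key breeding trait, is jointly determined by three components: the number of ears per unit area, the number of grains per ear, and the thousand-grain weight^[5]. Traditional breeding practice is inefficient, consumes considerable time and labor, and is prone to large errors introduced by manual operation. Automated ear counting is therefore very important for improving breeding efficiency and saving labor. To this end, researchers first tried to identify wheat ears with image processing techniques: reference [6] used color and texture features to segment wheat ears in images, and reference [7] combined Gabor filters with the K-means clustering algorithm to detect and count ear regions. However, these traditional methods generalize poorly, are easily affected by interference such as illumination and environmental conditions, and struggle to adapt to complex scenes.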
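
The relationship between the three yield components can be written as a short calculation. The sketch below is only illustrative: the function name and the plot values are hypothetical, and it assumes the usual convention that thousand-grain weight is given in grams.

```python
def theoretical_yield_kg_per_ha(ears_per_m2, grains_per_ear, thousand_grain_weight_g):
    """Theoretical grain yield from the three yield components."""
    grains_per_m2 = ears_per_m2 * grains_per_ear
    # Thousand-grain weight is the mass of 1000 grains, so divide by 1000
    # to get the mass of the grains on one square metre, in grams.
    grams_per_m2 = grains_per_m2 * thousand_grain_weight_g / 1000.0
    # 1 g/m^2 equals 10 kg/ha (10,000 m^2 per hectare, 1000 g per kilogram).
    return grams_per_m2 * 10.0


# Hypothetical plot: 500 ears/m^2, 35 grains per ear, 40 g per thousand grains.
baseline = theoretical_yield_kg_per_ha(500, 35, 40.0)

# A 10% overcount of ears shifts the yield estimate by the same 10%.
overcounted = theoretical_yield_kg_per_ha(550, 35, 40.0)
relative_error = overcounted / baseline - 1.0
```

Because the ear count enters as a plain multiplicative factor, any relative error in counting carries over one-to-one into the yield estimate, which is why counting accuracy matters for breeding decisions.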

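With continuing breakthroughs in deep learning, location-supervised methods, represented by bounding-box supervision and point supervision, have attracted wide attention in wheat ear counting^[8-13]. These methods alleviate, to some extent, the poor generalization and weak noise robustness of traditional methods, but they all depend on images with costly location-level annotations for training, and most bounding-box-supervised and point-supervised ear counting models are built on locally perceptive convolutional neural networks^[14]. Moreover, because wheat ears are dense and varied, their location information is not only expensive to annotate but also introduces noise that limits model performance.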

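Drawing on count-supervised methods developed for crowd counting^[15-18], this paper builds a wheat ear counting network that achieves high accuracy using only image-level count labels (A Lightweight Count-Supervised Network via Efficient Multi-Scale Dilated Convolution, L-CSNet). Its main innovation is the Efficient Multi-Scale Dilated Convolution (EMDC) module, which replaces computation-intensive structures with parallel dilated convolutions^[19]. The module captures multi-scale ear features efficiently while greatly reducing the number of model parameters, without weakening model performance. Experimental results on the GWHD_2020^[20] and GWHD_2021^[21] datasets show that L-CSNet outperforms existing methods on metrics such as mean absolute error (MAE) and root mean square error (RMSE). It is therefore a more economical, practical, and easily adopted choice for automated agricultural counting tasks, and it can help bring deep learning into real agricultural scenarios. The main contributions of this paper are as follows: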

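1) A lightweight count-supervised network, L-CSNet, is proposed: the network is trained only on image-level count labels, and its parameter count is substantially reduced, which provides practical support for real-time deployment on edge devices.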

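2) An Efficient Multi-Scale Dilated Convolution (EMDC) module is designed: through channel optimization, parallel multi-scale dilated convolutions, and a lightweight attention mechanism, the module achieves efficient and robust multi-scale feature perception.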

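3) A co-optimization strategy for architecture and training is constructed: the model architecture and the training strategy are designed and optimized together, which ensures stable training and the final model performance.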
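
To make the idea of parallel multi-scale dilated convolutions concrete, the following is a minimal pure-Python sketch of the general technique, not the paper's EMDC module. This excerpt does not give the module's dilation rates, kernel sizes, fusion rule, or attention design, so the rates (1, 2, 3), the 3x3 averaging kernels, and the summation of branches are all assumptions made for illustration. Channel optimization and the attention mechanism are omitted.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation):
    """Same-size 2-D cross-correlation with zero padding and a dilation rate."""
    height, width = len(image), len(image[0])
    k = len(kernel)
    half = k // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Kernel taps are spread `dilation` pixels apart.
                    yy = y + (i - half) * dilation
                    xx = x + (j - half) * dilation
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def parallel_dilated_block(image, kernels, dilations):
    """Run one branch per dilation rate and sum the branch outputs (assumed fusion)."""
    height, width = len(image), len(image[0])
    fused = [[0.0] * width for _ in range(height)]
    for kernel, dilation in zip(kernels, dilations):
        branch = dilated_conv2d(image, kernel, dilation)
        for y in range(height):
            for x in range(width):
                fused[y][x] += branch[y][x]
    return fused


# Hypothetical setup: three 3x3 averaging kernels with dilation rates 1, 2, 3.
DILATIONS = (1, 2, 3)
KERNELS = [[[1.0 / 9.0] * 3 for _ in range(3)] for _ in DILATIONS]

# A single bright pixel in the centre of a 9x9 image.
impulse = [[0.0] * 9 for _ in range(9)]
impulse[4][4] = 1.0

response = parallel_dilated_block(impulse, KERNELS, DILATIONS)
spans = [effective_kernel_size(3, d) for d in DILATIONS]
weights_per_branch = 3 * 3  # the weight count does not grow with dilation
```

Each branch keeps the same nine weights, yet the spans grow from 3 to 5 to 7 pixels, so the fused response to the single pixel reaches three pixels from the centre in every direction. This is the property that lets parallel dilated branches respond to objects of different sizes without the parameter cost of larger kernels.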

3. Conclusion

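To address the high annotation cost of wheat ear counting in agricultural scenes, and the difficulty of deploying existing computationally complex models in real time, this paper proposes a lightweight count-supervised network, L-CSNet. The model drops the dependence on location annotations such as bounding boxes or density maps and is trained using only image-level count labels, which significantly reduces the cost of data annotation.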
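
Under count supervision, the label for each image is a single number, so evaluation needs nothing more than the predicted and true counts. The sketch below computes the two metrics reported in this paper, MAE and RMSE, from their standard definitions. The count values are hypothetical and are not results from the paper.

```python
import math


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true counts."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def root_mean_square_error(predicted, actual):
    """Square root of the average squared difference; penalises large misses more."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical image-level labels: one ear count per image, no boxes or points.
true_counts = [42, 57, 31, 68]
predicted_counts = [40.0, 60.0, 31.0, 64.0]

mae = mean_absolute_error(predicted_counts, true_counts)
rmse = root_mean_square_error(predicted_counts, true_counts)
```

RMSE is never smaller than MAE on the same data, and the gap between them widens when a few images are badly miscounted, which is why the two are usually reported together.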

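The Efficient Multi-Scale Dilated Convolution (EMDC) module is the core innovation of this paper. Through parallel dilated convolutions and a channel attention mechanism, the module captures multi-scale wheat ear features efficiently with a low parameter count. Extensive experiments on the public GWHD dataset show that L-CSNet balances counting accuracy, model compactness, and inference speed. On GWHD, its overall performance exceeds that of existing location-supervised and count-supervised methods, its parameter count is compressed to 9.96×10⁶, and it reaches a real-time inference speed of 120 FPS on a single GPU.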
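
The two deployment figures above translate into rough resource budgets. The sketch below is back-of-the-envelope arithmetic only: the parameter count and frame rate come from the text, while the storage precisions are assumptions, since this excerpt does not state how the weights are stored.

```python
PARAMETER_COUNT = 9.96e6  # from the text
FRAMES_PER_SECOND = 120   # from the text, measured on a single GPU

# Assumed storage precisions; the excerpt does not say which one is used.
BYTES_PER_PARAMETER = {"float32": 4, "float16": 2, "int8": 1}


def weight_size_mb(parameter_count, bytes_per_parameter):
    """Size of the raw weights in megabytes (1 MB = 10**6 bytes)."""
    return parameter_count * bytes_per_parameter / 1e6


def frame_budget_ms(frames_per_second):
    """Average time available per frame, in milliseconds."""
    return 1000.0 / frames_per_second


sizes = {name: weight_size_mb(PARAMETER_COUNT, nbytes)
         for name, nbytes in BYTES_PER_PARAMETER.items()}
budget = frame_budget_ms(FRAMES_PER_SECOND)
```

Under these assumptions the weights take about 39.8 MB in float32, and 120 FPS corresponds to roughly 8.3 ms per frame. The reported speed was measured on a GPU, so throughput on a UAV or other edge device would need to be measured separately.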

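This work still leaves room for improvement, and future work will focus on three directions. The first is to explore how well the model generalizes to counting tasks for other crops, such as rice and sorghum. The second is to study how to integrate L-CSNet closely with mobile platforms such as UAVs, to achieve truly end-to-end real-time phenotyping of field crops. The third is to draw on recent advances in deep learning and brain-inspired computing^[34] to improve the model's energy efficiency and scalability and to explore more advanced approaches to agricultural automation.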

Figure (4) Table (7) Reference (34)
