Sarkar Shagor, Zhou Jing, Scaboo Andrew, Zhou Jianfeng, Aloysius Noel, Lim Teng Teeh
Division of Plant Science and Technology, University of Missouri, Columbia, MO 65211, USA.
Department of Biological Systems Engineering, University of Wisconsin-Madison, Madison, WI 53705, USA.
Plants (Basel). 2023 Aug 8;12(16):2893. doi: 10.3390/plants12162893.
Plant lodging is one of the most important phenotypes for soybean breeding programs. Soybean lodging is conventionally evaluated visually by breeders, which is time-consuming and subject to human error. This study aimed to investigate the potential of unmanned aerial vehicle (UAV)-based imagery and machine learning for assessing the lodging conditions of soybean breeding lines. A UAV imaging system equipped with an RGB (red-green-blue) camera was used to collect imagery of 1266 four-row plots in a soybean breeding field at the reproductive stage. Soybean lodging scores were visually assessed by experienced breeders, and the scores were grouped into four classes, i.e., non-lodging, moderate lodging, high lodging, and severe lodging. UAV images were stitched to build orthomosaics, and soybean plots were segmented using a grid method. Twelve image features were extracted from the collected images to assess the lodging score of each breeding line. Four models, i.e., extreme gradient boosting (XGBoost), random forest (RF), K-nearest neighbor (KNN), and artificial neural network (ANN), were evaluated for classifying soybean lodging classes. Five data preprocessing methods were used to treat the imbalanced dataset to improve classification accuracy. Results indicate that the preprocessing method Synthetic Minority Oversampling Technique-Edited Nearest Neighbor (SMOTE-ENN) consistently performed well for all four classifiers (XGBoost, RF, KNN, and ANN), achieving the highest overall accuracy (OA), lowest misclassification, higher F1-score, and higher Kappa coefficient. This suggests that SMOTE-ENN may be a suitable preprocessing method for classification tasks on imbalanced datasets. Furthermore, an overall accuracy of 96% was obtained using the SMOTE-ENN dataset and the ANN classifier. The study indicated that an imagery-based classification model could be implemented in a breeding program to differentiate soybean lodging phenotypes and classify lodging scores effectively.
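The abstract reports overall accuracy (OA), F1-score, and Kappa coefficient for the four lodging classes. As a minimal sketch of how such metrics can be computed from a confusion matrix, the Python below uses standard definitions. The labels in `y_true` and `y_pred` are hypothetical illustration data, not results from the study, and macro-averaging of the F1-score is an assumption, since the abstract does not state which averaging was used.

```python
from typing import List, Sequence

# The four lodging classes named in the abstract, encoded as 0..3.
CLASSES = ["non-lodging", "moderate lodging", "high lodging", "severe lodging"]


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def cohen_kappa(cm: List[List[int]]) -> float:
    """Cohen's kappa: agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


def macro_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, truly other
        fn = sum(cm[k]) - tp                       # truly k, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Hypothetical labels for ten plots (illustration only, not study data).
y_true = [0, 0, 0, 0, 1, 1, 1, 2, 2, 3]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3]

cm = confusion_matrix(y_true, y_pred, len(CLASSES))
print("OA:", round(overall_accuracy(cm), 3))
print("Kappa:", round(cohen_kappa(cm), 3))
print("Macro F1:", round(macro_f1(cm), 3))
```

Kappa is worth reporting next to OA on imbalanced data such as lodging scores, because a classifier that mostly predicts the majority class can reach a high OA while its chance-corrected agreement stays low.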