College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming, China.
Medicinal Plants Research Institute, Yunnan Academy of Agricultural Sciences, Kunming, China.
Phytochem Anal. 2024 Oct;35(7):1704-1716. doi: 10.1002/pca.3413. Epub 2024 Jun 27.
Identifying the geographical origin of Gastrodia elata Blume (G. elata Bl.) supports the scientific and rational use of this medicinal material. In this study, infrared spectroscopy was combined with machine learning algorithms to distinguish the origin of G. elata Bl.
The objective was to identify the origin of G. elata Bl. rapidly and accurately.
Attenuated total reflection Fourier transform infrared (ATR-FTIR) spectra and Fourier transform near-infrared (FT-NIR) spectra were collected for 306 samples of G. elata Bl.
First, support vector machine (SVM) models were established on the single-spectrum data and on the full-spectrum fusion data. To investigate whether a feature-level fusion strategy can enhance model performance, a sequential and orthogonalized partial least squares discriminant analysis (SO-PLS-DA) model was established to extract and combine the features of the two types of spectra. Next, six algorithms were employed to extract feature variables, and an SVM model was established on the resulting feature-level fusion data. Finally, to avoid complicated preprocessing and feature extraction, a residual convolutional neural network (ResNet) model was established after converting the raw spectral data into spectral images.
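The difference between full-spectrum fusion and feature-level fusion described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the autoscaling step, the toy numbers, and the hand-picked column indices are all assumptions made for illustration, and the abstract does not name the six feature-extraction algorithms that would normally choose those indices.

```python
from statistics import mean, pstdev


def autoscale_columns(block):
    """Scale each column of a samples x variables block to zero mean and unit variance."""
    scaled_cols = []
    for col in zip(*block):
        mu = mean(col)
        sd = pstdev(col) or 1.0  # guard against constant columns
        scaled_cols.append([(v - mu) / sd for v in col])
    return [list(row) for row in zip(*scaled_cols)]


def full_spectrum_fusion(block_a, block_b):
    """Concatenate all scaled variables of both blocks, sample by sample."""
    if len(block_a) != len(block_b):
        raise ValueError("both blocks must describe the same samples")
    a = autoscale_columns(block_a)
    b = autoscale_columns(block_b)
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def feature_level_fusion(block_a, block_b, keep_a, keep_b):
    """Keep only the selected columns of each scaled block, then concatenate."""
    if len(block_a) != len(block_b):
        raise ValueError("both blocks must describe the same samples")
    a = autoscale_columns(block_a)
    b = autoscale_columns(block_b)
    return [
        [row_a[i] for i in keep_a] + [row_b[j] for j in keep_b]
        for row_a, row_b in zip(a, b)
    ]


# Made-up toy data: 3 samples, 4 "ATR-FTIR" variables and 3 "FT-NIR" variables.
atr = [[0.10, 0.20, 0.30, 0.40],
       [0.20, 0.10, 0.40, 0.30],
       [0.30, 0.30, 0.20, 0.50]]
nir = [[1.0, 2.0, 3.0],
       [2.0, 1.0, 4.0],
       [3.0, 3.0, 2.0]]

fused_full = full_spectrum_fusion(atr, nir)
# Hypothetical selection: columns 0 and 2 from ATR-FTIR, column 1 from FT-NIR.
fused_feat = feature_level_fusion(atr, nir, keep_a=[0, 2], keep_b=[1])
```

In this sketch either fused matrix would then be passed to a classifier such as an SVM. The feature-level variant gives the classifier far fewer, pre-selected variables, which is the trade-off the study evaluates.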
The feature-level fusion model was more accurate than both the single-spectrum models and the full-spectrum fusion model, and SO-PLS-DA was simpler than feature-level fusion based on the SVM model. The ResNet model classified the samples well but requires more data to improve its generalization capability and training effectiveness.
Sequential and orthogonalized data fusion approaches and ResNet models are powerful solutions for identifying the geographical origin of G. elata Bl.