Durjoy Sabbir Hossain, Shikder Md Emon, Shoib Md Mehedi Hasan, Bijoy Md Hasan Imam
Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka 1216, Bangladesh.
Data Brief. 2025 Apr 28;60:111594. doi: 10.1016/j.dib.2025.111594. eCollection 2025 Jun.
Cauliflower is among the most widely known vegetables. It is consumed around the globe because it is rich in nutrients such as vitamins and antioxidants and is high in fibre. These nutritional qualities aid digestion, support the immune system, and reduce inflammation. Farmers commonly have to deal with various diseases of cauliflower leaves that are difficult to diagnose in their early stages. These diseases tend to spread swiftly across entire fields, which causes heavy harvest losses and makes protecting the crops more tedious and resource-intensive. As a result, farmers become more likely to apply large amounts of pesticides and harmful chemicals to secure a more reliable yield. This is not only costly but also harmful to both the quality of the crops and the environment. In this publication, we introduce a dataset containing a considerable number of images of cauliflower leaves. It is intended to accelerate work on this topic and to help improve disease monitoring, diagnosis, and preventive techniques. We collected the images between November 2024 and January 2025. The cauliflower leaves are categorized into three classes: Healthy, Insect Holes, and Black Rot, each reflecting a specific condition that affects plant health at different stages. The dataset consists of 2,661 images. The pictures were captured at different locations in Bangladesh, on different dates, under different weather conditions and temperatures, and with different devices. To improve data quality, we processed the dataset in several steps so that it reflects real-world conditions and is ready for training: the images were resized to a standard size of 3000 × 3000 pixels, brightness was adjusted to make the images easier to discern, and duplicates and poor-quality images were removed. The dataset is intended to support agricultural research, precision agriculture, and disease management. It should help in developing accurate machine learning models for early detection of cauliflower leaf diseases, and it is intended for training deep learning models that support automated monitoring and smart decision-making in precision agriculture. The dataset also has considerable potential for real-time, practical use. It can be used to develop applications such as mobile apps or automated systems with which farmers can identify diseases at early stages and take immediate action without needing an expert on site. It can also be used with smart farming equipment such as drones and sensors to monitor large fields in real time.
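The processing steps named in the abstract (resizing to 3000 × 3000 pixels, brightness adjustment, duplicate removal) could be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the abstract does not say which tools were used, how duplicates were detected, or how much brightness was changed. The function names, the brightness factor of 1.2, and the use of exact byte-level hashing are all hypothetical choices, and the resize step assumes the third-party Pillow package is installed.

```python
import hashlib
from pathlib import Path

TARGET_SIZE = (3000, 3000)   # size stated in the abstract
BRIGHTNESS_FACTOR = 1.2      # assumed value; the abstract gives no number


def find_exact_duplicates(paths):
    """Return the paths whose file bytes match an earlier path in the list."""
    seen = {}
    duplicates = []
    for path in paths:
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if digest in seen:
            duplicates.append(path)   # same content already seen
        else:
            seen[digest] = path
    return duplicates


def resize_and_brighten(src, dst, size=TARGET_SIZE, factor=BRIGHTNESS_FACTOR):
    """Resize one image to `size`, scale its brightness, and save it to `dst`."""
    # Pillow is imported here so the duplicate check above works without it.
    from PIL import Image, ImageEnhance

    with Image.open(src) as img:
        # A direct resize ignores the original aspect ratio (an assumption;
        # the abstract does not say whether images were cropped or padded).
        img = img.convert("RGB").resize(size)
        # factor > 1 brightens, factor < 1 darkens, 1.0 leaves it unchanged.
        img = ImageEnhance.Brightness(img).enhance(factor)
        img.save(dst)
```

Exact hashing only catches byte-identical files. Near-duplicates, such as the same leaf photographed twice, and poor-quality images would need a different check, for example a perceptual hash or manual review.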