Food for Artificial Intelligence

Overview of training datasets for agricultural computer vision tasks

Author: Florian Kitzler

The term Artificial Intelligence (AI) is used in many different ways, partly because there is no clear definition of intelligence, which leads to misinterpretation and confusion with related concepts. Implementations of AI are computer programs and algorithms that try to imitate the cognitive abilities of the human brain in order to automate intelligent behavior. They are often used for pattern analysis, pattern recognition in data, and robotics. Machine Learning (ML), a subset of AI, aims to generate knowledge from experience: ML algorithms can solve pattern recognition tasks without being explicitly programmed for them. In supervised learning, models are created and parametrized with experience in the form of training data (ground truth). After this so-called training, the model can find the learned patterns in new data and thus support decision-making.
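To make the train-then-predict idea concrete, here is a deliberately minimal sketch (not part of the original article): a nearest-centroid classifier in plain Python, with invented two-dimensional feature data and the illustrative class names "crop" and "weed".

```python
# Supervised learning in miniature: "training" computes one centroid
# (mean feature vector) per class from labeled ground-truth samples;
# "inference" assigns new points to the class of the nearest centroid.
train = [((1.0, 1.0), "weed"), ((1.2, 0.8), "weed"),
         ((4.0, 4.2), "crop"), ((3.8, 4.0), "crop")]

def fit(samples):
    """Training step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for (x, y), label in samples:
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {label: (sx / counts[label], sy / counts[label])
            for label, (sx, sy) in sums.items()}

def predict(model, point):
    """Inference step: pick the class whose centroid is closest."""
    px, py = point
    return min(model, key=lambda lab: (model[lab][0] - px) ** 2
                                      + (model[lab][1] - py) ** 2)

model = fit(train)
print(predict(model, (1.1, 0.9)))  # a point near the "weed" samples
print(predict(model, (4.1, 3.9)))  # a point near the "crop" samples
```

Real Deep Learning models work on millions of parameters rather than two centroids, but the workflow — fit on ground truth, then apply to unseen data — is the same.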

Figure 1: Attempt to define Artificial Intelligence, examples of tasks and methods of Machine Learning, and a schematic representation of Deep Learning models.
Artificial Neural Networks (ANN) with a high number of hidden layers, so-called Deep Learning (DL) models, are used for the following tasks in the field of computer vision:

  • Classification – The whole image is categorized
  • Detection – Objects within the image are classified and localized
  • Semantic Segmentation – Each pixel within the image is categorized
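The three task types differ mainly in the shape of their output, which the following schematic sketch illustrates (the toy image size and class names are invented for this example):

```python
# Schematic outputs of the three computer-vision task types for a toy
# 2x3-pixel image and the illustrative classes "soil", "crop", "weed".
H, W = 2, 3

# Classification: one label for the whole image.
classification_output = "crop"

# Detection: a class and a bounding box (x_min, y_min, x_max, y_max)
# for every object found in the image.
detection_output = [("crop", (0, 0, 1, 1)), ("weed", (2, 0, 2, 1))]

# Semantic segmentation: one class per pixel, i.e. an H x W label map.
segmentation_output = [["crop", "crop", "weed"],
                       ["soil", "crop", "weed"]]

assert len(segmentation_output) == H
assert all(len(row) == W for row in segmentation_output)
```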

Because of their many layers and correspondingly large number of parameters, Deep Learning models require vast amounts of training data. The resulting models are non-mechanistic, meaning they cannot be expressed as a closed mathematical formula; they are black-box models with no directly interpretable connection between input and output.

Deep Learning tasks in agriculture

A goal of the project “Integration of plant parameters for intelligent agricultural processes” is to acquire plant parameters from digital images using computer vision methods. Plant species classification will be performed with Deep Learning models for Semantic Segmentation. In practice, this can be used for weed recognition and as decision support for intelligent weed control measures. For the training data, images of different plants are taken and annotated manually. The annotation requirements depend on the task, and in some cases additional information is needed for the algorithm to work: a single label per image suffices for classification, whereas detection requires a bounding box around each object. For Semantic Segmentation, every pixel must be assigned to a class, typically by drawing polygons around each annotated object in the image. This yields so-called segmentation masks, in which each class is represented by a different color and every pixel is colored according to its class membership.
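The mask representation described above can be sketched in a few lines of Python. The class indices are arbitrary; the colors follow the scheme of Figure 2 (green for soybean, yellow and red for the thistle classes, black for soil):

```python
# Render a segmentation mask (per-pixel class indices) as an RGB image.
# Class indices and colors are illustrative, matching Figure 2.
PALETTE = {
    0: (0, 0, 0),      # 0 = soil             -> black
    1: (0, 255, 0),    # 1 = soybean          -> green
    2: (255, 255, 0),  # 2 = creeping thistle -> yellow
    3: (255, 0, 0),    # 3 = milk thistle     -> red
}

def mask_to_rgb(mask):
    """Turn an H x W class-index mask into an H x W grid of RGB colors."""
    return [[PALETTE[cls] for cls in row] for row in mask]

# Toy 2x4 mask: a soybean plant next to a milk thistle on bare soil.
mask = [[0, 1, 1, 0],
        [0, 0, 3, 3]]
rgb = mask_to_rgb(mask)
```

In practice such masks are produced by rasterizing the annotated polygons, but the resulting data structure is exactly this per-pixel class map.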

Figure 2: Images of soybean (left), bounding box annotation (middle), and segmentation masks (right; green: soybean, yellow: creeping thistle, red: milk thistle, black: soil) for various growth stages of soybean (top to bottom). Annotation software used: CVAT [1].

For training, the images and annotation data are used to optimize the model so that the error between its predictions and the ground truth becomes small. The precision and stability of the model depend mainly on the quality of the training data and therefore on the annotation quality, a fact well known under the phrase garbage in, garbage out. To avoid getting “garbage” out of the model, the training data must fulfill certain quality and quantity criteria: the selected images must be representative of the intended practical use case and show high variability so that the model works under different circumstances. Depending on the model’s complexity, several hundred or even thousands of images need to be acquired and annotated. Because the annotation step is very time-consuming, it is preferable to use already existing training datasets.
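Training as error minimization can be illustrated with a deliberately tiny example, not a real Deep Learning setup: a one-parameter linear model fitted by gradient descent, with invented data and learning rate.

```python
# Fit y = w * x to ground-truth pairs by minimizing the mean squared
# error (MSE) between predictions and ground truth via gradient descent.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (input, ground truth)

w = 0.0        # the single model parameter, starting from zero
lr = 0.02      # learning rate (step size of each update)
for step in range(500):
    # Gradient of the MSE with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # one optimization step downhill

mse = sum((w * x - y) ** 2 for x, y in data) / len(data)
```

A Deep Learning model does the same thing with millions of parameters and a pixel-wise loss over the segmentation masks, which is why the quality of the ground truth directly bounds the quality of the fitted model.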
In the field of autonomous driving, many open access training datasets are available. They contain very detailed image annotations of street scenes taken with car-mounted cameras. The open access to these data enables a broad community to work on challenging scientific issues and to compare their results with current state-of-the-art approaches, which has led to a boost in scientific progress in this research area.

Table 1: Overview of open access training datasets in the fields of autonomous driving and precision agriculture.

Dataset                  Task                    Application
KITTI [2]                Semantic Segmentation   Autonomous driving
Cityscapes [3]           Semantic Segmentation   Autonomous driving
BDD100K [4]              Semantic Segmentation   Autonomous driving
CWFID [5]                Semantic Segmentation   Carrot vs. weed
Sugar-Beet [6]           Semantic Segmentation   Sugar beet vs. weed
DeepWeeds [7]            Classification          Weed species
Agriculture-Vision [8]   Semantic Segmentation   Field anomalies

In the research field of precision agriculture, only a limited number of open access training datasets exist; most were collected for a single application task and differ in the acquisition setup and the sensors used. The Agriculture-Vision dataset contains a large number of RGB and near-infrared (NIR) images from drone flights, used to distinguish nine types of field anomalies such as nutrient deficiency, storm damage, or weed clusters. The Sugar-Beet and CWFID (Crop Weed Field Image Dataset) datasets were collected with RGB and NIR cameras mounted on the mobile field robot BoniRob [9], using artificial illumination and shadowing of natural light sources. The annotation of these datasets consists of two classes, carrot vs. weed and sugar beet vs. weed respectively, to be used for a binary classification task. DeepWeeds is an RGB image dataset of weed species from the Australian rangelands; since each image contains a single plant, the per-image labels can be used for an image classification task.
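Models trained on such two-class datasets are commonly evaluated by comparing the predicted mask with the ground-truth mask per class, for example via intersection over union (IoU). A minimal sketch with invented toy masks:

```python
# Per-class intersection over union (IoU) for a binary crop/weed
# segmentation, compared against ground truth (toy 2x4 masks,
# 0 = crop, 1 = weed).
ground_truth = [[0, 0, 1, 1],
                [0, 1, 1, 0]]
prediction   = [[0, 0, 1, 1],
                [0, 0, 1, 0]]

def iou(gt, pred, cls):
    """IoU = |gt intersect pred| / |gt union pred| for one class."""
    inter = union = 0
    for gt_row, pr_row in zip(gt, pred):
        for g, p in zip(gt_row, pr_row):
            if g == cls or p == cls:
                union += 1
                if g == cls and p == cls:
                    inter += 1
    return inter / union

crop_iou = iou(ground_truth, prediction, 0)
weed_iou = iou(ground_truth, prediction, 1)
```

Shared open datasets make exactly this kind of comparison possible across research groups, since everyone evaluates against the same ground truth.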
The availability of open access training data of plant images and other agricultural objects with high-precision annotation could likewise boost scientific activity in the field of precision agriculture and smart farming. The application of Deep Learning models to computer vision tasks in agriculture is still at an early stage, and better availability of open access data can accelerate their implementation in infield practice.


F. Kitzler, “Nahrung für Künstliche Intelligenz: Überblick von Trainingsdaten für die Bildanalyse in der Landwirtschaft” (“Food for Artificial Intelligence: Overview of training data for image analysis in agriculture”). In: DiLaAg Innovationsplattform [Weblog], online publication, 2020.


[1]  “OpenCV CVAT,” 10.07.2020. [Online]. Available:
[2]  A. Geiger, P. Lenz, C. Stiller and R. Urtasun, “Vision meets robotics: The KITTI dataset,” The International Journal of Robotics Research, vol. 32, pp. 1231–1237, 2013.
[3]  M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele, “The Cityscapes Dataset for Semantic Urban Scene Understanding,” in Proc. of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[4]  F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan and T. Darrell, “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[5]  S. Haug and J. Ostermann, “A Crop/Weed Field Image Dataset for the Evaluation of Computer Vision Based Precision Agriculture Tasks,” in Computer Vision – ECCV 2014 Workshops, Cham, 2015.
[6]  N. Chebrolu, P. Lottes, A. Schaefer, W. Winterhalter, W. Burgard and C. Stachniss, “Agricultural robot dataset for plant classification, localization and mapping on sugar beet fields,” The International Journal of Robotics Research, vol. 36, pp. 1045–1052, 2017.
[7]  A. Olsen, D. A. Konovalov, B. Philippa, P. Ridd, J. C. Wood, J. Johns, W. Banks, B. Girgenti, O. Kenny, J. Whinney and others, “DeepWeeds: A multiclass weed species image dataset for deep learning,” Scientific Reports, vol. 9, pp. 1–12, 2019.
[8]  M. T. Chiu, X. Xu, Y. Wei, Z. Huang, A. G. Schwing, R. Brunner, H. Khachatrian, H. Karapetyan, I. Dozier, G. Rose and others, “Agriculture-Vision: A large aerial image database for agricultural pattern analysis,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.
[9]  A. Ruckelshausen, P. Biber, M. Dorna, H. Gremmes, R. Klose, A. Linz, F. Rahe, R. Resch, M. Thiel, D. Trautz and others, “BoniRob – an autonomous field robot platform for individual plant phenotyping,” Precision Agriculture, vol. 9, p. 1, 2009.