Multi-Modal Deep Learning

Project leader: Prof. Dr. Marius Kloft (Department of Computer Science)



  • Philipp Liznerski

Video: Presentation at Plant 2030 - Status Seminar 2021

A major aim of AVATARS is to predict seed properties with a multi-modal deep learning model that can process the heterogeneous data modalities collected in the project. At the beginning of the project, we focus on high-throughput data, such as the hyperspectral and CT data that will be generated in deep phenotyping experiments, because we consider these the most promising data types, containing rich, high-dimensional information. The data is stored and managed in the AVATARS data hub, developed by IPK and NPZi, which provides access to structured data and metadata via SQL or RESTful services, together with an online-accessible file system for binary data. To support on-premise computation, locally deployable data containers are built on demand.

Figure 1: An Overview of Multimodal Network Structure for Deep Learning

It is hard for a deep learning model to learn how to deal with completely different data modalities. We therefore have to develop a deep learning architecture that performs a deep integration of the data sources. As a starting point, we use our own prior work from the project "DeepIntegrate", carried out in cooperation with NPZi. For each modality, this architecture requires a specifically crafted subnetwork, whose features are combined in a final fusion layer for prediction (see Figure 1). In AVATARS, we developed an approach based on unsupervised learning that detects anomalies in images [1]. The features learned during this process are useful for our multi-modal setup, and the method itself can also be applied independently to CT scans of rapeseeds, as it can be readily extended to three-dimensional data. Our algorithm can be used to detect anomalous seeds (i.e., seeds that will germinate abnormally) or dead seeds (i.e., seeds that will not germinate at all). Additionally, we designed our algorithm to be explainable: instead of behaving like a black box that predicts an output without justification, it explains its decision by providing anomaly heatmaps that mark the regions of the input that the network deems anomalous.
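The idea of one subnetwork per modality feeding a final fusion layer can be illustrated with a minimal sketch. This is not the DeepIntegrate or AVATARS architecture: the class name `LateFusionNet`, the modality names, all dimensions, and the use of single dense layers with random weights are assumptions chosen only to keep the example short. A real model would presumably use convolutional subnetworks and trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(in_dim, out_dim):
    """Randomly initialised weights and zero bias for one linear layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def relu(x):
    return np.maximum(x, 0.0)


class LateFusionNet:
    """Hypothetical late-fusion model: one subnetwork per modality, one fusion layer."""

    def __init__(self, modality_dims, feature_dim=8, n_outputs=1):
        # One (here: single-layer) subnetwork per modality.
        self.subnets = {name: dense(dim, feature_dim)
                        for name, dim in modality_dims.items()}
        # The fusion layer sees the concatenated features of all modalities.
        self.fusion = dense(feature_dim * len(modality_dims), n_outputs)

    def forward(self, inputs):
        features = []
        for name, (w, b) in self.subnets.items():  # dict keeps insertion order
            features.append(relu(inputs[name] @ w + b))
        fused = np.concatenate(features, axis=1)
        w, b = self.fusion
        return fused @ w + b


# Illustrative input sizes (assumed): flattened feature vectors per sample.
model = LateFusionNet({"hyperspectral": 32, "ct": 64})
batch = {
    "hyperspectral": rng.normal(size=(4, 32)),
    "ct": rng.normal(size=(4, 64)),
}
out = model.forward(batch)  # one prediction per sample in the batch
```

Keeping the subnetworks separate until the fusion layer means each modality can get an architecture suited to its structure, while the prediction still depends on all modalities jointly.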

Figure 2: Example of a Lung Scan with corresponding Prediction and Explanation

Since CT scanning in AVATARS is still ongoing, we have so far tested our approach on a freely available dataset of lung CT scans, where the aim is to distinguish the lungs of healthy persons from those of persons infected with Covid-19. First experiments yielded promising results, with an AUC (area under the ROC curve), a score that rates the performance of anomaly detectors, of roughly 90%. Figure 2 shows an example input scan and the corresponding anomaly heatmap. As a next step, we will apply our algorithm to the soon-to-be-available AVATARS rapeseed scans to predict germination capability.
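For readers unfamiliar with the AUC, the following sketch shows one way to compute it from anomaly scores. It uses the rank-based definition: the AUC equals the probability that a randomly chosen anomalous sample receives a higher score than a randomly chosen normal one, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are illustrative assumptions and are unrelated to the experiments reported above.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for anomalous, 0 for normal samples.
    scores: anomaly scores, higher meaning more anomalous.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomalous sample correctly ranked higher
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration only).
toy_labels = [0, 0, 1, 1, 1, 0]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of normal and anomalous samples, which is why the measure is independent of any particular decision threshold.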


[1] P. Liznerski, L. Ruff, R. Vandermeulen, B. Franks, M. Kloft, and K.-R. Müller. Explainable deep one-class classification. Proceedings of the International Conference on Learning Representations (ICLR), 2021.