
Data-centric Explainable Machine Learning: Untangling the Complexity of Dense Stacks of EO data

Introduction

To date, many global and regional land cover map products are available to scientists and policy-makers. These maps are needed to help address critical environmental, climate change, and socioeconomic challenges. While much progress has been made, remote sensing researchers and analysts still struggle to extract useful information from enormous volumes of Earth Observation (EO) data. Highly heterogeneous urban areas, informal and fragmented settlements, and the spectral similarity between land cover types all impede land cover mapping, especially in developing countries.

Over the past decades, remote sensing researchers have used advanced machine learning methods to improve remote sensing image analysis. However, these methods have produced few land cover map products that are useful in practice, for several reasons. First, the “black-box” nature of machine learning methods and their use in selective case studies limit their application potential. Second, the lack of quality training data and the class imbalance problem contribute to significant error rates. Third, limited domain expertise and a limited understanding of existing machine learning methods lead to suboptimal results, continuous tweaking of models, and biased evaluations. Last but not least, a lack of interpretability and explainability inhibits trust in the models.

The data-centric explainable machine learning approach

Recently, machine learning experts and researchers have been advocating a data-centric and explainable machine learning approach. In the context of remote sensing data analysis, the data-centric approach focuses on continuously improving the quality of training data and on using a variety of EO and ancillary data. The explainable machine learning approach provides more insight into how the models work. To improve the accuracy of machine learning models and the resulting land cover maps, remote sensing researchers should acquire quality training data and use seasonal information derived from dense stacks of EO data (e.g., Sentinel and Landsat imagery). This is a shift from a purely model-centric approach to a hybrid one that combines better data with explainable machine learning methods. In the data-centric approach, domain experts build high-quality training data sets as part of an iterative model development process.
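To make "seasonal information derived from dense stacks" a little more concrete, here is a minimal sketch of one common way to do it: compute NDVI for every observation of a pixel and summarize it per season with a median. This is only an illustration, not the workflow from the tutorial. The season definitions, the reflectance values, and the function names (`ndvi`, `seasonal_medians`) are all assumptions made up for this example.

```python
from statistics import median


def ndvi(nir, red):
    """Normalized difference vegetation index for a single observation."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def seasonal_medians(observations, seasons):
    """Summarize a pixel's time series as one median NDVI value per season.

    observations: list of (month, nir_reflectance, red_reflectance) tuples
    seasons: dict mapping a season name to the set of months it covers
    """
    features = {}
    for name, months in seasons.items():
        values = [ndvi(nir, red) for month, nir, red in observations if month in months]
        # A season with no cloud-free observation has no value to report.
        features[name] = median(values) if values else None
    return features


# Hypothetical season definitions; adjust them to the study area.
SEASONS = {"wet": {11, 12, 1, 2, 3}, "dry": {5, 6, 7, 8, 9}}

# Made-up observations for one vegetated pixel: (month, NIR, red).
pixel = [
    (1, 0.45, 0.05),
    (2, 0.50, 0.06),
    (7, 0.20, 0.12),
    (8, 0.18, 0.12),
    (12, 0.40, 0.05),
]

features = seasonal_medians(pixel, SEASONS)
print(features)
```

Each season becomes one feature column for the classifier. Land cover types that look alike in a single image often differ in how they change between seasons, which is why a dense stack tends to be more informative than one scene.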

Next Steps

This blog post will use quality training data, Sentinel-1 and Sentinel-2 imagery, and the LIME method to gain insights into a random forest model. Readers can access the blog tutorial and download the data sets from the links below.
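Before opening the tutorial, it may help to see the idea behind LIME in a few lines. The sketch below is a simplified, standard-library-only illustration of a local surrogate: perturb one instance, weight the perturbed samples by how close they are to it, and fit a weighted linear model to the black-box predictions. It does not use the `lime` package and does not reproduce its exact sampling, kernel, or feature selection. The `black_box` function is a hypothetical stand-in for a trained random forest, and the feature values and scales are made up.

```python
import math
import random


def black_box(x):
    """Hypothetical stand-in for a model's predicted probability of 'built-up'."""
    ndvi, vv_db = x
    score = -6.0 * ndvi + 0.3 * (vv_db + 10.0)
    return 1.0 / (1.0 + math.exp(-score))


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def explain_locally(predict, instance, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance.

    Returns one coefficient per feature: the local change in the prediction
    for a one-scale-unit increase in that feature.
    """
    rng = random.Random(seed)
    k = len(instance)
    xtwx = [[0.0] * (k + 1) for _ in range(k + 1)]
    xtwy = [0.0] * (k + 1)
    for _ in range(n_samples):
        # Perturbation expressed in units of each feature's scale.
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        sample = [x + s * zi for x, s, zi in zip(instance, scales, z)]
        # Samples closer to the instance get a larger weight.
        weight = math.exp(-sum(zi * zi for zi in z) / kernel_width ** 2)
        row = [1.0] + z  # intercept plus the perturbed features
        y = predict(sample)
        # Accumulate the weighted normal equations (X'WX) beta = X'Wy.
        for i in range(k + 1):
            xtwy[i] += weight * row[i] * y
            for j in range(k + 1):
                xtwx[i][j] += weight * row[i] * row[j]
    coefficients = solve(xtwx, xtwy)
    return coefficients[1:]  # drop the intercept


# Made-up pixel: NDVI of 0.3 and VV backscatter of -12 dB.
effects = explain_locally(black_box, instance=[0.3, -12.0], scales=[0.1, 2.0])
for name, effect in zip(["NDVI", "VV backscatter"], effects):
    print(f"{name}: {effect:+.4f}")
```

The sign of each coefficient tells you which way a feature pushes the prediction near this particular pixel. That is the kind of per-prediction insight LIME offers; with a real model you would pass its probability function in place of `black_box`.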

Access the tutorial here

Download data here

There are many resources to learn about data-centric and explainable machine learning. You can also check my book to learn about data-centric explainable machine learning methods.