# Matchups of in situ data with satellite data using the ThoMaS match-up toolkit
**Authors:** Anna Windle (NASA, SSAI), James Allen (NASA, Morgan State University), Juan Ignacio Gossn (EUMETSAT)
<br> Notebook modified from a Copernicus Marine Training Service [Jupyter Notebook](https://gitlab.eumetsat.int/eumetlab/oceans/ocean-training/sensors/learn-olci/-/blob/main/2_OLCI_advanced/2_4_OLCI_matchup_validation.ipynb?ref_type=heads#section2) developed by Ben Loveday (EUMETSAT/Innoflair UG), Hayley Evers-King (EUMETSAT), and Juan Ignacio Gossn (EUMETSAT)

## Summary

In this example we will conduct matchups of in situ data with PACE OCI satellite data using the ThoMaS (Tool to generate Matchups for OC products with Sentinel-3/OLCI) package. This package provides a comprehensive set of tools to help with the validation of satellite products, supporting many common workflows including:

* satellite data acquisition
* mini file extraction
* in situ data management
* bidirectional reflectance distribution function (BRDF) correction

ThoMaS is written in Python and is made available through a [EUMETSAT Gitlab repository](https://gitlab.eumetsat.int/eumetlab/oceans/ocean-science-studies/ThoMaS). The package can be used from the command line, or imported as a Python library, as done here. This notebook contains an example of how to use ThoMaS, but the capability shown is not exhaustive. Many more command-line examples are included in the repository, and we encourage users to familiarise themselves with both the [project README](https://gitlab.eumetsat.int/eumetlab/oceans/ocean-science-studies/ThoMaS/-/blob/main/README.md) and [example README](https://gitlab.eumetsat.int/eumetlab/oceans/ocean-science-studies/ThoMaS/-/blob/main/README_examples.md) for more information.

## Learning Objectives

At the end of this notebook you will know:

* How to create a configuration file for the ThoMaS matchup toolkit
* How to run ThoMaS for a full matchup exercise: satellite extractions + minifiles + extraction statistics + matchup statistics
* How to use standard matchup protocols to apply statistics and plot matchup data

## Contents 

1. [Setup](#1.-Setup)
2. [Create configuration file for ThoMaS](#2.-Create-configuration-file-for-ThoMaS)
3. [Run ThoMaS](#3.-Run-ThoMaS)

## 1. Setup

Begin by importing all of the packages used in this notebook. 

In [None]:
import sys
import os

We also need to retrieve the toolkit itself. For the hackweek, we have already saved the ThoMaS toolkit under `shared/pace-hackweek-2024/lib/ThoMaS`.

ThoMaS can be used from the [command line](https://gitlab.eumetsat.int/eumetlab/oceans/ocean-science-studies/ThoMaS/-/blob/main/README_examples.md), but here we will use it as a Python library. Let's import ThoMaS into our notebook.

In [None]:
sys.path.insert(1, 'shared/pace-hackweek-2024/lib/ThoMaS')
from main import ThoMaS_main as ThoMaS

We also need to save our Earthdata login credentials in our home directory.

Store your Earthdata username and password in a JSON file at `~/.obpg_credentials.json` (`~` stands for your home directory), using the following structure: <br>
`{"username": "john.doe", "password": "jd_1234"}`
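
If you prefer to create this file from Python rather than by hand, the following is a minimal sketch. The helper name `write_credentials` is hypothetical (it is not part of ThoMaS), and the demonstration writes to a temporary directory so that no real credentials file is overwritten.

```python
import json
import os
import stat
import tempfile


def write_credentials(path, username, password):
    """Write a {"username": ..., "password": ...} JSON file at `path`."""
    with open(path, "w") as f:
        json.dump({"username": username, "password": password}, f)
    # Restrict access to the current user (this has limited effect on Windows)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path


# Demonstration in a temporary directory with the placeholder values from above
demo_path = os.path.join(tempfile.mkdtemp(), ".obpg_credentials.json")
write_credentials(demo_path, "john.doe", "jd_1234")
with open(demo_path) as f:
    print(json.load(f))
```

For real use, you would pass `os.path.expanduser("~/.obpg_credentials.json")` as the path, and could read the password with `getpass.getpass()` so that it is never stored in the notebook.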

[back to top](#Contents)

## 2. Create configuration file for ThoMaS

In this example we will conduct matchups of in situ AERONET-OC Rrs data with PACE OCI Rrs data. The Aerosol Robotic Network (AERONET) was developed to sustain atmospheric studies at various scales with measurements from worldwide distributed autonomous sun-photometers. The network has been extended to support marine applications through AERONET – Ocean Color [(AERONET-OC)](https://aeronet.gsfc.nasa.gov/new_web/ocean_levels_versions.html), which adds the capability of measuring the radiance emerging from the sea (i.e., water-leaving radiance) with modified sun-photometers installed on offshore platforms such as lighthouses and oceanographic and oil towers. AERONET-OC is instrumental in satellite ocean color validation activities.

In this tutorial, we will collect Rrs data from the Casablanca Platform AERONET-OC site, located at 40.7N, 1.4E in the western Mediterranean Sea. These waters are typically characterized as oligotrophic/mesotrophic (ocean color signals tend to strongly covary with chlorophyll a).

Below are our requirements for this workflow:
1. I want to test the performance of PACE OCI at the AERONET-OC station Casablanca_Platform from March through July 2024.
2. I wish to get matchups between this Casablanca_Platform subset and PACE/OCI Rrs using the standard extraction protocol from [Bailey and Werdell, 2006](https://oceancolor.gsfc.nasa.gov/staff/jeremy/bailey_and_werdell_2006_rse.pdf), using an extraction window of 5x5.
3. I want to apply the Morel et al. (2002) BRDF correction (the `M02` option used in the configuration below) to both satellite and in situ data.
4. Store all outputs in the "Casablanca_Platform" directory. 
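
To make the second requirement more concrete, here is a simplified sketch of the kind of screening a 5x5 extraction window goes through in a Bailey and Werdell (2006) style protocol: require enough valid pixels, drop pixels further than 1.5 standard deviations from the window mean, and reject windows whose filtered coefficient of variation exceeds 0.15. This is an illustration only, not the ThoMaS implementation; the function names and the toy grid are hypothetical, and details such as flag handling are omitted.

```python
import math
import statistics


def extract_window(grid, row, col, win_size=5):
    """Return the valid (non-NaN) values in a win_size x win_size box centred on (row, col)."""
    half = win_size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            # Skip pixels that fall outside the scene
            if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
                if not math.isnan(grid[r][c]):
                    values.append(grid[r][c])
    return values


def window_statistics(values, win_size=5, min_valid_fraction=0.5, max_cv=0.15):
    """Filtered mean and coefficient of variation, or None if too few valid pixels."""
    if len(values) < 2 or len(values) < min_valid_fraction * win_size**2:
        return None
    mean = statistics.fmean(values)
    std = statistics.stdev(values)
    # Keep only pixels within 1.5 standard deviations of the unfiltered mean
    kept = [v for v in values if abs(v - mean) <= 1.5 * std]
    if len(kept) < 2:
        return None
    filtered_mean = statistics.fmean(kept)
    cv = statistics.stdev(kept) / filtered_mean if filtered_mean else math.inf
    return {"n_valid": len(values), "filtered_mean": filtered_mean,
            "cv": cv, "passes": cv <= max_cv}


# Toy 7x7 Rrs scene: uniform water, one masked pixel and one bright outlier
scene = [[0.004] * 7 for _ in range(7)]
scene[2][2] = float("nan")
scene[4][4] = 0.04
stats = window_statistics(extract_window(scene, row=3, col=3))
print(stats)
```

In this toy scene the masked pixel is excluded from the count of valid pixels and the bright outlier is removed by the 1.5 standard deviation filter, so the window passes the homogeneity test.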


Let's define a quick function that helps us to write our configuration options to a file.

In [None]:
# Write config_params sections into config_file.ini
def write_config_file(path_to_config_file,config_params):
    with open(path_to_config_file, 'w') as text_file:
        for section,section_params in config_params.items():
            text_file.write('\n[%s]\n' % (section))
            for param, value in section_params.items():
                text_file.write('%s: %s\n' % (param, value))
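
As a quick sanity check, a file written this way can be read back to confirm that sections and values look as expected. The sketch below uses Python's standard `configparser` purely as a convenient checker; how ThoMaS itself parses the file is not shown in this notebook, and the demonstration parameters and temporary path are made up for illustration.

```python
import configparser
import os
import tempfile


def write_config_file(path_to_config_file, config_params):
    """Same helper as above, repeated so this example is self-contained."""
    with open(path_to_config_file, 'w') as text_file:
        for section, section_params in config_params.items():
            text_file.write('\n[%s]\n' % (section))
            for param, value in section_params.items():
                text_file.write('%s: %s\n' % (param, value))


demo_params = {
    'global': {'SetID': 'Casablanca_Platform',
               'workflow': 'insitu, SatData, minifiles, EDB, MDB'},
    'minifiles': {'minifiles_winSize': 5},
}
demo_file = os.path.join(tempfile.mkdtemp(), 'config_file.ini')
write_config_file(demo_file, demo_params)

parser = configparser.ConfigParser()
parser.optionxform = str  # keep the original capitalisation of option names
parser.read(demo_file)

print(parser.sections())
print(parser['global']['workflow'])
print(parser['minifiles']['minifiles_winSize'])  # values are read back as strings
```

Note that every value is written with `%s`, so numbers and booleans end up as plain text in the file.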

Let's first define and create the path to our main output directory.

In [None]:
output_path = os.path.join(os.getcwd(), "Casablanca_Platform")
# Create the directory if it does not already exist
os.makedirs(output_path, exist_ok=True)

Let's now define our configuration file.

In [None]:
path_to_config_file = os.path.join(output_path, 'config_file.ini')
config_params = {}

# global
config_params['global'] = {}
config_params['global']['path_output'] = output_path
config_params['global']['SetID'] = 'Casablanca_Platform'
config_params['global']['workflow'] = 'insitu, SatData, minifiles, EDB, MDB'


# AERONETOC
config_params['AERONETOC'] = {}
config_params['AERONETOC']['AERONETOC_pathRaw'] = os.path.join(output_path, 'Casablanca_Platform', 'AERONET_OC_raw')
config_params['AERONETOC']['AERONETOC_dateStart'] = '2024-03-01T00:00:00'
config_params['AERONETOC']['AERONETOC_dateEnd'] = '2024-08-01T00:00:00'
config_params['AERONETOC']['AERONETOC_dataLevel'] = 1.5
config_params['AERONETOC']['AERONETOC_station'] = 'Casablanca_Platform'

# insitu
config_params['insitu'] = {}
config_params['insitu']['insitu_data2OCDBfile'] = 'AERONETOC'
config_params['insitu']['insitu_input'] = os.path.join(output_path, 'Casablanca_Platform_OCDB.csv')
config_params['insitu']['insitu_satelliteTimeToleranceSeconds'] = 3600
config_params['insitu']['insitu_getAncillary'] = False 
config_params['insitu']['insitu_BRDF'] = 'M02' 

# satellite
config_params['satellite'] = {}
config_params['satellite']['satellite_path-to-SatData'] = os.path.join(output_path, 'SatData')
config_params['satellite']['satellite_source'] = 'NASA_OBPG'
config_params['satellite']['satellite_collections'] = 'operational'
config_params['satellite']['satellite_platforms'] = 'PACE'
config_params['satellite']['satellite_resolutions'] = 'FR'
config_params['satellite']['satellite_BRDF'] = 'M02'

# minifiles
config_params['minifiles'] = {}
config_params['minifiles']['minifiles_winSize'] = 5

# EDB
config_params['EDB'] = {}
config_params['EDB']['EDB_protocols_L2'] = 'Bailey_and_Werdell_2006'
config_params['EDB']['EDB_winSizes'] = 5

# MDB
config_params['MDB'] = {}
config_params['MDB']['MDB_time-interpolation'] = 'insitu2satellite_NN'
config_params['MDB']['MDB_stats_plots'] = True
config_params['MDB']['MDB_stats_protocol'] = 'EUMETSAT_standard_L2'

# Write config_params sections into config_file.ini
write_config_file(path_to_config_file, config_params)
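
The `insitu_satelliteTimeToleranceSeconds` value of 3600 limits how far apart in time an in situ measurement and a satellite overpass may be. Assuming that the `insitu2satellite_NN` option denotes a nearest-neighbour match in time (an interpretation of the name, not something confirmed here), the idea can be sketched as follows. The function name and the timestamps are hypothetical.

```python
from datetime import datetime


def nearest_within_tolerance(insitu_times, satellite_time, tolerance_seconds=3600):
    """Return the in situ time closest to satellite_time, or None if none is within tolerance."""
    if not insitu_times:
        return None
    nearest = min(insitu_times,
                  key=lambda t: abs((t - satellite_time).total_seconds()))
    if abs((nearest - satellite_time).total_seconds()) > tolerance_seconds:
        return None
    return nearest


overpass = datetime(2024, 7, 10, 12, 30)  # made-up overpass time
insitu = [
    datetime(2024, 7, 10, 10, 45),
    datetime(2024, 7, 10, 12, 5),
    datetime(2024, 7, 10, 14, 0),
]

# The 12:05 measurement is 25 minutes from the overpass, so it is selected
print(nearest_within_tolerance(insitu, overpass))
# With only the 10:45 measurement (105 minutes away), no match is returned
print(nearest_within_tolerance(insitu[:1], overpass))
```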

[back to top](#Contents)

## 3. Run ThoMaS

Now let's run ThoMaS with this configuration and check the outputs.

In [None]:
ThoMaS(path_to_config_file)

If all went well, the `Casablanca_Platform` directory should now contain several folders holding the outputs of the ThoMaS analysis:
* SatData contains the full downloaded products
* SatDataLists contains information on the inventory of downloaded data
* minifiles contains the extracted minifiles
* minifilesLists contains information on the inventory of extracted minifiles
* EDB, the most important folder, contains the results of the extractions we made from the minifiles.
* Summary plots of matchups
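
To check which of these folders were actually produced, a small helper like the one below can summarise the output directory. This is a sketch using only the standard library; the helper name is hypothetical, and the demonstration builds a throwaway directory tree with made-up file names rather than reading real ThoMaS output.

```python
import os
import tempfile


def summarise_output_folders(output_path):
    """Map each immediate subfolder of output_path to the number of files beneath it."""
    summary = {}
    for entry in sorted(os.listdir(output_path)):
        folder = os.path.join(output_path, entry)
        if os.path.isdir(folder):
            summary[entry] = sum(len(files) for _, _, files in os.walk(folder))
    return summary


# Demonstration on a temporary directory tree with placeholder files
demo_root = tempfile.mkdtemp()
for folder, filename in [("SatData", "granule.nc"),
                         ("minifiles", "mini.nc"),
                         ("EDB", "edb.nc")]:
    os.makedirs(os.path.join(demo_root, folder), exist_ok=True)
    open(os.path.join(demo_root, folder, filename), "w").close()

print(summarise_output_folders(demo_root))
```

After a real run, calling `summarise_output_folders(output_path)` gives a quick overview of what was written to the `Casablanca_Platform` directory.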

[back to top](#Contents)