Taking AI-ML to ALL-SKY

Variational data assimilation (DA) and machine learning (ML) share fundamental algorithmic similarities, and we aim to combine the strengths of both realms to improve the DA scheme.

An ML-based Approach for Bias Correction of Microwave Radiances in Regional NWP:

Accurate radiance observations are essential for DA, providing crucial information on atmospheric temperature and humidity. However, systematic biases in these observations with respect to the model can degrade the quality of analyses and forecasts. Variational bias correction (VarBC) is widely used in variational DA systems to adaptively correct such biases at each assimilation cycle, but its implementation in regional NWP systems is challenging because few satellite and anchor observations are available over a regional domain. In this research, we use ML methods to emulate VarBC for the correction of microwave radiance biases in regional NWP. Neural networks, extreme gradient boosting and random forests are trained to predict observation biases, and their predictions are evaluated in a simulated operational setting of the HARMONIE-AROME system, within the hybrid ML-DA framework shown in Figure 1.
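To make the bias model concrete, here is a minimal sketch of a VarBC-style linear bias, in which the bias of each observation is a weighted sum of predictors and the weights are the bias coefficients. Everything specific in it is an assumption for illustration: the data are synthetic, the three predictors are hypothetical, and an ordinary least-squares fit stands in for the trained regressors (neural network, gradient boosting, random forest) named above. It is not the HARMONIE-AROME implementation.

```python
import numpy as np

def varbc_bias(coeffs, predictors):
    """Linear bias model: bias_j = sum_i coeffs[i] * predictors[j, i]."""
    return predictors @ coeffs

def corrected_departures(obs, model_equiv, coeffs, predictors):
    """Observation-minus-background departures after removing the predicted bias."""
    return obs - model_equiv - varbc_bias(coeffs, predictors)

# Synthetic data (assumption): brightness temperatures in kelvin.
rng = np.random.default_rng(0)
n_obs = 500
# Hypothetical predictors: a constant offset and two standardized quantities.
predictors = np.column_stack(
    [np.ones(n_obs), rng.normal(size=n_obs), rng.normal(size=n_obs)]
)
true_coeffs = np.array([0.8, -0.3, 0.1])
model_equiv = 250.0 + rng.normal(scale=5.0, size=n_obs)
obs = model_equiv + predictors @ true_coeffs + rng.normal(scale=0.2, size=n_obs)

# Stand-in for the ML model: estimate the coefficients from raw departures.
raw = obs - model_equiv
ml_coeffs, *_ = np.linalg.lstsq(predictors, raw, rcond=None)

corrected = corrected_departures(obs, model_equiv, ml_coeffs, predictors)
print("mean raw departure:      ", raw.mean())
print("mean corrected departure:", corrected.mean())
```

In the hybrid scheme described in the text, the coefficients would come from the ML model at each cycle instead of from a fit like this one, and the DA system would apply them with VarBC switched off.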

To assess the impact of the ML bias correction scheme on the analysis quality, we compare three setups: the hybrid ML-DA scheme using the ML bias correction within 4D-Var (ML), the traditional HARMONIE-AROME 4D-Var setup (CTRL), and a denial 4D-Var setup that assimilates only conventional observations (DENIAL). Figure 2 shows the fit of radiosonde observations to the short-range forecast under these setups. Overall, the ML-based bias correction provides a fast and stable treatment of the biases, yielding performance comparable to that of VarBC.
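A comparison of this kind can be summarized as the standard deviation of observation-minus-background departures in one experiment, divided by the same quantity in CTRL, so that values above 1 indicate a worse fit than CTRL. The sketch below shows one plausible way to compute such a ratio with a 95% interval. The text does not state how the intervals were obtained, so the paired bootstrap is an assumption, as are the function name and the synthetic departures.

```python
import numpy as np

def normalized_std(dep_exp, dep_ctrl, n_boot=1000, seed=0):
    """Ratio of departure standard deviations (experiment / control)
    with a 95% percentile-bootstrap interval.

    Assumes both arrays hold departures for the same observations,
    in the same order, so that they can be resampled in pairs.
    """
    rng = np.random.default_rng(seed)
    n = len(dep_exp)
    ratio = dep_exp.std(ddof=1) / dep_ctrl.std(ddof=1)
    boots = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample observations with replacement
        boots[k] = dep_exp[idx].std(ddof=1) / dep_ctrl[idx].std(ddof=1)
    lower, upper = np.percentile(boots, [2.5, 97.5])
    return ratio, lower, upper

# Synthetic departures (assumption): the denial run fits the observations worse.
rng = np.random.default_rng(1)
dep_ctrl = rng.normal(scale=1.0, size=2000)
dep_denial = rng.normal(scale=1.2, size=2000)

ratio, lower, upper = normalized_std(dep_denial, dep_ctrl)
print(f"normalized std: {ratio:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An interval that lies entirely above 1 would suggest a degradation relative to CTRL, while one that straddles 1 would be consistent with comparable performance.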

The proposed framework addresses key limitations of VarBC in regional models and offers an efficient pathway for bias correction that can accelerate model spin-up and facilitate the assimilation of observations from new satellite instruments. These findings demonstrate the potential of a hybrid DA approach within HARMONIE-AROME and highlight how simple, interpretable ML models can enhance bias correction strategies in variational DA for regional NWP. This hybrid ML-DA tool can benefit the 11 countries using HARMONIE-AROME in operations.

Figure 1: Comparison between (a) the traditional incremental 4D-Var scheme in HARMONIE-AROME and (b) the hybrid ML-DA HARMONIE-AROME, shown for a general ML algorithm. In the hybrid ML-DA scheme, the VarBC bias coefficients are inferred at each assimilation cycle by an ML model and passed to the DA process in HARMONIE-AROME, where VarBC is deactivated.
Figure 2: Background standard deviation and 95% confidence interval of the ML and DENIAL experiments, normalized to the CTRL experiment, for radiosonde observations of upper-air temperature, humidity and wind components, respectively. The observation count is indicated in grey.
Figure 3: TrajDOP's workflow within a three-hour-long assimilation window.

Next steps:

Variational DA and ML can be viewed as cousins because of the strong similarities between them. Variational DA finds the analysis by minimizing, with respect to the state vector, a cost function that measures the distance of the state to both the background and the observations, and it uses the adjoint model to compute the gradient. This optimization framework is closely analogous to the minimization of loss functions via backpropagation that is widely used in ML.

As weather prediction moves toward higher resolutions, as the volume of satellite observations continues to grow drastically, and as data-driven forecasts emerge, the classical variational framework is becoming a critical bottleneck in terms of computational scalability, observational throughput, and speed. We therefore aim to combine the strengths of ML and DA by replacing the most expensive step of 4D-Var, the tangent linear and adjoint of the forecast integration component, with ML. The ML model, TrajDOP, will combine information from the observations and NWP fields to provide sub-hourly predictions of the atmospheric state in observation space, thereby replacing the forecast integration component and the observation operator at the same time. TrajDOP's design is depicted in Figure 3.
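The analogy between variational DA and ML training can be illustrated with a toy problem. The sketch below minimizes a standard quadratic variational cost function by plain gradient descent, exactly as one would minimize a loss in ML, with the gradient obtained through the adjoint (here simply the transpose) of a linear observation operator. The two-variable state, the matrices and the step size are all assumptions chosen for illustration. There is no forecast model, so this is a 3D-Var-like toy and not 4D-Var or TrajDOP.

```python
import numpy as np

def cost_and_grad(x, xb, y, H, B_inv, R_inv):
    """Variational cost J(x) and its gradient with respect to the state x."""
    d_b = x - xb        # departure from the background
    d_o = H @ x - y     # departure from the observations
    cost = 0.5 * d_b @ B_inv @ d_b + 0.5 * d_o @ R_inv @ d_o
    # H.T plays the role of the adjoint of the observation operator.
    grad = B_inv @ d_b + H.T @ (R_inv @ d_o)
    return cost, grad

def minimize(xb, y, H, B_inv, R_inv, lr=0.1, n_iter=500):
    """Gradient descent starting from the background, as in an ML training loop."""
    x = xb.copy()
    for _ in range(n_iter):
        _, grad = cost_and_grad(x, xb, y, H, B_inv, R_inv)
        x = x - lr * grad
    return x

# Toy setup (assumption): two state variables, one observation of their sum.
xb = np.array([1.0, 2.0])
y = np.array([2.5])
H = np.array([[1.0, 1.0]])
B = np.diag([1.0, 0.5])     # background error covariance
R = np.array([[0.25]])      # observation error covariance
B_inv, R_inv = np.linalg.inv(B), np.linalg.inv(R)

xa = minimize(xb, y, H, B_inv, R_inv)

# Closed-form analysis for the linear case, used as a check.
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
xa_exact = xb + K @ (y - H @ xb)
print("gradient descent:", xa)
print("closed form:     ", xa_exact)
```

In 4D-Var the gradient additionally requires the adjoint of the forecast model over the assimilation window, which is the expensive component the text proposes to replace with an ML model.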