Calibration

The Calibration Process

Calibration is the process of adjusting the input parameters and landscape description of the model in order to obtain an optimal fit with observations. In this process, you correct for input data errors, uncertainties, and incorrect model assumptions. Validation is the process of applying a calibrated model to another event to test its predictive power on unseen data. Calibration and validation are often required for the application of physically-based models in an operational or official context. Calibrating a model involves the following steps (a minimal workflow sketch follows the list):
  1. Gather observation data
  2. Define an objective function
  3. Choose parameters for calibration
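Conceptually, these steps combine into a simple search loop. The sketch below is a hypothetical outline in Python, not a FastFlood API; run_model and objective stand in for a model run and the chosen objective function:

```python
# Minimal calibration/validation workflow sketch (hypothetical helpers,
# not a FastFlood API): calibrate on one event, validate on another.

def calibrate(event, observations, candidate_params, run_model, objective):
    """Return the parameter set with the lowest objective (error) value."""
    best_params, best_error = None, float("inf")
    for params in candidate_params:               # step 3: chosen parameters
        output = run_model(event, params)         # one simulation per candidate
        error = objective(output, observations)   # step 2: objective function
        if error < best_error:
            best_params, best_error = params, error
    return best_params

# Validation: rerun the *calibrated* model on an unseen event and evaluate
# the objective against that event's own observations (step 1 data), e.g.:
# validation_error = objective(run_model(other_event, best_params), other_obs)
```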

Observation Data

Observation data can be of many types, but must relate to a model output variable. In the case of FastFlood, flood extent or peak discharge values are logical options. Flood extent can often be obtained from remote sensing data: Sentinel-1 radar estimates of water cover on the land might be used to generate a flood map. Alternatively, interviews or extrapolation of measurements might be used. For discharge, peak values are required, as FastFlood is a-temporal and only predicts peak flow heights and peak discharges; the estimated hydrographs are not suitable for calibration or validation. We can first visualize the discharge in the model.

[FastFlood screenshot: visualizing discharge in the model]
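As a sketch of how a binary flood-extent observation might be derived, the snippet below thresholds a water-probability raster (for example, one produced from a Sentinel-1 classification). The file name and the 0.5 threshold are assumptions, and the rasterio package is an assumed dependency:

```python
import numpy as np
import rasterio  # assumed dependency for reading GeoTIFF rasters

# Hypothetical file: per-pixel water probability derived from Sentinel-1.
with rasterio.open("water_probability.tif") as src:
    probability = src.read(1)   # first band as a 2D array
    nodata = src.nodata

# Classify into a binary flood/no-flood map; 0.5 is an assumed threshold.
flood_observed = (probability >= 0.5).astype(np.uint8)
if nodata is not None:
    flood_observed[probability == nodata] = 0   # treat nodata as no-flood
```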

Then, we can add an observation at the correct location. We can specify either peak discharge, total discharge, or maximum flow height; a sketch of how such an observation might be represented follows the screenshots.

[FastFlood screenshots: adding an observation and specifying its type and value]
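Outside the interface, such an observation boils down to a location, an observation type, and a value. A hypothetical record in Python (illustrative only, not a FastFlood data structure):

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """A point observation for calibration (hypothetical structure)."""
    x: float      # location in map coordinates
    y: float
    kind: str     # "peak_discharge", "total_discharge" or "max_flow_height"
    value: float  # observed value, e.g. m3/s for discharge, m for flow height

# Illustrative example: a gauged peak discharge of 85 m3/s.
obs = Observation(x=154300.0, y=462150.0, kind="peak_discharge", value=85.0)
```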

Objective Functions

The objective function is a function that takes the model output and the observation data, and returns a single number indicating goodness-of-fit or model error. Many such metrics exist. For binary data (flood/no-flood), you can classify the pixels into TP (true positive), TN (true negative), FP (false positive) and FN (false negative), with N the total count (TN+TP+FN+FP).
  1. Percentage accuracy = (TP + TN) / N
  2. Cohen’s Kappa = (p_o − p_e) / (1 − p_e), with observed agreement p_o = (TP + TN) / N and chance agreement p_e = ((TP+FP)(TP+FN) + (FP+TN)(FN+TN)) / N²
  3. F1 score = 2×TP / (2×TP + FP + FN)
  4. Matthews correlation coefficient = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))
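All four metrics can be computed directly from the confusion counts. A minimal sketch in Python (the example counts are illustrative):

```python
import math

def binary_metrics(tp, tn, fp, fn):
    """Goodness-of-fit metrics for a binary (flood/no-flood) comparison."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Cohen's Kappa: observed agreement corrected for chance agreement.
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fp + tn) * (fn + tn)) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "kappa": kappa, "f1": f1, "mcc": mcc}

# Illustrative confusion counts from comparing two flood maps pixel by pixel.
print(binary_metrics(tp=420, tn=9200, fp=130, fn=250))
```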
Each of these has benefits and drawbacks related to the importance of certain aspects of the comparison. Try them all and decide for yourself in the end. Percentage accuracy is a safe option, but for large areas dominated by non-flooded pixels it can favor under-prediction of the flood extent. For real-valued variables, other metrics can be used, as sketched after the list:
  • Pearson correlation
  • RMSE
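For example, observed and simulated peak discharges at a few gauges can be compared as follows (a sketch assuming numpy; the values are illustrative):

```python
import numpy as np

# Paired observed/simulated values, e.g. peak discharges at gauges (m3/s).
observed  = np.array([85.0, 142.0, 60.5])
simulated = np.array([78.0, 150.0, 66.0])

rmse = np.sqrt(np.mean((simulated - observed) ** 2))  # same units as input
pearson_r = np.corrcoef(observed, simulated)[0, 1]    # ranges from -1 to 1
```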

Parameter Selection

In order to improve model performance, the right parameters must be altered. It is undesirable to alter too many parameters, as each unique combination must be tested, resulting in many simulations. Typically, Manning’s surface roughness and the saturated hydraulic conductivity (Ksat), which controls the infiltration rate, are taken as calibration parameters. In the case of levee systems, channel dimensions can be an additional option to improve the fit with the channel design capacity.

[FastFlood screenshot: selecting calibration parameters]

Calibration Algorithms

The primary way to calibrate, once the above choices have been made, is to simulate with each unique combination of parameter values. Define a number of values for each parameter, for example 50, 70, 90, 110, 130 and 150 % of the original value. Use the built-in calibration multipliers to re-scale the input values/maps. Note that the number of simulations grows exponentially with the number of parameters (2 parameters with 6 values each → 6×6 = 36 simulations, etc.). If you require higher accuracy, you can define a narrower search range around the optimum found in the first calibration attempt. The finished calibration shows a calibration graph, indicating the final optimal settings and how those relate to the parameter space (a code sketch of this grid search follows the screenshots below).

[FastFlood screenshots: calibration results and calibration graph]
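The brute-force search described above amounts to a grid search over the multiplier combinations. A runnable sketch in Python, where simulation_error is a dummy stand-in for running the model and evaluating the objective function:

```python
import itertools

# Hypothetical stand-in: a real setup would run the model with the scaled
# inputs and score it against observations; a dummy keeps the sketch runnable.
def simulation_error(mults):
    return (mults["mannings_n"] - 0.9) ** 2 + (mults["ksat"] - 1.1) ** 2

multipliers = [0.5, 0.7, 0.9, 1.1, 1.3, 1.5]  # 50-150 % of original values
parameters = ["mannings_n", "ksat"]           # 2 parameters -> 6 x 6 = 36 runs

best_combo, best_error = None, float("inf")
for combo in itertools.product(multipliers, repeat=len(parameters)):
    scaled = dict(zip(parameters, combo))     # e.g. {"mannings_n": 0.7, "ksat": 1.1}
    error = simulation_error(scaled)          # run + objective in one stub
    if error < best_error:
        best_combo, best_error = scaled, error

print("optimal multipliers:", best_combo, "error:", best_error)
```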

Currently, a brute-force approach is available, which tries all combinations of the specified parameter values. In the future, we aim to include automatic calibration using gradient descent, which does not suffer from the exponential increase in the number of simulations. Instead, this algorithm searches the n-dimensional parameter space by using gradients to step toward lower error values, as illustrated below.
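As an illustration of the idea (not FastFlood's planned implementation), a finite-difference gradient descent over two parameter multipliers could look like this; the error function is again a dummy stand-in for simulating and scoring:

```python
import numpy as np

def error(params):
    """Dummy stand-in for simulate-and-score; smooth with one minimum."""
    return (params[0] - 0.9) ** 2 + (params[1] - 1.1) ** 2

def gradient(f, params, eps=1e-4):
    """Central finite differences: 2 extra runs per parameter, so the cost
    grows linearly with the number of parameters, not exponentially."""
    g = np.zeros_like(params)
    for i in range(len(params)):
        step = np.zeros_like(params)
        step[i] = eps
        g[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return g

params = np.array([1.0, 1.0])   # start from the original values (100 %)
for _ in range(100):            # fixed iteration budget
    params -= 0.1 * gradient(error, params)   # step toward lower error

print("calibrated multipliers:", params)
```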