Skip to contents

The calibration of the SWAT+ model is crucial for ensuring its accuracy and reliability in simulating hydrological processes and predicting the impacts of land management practices on water resources. Calibration involves adjusting the model’s parameters to match observed data, such as streamflow, sediment yield, and nutrient loads, thereby improving its representation of the real-world watershed conditions. This process enhances the model’s predictive capability, making it a more effective tool for water resource management, policy-making, planning and in the context of assessing the effects of climate change, land use alterations, and conservation practices. This page provides steps and tools to perform a hard calibration of SWAT+ models using scripted workflows.

1. Loading required packages

The first step in the hard calibration workflow is to load the necessary R packages. These packages provide functions and tools for data manipulation, visualization, and model calibration. The following code chunk loads the required packages for the hard calibration workflow.

2. Defining settings

The next step is to define the settings for the hard calibration workflow. This includes specifying the SWAT+ model path, input files, calibration parameters, observed data, and other relevant information. The following code chunk sets up the necessary settings for the hard calibration workflow.

# Path to the SWAT+ model
model_path <- '../test/my_dearest_model'

# Path to the crop data file
qobs_path <- '../inst/extdata/q_cha.csv'

# Define the path to the observed WQ data
wqobs_path <- '../inst/extdata/no3.csv'

# You can define the path to the other observed data, like groundwater depth data, etc.
gwobs_path <- '../inst/extdata/gw.csv'

# Set the path to save results of the soft calibration
sc_res <- '../test/simulations'

# Set the number of cores available for calculations.  
# Number of cores for 12 runs use 3, 4, 6, or 12 cores (if available)
cores <- 3

# Channel IDs where gauges are located at the channel outlets.
cha_ids <- c(67)

# Define the simulation save file names to be used in SWATrunR
save_file_name <- "sim1"

# Number parameter combinations to run
n_comb <- 100

# Fraction of the best parameter combinations to be used for the final 
# WQ calibration
n_best <- NULL # example n_best <- .02 # 2% of the best parameter combinations, 
# if NULL, n_best = 20/n_comb

# Number of combination drawn for the nutrient parameters (this number 
# is multiplied by the number of selected parameter combinations for the flow.
# For example n_comb = 1000, n_best .1, means 100 best runs selected, multiplied
# with n_nutr = 100 results in 10,000 model runs for the nutrient calibration.
# If null, 1000/(n_comb*n_best) formula applied resulting in 1000 runs)
n_nutr <- NULL

# Number of cores to be used for the parallelization of the model runs
n_cores <- NULL ## If NULL, all available physical cores - 2 will be used

# Define time period to be used in model runs. Please note this period should
# match your management data period, if it was prepared with SWATfarmR. You can
# safely modify 'start_date_print' and 'end_date' to define the period for which
# you want to print the model output. However, please take great care, if you 
# modify 'start_date' as you might need to update your management files too. 

# Start date for model runs (if NULL, the start date of the model setup files 
# will be used!!!)
start_date <- NULL # start_date <- '2000-01-01'
# End date for model runs (if NULL, the end date of observation data will be used).
end_date <- NULL # end_date <- '2010-12-31'
# Start printing model output (if NULL, the start date of observation data will used).
start_date_print <- NULL # start_date_print <-'2003-01-01'

# Define calibration and validation periods. Please note that the validation 
# should take around 1/3 and calibration 2/3 of the total period. Also the 
# workflow will run all simulations for the whole period (defined above), 
# these periods are only used for the performance calculation and plotting.

# Q calibration period
q_cal_period <- c('2007-01-01', '2010-01-01')
# Calibration end date
q_val_period <- c('2011-01-01', '2022-12-31')

# WQ calibration period
wq_cal_period <- c('2012-01-01', '2017-12-31')
# Calibration end date
wq_val_period <- c('2018-12-31', '2022-12-31')

3. Preparing parameters sets for the flow (and water quality) calibration

The suggested list and ranges of SWAT+ parameters for hydrology calibration are divided into groups based on the primary processes they affect. Some groups are considered optional. These parameters help fine-tune the model to achieve accurate hydrological simulations, ensuring that various aspects of the hydrological cycle are properly represented.

It is important to note that calibration and validation of all hydrology and water quality/load related parameters could be done in one step. For this, you need to define the nutrient parameters together with hydrology. Please refer to step 3 on the Extend calibration page for an example of how to define sediment, nitrogen, and phosphorus parameters. The example here presents only hydrology parameters.

par_bound <- tibble(
  # Snow (optional - use if average snow fall to precipitation ratio is higher 
  # than 5%)
  'snomelt_tmp.hru | change = absval' = c(-1.5, 1.5),
  'snofall_tmp.hru | change = absval' = c(-1.5, 1.5),
  # ET (note: it is suggested that a narrow range for esco selected in soft 
  # calibration of water balance is used instead of the wide (0,1) range)
  'esco.hru | change = absval' = c(0.01, 0.95),
  'epco.hru | change = absval' = c(0.05, 1),
  'awc.sol | change = relchg' = c(-0.25, 0.25),
  'canmx.hru | change = relchg' = c(-0.5, 0.5),
  #surface runoff
  'cn2.hru | change = relchg' = c(-0.15, 0.15),
  'cn3_swf.hru | change = absval' = c(0, 1),
  'ovn.hru | change  = relchg ' = c(-0.25, 0.25),
  'surlag.bsn | change = absval' = c(0.05, 4),
  # lateral flow (optional - use if lateral flow constitutes at least 5% of total 
  # water yield)
  #'lat_time.hru | change = relchg' = c(-0,15, 0.15),
  'lat_len.hru | change = abschg' = c(-30, 30),
  'latq_co.hru | change = absval' = c(0, 1),
  'bd.sol | change = relchg' = c(-0.25, 0.25),
  'k.sol | change = relchg' = c(-0.5, 2),
  # tile flow (optional - use if tile flow constitutes at least 5% of total water
  # yield; note: tile_lag and tile_dtime should be active only if tile_drain is 
  # set to 0 in codes.bsn file))
  #'tile_dep.hru | change = relchg' = c(-0.1, 0.2),
  #'tile_lag.hru | change = absval' = c(48, 100),
  #'tile_dtime.hru | change = absval' = c(48, 100),
  #percolation/aquifer
  'perco.hru | change = absval' = c(0, 1),
  'flo_min.aqu | change = abschg' = c(-2, 2),
  'revap_co.aqu | change = absval' = c(0.02, 0.2),
  'revap_min.aqu | change = abschg' = c(-2, 2),
  'alpha.aqu | change = absval' = c(0.001, 0.5),
  # 'sp_yld.aqu | change = absval' = c(0.001, 0.05),
  'bf_max.aqu | change = absval' = c(0.5, 2),
  #channel routing
  'chn.rte | change = absval' = c(0.02, 0.1)
)

# Draw a first parameter set of parameter combinations 
par_flow <- sample_lhs(par_bound, n_comb)

3a. (Optional) Calibration of parameters for HRU groups

For some parameters it might be important to differentiate calibration values based on some characteristic of areas. For instance the parameters cn3_swf (soil water adjustment factor for CN3) and latq_co (lateral flow coefficient) are initialized based on the runoff potential of a Hydrologic Response Unit (HRU), while the parameter perco (percolation coefficient) is initialized based on its leaching potential. perco, cn3_swf, and latq_co have normalized range 0-1, which sampled as single value, but better approach is considering their different initial values. Therefore, the parameter sampling could be conducted as follows:

  • From 'hydrology.hyd', HRU IDs with low, moderate, and high leaching and runoff potentials are drawn.
  • These IDs are compiled into text strings required for parameter definition via the parameter names.
  • Parameter boundaries or the different low, moderate, and high potentials are defined.
  • The normalized parameter ranges are translated into the defined boundaries.

This is a very reduced set for testing purposes and does not include all parameters that could be used in model setups.

# Read the hydrology.hyd file to identify the HRU IDs with the different 
# runoff and leaching potentials.
hyd <- read_tbl(paste0(model_path, '/hydrology.hyd'))

# Generate the ID text strings to be included in the parameter names so that 
# parameter values for HRUs can be addressed separately.
id_lch_pot <- id_text_strings('perco', c(low = 0.01, mod = 0.50, high = 0.95), hyd)
id_run_pot <- id_text_strings('cn3_swf', c(low = 0.95, mod = 0.30, high = 0.00), hyd)

# Define parameter boundaries in which the parameter values should vary for 
# the different leaching and runoff potentials.
perco_bound <- list(low = c(0.05, 0.30), mod = c(0.30, 0.60), high = c(0.5, 0.90))
cn3_bound   <- list(low = c(0.50, 0.95), mod = c(0.15, 0.45), high = c(0.0, 0.30))
latq_bound  <- list(low = c(0.01, 0.30), mod = c(0.10, 0.40), high = c(0.5, 0.90))

# Translate the normalized parameter ranges into the defined boundary ranges.
for(i in c('low', 'mod', 'high')) {
  if(nchar(id_lch_pot[i]) > 0) {
    par_flow[[paste0('perco_', 
                     i, '::perco.hru | change = absval | unit = c(', 
                     id_lch_pot[i], ')')]] <-  
      par_flow[['perco.hru | change = absval']] * (perco_bound[[i]][2] - perco_bound[[i]][1]) + perco_bound[[i]][1]
  }
  if(nchar(id_run_pot[i]) > 0) {
    par_flow[[paste0('cn3_', i, '::cn3_swf.hru | change = absval | unit = c(', 
                     id_run_pot[i], ')')]] <- 
      par_flow[['cn3_swf.hru | change = absval']] * (cn3_bound[[i]][2] - cn3_bound[[i]][1]) + cn3_bound[[i]][1]
    par_flow[[paste0('latq_', i, '::latq_co.hru | change = absval | unit = c(', 
                     id_run_pot[i], ')')]] <- par_flow[['latq_co.hru | change = absval']] * 
      (latq_bound[[i]][2] - latq_bound[[i]][1]) + latq_bound[[i]][1]
  }
}

# Remove the initial normalized parameters.
par_flow <- par_flow %>% 
  select(-starts_with(c('perco.hru', 'cn3_swf.hru', 'latq_co.hru')))
# Or this method in case you get an error (e.g., due to hru number >10k)
# par_flow <- par_flow[, !grepl("^perco.hru|^cn3_swf.hru|^latq_co.hru", names(par_flow))]

4. Running simulations

Prepared parameter sets are used to run simulations. The SWAT+ model is executed for the defined period, and the daily hydrographs (in this example) are stored for the channel IDs. The simulation is run in parallel, and the number of cores used can be defined. The simulation results are stored in the defined output folder. Observed discharge data are read to obtain the period needed for the runs.

# Read the observed discharge data to obtain period needed for the runs.
obs_full <- read_csv(qobs_path) 

unlink(paste0(model_path, "/", save_file_name, '_q'), recursive = TRUE) # Remove 
# the previous saved runs if they exist.

# Simulate daily discharge for the channel IDs.
sim_flow_full_bck <- run_swatplus(project_path = model_path,
                         output = list(
                           flo_day = define_output(file = 'channel_sd_day',
                                                   variable = 'flo_out',
                                                   unit = cha_ids)
                         ),
                         parameter = par_flow,
                         start_date = start_date,
                         end_date = end_date,
                         start_date_print = lubridate::as_date(
                           ifelse(length(start_date_print)==0, min(obs_full$date), 
                                  start_date_print)),
                         # Get the number of cores to be used for parallel processing.
                         n_thread = ifelse(length(cores) == 0, 
                                           parallel::detectCores(logical = FALSE) - 2, cores),
                         save_file = paste0(save_file_name, '_q')
) 

# If you have to remove the runs that did not finish, use the following code.
sim_flow_full <- remove_unsuccesful_runs(sim_flow_full_bck)

# If you have to load the runs from the saved runs in the data bases, use the
# following code.
# sim_flow <- SWATrunR::load_swat_run(paste0(model_path, '/',  paste0(save_file_name, '_q')))

5. (Optional) Getting concentrations

If you need to calculate and use concentrations together with flow values in the next steps, use the get_conc function. It has two optional arguments: not_conc and sediment. The not_conc argument is a regular expression that defines the variables which should not be converted to concentrations. By default, it is set to “^flo_day|^gwd”. The sediment argument is a regular expression that defines the variables which should be identified as sediment variables. The default for this argument is “^sed”.

# Get the concentrations for the daily discharge.
sim_flow_full <- get_conc(sim_flow_full)

6. Readjusting observation and simulation periods

To align the dates of the simulated and observed data for the calibration period (or validation), the fix_dates function could be used. This function adjusts the dates to ensure that the simulated data matches the observed data and can also trim the dataset to a specified time range.

# Align the dates of the simulated and observed data for the discharge calibration period.
tmp <- fix_dates(sim_flow_full, obs_full, trim_start = q_cal_period[1], 
                 trim_end = q_cal_period[2])
sim_flow <- tmp$sim
obs <- tmp$obs
## Example of usage for trimming the data to a specific period.
# tmp <- fix_dates(runr_obj = sim_flow, obs_obj = obs, 
# trim_start = "2010-01-01", trim_end = "2015-01-01")

# In case you want to use water quality data, read it here before applying the following code.
# obs2 <- read_csv(wqobs_path) 
# tmp <- fix_dates(sim_flow_full, obs2, trim_start = wq_cal_period[1], trim_end = wq_cal_period[2])
# obs2 <- tmp$obs

7. Calculating performance metrics

To calculate the performance metrics for the simulated discharge, the sim object containing the simulated discharge data and the obs object with the observed discharge data are required. The parameter par_name specifies the variable for which the performance metrics should be calculated; if not provided, it defaults to the first variable in the simulation list. For cases with multiple variables, specifying par_name is necessary. The perf_metrics parameter allows the selection of performance metrics for the calculation, which will influence the final ranking used in the selection of parameter sets in step 7. If perf_metrics is not provided, all available metrics will be used. The period parameter defines the time interval for calculating performance metrics, such as “week”, “month,” or “year”. The fn_summarize parameter specifies the function to summarize data over the defined period, such as “mean”, “median”, “max”, “min” or “sum”.

# Calculate the performance metrics for the daily discharge.

obj_tbl <- calculate_performance(sim_flow, obs)

# Example of usage: 
# obj_tbl <- calculate_performance(sim = sim_flow, obs, "flo_day", c("kge", "nse"), 
# "month", "sum")

7a. Calculating performance metrics for multiple variables

The performance metrics can also be applied to multiple variables within the simulation list. In this scenario, the performance metrics will be calculated separately for each variable, and subsequently, the mean of the performance ranks will be determined. Additionally, weights can be assigned to each variable in this process, allowing for a more nuanced evaluation that takes into account the relative importance of each variable.

# Example of usage:
var_to_use <- c("flo_day_52",  "no3_day_52_conc", "gwd")
var_weights <- c(0.5, 0.3, 0.2) ## The sum of weights should be 1.
periods <- list(q_cal_period, wq_cal_period, c('2007-01-01', '2011-12-26'))

# Read the another observed data
# In case you want to use water quality data, read it here before applying the 
# following code.
obs2 <- read_csv(wqobs_path) 
obs3 <- read_csv(gwobs_path) 

# Calculate the performance metrics for each variable separately.
obj_tbl_m <- calculate_performance_2plus(sim_flow_full, var_to_use, 
                                         list(obs, obs2, obs3), periods, 
                                         var_weights)

8. Selecting the best parameter sets

To select the best-performing parameter sets for continuing next iteration of calibration or going into water quality (WQ) calibration or proceeding into validation, you can follow two methods. The first method is based on the rank of performance metrics. Filter the parameter sets to include only the best-performing ones. The number of best sets (n_best) parameters in step 2 determines this. If n_best = NULL , then 20 the best sets are selected, otherwise this numebr is calculated based on specified ratio. The selected run IDs are obtained by arranging the parameter sets by their total rank and selecting the top fraction.

n_best <- ifelse(length(n_best) == 0, 20/n_comb, n_best)

run_sel_ids <- obj_tbl %>% 
  arrange(rank_tot)%>%
  top_frac(-n_best, wt = rank_tot) %>% 
  pull(run_id) %>% 
  as.numeric

The second method is based on the thresholds of performance metrics, which should be defined by the user based on the plots. For example, you can select the parameter sets where the Nash-Sutcliffe efficiency (NSE) is greater than 0.5, the Kling-Gupta efficiency (KGE) is greater than 0.5, and the absolute percent bias (PBIAS) is less than 15. After selecting the run IDs based on these criteria, you can examine the selected parameter sets by printing the run IDs and extracting the corresponding rows from the obj_tbl data frame.

run_sel_ids <- as.numeric(str_remove(names(which(obj_tbl$nse > 0.5 & 
                              obj_tbl$kge > 0.5 & 
                              abs(obj_tbl$pbias) < 15)), 'run_'))

You can examine the selected parameter sets by running the following code.

print(paste0("Selected run ids are: ", paste(run_sel_ids, collapse = ", ")))
obj_tbl_cal<- obj_tbl[run_sel_ids,]

After this proceed to Plot results page for guidance of the visualization of the results. Also you can proceed to Extend calibration page for guidance on how to add additional parameters or build another iteration to the SWAT+ model calibration process.