This function removes outliers from a dataframe. A value is treated as an outlier when it lies more than a given number of standard deviations from the mean.

Usage

clean_outliers(df, times_sd = 3)

Arguments

df

A dataframe with columns "Station", "DATE", "Variables", "Values", "Source". The "Variables" column should contain the names of the variables.

times_sd

(optional) A numeric value giving the multiplication factor for the standard deviation. Values below mean - times_sd * sd or above mean + times_sd * sd are identified as outliers. Default times_sd = 3.
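
The threshold rule for times_sd can be illustrated in base R on a plain numeric vector. This is only a minimal sketch of the rule as described here, not the package's implementation; the vector x and the names lower, upper and is_outlier are hypothetical.

```r
# Hypothetical data: twenty ordinary readings and one extreme value.
x <- c(rep(10, 20), 100)
times_sd <- 3

# Bounds as described above: mean -/+ times_sd * sd.
lower <- mean(x) - times_sd * sd(x)
upper <- mean(x) + times_sd * sd(x)

# Flag values that fall outside the bounds.
is_outlier <- x < lower | x > upper
x[is_outlier]
```

Because a single extreme value also inflates the mean and the standard deviation, a small sample may need a lower times_sd before anything is flagged.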

Value

A list of two dataframes. newdf contains the input data with outliers removed. dropped contains the rows that were removed from the input.
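
The shape of the returned list can be sketched with a small stand-in function. Everything here is an assumption for illustration: split_outliers is a hypothetical helper, not the package's code, and computing the mean and standard deviation separately for each Station and Variables combination is a guess about the grouping, which this page does not state.

```r
# Hypothetical stand-in, not the SWATprepR implementation.
split_outliers <- function(df, times_sd = 3) {
  # Assumption: statistics are computed per Station/Variables combination.
  grp <- paste(df$Station, df$Variables)
  grp_mean <- ave(df$Values, grp, FUN = function(v) mean(v, na.rm = TRUE))
  grp_sd <- ave(df$Values, grp, FUN = function(v) sd(v, na.rm = TRUE))

  # Keep values within times_sd standard deviations of the group mean.
  keep <- abs(df$Values - grp_mean) <= times_sd * grp_sd
  keep[is.na(keep)] <- TRUE  # keep rows where no comparison is possible

  list(newdf = df[keep, , drop = FALSE],
       dropped = df[!keep, , drop = FALSE])
}

# Made-up data in the column layout described under Arguments.
df <- data.frame(
  Station = "S1",
  DATE = as.Date("2020-01-01") + 0:25,
  Variables = c(rep("Q", 21), rep("T", 5)),
  Values = c(rep(10, 20), 100, 1:5),
  Source = "demo"
)
res <- split_outliers(df)
nrow(res$newdf)
res$dropped
```

Both elements keep the columns of the input, so dropped can be inspected before newdf replaces the original data.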

Examples

if (FALSE) {
  # Load calibration data from an Excel file
  temp_path <- system.file("extdata", "calibration_data.xlsx", package = "SWATprepR")
  cal_data <- load_template(temp_path)
  
  # Clean outliers from the data
  lst <- clean_outliers(cal_data$data)
  
  # Display data to be removed
  print(head(lst$dropped))
  
  # Update the original data with cleaned data
  cal_data$data <- lst$newdf
}