Function to remove outliers that lie more than a given number of standard deviations from the mean
Arguments
- df
dataframe with columns c("Station", "DATE", "Variables", "Values", "Source").
- times_sd
numeric multiplier for the standard deviation. Values outside the interval [mean - times_sd * sd, mean + times_sd * sd] are identified as outliers. Optional, with a default value of 3.
Value
list of two dataframes. The newdf dataframe contains the data with outliers removed. The dropped dataframe contains the rows that were removed as outliers.
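The rule described under times_sd can be illustrated with a minimal base R sketch. This is not the svatools implementation: the helper name flag_outliers_sketch is hypothetical, and computing the mean and standard deviation separately for each Station/Variables combination is an assumption, since the grouping used by clean_outliers is not stated here.

```r
# Hypothetical helper for illustration only; not part of svatools.
flag_outliers_sketch <- function(df, times_sd = 3) {
  # Assumption: statistics are computed per Station/Variables group.
  grp <- interaction(df$Station, df$Variables, drop = TRUE)
  grp_mean <- ave(df$Values, grp, FUN = function(v) mean(v, na.rm = TRUE))
  grp_sd <- ave(df$Values, grp, FUN = function(v) sd(v, na.rm = TRUE))

  # A value is an outlier when it lies outside mean +/- times_sd * sd.
  # Groups with an undefined sd (e.g. a single value) and NA values
  # are never flagged.
  is_out <- !is.na(grp_sd) & !is.na(df$Values) &
    abs(df$Values - grp_mean) > times_sd * grp_sd

  list(
    newdf = df[!is_out, , drop = FALSE],
    dropped = df[is_out, , drop = FALSE]
  )
}

# Toy data: twenty identical values and one extreme value.
toy <- data.frame(
  Station = "A",
  DATE = as.Date("2020-01-01") + 0:20,
  Variables = "N-NH4",
  Values = c(rep(1, 20), 100),
  Source = "toy",
  stringsAsFactors = FALSE
)
res <- flag_outliers_sketch(toy)
nrow(res$dropped)
#> [1] 1
```

With times_sd = 3, only the value 100 falls outside the interval, so it ends up in dropped and the remaining 20 rows stay in newdf. A larger times_sd makes the filter more permissive.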
Examples
temp_path <- system.file("extdata", "calibration_data.xlsx", package = "svatools")
cal_data <- load_template(temp_path)
#> [1] "Loading data from template."
#> [1] "Loading of data is finished."
lst <- clean_outliers(cal_data$data)
## Look at the data identified as outliers and removed
print(head(lst$dropped))
#> # A tibble: 6 × 5
#> Station DATE Variables Values Source
#> <chr> <dttm> <chr> <dbl> <chr>
#> 1 10 2012-12-11 00:00:00 N-NH4 16.8 grab sample
#> 2 10 2016-02-04 00:00:00 N-NH4 6 grab sample
#> 3 10 2016-03-07 00:00:00 N-NH4 4.67 grab sample
#> 4 10 2016-04-04 00:00:00 N-NH4 11.1 grab sample
#> 5 10 2020-01-27 00:00:00 N-NH4 20.1 grab sample
#> 6 10 2013-10-07 00:00:00 N-NO2 0.837 grab sample
## Replace the original data with the cleaned dataframe
cal_data$data <- lst$newdf