Remove outliers that lie more than a given number of standard deviations from the mean

Usage

clean_outliers(df, times_sd = 3)

Arguments

df

dataframe with columns c("Station", "DATE", "Variables", "Values", "Source").

times_sd

numeric multiplication factor for the standard deviation. Values outside the interval from mean - times_sd * sd to mean + times_sd * sd are identified as outliers. Optional, default is 3.
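The rule described for times_sd can be sketched in base R as follows. This is only an illustration on a plain numeric vector: flag_outliers is a hypothetical helper, not a function of the package, and it assumes the mean and standard deviation are taken over all values passed in (how clean_outliers groups the data internally is not stated here).

```r
# Hypothetical helper (not part of svatools): flag values lying outside
# the interval [mean - times_sd * sd, mean + times_sd * sd].
flag_outliers <- function(x, times_sd = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  x < m - times_sd * s | x > m + times_sd * s
}

# Nineteen ordinary values and one extreme value
values <- c(rep(1, 19), 100)
flagged <- flag_outliers(values)
which(flagged)  # positions of the flagged values
```

A smaller times_sd narrows the interval and flags more values; a larger one flags fewer.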

Value

list of two dataframes. newdf contains the data with outliers removed. dropped contains the rows that were removed as outliers.
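To make the shape of the returned list concrete, here is a minimal sketch that splits a toy dataframe into newdf and dropped. It is not the package source: split_outliers and the toy data are hypothetical, and the statistics are assumed to be computed over the whole Values column.

```r
# Hypothetical sketch (not the svatools implementation): return kept and
# removed rows as a list with the same element names as clean_outliers().
split_outliers <- function(df, times_sd = 3) {
  m <- mean(df$Values, na.rm = TRUE)
  s <- sd(df$Values, na.rm = TRUE)
  is_out <- !is.na(df$Values) & abs(df$Values - m) > times_sd * s
  list(newdf = df[!is_out, ], dropped = df[is_out, ])
}

# Toy data with the documented columns (made-up values, for illustration only)
toy <- data.frame(
  Station = "10",
  DATE = as.Date("2020-01-01") + 0:19,
  Variables = "N-NH4",
  Values = c(rep(1, 19), 100),
  Source = "grab sample"
)

res <- split_outliers(toy)
nrow(res$newdf)    # rows kept
nrow(res$dropped)  # rows removed
```

Every input row ends up in exactly one of the two dataframes, so dropped can be inspected before newdf replaces the original data.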

Examples

temp_path <- system.file("extdata", "calibration_data.xlsx", package = "svatools")
cal_data <- load_template(temp_path)
#> [1] "Loading data from template."
#> [1] "Loading of data is finished."
lst <- clean_outliers(cal_data$data)
## Look at the data that were removed
print(head(lst$dropped))
#> # A tibble: 6 × 5
#>   Station DATE                Variables Values Source     
#>   <chr>   <dttm>              <chr>      <dbl> <chr>      
#> 1 10      2012-12-11 00:00:00 N-NH4     16.8   grab sample
#> 2 10      2016-02-04 00:00:00 N-NH4      6     grab sample
#> 3 10      2016-03-07 00:00:00 N-NH4      4.67  grab sample
#> 4 10      2016-04-04 00:00:00 N-NH4     11.1   grab sample
#> 5 10      2020-01-27 00:00:00 N-NH4     20.1   grab sample
#> 6 10      2013-10-07 00:00:00 N-NO2      0.837 grab sample
## Replace the original data with the cleaned data
cal_data$data <- lst$newdf