# How to replace a column in R by a modified column, dependent on filtered values? (removing outliers in panel data)

## Issue

I have a panel dataset that looks like this:

year | id | treatment_year | time_to_treatment | outcome |
---|---|---|---|---|
2000 | 1 | 2011 | -11 | 2 |
2002 | 1 | 2011 | -10 | 3 |
2004 | 2 | 2015 | -9 | 22 |

and so on. I am trying to deal with outliers by winsorizing. The end goal is to make a scatterplot with `time_to_treatment` on the x-axis and `outcome` on the y-axis.

For each value of `time_to_treatment`, I would like to replace the outcomes with their winsorized values, i.e. replace all extreme values with the 5% and 95% quantile values.

So far I have tried the following, but it doesn't work:

```
for (i in range(dataset$time_to_treatment)) {
  dplyr::filter(dataset, time_to_treatment == i)$outcome <-
    DescTools::Winsorize(dplyr::filter(dataset, time_to_treatment == i)$outcome)
}
```

I get the error: `` Error in filter(dataset, time_to_treatment == i) <- `*vtmp*` : could not find function "filter<-" ``

Would anyone be able to suggest a better way?

Thanks.

My actual data is shown below, where `conflicts` is the outcome, `commission` is the year of treatment, and `CD_MUN` is the id.

The relevant time-to-treatment variable is `time_to_t`.

Groups: year, CD_MUN, type [6]

type | CD_MUN | year | time_to_t | conflicts | commission |
---|---|---|---|---|---|
chr | dbl | dbl | dbl | int | dbl |
manif | 1100023 | 2000 | -11 | 1 | 2011 |
manif | 1100189 | 2000 | -3 | 2 | 2003 |
manif | 1100205 | 2000 | -9 | 5 | 2009 |
manif | 1500602 | 2000 | -4 | 1 | 2004 |
manif | 3111002 | 2000 | -11 | 2 | 2011 |
manif | 3147006 | 2000 | -10 | 1 | 2010 |

## Solution

For a start you may use this:

```
# The data
set.seed(123)
df <- data.frame(
  time_to_treatment = seq(-15, 0, 1),
  outcome = sample(1:30, 16, replace = TRUE)
)

# A solution without Winsorize, based solely on dplyr.
# Note: the quantiles here are computed over the whole outcome column.
library(dplyr)
df %>%
  mutate(
    outcome05 = quantile(outcome, probs = 0.05),  # 5% quantile
    outcome95 = quantile(outcome, probs = 0.95),  # 95% quantile
    outcome = ifelse(outcome <= outcome05, outcome05, outcome),  # replace low values
    outcome = ifelse(outcome >= outcome95, outcome95, outcome)   # replace high values
  ) %>%
  select(-c(outcome05, outcome95))
```

You may adapt this to your exact problem.
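One possible adaptation, as a rough sketch that is not part of the original answer: the loop in the question fails because R reads `filter(...)$outcome <- value` as a call to a replacement function named `filter<-`, which does not exist (and `range()` returns only the minimum and maximum, not every value). Grouping by `time_to_treatment` and modifying the column inside `mutate()` avoids both problems. The helper `winsorize_vec()` and the toy `panel` data below are made up for illustration.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): clamp a numeric vector
# to its own lower and upper quantiles (5% and 95% by default).
winsorize_vec <- function(x, probs = c(0.05, 0.95)) {
  limits <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, limits[1]), limits[2])
}

# Made-up toy panel: 50 observations for each of three time_to_treatment values
set.seed(123)
panel <- data.frame(
  id = rep(1:50, times = 3),
  time_to_treatment = rep(c(-2, -1, 0), each = 50),
  outcome = rpois(150, lambda = 3)
)
# Plant one obvious outlier in each time_to_treatment group
panel$outcome[c(1, 51, 101)] <- c(40, 55, 70)

# Winsorize outcome separately within each time_to_treatment group
winsorized <- panel %>%
  group_by(time_to_treatment) %>%
  mutate(outcome = winsorize_vec(outcome)) %>%
  ungroup()

# Compare the largest value per group before and after
panel %>% group_by(time_to_treatment) %>% summarise(max_outcome = max(outcome))
winsorized %>% group_by(time_to_treatment) %>% summarise(max_outcome = max(outcome))
```

For the data in the question, the same pattern would presumably be `group_by(time_to_t)` followed by `mutate(conflicts = winsorize_vec(conflicts))`. Since that tibble is already grouped by `year`, `CD_MUN` and `type`, note that `group_by()` replaces the existing grouping by default.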

Answered By – timm

**This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.**