How to change values that occur fewer than a specific number of times to "other" in a data frame


Issue

I have a data frame with more than 20 distinct values in its "S.A" column. A sample of the column is shown below:

    structure(list(`temp$S.A[1:30]` = c("Yaletown", "Fairview VW", 
    "West End VW", "Fairview VW", "Downtown VW", "Hastings", "Yaletown", 
    "Main", "Marpole", "West End VW", "Yaletown", "Yaletown", "Kitsilano", 
    "Hastings East", "Grandview VE", "Grandview Woodland", "Downtown VW", 
    "Downtown VW", "West End VW", "Downtown VE", "West End VW", "West End VW", 
    "West End VW", "Yaletown", "Downtown VW", "West End VW", "Downtown VW", 
    "West End VW", "Yaletown", "West End VW")), row.names = c(NA, 
    -30L), class = "data.frame")

If I use the table function, I get the frequency of every value of S.A in my data frame:

[image: output of table(temp$S.A)]

Now I want to replace every name that occurs fewer than 100 times with "other". For example, "Arbutus" occurs fewer than 100 times in the table output, so I want to change all "Arbutus" values to "other" in order to reduce the number of distinct categories.
I tried this code to find the names:

    aa <- as.data.frame(table(temp$S.A))
    bb <- subset(aa, aa$Freq < 100)
    cc <- bb[1]

This helps me find the names, but I am not sure how to continue and replace them.

Solution

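To continue with what you have, note that cc is a one-column data frame (bb[1] keeps the data frame structure), so extract the column itself before matching against it:

    temp$S.A[temp$S.A %in% cc[[1]]] <- 'Other'

This changes every value listed in cc to "Other".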
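As a minimal sketch of the same base R approach on toy data: the vector `sa`, the name `min_count` and the threshold of 3 are made up for illustration, standing in for `temp$S.A` and 100 from the question.

```r
# Toy stand-in for temp$S.A (values and threshold are illustrative only)
sa <- c("Yaletown", "Yaletown", "Yaletown", "Main", "Marpole",
        "West End VW", "West End VW", "West End VW", "West End VW")
min_count <- 3

counts <- table(sa)                          # frequency of each distinct value
rare   <- names(counts)[counts < min_count]  # character vector of rare values

sa[sa %in% rare] <- "Other"                  # overwrite the rare values
table(sa)
```

Here "Main" and "Marpole" each occur once and are replaced, while "Yaletown" (3) and "West End VW" (4) are kept. If the column is a factor rather than a character vector, convert it with as.character() first; assigning a value that is not an existing level produces NA with a warning.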


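Alternatively, forcats has a function for exactly this, fct_lump_min, which lumps every level that appears fewer than min times into "Other" and returns a factor:

    temp$S.A <- forcats::fct_lump_min(temp$S.A, 100)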
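A small sketch of fct_lump_min on toy data, assuming forcats 0.5.0 or later is installed. The vector `sa` and the threshold of 3 are illustrative, not taken from the question. The `other_level` argument may be useful here because the question asks for a lowercase "other" label, while the default is "Other".

```r
library(forcats)  # assumption: forcats >= 0.5.0 is installed

# Toy stand-in for temp$S.A (values and threshold are illustrative only)
sa <- c("Yaletown", "Yaletown", "Yaletown", "Main", "Marpole",
        "West End VW", "West End VW", "West End VW", "West End VW")

# Levels seen fewer than 3 times are collapsed into "Other"
lumped <- fct_lump_min(sa, min = 3)
table(lumped)

# other_level changes the label used for the lumped group
lumped_lc <- fct_lump_min(sa, min = 3, other_level = "other")
levels(lumped_lc)
```

The result is a factor, so wrap it in as.character() if a plain character column is needed downstream.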

Answered By – Ronak Shah

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
