How to use an "or" conditional in an n_distinct function, in dplyr?

0

Issue

Suppose we start with this data frame:

mydat <- 
  data.frame(
    ID = c(115,115,115,88,88,88,100,100),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2),
    Status_1 = c(1,2,1,1,2,3,2,1),
    Status_2 = c("Open","Open","Terminus","Open","Open","Closed","Open","Open")
  )

> mydat
   ID Period Status_1 Status_2
1 115      1        1     Open
2 115      2        2     Open
3 115      3        1 Terminus
4  88      1        1     Open
5  88      2        2     Open
6  88      3        3   Closed
7 100      1        2     Open
8 100      2        1     Open

Next, we run the following dplyr grouping to sum the number of instances by Period and Status_1, where Status_2 = "Open":

mydat %>%
  group_by(Period,Status_1) %>%
  summarize(StatusCount = n_distinct(ID[Status_2 == "Open"]))

  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           0
6      3        3           0

I’ve been trying to expand the above n_distinct() function to also include Status_2 = "Terminus" (in addition to the "Open" in the above code). I’ve tried various iterations of "or" conditions, and summing tricks, with no luck yet. Any ideas how to do this?

The result, including the Status_2 = "Terminus", would look like this:

Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           1
6      3        3           0

Solution

This might work for you. I added a conditional/logical or in the selection.

mydat %>% 
      group_by(Period,Status_1) %>% 
      summarize(StatusCount = n_distinct(ID[Status_2 == "Open"|Status_2 == "Terminus"])) %>% 
      ungroup()
`summarise()` has grouped output by 'Period'. You can override using the `.groups` argument.
# A tibble: 6 x 3
  Period Status_1 StatusCount
   <dbl>    <dbl>       <int>
1      1        1           2
2      1        2           1
3      2        1           1
4      2        2           2
5      3        1           1
6      3        3           0

Answered By – Andre Wildberg

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave A Reply

Your email address will not be published.

This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Accept Read More