R: Data frame, for each row find cases with same value on a variable and add them to row with additional information (dplyr, pivot?)

0

Issue

I have a data frame in R that looks something like this:

event_ID date person_ID
1234 2000-08-09 5678
2345 2001-09-10 6789
1234 2000-08-09 7890
3456 2001-10-11 5678

For each person, I want to find all cases where another person has been at the same event as that person, which person it was and the date of the event. I want to add all cases of co-occurence with another person with their ID and date for each row/person_ID of the dataframe. Ideally, the result should look like this:

event_ID date person_ID 1st_person_met_ID 1st_person_met_date 2nd_person_met_ID 2nd_person_met_date
1234 2000-08-09 5678 7890 2000-08-09

I’ve played around with dplyr and it seems like it should be able to create the result I want, but I haven’t figured out exactly how to combine the tools. Any help is appreciated!

Solution

With a left_join, some data wrangling and pivot_wider you could do:

Note: Not sure whether you want to keep observation where no other persons have been met. I kept these cases and assigned them an NA.

library(dplyr)
library(tidyr)

df %>% 
  left_join(df, by = c("event_ID"), suffix = c("", "_met")) %>% 
  rename(person_met_ID = person_ID_met, person_met_date = date_met) %>% 
  #### Deal with cases where no persons have been met
  add_count(event_ID, person_ID) %>% 
  filter(n < 2 | !person_ID == person_met_ID) %>%
  select(-n) %>% 
  mutate(person_met_ID = ifelse(!person_ID == person_met_ID, person_met_ID, NA),
         person_met_date = ifelse(!person_ID == person_met_ID, person_met_date, NA)) %>% 
  #### 
  group_by(event_ID, person_ID) %>% 
  mutate(id_met = scales::ordinal(row_number())) %>% 
  ungroup() %>% 
  pivot_wider(names_from = id_met, values_from = c(person_met_ID, person_met_date), names_glue = "{id_met}_{.value}")
#> # A tibble: 4 × 5
#>   event_ID date       person_ID `1st_person_met_ID` `1st_person_met_date`
#>      <int> <chr>          <int>               <int> <chr>                
#> 1     1234 2000-08-09      5678                7890 2000-08-09           
#> 2     2345 2001-09-10      6789                  NA <NA>                 
#> 3     1234 2000-08-09      7890                5678 2000-08-09           
#> 4     3456 2001-10-11      5678                  NA <NA>

Answered By – stefan

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0

Leave A Reply

Your email address will not be published.

This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Accept Read More