How to add new column in R data frame showing sum of a value in a current row and a prior row, if certain conditions are met in the 2 rows?

Issue

Suppose you have a data frame of columns "a" and "b" with the values shown below, generated with df <- data.frame(a=c(0, 1, 2, 2, 3), b=c(1, 3, 8, 9, 4)). Suppose you want to add a column "c", whereby if a value in "a" equals the value in the immediately preceding row in col "a", then the corresponding row values in col "b" are summed; otherwise a 0 value is shown. A column "c" is added to the below to illustrate what I’m trying to do:

1  0  1       0
2  1  3       0
3  2  8       0
4  2  9       17 (since the values in col "a" rows 3 and 4 are equal, add the values in col b rows 3 and 4)
5  3  4       0

Or in this scenario, whereby cols "a" and "b" are generated by df <- data.frame(a=c(0,1,2,2,2,3), b=c(1,2,3,4,5,6)):

1  0  1        0
2  1  2        0
3  2  3        0
4  2  4        7 (3+4 from col "b")
5  2  5        9 (4+5 from col "b")
6  3  6        0 (since 2 from prior row <> 3 from current row)

What is the easiest way to do this in native R?

Solution

As we are interested in the adjacent values to be equal, use rleid (from data.table) to create a grouping index, then create the ‘c’, by adding the ‘b’ with lag of ‘b’ and replace the default first value of lag (NA) to 0

library(dplyr)
library(data.table)
library(tidyr)
df %>%
group_by(grp = rleid(a)) %>%
mutate(c = replace_na(b + lag(b), 0)) %>%
ungroup %>%
select(-grp)

-output

# A tibble: 6 × 3
a     b     c
<dbl> <dbl> <dbl>
1     0     1     0
2     1     2     0
3     2     3     0
4     2     4     7
5     2     5     9
6     3     6     0

Or using base R – a similar approach is with the rle to create the ‘grp’, then use ave to do the addition of previous with current value (by removing the first and last) and then append 0 at the beginning

grp <- with(rle(df\$a), rep(seq_along(values), lengths))
df\$c <- with(df, ave(b, grp, FUN = function(x) c(0, x[-1] + x[-length(x)])))