What is the difference between n() and count() in R? When should one favour the use of either or both?


Issue

Can you please clarify for me when to use the count() versus the n() function?

This will help me understand why the following two snippets give different outputs.

Programming in R

Code 1

fueleconomy::vehicles %>% 
  distinct(model, make ) %>%
  group_by( model ) %>%
  count() %>%
  filter( n > 1 ) %>%
  arrange( desc( n ))

Code 1 output

A tibble 60 X 2
Groups: model [60]

Code 2

fueleconomy::vehicles %>% 
  distinct(model, make ) %>%
  group_by( model ) %>%
  filter( n() > 1 ) %>%
  arrange( model )

Code 2 output

A tibble 126 X 2
Groups: model [60]

Note: I was expecting the two codes to give the same output but they didn’t. So, I’m confused and would like some clarifications of the main difference between the n() and the count() functions.
Also, when can one use either in favour of the other?
Can both be used together in certain circumstances?

P.s: I’m a beginner with no programming background and self-learning, so, be gentle.

Thank you in advance for your help.

Solution

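The two functions do different jobs, so they cannot be compared one-to-one. count() is a verb that summarises the data and returns a new data frame, while n() is a helper that returns the size of the current group and is only meaningful inside another verb such as filter(), mutate() or summarise(). Which functions are applied before and after each of them in the pipeline also matters.

In this case, count() returns one row for each model, so the result is an aggregated data frame.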

library(dplyr)

count_data <- fueleconomy::vehicles %>% 
  distinct(model, make ) %>%
  group_by( model ) %>%
  count() %>%
  filter( n > 1 ) %>%
  arrange( desc( n ))

count_data

#   model                   n
#   <chr>               <int>
# 1 Coachbuilder Wagon      3
# 2 Conquest                3
# 3 Laser                   3
# 4 Limousine               3
# 5 Truck 2WD               3
# 6 Truck 4WD               3
# 7 200                     2
# 8 240 DL/240 GL Wagon     2
# 9 300E                    2
#10 300SL                   2
# … with 50 more rows

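Note the output. Because distinct(model, make) was applied first, it says that 'Coachbuilder Wagon' occurs under 3 different makes, 'Conquest' occurs under 3 different makes, and so on.

Now compare it with the output of the n() version.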

n_Data <- fueleconomy::vehicles %>% 
  distinct(model, make ) %>%
  group_by( model ) %>%
  filter( n() > 1 ) %>%
  arrange( model )

n_Data

#   make                   model              
#   <chr>                  <chr>              
# 1 Audi                   200                
# 2 Chrysler               200                
# 3 Mcevoy Motors          240 DL/240 GL Wagon
# 4 Volvo                  240 DL/240 GL Wagon
# 5 Lambda Control Systems 300E               
# 6 Mercedes-Benz          300E               
# 7 J.K. Motors            300SL              
# 8 Mercedes-Benz          300SL              
# 9 Mercedes-Benz          500SE              
#10 Texas Coach Company    500SE              
# … with 116 more rows

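This is not an aggregated data frame: filter( n() > 1 ) keeps every row of each group that passes the test, so each model still has multiple rows, one per make.

How are the two outputs related? The counts in the first add up to the number of rows in the second: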
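The same contrast can be seen on a much smaller data set. The sketch below uses a made-up five-row tibble (the `cars` data and its values are hypothetical, chosen only for illustration) and runs both pipelines on it.

```r
library(dplyr)

# Hypothetical toy data: one row per distinct (make, model) pair
cars <- tibble(
  make  = c("Audi", "Chrysler", "Volvo", "Mcevoy Motors", "Toyota"),
  model = c("200", "200", "240", "240", "Corolla")
)

# count() collapses each group to a single row and adds a column named n
counted <- cars %>%
  group_by(model) %>%
  count() %>%
  filter(n > 1)

# n() is evaluated inside filter(): it returns the size of the current group,
# and every row of a group that passes the test is kept
kept <- cars %>%
  group_by(model) %>%
  filter(n() > 1)

nrow(counted)  # 2: one row per repeated model
nrow(kept)     # 4: all rows that belong to those models
```

In the first pipeline `n` is a column created by count(); in the second, `n()` is a function call, which is why one is written without parentheses and the other with them.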

sum(count_data$n)
#[1] 126

nrow(n_Data)
#[1] 126
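As for using the two together: count() can be thought of as a shortcut for group_by() followed by summarise(n = n()), and add_count() is the variant that keeps every row. A minimal sketch, again on a hypothetical toy tibble (`cars` is made up for illustration):

```r
library(dplyr)

# Hypothetical toy data: one row per distinct (make, model) pair
cars <- tibble(
  make  = c("Audi", "Chrysler", "Volvo", "Mcevoy Motors", "Toyota"),
  model = c("200", "200", "240", "240", "Corolla")
)

# count(model) gives the same counts as grouping and summarising with n()
by_count     <- cars %>% count(model)
by_summarise <- cars %>% group_by(model) %>% summarise(n = n())

# add_count() keeps every original row and appends the group size as column n,
# so the rows and their counts are available at the same time
with_n <- cars %>%
  add_count(model) %>%
  filter(n > 1)
```

So a reasonable rule of thumb is: use count() when you want a summary table, n() when you need the group size inside filter(), mutate() or summarise(), and add_count() when you want the original rows with the count attached.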

Answered By – Ronak Shah

This Answer collected from stackoverflow, is licensed under cc by-sa 2.5 , cc by-sa 3.0 and cc by-sa 4.0
