How are missings represented in R?




The most obvious answer to the title question is that missing values are represented by NA in R. Some dummy data:

x <- c("a", "NA", "<NA>", NA)

We can convert every element of x, including the missing one, to a plain string using x_paste0 <- paste0(x). Afterwards the second and fourth elements are the same (both "NA"), and to my knowledge this is why there is no way to back-transform x_paste0 to x.
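A minimal sketch of that information loss, using only base R and the same dummy data; the comments describe what I would expect to see:

```r
x <- c("a", "NA", "<NA>", NA)

# paste0() turns the real missing value into the literal string "NA"
x_paste0 <- paste0(x)

is.na(x)         # only the fourth element is missing
is.na(x_paste0)  # nothing is missing any more

# Elements 2 and 4 are now indistinguishable, so the conversion is not reversible
identical(x_paste0[2], x_paste0[4])
```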


But working with addNA suggests that it is not just NA itself that represents missing values. In x only the last element is missing. Let’s transform the vector:

x_new <- addNA(x)
x_new
[1] a    NA   <NA> <NA>
Levels: <NA> a NA <NA>

Interestingly, the fourth element, i.e. the missing value, is shown as <NA> and not as NA, so it now looks the same as the third element. We are also told that there are no missing values, because any(is.na(x_new)) returns FALSE. At this point I would have thought that the information about which element is the missing one (the third or the fourth) is simply lost, as it was in x_paste0. But this is not true, because we can actually back-transform x_new:

as.character(x_new)
[1] "a"    "NA"   "<NA>" NA

How does as.character know that the third element is "<NA>" and the fourth is an actual missing value, i.e. NA?


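That’s probably an inconsistency in the base:::print.factor() method: it prints values and levels without quotes, so the string "<NA>" and a real NA look identical on screen even though they are stored as different levels.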
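A small sketch of where the ambiguity may come from, assuming it is down to unquoted printing. The vector v is a hypothetical example, not part of the original question:

```r
# Hypothetical demo vector: one literal string "<NA>" and one real missing value
v <- c("<NA>", NA_character_)

# Without quotes, both elements are rendered as <NA>
unquoted <- capture.output(print(v, quote = FALSE))
unquoted

# With quotes, only the string is quoted, so the two can be told apart
quoted <- capture.output(print(v, quote = TRUE))
quoted

# The data itself was never ambiguous
is.na(v)
```

If that assumption holds, only the display is ambiguous and the underlying values are unaffected.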

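x <- c("a", "NA", "<NA>", NA)

# Printed without quotes, the string "<NA>" and the real NA look alike
addNA(x)
# [1] a    NA   <NA> <NA>
# Levels: <NA> a NA <NA>

# The levels themselves are distinct: three strings and one real NA
levels(addNA(x))
# [1] "<NA>" "a"    "NA"   NA    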

So there are no duplicated levels: "<NA>" is an ordinary string level, while the last level is a real NA.
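To make the mechanism concrete, here is a hedged sketch of how the back-transformation can work: a factor stores integer codes that point into its levels, and converting to character amounts to looking those codes up. The names lev, codes and restored are mine, chosen for illustration:

```r
x <- c("a", "NA", "<NA>", NA)
x_new <- addNA(x)

lev   <- levels(x_new)      # character vector; one entry is a real NA
codes <- as.integer(x_new)  # position of each element within `lev`

is.na(lev)          # exactly one level is a genuine missing value
anyNA(codes)        # no code is missing, hence any(is.na(x_new)) is FALSE
anyDuplicated(lev)  # 0: the string "<NA>" and the real NA are different levels

# Indexing the levels by the codes recovers the original vector
restored <- lev[codes]
identical(restored, as.character(x_new))
identical(restored, x)
```

The order of the levels depends on the locale's sort order, so the sketch deliberately avoids relying on specific code values.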

Answered By – jay.sf

This answer, collected from Stack Overflow, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
