[[ more and [, $ less
library(dplyr)
There are lots of ways to index different data structures in R (i.e. extract particular components). It’s confusing. I’m going to illustrate some of the possibilities and explain why it’s better to use [[-indexing rather than one of the other options whenever you can. Most of what appears below is stated either explicitly or implicitly in help("Extract"), but good luck figuring it out …
tl;dr you should use [[ rather than any of the other options when extracting a single element (item or column) from a vector or list or data frame.
I use “!!!” below to indicate trouble spots.
Indexing methods:
[[ (lists, data frames [DFs], atomic vectors)
[ (lists, DFs, atomic vectors, and matrices)
$ (lists and DFs)
The overlap between list/DF/matrix indexing methods is not surprising because data frames are lists, so anything that works with a list should work with a DF. DFs also look like matrices (but aren’t!), so matrix-style indexing usually works. We can also think about subset() (including its little-used select= argument) and tidyverse’s select()/filter() verbs as indexing methods, but that’s beyond the scope of this document. For the moment we will lump tidyverse tibbles in with DFs, although we mention a few important distinctions below.
!!! “vector” is very confusing terminology in R. Technically lists are vectors too:
A vector in R is either an atomic vector i.e., one of the atomic types, see ‘Details’, or of type (‘typeof’) or mode ‘list’ or ‘expression’.
99.5% of the time when R users say “vector” they mean “atomic vector” (i.e. not a list).
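A quick check of that terminology, as a minimal sketch (the objects x_atomic and x_list are made up for illustration): is.vector() says yes to both, so is.atomic() or is.list() is what actually tells them apart.

```r
x_atomic <- c(1, 2, 3)        ## atomic vector (hypothetical example object)
x_list <- list(1, "a", TRUE)  ## list (hypothetical example object)
is.vector(x_atomic)   ## TRUE
is.vector(x_list)     ## TRUE: lists count as vectors too
is.atomic(x_atomic)   ## TRUE
is.atomic(x_list)     ## FALSE
is.list(x_list)       ## TRUE
```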
Some objects to play with:
v <- 1:3 ## atomic vector
vn <- c(a = 1, b = 2, c = 3) ## named vector
m <- matrix(1:9, 3, 3) ## matrix
## named matrix
mn <- matrix(1:9, 3, 3,
dimnames = list(letters[1:3], LETTERS[1:3]))
## list & named list
L <- list(1, 2, 3)
Ln <- list(a=1, b=2, cc=3)
Ln2 <- list(cc=3, cd = 4, "weird name" = 5)
DF <- data.frame(a = 1:3, b = 4:6, c = 7:9)
tt <- tibble::tibble(a = 1:3, b = 4:6, c = 7:9)
[
[ extracts elements of a vector by integer index or by name (non-integer numeric indices are silently truncated). It can extract one or more elements.
v[1]
## [1] 1
vn[1]
## a
## 1
vn[1:3]
## a b c
## 1 2 3
vn["a"]
## a
## 1
try(vn["a":"c"]) ## nice if this worked, but it doesn't
## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by
## coercion
## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by
## coercion
## Error in "a":"c" : NA/NaN argument
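If you really want a range of names, one possible workaround is to look up the positions of the two endpoints with match() and index by the resulting integer range. This is a sketch rather than a standard idiom, and x and idx are made-up names:

```r
x <- c(a = 1, b = 2, c = 3, d = 4)   ## hypothetical named vector
idx <- match(c("a", "c"), names(x))  ## positions of the two endpoint names
stopifnot(!anyNA(idx))               ## fail loudly if either name is missing
x[idx[1]:idx[2]]                     ## elements a through c
```

The explicit anyNA() check matters because match() returns NA for a missing name, which would otherwise surface only as a less helpful error from `:`.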
Using [ to access a non-existent element of an atomic vector silently returns NA (Inferno 8.2.13); it’s easy to miss this. [[ throws an error instead (hurray!)
vn["d"] ## !!! NA
## <NA>
## NA
v[4] ## !!! ditto
## [1] NA
v[1.1] ## !!! non-integer indices are silently truncated
## [1] 1
try(v[[4]]) ## safer.
## Error in v[[4]] : subscript out of bounds
Assigning to a nonexistent index creates an element, with intervening NA values as required (!). [[, which is normally safer, doesn’t save us here (!!!)
v[5] <- 5 ## !!!
v["e"] <- 2 ## !!!
v[[10]] <- 1 ## !!!
print(v)
## e
## 1 2 3 NA 5 2 NA NA NA 1
An extreme case (extension and coercion to character type …)
v[1e5] <- "hello"
length(v)
## [1] 100000
format(object.size(v), unit = "Mb")
## [1] "1.5 Mb"
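If silent extension worries you, one option is to check the index yourself before assigning. This is only a sketch: set_existing() is a hypothetical helper, not a base R function, and it handles only a single positive integer position.

```r
## hypothetical helper: assign to position i only if it already exists
set_existing <- function(x, i, value) {
  stopifnot(is.numeric(i), length(i) == 1,
            i == round(i), i >= 1, i <= length(x))
  x[[i]] <- value
  x
}
w <- set_existing(1:3, 2, 10L)                        ## fine: position 2 exists
res <- try(set_existing(1:3, 5, 10L), silent = TRUE)  ## error instead of NA padding
```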
Less over-accommodating weirdness, but still some traps.
m[4] ## !!! acts as though the matrix is a vector (usually not what you want)
## [1] 4
m[2,2] ## best use of [; index a matrix by row & column
## [1] 5
mn[,"A"] ## must use this to extract a column of a matrix
## a b c
## 1 2 3
try(mn[,"a"]) ## fails loudly on subscripting error
## Error in mn[, "a"] : subscript out of bounds
try(mn[["A"]]) ## !!! can't use this
## Error in mn[["A"]] : subscript out of bounds
try(mn[[,"A"]]) ## can't use this
## Error in mn[[, "A"]] : subscript out of bounds
R automatically drops dimensions (see Burns inferno 8.1.44):
dim(mn[,"A"]) ## !!! automatically drops dimensions, returns numeric vector
## NULL
dim(mn[,"A", drop = FALSE]) ## drop = FALSE keeps the matrix structure
## [1] 3 1
This difference can be confusing when you’re programming; suppose the columns to extract are specified by the user. If they ask for two columns you get a matrix, if they ask for one you get an atomic vector …
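One way to keep the return type stable, assuming you always want a matrix back, is to hard-code drop = FALSE in your own wrapper. The names get_cols() and mat below are hypothetical:

```r
mat <- matrix(1:9, 3, 3,
              dimnames = list(letters[1:3], LETTERS[1:3]))
## hypothetical wrapper: always returns a matrix, however many columns are requested
get_cols <- function(x, cols) x[, cols, drop = FALSE]
dim(get_cols(mat, "A"))           ## 3 1
dim(get_cols(mat, c("A", "B")))   ## 3 2
```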
Double brackets are better than single brackets for extracting single elements of (atomic) vectors.
vn["d"] ## !!! returns NA: will propagate and cause an error later on *or* turn all of your results into NA
## <NA>
## NA
try(vn[["d"]]) ## subscript error -- this is good!
## Error in vn[["d"]] : subscript out of bounds
vn[1:3]
## a b c
## 1 2 3
try(vn[[1:3]]) ## doesn't work
## Error in vn[[1:3]] :
## attempt to select more than one element in vectorIndex
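To see why the loud failure is preferable, here is a small sketch with a made-up price list (prices is hypothetical): the single-bracket version quietly produces NA, while the double-bracket version stops at the misspelled name.

```r
prices <- c(apple = 1.5, pear = 2)             ## hypothetical data
total1 <- prices["apple"] + prices["kiwi"]     ## no error, but the result is NA
is.na(total1)                                  ## TRUE
total2 <- try(prices[["apple"]] + prices[["kiwi"]], silent = TRUE)
inherits(total2, "try-error")                  ## TRUE: "kiwi" is caught immediately
```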
Single brackets on lists (and data frames) always return a list (or data frame); selecting a single element gives a list of length 1, not an atomic vector: see Inferno 8.1.54
str(DF["a"]) ## still a data frame
## 'data.frame': 3 obs. of 1 variable:
## $ a: int 1 2 3
is.numeric(DF["a"]) ## !!! FALSE
## [1] FALSE
These all work if you want to extract a single column:
is.numeric(DF[["a"]]) ## list-like: TRUE
## [1] TRUE
is.numeric(DF$a) ## list-like: TRUE
## [1] TRUE
is.numeric(DF[,"a"]) ## matrix-like: TRUE
## [1] TRUE
On the other hand DF[,"a", drop = FALSE] returns a DF (as it should), so is.numeric() applied to it gives FALSE.
What about tibbles?
is.numeric(tt[["a"]]) ## TRUE
## [1] TRUE
is.numeric(tt$a) ## TRUE
## [1] TRUE
is.numeric(tt[,"a"]) ## FALSE! tibbles use drop = FALSE; this fixes an 'infelicity' with DF indexing design, but can be confusing
## [1] FALSE
is.numeric(tt |> pull(a)) ## approved tidyverse idiom
## [1] TRUE
Indexing a non-existent element of a list by name (with [[ or $) returns NULL rather than NA (or an error) (Inferno 8.2.13)
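A sketch of the different cases, using a made-up list lst: lookup by a missing name gives NULL, single brackets wrap that NULL in a list, and lookup by a missing position does throw an error.

```r
lst <- list(a = 1, b = 2)             ## hypothetical list
is.null(lst$zzz)                      ## TRUE
is.null(lst[["zzz"]])                 ## TRUE
length(lst["zzz"])                    ## 1: a list holding a single NULL
res <- try(lst[[3]], silent = TRUE)   ## by position, [[ throws an error
"zzz" %in% names(lst)                 ## FALSE: an explicit check is the safest test
```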
The $-operator will do partial matching, silently by default …
names(Ln)
## [1] "a" "b" "cc"
Ln$c ## !!! doesn't warn that it's getting 'cc'
## [1] 3
options(warnPartialMatchDollar = TRUE)
Ln$c ## now warns
## Warning in Ln$c: partial match of 'c' to 'cc'
## [1] 3
Ln2$c ## NULL because ambiguous (cc, cd)
## NULL
Ln2$`weird name` ## names with spaces etc have to use back-ticks
## [1] 5
nm <- "weird name"
## you can't do *indirect reference* with $
Ln2$nm ## i.e. this doesn't work (returns NULL)
## NULL
[[ allows indirect reference (using the value of a symbol to extract an element), which $ doesn’t (since it is intended as an interactive/programming shortcut):
Ln2[[nm]]
## [1] 5
Ln2[["weird name"]]
## [1] 5
## can also create a new list element by indirect reference
newnm <- "a"
Ln2[[newnm]] <- 16
Ln2[["a"]]
## [1] 16
Ln2[["c"]] ## NULL (no partial matching)
## NULL
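Indirect reference is what makes [[ usable inside functions and loops, where the name of the element is only known at run time. A minimal sketch, with a hypothetical helper col_means() and a made-up data frame dd:

```r
dd <- data.frame(a = 1:3, b = 4:6, c = 7:9)   ## hypothetical data frame
## hypothetical helper: mean of each column named in `cols`
col_means <- function(df, cols) {
  vapply(cols, function(nm) mean(df[[nm]]), numeric(1))
}
col_means(dd, c("a", "c"))   ## a = 2, c = 8
```

Writing df$nm inside the helper would look for a column literally called "nm" and fail.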
Unfortunately matrix columns can only be indexed by m[,i] (m[[i]] doesn’t work: it extracts a single element, not a column), and matrices only have colnames(), not names() (Inferno 8.2.40). Matrices must be homogeneous (e.g. all-numeric). Save matrices for when you (1) actually want to do linear algebra; (2) want to do efficient rowwise extraction (still not as efficient as columnwise matrix extraction, but much better than working with rows of DFs or tibbles).
Another reason why you should use data.frame() rather than cbind() in general to combine things column-wise: cbind() will automatically coerce all of your data to the most general type.
m0 <- matrix(1, nrow = 3, ncol = 2)
cbind(m0, "a") ## "a" is automatically recycled
## [,1] [,2] [,3]
## [1,] "1" "1" "a"
## [2,] "1" "1" "a"
## [3,] "1" "1" "a"
data.frame(m0, newcol = "a")
## X1 X2 newcol
## 1 1 1 a
## 2 1 1 a
## 3 1 1 a
t1 <- tibble(a = 1:3, b = 2:4)
t2 <- tibble(c = LETTERS[1:3])
## combines these but result is a data frame, not a tibble
data.frame(t1, t2)
## a b c
## 1 1 2 A
## 2 2 3 B
## 3 3 4 C
tibble(t1, t2)
## # A tibble: 3 × 3
## a b c
## <int> <int> <chr>
## 1 1 2 A
## 2 2 3 B
## 3 3 4 C
bind_cols(t1, t2) ## *NOT* like cbind() - doesn't coerce
## # A tibble: 3 × 3
## a b c
## <int> <int> <chr>
## 1 1 2 A
## 2 2 3 B
## 3 3 4 C
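You can confirm the coercion directly by checking types rather than reading the quotation marks in the printout. A small sketch, where m1, cb and df1 are made-up names:

```r
m1 <- matrix(1, nrow = 3, ncol = 2)   ## hypothetical numeric matrix
cb <- cbind(m1, "a")
typeof(cb)                  ## "character": the numbers are now strings
df1 <- data.frame(m1, newcol = "a")
is.numeric(df1[["X1"]])     ## TRUE: numeric columns stay numeric
```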
Negative indices can be convenient for dropping elements, but not always (Inferno 8.1.11). x[-which(...)] can be particularly dangerous (Inferno 8.1.13).
vn[-1]
## b c
## 2 3
try(vn[-1:2]) ## !!! `-` has higher precedence than `:`
## Error in vn[-1:2] : only 0's may be mixed with negative subscripts
vn[-(1:2)] ## this is OK
## c
## 3
vn[-which(vn > 4)] ## !!!
## named numeric(0)
vn[!(vn > 4)] ## this works
## a b c
## 1 2 3
vn[vn <= 3] ## this is clearer
## a b c
## 1 2 3
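The reason the -which() version fails is that which() returns an empty integer vector when nothing matches, and negating an empty vector leaves it empty, so you select nothing instead of dropping nothing. Step by step, with a made-up vector x:

```r
x <- c(a = 1, b = 2, c = 3)     ## hypothetical named vector
drop_idx <- which(x > 4)        ## integer(0): nothing matches
identical(-drop_idx, drop_idx)  ## TRUE: negating "nothing" is still "nothing"
length(x[-drop_idx])            ## 0: selects nothing instead of dropping nothing
length(x[!(x > 4)])             ## 3: logical indexing handles the empty case
```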
Negative indexing doesn’t work with element names (except in the select= argument of subset())
try(vn[-"a"]) ## !!! oh well
## Error in -"a" : invalid argument to unary operator
vn[names(vn) != "a"] ## works but clunky
## b c
## 2 3
vn[!names(vn) %in% c("a", "b")] ## use ! ... %in% to exclude
## c
## 3
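Two other ways to exclude by name, sketched with made-up objects x and dd: setdiff() on the names reads a little more directly than ! ... %in%, and for data frames the select= argument of subset() does accept a minus sign in front of a column name.

```r
x <- c(a = 1, b = 2, c = 3)                   ## hypothetical named vector
x[setdiff(names(x), c("a", "b"))]             ## keeps only c
dd <- data.frame(a = 1:3, b = 4:6, c = 7:9)   ## hypothetical data frame
names(subset(dd, select = -a))                ## "b" "c"
```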
Inferno has more stuff on what happens when you index with NA or NULL …