library(dplyr)

There are lots of ways to index different data structures in R (i.e. extract particular components). It’s confusing. I’m going to illustrate some of the possibilities and explain why it’s better to use [[-indexing rather than one of the other options whenever you can. Most of what appears below is stated either explicitly or implicitly in help("Extract"), but good luck figuring it out …

tl;dr you should use [[ rather than any of the other options when extracting a single element (item or column) from a vector or list or data frame.

I use “!!!” below to indicate trouble spots.

Indexing methods:

The overlap between list/DF/matrix indexing methods is not surprising because data frames are lists, so anything that works with a list should work with a DF. DFs also look like matrices (but aren’t!), so matrix-style indexing usually works. We can also think about subset() (including its little-used select= argument) and tidyverse’s select()/filter() verbs as indexing methods, but that’s beyond the scope of this document. For the moment we will lump tidyverse tibbles in with DFs, although we mention a few important distinctions below.

!!! “vector” is very confusing terminology in R. Technically lists are vectors too:

A vector in R is either an atomic vector i.e., one of the atomic types, see ‘Details’, or of type (‘typeof’) or mode ‘list’ or ‘expression’.

99.5% of the time when R users say “vector” they mean “atomic vector” (i.e. not a list).
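
The distinction can be checked directly. This is a small sketch using only base R; the object names x_atomic and x_list are made up for illustration:

```r
## illustrative objects (names are assumptions, not used elsewhere)
x_atomic <- c(1.5, 2, 3)
x_list <- list(1.5, "b", TRUE)

is.vector(x_atomic)  ## TRUE
is.vector(x_list)    ## TRUE: a list counts as a vector
is.atomic(x_atomic)  ## TRUE
is.atomic(x_list)    ## FALSE: not what most people mean by "vector"
```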

Examples

Some objects to play with:

v <- 1:3  ## atomic vector
vn <- c(a = 1, b = 2, c = 3) ## named vector
m <- matrix(1:9, 3, 3) ## matrix
## named matrix
mn <- matrix(1:9, 3, 3,
             dimnames = list(letters[1:3], LETTERS[1:3]))
## list & named list
L <- list(1, 2, 3)
Ln <- list(a=1, b=2, cc=3)
Ln2 <- list(cc=3, cd = 4, "weird name" = 5)
DF <- data.frame(a = 1:3, b = 4:6, c = 7:9)
tt <- tibble::tibble(a = 1:3, b = 4:6, c = 7:9)

Single brackets [

[ extracts elements of a vector by integer index or by name (non-integer numeric indices are silently truncated). It can extract one or more elements at a time.

v[1]
## [1] 1
vn[1]
## a 
## 1
vn[1:3]
## a b c 
## 1 2 3
vn["a"]
## a 
## 1
try(vn["a":"c"]) ## nice if this worked, but it doesn't
## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by
## coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by
## coercion
## Error in "a":"c" : NA/NaN argument

Using [ to access a non-existent element of an atomic vector silently returns NA (Burns, The R Inferno, section 8.2.13; cited as “Inferno” from here on); it’s easy to miss this. [[ throws an error instead (hurray!).

vn["d"]          ## !!! NA
## <NA> 
##   NA
v[4]             ## !!! ditto
## [1] NA
v[1.1]           ## !!! non-integer indices are silently truncated
## [1] 1
try(v[[4]])      ## safer.
## Error in v[[4]] : subscript out of bounds

Assigning to a nonexistent index creates an element, with intervening NA values as required (!). [[, which is normally safer, doesn’t save us here (!!!)

v[5] <- 5     ## !!!
v["e"] <- 2   ## !!!
v[[10]] <- 1  ## !!!
print(v)
##                 e             
##  1  2  3 NA  5  2 NA NA NA  1

An extreme case (extension and coercion to character type …)

v[1e5] <- "hello"
length(v)
## [1] 100000
format(object.size(v), units = "Mb")
## [1] "1.5 Mb"
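
One way to guard against this kind of accidental extension is to check the index before assigning. The helper below, assign_existing(), is hypothetical (it is not part of base R); treat it as a minimal sketch that assumes a single positive numeric index:

```r
## hypothetical helper: assign only to a position that already exists
assign_existing <- function(x, i, value) {
    stopifnot(is.numeric(i), length(i) == 1, i >= 1, i <= length(x))
    x[[i]] <- value
    x
}

v2 <- 1:3
v2 <- assign_existing(v2, 2, 10L)  ## fine: position 2 exists
res <- try(assign_existing(v2, 5, 10L), silent = TRUE)
inherits(res, "try-error")         ## TRUE: refuses to extend the vector
length(v2)                         ## still 3
```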

(Image: The Scream)

Single-bracket indexing of matrices

Less over-accommodating weirdness, but still some traps.

m[4]            ## !!! acts as though the matrix is a vector
                ##     (usually not what you want)
## [1] 4
m[2,2]          ## best use of [; index a matrix by row & column
## [1] 5
mn[,"A"]        ## must use this to extract a column of a matrix
## a b c 
## 1 2 3
try(mn[,"a"])   ## fails loudly on subscripting error
## Error in mn[, "a"] : subscript out of bounds
try(mn[["A"]])  ## !!! can't use this
## Error in mn[["A"]] : subscript out of bounds
try(mn[[,"A"]]) ## can't use this
## Error in mn[[, "A"]] : subscript out of bounds

R automatically drops dimensions (see Burns’s Inferno 8.1.44):

dim(mn[,"A"])                ## !!! automatically drops dimensions,
                             ##     returns numeric vector
## NULL
dim(mn[,"A", drop = FALSE])  ## drop = FALSE keeps the matrix structure
## [1] 3 1

This difference can be confusing when you’re programming; suppose the columns to extract are specified by the user. If they ask for two columns you get a matrix, if they ask for one you get an atomic vector …
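
A minimal sketch of the defensive version, assuming a hypothetical helper get_cols() that takes a user-supplied vector of column names; with drop = FALSE the result is a matrix regardless of how many columns are requested:

```r
## illustrative matrix (same shape as mn above, renamed to stay self-contained)
mn2 <- matrix(1:9, 3, 3,
              dimnames = list(letters[1:3], LETTERS[1:3]))

## hypothetical helper: always return a matrix
get_cols <- function(mat, cols) {
    mat[, cols, drop = FALSE]
}

dim(get_cols(mn2, c("A", "B")))  ## [1] 3 2
dim(get_cols(mn2, "A"))          ## [1] 3 1  (still a matrix)
```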

Double brackets, atomic vectors

Double brackets are better than single brackets for extracting single elements of (atomic) vectors.

vn["d"]         ## !!! returns NA: will propagate and cause an error
                ##     later on *or* turn all of your results into NA
## <NA> 
##   NA
try(vn[["d"]])  ## subscript error -- this is good!
## Error in vn[["d"]] : subscript out of bounds
vn[1:3]
## a b c 
## 1 2 3
try(vn[[1:3]])  ## doesn't work
## Error in vn[[1:3]] : 
##   attempt to select more than one element in vectorIndex

Indexing of lists (and data frames)

Single brackets on lists (and data frames) always return a list (or data frame); if you ask for a single element you get a list of length 1, not an atomic vector: see Inferno 8.1.54

(Image: Hadley Wickham’s ‘pepper’ illustration of list indexing)

str(DF["a"])          ## still a data frame
## 'data.frame':    3 obs. of  1 variable:
##  $ a: int  1 2 3
is.numeric(DF["a"])   ## !!! FALSE
## [1] FALSE

These all work if you want to extract a single column:

is.numeric(DF[["a"]]) ## list-like: TRUE
## [1] TRUE
is.numeric(DF$a)      ## list-like: TRUE
## [1] TRUE
is.numeric(DF[,"a"])  ## matrix-like: TRUE
## [1] TRUE

On the other hand DF[,"a", drop = FALSE] returns a DF (as it should), so is.numeric(DF[,"a", drop = FALSE]) is FALSE.

What about tibbles?

is.numeric(tt[["a"]])      ## TRUE
## [1] TRUE
is.numeric(tt$a)           ## TRUE
## [1] TRUE
is.numeric(tt[,"a"])       ## FALSE!  drop = FALSE for tibbles
                           ##   this fixes an 'infelicity' with
                           ##   DF indexing design, but can be confusing
## [1] FALSE
is.numeric(tt |> pull(a))  ## approved tidyverse idiom
## [1] TRUE

Indexing a non-existent named element of a list with $ or [[ returns NULL rather than NA (or an error) (Inferno 8.2.13).
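
A short illustration of that behaviour, followed by one possible guard. The list Lx and the helper get_strict() are made up for this sketch; get_strict() is not a base R function:

```r
Lx <- list(a = 1, b = 2)

is.null(Lx$zzz)        ## TRUE: no error, no warning
is.null(Lx[["zzz"]])   ## TRUE: same for [[ with a name
## an out-of-range *position* does fail loudly
inherits(try(Lx[[5]], silent = TRUE), "try-error")  ## TRUE

## hypothetical helper: insist that the name exists
get_strict <- function(lst, name) {
    if (!name %in% names(lst)) {
        stop("no element named '", name, "'")
    }
    lst[[name]]
}
get_strict(Lx, "a")    ## [1] 1
```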

The $-operator will do partial matching, silently by default …

names(Ln)
## [1] "a"  "b"  "cc"
Ln$c             ## !!! doesn't warn that it's getting 'cc'
## [1] 3
options(warnPartialMatchDollar = TRUE)
Ln$c             ## now warns
## Warning in Ln$c: partial match of 'c' to 'cc'
## [1] 3
Ln2$c            ## NULL because ambiguous (cc, cd)
## NULL
Ln2$`weird name` ## names with spaces etc have to use back-ticks
## [1] 5
nm <- "weird name"
                 ## you can't do *indirect reference* with $
Ln2$nm           ## i.e. this doesn't work (returns NULL)
## NULL

[[ allows indirect reference (using the value of a symbol to extract an element), which $ doesn’t (since it is intended as an interactive/programming shortcut):

Ln2[[nm]]
## [1] 5
Ln2[["weird name"]]
## [1] 5
## can also create a new list element by indirect reference
newnm <- "a"
Ln2[[newnm]] <- 16
Ln2[["a"]]
## [1] 16
Ln2[["c"]]  ## NULL (no partial matching)
## NULL

Unfortunately matrix columns can only be indexed by m[,i] (m[[i]] doesn’t work), and matrices only have colnames(), not names() (Inferno 8.2.40). Matrices must be homogeneous (e.g. all-numeric). Save matrices for when you (1) actually want to do linear algebra; (2) want to do efficient rowwise extraction (still not as efficient as columnwise matrix extraction, but much better than working with rows of DFs or tibbles).
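
To make the names()/colnames() difference concrete, here is a small sketch (mx and dx are illustrative names). Converting to a data frame is one way to get [[-style column access back:

```r
mx <- matrix(1:6, 2, 3, dimnames = list(NULL, c("A", "B", "C")))

names(mx)      ## NULL: matrices don't have names()
colnames(mx)   ## [1] "A" "B" "C"

dx <- as.data.frame(mx)
names(dx)      ## [1] "A" "B" "C"
dx[["B"]]      ## [1] 3 4
```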

This is another reason to use data.frame() rather than cbind() in general to combine things column-wise: cbind() will automatically coerce all of your data to the most general type.

m0 <- matrix(1, nrow = 3, ncol = 2)
cbind(m0, "a") ## "a" is automatically recycled
##      [,1] [,2] [,3]
## [1,] "1"  "1"  "a" 
## [2,] "1"  "1"  "a" 
## [3,] "1"  "1"  "a"
data.frame(m0, newcol = "a")
##   X1 X2 newcol
## 1  1  1      a
## 2  1  1      a
## 3  1  1      a
t1 <- tibble(a = 1:3, b = 2:4)
t2 <- tibble(c = LETTERS[1:3])
## combines these but result is a data frame, not a tibble
data.frame(t1, t2)
##   a b c
## 1 1 2 A
## 2 2 3 B
## 3 3 4 C
tibble(t1, t2)
## # A tibble: 3 × 3
##       a     b c    
##   <int> <int> <chr>
## 1     1     2 A    
## 2     2     3 B    
## 3     3     4 C
bind_cols(t1, t2)  ## *NOT* like cbind() - doesn't coerce
## # A tibble: 3 × 3
##       a     b c    
##   <int> <int> <chr>
## 1     1     2 A    
## 2     2     3 B    
## 3     3     4 C

Negative indexing gotchas

Negative indices can be convenient for dropping elements, but not always (Inferno 8.1.11). x[-which(...)] can be particularly dangerous (Inferno 8.1.13).

vn[-1]
## b c 
## 2 3
try(vn[-1:2])       ## !!! `-` has higher precedence than `:`
## Error in vn[-1:2] : only 0's may be mixed with negative subscripts
vn[-(1:2)]          ## this is OK
## c 
## 3
vn[-which(vn > 4)]  ## !!! nothing matches, so *everything* is dropped
## named numeric(0)
vn[!(vn > 4)]       ## this works
## a b c 
## 1 2 3
vn[vn <= 3]         ## this is clearer
## a b c 
## 1 2 3
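
The -which() failure above can be unpacked step by step. A sketch with an illustrative vector x: when nothing matches, which() returns an empty index, negating it leaves it empty, and an empty index selects nothing.

```r
x <- c(a = 1, b = 2, c = 3)

idx <- which(x > 4)   ## nothing matches
length(idx)           ## 0
length(-idx)          ## 0: negating an empty index is still empty
length(x[-idx])       ## 0: selects nothing, rather than dropping nothing
length(x[!(x > 4)])   ## 3: logical indexing keeps everything, as intended
```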

Negative indexing doesn’t work with element names (except in the select= argument of subset(), which applies to data frame columns)

try(vn[-"a"])                    ## !!! oh well
## Error in -"a" : invalid argument to unary operator
vn[names(vn) != "a"]             ## works but clunky
## b c 
## 2 3
vn[!names(vn) %in% c("a", "b")]  ## use ! ... %in% to exclude
## c 
## 3
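
Two further ways to exclude by name, sketched with illustrative objects x and DFx: setdiff() on the names works for any named vector, and the select= argument of subset() accepts negated column names for data frames.

```r
x <- c(a = 1, b = 2, c = 3)
names(x[setdiff(names(x), "a")])       ## [1] "b" "c"

DFx <- data.frame(a = 1:3, b = 4:6, c = 7:9)
names(subset(DFx, select = -a))        ## [1] "b" "c"
names(subset(DFx, select = -c(a, b)))  ## [1] "c"
```

subset() uses non-standard evaluation, so it is probably better suited to interactive work than to code inside functions.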

The Inferno has more on what happens when you index with NA or NULL.
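
As a quick taste (a sketch with an illustrative vector x, not a substitute for the Inferno’s discussion): a bare NA is a logical value and gets recycled, an integer NA is a single missing position, and NULL selects nothing.

```r
x <- 1:3

length(x[NA])           ## 3: logical NA is recycled, giving NA NA NA
length(x[NA_integer_])  ## 1: a single NA
length(x[NULL])         ## 0: same as indexing with integer(0)
```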