Introduction to For Loops


For Loops in R

A for loop is a way of telling R to repeat something. Usually this takes the form of telling your computer: for every value in a series, perform some operation.

Some examples might help.

for (i in 1:10){
  print(i)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10

This is a very simple for loop. It tells the computer, for every value of i in the sequence 1 to 10, print i. The computer then prints every number from 1 to 10.
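As a small variation, the loop body can also use i in a calculation rather than just printing it. The sketch below adds up the numbers 1 to 10; the name total is my own choice for illustration.

```r
total <- 0              # running total, starts at zero
for (i in 1:10){
  total <- total + i    # add the current value of i
}
print(total)
```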

The placeholder that takes each value in the series doesn’t have to be i; you can use any valid variable name. In the next case, we will make a series of animals.

farm <- c("Cow", "Chicken", "Pig", "Goat")
for (animal in farm){
  print(animal)
}
## [1] "Cow"
## [1] "Chicken"
## [1] "Pig"
## [1] "Goat"

In this case the sequence I have given the computer isn’t a series of numbers; it’s a series of words. The for loop then instructs the computer to print every value in the vector.
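If you also need the position of each value, one option is to loop over the positions with seq_along() and look each value up. A minimal sketch, where labels is a name made up for this example:

```r
farm <- c("Cow", "Chicken", "Pig", "Goat")
labels <- character(length(farm))        # empty character vector, one slot per animal
for (k in seq_along(farm)){
  labels[k] <- paste0(k, ": ", farm[k])  # e.g. "1: Cow"
}
print(labels)
```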

You can also use the for loop to repeat something a set number of times without using the i value within the for loop itself.

for (i in 1:10){
  print("eggs")
}
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"
## [1] "eggs"

This just uses the for loop as a way of repeating a process over and over. In this case I wanted to print “eggs” ten times. There are easier ways to do this using R, but the for loop can be used in this way.
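For reference, one of those easier ways is probably rep(), which repeats a value a given number of times without a loop. This is only a sketch of one alternative; note that it prints the ten values as a single vector rather than one per line.

```r
eggs <- rep("eggs", 10)   # a character vector holding "eggs" ten times
print(eggs)
```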

How is this useful? - Repetition

You can use this to do all sorts of tasks which involve repetition. As an example, if I wanted to add 1 to every value in a sequence I could do it like this:

x <- c(1,2,3,4,5)
print(x)
## [1] 1 2 3 4 5
# Loop over the positions of x, not its values, so this works for any numbers in x
for (i in seq_along(x)){
  x[i] <- x[i] + 1
}

print(x)
## [1] 2 3 4 5 6

This simple bit of code adds 1 to every value in x. In R you can do this in a much simpler way:

x <- c(1,2,3,4,5)
x <- x + 1
print(x)
## [1] 2 3 4 5 6
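One thing to watch for when writing loops like the one above: there, the positions and the values of x happen to be the same (1 to 5), so it makes no difference whether the loop runs over values or positions. With other numbers it does, and looping over positions with seq_along() is the safer habit. A small sketch, using made-up vectors a and b:

```r
# Looping over positions: every element gets 1 added
a <- c(10, 20, 30)
for (i in seq_along(a)){
  a[i] <- a[i] + 1
}
print(a)

# Looping over values: i is 10, 20, 30, so R writes to positions 10, 20 and 30
b <- c(10, 20, 30)
for (i in b){
  b[i] <- b[i] + 1
}
print(length(b))   # b has been stretched and padded with NA
```

In the second loop the original three values are left untouched, and everything after them is NA.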

But there are occasions where no simpler way is obvious. For example, suppose I wanted to look at the mean of several different vectors.

x <- c(1,2,3,4,5)
y <- c(2,4,6,8,10,12)
z <- c(3,6,9,12,15)

for (i in list(x,y,z)){
  print(mean(i))
}
## [1] 3
## [1] 7
## [1] 9

This uses another type of data structure, called a list, to hold three vectors, and then asks the computer to find the mean of each vector in the list.
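If you want to keep the means rather than just print them, one approach is to store them in a vector as the loop runs. The sketch below assumes a named list (the names vectors and means are mine) so each result can be labelled:

```r
x <- c(1,2,3,4,5)
y <- c(2,4,6,8,10,12)
z <- c(3,6,9,12,15)

vectors <- list(x = x, y = y, z = z)
means <- numeric(length(vectors))   # one slot per vector
names(means) <- names(vectors)

for (name in names(vectors)){
  means[name] <- mean(vectors[[name]])   # look up each vector by its name
}
print(means)
```

For a case this simple, sapply(vectors, mean) should give the same named result in one line.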

How else is this useful? - Multiple functions

You can also use for loops to perform not just one function but several. This can be great if you are trying to do something fairly complicated.

As an example, if I wanted to take a series, add 1 to each value, then divide each value by 6, then save this as another vector, I can do that using for loops.

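old.vector <- c(5,10,15,25,30)
new.vector <- NA  # placeholder; it grows as values are assigned below

for (i in seq_along(old.vector)){
  temp <- old.vector[i] + 1   # add 1 to the i-th value
  temp <- (temp/6)            # divide the result by 6
  new.vector[i] <- temp       # store it at the same position in new.vector
}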
print(old.vector)
## [1]  5 10 15 25 30
print(new.vector)
## [1] 1.000000 1.833333 2.666667 4.333333 5.166667

This is a little clumsy and not nice to look at, but it helps to see how for loops can be helpful. In this case, I wanted to make a new vector by performing a set of different functions on each value of an old vector. I made the old.vector and filled it with values, and I made new.vector with nothing in it but a placeholder NA. Then I made the for loop.

I told it to work on a series from 1 to the length of the old.vector - in this case that’s 5 because old.vector has 5 values. Then I said: for every number from 1 to 5, take that value of old.vector (so the first, then the second, all the way up to the fifth) and add one, then save it as a temporary object. After this, take the temporary object and divide it by six, saving it as the same temporary object (overwriting it). Then take the temporary object and save it as the value of new.vector at the same index as in old.vector. This is a fairly complicated operation, and splitting it into a series of stages inside a for loop can make it a little simpler to follow.
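For comparison, here is a sketch of two variations on the same calculation: the loop with the result created at its full length up front, and the vectorised shortcut. The names looped and vectorised are just for this example.

```r
old.vector <- c(5,10,15,25,30)

# Variation 1: create the result at full length before the loop
looped <- numeric(length(old.vector))
for (i in seq_along(old.vector)){
  looped[i] <- (old.vector[i] + 1) / 6
}

# Variation 2: let R apply the arithmetic to the whole vector at once
vectorised <- (old.vector + 1) / 6

print(all.equal(looped, vectorised))
```

Both give the same values; the loop form earns its keep once the steps for each value get more involved than simple arithmetic.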

I see, how else? - Even more repetition

In the last few weeks I have been generating random numbers using rbinom() to learn more about limiting frequencies. The trouble with this is that it exacts a heavy toll on my computer. Let’s use a coin flip as an example. I want to show that the more times you flip a coin, the closer the relative frequency of heads gets to the probability of a head (0.5). Let’s start by flipping a coin 10 times.

set.seed(12345)
coinflips <- rbinom(n = 10, size = 1, prob = 0.5)
print(coinflips)
##  [1] 1 1 1 1 0 0 0 1 1 1
mean(coinflips)
## [1] 0.7

The mean here is 0.7, which is pretty far away from 0.5. Let’s try flipping the coin more times to get it closer to 0.5.

set.seed(12345)
coinflips <- rbinom(n = 100000, size = 1, prob = 0.5)
print(head(coinflips))
## [1] 1 1 1 1 0 0
mean(coinflips)
## [1] 0.50084
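You can also watch the relative frequency settle down within a single run. This sketch, using names of my own (flips, running), computes the proportion of heads after each flip with cumsum():

```r
set.seed(12345)
flips <- rbinom(n = 100000, size = 1, prob = 0.5)

# Proportion of heads after 1 flip, 2 flips, 3 flips, and so on
running <- cumsum(flips) / seq_along(flips)

# Look at a few checkpoints along the way
print(running[c(10, 100, 1000, 10000, 100000)])
```

The last checkpoint is the same number as mean(flips), since it covers every flip.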

If we flip the coin 100,000 times we get 0.50084, which is much closer to 0.5. Still, I want to flip the coin more. But saving a vector with more than a million numbers inside might start to hurt my computer’s feelings. Instead I will use a for loop to run the simulation in batches and keep only a running tally in a much less memory-intensive format, in this case a table.

set.seed(12345)
coinflips <- 0
for (i in 1:1000){
  coinflips <- coinflips + table(rbinom(n = 100000, size = 1, prob = 0.5))
}
print(coinflips)
## 
##        0        1 
## 50002322 49997678
prop.table(coinflips)
## 
##         0         1 
## 0.5000232 0.4999768

That’s one hundred million simulations. And as you can see, we’re getting close to 0.5. The reason you might do this sort of calculation in a for loop is that it is repetitive, and that it avoids strain on your computer’s memory. If we had just tried to change the n in rbinom() to one hundred million, in the best-case scenario it would have taken a long time. From experience, I would say RStudio might decide to switch itself off. With this useful for loop you can avoid that happening.
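A slightly leaner variation on the same idea is to keep only a running count of heads instead of a table. The sketch below wraps that in a function; flip_in_batches and its arguments are hypothetical names for illustration, and the example call uses small batches so it runs quickly.

```r
# Flip a coin in batches and return the overall proportion of heads.
# Only one batch of flips is held in memory at a time.
flip_in_batches <- function(n_batches, batch_size, prob = 0.5){
  heads <- 0
  for (i in seq_len(n_batches)){
    heads <- heads + sum(rbinom(n = batch_size, size = 1, prob = prob))
  }
  heads / (n_batches * batch_size)
}

set.seed(12345)
proportion <- flip_in_batches(n_batches = 100, batch_size = 1000)
print(proportion)
```

Raising n_batches increases the total number of flips without increasing how much is stored at once.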
