Spawning processes in Elixir, a gentle introduction to concurrency

Elixir and Erlang concurrency

Along with pattern matching, one of the coolest things in Erlang and Elixir is their concurrency implementation, based on the Actor model. In this article I introduce concurrency and show how we can start making our code concurrent in Elixir, using processes.

Concurrency

We can think of concurrency as dealing with multiple things happening, and progressing, at the same time.

Concurrency is not to be confused with parallelism. They are two different concepts that are sometimes used as synonyms. We can have multiple things running at the same time on just one CPU (or core); they progress together, but they are not executed in parallel.

Quoting Rob Pike

Concurrency is the composition of independently executing things, typically functions

Parallelism is the simultaneous execution of multiple things, possibly related, possibly not

Rob Pike – Concurrency Is Not Parallelism

Once we are able to split our problem into sub-tasks, and make it concurrent, we are then also able to take advantage of multiple cores and run the different sub-tasks in parallel.

I’ll write more on this in further articles.

Erlang processes

To make our code concurrent in Elixir we use Erlang processes.

If you are coming from programming languages like Ruby, Python or Java, you may have used OS threads, or OS processes, to make your code concurrent.

I developed for many years with the Rails and Sinatra frameworks and I really love Ruby; they both make developing services a joy. In the last five years I've also used Python quite extensively, especially to take advantage of the fantastic machine learning libraries the Python community has built.

BUT if you have developed with one of these two languages, you may know they have something called the GIL (global interpreter lock). In short, the GIL ensures that only one thread at a time can access shared memory. Don't get me wrong, the GIL makes it easier to write thread-safe code, but it also makes it really difficult to write code that scales out by running in parallel on multiple cores.

These languages were not built with concurrency as the main goal. And concurrency is not just a matter of scaling out; it's also about modelling the real world, which is inherently concurrent. The Erlang and Elixir concurrency model brings isolation, fault-tolerance and a great way to coordinate and distribute processes.
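As a quick taste of that isolation, a crash inside a spawned process doesn't take down the process that spawned it. A minimal sketch you can paste into iex:

```elixir
# A crash in an unlinked spawned process is isolated:
# the runtime logs the error report, but the caller keeps running.
spawn(fn -> raise "boom" end)

# Give the crash a moment to happen, then prove we survived it.
Process.sleep(100)
IO.puts("the calling process is still alive")
```

The error is reported in the shell, but our process carries on untouched.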

Erlang processes are not OS threads. They are not even OS processes. Erlang processes are lighter than threads: they have a really small memory footprint and their context switching is much faster.

Erlang and Elixir are highly concurrent because processes are so cheap that it's possible to easily spawn thousands of them without running out of memory.
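To get a feel for how cheap they are, we can spawn a hundred thousand sleeping processes from iex and check how many are alive and roughly how much memory they use. This is a rough sketch, not a benchmark; the numbers depend on your machine:

```elixir
# Spawn 100_000 processes that just sleep, then inspect the runtime.
before_count = length(Process.list())

pids =
  Enum.map(1..100_000, fn _ ->
    spawn(fn -> Process.sleep(:infinity) end)
  end)

IO.puts("extra live processes: #{length(Process.list()) - before_count}")
IO.puts("memory used by processes: #{div(:erlang.memory(:processes), 1_000_000)} MB")

# Clean up: kill the spawned processes.
Enum.each(pids, &Process.exit(&1, :kill))
```

On a typical machine this takes well under a second and only a few hundred megabytes; creating 100,000 OS threads would be a very different story.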

Since the Phoenix web framework is built with Elixir, it inherits this highly concurrent nature. The Phoenix core team ran a test a few years ago showing how they got 2 million active WebSocket connections on a machine with 40 cores and 128 GB of RAM.

So, since an Erlang process is lighter than both a thread and an OS process, why is it called a process?

The term “process” is usually used when the threads of execution share no data with each other and the term “thread” when they share data in some way. Threads of execution in Erlang share no data, that is why they are called processes.

Erlang.org – Concurrent Programming

Let’s see something in practice!

Make HTTP API requests concurrent

Let’s consider this example. We have a simple get_price function that makes an HTTP request to get a cryptocurrency price from the Coinbase API.

defmodule Coinbase do
  @coinbase_url "https://api.pro.coinbase.com"

  def get_price(product_id) do
    url = "#{@coinbase_url}/products/#{product_id}/ticker"

    %{"price" => price} =
      HTTPoison.get!(url).body
      |> Jason.decode!()

    price
  end
end

The function uses HTTPoison and Jason to get the price of the given product_id.

We also add a second function to the module.

def print_price(product_id) do
  start = System.monotonic_time(:millisecond)

  price = get_price(product_id)

  stop = System.monotonic_time(:millisecond)
  time = (stop - start) / 1000
  IO.puts("#{product_id}: #{price}\ttime: #{time}s")
end

print_price(product_id) helps us see how much time get_price(product_id) needs to request the price and return a result. We simply surround the function with start and stop timestamps and then calculate the difference to get the number of seconds elapsed.

start = System.monotonic_time(:millisecond)
...
stop = System.monotonic_time(:millisecond)
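As a side note, the Erlang standard library has `:timer.tc/1`, which wraps exactly this measure-around-a-function pattern and returns the elapsed time in microseconds together with the function's result. A sketch, reusing the `Coinbase.get_price/1` function from above:

```elixir
# :timer.tc/1 runs the function and returns {elapsed_microseconds, result}.
{elapsed_us, price} = :timer.tc(fn -> Coinbase.get_price("BTC-USD") end)

IO.puts("took #{elapsed_us / 1_000_000}s, price: #{price}")
```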

Since both our functions accept a product_id, we can use them to get prices of multiple products. For example, for Bitcoin (BTC-USD), Ethereum (ETH-USD), Litecoin (LTC-USD) and Bitcoin Cash (BCH-USD).

Coinbase.print_price "BTC-USD"
Coinbase.print_price "ETH-USD"
Coinbase.print_price "LTC-USD"
Coinbase.print_price "BCH-USD"

Or in a nicer functional way

["BTC-USD", "ETH-USD", "LTC-USD", "BCH-USD"]
|> Enum.each(&Coinbase.print_price/1)

Running these requests we get the updated prices along with the time needed to complete each request.

BTC-USD: 3708.29000000  time: 0.125s
ETH-USD: 125.44000000   time: 0.171s
LTC-USD: 45.71000000    time: 0.481s
BCH-USD: 122.91000000   time: 0.187s

Each product is requested sequentially: we first request BTC-USD and wait for the response, then we request ETH-USD, and so on…

Sequential HTTP requests in Erlang and Elixir

The problem with making requests one after the other is that, in this case, there isn't much computation happening and our computer is idle most of the time, just waiting for the response from the server.

So, how can we request the four prices together, without each request having to wait for the previous one to finish?

Spawning processes

With the spawn/1 function we can easily make the previous requests concurrent, so that the requests are all made and carried out at the same time.

pid = spawn fn ->
  # our code
end

The spawn(func) function creates an Erlang process, returns a PID (a unique Process ID) and runs the passed function inside this new process.

We use the PID to get information about the process, to interact with it and, most importantly, to communicate with it (something we will see in the next articles).
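For example, given a PID we can already ask the runtime whether the process is still running with `Process.alive?/1`, or inspect it with `Process.info/2`:

```elixir
# Spawn a process that stays busy (sleeping) for a second.
pid = spawn(fn -> Process.sleep(1_000) end)

IO.puts("alive? #{Process.alive?(pid)}")  # true while it's still sleeping
IO.inspect(Process.info(pid, :status))    # e.g. {:status, :waiting}

# After the spawned function returns, the process dies.
Process.sleep(1_500)
IO.puts("alive? #{Process.alive?(pid)}")  # false once it has finished
```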

So, let’s make our requests concurrent, running each one of them in its own process.

pid_btc = spawn fn -> 
  Coinbase.print_price("BTC-USD")
end

pid_eth = spawn fn -> 
  Coinbase.print_price("ETH-USD")
end
...

Or, as we did before, we can use a much more compact form: enumerate the cryptocurrencies and pass each one to the function we want to run in a separate process.

iex> ["BTC-USD", "ETH-USD", "LTC-USD", "BCH-USD"] \
|> Enum.map(fn product_id ->
  spawn(fn -> Coinbase.print_price(product_id) end)
end)

[#PID<0.206.0>, #PID<0.207.0>, #PID<0.208.0>, #PID<0.209.0>]
BTC-USD: 3704.51000000  time: 0.234s
ETH-USD: 125.15000000   time: 0.234s
LTC-USD: 45.64000000    time: 0.324s
BCH-USD: 122.83000000   time: 0.325s

This time the requests are all made at the same time. The Enum.map function enumerates the product list and spawns a new process for each product in the list, running the Coinbase.print_price(product_id) function inside it. The result is a list of PIDs.

Here is a diagram showing how the requests are made concurrently

Spawn multiple processes to concurrently request prices

Wrap Up

We saw how easy it is to spawn processes and make our code concurrent. But this is just the beginning. We had to print the results because the spawn/1 function immediately returns a PID, and we weren't able to get the result back in a traditional way. To coordinate and communicate with processes we still need to see an important piece of the puzzle: message passing, which I will cover in the next article.
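As a tiny preview of that next piece, a spawned process can send its result back to the process that spawned it with `send/2`, and the spawner can wait for it with `receive`. A sketch, with the product id and price as placeholder values rather than a real API call:

```elixir
# Preview of message passing: the spawned process sends its result
# back to the parent instead of printing it.
parent = self()

spawn(fn ->
  send(parent, {:price, "BTC-USD", "3708.29"})
end)

# The parent blocks here until a matching message arrives.
receive do
  {:price, product_id, price} -> IO.puts("#{product_id}: #{price}")
end
```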