Streams are my Reality

At some point in life of a Ruby on Rails app comes the point where a user or admin needs to download data. This is often done in the form of CSV or JSON downloads.

At this year’s Ruby on Ice conference Tessi presented a way to stream CSV or JSON data directly from PostgreSQL to the client. This avoids processing data in Ruby and enables us to reach far better performance. And as a bonus on top of this, streaming keeps the connection to the user’s browser alive. This avoids timeouts which load balancers may trigger to kill seemingly dead connections.

Watch the 5 minute summary video, or read on for more information:

The usual way to do downloadsLink to section

Doing downloads the usual “rails way”, one could have a route pointing to a controller action which crafts the data to be downloaded and then sends it through the wire.

class UsersController < ApplicationController
  def index
    @users = User.where(deleted_at: nil)

    respond_to do |format|
      format.csv { render_csv }
    end
  end

  private

  def render_csv
    send_data MyCsvService.call(@user),
              filename: 'users.csv'
  end
end

This piece of code is readable, fast to develop, and convenient to test. There is a caveat though: we need to prepare the data before sending it. Users however won’t see any indication of progress until the service finished generating the file and their client may even timeout if this take too long.

An attempt to defer download generationLink to section

We could defer generating the data into a background process (like Sidekiq) and notify the user when their download is ready. This brings the advantage of a fast response to the download request and is in fact what many websites do.

The drawbacks need to be weighted in though:

  • the user needs to do more to receive the data (click a button, wait for a notification, act on that notification)
  • it requires significantly more code (background service, notification sending mechanism, storing the generated file, offering the download of the generated file)
  • it yields more edge cases: What if the user double clicks the download generation? What if the user cannot look into their mail at the moment? The stored downloads require disk space and need access control (so that only authorized users can access them).

Streaming downloads to the userLink to section

Instead, we propose to instantly deliver the file to the user by streaming the data. This avoids the complexity of background processes and prevents eventual timeouts since there are always bits streamed over the HTTP connection.

Streaming data is already possible in Rack applications. For example streaming all numbers up to infinity:

class UsersController < ApplicationController
  # ...

  private

  def render_csv
    # we need to set some headers, which we skip here for brevity
    # headers[...] = ...
    enum = (1..Float::INFINITY).lazy.map {|i| sleep(1); "#{i}\n"}
    send_data enum, filename: 'users.csv'
  end
end

To use this trick for CSV downloads, we need to change our CSV generation service to implement each (which returns an Enumerator when called without a block).

class UsersController < ApplicationController
  # ...

  private

  def render_csv
    # headers[...] = ...

    send_data MyCsvService.new(@user).each,
              filename: 'users.csv'
  end
end

Problem solved \o/ With little changes to our code, we are streaming data to the users.

But can do this even faster?

Let PostgreSQL build our streamLink to section

We are fans of the PostgreSQL database. As it turns out, PostgreSQL already knows a way to stream query results direcly to applications: the COPY command.

COPY moves data between PostgreSQL tables and standard file-system files

Our apps run on UNIX operating systems where everything is a file ‒ including sockets to our application.

Leveraging this fact, we built a Ruby gem called sql_to_csv_stream (which, albeit the name, also streams JSON...). You can give it a Rails scope (or any SQL) and it wraps the result to be streamable.

To use it in your Rails app, first enable Rails support:

# in an initializer if using Rails

require 'sql_to_csv_stream'
SqlToCsvStream.register_rails_renderer

Then change your controller action to:

class UsersController < ApplicationController
  def index
    @users = User.where(deleted_at: nil)

    respond_to do |format|
      format.csv do
        render csv_stream: @users, filename: 'users.csv'
      end
      format.json do
        render json_stream: @users, filename: 'users.json'
      end
    end
  end
end

That’s it! From just a user relation, we can stream the results directly to the user. You can of course modify what exactly is send to the user by adapting the relation (e.g. renaming columns, or scoping to only certain records).

Performance-wise, this improves our download quite a bit (data measured in a real world app and a complicated SQL query):

way of generating the download file size speed total download time
Stream with Ruby Service 74MB 429 KB/s 3m 7s
Streaming our gem 74MB 691 KB/s 1m 47s
Streaming our gem (zipped) 6MB 70 KB/s 1m 35s
Streaming our gem (simple query) 86MB 6210 KB/s 14s
Streaming our gem (simple, zipped) 14MB 3022 KB/s 4s

We presented a way to stream downloads directly from PostgreSQL to the user. Using sql_to_csv_stream, streaming is easy to integrate into existing Rails apps. We are in fact already using this approach in several apps in production.

This makes streams my reality 🎶

© 2020 bitcrowd GmbH.