At this year’s Ruby on Ice conference Tessi presented a way to stream CSV or JSON data directly from PostgreSQL to the client. This avoids processing the data in Ruby and yields far better performance. As a bonus, streaming keeps the connection to the user’s browser alive, which avoids the timeouts that load balancers may trigger to kill seemingly dead connections.
Watch the 5 minute summary video, or read on for more information:
The usual way to do downloads
To do downloads the usual “Rails way”, one could have a route pointing to a controller action which crafts the data to be downloaded and then sends it over the wire.
class UsersController < ApplicationController
  def index
    @users = User.where(deleted_at: nil)

    respond_to do |format|
      format.csv { render_csv }
    end
  end

  private

  def render_csv
    send_data MyCsvService.call(@users),
              filename: 'users.csv'
  end
end
This piece of code is readable, fast to develop, and convenient to test. There is a caveat, though: we need to prepare the data before sending it. Users, however, won’t see any indication of progress until the service has finished generating the file, and their client may even time out if this takes too long.
An attempt to defer download generation
We could defer generating the data to a background process (like Sidekiq) and notify the user when their download is ready. This has the advantage of a fast response to the download request and is in fact what many websites do.
The drawbacks need to be weighed, though:
- the user needs to do more to receive the data (click a button, wait for a notification, act on that notification)
- it requires significantly more code (background service, notification sending mechanism, storing the generated file, offering the download of the generated file)
- it yields more edge cases: What if the user double-clicks the download generation? What if the user cannot check their email at the moment? The stored downloads require disk space and need access control (so that only authorized users can access them).
Streaming downloads to the user
Instead, we propose to deliver the file to the user instantly by streaming the data. This avoids the complexity of background processes and prevents timeouts, since there are always bytes being streamed over the HTTP connection.
Streaming data is already possible in Rack applications. For example, streaming all numbers up to infinity:
class UsersController < ApplicationController
  # ...

  private

  def render_csv
    # we need to set some headers, which we skip here for brevity
    # headers[...] = ...
    enum = (1..Float::INFINITY).lazy.map { |i| sleep(1); "#{i}\n" }
    send_data enum, filename: 'users.csv'
  end
end
To use this trick for CSV downloads, we need to change our CSV generation service to implement each (which returns an Enumerator when called without a block).
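The post does not show the service itself, so here is a minimal sketch of what an each-based MyCsvService could look like; the column names and the use of Ruby’s CSV library are assumptions:

require 'csv'

# Hypothetical sketch of MyCsvService: yields the CSV one line at a time.
class MyCsvService
  def initialize(scope)
    @scope = scope
  end

  # Returns an Enumerator when called without a block, so the result
  # responds to #each and can be streamed lazily by Rack.
  def each
    return enum_for(:each) unless block_given?

    yield CSV.generate_line(%w[id email created_at])
    @scope.find_each do |user|
      yield CSV.generate_line([user.id, user.email, user.created_at])
    end
  end
end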
class UsersController < ApplicationController
  # ...

  private

  def render_csv
    # headers[...] = ...
    send_data MyCsvService.new(@users).each,
              filename: 'users.csv'
  end
end
Problem solved \o/ With only small changes to our code, we are streaming data to our users.
But can we do this even faster?
Let PostgreSQL build our stream
We are fans of the PostgreSQL database. As it turns out, PostgreSQL already knows a way to stream query results directly to applications: the COPY command.
COPY moves data between PostgreSQL tables and standard file-system files
Our apps run on UNIX operating systems where everything is a file ‒ including sockets to our application.
Leveraging this fact, we built a Ruby gem called sql_to_csv_stream (which, despite its name, also streams JSON). You can give it a Rails scope (or any SQL) and it wraps the result so it can be streamed.
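To get a feeling for the underlying mechanism, here is a minimal sketch of consuming COPY output from plain Ruby with the pg gem; the query, column names, and database name are assumptions, and a gem like sql_to_csv_stream can wire such output into the HTTP response instead of printing it:

require 'pg'

conn = PG.connect(dbname: 'myapp_development') # assumed database name

# Ask PostgreSQL to format the query result as CSV and stream it to us.
sql = <<~SQL
  COPY (SELECT id, email FROM users WHERE deleted_at IS NULL)
  TO STDOUT WITH (FORMAT csv, HEADER)
SQL

conn.copy_data(sql) do
  # get_copy_data returns one ready-made CSV line, or nil when the copy is done.
  while (line = conn.get_copy_data)
    print line
  end
end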
To use it in your Rails app, first enable Rails support:
# in an initializer if using Rails
require 'sql_to_csv_stream'
SqlToCsvStream.register_rails_renderer
Then change your controller action to:
class UsersController < ApplicationController
  def index
    @users = User.where(deleted_at: nil)

    respond_to do |format|
      format.csv do
        render csv_stream: @users, filename: 'users.csv'
      end
      format.json do
        render json_stream: @users, filename: 'users.json'
      end
    end
  end
end
That’s it! From just a user relation, we can stream the results directly to the user. You can of course modify what exactly is sent to the user by adapting the relation (e.g. renaming columns, or scoping to only certain records).
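As a hypothetical example (the column aliases and the one-year scope are made up for illustration), the relation could be refined before handing it to the renderer:

# Only export recent users and give the CSV columns friendlier names.
@users = User.where(deleted_at: nil)
             .where('created_at > ?', 1.year.ago)
             .select('email AS "E-Mail"', 'created_at AS "Signed up at"')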
Performance-wise, this improves our download quite a bit (data measured in a real-world app with a complicated SQL query):
| Way of generating the download | File size | Speed | Total download time |
|---|---|---|---|
| Stream with Ruby service | 74 MB | 429 KB/s | 3m 7s |
| Streaming with our gem | 74 MB | 691 KB/s | 1m 47s |
| Streaming with our gem (zipped) | 6 MB | 70 KB/s | 1m 35s |
| Streaming with our gem (simple query) | 86 MB | 6210 KB/s | 14s |
| Streaming with our gem (simple query, zipped) | 14 MB | 3022 KB/s | 4s |
Summary
We presented a way to stream downloads directly from PostgreSQL to the user. Using sql_to_csv_stream, streaming is easy to integrate into existing Rails apps. We are in fact already using this approach in several apps in production.
This makes streams my reality 🎶