Idempotent seeds in Elixir

Run your seeds a million times with deterministic UUIDs in Elixir.

Agathe Lenclen

A standard Phoenix app contains a priv/repo/seeds.exs script file, which populates a database when it is run, so that developers can work with a conveniently prepared environment.

At Bitcrowd, we like our seeds to be idempotent. In practice, this means that running mix run priv/repo/seeds.exs multiple times will not create more rows each time, but rather upsert the existing data. While it is always possible to drop the local database with mix ecto.reset, we might want to keep the current state of our development database.

Upsert all the things!

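Our strategy is to call Repo.insert! with on_conflict: :replace_all and conflict_target: :id, so that a row whose :id already exists is overwritten instead of duplicated. Let's take a classic blog app with Post and Comment schemas for the sake of the example. In a MyApp.Seeds module, which we import at the top of our seeds.exs file (def only works inside a module), we add this little helper:

def insert_idempotently(schema) do
  Repo.insert!(schema, on_conflict: :replace_all, conflict_target: :id)
end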

And instead of inserting a seed record like so:

Repo.insert!(%Post{title: "Hello World!"})

We wrap our insert in the helper:

id = "64af9d13-0f60-45fc-971f-07e6b490c059"
insert_idempotently(%Post{id: id, title: "Hello World!"})

Nice 🎉

Next time we run the seeds, we will not get a second Post row in the database if a post with this :id already exists. This keeps our development environment neat and clean.
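
To make the upsert semantics concrete without a database, here is a minimal sketch that uses a plain map as a stand-in for the posts table. The FakeRepo module and its upsert/2 function are hypothetical and only mimic, for a single row, what on_conflict: :replace_all with conflict_target: :id does.

```elixir
defmodule FakeRepo do
  # Hypothetical in-memory stand-in for a table, keyed by :id.
  # Inserting a row whose :id already exists replaces the stored row.
  def upsert(table, %{id: id} = row), do: Map.put(table, id, row)
end

# A tiny "seeds script": takes the current table and returns the seeded one.
seed = fn table ->
  FakeRepo.upsert(table, %{id: "64af9d13-0f60-45fc-971f-07e6b490c059", title: "Hello World!"})
end

once = seed.(%{})
twice = seed.(once)

IO.inspect(map_size(twice))  # 1
IO.inspect(once == twice)    # true
```

Running the seed function a second time leaves the table unchanged, which is the property we want from the real seeds script.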

Idempotency for has_many associations

While the previous example is fairly simple, hardcoding UUIDs has its limitations when seeding has_many associations. For example, let's say we want to insert 50 Comments associated to the Post:

@id = "64af9d13-0f60-45fc-971f-07e6b490c059"
post = insert_idempotently(%Post{id: @id, title: "Hello World!"})

Enum.each(1..50, fn index ->
insert_idempotently(%Comment{
id: ???,
message: "Comment #{index}",
post_id: post.id
})
end)

Of course one could add 50 lines of hardcoded UUIDs… Or could we generate deterministic UUIDs from the index value? Yes we can ⚡️!

Deterministic UUID v4 from a string

Our deterministic UUID generator should take a string as an argument, and always return the same UUID for the same argument. We first hash the string, then keep the first 128 bits of the digest, which is the size of a UUID. UUIDs have a consistent structure, for example "64af9d13-0f60-45fc-971f-07e6b490c059": one group of 8 hexadecimal characters, then three groups of 4, and finally a group of 12, all separated by a -.
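
The hashing and splitting step can be looked at in isolation first. This sketch only assumes the :crypto module that ships with Erlang/OTP; the variable names are illustrative.

```elixir
# SHA-256 returns 32 bytes (256 bits); a UUID only needs the first 128 bits.
hash = :crypto.hash(:sha256, "foo")

# Split the first 128 bits into the five UUID groups and ignore the rest.
<<a::32, b::16, c::16, d::16, e::48, _rest::binary>> = hash

# Each hexadecimal character encodes 4 bits, hence the 8-4-4-4-12 layout.
hex_widths = Enum.map([32, 16, 16, 16, 48], &div(&1, 4))

IO.inspect(byte_size(hash))           # 32
IO.inspect(hex_widths)                # [8, 4, 4, 4, 12]
IO.inspect(Integer.to_string(a, 16))  # "2C26B46B"
```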

This is how it works:

# bor/2 and band/2 come from Bitwise; this import goes at the top of the module
import Bitwise

def deterministic_uuid4(string) do
  # Hash the string, keep the first 128 bits of the digest and split them
  # into the five UUID groups (32, 16, 16, 16 and 48 bits)
  <<a::size(32), b::size(16), c::size(16), d::size(16), e::size(48), _rest::binary>> =
    :crypto.hash(:sha256, string)

  # Override the version and variant bits (necessary to create a valid UUID v4)
  c = bor(band(c, 0x0FFF), 0x4000)
  d = bor(band(d, 0x3FFF), 0x8000)

  # Glue all of the chunks together and turn them into a string
  Enum.map_join([{a, 4}, {b, 2}, {c, 2}, {d, 2}, {e, 6}], "-", fn {chunk, zero_padding} ->
    # Encode the integer as a binary, without leading zero bytes
    unsigned = :binary.encode_unsigned(chunk)

    # Maybe pad with 0 so that 'FA' becomes '00FA' [1]
    pad = zero_padding - byte_size(unsigned)
    padded_unsigned = <<0::pad*8, unsigned::binary>>

    # Turn into hexadecimal
    Base.encode16(padded_unsigned)
  end)
end
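
The two overrides are the only part that makes the result a version 4 UUID, so here is a small worked example of just that step. The input values are the third and fourth 16-bit groups of the SHA-256 digest of "foo".

```elixir
import Bitwise

# Third and fourth 16-bit groups of SHA-256("foo"), before any override.
c = 0xC68F
d = 0xF99B

# Version: clear the top 4 bits, then set them to 0100 (version 4).
c = bor(band(c, 0x0FFF), 0x4000)

# Variant: clear the top 2 bits, then set them to 10 (RFC 4122 variant).
d = bor(band(d, 0x3FFF), 0x8000)

IO.inspect(Integer.to_string(c, 16))  # "468F"
IO.inspect(Integer.to_string(d, 16))  # "B99B"
```

Only 6 of the 128 bits are fixed this way, so the remaining 122 bits still come straight from the hash.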

Let's see it in action:

iex(1)> MyApp.Seeds.deterministic_uuid4("foo")
"2C26B46B-68FF-468F-B99B-453C1D304134"

iex(2)> MyApp.Seeds.deterministic_uuid4("foo")
"2C26B46B-68FF-468F-B99B-453C1D304134"

iex(3)> MyApp.Seeds.deterministic_uuid4("bar")
"FCDE2B2E-DBA5-4BF4-8860-1FB721FE9B5C"

Finally, let's check that our generated UUID is valid with the uuid package:

iex(1)> UUID.info("2C26B46B-68FF-468F-B99B-453C1D304134")
{:ok,
 [
   uuid: "2C26B46B-68FF-468F-B99B-453C1D304134",
   binary: <<44, 38, 180, 107, 104, 255, 70, 143, 185, 155, 69, 60, 29, 48, 65,
     52>>,
   type: :default,
   version: 4,
   variant: :rfc4122
 ]}
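
The same two properties can also be read straight from the bits, without any dependency. This is a minimal sketch; the UUIDBits module and its inspect_uuid/1 function are hypothetical names, not part of the uuid package.

```elixir
defmodule UUIDBits do
  # Hypothetical helper: returns {version, variant_bits} for a UUID string.
  def inspect_uuid(uuid) do
    # Drop the dashes and decode the 32 hex characters into 16 bytes.
    # The version is the top 4 bits of the 7th byte, the variant the
    # top 2 bits of the 9th byte.
    {:ok, <<_::48, version::4, _::12, variant::2, _::62>>} =
      uuid |> String.replace("-", "") |> Base.decode16(case: :mixed)

    {version, variant}
  end
end

IO.inspect(UUIDBits.inspect_uuid("2C26B46B-68FF-468F-B99B-453C1D304134"))  # {4, 2}
```

A variant value of 2 is binary 10, which is the RFC 4122 variant.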

Let's rewrite our seeds to make use of our brand new function:

id = "64af9d13-0f60-45fc-971f-07e6b490c059"
post = insert_idempotently(%Post{id: id, title: "Hello World!"})

Enum.each(1..50, fn index ->
  insert_idempotently(%Comment{
    id: deterministic_uuid4("comment-#{index}"),
    message: "Comment #{index}",
    post_id: post.id
  })
end)

Amazing! We won't get 50 new rows of Comment each time we run the seeds script. Our development database is clean and we made our developers happy ☕️.
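
One caveat when scaling this up: the key is the only input, so "comment-1" maps to the same UUID no matter which post it belongs to, and seeding comments for a second post with the same keys would overwrite the first post's comments. A possible way around it, sketched below, is to prefix each key with something unique to the parent. The SeedIds module is a compact stand-in for deterministic_uuid4 (it pads with String.pad_leading instead of binaries) so that the snippet runs on its own; the key scheme is an assumption, not something the seeds above require.

```elixir
defmodule SeedIds do
  import Bitwise

  # Compact stand-in for deterministic_uuid4/1, repeated here so the
  # snippet is self-contained.
  def uuid4(key) do
    <<a::32, b::16, c::16, d::16, e::48, _::binary>> = :crypto.hash(:sha256, key)
    c = bor(band(c, 0x0FFF), 0x4000)
    d = bor(band(d, 0x3FFF), 0x8000)

    Enum.map_join([{a, 8}, {b, 4}, {c, 4}, {d, 4}, {e, 12}], "-", fn {chunk, width} ->
      # Render as uppercase hex and left-pad with zeroes to the group width.
      chunk |> Integer.to_string(16) |> String.pad_leading(width, "0")
    end)
  end
end

# Hypothetical key scheme: prefix each comment key with its post's key.
post_keys = ["post-a", "post-b"]

ids =
  for post_key <- post_keys, index <- 1..50 do
    SeedIds.uuid4("#{post_key}/comment-#{index}")
  end

IO.inspect(length(Enum.uniq(ids)))  # 100
```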

UUID Version-5

To ruin the party, deterministic UUID generation is exactly what UUID v5 is designed for. And since Ecto does not validate UUIDs against their specs, you might as well use uuid again and do:

iex(7)> UUID.uuid5(:nil, "foo")
"aa752cea-8222-5bc8-acd9-555b090c0ccb"

iex(8)> UUID.uuid5(:nil, "foo")
"aa752cea-8222-5bc8-acd9-555b090c0ccb"

But what's the fun in that 🤷‍♀️.
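
For the curious, version 5 is close to what we built by hand: hash a namespace followed by the name with SHA-1, keep the first 128 bits, and set the version bits to 5 instead of 4. The sketch below follows that recipe as described in RFC 4122; HandRolledV5 is a hypothetical module, not part of the uuid package, and its output is worth comparing against UUID.uuid5 rather than taking on faith.

```elixir
defmodule HandRolledV5 do
  import Bitwise

  # The nil namespace is 16 zero bytes.
  @nil_namespace <<0::128>>

  # Hypothetical helper: name-based UUID from SHA-1(namespace <> name).
  def uuid5(name, namespace \\ @nil_namespace) do
    # SHA-1 yields 160 bits; keep the first 128 and drop the rest.
    <<a::32, b::16, c::16, d::16, e::48, _::binary>> =
      :crypto.hash(:sha, namespace <> name)

    # Same overrides as for v4, except the version nibble is 0101 (5).
    c = bor(band(c, 0x0FFF), 0x5000)
    d = bor(band(d, 0x3FFF), 0x8000)

    <<a::32, b::16, c::16, d::16, e::48>>
    |> Base.encode16(case: :lower)
    |> dashed()
  end

  # Insert the dashes of the 8-4-4-4-12 layout into 32 hex characters.
  defp dashed(
         <<a::binary-size(8), b::binary-size(4), c::binary-size(4), d::binary-size(4),
           e::binary-size(12)>>
       ) do
    Enum.join([a, b, c, d, e], "-")
  end
end

IO.puts(HandRolledV5.uuid5("foo"))
```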

Notes

  • [1] EDIT: Updated the map_join function call to pad with zeroes
