A standard Phoenix app contains a priv/repo/seeds.exs script file, which populates a database when it is run, so that developers can work with a conveniently prepared environment.
At Bitcrowd, we like our seeds to be idempotent. In practice, this means that running mix run priv/repo/seeds.exs multiple times will not create more rows each time, but rather upsert the existing data. While it is always possible to drop the local database with mix ecto.reset, we might want to keep the current state of our development database.
Upsert all the things!
Our strategy here is to use insert! with a conflict_target on the :id. Let's take a classic blog app with Post and Comment schemas for the sake of the example. In our seeds.exs file we add this little helper:
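# def only works inside a module (e.g. MyApp.Seeds), so define the helper there
# and import it in seeds.exs. It upserts on :id instead of inserting a new row.
def insert_idempotently(schema) do
  Repo.insert!(schema, on_conflict: :replace_all, conflict_target: :id)
end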
And instead of inserting a seed record like so:
Repo.insert!(%Post{title: "Hello World!"})
We wrap our insert in the helper:
id = "64af9d13-0f60-45fc-971f-07e6b490c059"
insert_idempotently(%Post{id: id, title: "Hello World!"})
Nice 🎉
Next time we run the seeds, we will not get a second Post row in the database if a post with this :id already exists. This keeps our development environment neat and clean.
Idempotency for has_many associations
While the previous example is fairly simple, hardcoding UUIDs has its limitations when seeding has_many associations. For example, let's say we want to insert 50 Comments associated with the Post:
id = "64af9d13-0f60-45fc-971f-07e6b490c059"
post = insert_idempotently(%Post{id: id, title: "Hello World!"})
Enum.each(1..50, fn index ->
  insert_idempotently(%Comment{
    id: ???,
    message: "Comment #{index}",
    post_id: post.id
  })
end)
Of course one could add 50 lines of hardcoded UUIDs… Or could we generate deterministic UUIDs from the index value? Yes we can ⚡️!
Deterministic UUID v4 from a string
Our deterministic UUID generator should take a string as an argument, and always return the same UUID for the same argument. We first need to hash our string, and then to extract the number of bits that we need. UUIDs have a consistent structure: "64af9d13-0f60-45fc-971f-07e6b490c059": one group of 8 characters, then three groups of 4, and finally a group of 12, all separated by a -.
This is how it works:
# band/2 and bor/2 come from Bitwise
import Bitwise

def deterministic_uuid4(string) do
  # Hash the string and keep the first 128 bits, split into chunks
  # whose sizes match the five character groups of a UUID
  <<a::size(32), b::size(16), c::size(16), d::size(16), e::size(48), _rest::binary>> =
    :crypto.hash(:sha256, string)

  # Override the version and variant bits (necessary to create a valid UUID v4)
  c = bor(band(c, 0x0FFF), 0x4000)
  d = bor(band(d, 0x3FFF), 0x8000)

  # Glue all of the chunks together and turn them into a string
  Enum.map_join([{a, 4}, {b, 2}, {c, 2}, {d, 2}, {e, 6}], "-", fn {chunk, zero_padding} ->
    # Encode the integer as a binary (leading zero bytes are dropped)
    unsigned = :binary.encode_unsigned(chunk)
    # Maybe pad with 0 so that 'FA' becomes '00FA' [1]
    pad = zero_padding - byte_size(unsigned)
    padded_unsigned = <<0::pad*8, unsigned::binary>>
    # Turn into hexadecimal
    Base.encode16(padded_unsigned)
  end)
end
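To make the two "override" lines less magical, here is a small sketch of what they do to plain integers. The input values are the third and fourth 16-bit groups of the SHA-256 hash of "foo"; the variable names are only for illustration.

```elixir
import Bitwise

# Third group: keep the low 12 bits, then force the top nibble to 4 (the UUID version)
c = 0xC68F
c_fixed = bor(band(c, 0x0FFF), 0x4000)

# Fourth group: keep the low 14 bits, then force the top two bits to 10 (the RFC 4122 variant)
d = 0xF99B
d_fixed = bor(band(d, 0x3FFF), 0x8000)

IO.puts(Integer.to_string(c_fixed, 16)) # prints 468F
IO.puts(Integer.to_string(d_fixed, 16)) # prints B99B
```

Everything else in the UUID comes straight from the hash; only these six bits are fixed.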
Let's see it in action:
iex(1)> MyApp.Seeds.deterministic_uuid4("foo")
"2C26B46B-68FF-468F-B99B-453C1D304134"
iex(2)> MyApp.Seeds.deterministic_uuid4("foo")
"2C26B46B-68FF-468F-B99B-453C1D304134"
iex(3)> MyApp.Seeds.deterministic_uuid4("bar")
"FCDE2B2E-DBA5-4BF4-8860-1FB721FE9B5C"
Finally, let's check that our generated UUID is valid with the uuid utility:
iex(1)> UUID.info("2C26B46B-68FF-468F-B99B-453C1D304134")
{:ok,
 [
   uuid: "2C26B46B-68FF-468F-B99B-453C1D304134",
   binary: <<44, 38, 180, 107, 104, 255, 70, 143, 185, 155, 69, 60, 29, 48, 65,
     52>>,
   type: :default,
   version: 4,
   variant: :rfc4122
 ]}
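If you would rather not pull in a dependency just for this check, the version and variant bits can also be read with a binary pattern match. This is a minimal sketch: UUIDCheck and version/1 are hypothetical names, not part of any library.

```elixir
defmodule UUIDCheck do
  # Hypothetical helper. Returns {:ok, version} when the string has the
  # 8-4-4-4-12 layout and the RFC 4122 variant bits (10), :error otherwise.
  def version(uuid) when is_binary(uuid) do
    with <<a::binary-size(8), ?-, b::binary-size(4), ?-, c::binary-size(4), ?-,
           d::binary-size(4), ?-, e::binary-size(12)>> <- uuid,
         # 48 bits, then the 4 version bits, 12 more bits, then the 2 variant bits
         {:ok, <<_::48, version::4, _::12, 0b10::2, _::62>>} <-
           Base.decode16(a <> b <> c <> d <> e, case: :mixed) do
      {:ok, version}
    else
      _ -> :error
    end
  end
end

IO.inspect(UUIDCheck.version("2C26B46B-68FF-468F-B99B-453C1D304134")) # prints {:ok, 4}
IO.inspect(UUIDCheck.version("not-a-uuid"))                           # prints :error
```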
Let's rewrite our seeds to make use of our brand new function:
id = "64af9d13-0f60-45fc-971f-07e6b490c059"
post = insert_idempotently(%Post{id: id, title: "Hello World!"})
Enum.each(1..50, fn index ->
  insert_idempotently(%Comment{
    id: deterministic_uuid4("comment-#{index}"),
    message: "Comment #{index}",
    post_id: post.id
  })
end)
Amazing! We won't get 50 new rows of Comment each time we run the seeds script. Our development database is clean and we made our developers happy ☕️.
UUID Version-5
To ruin the party, deterministic UUID generation is exactly what UUID v5 is designed for. And since Ecto does not validate UUIDs against their specs, you might as well use uuid again and do:
iex(7)> UUID.uuid5(:nil, "foo")
"aa752cea-8222-5bc8-acd9-555b090c0ccb"
iex(8)> UUID.uuid5(:nil, "foo")
"aa752cea-8222-5bc8-acd9-555b090c0ccb"
But what's the fun in that? 🤷‍♀️
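For the curious, UUID v5 follows the same recipe as our function, only with SHA-1, a namespace UUID prepended to the name, and the version nibble set to 5. The sketch below is an illustration under those assumptions, not the uuid library's implementation; SeedUUID and uuid5/2 are hypothetical names. Using the post's :id as the namespace keeps comment ids from clashing across posts.

```elixir
defmodule SeedUUID do
  import Bitwise

  # Hypothetical helper sketching UUID v5: SHA-1 over the namespace bytes
  # followed by the name, truncated to 128 bits, with the version and
  # variant bits overridden.
  def uuid5(namespace, name) when is_binary(namespace) and is_binary(name) do
    namespace_bytes =
      namespace |> String.replace("-", "") |> Base.decode16!(case: :mixed)

    <<a::32, b::16, c::16, d::16, e::48, _rest::binary>> =
      :crypto.hash(:sha, namespace_bytes <> name)

    c = bor(band(c, 0x0FFF), 0x5000)
    d = bor(band(d, 0x3FFF), 0x8000)

    # Rebuilding the binary with explicit sizes keeps leading zeroes intact
    <<a::32, b::16, c::16, d::16, e::48>>
    |> Base.encode16(case: :lower)
    |> format()
  end

  defp format(<<a::binary-size(8), b::binary-size(4), c::binary-size(4),
                d::binary-size(4), e::binary-size(12)>>) do
    Enum.join([a, b, c, d, e], "-")
  end
end

# The post id doubles as the namespace for its comments
post_id = "64af9d13-0f60-45fc-971f-07e6b490c059"

comment_ids =
  Enum.map(1..50, fn index -> SeedUUID.uuid5(post_id, "comment-#{index}") end)
```

Running this twice yields the same 50 ids, and a different post_id yields a different set.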
Notes
- [1] EDIT: Updated the map_join function call to pad with zeroes