Entities as Dependencies

Entity and record types make a handy way to express “this has to exist before I can use it” in data access code.

The problems

Recently I’ve found myself trying to address a collection of needs related to data access in a relational-database-backed application. By using retrieval to act as a first pass at detecting invalid references, and passing the retrieved entities around instead of passing bare keys, I’ve been able to provide much clearer error indications in more situations.

The external world references entities by key, not by value, for fairly boring practical reasons. Data access code frequently needs to deal with records that contain those keys, but also deal with underlying data stores that will enforce constraints on the values those keys can have.

pub fn create(&mut self, username: String, body: String) -> Result<Message>

The example above is the signature (in Rust) for a hypothetical database accessor function that writes a new row to a two-column table: one column for the user and one for the message body. If the user column must reference a row in the user table by key (here, the username), then this function has a constraint: it must be called with a “valid” user, or it will return an error.

Passing keys straight through to the database like this works. As long as referential integrity is set up in the schema, the database will check whether the user passed in corresponds to a row in the user table, and will return an error if not. However, the errors that arise when someone passes an invalid identifier tend to be opaque and hard to program around, and thus hard to signal back to callers as anything more useful than a generic “data access” error. Frequently, the returned error only names the constraint that was violated; the application programmer is then left to work out what the constraint name means (or, more often, to pass it unmodified to the end user, who will be completely stumped).

My solution

Not every invalid identifier scenario results in an error. Queries that read values generally don’t generate errors; instead, they return the empty set. Converting an empty result into a not-found error is straightforward (and is often available out of the box in data access frameworks, given how often this need arises in real software).
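As a rough illustration of that conversion, here is a minimal sketch of an extension trait that turns “the query succeeded but matched nothing” into a specific error. The calling code later in this post uses a helper named not_found; the trait below is only a guess at what such a helper might look like, and the Error type and its variants are hypothetical.

```rust
// Hypothetical error type for this sketch; a real application would have its own.
#[derive(Debug, PartialEq)]
pub enum Error {
    UserNotFound(String),
    Database(String),
}

/// Extension trait (an assumption, not any framework's real API): converts an
/// empty query result into a caller-chosen error.
pub trait NotFound<T> {
    fn not_found<F>(self, make_err: F) -> Result<T, Error>
    where
        F: FnOnce() -> Error;
}

impl<T> NotFound<T> for Result<Option<T>, Error> {
    fn not_found<F>(self, make_err: F) -> Result<T, Error>
    where
        F: FnOnce() -> Error,
    {
        match self {
            // The row exists: hand it back unchanged.
            Ok(Some(value)) => Ok(value),
            // The query ran but matched nothing: build the specific error.
            Ok(None) => Err(make_err()),
            // The query itself failed: pass that error through untouched.
            Err(err) => Err(err),
        }
    }
}
```

Taking a closure rather than an error value means the not-found error is only constructed when it is actually needed.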

What I’ve settled on is to retrieve the entity by key first, return an appropriate error if the result set is empty, and only then pass the whole entity to later methods. Taking an entity rather than a key makes the dependency on a valid identifier part of the function’s signature. For the example above, that means replacing the first non-self argument:

pub fn create(&mut self, user: &User, body: String) -> Result<Message>

Internally, the function may only care about the id field of the &User it is given, but the caller must first resolve the user’s key to a User.

// Resolve `username` to an actual User entity, or return an error if no such user
// exists…
let sender = tx.users()
	.get(username)
	.await
	.not_found(|| Error::UserNotFound(username))?;
// … then pass the whole User entity onwards, rather than a bare ID.
let message = tx.messages()
	.create(&sender, body)
	.await?;

The convention is backstopped by the same referential integrity rules as above; however, these are only used to catch race conditions and programming mistakes, not to provide user feedback in the most common cases.
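To see the whole convention in one place, here is a small self-contained sketch in which an in-memory store stands in for the database. Everything in it (Store, add_user, get_user, create_message, post, and the field names) is hypothetical and synchronous for brevity; it is not the accessor API used in the snippets above.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: u64,
    pub name: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub id: u64,
    pub sender_id: u64,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum Error {
    UserNotFound(String),
}

/// In-memory stand-in for the database (an assumption made for this sketch).
#[derive(Default)]
pub struct Store {
    users: HashMap<String, User>,
    messages: Vec<Message>,
}

impl Store {
    pub fn add_user(&mut self, name: &str) -> User {
        let user = User { id: self.users.len() as u64 + 1, name: name.to_string() };
        self.users.insert(name.to_string(), user.clone());
        user
    }

    /// Retrieval doubles as validation: an unknown key becomes a specific error.
    pub fn get_user(&self, username: &str) -> Result<User, Error> {
        self.users
            .get(username)
            .cloned()
            .ok_or_else(|| Error::UserNotFound(username.to_string()))
    }

    /// Takes a whole `&User`, so callers cannot reach this with an unchecked key.
    /// Only `user.id` is actually used.
    pub fn create_message(&mut self, user: &User, body: String) -> Message {
        let message = Message {
            id: self.messages.len() as u64 + 1,
            sender_id: user.id,
            body,
        };
        self.messages.push(message.clone());
        message
    }
}

/// Resolve first, then pass the entity onwards.
pub fn post(store: &mut Store, username: &str, body: String) -> Result<Message, Error> {
    let sender = store.get_user(username)?;
    Ok(store.create_message(&sender, body))
}
```

Calling post with an unknown username fails in get_user with Error::UserNotFound before any write is attempted, which is the clearer error indication the convention is meant to provide.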

Tradeoffs

If the entity in question is relatively flat, this mostly means doing I/O twice: once to retrieve the row, then again when the database itself validates the reference. That’s not too bad in most cases, but it is worth being aware of.

If the entity in question is not flat, and may carry a lot of subordinate data with it, then retrieving all of that data only to throw most of it away is very wasteful. In that situation, I’ve sometimes resorted to defining a separate “reference” entity that doesn’t carry the related data with it, at the cost of having two ways to represent the same entity.
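One possible shape for such a reference entity is sketched below. The names (UserRef, the bio and avatar fields, create_message) are invented for illustration, and the heavy fields simply stand in for whatever subordinate data the full entity carries.

```rust
/// Full entity: carries subordinate data that is expensive to load.
#[derive(Debug, Clone, PartialEq)]
pub struct User {
    pub id: u64,
    pub name: String,
    pub bio: String,
    pub avatar: Vec<u8>,
}

/// Lightweight "reference" entity: records that the user was looked up and
/// found, without carrying the rest of the row.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UserRef {
    // Kept private so that, once this type lives in its own module, it can
    // only be obtained through a lookup or from a full `User`.
    id: u64,
}

impl UserRef {
    pub fn id(&self) -> u64 {
        self.id
    }
}

impl From<&User> for UserRef {
    fn from(user: &User) -> Self {
        UserRef { id: user.id }
    }
}

#[derive(Debug, PartialEq)]
pub struct Message {
    pub sender_id: u64,
    pub body: String,
}

/// Dependent operations accept the cheap reference instead of the full entity.
pub fn create_message(sender: UserRef, body: String) -> Message {
    Message { sender_id: sender.id(), body }
}
```

In a real data access layer the reference would more likely come from a narrow query that selects only the key column; the From conversion here just shows how the two representations relate.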

It is also still possible for this kind of check-then-do pattern to fail due to race conditions. While I’ve opted to ignore that possibility in my programs, it is a tradeoff, and users who hit that race will be left with an opaque error anyway. These races have been rare for me in practice because of the overall design of the systems I built this for, but they might be a bigger issue for some designs.