A(n append-only) tale of a good technology in the wrong environment
If you come from a relational-db background, EventStore may seem simultaneously exotic and simplistic. Instead of the tables, views and rich relational query syntax that we all know and love (and let’s face it, are very comfortable with), EventStore seems to have only one offering – streams; lists of immutable facts that can be appended to, and at a later stage, recalled. But this simple view of the technology does it little justice, because with the correct mind-set and architecture, it provides extremely powerful capabilities that wouldn’t be obvious when using an RDBMS, or even necessarily a NoSQL solution. Without that mind-set however, EventStore can be used quite incorrectly, whilst still “working”. In this post, I will give a candid history of JustGiving’s first foray into EventStore, noting what went well, what didn’t, and why.
EventStore, meet JustGiving
We started using EventStore in 2014 (according to our git history), when we were first exploring how to leverage the power of microservices. We needed to create a ‘Care’ button, akin to a Facebook ‘like’ button, to allow users to associate with good causes. It was something non-critical, with a very clear bounded context, making it perfect for experimentation. Although we were expecting low throughput, this work was intended as an architectural tracer for further microservice development, so we decided to go all-out, and make a super-scalable solution.
We figured that since a care-count is displayed on each page (as well as a status of whether someone cares for a cause or not), there would be many reads, whilst actual care events would be orders of magnitude fewer. To this end, we decided to split the application into two separately deployable concerns: a ‘write’ concern, which would append care/uncare events into EventStore, and a read concern, which would asynchronously subscribe to our event stream, and update a (sql-server-based) viewstore with some denormalised data for each page.
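To make the read/write split concrete, here is a minimal sketch (not the actual JustGiving code – the event shapes and field names are invented for illustration) of how a read concern might fold care/uncare events into a denormalised per-page view:

```python
# Illustrative sketch: folding care/uncare events into a denormalised view,
# so each page render is a single lookup rather than an event replay.
from collections import defaultdict


def apply_event(view, event):
    """Update the viewstore from a single care/uncare event.

    view maps page_id -> set of user_ids who currently care.
    """
    page, user = event["page_id"], event["user_id"]
    if event["type"] == "cared":
        view[page].add(user)
    elif event["type"] == "uncared":
        view[page].discard(user)
    return view


def care_count(view, page_id):
    # The denormalised read: cheap enough to serve on every page view.
    return len(view[page_id])


view = defaultdict(set)
events = [
    {"type": "cared", "page_id": "p1", "user_id": "u1"},
    {"type": "cared", "page_id": "p1", "user_id": "u2"},
    {"type": "uncared", "page_id": "p1", "user_id": "u1"},
]
for e in events:
    apply_event(view, e)
```

The write concern only ever appends events; all the shaping for display happens on the read side, which is what makes the two independently deployable.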
The diagram below shows the high-level architecture of the service, which we were pretty happy with at the time, although now, I am not so sure.
EventStore has two interfaces: TCP and HTTP. TCP is an order of magnitude faster than HTTP, whilst HTTP allows individual events to be cached inside the network, potentially reducing the strain on an EventStore cluster when reading events back. Since we were building some shiny Varnish clusters, we went for the HTTP approach.
The developers of EventStore chose AtomPub (+XML or JSON) as the application-level format of the HTTP interface, which was a nice touch. Technically, any language that has an RSS client can trivially subscribe to events, opening it up to a vastly wider range of environments than the developers themselves would otherwise be able to cope with. However, there is more to creating a solid subscriber than simply pulling messages off a queue (uh-oh..did he just say queue?); polling needs to be configured, stream positions need to be managed, messages need to be deserialised and dispatched to handlers. To this end, we decided to roll our own open-source .Net client to help our devs make the most of the pattern. I won’t go into the API now – it’s on the GitHub readme, but it makes use of EventStore’s long-polling support, and has some nice polymorphic facilities to make handling events a doddle.
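For a flavour of the long-polling support mentioned above, the sketch below builds the request you would send to read the head of a stream over HTTP. The `ES-LongPoll` header and the atom+json media type are as documented for the EventStore HTTP API of that era; the host and stream name are made up, and no request is actually sent:

```python
# Sketch: the shape of a long-polled read of an EventStore stream over HTTP.
# Nothing is sent here; this just shows the URL and headers involved.


def head_request(base_url, stream, poll_seconds=15):
    """Build the URL and headers for a long-polled read of a stream's head."""
    url = f"{base_url}/streams/{stream}"
    headers = {
        # Ask for the Atom feed as JSON rather than XML.
        "Accept": "application/vnd.eventstore.atom+json",
        # Hold the connection open up to N seconds waiting for new events,
        # instead of hammering the server with tight polling.
        "ES-LongPoll": str(poll_seconds),
    }
    return url, headers


url, headers = head_request("http://eventstore.local:2113", "cares")
```

Long-polling is what makes an HTTP subscriber feel push-like: the server answers immediately if events exist, or waits until one arrives (or the timeout expires).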
The solution we came up with used a single stream, ‘Cares’, to store all care-related events being raised by the front-end. To start processing cares, the read concern simply creates a subscriber for that stream, which performs the following logic:
- Get the position of the last processed event from a persistent, shared store
- Read the event metadata of the next 100 unprocessed events
- For each event that has a handler
- Fetch the event body (a single event is immutable so can be cached)
- Process the event
- Update the last processed position to the current position
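The steps above can be sketched as a single catch-up pass. This is an in-memory stand-in, not the real service: the actual read concern fetched events from EventStore over HTTP and kept its position in SQL Server, but the control flow is the same:

```python
# In-memory sketch of the subscriber loop described above. The list `stream`
# stands in for the EventStore 'Cares' stream, and the dict `position_store`
# for the persistent, shared last-processed-position store.
BATCH_SIZE = 100


def run_subscriber(stream, position_store, handlers):
    """Process all unhandled events from the last recorded position."""
    pos = position_store.get("last", 0)          # get last processed position
    while pos < len(stream):
        batch = stream[pos:pos + BATCH_SIZE]     # next (up to) 100 events
        for event in batch:
            handler = handlers.get(event["type"])
            if handler:                          # only events with a handler
                handler(event)                   # fetch body + process
            pos += 1
            position_store["last"] = pos         # update processed position
    return pos


seen = []
stream = [{"type": "cared", "n": i} for i in range(250)]
position_store = {}
run_subscriber(stream, position_store, {"cared": seen.append})
```

Note that the position is written back after every event; as discussed later, that round-trip per message turned out to be one of the costs of this design.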
Before we proceed, I think it’s worth mentioning that the Care service works solidly. It never goes down, and is always performant enough for our needs. This isn’t an assassination attempt on EventStore, just a candid look at our team’s failure to understand a new technology.
The solution (so we thought) gave us three advantages over our traditional monolithic approach:
- We could deploy fixes to our service very easily, thanks to blue-green deployment – this worked brilliantly, which was a testament to JustGiving’s strong ethos of AutomateEverything(™), but is a topic for another day.
- We could accept ‘cares’ from clients even if the read-concern was bogged down, ensuring we could always capture data, even if we were unable to display it for some reason. This was correct, but unimportant since we fake feedback on the UI to mimic an instant response anyway
- We could deploy more read-concern instances than write-concern instances, saving us money whilst keeping the scalability we needed. This was the reason we used EventStore for this project, and we saw no benefit at all, for an important reason…
At the time of implementation (from 2014 until the end of 2015), EventStore’s HTTP interface had no support for competing consumers. This was a major consideration in our architecture – we envisaged multiple read-concern instances to spread the load of processing Care events. However, since we did not have access to the pattern, we made do with the shared ‘last-processed-position’ counter mentioned above. When we had multiple active consumers, each one relied on the database for synchronisation, yielding an ‘at-least-once’ delivery mechanism (in our case, it was ‘usually-more-than-once’). This was easily overcome by making our message handlers idempotent, but led to a bizarre conclusion:
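Idempotency here can be as simple as deduplicating on the event id before applying a handler. A minimal sketch (the wrapper and its names are invented for illustration; in our case the set of processed ids would live in the database alongside the position counter):

```python
# Sketch: making a handler idempotent, so at-least-once (or, as above,
# "usually-more-than-once") delivery is harmless. Dedup is keyed on the
# event id, which every stored event carries.


def make_idempotent(handler, processed_ids):
    """Wrap a handler so repeat deliveries of the same event are no-ops."""
    def wrapped(event):
        if event["id"] in processed_ids:
            return  # already handled on an earlier delivery
        handler(event)
        processed_ids.add(event["id"])
    return wrapped


counts = {"p1": 0}


def add_care(event):
    counts[event["page_id"]] += 1


handler = make_idempotent(add_care, set())
event = {"id": "e-1", "page_id": "p1"}
handler(event)
handler(event)  # duplicate delivery from a competing consumer
```

The duplicate delivery changes nothing, which is what let multiple consumers share one position counter without corrupting the viewstore.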
We chose a technology to enable us to scale out a service for performance, and ended with a solution that:
- Required a trip to the database for each message read to update a position counter
- Scaled out poorly, with items being processed multiple times, due to the inadequacy of the above solution
- Required at least one trip to the EventStore per message, whilst realising very little caching benefit
Looking back, I think this project was clearly a bad fit for EventStore. We envisaged it as a queueing and distribution technology, when in fact there are some better candidates for queuing needs, such as RabbitMQ, which we use in-house. The main part of our open source client (which was written for this project) is the subscriber, which I think addresses the wrong problem – we could have made use of MassTransit or Brighter, and been done with it.
So…would we use EventStore again? Absolutely! We in fact have a number of projects that successfully use the technology, but use it idiomatically, with a stream per Aggregate Root. We use projections to tie things together when we need to, and those systems sing, especially when tied with libraries such as AggregateSource. But that, dear reader, is also a tale for another time…