After 20+ years in the SaaS world as a founder and an architect across multiple industries, I still find it satisfying to solve business problems with proven messaging solutions. I'd like to share the benefits I have seen when switching from a push to a pull system, and an example of how it remedied a disaster!
I once worked on a solution in the travel industry where a travel agent would make a booking for a corporate traveller using an Online Booking Tool (OBT). A third-party travel provider asked if the OBT could export each booking in real time by calling their API, as shown below.
Figure 1. Synchronous Request-Reply pattern.
It was a simple “real-time” solution with strong consistency. What could possibly go wrong?
- What would happen during scheduled or unscheduled maintenance?
- What would happen when there were transient network errors?
- What would happen during a promotion when traffic spiked?
- What would happen when the API spec changed?
A simple task turned out to be a burden on the OBT, which had to be enhanced to:
- Perform retries without overwhelming the travel provider’s API
- Detect and resolve duplicate bookings
- Store failed bookings that had to be retried, consuming lots of disk I/O
- Implement a scheduler to periodically export failed bookings again
- Develop a dashboard to track the status of exported bookings
The additional OBT components are shown below in the green box.
Figure 2. Synchronous Request-Reply pattern with bespoke retry logic.
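To give a feel for the kind of bespoke logic this involved, here is a minimal sketch of a retry with exponential backoff plus a simple duplicate check. It is an illustration only, not the code from the project: `TransientError`, `export_with_retry`, the `send` callable and the `id` field on the booking are all hypothetical names, and a real implementation would persist the exported ids rather than hold them in memory.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 503s)."""


def export_with_retry(
    send: Callable[[dict], None],
    booking: dict,
    exported_ids: set,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Push one booking. Return False if it must be stored and retried later."""
    if booking["id"] in exported_ids:
        return True  # already exported, so skip it to avoid a duplicate

    for attempt in range(max_attempts):
        try:
            send(booking)  # the call to the travel provider's API
            exported_ids.add(booking["id"])
            return True
        except TransientError:
            if attempt < max_attempts - 1:
                # Back off 0.5s, 1s, 2s, ... so the provider is not hammered
                sleep(base_delay * 2 ** attempt)
    return False  # caller stores the booking for the scheduler to pick up
```

Even this small sketch leaves out the storage, scheduler and dashboard, which is where most of the effort went.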
It wasn’t long before additional travel providers wanted to receive exported bookings too. Developers had to extend the OBT and understand the authentication and specifications of each travel provider's API. Then there was ongoing maintenance, such as dealing with new API versions, network issues and software bugs.
8 benefits of changing from push to pull
Changing the real-time system to an eventual consistency model provided the benefits discussed in the previous post.
The new design, shown in Figure 3 below, adds Azure Service Bus between the OBT and the travel providers.
Figure 3. Publisher-Subscriber pattern.
The publisher-subscriber (pub/sub) pattern provided the following benefits:
- Availability – the OBT export system remains operational even if one or all of the travel provider services are unavailable
- Reliability – when a travel provider’s service is temporarily unavailable, bookings queue up and are processed once the service is online again. When there are intermittent heavy loads on the system, the queue-based load levelling pattern smooths them out
- Scalability – each travel provider can consume bookings as fast as it is able to and scale out with multiple workers using the competing consumers pattern
- Observability – Azure provides dashboards to view queue lengths, and alerts can be added to notify subscribers when queues are not being processed or are being processed too slowly
- Extensibility – adding travel provider subscriptions doesn’t add load to the OBT or require code changes. The OBT doesn’t care if there are 1 or 1000 travel service providers
- Loose coupling – OBT developers don’t need to understand the API spec and authentication for each travel provider. It is up to each travel provider to use the booking’s message schema and access the desired properties in their service
- Responsiveness – the travel agent can place a booking quickly without waiting for it to be exported
- Simplicity – fewer components are needed since we can eliminate the bespoke dashboard, scheduler, and reading/writing export statuses to a database
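The benefits above can be illustrated with a small in-memory sketch. This is a stand-in for the idea only, not the Azure Service Bus SDK: the `Topic` class, its methods and the subscription names are hypothetical. It shows that the publisher does the same work however many subscribers exist, and that each subscriber pulls from its own backlog at its own pace.

```python
from collections import deque


class Topic:
    """In-memory stand-in for a message topic (not the Azure Service Bus SDK)."""

    def __init__(self):
        self._subscriptions = {}  # subscription name -> its own queue

    def subscribe(self, name):
        self._subscriptions.setdefault(name, deque())

    def publish(self, message):
        # The publisher never calls a provider; each subscription gets a copy
        for queue in self._subscriptions.values():
            queue.append(message)

    def pull(self, name, max_messages):
        # The subscriber decides how many messages to take and when
        queue = self._subscriptions[name]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch

    def backlog(self, name):
        # Queue length is the number an alert would watch
        return len(self._subscriptions[name])


topic = Topic()
topic.subscribe("provider-a")
topic.subscribe("provider-b")  # adding a provider needs no publisher change

for booking_id in range(5):
    topic.publish({"booking_id": booking_id})

first_batch = topic.pull("provider-a", max_messages=2)  # provider-a is online
# provider-b is offline; its bookings simply wait in its own queue
```

After this runs, `provider-a` has three bookings left to pull and `provider-b` still has all five waiting, while the publisher was never blocked by either.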
Recovering from an unexpected outage
In one incident, a travel provider received an alert that 1,000 bookings were waiting in their queue. It turned out that the travel provider had made a firewall change that prevented their service from processing the bookings. Once the firewall issue was resolved, they knew that their system was in control of the rate at which it processed bookings, which was 2 per second. They knew that the queued bookings would take around 30 minutes to process, that no data would be lost, and that their system could handle the load. It was also possible to use the priority queue pattern to ensure urgent bookings were processed first.
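The recovery reasoning can be sketched in a few lines. At 2 per second, a backlog of 1,000 clears in 500 seconds on its own; bookings that keep arriving during recovery lengthen that, so the real figure depends on the arrival rate. The arrival rate used below is an assumed value for illustration, not a figure from the incident, and the `priority` field and function names are hypothetical.

```python
import heapq


def drain_seconds(backlog, process_rate, arrival_rate=0.0):
    """Seconds to clear a backlog when the consumer outpaces new arrivals."""
    if process_rate <= arrival_rate:
        raise ValueError("backlog only drains if process_rate > arrival_rate")
    return backlog / (process_rate - arrival_rate)


def drain_order(bookings):
    """Booking ids in processing order: lowest priority number first, then FIFO."""
    # The sequence number keeps arrival order among equal priorities
    heap = [(b["priority"], seq, b["id"]) for seq, b in enumerate(bookings)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


backlog_only = drain_seconds(1000, process_rate=2)  # 500.0 seconds
# Assumed: one new booking per second keeps arriving while the backlog drains
with_arrivals = drain_seconds(1000, process_rate=2, arrival_rate=1)  # 1000.0

order = drain_order([
    {"id": "A", "priority": 5},
    {"id": "B", "priority": 1},  # urgent booking jumps the queue
    {"id": "C", "priority": 5},
])
# order is ["B", "A", "C"]
```

The point of the sketch is that the subscriber sets `process_rate`, so the recovery time can be estimated up front rather than discovered under load.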
In summary, if you don’t have a queue anywhere in your architecture, you’re probably missing out on some of the benefits such as availability, reliability, and scalability. Get in contact if you would like to know more or need help to design your messaging architecture.
Are you interested in more of an introduction to messaging and queuing systems, and why they are important to SaaS? You might find our blog from last week interesting.