ServiceM8 Background Services: a tool to manage scalable workloads built with Amazon EC2 and SQS

ServiceM8 is Cloud Software in the truest sense of the term: it’s not traditional software ported or adapted to run in the cloud, but software built from the ground up to take advantage of cloud services. One great example of this is our Background Services system, which uses Amazon’s Simple Queue Service (SQS) and Elastic Compute Cloud (EC2) to give ServiceM8 developers an easy-to-use abstraction for scheduling work, to match our server capacity closely to our workloads, and to keep page load times fast for our users.

ServiceM8’s Background Services system allows our developers to request a function to be executed with a particular set of arguments at some time in the future. Scheduling a background event¹, from a developer’s point of view, is a lot like calling an asynchronous function. The call to schedule the event returns immediately, but the actual work happens in another thread — or in our case, on another machine.

1. We refer to the combination of a function name and its arguments as an Event. For example, if a function SendEmail requires arguments ToAddress and Message, then executing SendEmail with ToAddress=“bob@example.com” and Message=“Hi Bob!” is an Event.

When a developer schedules a background event, here’s [a greatly simplified version of] what happens:

  1. The Background Services system generates a JSON hash specifying the function to execute, its arguments, the current environment details (authenticated user, account, etc.), and optional parameters such as the event priority and the requested time to run the event.
  2. This hash is stored into an Amazon SQS queue.
  3. A “background server” — a server dedicated to processing Background Events — retrieves the event details from the SQS Queue and de-serializes the JSON.
  4. The background server authenticates itself as the same user and account which fired the event, so that permissions, user details and the like are maintained.
  5. The background server executes the requested function with the arguments provided.
  6. If all goes smoothly, the background server removes the item from the SQS queue. If something goes wrong while running the code (e.g. a database server fails, or an external service is unavailable) and an exception is thrown, the item is placed back into the queue to be retried later.
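
To make the steps above concrete, here is a minimal sketch in Python. It is not our production code: the queue is an in-memory stand-in for SQS, and every name in it (schedule_event, process_next, HANDLERS, CURRENT_CONTEXT) is hypothetical and chosen purely for illustration. Priority and run time are recorded in the message but not acted on here.

```python
import json
from collections import deque

# Hypothetical registry mapping function names to callables that may run
# as background events. These names are illustrative, not a real API.
HANDLERS = {}

# In-memory stand-in for the SQS queue (assumption: production code would
# send to and receive from Amazon SQS instead).
QUEUE = deque()

# Environment that the currently running event executes under (step 4).
CURRENT_CONTEXT = {}


def background(func):
    """Register a function so it can be requested by name."""
    HANDLERS[func.__name__] = func
    return func


def schedule_event(function, arguments, user, account, priority=0, run_at=None):
    """Steps 1-2: serialize the event to JSON and put it on the queue."""
    message = json.dumps({
        "function": function,
        "arguments": arguments,
        "environment": {"user": user, "account": account},
        "priority": priority,   # stored only; this sketch ignores it
        "run_at": run_at,       # stored only; this sketch ignores it
    })
    QUEUE.append(message)
    return message


def process_next():
    """Steps 3-6: take one event, run it, and requeue it on failure.

    Returns True if an event ran successfully, False otherwise.
    """
    if not QUEUE:
        return False
    raw = QUEUE.popleft()
    event = json.loads(raw)                       # step 3: de-serialize
    CURRENT_CONTEXT.clear()
    CURRENT_CONTEXT.update(event["environment"])  # step 4: adopt user/account
    try:
        HANDLERS[event["function"]](**event["arguments"])  # step 5: execute
    except Exception:
        QUEUE.append(raw)                         # step 6: put back for retry
        return False
    return True                                   # step 6: event stays removed


SENT = []


@background
def SendEmail(ToAddress, Message):
    """Example handler; it records the email instead of sending it."""
    SENT.append((ToAddress, Message))
```

Calling schedule_event("SendEmail", {"ToAddress": "bob@example.com", "Message": "Hi Bob!"}, user="alice", account="acme") returns immediately, and the work happens only when a worker later calls process_next().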

This helpful abstraction provides three important benefits:

Responsiveness

As much as possible, ServiceM8 is written such that the “heavy lifting” is done in background events. Our front-end servers which handle user requests perform as little processing as possible. Essentially they act like dispatchers, transforming user requests into background events where the real work gets done. What this means is that users see fast page load times, even when there’s a backlog of Background Events to be processed.

Smooth Scalability

Because our front-end servers don’t do the hard work themselves, a sudden influx of user requests doesn’t result in dropped requests or “Service Unavailable” messages. Instead, it just results in a lot of background events being fired, so the number of messages in the background queue increases.

When this happens, scaling rules configured in Amazon EC2 boot more servers to process the backlog of work. The buffering protection provided by the queue means we don’t need to keep servers sitting around doing nothing “just in case” they’re needed. We can closely match the number of servers we run to the actual workload, safe in the knowledge that if the workload increases, we can boot more servers to bring the queue size back under control. Every second a server is idle, it’s essentially burning money, so matching server supply to workload demand is crucial to keeping costs under control.
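
The idea behind such a scaling rule can be sketched in a few lines of Python. This is only an illustration of matching server count to queue depth: the function name and the thresholds (500 events per server, between 1 and 20 servers) are made-up assumptions, and the real rules live in Amazon EC2 configuration rather than in application code.

```python
import math


def desired_servers(queue_depth, events_per_server=500, min_servers=1, max_servers=20):
    """Return how many background servers a given backlog would call for.

    All default values are hypothetical and exist only to show the shape
    of the calculation.
    """
    needed = math.ceil(queue_depth / events_per_server)
    # Clamp to the allowed range so we never scale to zero or without bound.
    return max(min_servers, min(max_servers, needed))
```

With these assumed numbers, an empty queue keeps a single server running, while a backlog of 1,200 events would call for three.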

Simple Scheduling

Scheduling events to happen at a particular time in the future is difficult in typical web applications, where code runs only in response to a user request (i.e. page load). By de-coupling our user-facing servers from our background servers, this becomes a much simpler task. For a ServiceM8 developer, scheduling an event for a pre-defined time is as simple as putting it in the queue and supplying one extra parameter: the time at which the event should run. The Background Services system ensures that the event comes out of the queue at that time (or at least very close to it).

This is very handy for scheduling things such as automated reminder SMS messages. The code to generate the SMS can be run days, weeks or months before the SMS should be sent (e.g. when a user requests a reminder message). For the developer, scheduling the reminder message for exactly 12 hours before the customer’s appointment is as simple as looking up the appointment timestamp, subtracting 12 hours, and supplying the value as an argument when scheduling the “Send SMS” background event.
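
The arithmetic for the reminder example might look like the following sketch. The helper names reminder_run_at and is_due are hypothetical, and the appointment time is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timedelta, timezone


def reminder_run_at(appointment, hours_before=12):
    """Return the time at which the reminder event should run."""
    return appointment - timedelta(hours=hours_before)


def is_due(run_at, now):
    """Return True once the requested run time has been reached."""
    return now >= run_at


# Example: an appointment at 09:00 UTC on 15 March 2024.
appointment = datetime(2024, 3, 15, 9, 0, tzinfo=timezone.utc)
run_at = reminder_run_at(appointment)  # 21:00 UTC on 14 March 2024
```

The resulting run_at value is what the developer would supply as the extra parameter when scheduling the “Send SMS” event.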

So there you have it — Background Services make our developers more productive, reduce our server costs, and keep our customers happy! And it’s all made possible by cloud services: ServiceM8’s Background Services system would be infeasible to build for a traditionally-hosted web application.

Why you’ll never see scheduled maintenance on ServiceM8

Welcome to our first developer blog post. We wanted to share some insight into one of our behind-the-scenes features which we think helps make the ServiceM8 platform great.

Have you ever tried to use an app or website, only to find that it’s offline for maintenance? Not fun. Prior to the cloud, this was standard IT practice: you had to log all users out while you completed the upgrade. “Users will understand; when we come back online they’ll have access to great new features.”

When we set out to build an online platform designed for small business, we knew this wasn’t an option. Most small businesses work weekends and at all hours of the night, so any maintenance would impact someone, somewhere.

The ServiceM8 Platform works a little differently. We call it our concurrent-version-deployment (CVD) system. What makes it great is that we never need to log users out for scheduled maintenance. In fact, upgrades to users’ accounts occur automatically and continuously, 24×7.

  • New releases/updates of ServiceM8 are deployed at any time of the day, usually several times each day
  • Customers currently logged into their account are not interrupted; they just won’t see new features until they next log in

So how does it work?

In most software, everyone has access to exactly the same version and features. ServiceM8 works differently:

  • The ServiceM8 Platform will have tens, even hundreds, of versions of our codebase running at any point in time.
  • Each customer account is assigned a version, and accounts will gradually upgrade to the latest codebase over several weeks.
  • Issues impacting a customer due to a specific codebase can be resolved by upgrading that customer’s account to a newer (or older) codebase on the fly without impacting other customers.
  • We’re able to prioritise accounts for upgrade, so our support partners always get access to new features prior to the general public – so they will be upskilled before people start sending them enquiries.
  • We’re able to manage helpdesk workloads, as instead of 100% of customers receiving a new feature the same day, we control the upgrade speed of accounts to ensure our support levels are maintained.
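
As a rough illustration of the points above, per-account versioning can be pictured as a mapping from account to codebase version, upgraded in controlled batches. The sketch below is not how CVD is actually implemented; the function names, account names and version labels are all hypothetical.

```python
def upgrade_batch(account_versions, priority_accounts, latest, batch_size):
    """Move up to batch_size accounts to the latest version.

    Priority accounts (for example, support partners) are upgraded first.
    Returns the list of accounts upgraded in this batch.
    """
    pending = [a for a in account_versions if account_versions[a] != latest]
    # False sorts before True, so priority accounts come first; the account
    # name is a tie-breaker that keeps the order deterministic.
    pending.sort(key=lambda a: (a not in priority_accounts, a))
    upgraded = pending[:batch_size]
    for account in upgraded:
        account_versions[account] = latest
    return upgraded


def pin_version(account_versions, account, version):
    """Move one account to a newer or older version without touching others."""
    account_versions[account] = version
```

Keeping batch_size small is what would let the upgrade speed be throttled to match helpdesk capacity, and pin_version corresponds to resolving an issue for a single customer on the fly.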

It’s all possible because ServiceM8 was built from the start for the cloud, and takes full advantage of Amazon’s Web Services. We’re able to deliver what couldn’t have been achieved in a traditional IT infrastructure.