The What, Why and How of SLAs, aka Service Level Agreements (part 1)

Every company uses technology vendors, such as Software-as-a-Service providers, to provide critical components of their business operations. One pervasive issue in technology vendor agreements is the vendor’s commitment to the levels of service the customer will receive.  A representation to use commercially reasonable efforts to correct product defects or nonconformity with product documentation may not be sufficient for a customer relying on a technology vendor’s service for a mission-critical portion of its business. In this situation, the vendor may offer (and/or a customer may require) a contractual commitment as to the vendor’s levels of service and performance, typically called a “Service Level Agreement” or “SLA.” Service Level Agreements (SLAs) ensure there is a meeting of the minds between a vendor and its customer on the minimum service levels to be provided by that vendor.

At a high level, a SLA does three things:

  1. Describes the types of minimum commitments the vendor will make with respect to levels of service provided by the vendor;
  2. Describes the metrics by which the service level commitments will be measured; and
  3. Describes the rights and remedies available to the customer if the vendor fails to meet their commitments.

In many cases, a SLA is presented as an exhibit or appendix to the vendor agreement (and not a separate agreement). In others, a SLA may be presented as a separate document available on a vendor’s website.  Think of the former as a customer-level SLA which is stated directly in (and quite often negotiated on a customer-by-customer basis as part of) the service agreement with that customer, and the latter as a service-level SLA which the vendor wants to apply equally to every user of its service.

In this two-part post, I’ll explain the contents of, reasons for, and important tips and tricks around technology SLAs.  Part 1 will cover uptime and issue resolution SLAs.  Part 2 will cover other types of technology SLA commitments, SLA remedies, and other things to watch for.

Common types of commitments in SLAs

The most common types of commitments found in technology SLAs are the uptime commitment and the issue resolution commitment.

Uptime SLA Commitment

An uptime commitment is generally provided in connection with online services, databases, and other systems or platforms (a “Service”). A technology vendor will commit to a minimum percentage of Service availability during specified measurement periods.  This percentage is typically made up of nines – e.g., 99% (“two nines”), 99.9% (“three nines”), 99.99% (“four nines”), 99.999% (“five nines”), etc.  Some SLAs will use “.5” instead of “.9”, for example, 99.5% or 99.95%”.   Uptime is typically calculated as follows:

(total minutes in the measurement period - minutes of Downtime in that period) / Total minutes in the measurement period

Definitions are key. The right definitions can make all the difference in the effectiveness of an uptime SLA commitment. Vendors may gravitate towards a narrower definition of “Downtime” (also called “Unavailability” in some SLAs) to ensure they are able to meet their uptime commitment, e.g., by excluding a slowdown that makes the Service hard (but not impossible) to use. Customers should look carefully at this definition to ensure it covers any situation in which they cannot receive substantially all of the value of the Service. For example, consider the difference between Unavailability/Downtime as a period of time during which the Service fails to respond or resolve, versus a period of time during which a material (or non-material) function of the service is unavailable. The SLA should define when the period of Unavailability/Downtime starts and ends, e.g., starting when the vendor first learns of the issue, and ending when the Service is substantially restored or a workaround is in place; customers should look at this carefully to ensure it can be objectively measured.

Mind the measurement period. Some vendors prefer a longer (e.g., quarterly) measurement period, as a longer measurement period reduces the chance a downtime event will cause a vendor to miss its uptime commitment. Customers generally want the period to be shorter, e.g., monthly.

Consider whether the uptime percentage makes sense in real numbers. Take the time to actually calculate how much downtime is allowed under the SLA – you may be surprised. For a month with 30 days:

  • 99% uptime = 432 minutes (7 hours, 12 minutes) of downtime that month
  • 99.5% uptime = 216 minutes (3 hours, 36 minutes) of downtime that month
  • 99.9% uptime = 43.2 minutes of downtime that month
  • 99.99% uptime = 4.32 minutes of downtime that month

One critical question customers should ask is whether a Service is mission-critical to its business.  If it’s not, a lower minimum uptime percentage may be acceptable for that service.

Some vendors may offer a lower uptime commitment outside of business hours, e.g., 99.9% from 6am to 10pm weekdays, and 99% all other times. Again, as long as this works for a customer’s business (e.g., the customer is not as concerned with downtime off-hours), this may be fine, but it can make it harder to calculate.

Ensure the Unavailability/Downtime exclusions are appropriate. Uptime SLAs generally exclude certain events from downtime even though the Service may not be available as a result of those events. These typically include unavailability due to a force majeure event or an event beyond the vendor’s reasonable control; unavailability due to the equipment, software, network or infrastructure of the customer or their end users; and scheduled maintenance.  Vendors will often seek to exclude a de minimis period of Unavailability/Downtime (e.g., less than 5/10/15 minutes), which is often tied to the internal monitoring tool used by the vendor to watch for Service unavailability/downtime. If a vendor wouldn’t know if a 4-minute outage between service pings even occurred, it would argue that the outage should not count towards the uptime commitment.

Customers should make sure there are appropriate limits to these exclusions (e.g., force majeure events are excluded provided the vendor has taken commercially reasonable steps to mitigate the effects of such events consistent with industry best practices; scheduled maintenance is excluded provided a reasonable amount of advance written notice is provided.  Customers should watch out for overbroad SLAs that try to exclude maintenance generally (including emergency maintenance).  Customers may also want to ensure uptime SLAs include a commitment to take reasonable industry-standard precautions to minimize the risk of downtime (e.g., use of no less than industry standard anti-virus and anti-malware software, firewalls, and backup power generation facilities; use of redundant infrastructure providers; etc.)

Don’t overlook SLA achievement reporting. One important thing customers should look for in a SLA is how the vendor reports on SLA achievement metrics, which can be critical to know when a remedy for a SLA failure may be available. Vendors may place the burden on the customer to provide notice of a suspected uptime SLA failure within a specified amount of time following the end of the measurement period, in which case the vendor will review uptime for that period and verify whether the failure occurred. However, without proactive metrics reporting, a customer may only have a suspicion of a SLA failure, not actual facts. Customers using a mission-critical system may want to consider asking for proactive reporting of SLA achievement within a certain amount of time following each calendar month.

Issue Resolution SLA Commitment

Of equal importance to an uptime commitment is ensuring that a Service issue (downtime or otherwise) will be resolved as quickly as possible.  Many technology SLAs include a service level commitment for resolution of Service issues, including the levels/classifications of issues that may occur, a commitment on acknowledging the issue, and a commitment on resolving the issue.  The intent of both parties should be to agree on a commitment gives customers assurances that the vendor is exerting reasonable and appropriate efforts to resolve Service issues.

Severity Levels. Issue resolution SLAs typically include from 3-5 “severity levels” of issues.  Consider the following issues:

Impact Example Classification
Critical The Service is Unavailable
High An issue causing one or more critical functions to be Unavailable or disrupting the Service, or an issue which is materially impacting performance or availability
Medium An issue causing some impact to the Service, but not materially impacting performance or availability
Low An issue causing minimal impact to the Service
Enhancement The Service is not designed to perform a desired function

Issue resolution SLAs typically use some combination of these to group issues into “severity levels.”  Some group critical and high impact issues into Severity Level 1; some do not include a severity level for enhancements, instead allowing them to be covered by a separate change order procedure (including it in the SLA may be the vendor’s way of referencing a change order procedure for enhancements). Vendors may include language giving them the right to reclassify an issue into a lower severity level with less stringent timeframes. Customers should consider ensuring whether they should have the ability to object to (and block) a reclassification if they disagree that the issue should be reclassified.

Acknowledgment Commitment. Issue resolution SLAs typically include a commitment to acknowledge the issue. As with the uptime SLA, the definition of the acknowledgment timeframe is important (when it starts and when it ends). A vendor will typically define this as the period from the time it is first notified of or becomes aware of the issue to the time the initial communication acknowledging the issue is provided to the customer.  Customers should look at the method of communication (e.g., a post to the vendor’s support page, tweet through their support Twitter account, an email, a phone call from the customer’s account representative required, etc.) and determine if a mass communication method versus a personal communication method is important.

For critical and high impact issues, vendors (especially those operating multi-tenant environments) will often not offer a specific acknowledgment commitment, instead offering something like “as soon as possible depending on the circumstances.”  The argument for this is that for a critical or high impact issue, a vendor wants all available internal resources triaging and working the problem, not reaching out to customers to tell them there is a problem. In many cases, this may be sufficient for a customer provided there is some general acknowledgment provided to a support page, support Twitter account, etc. to alert customers that there is an issue. In others, a customer may want to push for their account representative, or a vendor representative not involved in triaging the problem such as an account executive, to acknowledge the issue within a fixed amount of time, putting the burden on the vendor to ensure it has appropriate internal communication processes in place.

Resolution Commitment. Issue resolution SLAs also typically include a time commitment to resolve the issue. One important thing to focus on here is what “resolve” means.  Vendors may define it as the implementation of a permanent fix or a workaround that temporarily resolves the problem pending the permanent fix; in some cases, vendors may also define it as the commencement of a project to implement a fix.  Customers should ensure that a vendor promptly implement a permanent fix if a workaround is put in place, and that failure to do so is a failure under the SLA. Many vendors are reluctant to provide a firm issue resolution timeframe, as the time required to resolve or implement a workaround is dependent on the issue itself, and are often unwilling to negotiate the resolution commitment or commit to a fixed timeframe for resolution.  Customers should ensure the resolution commitment is reasonable and that the vendor is doing everything it can to correct issues.  For example, for critical and high impact issues, consider an issue resolution commitment of “as soon as possible using continuous diligent efforts” – as long as the vendor is working diligently and continuously to fix the issue, they’re in compliance with the SLA. For lower impact issues, consider a commitment to implement a fix or workaround in the ordinary course of business.

In part 2, I’ll cover other types of technology SLA commitments, SLA remedies, and other things to watch for.

Eric Lambert has spent most of his legal career working in-house as a proactive problem-solver and business partner. He specializes in transactional agreements, technology/software/e-commerce, privacy, marketing and practical risk management. Any opinions in this post are his own. This post does not constitute, nor should it be construed as, legal advice. He is a technophile and Internet evangelist/enthusiast. In his spare time Eric dabbles in voice-over work and implementing and integrating connected home technologies.