While, as engineers, we often focus on engineering metrics (throughput, availability, security, for example) for growing a platform, I think it is more important to ensure the correct product metrics are in place.
I'm not advocating for forgetting about non-functional requirements. Still, I think those should often sit in Product teams and mainly impact the Platform team indirectly (yes, even the DORA metrics). If you haven't read it before and want to understand the rationale for this statement, look at my previous article on Platform as a Product. In this article, I want to focus instead on how to set objectives for your internal platform, what metrics you want to focus on and how they change with the platform's maturity and in different contexts.
To do that, I'll use a set of metrics that I like to call HEAT:
- Happiness
- Efficiency
- Adoption
- Task Success
HEAT metrics are a variation of the famous HEART metrics, so I didn't invent anything; I removed Engagement and Retention and added Efficiency :).
Platform, What's in a name?
Before we start, let me reiterate my preferred definition of an Internal Platform:
"a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination." (Evan Bottcher)
First, let's get this straight: if enabling other teams to "deliver at a higher pace" is the platform's goal, I think you will have to agree that you shouldn't mandate the use of its services. This may sound quite controversial, especially for Cloud teams, where platforms are often used to centralise infrastructure usage and patterns.
Why is mandating a bad idea, especially at the beginning? For the same reason that a company holding a monopoly is likely to impact its customers, and society overall, negatively. Monopolists have very few incentives to create a great product if their customers don't have an alternative. Imagine if Stripe were the only company providing a payment platform. Would they have invested in an incredible customer experience and DX? I don't think so.
Workflow Platform at Babylon
I have worked in three platform teams throughout my career (NowTV, NewsUK and Babylon), and we are about to kick off Onto's first platform team (exciting times). Below I'll walk you through how we introduced Product techniques in Babylon's Workflow Platform team, although some of those techniques come from my previous roles.
Let me give you a bit of context. Babylon's Aftercare department has a platform team that enables other teams to build/deploy/operate workflows independently. To avoid getting bogged down in what a workflow is at Babylon, imagine a service that orchestrates other services to achieve a specific business goal.
The team used to build workflows for the whole company, but this wasn't very scalable. Ultimately, the team became a dependency for other internal teams, preventing them from developing their features independently.
From now on, I'll call the internal teams using the platform our customers.
Measuring success when creating a new platform (or platform feature)
There are multiple reasons for an organisation to create a platform team, but to simplify the discussion, I'll highlight the main two:
- You have a team doing some work, providing a service that is used and needed inside your company. That means you have already validated that your service solves a need for your customers. Even so, you don't know whether your customers want to use your service because you are doing the work for them or because there is inherent value in it. Would they still use your service if it were self-service? Would they use it if it worked the way you envision it? Don't get me wrong, this is a great position to be in, but there are still quite a few unknowns.
- A different case is when you build a product that provides a service that wasn't there before. This increases uncertainty even more: is the service useful to anyone, for example? Imagine you want to provide a platform that allows teams to quickly deploy their software to a serverless provider while everyone in your company uses containers. Are teams even allowed to run their software on a serverless provider? Would it work for teams building new features, or only to retrofit existing ones? Which team would be happy to bet on a new architecture for their next feature/product?
At Babylon, we were in the first scenario: by the time we decided to turn ourselves into a self-service platform, we already had customers. In fact, one of the reasons we wanted to become a self-service product was that we couldn't handle all the customer requests from the various teams.
It doesn't matter whether you are in scenario 1 or 2; at this stage, a valuable metric to measure your success is Adoption.
For example, a good adoption metric may look like this:
At least 3 teams committing to use your product and 2 happy to be early adopters, working on a cutting-edge version of it.
If you were an early-stage business, how would you measure success? People knocking at the door wanting to try your product and, preferably, happy to pay for it. For internal products, the "price" other teams pay is often their time.
Your early adopters
Another good reason to have customers with you from day one is that you can do user testing with them. You don't want to build a generic product for a generic set of users; you want to be close to your users and make something that works for them. A word of advice: try to start with two different teams so you get more variety in the feedback.
In the Babylon Workflow team, for example, we felt that a few customers could benefit from a way to ask the end user questions in the Mobile App and retrieve the answers in a Workflow. We started a roadshow to show teams how this would work and ask whether they were interested. We then found two teams to work closely with us to ensure we built something that worked for them.
I'll be honest: it's pretty hard to keep your early adopters on board; team priorities change, and losing one of them is not uncommon. However, if you are struggling to keep your early adopters, that struggle will likely continue later on, so it's better to deal with it early and understand how much value teams, or senior leadership, attribute to your project.
If you have been struggling to get early adopters through the door, or fear that adoption is more challenging than expected, keeping adoption as a key metric for a while is not a bad idea. Perhaps your offering is too niche, the way you promote it inside the company could be improved, the value you provide is not significant enough to justify adopting it, or the risk is considered too high. It doesn't matter what the reason is; the earlier you find out, the quicker you can react.
What about inward metrics?
We first looked at an outward-facing metric. Typically, product metrics are all outward-looking, or at least related to the customer, but sometimes you want to measure how well you are doing internally. The inward metric I find most useful for teams is efficiency: how long does it take to onboard a new customer? How much ongoing support is required per customer?
If you have an influx of customers keen to use your product, focusing on efficiency may be a good idea (especially if you created the platform from scenario 1).
Efficiency is easier to measure because it's internal to your team, and it directly affects your team's capacity. Typical symptoms: you feel overwhelmed by customer support, or you struggle to keep up with demand from new customers.
Given that the Workflow team started from a service we already provided, just not in a self-service way (scenario 1), we focused on efficiency for two quarters. It was a difficult decision: as is often the case, there were other aspects we would have to neglect in order to focus on it (see the section below). Still, we needed to free up capacity to evolve our platform later.
We looked at the number of tickets raised in our support channel per month, per customer. Support was such a big time sink for us that we had two people almost entirely dedicated to it every week. In a quarter, we reduced the number of tickets received by 30% by implementing a feature that gave much more autonomy to one of our customer segments. This was a great way to increase the team's capacity and improve the Developer Experience of the platform.
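Just to make the numbers concrete (this is not our actual tooling, and the team names and figures are made up), here is a minimal sketch of how you could track that kind of efficiency metric from an export of your support tickets:

```python
from collections import defaultdict
from datetime import date

# Hypothetical export of support tickets: (customer team, date the ticket was opened).
tickets = [
    ("consultation-team", date(2021, 4, 12)),
    ("triage-team", date(2021, 4, 20)),
    ("consultation-team", date(2021, 5, 3)),
]

def tickets_per_customer_per_month(tickets):
    """Count tickets per (month, customer) so you can follow the trend over time."""
    counts = defaultdict(int)
    for customer, opened_on in tickets:
        counts[(opened_on.strftime("%Y-%m"), customer)] += 1
    return dict(counts)

def percentage_change(before, after):
    """Relative change between two periods; -0.3 means a 30% reduction."""
    return (after - before) / before

print(tickets_per_customer_per_month(tickets))
print(f"{percentage_change(before=120, after=84):.0%}")  # -30% tickets quarter over quarter
```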
Another time, we set a metric like this:
70% of workflows should be built and deployed autonomously (without our team's intervention).
What other metrics should you keep an eye on while you evolve your platform?
Not surprisingly, although we measure this metric inwardly, improving it often brings significant benefits to users as well: greater independence, quicker onboarding, and so on. Sometimes, however, it affects users negatively, so let's look at the other metrics you want to keep an eye on:
- Happiness: is the user satisfied with your service?
- Task Success: how long does it take for your customer to learn how to use your service (onboarding)? How long before your tool solves their needs?
While the two metrics above look like they measure similar aspects, they don't. One tries to understand whether you are fulfilling your promise to your customer: is your product solving their problems in the way you promised? The other is about how good the DX of your product is: how quickly can they set themselves up, how easy is it to learn your API/DSL/UI, and how quickly can they implement their changes?
For example, this is what those two metrics may look like in your team:
Increase customer satisfaction by 15% in our monthly survey (Happiness)
Decrease by 30% the time it takes to create a new service using the platform (Task Success)
At Babylon, for example, we started sending quarterly questionnaires to our customers (internal teams) to measure these two. I was lucky enough to involve a great researcher to create an unbiased survey. Still, even if you can't, some data (qualitative or quantitative) on your service will go a long way towards helping you understand where you are.
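Even without a researcher, a crude calculation over the raw scores is a reasonable start. A minimal sketch, assuming 1-5 satisfaction scores collected in two consecutive survey rounds (the numbers are made up):

```python
from statistics import mean

# Hypothetical 1-5 satisfaction scores from two consecutive survey rounds.
previous_quarter = [3, 4, 2, 3, 4, 3]
current_quarter = [4, 4, 3, 4, 5, 3]

def satisfaction_change(previous, current):
    """Relative change in the average satisfaction score between two survey rounds."""
    return (mean(current) - mean(previous)) / mean(previous)

print(f"Happiness moved by {satisfaction_change(previous_quarter, current_quarter):+.0%}")
```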
Unsurprisingly for us (although we didn't expect it to be that bad), the results of the questionnaire were quite poor. The teams felt that support was terrific, but onboarding wasn't great, and they still depended on us for a feature that was not self-service. We decided to focus on the latter.
Counterbalance your metrics
I firmly believe in trying to change one metric at a time. However, when you change one, there are likely secondary consequences, so when you set your metrics you want to define upfront how much of a hit you are happy to take on the other metrics while improving the one you are focusing on.
Decrease by 50% the time your team spends onboarding a customer (Efficiency), without reducing customer happiness by more than 5% (for example).
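To make the idea concrete, here is a small sketch of how such a counterbalanced objective could be checked; the metric names, numbers and thresholds are only placeholders:

```python
def counterbalanced_goal_met(onboarding_days_before, onboarding_days_after,
                             happiness_before, happiness_after,
                             target_reduction=0.5, max_happiness_drop=0.05):
    """True only if onboarding time dropped by the target AND happiness didn't drop too much."""
    efficiency_gain = (onboarding_days_before - onboarding_days_after) / onboarding_days_before
    happiness_drop = max(0.0, (happiness_before - happiness_after) / happiness_before)
    return efficiency_gain >= target_reduction and happiness_drop <= max_happiness_drop

# E.g. onboarding went from 10 days to 4, while the happiness score slipped from 4.0 to 3.9.
print(counterbalanced_goal_met(10, 4, 4.0, 3.9))  # True: 60% faster, ~2.5% happiness drop
```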
Conclusions
I showed you the HEAT metrics for measuring how good a job your platform team is doing and walked you through how we used them at Babylon. Choosing which metric to focus on in each phase of your product depends heavily on your context; hopefully, the examples I provided will guide you.
Finally, I want to reiterate that Platform products done well can multiply the value that other teams produce: they improve their lead time and increase the impact of their work. However, I have seen multiple times the consequences of a platform developed without the correct product principles in mind, especially for mandated platforms. Not using Product metrics to drive your platform development drastically increases your chances of negatively impacting teams without realising it. I like to call those platforms "diminishers": they will slow your teams down and reduce the impact of their work.