Crashes, unexpected changes in user behavior, and a flood of new user ratings and reviews.
These are a few of the reasons why the work of releasing an update to your mobile app continues well after the update is live. Your mobile team’s job isn’t done until you’ve verified that the new release is stable and performing well for all users, with no negative impact on revenue or other key metrics.
To stop problems in their tracks, teams turn to a range of measures of release health. The most common metrics teams monitor during rollouts involve the app’s stability, like crash-free user and crash-free session rates or out-of-memory crashes, plus measures of performance like network latency and startup times. Most of these are readily available from out-of-the-box stability and performance monitoring tools.
But a holistic view of release health requires more than measuring how your app is doing on a technical level. There are second-order effects to monitor during releases that may not be as obviously tied to the code that shipped. For example, check user behavior for any regressions—think signups, checkouts, and other important conversions—and track certain key performance indicators (KPIs) against each release. Digital analytics platforms like Amplitude can help here. Finally, signals from the app stores, such as user ratings and reviews, can also be early indicators of issues in the latest release.
Clearly, there’s a lot to keep tabs on to run a healthy mobile release, or to rescue an unhealthy one so it still ends up with a good outcome! Given the various tools and stakeholders involved, how can your mobile team set itself up for success and run rollouts confidently, with full visibility? In this post, we’ll look at some best practices.
Roll out each release slowly—but not too slowly
To safeguard app health post-release, teams typically run “phased” (Apple lingo) or “staged” (Google lingo) rollouts where the new version is available only to a select percentage of users at first, and that percentage gradually increases each day after release. This gives the team time to monitor for unforeseen issues and limits the number of users impacted if (when) issues arise. It’s better to catch a problem while it’s affecting only 2% of users, so you can stop the rollout and fix the issue before it affects your entire user base.
At the same time, depending on the size of your active user base, avoid rolling out too slowly. Ensure you have a big enough sample of usage just after release to get a real signal on health. And once you do have a clear picture of health, why not roll out the latest and greatest version to all your users on an accelerated timeline?
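As a rough way to reason about sample size, you can compute the margin of error on an observed crash-free rate with a normal approximation. A minimal sketch; the session counts and rates below are purely illustrative:

```python
import math

def margin_of_error(rate: float, n: int, z: float = 1.96) -> float:
    """95% normal-approximation margin of error for an observed rate."""
    return z * math.sqrt(rate * (1 - rate) / n)

# Illustrative numbers: an early-stage rollout vs. a broader one
observed_crash_free = 0.995
for sessions in (2_000, 50_000):
    moe = margin_of_error(observed_crash_free, sessions)
    print(f"{sessions:>6} sessions: {observed_crash_free:.3%} ± {moe:.3%}")
```

At 2,000 sessions, a 99.5% crash-free rate is only known to within about ±0.3 percentage points, which makes it hard to confidently detect a regression of a few tenths of a point; at 50,000 sessions, the uncertainty shrinks to roughly ±0.06 points.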
Ultimately, the speed at which a release is rolled out is entirely up to your team, albeit with some constraints (and manual work) depending on the platform involved.
Phased rollouts for iOS
When you choose to do a phased rollout on the App Store, Apple incrementally releases your new app version to a random subset of users who have automatic updates enabled (any user can still navigate to your app’s page in the store and install the new version manually). The phased rollout follows a fixed, non-customizable schedule over seven days:
- Day one: 1%
- Day two: 2%
- Day three: 5%
- Day four: 10%
- Day five: 20%
- Day six: 50%
- Day seven: 100%
Although a given day’s percentage cannot be altered, you can pause the rollout (freezing it at whatever percentage it was last on) or accelerate it to 100% of users at any point over the seven days.
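You don’t have to do that pausing or accelerating by hand in App Store Connect: the App Store Connect API exposes the phased release as a resource you can PATCH. A minimal sketch, assuming you’ve already minted an API bearer token and looked up the phased release’s ID (both elided here):

```python
import requests

ASC_API = "https://api.appstoreconnect.apple.com/v1"

def set_phased_release_state(release_id: str, state: str, token: str) -> None:
    """state is one of ACTIVE, PAUSED, or COMPLETE (COMPLETE releases to 100%)."""
    body = {
        "data": {
            "type": "appStoreVersionPhasedReleases",
            "id": release_id,
            "attributes": {"phasedReleaseState": state},
        }
    }
    resp = requests.patch(
        f"{ASC_API}/appStoreVersionPhasedReleases/{release_id}",
        json=body,
        headers={"Authorization": f"Bearer {token}"},
    )
    resp.raise_for_status()

# e.g., freeze the rollout at whatever percentage it's currently on:
# set_phased_release_state(release_id, "PAUSED", token)
```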
Staged rollouts for Android
Android’s approach to staged rollouts is both more flexible and more manual than Apple’s. You can set the percentage of users to any (non-zero) number you’d like, but there’s no concept of a schedule: you have to increase the percentage yourself by going into Play Console and editing a number in a little textbox, for each and every increment of each and every release.
As with Apple, you can both halt and accelerate rollouts, either by simply leaving the rollout percentage where it is or by increasing it to 100%. But manually managing rollouts can be a lot of work. Ideally, start by automating a scheduled rollout on the Google side so that it’s not a daily chore, as sketched below. And for both platforms, you could look to tie automations into your health metrics that halt an unhealthy rollout, or accelerate a healthy one, with no human intervention required.
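The scheduling piece is scriptable via the Google Play Developer API (the androidpublisher service). A sketch using the official Python client, assuming a service account with release permissions; the package name, key file path, and 20% fraction are placeholders:

```python
from googleapiclient.discovery import build
from google.oauth2 import service_account

PACKAGE = "com.example.app"  # placeholder

creds = service_account.Credentials.from_service_account_file(
    "play-service-account.json",
    scopes=["https://www.googleapis.com/auth/androidpublisher"],
)
play = build("androidpublisher", "v3", credentials=creds)

# Every change to Play happens inside an "edit" transaction.
edit_id = play.edits().insert(packageName=PACKAGE, body={}).execute()["id"]

track = play.edits().tracks().get(
    packageName=PACKAGE, editId=edit_id, track="production"
).execute()

# Bump the in-progress staged rollout to 20% of users.
for release in track["releases"]:
    if release.get("status") == "inProgress":
        release["userFraction"] = 0.20

play.edits().tracks().update(
    packageName=PACKAGE, editId=edit_id, track="production", body=track
).execute()
play.edits().commit(packageName=PACKAGE, editId=edit_id).execute()
```

Run from a daily cron or CI job, this gives Android roughly the same hands-off cadence Apple provides out of the box.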
Roll out each new feature slowly
It’s worth noting that the same phased or staged approach can be adopted one level down at the feature level using feature flags. By wrapping specific areas of code in feature flags, you can ship new changes that are initially invisible to your user base, then slowly increase the exposure on a defined schedule. The considerations covered above remain the same: Think about the impact of your sample sizes and monitor app health and relevant KPIs as you roll the feature out.
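Under the hood, percentage-based flag rollouts typically work by hashing a stable user ID into a bucket, so each user gets a consistent answer as exposure ramps up. A minimal sketch of that idea; in practice, feature-flagging tools handle this bucketing for you:

```python
import hashlib

def is_enabled(flag_name: str, user_id: str, rollout_percent: float) -> bool:
    """Deterministically bucket a user. Because buckets are stable, a user who
    is enabled stays enabled as the rollout percentage increases."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000          # 0..9999
    return bucket < rollout_percent * 100           # e.g. 5.0% -> buckets 0..499

enabled = is_enabled("new_checkout_flow", "user-42", rollout_percent=5.0)
print("new checkout" if enabled else "old checkout")
```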
“Shift left” by monitoring app health pre-production
What’s better than limiting the exposure of app health issues to a small subset of real users? Limiting app health issues to a small subset of not-so-real users. As long as you have a meaningful enough sample size of users and sessions to work with, there’s no reason why you can’t start monitoring app health before your release goes live in the stores. Enter: internal distribution and beta testing.
It takes some work to set up and maintain robust dogfooding, alpha, or beta programs, and it requires correctly instrumenting and surfacing health monitoring in pre-production environments. But if you are able to get pre-release versions into enough hands and monitor pre-release health comprehensively, it can be an especially good way to catch issues before they have any significant negative impact. File this strategy under “shift left,” a philosophy that suggests you move all forms of quality assurance earlier (leftwards) in the development and release lifecycle.
Take action on crash rates and performance
The most obvious kinds of problems to keep an eye on have to do with stability and performance. Not only are these often the most straightforward to measure and track, but they also have a very direct impact on users and your business. In an internal study, Google found that 50% of one-star user reviews mention crashes. Research by Wayfair, the popular ecommerce giant, showed that users who experience a crash in their app generate 7% less revenue. Quite simply, if users don’t have a good experience in your app, they’ll stop spending time and money in it. They might even uninstall it.
Monitoring stability and performance all starts with your app’s crash (or crash-free) rate, which can be measured in terms of the number of users experiencing crashes or the number of sessions in which crashes occur. Which one to track depends on the kind of app you ship and its typical usage patterns. For example, apps with many short or important interactions (think gaming or fintech) might be better monitored with a focus on crash-free sessions, whereas crash-free users can be more appropriate for apps with a slower pace or a stickier user base. In practice, mature teams will generally track both metrics and try to keep them above 99.9%.
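Both metrics come from the same raw data but aggregate it differently, which is why they can diverge. A quick sketch with made-up session records:

```python
# Each record: (user_id, session_had_crash)
sessions = [
    ("u1", False), ("u1", True), ("u1", False),
    ("u2", False), ("u2", False),
    ("u3", False),
]

# Share of sessions with no crash
crash_free_sessions = sum(not crashed for _, crashed in sessions) / len(sessions)

# Share of users who never hit a crash in any session
users = {uid for uid, _ in sessions}
crashed_users = {uid for uid, crashed in sessions if crashed}
crash_free_users = (len(users) - len(crashed_users)) / len(users)

print(f"crash-free sessions: {crash_free_sessions:.1%}")  # 83.3%
print(f"crash-free users:    {crash_free_users:.1%}")     # 66.7%
```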
Crashes are not the be-all and end-all of performance issues. The speed of your app also warrants careful monitoring. Startup speed is especially important, and you’ll want to measure all three kinds of app start:
- Cold starts: The app is launched from scratch, with nothing in memory (for example, after a device reboot or after the system killed the app).
- Warm starts: The app is launched when it’s partially in memory.
- Hot starts: The app is loaded while it’s still fully in memory.
With some variation, a cold start should take five seconds at most, a warm start two seconds, and a hot start a second and a half. Cold and warm starts happen in stages on both iOS and Android, and slowdowns can be caused by problems at any individual stage.
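Expressed as simple alert thresholds (the values below mirror the rough guidance above; tune them to your own baselines), that might look like:

```python
# Max acceptable startup times in seconds, per start type.
START_THRESHOLDS = {"cold": 5.0, "warm": 2.0, "hot": 1.5}

def startup_is_slow(start_type: str, duration_s: float) -> bool:
    return duration_s > START_THRESHOLDS[start_type]

assert startup_is_slow("warm", 2.4)      # over budget: worth an alert
assert not startup_is_slow("cold", 3.1)  # within budget
```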
Also important: looking out for slow and frozen interactions. An unresponsive UI and jerky transitions or animations can wreck your user experience. This is typically quantified by looking at frame rates. Both Android and iOS device screens usually render at 60 frames per second (fps), though displays increasingly go up to 120 fps. At 60 fps, a frame has 16.67 milliseconds to render. If it takes longer, it’s a slow frame; if it takes far longer (700 milliseconds, to be exact), it’s typically considered frozen.
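The 16.67 millisecond figure is just the frame budget at 60 fps (1000 ms divided by 60 frames), and the budget halves at 120 fps. A sketch that classifies rendered frames accordingly, using the 700 ms frozen cutoff mentioned above:

```python
def classify_frame(render_ms: float, refresh_hz: int = 60) -> str:
    budget_ms = 1000 / refresh_hz  # 16.67 ms at 60 fps, 8.33 ms at 120 fps
    if render_ms >= 700:
        return "frozen"
    if render_ms > budget_ms:
        return "slow"
    return "ok"

frames = [12.0, 24.5, 16.0, 810.0]
print([classify_frame(f) for f in frames])  # ['ok', 'slow', 'ok', 'frozen']
```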
Apdex is a metric that teams increasingly use to consolidate different app stability and performance measures. It combines the kinds of signals covered above into a single score that captures just how pleasant or frustrating a given user session was.
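The canonical Apdex formula is defined over response times against a target threshold T: samples within T count as satisfied, samples up to 4T count as tolerating at half weight, and anything slower counts as frustrated. For example, scoring screen-load times against a hypothetical 500 ms target:

```python
def apdex(samples_ms: list[float], target_ms: float) -> float:
    satisfied = sum(s <= target_ms for s in samples_ms)
    tolerating = sum(target_ms < s <= 4 * target_ms for s in samples_ms)
    return (satisfied + tolerating / 2) / len(samples_ms)

# Two satisfied, one tolerating, one frustrated sample:
print(apdex([320, 480, 900, 2500], target_ms=500))  # (2 + 0.5) / 4 = 0.625
```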
Compare user behavior across releases and establish KPIs
Many teams start and end their release monitoring with a focus on stability and performance, but not every issue can be discovered with these kinds of metrics. Problems do not always manifest as directly as crashes and slowdowns. New bugs and design changes can significantly impact how users make their way around your app and whether or not they take the actions you want or expect them to take.
To get a complete look at app health, it’s important to measure how user behavior has changed—or not changed—compared to previous releases. And to make things actionable, you’ll want to establish certain expected values or KPIs that you can reference to identify and understand the severity of different customer pain points.
You might consider a range of different measures of user behavior, and where you focus will depend a lot on what your app does and how it’s used. For ecommerce, you may care about the number of checkouts, dollar value spent, or the number of items added to a user’s cart. A streaming app might track the number of plays or completed plays per session. Apps of all kinds may look at sign-ups, sign-ins, subscriptions, screen views, “likes,” etc.
For each metric, consider the appropriate “dimension” or unit to measure. For some, it might be an amount per daily active user (DAU) or per session. For others, you’ll look at an average value (e.g., average cart dollar value) or perhaps some percentile value (e.g., P90 or P99). And for all of these, you might actually want to keep a closer eye on changes from one version to the next (a “delta” value) instead of just the absolute amount.
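To make those dimensions concrete, here’s a sketch computing a per-DAU rate, a P90, and a version-over-version delta from made-up numbers:

```python
from statistics import quantiles

# Made-up per-version metrics
checkouts = {"v2.3.0": 12_400, "v2.4.0": 11_050}
dau = {"v2.3.0": 98_000, "v2.4.0": 97_500}
cart_values = [12.50, 30.00, 18.75, 99.99, 42.00, 250.00]

per_dau = {v: checkouts[v] / dau[v] for v in checkouts}
# "inclusive" keeps the percentile within the observed range
p90_cart = quantiles(cart_values, n=10, method="inclusive")[-1]
delta = (per_dau["v2.4.0"] - per_dau["v2.3.0"]) / per_dau["v2.3.0"]

print(f"checkouts/DAU:  {per_dau}")
print(f"P90 cart value: ${p90_cart:.2f}")
print(f"release delta:  {delta:+.1%}")  # a drop here is worth investigating
```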
Because there are so many different ways to measure user behavior, and because the right user behavior to measure depends so much on your product, it can be difficult to monitor these kinds of metrics consistently and holistically as a team. This is especially true for bigger apps and bigger teams, where you’ll likely have separate “feature teams” caring about different parts of the product and, thus, different metrics at any given time. Try to avoid the fragmentation and confusion this can cause by standardizing tooling and tracking of user behavior as much as possible.
Just as you might define service level objectives (SLOs) around stability and performance metrics you monitor, look to do the same with KPIs around user behavior. These will enable your team to effectively monitor release health and know when issues deserve escalation or need an immediate fix. In addition to granular KPIs tied to the product that will evolve as the product does, you could also identify and monitor against higher-level KPIs tied to larger business goals.
One more caveat about monitoring user behavior effectively: When it’s time to investigate a potential issue, you won’t have a clear stack trace or Git history at your disposal, as you might when digging into a crash. Instead, you’ll need to turn to other diagnostics, many of which might take a bit more of a nuanced approach (think session replays, user journeys, and event logging).
User ratings can be your best—or only—early warning sign
No matter how closely you monitor stability, performance, and user behavior, there will inevitably be gaps where everything looks OK but some users are actually experiencing issues. In these situations, monitoring user ratings and reviews for each new release can be your best warning sign. It’s obviously indicative of a larger problem if your app’s average rating is a stellar 4.8 but the new ratings coming in post-release are 1 and 2 stars. Neither the App Store nor the Play Store shows version-specific ratings, so this is something your team will need to keep tabs on yourselves.
The nice thing about user feedback is that there’s at least some hope the bad ratings will be accompanied by clarifying reviews that help spell out exactly what the problem is (sprinkled in with the usual messages like “lol, I hate this app now!” of course). It’s worth surfacing ratings and reviews to your wider team so different folks can help triage and interpret. For example, you could pipe your app store reviews into a Slack channel with product managers and customer support looped in, and start threads off problematic reviews. An added bonus of user ratings and reviews as a monitoring channel is that you have the opportunity to respond, so you can ask clarifying questions and potentially mitigate a bad experience.
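The Slack side of that pipeline is just an incoming webhook. A minimal sketch, assuming you already fetch new reviews (for example, via the App Store Connect API’s customer reviews endpoint or the Play Developer API) and have a webhook URL configured; the URL and review below are placeholders:

```python
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # your incoming webhook

def post_review_to_slack(rating: int, version: str, text: str) -> None:
    stars = "★" * rating + "☆" * (5 - rating)
    requests.post(SLACK_WEBHOOK, json={
        "text": f"{stars} on {version}\n> {text}",
    }).raise_for_status()

post_review_to_slack(1, "2.4.0", "App crashes every time I open my cart")
```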
How can your team keep up with all these moving pieces?
It’s easy to recommend best practices around capturing a slew of different app health measures to monitor and take action on during releases. But actually doing so is a little more complicated. Your team needs to stay on top of data from multiple tools and across multiple teams, make sense of it all, and continuously pick up on signals that something might be wrong. Then, of course, you need to investigate and determine the severity of any given signal, potentially take the time to manually halt the rollout, dig into the root cause, and ship a fix.
How can you and your team handle all of this, ensuring you can catch and address critical issues without derailing and delaying other important work?
Bring all your health metrics into one place
No single tool can capture stability and performance, user behavior, and app store ratings and reviews, meaning you’ll waste time and make mistakes as you constantly context-switch across different dashboards. Instead, find a way to get all this data into one place. This could mean setting up all your tools to send alerts to the same place, or using a platform like Runway to centralize alerting and surface all health metrics on a single dashboard. Larger teams, especially those with separate product groups or feature teams, will inevitably still have different dashboards in different places tracking different metrics. But even then, it’s worth consolidating at least some of these pieces into a holistic view. There will be times when the team needs to rally around a unified picture of health, and ideally that doesn’t have to play out as frantic back-and-forth questions and noise in Slack.
Create a contract around expectations for app health and tailor it to your product
Taking a holistic view of app health is hard, but actually making sense of diverse metrics and knowing when and how to act on them can be even harder. Add clear guidelines that define what “healthy” and “unhealthy” look like for your app, so you avoid the kind of negotiating and back-and-forth that comes with triaging issues. Agreed-upon rules that dictate when you alert, escalate, and fix will keep issues from slipping through the cracks. Document the thresholds and expectations that define health for your app, perhaps starting with industry benchmarks and then refining them with your own historical monitoring data. Continually adjust this contract to ensure you’re keeping up with changes to the product and your users’ evolving expectations.
Use automations to take quick(er) action when problems do—or don’t—arise
No matter how good a job you’re doing monitoring and acting on app health, you can’t be everywhere at once. Signals get missed and balls get dropped—sometimes with a significant negative impact on user experience or revenue. Look to automation to safeguard against inevitable gaps. You could, for example, add automations tied to your team’s defined health metrics, SLOs, and KPIs that will halt a rollout if metrics become unhealthy. In the other direction, you could also use automation to get healthy releases out to all users faster—a manual task that not only takes time during every release but also gets forgotten occasionally.
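A sketch of what that decision logic might look like; the thresholds and metric names here are hypothetical stand-ins for your team’s own SLOs and KPIs, and the returned action would feed whichever rollout API you use:

```python
def rollout_action(metrics: dict) -> str:
    """Decide what to do with an in-flight rollout based on current health."""
    if metrics["crash_free_sessions"] < 0.999 or metrics["checkout_delta"] < -0.05:
        return "halt"        # e.g., pause the phased release / freeze userFraction
    if metrics["sessions_observed"] >= 50_000:
        return "accelerate"  # healthy with a solid sample: release to 100%
    return "hold"            # keep the current schedule and keep watching

print(rollout_action({
    "crash_free_sessions": 0.9995,
    "checkout_delta": -0.01,
    "sessions_observed": 62_000,
}))  # accelerate
```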
Get everyone on your mobile team involved
Anyone who plays a part in designing, building, and supporting your app should understand the value of your health metrics and the importance of monitoring and taking action when something is out of bounds. Engaging the entire team and creating a shared sense of responsibility for the outcome of everyone’s work can help establish a strong culture around quality. Plus, since solving the trickiest of issues often requires cross-team input, it makes everything easier to have more of those folks aware of and involved in the process from the get-go. One of the most direct ways to put this into practice is to set up a rotation whereby different team members take turns running rollouts and monitoring app health.
Boost the quality of your app—and user happiness
While all of this may seem like a lot of work, taking a holistic approach to monitoring release health can greatly impact the quality of your app and the happiness of your users. And with thoughtful coordination and tooling choices, you can minimize extra overhead and keep everything streamlined and actionable. The end result? A mobile team that can rally around quality and act on issues stress-free, and users who will reward you by spending more time in your app and less time looking around for alternatives.
Runway is a mobile release management platform that integrates with your existing tools, including crash reporting, digital analytics, performance monitoring, and the app stores, to give your team a unified, holistic source of truth for app health. You can define thresholds and monitor releases in one place, send alerts and automate rollouts based on schedules and health metrics, and much more. Check it out today.