Tidy Cloud AWS - working backwards

Author

Erik Lundevall-Zara

Published

June 6, 2023

Hello all!

Welcome to the next issue of the Tidy Cloud AWS newsletter! In this issue, I am coming back to a topic I have touched on before and think is worth bringing up again: working backwards and asking why.

I will experiment with the content of the email newsletter, the Cloudgnosis.org website, and sites like The OPS community. The general idea is to adapt the content more to where it is published. The schedules will also differ, with the email newsletter being the most frequent.

Enjoy!


How working backward is the best way forward

Have you had a situation where you chose a specific solution to a set of problems because it seemed to be the right thing to use, only to discover later that it did not work well at all, was too expensive, or that no one used it? I have been there multiple times. It’s easy to jump to a solution without doing the research first.

An example: At a former employer, we used SumoLogic for a lot of log analytics and monitoring of AWS workloads. It worked very well, and it was quite useful for the operations group. Then some people from that group joined a startup company and had to select the tooling for operating their AWS workloads. They picked the same tool, and there it did not work out at all. It was barely used, and its value was questionable.

A key reason for this was that the tool was picked before properly understanding what was needed and how that differed from the usage at the old company:

  • Different and smaller organization, different responsibilities and roles
  • Workloads were not the same, use cases were different, although similar at a glance
  • Priorities were different

In short, the differences meant that applying the same solution pattern did not provide enough value to the organization. Not that SumoLogic is a poor tool - on the contrary, I think it is a very capable tool and can provide significant value.

Amazon and AWS pride themselves on customer-focused product development, beginning with a press release:

Working backward at Amazon

You don’t need a press release like Amazon, but the same idea holds. A bit of research, working backward, would have helped in making a better decision about the monitoring solution:

  1. Identify stakeholders
    • Which roles need information about the state of the solution(s), and what are their responsibilities?
    • What information do they require, in what form, and when?
  2. Identify key performance indicators
    • What are the business values that we want to ensure are upheld for our customers?
    • How is that measured?
    • What is the definition of a healthy state for those values?
    • At what point is that healthy state at risk?
    • What actions should be performed when the healthy state is at risk?
  3. Map KPIs and solution resources
    • Identify resources/components that can provide data
    • Identify types of insights that can be gained (behaviour, faults, performance)
    • Identify the type of information to collect (logs, metrics)
    • Identify source to collect from (e.g. log files, system performance data)
    • Figure out thresholds and/or patterns in resources to use for alarms
  4. Reports, alerts, actions
    • What to report to whom
    • How to deliver report data
    • Which formats to use
    • Determine severity
    • Actions (automated, manual)

The above is an iterative process, both to get into enough detail to answer the questions and to keep up with changes in requirements.
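To make the steps in the list a bit more concrete, here is a minimal Python sketch of how one KPI could be written down once the questions have been answered. Everything in it is a made-up illustration and not something from an actual setup: the `Kpi` fields, the `evaluate` and `report` functions, the "checkout latency" example, and the threshold numbers are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    HEALTHY = "healthy"
    AT_RISK = "at risk"
    UNHEALTHY = "unhealthy"


@dataclass(frozen=True)
class Kpi:
    """One KPI, worked backward from a stakeholder (hypothetical structure)."""
    name: str               # the business value we want to uphold
    stakeholder: str        # the role that needs the information
    source: str             # where the data is collected from
    at_risk_above: float    # above this, the healthy state is at risk
    unhealthy_above: float  # above this, the state is no longer healthy
    action: str             # what to do when the state is at risk or worse


def evaluate(kpi: Kpi, value: float) -> State:
    """Classify a measured value against the KPI thresholds."""
    if value > kpi.unhealthy_above:
        return State.UNHEALTHY
    if value > kpi.at_risk_above:
        return State.AT_RISK
    return State.HEALTHY


def report(kpi: Kpi, value: float) -> str:
    """Build a one-line report that says who to notify and what to do."""
    state = evaluate(kpi, value)
    if state is State.HEALTHY:
        return f"{kpi.name}: healthy"
    return (
        f"{kpi.name}: {state.value}, "
        f"notify {kpi.stakeholder}, action: {kpi.action}"
    )


# Assumed example values, purely for illustration.
checkout_latency = Kpi(
    name="checkout latency p95 (ms)",
    stakeholder="on-call engineer",
    source="load balancer metrics",
    at_risk_above=800.0,
    unhealthy_above=1500.0,
    action="check recent deployments",
)

if __name__ == "__main__":
    for measured in (500.0, 900.0, 2000.0):
        print(report(checkout_latency, measured))
```

The point of the sketch is the order of the fields: the stakeholder and the business value come first, and the data source and thresholds follow from them. A tool is chosen only after a handful of these have been filled in.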

The general pattern here applies to many areas, not just monitoring solutions. This also applies to automation. When should you automate a process or activity?

The answer is “it depends”, or rather it is not the right question to ask first. If you ask a question like that, you may get answers such as “When you have repeated it 3 times” or “Always”. But there is no context in such answers.

A better question to ask is why should this process or activity be automated? At the surface level, answers that may come up could be:

  • To save time for repeated tasks
  • To be (more) consistent
  • To avoid human error
  • To distill expert knowledge into something that others may use
  • To have some documentation of the steps of the process or activity

But these are still vague. They may all be valid to some extent, but each of them should still trace back to a defined business value or aim.

Start with your customer and work backward. It requires discipline though, and practice.

Even Amazon Web Services, where they presumably practice this every single day as part of their corporate culture, can still provide customer experiences that are quite crappy. I think there are other aspects at play as well, though, and it might not always be clear who the customer is from AWS's point of view.

What do you think about working backward from the customer?


You can find the contents of this bulletin and older ones, and more at Cloudgnosis.org. You will also find other useful articles around AWS automation and infrastructure-as-software.

Until next time,

/Erik
