The cloud is not on-premises
Engineers who design architecture and provision new resources in the cloud indirectly make purchasing decisions, sometimes several times a day. Each decision affects the total on the bill the public cloud provider issues at the end of the month. For organizations with a long history, this is a whole new reality: finance departments accustomed to long-term planning of IT hardware and license purchases struggle to find their footing, and those who control IT expenses face a challenge that simply did not exist before. In the on-premises model, it is impossible to exceed the budget once an order has been completed; regardless of how the servers were used, their operating cost was known upfront and, above all, fixed.
The flexibility of the cloud has changed the rules of the game. Purchases are made in seconds, and costs vary in real time with current usage.
Changing patterns follow changes in technology
The development of cloud-based solutions has made traditional capacity planning obsolete, and technological change has prompted new practices at both the engineering and management levels.
It would be a cliché to say that any IT problem can be solved in multiple ways; architecture describes these solutions. Any experienced architect will admit that architecture is about compromises and trade-offs. However, as with security, it is hard to accept far-reaching concessions when it comes to cost.
To the traditional set of non-functional requirements, such as availability, performance, and the security mentioned above (there are many more), we should add the operational cost of the solution. IT systems should not be over-provisioned and thus overly expensive. Ideally, a solution's costs should grow more slowly than the number of transactions it supports; that is the mark of a well-chosen architecture, one that turns economies of scale, one of the basic principles of the public cloud, to our advantage.
How to deal with high cloud costs?
Often at the beginning of the journey, and especially at the stage when planned cloud budgets are exceeded, the finance department is tempted to take charge of technology selection and control who can create what and when. To some extent, this approach makes sense:
For example, suppose we do not run millions of simulations while working on a new drug. In that case, our organization will not need costly GPU-equipped virtual machines for these calculations, so access to this category of servers can be blocked "top-down," preventing their accidental launch.
Unfortunately, such far-reaching, strict control has many adverse effects that management must consider. First of all, it blocks most of the benefits of the public cloud by:
- limiting access to new services and updates (which appear at least every quarter and often make the current “good” architecture obsolete).
- preventing the rapid construction of PoC.
- discouraging experiments, thus depriving IT staff of the chance to grow by learning new solutions hands-on.
Secondly, blocking IT is a source of conflict and builds an "us versus them" culture. It puts programmers and engineers in the position of people the organization does not trust and therefore supervises with financial police.
The final pragmatic reason is that even initially approved and tested services can be misconfigured, duplicated, or abandoned (someone used them once, then stopped, but forgot to delete them – we professionally call such cases waste), thus generating unnecessary costs.
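Such waste can often be caught mechanically. Below is a minimal sketch of the idea in Python, operating on a hypothetical resource inventory; the records, states, and the 30-day idle threshold are all illustrative assumptions, not output of any real AWS API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Resource:
    resource_id: str
    state: str           # e.g. "available" (a detached EBS volume) or "in-use"
    last_used: datetime  # last observed activity


def find_waste(inventory, idle_days=30, now=None):
    """Flag resources that look abandoned: detached, or idle for too long."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=idle_days)
    return [r for r in inventory if r.state == "available" or r.last_used < cutoff]


now = datetime(2023, 6, 1)
inventory = [
    Resource("vol-1", "in-use", datetime(2023, 5, 30)),     # active, keep
    Resource("vol-2", "available", datetime(2023, 5, 30)),  # detached: waste
    Resource("vol-3", "in-use", datetime(2023, 1, 2)),      # idle ~5 months: waste
]
flagged = find_waste(inventory, now=now)
print([r.resource_id for r in flagged])  # ['vol-2', 'vol-3']
```

In practice, the inventory would come from the provider's APIs and monitoring data; the point is that "forgotten" resources are detectable by simple rules, not only by manual review.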
In my opinion, the above arguments are sufficient to discourage abuse of power and not introduce any far-reaching gatekeeping.
So, what can be done to control the rising cloud costs?
Before proceeding, you should realize that some of these unplanned costs may be valid and appropriate because they benefit the organization. The challenge for the finance department is to understand which ones! You must learn to separate the wheat from the chaff and carry out cloud cost optimization skillfully.
As mentioned above, a state-of-the-art system automatically provisions or destroys cloud resources to match its usage patterns; we say that it is elastic. Therefore, if you planned to serve an average of 10,000 customers per month and estimated a budget for that volume, you need to account for certain events. For example, a recent marketing campaign may increase demand for your company's products or services; the resulting jump in the number of customers scales the system up and increases your AWS costs.
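The elastic behavior described above boils down to a simple rule: capacity follows demand, with a safety floor. A minimal sketch, with purely illustrative capacity figures:

```python
import math


def desired_instances(requests_per_min: int,
                      capacity_per_instance: int = 1000,
                      minimum: int = 2) -> int:
    """Size the fleet to current demand, never dropping below a safety floor.

    The per-instance capacity and the floor are illustrative assumptions.
    """
    return max(minimum, math.ceil(requests_per_min / capacity_per_instance))


print(desired_instances(500))    # quiet period: stays at the floor of 2
print(desired_instances(12500))  # campaign spike: scales out to 13
```

The cost consequence is direct: the fleet (and the bill) grows with the spike and shrinks back afterwards, which is exactly why monthly cloud costs no longer match a fixed plan.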
Of course, this is a positive example (and let's hope it is the only kind you encounter), showing the close correlation between business and technology. Higher AWS cloud costs are not always a problem! Other examples, such as changes made ad hoc to save time or increase safety, are more difficult to convert into money because they are hard to evaluate.
While such unplanned and uncontrolled decisions can be justified after the fact, especially as they demonstrate the expertise and commitment of technical staff, we cannot ignore that they are the exception rather than the rule and are always associated with unnecessary costs.
Various market studies estimate that, on average, 20% of cloud costs are unnecessary; some estimates go as high as 35% (which I am inclined to believe based on my experience!). Even for small and medium-sized companies, this means substantial sums, reaching tens or hundreds of thousands, and often millions, of dollars annually.
Cloud costs should be optimized first and foremost in this area – by looking for savings and eliminating losses!
How? See point 3 in the section below.
New challenges for “finance” in the cloud
To the best of my knowledge, the responsibility of the CFO or CIO in the context of cloud finance is the broadly understood construction of the FinOps culture, which mainly consists of the following issues:
1. Cost allocation – a method of assigning cloud costs to specific systems (projects) or organizational units. A non-trivial task aimed at creating clear criteria to ensure the transparency of cost measurements. Sometimes it is challenging to separate costs between projects; clear rules will help IT teams understand how they are internally evaluated.
One of the first key tasks in this area will be to define and implement a resource tagging strategy.
At a higher level of maturity, you will have to choose one of the billing models: chargeback or showback. This will allow transferring more responsibility for the costs of cloud solutions to the edge of the organization – to the teams responsible for them.
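To make cost allocation concrete, here is a minimal showback sketch in Python. The tag key, line items, and amounts are illustrative assumptions, but the pattern is the core of the approach: group spend by an allocation tag and surface the untagged remainder that finance still has to assign:

```python
from collections import defaultdict


def showback(cost_lines, tag_key="team"):
    """Group cost line items by an allocation tag.

    Untagged spend lands in a shared "UNTAGGED" bucket, which also
    measures how well the tagging strategy is being followed.
    """
    totals = defaultdict(float)
    for line in cost_lines:
        owner = line.get("tags", {}).get(tag_key, "UNTAGGED")
        totals[owner] += line["cost_usd"]
    return dict(totals)


bill = [
    {"service": "EC2", "cost_usd": 1200.0, "tags": {"team": "payments"}},
    {"service": "S3", "cost_usd": 300.0, "tags": {"team": "analytics"}},
    {"service": "EBS", "cost_usd": 150.0, "tags": {}},  # tagging gap
]
print(showback(bill))
# {'payments': 1200.0, 'analytics': 300.0, 'UNTAGGED': 150.0}
```

In a chargeback model, the same totals would be billed back to the teams' budgets rather than merely reported, which is what transfers real cost responsibility to the edge of the organization.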
2. Leading activities aimed at developing internal practices and standards that facilitate the management and optimization of cloud costs already at the design stage. These activities must be carried out together with the technical departments and engineers. Attempts to impose rules taken from books and internet best-practice lists rarely succeed because of technological and cultural limitations.
The shameful truth is that technical people hate criticism of the solutions they have created and take it especially badly from non-technical people. Hence, I recommend building cooperation mechanisms based on partnership. A recipe for success: show IT departments the common goal of low operating costs. Programmers and engineers love challenges, so presenting low cost (while maintaining the other non-functional requirements) as an engineering achievement will be a hundred times more effective than any commandments and orders originating in a "foreign" finance department.
More tangible motivators are also worth considering; perhaps some of the money a project saves can "return" to its budget, allowing it to hire another team member to share the load?
3. Initiating activities to optimize the cloud's current and future costs. This is the only step that can bring an immediate reduction in running costs, especially when done for the first time, as it usually uncovers a range of low-hanging fruit. Our offer includes precisely such a service, and I strongly encourage you to use it.
But beware, this is not a silver bullet for all cost problems!
It is perfectly reasonable to start with this step even when other aspects are still at the planning stage. However, such an initiative should be led by the decision-maker responsible for the organization's FinOps culture-building strategy, so that the activity also focuses on drawing conclusions, changing ways of working, and improving how architecture is designed. Only such an approach can produce savings in the future (cost avoidance). Otherwise, the optimization will temporarily reduce the bill but will not develop the technical staff's skills, the same errors will recur over and over, and high bills will return.
Cloud cost optimization in practice
Three conventional categories
Cloud cost optimization starts with analyzing your current and historical bills and checking which AWS services your organization uses and how (or a specific system in your AWS account, depending on the scale of the optimization).
Based on this information and insight into the architecture of the implemented solutions, the architect-economist prepares a list of recommendations along with the estimated savings generated once implemented. The recommendations are highly technical.
From my experience, these recommendations can be divided into three groups in terms of the time and resources needed to implement them:
- Fast and simple – in most cases, they concern misconfigured or completely redundant resources but do not affect the solution architecture. The turnaround time is days, and sometimes just hours! Very often, you can see the effect of lower costs the next day.
- Medium – here, we face problems that require more planning. Additional analysis of the potential impact on other components or services of the organization is often required. This is especially important for irreversible operations, e.g., terminating resources that appear redundant but whose status is initially uncertain. For this reason, we can expect cost reductions within days or weeks.
- Complex – this category always requires significant changes to the architecture of a solution that are never quick or easy to implement. Hence, a work period of several weeks must be considered before the estimated savings are achieved.
In practice, this three-level categorization is very convenient when planning a repair and determining the sequence of actions.
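A simple way to determine that sequence is to order the recommendations by effort level first and estimated savings second, so the cheapest wins land earliest. A sketch with made-up recommendations and numbers:

```python
recommendations = [
    # (name, effort category, estimated monthly savings in USD) - illustrative
    ("delete unattached EBS volumes", "fast", 900),
    ("terminate duplicate test cluster", "medium", 2500),
    ("redesign network to drop Elastic IPs", "complex", 1700),
]

effort_rank = {"fast": 0, "medium": 1, "complex": 2}

# Lowest effort first; within the same effort level, biggest savings first.
plan = sorted(recommendations, key=lambda r: (effort_rank[r[1]], -r[2]))
print([name for name, _, _ in plan])
```

The ordering captures the pragmatic point of the categorization: quick wins fund and justify the longer-running work, even when a complex item promises larger absolute savings.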
It is worth mentioning here that our intuition leads us astray, making us believe that only complex optimizations will have a significant effect on the bill. Nothing could be further from the truth! It all depends on the specific project, the AWS services used, their configuration, and the usage volume.
Our portfolio includes examples of Fast and simple optimizations that resulted in greater savings than the more time-consuming ones.
Examples of success stories for cloud cost optimization
At one of our clients, we conducted a cost analysis and implemented recommendations for optimizing disks in the Amazon EBS (Elastic Block Store) service. This Fast and simple cost optimization brought immediate results: savings of 20%. On an annual basis, the customer saved about $240,000.
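The story above does not detail which EBS change was made, but one common optimization with roughly this effect is migrating gp2 volumes to gp3, whose baseline price per gigabyte is about 20% lower. A back-of-the-envelope sketch; the per-GB list prices (us-east-1) are assumptions to be verified against current AWS pricing:

```python
GP2_USD_PER_GB_MONTH = 0.10  # assumed gp2 list price, us-east-1
GP3_USD_PER_GB_MONTH = 0.08  # assumed gp3 baseline; extra IOPS/throughput cost more


def annual_migration_savings(total_gb: float) -> float:
    """Yearly savings from moving gp2 storage to gp3 at baseline performance."""
    monthly = total_gb * (GP2_USD_PER_GB_MONTH - GP3_USD_PER_GB_MONTH)
    return monthly * 12


# A fleet storing 1 PB on gp2 would save roughly $240,000 per year:
print(round(annual_migration_savings(1_000_000)))  # 240000
```

The change is a configuration-level volume modification rather than an architectural one, which is exactly why this class of optimization lands in the Fast and simple category.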
In another case from the Medium category, we identified waste (unnecessary resources) generating 68% of the cost of the entire service. On an annual basis, this translated into $4,730 in savings.
The activity that required the most effort and time was modifying a solution's architecture to reduce the costs associated with Elastic IP. This was obviously a complicated change to make. Before the remedial action began, the customer was paying close to $30,000 per month for IP addresses assigned to thousands of EC2 virtual machines. After more than a year of work, the cost was reduced by 67%, to less than $10,000 per month. Over that year, this saved more than $123,000.
Remarkably, the conceptual work required significant involvement of architects and engineers only at the beginning, in the analysis and design of the new network architecture. Most of the practical work was done by less specialized employees, who performed the necessary steps on each EC2 server, saving the time of the most valuable specialists.
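To see how Elastic IP charges reach that order of magnitude, multiply the per-hour price of a public IPv4 address by the fleet size. The hourly rate below is an assumed AWS list price (verify against current pricing), and the address count is purely illustrative:

```python
EIP_USD_PER_HOUR = 0.005  # assumed list price per public IPv4 address
HOURS_PER_MONTH = 730     # average hours in a month


def monthly_eip_cost(address_count: int) -> float:
    """Monthly charge for a fleet of always-allocated public IP addresses."""
    return address_count * EIP_USD_PER_HOUR * HOURS_PER_MONTH


# Thousands of addresses add up fast:
print(monthly_eip_cost(8000))  # 29200.0, i.e. close to $30,000 per month
```

A fraction of a cent per hour looks negligible per address, which is precisely why fleet-wide charges like this slip past budget reviews until someone does the multiplication.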
As you can see, real-world examples show that the savings achieved through cost optimization do not depend on the difficulty or duration of the implementation. Sometimes the most straightforward changes yield the most significant savings, which doesn't mean the more challenging ones aren't worth pursuing.
Of course, total savings (the dollar amounts) depend on the size of the bill, but even the $4,730 saved on a single AWS service represented about 16% of that customer's total AWS costs and made a real difference.
I know you are well aware that establishing a FinOps culture is an organizational change that can take years.
In situations where quick action is required, such as those caused by budget overruns, I recommend conducting a quick cost optimization to get reliable information on potential savings and technical guidance to achieve them.
In the simplest form, such analyses take several days. Obviously, it depends on the scope of AWS accounts, resources, and services used.
Fast and simple cloud cost optimizations will give you tangible and immediate savings. They will also be your success story that will pave your organization’s path to FinOps.