Some time ago I was involved in a project to deliver a highly available (HA) Windchill cluster. Nothing new in itself: the cluster does the job, and I could basically end the topic here, but…
We host the entire solution (PROD, QA, TEST and DEV) in AWS, and it would be nice to have it automated well enough that we don’t waste time thinking about it or writing hundreds of pages of documentation on how to implement changes or fix an environment.
One could write a separate article about Windchill and its resistance to automation, but someone else will have to do it; I will focus on the automation that keeps systems alive.
A few assumptions for a good start:
- We have everything defined in CloudFormation templates (Infrastructure as Code, IaC)
- We use one template to create resources on all stages (PROD, QA, etc.)
- We have a ready-made AMI which, thanks to its start scripts, can adapt to every situation.
The design requirements also include two that are essential for shaping the solution:
- Backup with copy to a backup location (other AWS region)
- Restoring the QA environment from the current production state
Given these requirements and assumptions, all of this had to happen without our involvement, while remaining flexible enough to allow changes (e.g. restoring QA from a specified AMI and RDS snapshot).
As per the assumptions, everything that happens to the resources is done through CloudFormation templates, which we also keep in a repository so that we know who did what and when.
To put it simply, let’s take Windchill itself, for which we need two CloudFormation templates:
- Template with Auto Scaling Group definition and its LaunchTemplate
- Template with database definition (RDS)
In both templates we need to provide several parameters that differ for each stage and, at the same time, are easy to modify. We could use JSON files with parameters, but updating them automatically in the repository doesn’t seem like a very pleasant task.
This is where SSM parameters come in handy.
What exactly are SSM parameters?
Systems Manager Parameter Store is, in simplified terms, a service that stores values under a specific key, with optional encryption and granular access control over a specific parameter and its versions.
A parameter value can be changed by simply calling the API using, for example, the AWS CLI or any of the available SDKs (like boto3).
Moreover, CloudFormation and Systems Manager itself have built-in integration with SSM parameters: their values can be read when a CloudFormation stack is created or updated, or when SSM documents are invoked.
As a result, we obtain a simple-to-use place where we can hold CloudFormation template parameter values.
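As a rough illustration of how little code such an API call needs, here is a minimal Python sketch. It assumes a boto3 SSM client is passed in (for example `boto3.client("ssm")`); the function names and the parameter name `/dev/AmiId` are made up for this example.

```python
def set_parameter(ssm, name, value):
    """Create or overwrite a plain String parameter and return its new version."""
    response = ssm.put_parameter(
        Name=name,
        Value=value,
        Type="String",
        Overwrite=True,  # without this, writing to an existing parameter fails
    )
    return response["Version"]


def get_parameter(ssm, name):
    """Return the current value stored under the given parameter name."""
    response = ssm.get_parameter(Name=name)
    return response["Parameter"]["Value"]


# Possible usage (hypothetical parameter name):
#   import boto3
#   ssm = boto3.client("ssm")
#   set_parameter(ssm, "/dev/AmiId", "ami-0123456789abcdef0")
#   print(get_parameter(ssm, "/dev/AmiId"))
```

Passing the client in, rather than creating it inside the functions, keeps the sketch easy to test and to reuse across regions.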
What does this enable?
Automation with CloudFormation and SSM
Let’s go back to Windchill (although it could be any other application). We have two CloudFormation templates that need several parameters:
Template with the Auto Scaling Group and Launch Template:
- AMI ID – to know which AMI to use in the Launch Template
- Desired Count – to know how many machines are to be started
Template with database:
- RDS Snapshot Id – to know which snapshot to restore the database from
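To make the parameters listed above more concrete, here is a sketch of how they could be declared, written as Python dictionaries that mirror the "Parameters" section of a template. The parameter types are CloudFormation's SSM parameter types; the keys (`AmiId`, `DesiredCount`, `RdsSnapshotId`) and the `/<stage>/<key>` naming scheme are assumptions made for this example, not the templates used in the project.

```python
# "Parameters" section of the Auto Scaling Group / Launch Template template.
# With these types, the stack parameter holds the NAME of an SSM parameter
# and CloudFormation resolves its value when the stack is created or updated.
ASG_TEMPLATE_PARAMETERS = {
    "AmiId": {"Type": "AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>"},
    "DesiredCount": {"Type": "AWS::SSM::Parameter::Value<String>"},
}

# "Parameters" section of the database (RDS) template.
DB_TEMPLATE_PARAMETERS = {
    "RdsSnapshotId": {"Type": "AWS::SSM::Parameter::Value<String>"},
}


def stack_parameters(stage, template_parameters):
    """Build the per-stage parameter list to pass when creating a stack.

    Assumes the hypothetical naming scheme /<stage>/<ParameterKey>.
    """
    return [
        {"ParameterKey": key, "ParameterValue": f"/{stage}/{key}"}
        for key in template_parameters
    ]
```

The same template can then serve every stage: only the SSM parameter names handed to the stack differ between DEV, QA and PROD.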
The diagram below illustrates all this.
At first glance this doesn’t differ much from hard-coding these values into the template, and yet it does.
Parameters can be modified via the API, so we can easily (e.g. with a Lambda function) change the value of an individual parameter. Moreover, parameters can be grouped by path (like directories). In the diagram we have the path "/dev/". As a result, we can create parameter sets for each stage, "/dev/", "/qa/" and "/prod/", and keep there the values appropriate for a given stage.
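Reading a whole parameter set for one stage can be sketched as below. This assumes a boto3 SSM client is passed in and that parameters live directly under a path named after the stage; the function name is made up for this example.

```python
def parameters_for_stage(ssm, stage):
    """Return {name: value} for every parameter stored under /<stage>."""
    values = {}
    kwargs = {"Path": f"/{stage}", "Recursive": True}
    while True:
        page = ssm.get_parameters_by_path(**kwargs)
        for parameter in page["Parameters"]:
            values[parameter["Name"]] = parameter["Value"]
        # Results are paged; keep asking until no continuation token is returned.
        token = page.get("NextToken")
        if not token:
            return values
        kwargs["NextToken"] = token


# Possible usage:
#   import boto3
#   print(parameters_for_stage(boto3.client("ssm"), "dev"))
```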
We change the parameter value, and what next?
It depends. CloudWatch Events can detect the parameter change, and we can use that event to run a Lambda function that triggers the CloudFormation stack update for us. For example, like this:
When the Lambda function launches the stack update, CloudFormation retrieves the current parameter values from SSM and applies them to the resources. Obviously, you should be careful and check in the documentation which parameter changes are safe and which will cause a resource (e.g. the database) to be replaced.
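The wiring between a parameter change and the Lambda function could look roughly like this. The sketch assumes a boto3 `events` client is passed in; the rule name, target id and Lambda ARN are placeholders, and the Lambda function still needs a resource-based permission that allows the rule to invoke it, which is not shown here.

```python
import json


def parameter_change_pattern(stage):
    """Event pattern matching creates and updates of parameters under /<stage>/."""
    return {
        "source": ["aws.ssm"],
        "detail-type": ["Parameter Store Change"],
        "detail": {
            "name": [{"prefix": f"/{stage}/"}],
            "operation": ["Create", "Update"],
        },
    }


def create_parameter_change_rule(events, rule_name, stage, lambda_arn):
    """Create (or update) the rule and point it at the stack-updating Lambda."""
    rule = events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(parameter_change_pattern(stage)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "update-stack", "Arn": lambda_arn}],  # hypothetical target id
    )
    return rule["RuleArn"]
```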
The update-stack command itself has a “--use-previous-template” switch that lets us update the stack with its current template, which is very useful in this case. We don’t need to worry about:
- keeping the template in S3,
- accessing the template from the function level,
- using the wrong template.
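In boto3 the same switch is the `UsePreviousTemplate` argument of `update_stack`. Below is a minimal sketch of a Lambda handler built around it. The mapping from parameter names to stack names is invented for this example, and the extra `cfn` argument exists only so that a client can be injected outside Lambda.

```python
# Hypothetical mapping: which stack consumes which SSM parameter.
STACK_BY_PARAMETER = {
    "/dev/AmiId": "windchill-dev-asg",
    "/dev/DesiredCount": "windchill-dev-asg",
    "/dev/RdsSnapshotId": "windchill-dev-db",
}


def update_stack_in_place(cfn, stack_name):
    """Update a stack with its current template and current parameter keys."""
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    parameters = [
        {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
        for p in stack.get("Parameters", [])
    ]
    response = cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,  # no template body or S3 URL needed
        Parameters=parameters,
        Capabilities=stack.get("Capabilities", []),
    )
    return response["StackId"]


def handler(event, context, cfn=None):
    """Entry point for a Parameter Store change event."""
    stack_name = STACK_BY_PARAMETER.get(event["detail"]["name"])
    if stack_name is None:
        return None  # a parameter we do not manage
    if cfn is None:
        import boto3  # imported lazily so the sketch can be tested without it
        cfn = boto3.client("cloudformation")
    return update_stack_in_place(cfn, stack_name)
```

Keeping the parameter keys with `UsePreviousValue` means the stack still points at the same SSM parameter names, while CloudFormation reads their current values during the update.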
What does this have to do with backup?
As a backup system, we can use AWS Step Functions, which allow us to orchestrate Lambda functions. Basically, such a system performs three actions, plus an optional fourth:
- Creates an AMI from a running EC2 instance
- Updates the SSM parameter
- Copies the AMI to the Disaster Recovery region
- Optionally, it can launch the Stack update function if launching from CloudWatch Events is not implemented
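The first three steps can be sketched in Python as follows. In a real Step Functions workflow each function would more likely be its own Lambda task, with the waiting handled by the state machine. The clients are assumed to be boto3 clients passed in (`ec2` and `ssm` in the source region, `ec2_dr` in the Disaster Recovery region), all names are placeholders, and `NoReboot=True` is an assumption made here that trades file-system consistency for availability.

```python
def create_ami(ec2, instance_id, name):
    """Step 1: create an AMI from a running instance and wait until it is usable."""
    image_id = ec2.create_image(InstanceId=instance_id, Name=name, NoReboot=True)["ImageId"]
    # The image starts in the "pending" state; wait before publishing or copying it.
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])
    return image_id


def publish_ami(ssm, parameter_name, image_id):
    """Step 2: store the new AMI ID in the SSM parameter read by the template."""
    ssm.put_parameter(Name=parameter_name, Value=image_id, Type="String", Overwrite=True)


def copy_ami_to_dr(ec2_dr, image_id, name, source_region):
    """Step 3: copy the AMI; the call is made against the destination region."""
    response = ec2_dr.copy_image(Name=name, SourceImageId=image_id, SourceRegion=source_region)
    return response["ImageId"]


def run_backup(ec2, ssm, ec2_dr, instance_id, parameter_name, source_region, name):
    """Run the three steps in order and report both image IDs."""
    image_id = create_ami(ec2, instance_id, name)
    publish_ami(ssm, parameter_name, image_id)
    dr_image_id = copy_ami_to_dr(ec2_dr, image_id, name, source_region)
    return {"image_id": image_id, "dr_image_id": dr_image_id}
```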
And how do we check this backup?
Performing a backup is only one part of the process. To make sure that the backup works and is correct, we should check it from time to time.
Here we recall the project requirement of restoring the QA environment from the current state of production. Every morning an automatic process starts and builds the entire QA environment from the same CloudFormation templates, using the latest AMI and snapshots made by the backup system, without needing to know that such a system exists.
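One possible way to point QA at the most recent production snapshot is sketched below. This is an assumption about how such a morning job could be written, not a description of the project's actual process. It expects boto3 `rds` and `ssm` clients to be passed in, and the instance identifier and parameter name are placeholders.

```python
def latest_available_snapshot(rds, db_instance_id):
    """Return the identifier of the newest 'available' snapshot of an instance."""
    snapshots = []
    kwargs = {"DBInstanceIdentifier": db_instance_id}
    while True:
        page = rds.describe_db_snapshots(**kwargs)
        snapshots.extend(s for s in page["DBSnapshots"] if s.get("Status") == "available")
        marker = page.get("Marker")  # present only when more pages follow
        if not marker:
            break
        kwargs["Marker"] = marker
    if not snapshots:
        raise LookupError(f"no available snapshot for {db_instance_id}")
    newest = max(snapshots, key=lambda s: s["SnapshotCreateTime"])
    return newest["DBSnapshotIdentifier"]


def point_stage_at_latest_snapshot(rds, ssm, db_instance_id, parameter_name):
    """Write the newest snapshot ID into the stage's SSM parameter."""
    snapshot_id = latest_available_snapshot(rds, db_instance_id)
    ssm.put_parameter(Name=parameter_name, Value=snapshot_id, Type="String", Overwrite=True)
    return snapshot_id


# Possible usage (hypothetical names):
#   point_stage_at_latest_snapshot(rds, ssm, "windchill-prod", "/qa/RdsSnapshotId")
```

Once the parameter is set, rebuilding the QA stacks from the shared templates picks the value up like any other SSM parameter.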
What do we gain?
- Backing up such a solution becomes simple
- Because CloudFormation stacks are updated automatically after parameters change in SSM, the Auto Scaling Group always uses the latest available AMI (not one released a year ago)
- If two cluster nodes fail, we only lose the data from the period between the last backup and the failure (the RPO, in backup jargon)
- If we need to go back to a specific place in time, we just set the SSM parameters to the correct AMI and Snapshots and update the stacks
- We don’t keep parameters in files. With a simple API call we can display the current parameter values for a given stage
- We have granular access control to individual parameters (e.g. DEV only has access to the /dev/ path)
- Everything is done by itself