Some time ago I was involved in a project that was to deliver an HA Windchill cluster – nothing new in itself; the cluster does the job and I could basically end the topic here, but …

We host the entire solution (PROD, QA, TEST and DEV) in AWS, and it would be nice to have it all automated enough that we don't waste time thinking about it or writing hundreds of pages of documentation on how to implement changes or fix an environment.

One could write a separate article about Windchill and its resistance to automation, but someone else will have to do that; here I will focus on the automation that keeps the systems alive.

A few assumptions for a good start:

  1. We have everything defined in CloudFormation (IaC) templates
  2. We use one template to create resources for all stages (PROD, QA, etc.)
  3. We have a ready AMI which, thanks to its startup scripts, can adapt to every situation.

The design also imposes several requirements that are essential for the rest of the solution:

  1. Backup with copy to a backup location (other AWS region)
  2. Restoring the QA environment from the current production state

Considering the above requirements and assumptions, all of this had to happen without our participation, while remaining flexible enough to allow changes (e.g. restoring QA from an indicated AMI and RDS snapshot).

CloudFormation templates

As per the assumptions, everything that happens to the resources is done through CloudFormation templates, which we additionally keep in a repository, so we know who changed what and when.

To put it simply, let’s take Windchill itself, for which we need two CloudFormation templates:

  • Template with the Auto Scaling Group definition and its Launch Template
  • Template with database definition (RDS)

In both templates we need to provide several parameters, which will differ per stage and, at the same time, should be easily modifiable. We could use JSON parameter files, but updating them automatically in the repository doesn't sound like a pleasant task.

This is where SSM parameters come in handy.

What exactly are SSM parameters?

Systems Manager Parameter Store is, in simplified terms, a service that stores values under a specific key, with optional encryption and granular access control down to a specific parameter and its version.

The parameter value can be changed with a simple API call using, for example, the AWS CLI or any of the available SDKs (such as boto3).
Moreover, CloudFormation and Systems Manager itself have built-in integration with SSM parameters: values can be read during stack creation, and a parameter change can be used to update a CloudFormation stack or invoke SSM documents.
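For example, with boto3 (the parameter name and value below are illustrative):

```python
import boto3

ssm = boto3.client("ssm")

# Store (or overwrite) the AMI ID used by the DEV stage.
ssm.put_parameter(
    Name="/dev/windchill/ami-id",   # illustrative name
    Value="ami-0123456789abcdef0",  # illustrative value
    Type="String",
    Overwrite=True,
)

# Read the current value back.
value = ssm.get_parameter(Name="/dev/windchill/ami-id")["Parameter"]["Value"]
print(value)
```

The AWS CLI equivalent is a single `aws ssm put-parameter ... --overwrite` call.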

As a result, we obtain a simple-to-use place where we can hold CloudFormation template parameter values.

What does this enable?

Automation with CloudFormation and SSM

Let's go back to Windchill (although it could be any other application). We have two CloudFormation templates that need several parameters:

Template with the Auto Scaling Group and Launch Template:

  • AMI ID – which AMI the Launch Template should use
  • Desired Count – how many machines should be started

Template with database:

  • RDS Snapshot ID – which snapshot to restore the database from

The diagram below illustrates all this.

[Diagram 1: SSM parameters as the source of values for both CloudFormation templates]

Basically, this doesn't differ much from hard-coding these values into the template, and yet it does.

Parameters can be modified via the API, so we can easily (e.g. with a Lambda function) change the value of an individual parameter. Moreover, parameters can be grouped by path (like directories). In the diagram we have the path "/dev/". As a result, we can create parameter sets for each stage: "/dev/", "/qa/", "/prod/", and keep the values appropriate for a given stage there.
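A minimal sketch of reading a whole stage at once, assuming the path layout above:

```python
import boto3

ssm = boto3.client("ssm")

def stage_parameters(stage: str) -> dict:
    """Return all parameters kept under a stage path, e.g. /dev/ or /prod/."""
    values = {}
    paginator = ssm.get_paginator("get_parameters_by_path")
    for page in paginator.paginate(Path=f"/{stage}/", Recursive=True):
        for param in page["Parameters"]:
            values[param["Name"]] = param["Value"]
    return values

print(stage_parameters("dev"))  # e.g. {'/dev/windchill/ami-id': 'ami-...', ...}
```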

We change the parameter value – and what next?

It depends. CloudWatch Events can detect the parameter change, which we can use to run a Lambda function that triggers the CloudFormation stack update for us. For example, like this:

[Diagram 2: SSM parameter change detected by CloudWatch Events, triggering a Lambda stack update]
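The wiring itself can be a CloudWatch Events (today: EventBridge) rule that matches the Parameter Store change event and targets the Lambda function. A minimal sketch, with the rule name, parameter names and function ARN being illustrative:

```python
import json
import boto3

events = boto3.client("events")

# Match changes to the parameters our templates depend on.
events.put_rule(
    Name="dev-windchill-parameter-changed",
    EventPattern=json.dumps({
        "source": ["aws.ssm"],
        "detail-type": ["Parameter Store Change"],
        "detail": {
            "name": ["/dev/windchill/ami-id", "/dev/windchill/desired-count"],
            "operation": ["Create", "Update"],
        },
    }),
)

events.put_targets(
    Rule="dev-windchill-parameter-changed",
    Targets=[{
        "Id": "update-stack",
        # The Lambda also needs a resource policy allowing
        # events.amazonaws.com to invoke it (lambda add-permission).
        "Arn": "arn:aws:lambda:eu-west-1:123456789012:function:update-stack",
    }],
)
```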

When the Lambda function launches the stack update, CloudFormation retrieves the current parameter values from SSM and applies them to the resources. Obviously, you should be careful and check in the documentation which parameter changes are safe and which will cause a resource to be replaced (e.g. the database).

The update-stack call itself has a "--use-previous-template" switch that tells CloudFormation to reuse the stack's current template, which is very useful in this case (see the sketch after this list). We don't need to worry about:

  • keeping the template in S3,
  • accessing the template from the function level,
  • using the wrong template.
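A minimal sketch of such a Lambda function, assuming the template declares its parameters with SSM-backed types (so CloudFormation fetches the latest SSM values on every update); the stack name is illustrative:

```python
import boto3

cfn = boto3.client("cloudformation")

STACK_NAME = "dev-windchill-asg"  # illustrative; can be derived from the event

def handler(event, context):
    # Keep the template and all parameter keys as they are; with SSM-backed
    # parameter types, CloudFormation re-resolves the current SSM values.
    stack = cfn.describe_stacks(StackName=STACK_NAME)["Stacks"][0]
    cfn.update_stack(
        StackName=STACK_NAME,
        UsePreviousTemplate=True,  # the --use-previous-template switch
        Parameters=[
            {"ParameterKey": p["ParameterKey"], "UsePreviousValue": True}
            for p in stack.get("Parameters", [])
        ],
        Capabilities=["CAPABILITY_IAM"],
    )
    # Note: if nothing effectively changed, update_stack raises a
    # ValidationError ("No updates are to be performed"), safe to swallow.
```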

What does this have to do with backup?

As a backup system, we can use AWS Step Functions, which orchestrate Lambda functions. Basically, such a system performs the following actions (sketched after the list):

  1. Creates an AMI from the running EC2 instance
  2. Updates the SSM parameter
  3. Copies the AMI to the Disaster Recovery region
  4. Optionally, it launches the stack update function if triggering from CloudWatch Events is not implemented
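A minimal sketch of the three core steps as separate Lambda handlers orchestrated by the state machine; the regions, names and SSM path are illustrative:

```python
import datetime
import boto3

PROD_REGION = "eu-west-1"   # illustrative
DR_REGION = "eu-central-1"  # illustrative

ec2 = boto3.client("ec2", region_name=PROD_REGION)
ssm = boto3.client("ssm", region_name=PROD_REGION)
ec2_dr = boto3.client("ec2", region_name=DR_REGION)

def create_ami(event, context):
    """Step 1: create an AMI from the running EC2 instance."""
    name = "windchill-backup-" + datetime.datetime.utcnow().strftime("%Y%m%d%H%M")
    image = ec2.create_image(InstanceId=event["instance_id"], Name=name)
    return {**event, "ami_id": image["ImageId"]}

def update_parameter(event, context):
    """Step 2: point the stage parameter at the fresh AMI."""
    ssm.put_parameter(Name="/prod/windchill/ami-id", Value=event["ami_id"],
                      Type="String", Overwrite=True)
    return event

def copy_to_dr(event, context):
    """Step 3: copy the AMI to the Disaster Recovery region.

    The state machine should poll between step 1 and this step until the
    AMI reaches the 'available' state (a Wait + Choice loop).
    """
    ec2_dr.copy_image(Name=event["ami_id"] + "-dr",
                      SourceImageId=event["ami_id"], SourceRegion=PROD_REGION)
    return event
```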

And how do we check this backup?

Performing a backup is only one part of the process; to make sure that the backup works and is correct, we should check it from time to time.

Here the project requirements come back into play: restoring the QA environment from the current state of production. Thus, every morning an automatic process starts which, from the same CloudFormation templates, builds the entire QA environment using the latest AMIs and snapshots made by the backup system, while remaining completely unaware that such a system exists.
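A minimal sketch of the parameter flip that drives such a restore, assuming the naming conventions from the earlier examples (the AMI name prefix, database identifier and SSM paths are illustrative):

```python
import boto3

ec2 = boto3.client("ec2")
rds = boto3.client("rds")
ssm = boto3.client("ssm")

# Latest backup AMI made by the backup system.
images = ec2.describe_images(
    Owners=["self"],
    Filters=[{"Name": "name", "Values": ["windchill-backup-*"]}],
)["Images"]
latest_ami = max(images, key=lambda i: i["CreationDate"])["ImageId"]

# Latest available snapshot of the production database.
snapshots = [
    s for s in rds.describe_db_snapshots(
        DBInstanceIdentifier="prod-windchill-db")["DBSnapshots"]
    if s["Status"] == "available"
]
latest_snap = max(snapshots,
                  key=lambda s: s["SnapshotCreateTime"])["DBSnapshotIdentifier"]

# Point the QA stage at the latest production artifacts; the stack update
# mechanism described earlier does the rest.
ssm.put_parameter(Name="/qa/windchill/ami-id", Value=latest_ami,
                  Type="String", Overwrite=True)
ssm.put_parameter(Name="/qa/windchill/rds-snapshot-id", Value=latest_snap,
                  Type="String", Overwrite=True)
```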

What do we gain?

  1. Backing up the whole solution really is that simple
  2. By automatically updating CloudFormation stacks after a parameter changes in SSM, the Auto Scaling Group always uses the latest available AMI (not one released a year ago)
  3. In case of failure of two cluster nodes, we only lose the data from the period between the failure and the last backup (the RPO, in backup jargon)
  4. If we need to go back to a specific point in time, we just set the SSM parameters to the right AMI and snapshots and update the stacks
  5. We don't keep parameters in files; with a simple API call we can display the current parameter values for a given stage
  6. We have granular access control to individual parameters (DEV only to the /dev/ path)
  7. Everything happens by itself

If you need help automating your cloud services, please contact us.
