Modern web applications that serve data and content to millions of users run in clustered environments. Handling a huge number of simultaneous connections requires a lot of resources, mainly CPU time and RAM, and every connection puts additional load on the server. A single server would need a vast amount of resources, and we can no longer count on Moore’s Law to keep making single machines faster at the old pace. That is why we need multiple servers ready to respond to users’ requests. Web applications use the HTTP(S) protocol, which follows a request-response model: upon a user’s request, the server prepares a response and sends it back. Every request needs to be processed by the server, and that is the key factor in provisioning resources. Requests differ, from simple static content such as CSS or images to more demanding ones such as querying a database, filtering, or calculating data.

The time between a request and its response is called latency, and naturally we want it to be as low as possible. The first problem is overloading the server with requests to the point where it no longer responds in a timely manner (or simply degrades). To keep latency low, we introduce clustered environments. Clusters make it easier to build reliable, scalable and more flexible services. The other aspect of a cluster is High Availability: making sure an application is available even if some servers are down. Gartner has estimated that downtime costs $5,600 per minute.

In the long run, server clusters save money by reducing downtime. The entry threshold is higher because of hardware redundancy, but that redundancy helps keep services reliable. Users who experience services that are always available, fast, and error-free come back more often. Companies benefit from clusters because they reduce not only downtime but also engineering effort, especially when it comes to system recovery.


How clusters work

A cluster is a group of servers running the same application with exactly the same configuration and communicating with each other. All instances in a cluster work together to provide high availability, reliability, and scalability. Each server can handle requests, which makes it easier for the whole system to carry the load. From the user’s perspective this is transparent: the cluster looks like a single server, because users still connect to just one URL or IP address. Each request is then routed to a server selected by a balancing algorithm. There are many different algorithms to choose from.

The most popular are:

  • Roundrobin – servers are used in turns, so the load is distributed equally
  • Least connection – the server with the fewest active connections receives the new connection
  • Source – the server is selected based on the client’s IP address, so users connecting from the same IP address always reach the same server
  • Uri – users requesting the same URI (either the left or the right side of the question mark) are directed to the same server
  • Hdr – similar to Uri, but uses an HTTP header to choose the server
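
To make the first three algorithms more concrete, here is a minimal Python sketch. It is an illustration only, not the code of any particular load balancer: the class names, the `pick` and `release` methods and the `app-1`…`app-3` node names are all hypothetical, and the hash used for the source strategy is just one reasonable choice.

```python
import hashlib
from itertools import cycle


class RoundRobin:
    """Hand out servers in turn so each gets an equal share of requests."""

    def __init__(self, servers):
        self._turns = cycle(servers)

    def pick(self, client_ip=None):
        return next(self._turns)


class LeastConnections:
    """Pick the server that currently has the fewest open connections."""

    def __init__(self, servers):
        self.open_connections = {server: 0 for server in servers}

    def pick(self, client_ip=None):
        server = min(self.open_connections, key=self.open_connections.get)
        self.open_connections[server] += 1
        return server

    def release(self, server):
        # Call this when a connection to `server` is closed.
        self.open_connections[server] -= 1


class SourceHash:
    """Map a client IP to a fixed server so repeat visits reach the same node."""

    def __init__(self, servers):
        self.servers = list(servers)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]


servers = ["app-1", "app-2", "app-3"]  # hypothetical node names
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])  # ['app-1', 'app-2', 'app-3', 'app-1']
```

Note that the simple modulo mapping in `SourceHash` reshuffles most clients when the server list changes; production balancers often use more elaborate hashing to limit that effect.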

Servers are often also selected based on geolocation: requests are routed to the servers closest to the user (based on IP geolocation).

Let’s put everything together:

  • We have a balancing mechanism – used by the load balancer
  • The load balancer directs user connections to the servers in a cluster, which offers uninterrupted service and keeps session data available through session replication
  • The servers handle the user requests

That’s pretty much it. The idea is fairly simple yet very powerful and, as usual, not so easy to implement. That’s why there are many tools that help us manage clusters.

What is High Availability?

When a system goes down, it can be a disaster. It involves costs, because the company can no longer earn money and must spend effort to put the system back online. There are many reasons for a server to stop responding: a system failure, an outage, or an application error, to name a few. To prevent unexpected downtime we can use clustered environments. If more than one server is able to handle requests and one goes down, web traffic can be routed to another server that is still online. This ability to fail over is called High Availability. No matter what happens to one server, other servers can take over the traffic. Hardware redundancy makes it easier for the whole cluster to stay up even after a failure, because the probability that all nodes are down at the same time is low.


Amazon guarantees 99.999% (or, as they call it, “five nines”) availability for emergency response systems. Why those three nines after the decimal point? Wouldn’t 99% be enough? Interestingly enough, 99% availability means about 87 hours of unavailability per year, which is roughly 14 minutes a day! For systems that must be available around the clock, that is certainly too much.
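
The arithmetic behind these numbers is easy to reproduce. The short Python sketch below converts an availability percentage into allowed downtime; the function name `downtime` is made up for this example, and a non-leap year of 365 days is assumed.

```python
MINUTES_PER_DAY = 24 * 60
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def downtime(availability_percent):
    """Return (hours per year, minutes per day) of allowed downtime."""
    unavailable = 1 - availability_percent / 100
    return unavailable * HOURS_PER_YEAR, unavailable * MINUTES_PER_DAY


for percent in (99.0, 99.9, 99.99, 99.999):
    hours_per_year, minutes_per_day = downtime(percent)
    print(f"{percent}% -> {hours_per_year:.2f} h/year, {minutes_per_day:.2f} min/day")
```

For 99% this gives 87.6 hours per year (14.4 minutes per day), while five nines leave only about 5.26 minutes of downtime per year.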

Achieving High Availability (or simply HA) is much easier with the proper tools. The most important tool in the toolbox is the load balancer mentioned earlier. It is a guide that tells each request which way to go to end up in the cozy arms of a server, where it gets taken care of. The great thing about load balancers is that they know how many requests each server has processed and keep the load balanced (hence the name), so that servers are loaded evenly. Load balancers also know exactly which servers are connected, which of them are performing well (by comparing the latency of similar requests) and which are down, so they can redirect users to servers that are still up. Thanks to HA clusters we avoid single points of failure. It is like a car mechanic giving you a replacement car.

High Availability in ThingWorx

In ThingWorx, HA was introduced in version 8.0. It allows you to create multiple ThingWorx instances and connect a load balancer to handle the traffic. There is one master server, ready to handle requests. The others (also called slaves) are backups, waiting patiently for the main server to go down (for any reason) so they can take the stage and continue the show. This approach is called Active-Passive. Such a cluster has multiple nodes, but only one main node is active at a time. It helps a lot in building High Availability but is not very useful for scaling the solution. Adding new nodes only makes the system more available; it is a bit like adding a nine to the decimal part.
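
The "adding a nine" intuition can be illustrated with a small Python sketch. It rests on strong simplifying assumptions that are not from ThingWorx itself: node failures are independent, failover is instantaneous, and the load balancer never fails. The function name `cluster_availability` is hypothetical.

```python
def cluster_availability(node_availability, nodes):
    """Probability that at least one of `nodes` independent nodes is up."""
    return 1 - (1 - node_availability) ** nodes


for nodes in (1, 2, 3):
    print(nodes, f"{cluster_availability(0.99, nodes):.6%}")
```

In this idealized model every extra node multiplies the chance of a total outage by the per-node failure probability. Real clusters gain less, because failures are often correlated and switching to a backup node takes time.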

In modern IoT applications, where everything is connected, it is critical to ensure that the server is available. In such cases everything is about the data. Devices continuously collect data, but their limited resources mean that a server has to store and process it to turn it into value. Applications running on the ThingWorx platform are critical to the whole infrastructure: the platform is a central database that also makes it possible to analyze the data and apply Machine Learning models. As a result, edge devices can be controlled based on the current situation (e.g. start watering the plants if the ground is dry) or on projected activities (e.g. optimizing wind farms based on weather forecasts). When ThingWorx is not available, none of the above can happen. That’s the main driver for using Active-Passive clustering.

High Availability Overview

ThingWorx uses Apache ZooKeeper for cluster management. ZooKeeper provides services such as:

  • Configuration management
  • Leader election
  • Synchronization
  • Naming service
  • Cluster management

ZooKeeper makes it easy to add new servers (nodes) to a cluster. All we need to do is set up a new server and add it to the cluster. No restarts, no downtime: set up the server offline, test it and connect it to the cluster. ZooKeeper will make sure it is available and align the configuration across all nodes. A typical setup contains one master node, elected by ZooKeeper when the cluster starts, and two or more slave nodes, which are always up but do not handle requests. This redundancy is what allows the cluster to meet its availability requirements.
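
The idea of leader election can be sketched in a few lines of Python. The sketch follows the general pattern of the well-known ZooKeeper election recipe, where each candidate receives a sequence number and the live candidate with the lowest number leads. It is an in-memory simulation, not a real ZooKeeper client, and it does not claim to show how ThingWorx implements election internally; `ElectionRegistry` and the `twx-*` node names are hypothetical.

```python
class ElectionRegistry:
    """In-memory stand-in for the coordination service (not a real ZooKeeper client)."""

    def __init__(self):
        self._next_sequence = 0
        self._members = {}  # sequence number -> node name

    def join(self, node_name):
        """Register a node and give it the next sequence number."""
        sequence = self._next_sequence
        self._next_sequence += 1
        self._members[sequence] = node_name
        return sequence

    def leave(self, sequence):
        """Remove a node, as would happen when its session is lost."""
        self._members.pop(sequence, None)

    def leader(self):
        """The live node with the lowest sequence number is the master."""
        if not self._members:
            return None
        return self._members[min(self._members)]


registry = ElectionRegistry()
tickets = {name: registry.join(name) for name in ("twx-1", "twx-2", "twx-3")}
print(registry.leader())           # twx-1
registry.leave(tickets["twx-1"])   # simulate the master going down
print(registry.leader())           # twx-2
```

Because the next leader is determined purely by who is still registered, no node has to be restarted or reconfigured when the master disappears.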

High Availability diagram

In the diagram, all connections are handled by a load balancer. Client applications and devices no longer connect to ThingWorx directly, which ensures that traffic is routed to the currently active ThingWorx instance. The load balancer knows about the connected instances and constantly monitors their health. When a cluster node goes down, the load balancer no longer receives data from that node and automatically redirects requests to another node.
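
The failover behavior described above can be sketched in Python as follows. This is a simplified model under stated assumptions: health is read from a dictionary instead of being probed over the network, and the active node is simply the first healthy one in a fixed priority order. `ActivePassiveBalancer` and the node names are hypothetical.

```python
class ActivePassiveBalancer:
    """Send all traffic to the first healthy node in priority order."""

    def __init__(self, nodes, is_healthy):
        self.nodes = list(nodes)
        self.is_healthy = is_healthy  # callable: node name -> bool

    def route(self):
        for node in self.nodes:
            if self.is_healthy(node):
                return node
        raise RuntimeError("no healthy node available")


# Hypothetical health state; a real balancer would probe each node over the network.
health = {"twx-1": True, "twx-2": True, "twx-3": True}
balancer = ActivePassiveBalancer(health, health.get)

print(balancer.route())   # twx-1
health["twx-1"] = False   # simulate the active node going down
print(balancer.route())   # twx-2
```

Clients keep using the same entry point the whole time; only the balancer's routing decision changes.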

Beyond High Availability

To handle peaks (heavy load or traffic in certain situations) we need another approach: Active-Active clustering. That option was added in ThingWorx 9. Instead of just waiting, the other nodes actively take part and respond to traffic. Such clusters not only support HA but also provide horizontal scaling, which means adding capacity by adding new servers to the cluster. Instead of replacing hardware in an existing server, which of course causes downtime, a new server is prepared and added seamlessly. Capacity planning is still a challenge, but it is much easier to adjust in a clustered environment.

The scalability of an application can be measured by the number of requests it can handle simultaneously. The point at which an application can no longer handle additional requests effectively is the limit of its scalability. It is much easier to double the number of handled requests by adding a second server than to achieve the same on a single server, especially if the base number is high. With ThingWorx 9 it is possible to build a scalable solution that handles hundreds of thousands of devices.
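
As a rough illustration of capacity planning in an Active-Active cluster, the Python sketch below estimates how many nodes are needed for a given peak. It assumes that capacity grows linearly with the number of nodes, which real systems only approximate; the function name `nodes_needed` and the numbers are made up for the example.

```python
import math


def nodes_needed(peak_requests_per_second, per_node_capacity, spare_nodes=1):
    """Nodes required to serve the peak, plus spares to survive failures."""
    if per_node_capacity <= 0:
        raise ValueError("per_node_capacity must be positive")
    return math.ceil(peak_requests_per_second / per_node_capacity) + spare_nodes


print(nodes_needed(4500, 1000))  # 6
```

The spare node keeps the cluster able to serve the peak even when one node is down, which ties scaling back to High Availability.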

If you need help with implementing new or upgrading existing solutions based on ThingWorx, please contact us.
