Modern web applications that serve data and content to millions of users run in clustered environments. Handling a huge number of simultaneous connections requires many resources, chiefly CPU time and RAM, and every connection puts additional load on the server. Serving everything from a single machine would require a vast amount of resources, and scaling a single machine has its limits (Moore's Law no longer delivers the free performance gains it once did). We therefore need multiple servers ready to respond to users' requests. Web applications use the HTTP(S) protocol, which follows a request-response communication model: upon a user's request, the server prepares the response and sends it back. Every response has to be processed by the server, and that is the key factor when provisioning server resources. Requests differ widely, from simple static assets like CSS or images to more demanding ones such as querying a database or filtering and aggregating data.
The time between request and response is called latency, and we obviously target the minimum latency. The first problem is overflowing the server with requests to the point where it no longer responds in a timely manner (or simply degrades). To keep latency under control, we introduce clustered environments. Clusters make it easy to build reliable, scalable, and more flexible services. The other benefit of a cluster is High Availability: making sure an application stays available even if some servers are down. Gartner has estimated that downtime costs an average of $5,600 per minute.
In the long run, server clusters save money by reducing downtime. The entry threshold is higher because of hardware redundancy, but that redundancy is what keeps the service reliable. Users whose experience is always available, fast, and error-free will come back more often. Companies benefit from clusters because they reduce not only downtime but also engineering effort, especially when it comes to system recovery.
How clusters work
A cluster is a group of servers running the same application with exactly the same configuration and communicating with each other. All instances in a cluster work together to provide high availability, reliability, and scalability. Each server can handle any request, making it easier for the whole system to carry the load. From the user's perspective the cluster is transparent: it looks like a single, monolithic server because users still connect through one URL or IP address. Each request is then routed to a server selected by a balancing algorithm, and there are many different algorithms to choose from.
The most popular are listed below (a minimal sketch of the first two follows the list):
- Round robin – servers are used in turns, so the load is distributed equally
- Least connection – the server with the fewest active connections receives the next one
- Source – every user keeps using the server selected for the initial connection; users connecting from the same IP address will always reach the same server
- URI – users requesting the same URI (either the part before or after the question mark) are directed to the same server
- Header (hdr) – similar to URI, but uses HTTP headers to choose the server
Servers are often also selected based on geolocation, routing requests to the servers closest to the user (based on IP-derived location).
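To make the first two algorithms concrete, here is a minimal sketch in Java. It is an illustration only; real load balancers such as HAProxy implement these with many more safeguards:

```java
// Sketch of two balancing algorithms (illustrative only).
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class BalancingAlgorithms {
    private final List<String> servers;
    private final AtomicInteger next = new AtomicInteger();
    // In a real balancer these counters would be incremented on connect
    // and decremented on disconnect.
    private final Map<String, AtomicInteger> activeConnections = new ConcurrentHashMap<>();

    public BalancingAlgorithms(List<String> servers) {
        this.servers = servers;
        servers.forEach(s -> activeConnections.put(s, new AtomicInteger()));
    }

    // Round robin: cycle through the servers in order.
    public String roundRobin() {
        int index = Math.floorMod(next.getAndIncrement(), servers.size());
        return servers.get(index);
    }

    // Least connection: pick the server with the fewest active connections.
    public String leastConnection() {
        return servers.stream()
                .min(Comparator.comparingInt(s -> activeConnections.get(s).get()))
                .orElseThrow();
    }
}
```

The source, URI, and header algorithms work the same way, except that the index is derived from a hash of the client IP, the request URI, or a header value instead of a rotating counter.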
Let’s put everything together:
- A balancing algorithm decides which server gets each request – it is used by the load balancer
- The load balancer redirects user connections to the servers in the cluster, offering uninterrupted service and session persistence through session replication
- The servers handle the user requests
That's pretty much it: fairly simple, yet very powerful, and, as usual, not so easy when it comes to implementation. That's why there are many different tools that help us manage clusters.
What is High Availability?
When a system goes down, it can be a disaster. It costs money, because the company can no longer earn, and effort, because the system has to be brought back online. There are many reasons for a server to stop responding: a system failure, an outage, or an application error, to name a few. To prevent unexpected downtime we can use clustered environments. If more than one server is able to handle requests and one goes down, web traffic can be routed to another server that is still online. This failover capability is called High Availability. No matter what happens to one server, there are still other servers that can take over the traffic. Hardware redundancy makes it easy for the whole cluster to stay up even after a failure, because the probability that all nodes go down at the same time is low.
Amazon guarantees 99.999% availability (or, as they call it, "five nines") for emergency response systems. Why three extra nines after the decimal point? Wouldn't 99% be enough? Interestingly, 99% availability still means roughly 87 hours of unavailability per year, that is over 14 minutes a day! For systems available around the clock, that is certainly too much.
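The arithmetic behind those numbers is straightforward; this short sketch prints the yearly downtime budget for a few availability levels:

```java
// Downtime budget per year for a given availability percentage.
public class DowntimeBudget {
    public static void main(String[] args) {
        double minutesPerYear = 365 * 24 * 60; // 525,600 minutes
        for (double availability : new double[]{99.0, 99.9, 99.99, 99.999}) {
            double downtimeMinutes = minutesPerYear * (1 - availability / 100);
            System.out.printf("%.3f%% -> %.1f minutes/year (%.2f hours)%n",
                    availability, downtimeMinutes, downtimeMinutes / 60);
        }
    }
}
```

At five nines the budget shrinks to about five minutes per year, which is why every extra nine matters.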
Achieving High Availability (or simply HA) becomes much easier with the proper tools. The most important one in the toolbox is the load balancer mentioned earlier. It is a guide that tells each request which way to go so it ends up in the cozy arms of a server, where it gets taken care of. The great thing about load balancers is that they know how many requests each server has processed and maintain balance (hence the name), so every server is equally loaded. Load balancers also know exactly which servers are connected, which of them are performing well (by comparing the latency of similar requests), and which are down, so they can redirect users to servers that are still up. Thanks to HA clusters we avoid single points of failure. It is like a car mechanic giving you a replacement car.
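As a rough illustration of how a load balancer tracks which servers are up, here is a simplified health checker in Java. The /health endpoint, the timeouts, and the five-second interval are assumptions for the sketch, not any particular product's defaults:

```java
// Minimal health-check sketch: poll each backend and keep only the
// responsive ones in the routing pool.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HealthChecker {
    private final HttpClient client = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(2)).build();
    private final Set<String> healthy = ConcurrentHashMap.newKeySet();

    public void start(List<String> backends) {
        Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(() -> {
            for (String backend : backends) {
                try {
                    HttpRequest request = HttpRequest.newBuilder()
                            .uri(URI.create(backend + "/health"))
                            .timeout(Duration.ofSeconds(2)).GET().build();
                    int status = client.send(request,
                            HttpResponse.BodyHandlers.discarding()).statusCode();
                    if (status == 200) healthy.add(backend);
                    else healthy.remove(backend);
                } catch (Exception e) {
                    healthy.remove(backend); // unreachable: take it out of rotation
                }
            }
        }, 0, 5, TimeUnit.SECONDS);
    }

    public Set<String> healthyBackends() { return healthy; }
}
```

A balancing algorithm would then pick only from healthyBackends(), which is exactly how a failed node silently disappears from the rotation.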
High Availability in ThingWorx
In ThingWorx, HA was introduced in version 8.0. It lets you create multiple ThingWorx instances and put a load balancer in front of them to handle the traffic. There is one master server, ready to handle requests. The others (also called slaves) are a backup, waiting patiently for the main server to go down (for whatever reason) so they can join the stage and continue the show. This approach is called Active-Passive: the cluster has multiple nodes, but only one of them is active at a time. It helps a lot with building High Availability but is not very useful for scaling the solution up. Adding new nodes only makes the system more available; it is a bit like adding another nine after the decimal point.
In modern IoT applications, where everything is connected, it is critical to ensure the server is available, because everything revolves around the data. Devices continuously collect data, but their limited resources require a server to store and process it and turn it into value. Applications running on the ThingWorx platform are critical to the whole infrastructure: it is the central database, and it also allows analyzing the data and applying Machine Learning models. As an outcome, edge devices can be controlled based on the current situation (e.g. start watering the plants if the ground is dry) or projected activity (e.g. optimizing wind farms based on weather forecasts). When ThingWorx is not available, none of the above can happen. That is the main driver for using Active-Passive clustering.
High Availability Overview
ThingWorx uses Apache ZooKeeper for cluster management. ZooKeeper exposes services such as:
- Configuration management
- Leader election
- Synchronization
- Naming service
- Cluster management
ZooKeeper makes it easy to add new servers (nodes) to a cluster. All we need to do is set up a new server and add it to the cluster: no restarts, no downtime. Set the server up offline, test it, and connect it to the cluster; ZooKeeper will make sure it is visible and will align the configuration across all nodes. The usual setup contains one master node, elected by ZooKeeper when the cluster starts, and two or more slave nodes that are always up but not handling requests. The number of nodes can be aligned with your requirements, because every additional node adds redundancy.
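ThingWorx performs leader election internally, but the underlying idea can be sketched with Apache Curator, a popular ZooKeeper client; the connection string and latch path below are illustrative assumptions:

```java
// Leader election sketch with Apache Curator (a ZooKeeper client).
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Every node creates a latch on the same path; ZooKeeper picks one leader.
        try (LeaderLatch latch = new LeaderLatch(client, "/cluster/leader")) {
            latch.start();
            latch.await(); // blocks until this node becomes the leader
            System.out.println("This node is now active (master).");
            // ... handle requests while holding leadership ...
        }
        client.close();
    }
}
```

Every node races for the same latch path; ZooKeeper grants leadership to exactly one of them, and if the leader disconnects, another waiting node is promoted automatically.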
In the diagram, all connections are handled by a load balancer; client applications and devices no longer connect directly. This ensures that traffic is routed to the currently active ThingWorx instance. The load balancer knows which instances are connected and constantly monitors their health. When a cluster node goes down, the load balancer stops receiving health data from that node and automatically redirects requests to another node.
Beyond High Availability
To handle peaks (heavy load or traffic spikes in certain situations) we need another approach: Active-Active clustering, an option added in ThingWorx 9. Instead of just waiting, the other nodes actively take part in responding to traffic. Such clusters not only support HA but also provide horizontal scaling, which means adding new resources to the cluster. Instead of replacing hardware in a single server, which of course causes downtime, a new server is prepared and added seamlessly. Capacity planning remains a challenge, but it is much easier to adjust capacity in a clustered environment.
The scalability of an application can be measured by the number of requests it can handle simultaneously. The point at which an application can no longer handle additional requests effectively is the limit of its scalability. It is much easier to double the number of handled requests by adding a second server than to achieve the same on a single one, especially if the base number is high. With ThingWorx 9 it is possible to build a scalable solution that handles hundreds of thousands of devices.
If you need help implementing new solutions or upgrading existing ones based on ThingWorx, please contact us.