Monday, October 31, 2016

Application clusters in the Oracle cloud

Traditionally, applications were commonly deployed as a single instance: one application server running a specific application for a specific business purpose. When the application server encountered a disruption, this automatically resulted in downtime for the business.

As this is not an ideal situation, systems have increasingly been built in a clustered fashion: multiple machines (nodes), each running an instance of the application, with the load balanced between the nodes. When one node fails, the other nodes take over the load. This is a great model in which your end-users are protected against the failure of one of the nodes. Commonly an engineer would take the malfunctioning node, repair the issue and introduce it back into the cluster when fixed.

With the cloud (private cloud and public cloud) and the move to a more cattle-like model, the use of clustered solutions makes even more sense. In this model the engineer who traditionally fixed an issue and re-introduced the node to the cluster is now instructed to spend only a very limited time on fixing the issue. If the engineer is unable to fix the issue on a node within a given number of minutes, the action will be to “destroy” the node and re-deploy a fresh node.
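The time-boxed repair rule described above can be sketched in a few lines of Python. This is only an illustration: the `Incident` class, the `next_action` function and the 15-minute budget are assumptions made for this example and are not part of any Oracle tooling.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    node: str             # name of the malfunctioning node
    minutes_spent: float  # time the engineer has spent on it so far
    fixed: bool           # whether the issue has been resolved


def next_action(incident: Incident, repair_budget_minutes: float = 15) -> str:
    """Decide what to do with a malfunctioning node (hypothetical policy)."""
    if incident.fixed:
        return "rejoin"       # repaired: introduce the node back into the cluster
    if incident.minutes_spent < repair_budget_minutes:
        return "keep_fixing"  # still inside the agreed time box
    return "replace"          # budget used up: destroy and re-deploy a fresh node
```

The actual budget is a business decision; the point is that the decision to replace a node becomes a simple rule rather than a judgment call made hours into a repair.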

Due to this model engineers will not spend hours and hours fixing an issue on an individual node; they will only spend a couple of minutes trying to fix it. As a result the number of nodes an engineer can maintain will be much higher, resulting in a lower maintenance cost per node.

To be able to adopt a model where nodes are considered replaceable cattle and no longer pets, a couple of things need to be in place. The conceptual prerequisites are the same for a private cloud as they are for a public cloud, even though the technical implementation might differ.

  1. Nodes should be stateless. 
  2. Nodes should be automatically deployable.
  3. Nodes should join the cluster automatically.
  4. The cluster needs to be auto-aware.


Nodes should be stateless.
This means that an application node is not allowed to hold state: it cannot hold transactions or application data. Simply put, the application node exists only to execute application tasks. As a result, whenever a node is destroyed no data is lost, and whenever a node is deployed it can directly take its role in the cluster.
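As a minimal sketch of what stateless means in practice, the example below keeps all data in a store outside the node. The `SharedStore` class is a stand-in for whatever external database or cache you use, and the shopping-cart scenario and all names are hypothetical.

```python
class SharedStore:
    """Stand-in for an external database or cache shared by all nodes."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value


class AppNode:
    """A stateless node: it holds a reference to the store, never the data."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, user, item):
        cart = list(self.store.get(user, []))  # read the state from the store
        cart.append(item)
        self.store.put(user, cart)             # write it straight back
        return cart


store = SharedStore()
node_a = AppNode("node-a", store)
node_a.add_to_cart("alice", "book")
del node_a                                     # the node is destroyed

node_b = AppNode("node-b", store)              # a fresh node takes over
cart = node_b.add_to_cart("alice", "pen")
```

Because `node-a` never owned the cart, destroying it loses nothing, and `node-b` can serve the same user immediately.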

Nodes should be automatically deployable
A node should be deployable automatically, meaning fully automatically and without any human interaction from the moment the node is deployed. Oracle provides a mechanism to deploy new compute nodes in the Oracle Public Cloud based upon templates in combination with customer-definable parameters. In essence this gives you only a virtual machine running Oracle Linux (or another operating system if so defined). The node will have to be configured automatically after this initial deployment step. You can use custom scripting to achieve this, or you can use mechanisms like Puppet or Chef. In general a combination of custom scripting within the VM and Puppet or Chef is the most suitable solution for fully automated deployment of a new node in the cluster.
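Whatever tool you choose, unattended configuration works best when every step is idempotent: it checks whether it is needed and can safely be run again. The sketch below illustrates that idea in plain Python. It does not use the Puppet or Chef APIs; the `Step` type, the step names and the dictionary that represents the node are all assumptions made for this example.

```python
from typing import Callable, List, NamedTuple


class Step(NamedTuple):
    name: str
    is_done: Callable[[dict], bool]  # check: is the node already in this state?
    apply: Callable[[dict], None]    # action: bring the node into this state


def bootstrap(node: dict, steps: List[Step]) -> List[str]:
    """Run every step that is still needed and return the names applied."""
    applied = []
    for step in steps:
        if not step.is_done(node):
            step.apply(node)
            applied.append(step.name)
    return applied


# Hypothetical steps; a real node would install packages and write files.
STEPS = [
    Step("install_packages",
         lambda n: "nginx" in n["packages"],
         lambda n: n["packages"].add("nginx")),
    Step("write_config",
         lambda n: n.get("configured", False),
         lambda n: n.update(configured=True)),
]

node = {"packages": set()}
first_run = bootstrap(node, STEPS)   # a fresh node: both steps are applied
second_run = bootstrap(node, STEPS)  # nothing left to do on a second run
```

Running the bootstrap twice changes nothing the second time, which is what makes it safe to trigger from a boot script without a human watching.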

Nodes should join the cluster automatically 
In many cases the automatic deployment of a new node, meaning deploying Oracle Linux and configuring the application node within the Oracle Linux virtual machine, is already achieved. What is often lacking in a fully automated way of working is the node joining the cluster. Depending on your type of application, application server and node-distribution mechanism (load balancing, for example) the technical implementation will differ. However, it is important to ensure that a newly provisioned node is able to directly become part of the cluster and take its role in it.
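One common way to achieve this is to let the node register itself as the last action of its own start-up. The sketch below shows the idea with an in-memory registry; in a real setup the registry would be a service reachable over the network. The `ClusterRegistry` class, the `start_node` function and the addresses are hypothetical.

```python
class ClusterRegistry:
    """In-memory stand-in for the component that tracks cluster members."""

    def __init__(self):
        self._members = {}

    def register(self, node_id, address):
        # Registering the same node twice simply refreshes its address.
        self._members[node_id] = address

    def members(self):
        return dict(self._members)


def start_node(node_id, address, registry):
    """Start a node and let it join the cluster without human interaction."""
    # ... the application start-up itself would happen here ...
    registry.register(node_id, address)  # last action: announce ourselves


registry = ClusterRegistry()
start_node("app-1", "10.0.0.11:8080", registry)
start_node("app-2", "10.0.0.12:8080", registry)
start_node("app-2", "10.0.0.12:8080", registry)  # a restart is harmless
```

Making registration the node's own responsibility means no engineer has to edit a member list when a node is replaced.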

The cluster needs to be auto-aware
The automatic awareness of the cluster partially overlaps with the previous section, where we mentioned that a new node needs to join the cluster fully automatically and take the requested role in the cluster. This means that the cluster needs to be aware of the fact that a new node has joined. The cluster also needs to be automatically aware of a malfunctioning node. In case one of the nodes becomes unresponsive, the cluster should automatically ensure that the node is no longer served new workloads. For example, in the case of an application server cluster which makes use of load balancing, the malfunctioning node should be taken out of the balancing algorithm until the moment it is repaired or replaced. When using a product which is developed to be cluster-aware, for example Oracle WebLogic, this might not be that hard to achieve and the cluster will handle this internally. When you use a custom-built cluster, for example a microservices-based application running with NGINX and Flask and depending on load balancing, you will have to take your own precautions and ensure that this auto-aware mechanism is in place.
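For a custom-built cluster, one simple way to get this awareness is a heartbeat: every node reports in periodically, and the balancer only routes to nodes it has heard from recently. The sketch below is a minimal illustration of that technique and not how NGINX or WebLogic implement it; the `AwareBalancer` class and the 10-second timeout are assumptions. Time is passed in explicitly to keep the example easy to follow.

```python
import itertools


class AwareBalancer:
    """Round-robin balancer that only uses nodes with a recent heartbeat."""

    def __init__(self, timeout_seconds=10.0):
        self.timeout = timeout_seconds
        self._last_seen = {}               # node name -> time of last heartbeat
        self._counter = itertools.count()  # drives the round-robin choice

    def heartbeat(self, node, now):
        # A first heartbeat doubles as "a new node has joined".
        self._last_seen[node] = now

    def healthy_nodes(self, now):
        return sorted(node for node, seen in self._last_seen.items()
                      if now - seen <= self.timeout)

    def pick(self, now):
        nodes = self.healthy_nodes(now)
        if not nodes:
            raise RuntimeError("no healthy nodes available")
        return nodes[next(self._counter) % len(nodes)]


balancer = AwareBalancer(timeout_seconds=10.0)
balancer.heartbeat("node-a", now=0)
balancer.heartbeat("node-b", now=0)
balancer.heartbeat("node-a", now=20)         # node-b has gone silent
only_a = balancer.healthy_nodes(now=25)      # node-b is out of rotation
balancer.heartbeat("node-c", now=24)         # a replacement node joins
with_c = balancer.healthy_nodes(now=25)
```

The same mechanism covers both cases from the text: a silent node drops out of the rotation without anyone intervening, and a new node is picked up as soon as its first heartbeat arrives.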

Oracle Public cloud conceptual deployment
When we apply the above model in the Oracle Public Cloud, a conceptual deployment of a web-based application could look as follows.


In this model, the API server will create a new instance for one of the applications in the application cluster it is part of. As soon as this is done the new server will report back to the API server. Based upon this, the machine will self-register at Puppet and all required configuration will be done on the node. The latest version of the application software will then be downloaded from the Git repository, and as a last step the new node will be added to the load balancer cluster to become a full member of the application cluster.
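The order of these steps matters: the node should only be added to the load balancer once everything before it has succeeded. A minimal sketch of that sequencing is shown below. The step names and the `run_step` callback are placeholders for your own API server logic and do not correspond to real Oracle cloud or Puppet calls.

```python
# Hypothetical step names, in the order described in the text.
PROVISIONING_STEPS = [
    "create_instance",
    "report_to_api_server",
    "register_with_puppet",
    "apply_configuration",
    "fetch_application_from_git",
    "add_to_load_balancer",
]


def provision(run_step):
    """Run the steps in order and stop at the first failure.

    run_step is a callable that takes a step name and returns True on
    success. Returns (completed_steps, succeeded).
    """
    completed = []
    for step in PROVISIONING_STEPS:
        if not run_step(step):
            return completed, False  # never reach the load balancer step
        completed.append(step)
    return completed, True


# Simulate a run in which downloading the application fails.
completed, ok = provision(lambda step: step != "fetch_application_from_git")
```

In the simulated run the node never reaches the load balancer, so end-users are never routed to a half-configured machine; under the cattle model the failed node would simply be destroyed and provisioned again.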

The above example uses a number of standard components from the Oracle cloud. However, when deploying a fully working solution you will have to ensure that some components are configured specifically for your situation. For example, the API server needs to be built to undertake some basic tasks, and you will have to ensure the correct Puppet plans are available on the Puppet server to make sure everything is automatically configured in the right manner.

As soon as you have done so, however, you will have a fully automatic scaling cluster environment running in the Oracle Public Cloud. Once you have done this for one environment, it is relatively easy to adapt it to other types of deployments on the same cloud.
