Pattern-type (Part 5): Scaling in/out based on your own monitoring collector with SmartCloud Application Services

In Pattern-type (Part 4): Adding your own monitoring collector to a pattern-type with SmartCloud Application Services, we learned how to add a collector to a pattern-type in order to monitor the deployed services. It was a simple collector counting the number of files in a directory. In this article I propose to explain how to leverage this collector to add scale-in/out rules based on the metrics it provides.

But first I would like to stress the difference between a pattern-type and a pattern. A pattern-type provides all the components, links, scalability rules and so on needed to build a pattern. The best example is the pattern-type for Web-Application (see demo), which provides Enterprise Application and Database components plus scalability rules, so you can design your own pattern and then deploy it multiple times. In this article we show how to create a pattern-type, which requires more effort than creating a pattern based on an existing pattern-type. ISVs and companies with home-made applications that do not fit one of the existing pattern-types will be interested in this approach, as a pattern-type allows you to put all the infrastructure intelligence in one place in order to deliver the best pattern to your customers.

To implement this scenario we have to modify some files in order to:

1) Capture the scaling rule arguments while designing the pattern:

– The directory whose files are to be monitored.

– The min/max threshold in terms of number of files.

– The min/max number of VMs we would like to spin up.

– The reaction time period.

2) Reflect these values in the topology.

3) Change the scripts to create the directory to monitor and pass this parameter to the collector.

Application Model

The metadata file of the application model has to be modified in order to capture the different parameters of the scaling rule and to define the scaling policy.

[
    {
        "id": "stddynfiles",
        "label": "Server component",
        "description": "Server component",
        "thumbnail": "appmodel/images/thumbnail/HCenter.png",
        "image": "appmodel/images/HCenter.png",
        "type": "component", 
        "attributes":[
            {
               "id":"Sattr1",
               "label":"Server Attribute 1",
               "sampleValue":"server str",
               "type":"string",
               "required": true
            }
         ] 
    },
    {
        "id": "ScalingPolicy",
        "label": "Server scaling policy",
        "type": "policy",
        "applicableTo": [
            "stddynfiles"
        ],
        "description": "Server scaling policy",
        "groups": [
            {
                "category": "Scaling Type",
                "id": "None",
                "label": "Static",
                "defaultValue": true,
                "attributes": [
                    "intialInstanceNumber"
                ],
                "description": "Description for the Slave scaling policy."
            },
            {
                "category": "Scaling Type",
                "id": "Files",
                "label": "Nb Files Based",
                "defaultValue": false,
                "attributes": [
                    "nbFiles",
                    "dirToMon",
                    "scaleInstanceRange1",
                    "triggerTime1"
                ],
                "description": "Indicate that the action of adding or removing instances will be triggered by average CPU usage of existing instances."
            }
        ],
        "attributes": [
            {
                "id": "intialInstanceNumber",
                "label": "Number of Instances",
                "type": "number",
                "min": 1,
                "required": true,
                "sampleValue": 1,
                "invalidMessage": "This value is required, its valid value greater or equal to 1",
                "description": "Specifies the number of cluster members that are hosting the web application. The default value is 1. Acceptable values range from 1-n."
            },
            {
                "id": "scaleInstanceRange1",
                "label": "Instance number range of scaling in/out",
                "type": "range",
                "min": 1,
                "max": 10,
                "required": true,
                "sampleValue": [
                    1,
                    10
                ],
                "description": "Specifies the scaling range for instance members that are hosting the web application. Acceptable values range from 1-10."
            },
            {
                "id": "triggerTime1",
                "label": "Minimum time (sec) to trigger add/remove",
                "type": "number",
                "max": 1800,
                "min": 30,
                "required": true,
                "sampleValue": 120,
                "invalidMessage": "This value is required, its valid value range is 30 to 1800",
                "description": "Specifies the time duration condition to start scaling activity. 
 The default value is 120 seconds. Acceptable values range from 30-1800."
            },
            {
                "id": "nbFiles",
                "label": "Scaling in/out when number of files is out of threshold range(#)",
                "type": "range",
                "required": true,
                "max": 100,
                "min": 1,
                "sampleValue": [
                    20,
                    80
                ],
                "description": "Specifies the #files threshold condition to start scaling activity. 
When the average #Files of your application platform is out of this threshold range, your platform will be scaled in/out. 
The default value is 20 - 80. Acceptable values range from 1-100."
            },
            {
                "id": "dirToMon",
                "label": "Directory to monitor",
                "type": "string",
                "required": true,
                "sampleValue": "/home/idcuser/files",
                "description": "Specifies the full path of the directory to monitor"
            }
        ]
    }
]

The ‘applicableTo’ attribute specifies to which component this policy is applicable.

The ‘groups’ element determines which types of policies we would like to implement. Here I implemented two policies, ‘Static’ and ‘Nb Files Based’. Of course, we will concentrate on ‘Nb Files Based’. For each group, we have to specify which attributes will be captured for this specific policy:

            {
                "category": "Scaling Type",
                "id": "Files",
                "label": "Nb Files Based",
                "defaultValue": false,
                "attributes": [
                    "nbFiles",
                    "dirToMon",
                    "scaleInstanceRange1",
                    "triggerTime1"
                ],
                "description": "Indicate that the action of adding or removing instances will be triggered by average CPU usage of existing instances."
            }

So the attributes will be ‘nbFiles’, ‘dirToMon’, ‘scaleInstanceRange1’ and ‘triggerTime1’.

Now we have to specify the nature of these attributes; this is done by listing each attribute in the ‘attributes’ element.

        "attributes": [
            {
                "id": "intialInstanceNumber",
                "label": "Number of Instances",
                "type": "number",
                "min": 1,
                "required": true,
                "sampleValue": 1,
                "invalidMessage": "This value is required, its valid value greater or equal to 1",
                "description": "Specifies the number of cluster members that are hosting the web application. The default value is 1. Acceptable values range from 1-n."
            },
            {
                "id": "scaleInstanceRange1",
                "label": "Instance number range of scaling in/out",
                "type": "range",
                "min": 1,
                "max": 10,
                "required": true,
                "sampleValue": [
                    1,
                    10
                ],
                "description": "Specifies the scaling range for instance members that are hosting the web application. Acceptable values range from 1-10."
            },
            {
                "id": "triggerTime1",
                "label": "Minimum time (sec) to trigger add/remove",
                "type": "number",
                "max": 1800,
                "min": 30,
                "required": true,
                "sampleValue": 120,
                "invalidMessage": "This value is required, its valid value range is 30 to 1800",
                "description": "Specifies the time duration condition to start scaling activity. 
 The default value is 120 seconds. Acceptable values range from 30-1800."
            },
            {
                "id": "nbFiles",
                "label": "Scaling in/out when number of files is out of threshold range(#)",
                "type": "range",
                "required": true,
                "max": 100,
                "min": 1,
                "sampleValue": [
                    20,
                    80
                ],
                "description": "Specifies the #files threshold condition to start scaling activity. 
When the average #Files of your application platform is out of this threshold range, your platform will be scaled in/out. 
The default value is 20 - 80. Acceptable values range from 1-100."
            },
            {
                "id": "dirToMon",
                "label": "Directory to monitor",
                "type": "string",
                "required": true,
                "sampleValue": "/home/idcuser/files",
                "description": "Specifies the full path of the directory to monitor"
            }
        ]
    }

For each of them, we define the ‘id’, ‘label’, ‘required’, ‘sampleValue’ and ‘description’ attributes and, depending on the ‘type’, some extra attributes such as ‘min’ and ‘max’. We can also specify an invalid message via the ‘invalidMessage’ attribute.

Once this is done and the plug-in is imported, we can see in the pattern editor that we have the possibility to add a scaling policy to the component and to specify the attributes of this policy.

Topology

The *.vm file must be adapted in order to inject the scaling policy attributes into the topology.

{
    "vm-templates": [
        {
#set( $spattrs = $provider.getPolicyAttributes($component, "ScalingPolicy") )
            "scaling":{
            	"role":"stddynfiles",
#if_value( $spattrs, "nbFiles", '
                "triggerEvents": [
                    {
                        "metric": "files.nbFiles",
                        "scaleOutThreshold": {
                            "value": $spattrs.nbFiles.get(1),
                            "type": "CONSTANT",
                            "relation": "&gt="
                        },
                        "conjection": "OR",
                        "scaleInThreshold": {
                            "value": $spattrs.nbFiles.get(0),
                            "type": "CONSTANT",
                            "relation": "&lt="
                        }
                    }
                ],')
#if_value( $spattrs, "intialInstanceNumber", '"min": $value,')
#if_value( $spattrs, "intialInstanceNumber", '"max": $value')
#if_value( $spattrs, "scaleInstanceRange1", '"min": $spattrs.scaleInstanceRange1.get(0),')
#if_value( $spattrs, "scaleInstanceRange1", '"max": $spattrs.scaleInstanceRange1.get(1),')
#if_value( $spattrs, "triggerTime1", '"triggerTime": $spattrs.triggerTime1')

            },
            "name": "${prefix}-Server",
            "roles": [
                {
                    "parms": {
                        "Attr1": "$attributes.Sattr1",
#if_value( $spattrs, "dirToMon", '"DirToMonitor": "$spattrs.dirToMon"')
                    },
                    "type": "stddynfiles",
                    "name": "stddynfiles",
        			"dashboard.visible":true
                }                 
            ],

            "packages":["Server"]
        }         
    ]
}

First we have to retrieve the policy attributes from the application model; this is done via the first velocimacro, and the result is stored in $spattrs.

Then we have to specify to which role this policy applies, and finally the policy itself via the ‘triggerEvents’ attribute. We define the policy only if a ‘nbFiles’ value has been defined. In ‘triggerEvents’ we define which metric has to be used. The metric value is composed of the collector category and the metric name, here ‘files’ and ‘nbFiles’ respectively. The scale-out and scale-in rules are defined by injecting the second and first elements of the ‘nbFiles’ range defined in the application model. The scale-in and scale-out thresholds are linked by an ‘OR’ conjunction, meaning the event is triggered if either the scale-in or the scale-out condition is satisfied.
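
For example, with the sample values above (a ‘nbFiles’ threshold range of 20-80, an instance number range of 1-10 and a trigger time of 120 seconds), the ‘Nb Files Based’ policy should produce a scaling section roughly like this in the generated topology:

            "scaling":{
                "role":"stddynfiles",
                "triggerEvents": [
                    {
                        "metric": "files.nbFiles",
                        "scaleOutThreshold": {
                            "value": 80,
                            "type": "CONSTANT",
                            "relation": ">="
                        },
                        "conjection": "OR",
                        "scaleInThreshold": {
                            "value": 20,
                            "type": "CONSTANT",
                            "relation": "<="
                        }
                    }
                ],
                "min": 1,
                "max": 10,
                "triggerTime": 120
            },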


After that we set up the minimum and maximum number of instances for both policies, ‘Static’ and ‘Nb Files Based’, as well as the ‘triggerTime’.
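
If the ‘Static’ policy is selected instead, only the initial instance number is defined, so the generated scaling section should reduce to something like this (here with ‘intialInstanceNumber’ set to 1):

            "scaling":{
                "role":"stddynfiles",
                "min": 1,
                "max": 1
            },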

Notice also that we add a ‘DirToMonitor’ parameter in roles.parms, initialized with the ‘dirToMon’ value of the application model. We did that because we will need it in the scripts that set up our collector.

Scripts

In the previous article I hard-coded the directory to monitor in the scripts, and I did it in the install script. Here, as it is now a role parameter, it must be handled in one of the role scripts, either configure.py or start.py. Since it is obviously part of the configuration of our role, we will put it in the configure script.

So, the configure.py script will look like this:

import maestro
import logging

maestro.role_status = 'CONFIGURING'
logger = logging.getLogger("configure.py")
logger.debug("Configure Server")
maestro.export['Slave_IP'] = maestro.node['instance']['private-ip']
nodeName = maestro.node['name']
roleName = nodeName + '.' + maestro.role['name']
logger.debug("Node:%s" % nodeName)
logger.debug("Role:%s" % roleName)
# Retrieve the directory to monitor from the role parameters
# injected by the topology.
dirToMonitor = maestro.parms['DirToMonitor']
# Create the directory for the collector.
rc = maestro.trace_call(logger, ['mkdir', dirToMonitor])
maestro.check_status(rc, 'Failed to create directory')
# Register the collector, passing the directory as its argument.
maestro.monitorAgent.register('{\
		"node":"%s",\
		"role":"%s",\
		"collector":"com.ibm.maestro.monitor.collector.script",\
		"config":{\
			"metafile":"/home/idcuser/collector/Server-meta.json",\
			"executable":"/home/idcuser/collector/files.sh",\
			"arguments":"%s",\
			"validRC":"0",\
			"workdir":"/tmp",\
			"timeout":"5"}}' % (nodeName, roleName, dirToMonitor))

Note how the ‘DirToMonitor’ parameter is retrieved from the topology and passed both to the directory creation and to the collector registration.
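
To make this concrete, here is roughly what the registration payload passed to maestro.monitorAgent.register would look like at runtime, assuming the sample ‘dirToMon’ value /home/idcuser/files (the node name, shown here as a placeholder, is assigned at deployment time):

{
    "node": "<nodeName>",
    "role": "<nodeName>.stddynfiles",
    "collector": "com.ibm.maestro.monitor.collector.script",
    "config": {
        "metafile": "/home/idcuser/collector/Server-meta.json",
        "executable": "/home/idcuser/collector/files.sh",
        "arguments": "/home/idcuser/files",
        "validRC": "0",
        "workdir": "/tmp",
        "timeout": "5"
    }
}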

You will find all the information on how to set up the collector in the previous article.

Be careful: for this example I changed the role name from ‘Server’ to ‘stddynfiles’.

PS: In the previous article we hard-coded the creation of the monitored directory in install.py. Now that the creation is handled by configure.py, don't forget to comment it out in install.py as follows:

rc = maestro.trace_call(logger, ['chmod', '+x', filesScript])
maestro.check_status(rc, 'Failed to chmod filecounter.sh')
# mkdir for the collector.
#rc = maestro.trace_call(logger, ['mkdir', '/home/idcuser/files'])
#maestro.check_status(rc, 'Failed to create directory /home/idcuser/files')

# download collector script
installerUrl = urlparse.urljoin(maestro.parturl, '../files/collectors/filecollector.json')


Final Test:

Now, if you deploy a pattern designed from the application model above, one server will be deployed.

Then log in to the server and add 5 files to the /home/idcuser/files directory.

After a few minutes, you will see a health warning on the deployed pattern, and a new server will be automatically deployed.

Once the new server is deployed, remove the files from the first server and you will see that the pattern automatically scales in and a server is removed.

Conclusion:

On top of being able to write your own collector, you can implement scalability rules based on the collected metrics, and all this in just a few steps.

References:

IWD 3.1 InfoCenter