OpenStack started with the primary objective of providing an open standard cloud computing platform and preventing vendor lock-in. It has been almost three years since the launch of OpenStack, and during this time almost all major companies in the cloud computing space, barring Amazon, have gotten behind it. I thought it would be interesting to review how OpenStack is doing in achieving its primary objective.
After working with various OpenStack-based clouds, we found that not all of them are created equal, and we have had to account for the differences in order to support application deployment and management across them.
To see how standardized OpenStack implementations are across vendors, let's compare Rackspace Open Cloud and HP Cloud, two of the major OpenStack-based clouds. We found the following differences.
Functional Differences

Some of the functional differences between HP Cloud and Rackspace Open Cloud are:
Syntax Differences

There are also some annoying syntax differences:
It will be interesting to see whether the various OpenStack implementations converge to a single standard, per the OpenStack charter, or diverge as each vendor implements its own flavor of the OpenStack API to differentiate itself. Check out the blog Cloud APIs - Why all the fuss around Cloud API Standards? for additional information and thoughts.
If you want a standard interface for application and workload management across various OpenStack-based clouds, check out our integration with Rackspace Open Cloud, HP Cloud, and OpenStack Cloud (API version 2.0). In addition to OpenStack we also support several other public and private clouds; check out the list and details for using other supported clouds here.
Use of the cloud for DR solutions is becoming more common; even organizations that are not using the cloud for mission-critical production applications are moving toward using it for application DR.
Business Case for Using Cloud for DR
Faster Recovery Time Objective (RTO): Typically, DR requires lengthy manual processes to fully restore business applications at the DR site. Having backup data and servers at the DR site is easy; restoring the entire application or service, however, takes time. For example, full application restoration requires starting services in a specified order, performing DNS and other configuration updates, and so on. In the cloud, IaaS APIs make it possible to use automation solutions like Kaavo IMOD to fully restore business applications without manual intervention. As a result, organizations get predictable recovery and a reduced RTO. Automating service or application recovery can reduce RTO from hours or days to minutes.
Shorter Recovery Point Objective (RPO): Instead of relying on offsite tape backups, organizations can reduce their RPO to minutes by maintaining near real-time data backups in the cloud. For faster transfer of large data sets, dedicated lines can be established between the customer datacenters and the cloud. The cost of a dedicated line depends on the distance from the customer datacenter to the cloud provider's peering point. For most use cases, VPN connections over the internet are sufficient for transferring data between the customer datacenter and the cloud.
Lower Costs: Typically, organizations pay a high price for standby infrastructure, especially servers, at the DR site. Using the cloud, there is no need to pay for servers at the DR site when they are not in use. The pay-as-you-use infrastructure model significantly reduces DR costs without compromising service levels.
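The ordered, automated restore described above can be sketched as a small dependency-driven startup routine. This is an illustrative sketch only; the service names and dependency graph below are made up, and Kaavo IMOD's actual implementation is not shown here.

```python
# Sketch of dependency-ordered application restore: each service starts
# only after the services it depends on. The three-tier graph below is
# a hypothetical example, not a real DR configuration.
def restore_order(deps):
    """Topologically sort services so dependencies start first."""
    order, seen = [], set()

    def visit(svc):
        if svc in seen:
            return
        seen.add(svc)
        for dep in deps.get(svc, []):
            visit(dep)          # start prerequisites before this service
        order.append(svc)

    for svc in deps:
        visit(svc)
    return order

deps = {
    "load-balancer": ["app-server"],
    "app-server": ["database"],
    "database": [],
}

startup_sequence = restore_order(deps)  # database first, load balancer last
```

In a real DR run, each entry in the sequence would trigger an IaaS API call to launch and configure that tier before moving to the next.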
The following are some best practices and lessons learned from the Cloud DR solutions we have implemented so far:
Cloud DR is Different from Traditional DR

Unlike traditional DR solutions, which rely on having backup infrastructure for the entire datacenter and require a large, costly implementation, Cloud DR can be implemented incrementally, application by application. For example, it is common for organizations to have a large shared database with multiple schemas supporting various applications. In the majority of cases this sharing is driven by server consolidation to increase the utilization of internal infrastructure. Not all applications using a shared database have the same service-level requirements; some applications are more critical than others. So as long as the schemas and application data are separate, it is better to remove the dependency on the shared database by giving each application a right-sized database in the cloud. This allows optimal prioritization and incremental delivery of the DR project based on the service levels of the individual applications.
Migration of Applications Using Single Sign-on with LDAP

When planning DR for individual applications, it is important to identify the dependent services and make sure they will be available as part of the DR solution. Enterprise customers typically use single sign-on with LDAP for managing authentication, so the best practice is to treat the single sign-on service as a critical application and implement the DR solution to bring up the single sign-on service first during the DR process. An automation solution like Kaavo IMOD enables customers to restore applications and services in the specified order automatically, without any manual intervention. During a real DR scenario there are many things going on, and it is easy to make mistakes under pressure if the application restoration process is not fully automated. To prevent surprises during an actual DR, it is important to have a fully automated solution for restoring applications and services.
Restoring Back to Normal Operations after DR

This is one area that is often overlooked or under-planned in DR projects. For companies using their own datacenters for production applications and the cloud for DR, processes and automation must be in place to fully restore the applications in the production datacenter, using the latest data from the cloud DR site, once the primary datacenter is back online. This step is not required for applications using the cloud as their primary site. For example, if an application is running in one cloud zone and after DR it is running in a different cloud zone, there is no need to restore it to the first zone as long as the service levels of both zones are the same. If you are deploying new applications, it is best to design for failure. For example, a distributed application running across various regions and cloud providers eliminates the need for traditional DR planning, because handling the failure of individual components is built into the design and deployment model of the application.
Handling Compliance in the Cloud (e.g. HIPAA, PCI, SOX, SAS-70)

Using available security technologies and processes, several companies have implemented applications in the cloud that comply with various standards, e.g. HIPAA, PCI, SOX, and SAS-70. Each compliance standard has its own nuances, but with proper planning you can address all compliance-related issues. This is a big topic on its own, so please contact us if you have specific questions. Cloud providers have published various case studies and best practices, e.g. the white paper by Amazon on HIPAA compliance.
Handling Public and Private DNS

A common use case for enterprise applications is to have a public DNS for public access and a private DNS over the internal network for accessing backend services, databases, etc. In these situations it is best to use a virtual private cloud like AWS VPC, or to overlay a private network with the same IP address range as the internal datacenter on any public cloud using open source solutions (refer to the blog Building a Private Cloud within a Public Cloud for details on how to implement a secure private network on any public cloud). For updating the public DNS entries of the restored application in the cloud, we use DNS automation services like AWS Route 53 or EasyDNS. Leveraging these services, Kaavo IMOD automatically updates the public DNS for the applications as part of the restoration during DR.
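As a rough illustration of the Route 53 update step, the sketch below builds the `ChangeBatch` payload that Route 53's `ChangeResourceRecordSets` API expects. The helper function, zone ID, record name, and IP address are all hypothetical placeholders; the actual boto3 call is shown only as a comment.

```python
# Sketch of a DNS record update during DR failover.
# build_change_batch is our own illustrative helper; the record name
# and IP below are placeholder values.
def build_change_batch(record_name, new_ip, ttl=60):
    """Build the ChangeBatch payload Route 53 expects for an UPSERT."""
    return {
        "Comment": "Point application at the DR site",
        "Changes": [{
            "Action": "UPSERT",   # create the record, or update it in place
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "TTL": ttl,       # short TTL so the failover propagates fast
                "ResourceRecords": [{"Value": new_ip}],
            },
        }],
    }

batch = build_change_batch("app.example.com.", "203.0.113.10")

# With boto3 installed and AWS credentials configured, the call would
# look roughly like this (not executed here; zone ID is a placeholder):
# import boto3
# boto3.client("route53").change_resource_record_sets(
#     HostedZoneId="Z_EXAMPLE", ChangeBatch=batch)
```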
Keeping the Application Database Up to Date

It is common for applications to have large databases. Moving the data to the cloud and keeping it current requires first loading the entire database in the cloud, then sending and merging incremental data into it. To address this use case, instead of maintaining a hot backup we use Kaavo IMOD to automatically bring up the database servers in the cloud whenever a new incremental backup is available, merge the incremental backup, save the merged database, and shut down the servers. This way, in case of DR, we always have the latest merged database available for restoring the application. This approach provides a reasonable RTO without incurring the additional cost of maintaining a hot database backup.
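The merge cycle above can be sketched as a simple ordered sequence of steps. Every step name below is a hypothetical stand-in for a provider API call or vendor backup tool; the sketch only records the order of operations.

```python
# Illustrative sketch of the incremental-merge cycle: bring up a
# database server, apply the incremental backup, snapshot the result,
# and shut the server down so it stops incurring cost.
def merge_cycle(incremental_backup):
    """Return the ordered steps of one merge cycle."""
    return [
        ("launch", "db-server"),           # IaaS API: start the instance
        ("restore", "latest-snapshot"),    # load the last merged database
        ("apply", incremental_backup),     # merge the new incremental data
        ("snapshot", "merged-db"),         # persist the merged state
        ("terminate", "db-server"),        # stop paying for the server
    ]

steps = merge_cycle("incr-2013-04-01.dump")
```

The key cost property is visible in the sequence: the database server exists only for the duration of the merge, yet the latest merged snapshot is always ready for a DR restore.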
Applying and Maintaining Patches

A typical application requires the following two types of updates during its lifecycle:
Updating Application Code: This is quite easy; using Kaavo IMOD, we set up automation to pick up the latest code and configuration for the application from the production deployment. This automation ensures that the code and configuration changes for each new release of the application or service are available in the cloud for DR.
OS Patches and Third-Party Software Updates: Sometimes custom patches or updates to third-party software or the OS are required. These changes are best handled as part of a change-control process requiring sign-off from the team owning the DR process. The DR team can review each change and, if required, make and test the corresponding changes to the DR automation for the application.
The growing use of Infrastructure as a Service to run custom applications, SaaS, and PaaS offerings is increasing the demand for management solutions that leverage the automation offered by IaaS. One of the biggest advantages of IaaS is the ability to automate tasks that were not possible to automate before. For example, a crashed server can be restored with an API call by launching a new instance. Effective use of IaaS APIs can dramatically reduce the time and resources it takes to deploy and provide production support for services and applications.
Various solutions and standards are available for automating the application management tasks that are usually performed manually. Before looking at them, let's take a look at what is needed to fully automate the deployment and runtime management of applications and business services on IaaS:
Low-level orchestration for configuring cloud resources, e.g. launching cloud servers, configuring them via dynamic scripting, and performing lifecycle tasks.
High-level orchestration for handling complex workflows and long-running processes, e.g. the ability to bring up an entire multi-tier application stack with a single click, or performing dependent workflows that span multiple application tiers and distributed components.
Some of the common tools and approaches that handle low-level orchestration are Puppet, Chef, and RightScripts along with RightScale Server Templates. In Kaavo IMOD, we handle this type of automation using Actions, which dynamically generate scripts or configuration files. The files are generated and copied to the servers just in time, using the Velocity Template Engine and the available deployment and application metadata. Unlike other approaches, Kaavo Actions don't require software agents; see my blog on the agent vs. agentless cloud management approach.
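To make the just-in-time generation idea concrete, here is a minimal sketch in the same spirit, shown with Python's stdlib `string.Template` rather than the Velocity engine the post describes. The template, file path, and metadata values are made up for illustration.

```python
# A minimal sketch of just-in-time script generation: a template is
# filled in with deployment metadata that is only known at launch time
# (e.g. the IP of a freshly started database server).
from string import Template

config_template = Template(
    "#!/bin/sh\n"
    "# Generated just in time from deployment metadata\n"
    "echo 'db_host=$db_host' >> /etc/myapp.conf\n"
    "echo 'db_port=$db_port' >> /etc/myapp.conf\n"
)

# Hypothetical metadata resolved at deployment time.
metadata = {"db_host": "10.0.1.25", "db_port": "5432"}

script = config_template.substitute(metadata)
# The generated script would then be copied to the target server
# (e.g. over SSH) and executed there -- no resident agent required.
```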
A very basic use case for high-level orchestration is the ability to bring up an entire complex multi-tier application stack, with all dependent resources and services, with a single click. Standards and solutions addressing this are the Amazon CloudFormation Service, Kaavo IMOD / Kaavo System Definition, OVF, and TOSCA. OVF is one of the early standards that tried to tackle this basic use case, using custom images and startup-time delays to order the boot sequence of servers (e.g. bring up the application server 60 seconds after the database server). The OVF standard dates from the pre-cloud virtualization days, before IaaS, and hasn't kept up with new developments; e.g. there is no concept of dynamic configuration or runtime maintenance workflows for an application. Also, maintaining images for all the dependent servers as part of the OVF package makes it difficult to apply and maintain patches on a running application. In Kaavo IMOD, we support OVF by providing the ability to bring up vApps as part of our integration with vCloud Director.
The Amazon CloudFormation Service captures the high-level orchestration for bringing up an entire service stack; however, you need to combine it with your own custom scripts or an external low-level orchestration solution like Chef or Puppet to properly configure the resources. Kaavo IMOD handles high-level orchestration using a workflow engine. A Kaavo System Definition captures both high-level and low-level orchestration information in one XML file, and the Kaavo engine then uses the information in the System Definition to automate both the deployment and the runtime management of the application or service. For more on the comparison of the Amazon CloudFormation Service and Kaavo IMOD, please refer to my earlier blog on the subject.
Topology and Orchestration Specification for Cloud Applications (TOSCA) is an interesting emerging standard. It is in very early stages, and there is no working implementation of TOSCA yet; version 1.0 of the draft specification was recently published. At a high level, the TOSCA vision is very similar to what we are doing at Kaavo, i.e. capture the deployment and operational behavior of a service or application in one template, not only to achieve automation but also to achieve portability of an application across various clouds. We believe that the biggest benefit of using the cloud will come from leveraging it to automate the operational processes for deployment and production support of applications.
Interestingly, similar to Kaavo, TOSCA uses XML to capture the deployment and management information for any given application. The major difference between TOSCA and the Kaavo System Definition is that TOSCA calls for using BPEL (Business Process Execution Language) to specify the operational behavior of services. At Kaavo, we allow users to define workflows by sequencing one or more actions in response to any event. For example, to shut down a database cluster, you may have to run vendor-provided scripts or tools to take a backup of the database before shutting down. You can't do these types of tasks using BPEL, as most current software and middleware packages don't provide a web services API for this kind of low-level orchestration work, so we have to use the tried-and-tested command-line tools. We will keep an eye on TOSCA as it evolves and matures. At Kaavo we already have an implementation which is very close to the TOSCA vision; to learn more about how we have done it, check out the IBM developerWorks article on Application-Centric Cloud Management. Also check out the guide to the Kaavo System Definition.
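The backup-before-shutdown workflow mentioned above can be sketched with a plain command-line invocation, which is exactly the kind of step BPEL cannot express. The commands below are placeholders (`echo` stands in for a real vendor tool such as a database dump utility); only the control flow is the point.

```python
# Sketch of the shutdown workflow: run the vendor backup tool first,
# and only proceed to shutdown if the backup succeeded.
import subprocess

def backup_then_shutdown(backup_cmd, shutdown_cmd):
    """Run backup_cmd; run shutdown_cmd only if the backup exits cleanly."""
    executed = []
    result = subprocess.run(backup_cmd, capture_output=True, text=True)
    executed.append(("backup", result.returncode))
    if result.returncode == 0:        # abort the shutdown on backup failure
        result = subprocess.run(shutdown_cmd, capture_output=True, text=True)
        executed.append(("shutdown", result.returncode))
    return executed

# Placeholder commands; a real workflow would invoke the vendor's
# backup and service-control tools here.
steps = backup_then_shutdown(["echo", "backup ok"], ["echo", "shutdown ok"])
```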
Sam Charrington recently posted a nice article called The Disintegration of PaaS. In it, he describes how early PaaS providers (PaaS 1.0) locked developers into their stacks and essentially prevented the movement of these apps onto other PaaS platforms. Sam also describes the coming of “PaaS 2.0,” a more open platform that allows different development stack modules for databases, application servers, etc. to be included within your application stack. PaaS 2.0 providers will also cultivate the community of tools that can be used within a deployment stack. The hope is that PaaS 2.0 will level the playing field and focus these providers on what matters most – service quality and customer service.
That takes me to the part that no one is really talking about when it comes to PaaS – the management aspect. Up to this point, we have mostly talked about the development side (pre-production). But PaaS then hosts the application that has been created. How will organizations dictate the SLAs to be in place when the application is live for others to consume (i.e. scale-up, scale-down, etc.)? How will they relay their DR requirements, and how will PaaS providers allow for different RTO/RPO strategies?
In order for PaaS to live up to its hype of becoming mainstream and making it ever easier for consumers to use the cloud, providers need a mechanism for capturing an application’s service level requirements. Building an app and placing it in the cloud is one thing, but if we actually want to allow IT to leverage the cloud there are best practices and specific business policies that need to be communicated in some way. Without this capability, PaaS will only be used for non-mission-critical workloads.
This is where cloud management solutions come in. For IaaS, we have talked about how cloud management solutions help deploy and manage applications and workloads in the cloud. I think that without these tools, PaaS players will not be able to provide the second part of their offering. PaaS providers need to leverage these mainstream tools and provide an interface to them.
As I write about this, I am also thinking about how we are further validating the need for an application-centric approach. As PaaS adoption continues, it will become more apparent that it is not about provisioning and managing servers or groups of servers in the cloud (infrastructure-centric), but rather about the application or service. We need context! Your customers care about consuming your applications and services, and expect a certain level of service; they don't care about servers. The notion of PaaS is to further hide the IaaS complexity behind the main goal: build and deploy apps. At Kaavo, we have always preached the application-centric approach in our solutions, because we have recognized that at the end of the day it's all about the apps. We were probably early in talking about this back in 2007, but now, I think, is the time for this approach to take center stage.
To learn more about the application-centric approach, please go and download our new eBook on the topic.
Automating the deployment and runtime management of large application deployments running on multiple cloud resources across various cloud providers is a challenging problem. Managing remote resources is not a new problem; the big difference is that in the cloud, resources are added and removed dynamically and at a much larger scale. Before the cloud, people built management systems using two approaches: agent-based and agentless.
Historically, the major trade-off between the agent and agentless approaches has been control versus rollout time and maintenance cost. You usually get more control with the agent approach than with the agentless approach, whereas the agentless approach is easier to deploy and manage, as it doesn't require rolling out new agents and maintaining their versions.
At Kaavo, for deploying software and performing runtime configuration management, we chose to use SSH instead of requiring proprietary agents to be installed on the managed cloud resources. Some of the design considerations behind our decision are as follows:
Easier Rollout and Ongoing Maintenance: Using SSH for management gives you the same or more control and security compared to an agent-based approach, without the overhead of deploying and maintaining proprietary agents on all images across all supported datacenters.
Better Security: Both the agent-based and agentless (SSH) approaches require communication between the management servers and the servers/resources being managed. As a result, you have to properly manage the firewall rules on the communication ports (for incoming and outgoing packets) on the cloud servers/resources to avoid leaving holes for potential intruders to exploit. However, since the SSH protocol has been around for a while and has been well tested at large scale, it is less likely to have unknown security vulnerabilities than a newly written proprietary agent or protocol. Customers who are very sensitive about security and don't want an application running in the cloud to manage their private cloud infrastructure can choose an onsite deployment of Kaavo IMOD.
Greater Control and Flexibility: In an agent-based approach, the agent code executed on the servers, or the server-side scripts, can't be changed on the fly when there is an unexpected change in the environment. In our agentless approach, by contrast, we generate the configuration scripts and files just in time and send them to the servers for execution. This on-demand, just-in-time generation of scripts gives us greater flexibility and control in managing cloud resources.
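The agentless push can be sketched as an ordinary SSH invocation that streams a freshly generated script to the target server. The helper, host, user, and key path below are hypothetical; only the argv is built and the actual remote execution is left as a comment.

```python
# Sketch of how an agentless manager might push a just-in-time script
# to a server over SSH. build_ssh_command only constructs the argv.
def build_ssh_command(host, user, key_path):
    """Build an argv that pipes a script to a remote shell over SSH."""
    return [
        "ssh",
        "-i", key_path,                      # per-server key; no agent installed
        "-o", "StrictHostKeyChecking=yes",   # refuse unknown host keys
        f"{user}@{host}",
        "bash -s",                           # execute the script read from stdin
    ]

argv = build_ssh_command("203.0.113.7", "admin", "/keys/dr-node.pem")

# In a real run, the generated script would be streamed to the server:
# subprocess.run(argv, input=generated_script, text=True)
```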
To roll out a scalable agentless approach for cloud management, we had to solve some interesting, complex problems: automatically managing firewall rules for managed servers in a dynamic cloud environment, handling distributed event queues, and handling execution-order dependencies for parallel processes. We solved these hard problems because we didn't want to compromise by taking the easy way out with configuration agents. If you have any further questions, or want to learn more about why we chose an agentless approach for cloud management, please contact us.