We all live in the clouds, or at least want to. Some of us live in many clouds, or at least want to. Cloud life has never been easier, but the reality is most folks still struggle to design, build, manage, monitor, secure, troubleshoot, and optimize their cloud operations. How do we make everyone’s lives easier?
Today’s cloud tooling and operations are scattered, across the clouds, through the stack, across various domains & features, and all along the spectrum from traditional IT to DevOps.
There are dozens of cloud ops tools, and you often need dozens to get stuff done.
We need a better way, via the proverbial single-pane-of-glass unified system that brings together all the world’s key clouds, all the life cycle phases, all through the stack, across all the features and use cases in use today.
What would such a system look like?
Unified cloud operations means just that, unified, first bringing together all the clouds, and then bringing together all the key ops tools & features for all.
Much as ERP systems such as SAP unified diverse business operations, Integrated Cloud Ops tools are needed to unify all the clouds, systems, and services into a manageable environment most companies can use.
The first requirement is working with a variety of clouds, both public and private, plus pseudo-clouds like Kubernetes. This means handling AWS, Azure, and Google plus VMware and OpenStack, along with global or Chinese clouds including AliCloud, Tencent, and Huawei.
Clouds all look the same … in reality they’re really different, in ways big & small.
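To make those differences concrete, here is a minimal sketch of how a unified tool might fold per-cloud VM records into one common shape. The record layouts are simplified and illustrative, the state mappings cover only a few values, and names like `VM`, `normalize`, and the `powerState` string format are assumptions made for this example, not any vendor's full API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Provider-neutral view of a virtual machine."""
    cloud: str
    vm_id: str
    state: str  # normalized to "running", "stopped", or "unknown"


# Each cloud reports power state with its own vocabulary.
# These mappings cover only a few illustrative values.
AWS_STATES = {"running": "running", "stopped": "stopped"}
AZURE_STATES = {"running": "running", "deallocated": "stopped", "stopped": "stopped"}
GCP_STATES = {"RUNNING": "running", "TERMINATED": "stopped"}


def from_aws(rec):
    return VM("aws", rec["InstanceId"], AWS_STATES.get(rec["State"]["Name"], "unknown"))


def from_azure(rec):
    # Assumed shape: power state arrives as a string like "PowerState/running".
    raw = rec["powerState"].split("/")[-1]
    return VM("azure", rec["vmId"], AZURE_STATES.get(raw, "unknown"))


def from_gcp(rec):
    return VM("gcp", rec["id"], GCP_STATES.get(rec["status"], "unknown"))


# One adapter per cloud; adding a cloud means adding one function here.
ADAPTERS = {"aws": from_aws, "azure": from_azure, "gcp": from_gcp}


def normalize(cloud, rec):
    """Convert a cloud-specific record into the common VM shape."""
    return ADAPTERS[cloud](rec)
```

The point of the adapter table is that everything downstream (console, CMDB, monitoring) only ever sees `VM`, so the big and small differences stay contained in one place.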
Engineers should be able to design cloud infrastructure and services in three ways. In a perfect world, one method would be enough, but real systems need a mix of methods to suit various teams, systems, and situations:
- Visual Designer — See what you are building, especially how it relates to other components, with property editors, validators, wizards, etc.
- Component Catalogs — Use pre-designed and approved components to build up more complex systems with confidence and governance.
- Infrastructure-as-Code — Automation-driven infrastructure using Terraform or similar, via git, CI/CD, etc. with a DevOps focus.
Tools should also be able to reverse-engineer existing systems, as not every project is a shiny new system, and many things need to be taken over and managed or updated as they are. This is quite challenging as existing systems are often very diverse, undocumented, and have evolved in a myriad of ways over many years.
Once designed, tools should provision BOTH the cloud resources and the Full Stack software services such as Nginx, Tomcat, MySQL, etc. as needed. This means using cloud APIs or tools like Terraform, plus Ansible, git, configs, etc. to handle the myriad of services required.
Once a system is built, or reverse-engineered, the tools need to be able to change or update it, at all levels, from changing cloud VMs or services, to updating nginx vhost SSL options.
Clouds are dynamic and ever-changing, so tools need to keep up with them. New VMs, services, and containers pop up and disappear continually and thus must be tracked, monitored, secured, etc. alongside their longer-lived brethren.
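One simple way to track that churn is to compare successive inventory snapshots. The sketch below assumes each snapshot is just a set of resource IDs gathered by some discovery step that is not shown; `diff_inventory` is a hypothetical helper name.

```python
def diff_inventory(previous, current):
    """Compare two snapshots (sets of resource IDs) and report the churn."""
    previous, current = set(previous), set(current)
    return {
        "appeared": sorted(current - previous),     # new since the last scan
        "disappeared": sorted(previous - current),  # gone since the last scan
        "unchanged": sorted(current & previous),    # longer-lived resources
    }


if __name__ == "__main__":
    last_scan = {"vm-1", "vm-2", "pod-a"}
    this_scan = {"vm-1", "pod-b", "pod-c"}
    print(diff_inventory(last_scan, this_scan))
```

Anything in `appeared` would then need monitoring and security policies attached, and anything in `disappeared` would need its alerts and CMDB entries retired.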
CMDB & Dependencies
Modern tools need to incorporate all of the cloud infrastructure and running services into the CMDB, including deep & detailed parsing of every config of everything, from VMs to disks to RDS to OS, web servers, JVMs, and more.
In addition, dependencies must be auto-discovered, so users know what is talking to, or dependent on, what. This is especially true in today’s world of micro-services and containers, where dozens of pieces & parts interact in complex and ever-changing ways.
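As a rough illustration of what discovered dependencies make possible, the sketch below turns observed (caller, callee) connections into a reverse dependency map and then answers "what is affected if this service fails?". How the connections are observed is out of scope here, and the function and service names are hypothetical.

```python
from collections import defaultdict, deque


def build_dependents(connections):
    """Map each service to the set of services that call it directly."""
    dependents = defaultdict(set)
    for caller, callee in connections:
        dependents[callee].add(caller)
    return dependents


def impacted_by(service, dependents):
    """Return every service that depends on `service`, directly or indirectly."""
    seen = set()
    queue = deque([service])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, ()):
            # Skip already-visited nodes so cycles cannot loop forever.
            if dep not in seen and dep != service:
                seen.add(dep)
                queue.append(dep)
    return seen


if __name__ == "__main__":
    observed = [("web", "api"), ("api", "db"), ("worker", "db")]
    deps = build_dependents(observed)
    print(sorted(impacted_by("db", deps)))
```

The same traversal can feed diagrams, alert grouping, and troubleshooting views, since all of them start from the question of who sits upstream of a failing component.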
Change in modern systems is constant, so users need to be able to see and change things, from basic start/stop VMs to dealing with disks, networks, security groups, services, users, and more.
This can be done in a structured way through the Design & Build processes, but also more ad hoc through a unified cloud console that can do all of the most useful tasks in a single interface.
Cloud monitoring is a core need for every user, usually through the cloud’s own service, such as CloudWatch, which provides metrics along various dimensions and limited alerting.
However, many cloud services such as RDS also need monitoring via direct connections, to get vital data not available through CloudWatch.
Being full-stack, unified tools also need to directly monitor the OS, web servers, databases, app code, containers, and more. This is used to build a full-stack, full picture of the system, plus build dependency trees used for diagrams, alert grouping, and troubleshooting.
Clouds and their services emit an astonishing array of logs, often in different formats and via different methods, channels, and endpoints.
Tools need to collect, visualize, search, and analyze the logs, from core cloud functions like CloudTrail and cloud automation, to services like Load Balancers and RDS, to container logs and serverless logs from things like Lambda.
Being full-stack, unified tools also need to capture service logs such as from the OS, web servers, databases, app code, containers, and more. All of these are also in different formats, some standardized, but many dynamic ones, too.
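To show what handling mixed formats can look like, here is a minimal sketch that folds JSON log lines and common-log-style web access lines into one record shape, with a raw fallback for everything else. The output keys and the assumption that JSON logs carry `timestamp` and `message` fields are choices made for this example; real sources vary widely.

```python
import json
import re

# Matches the leading part of a common-log-style access line.
ACCESS_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def normalize_line(line, source):
    """Turn one log line into a common record, whatever its format."""
    line = line.strip()
    if line.startswith("{"):
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            return {
                "source": source,
                "format": "json",
                "timestamp": data.get("timestamp"),  # assumed field name
                "message": data.get("message", ""),  # assumed field name
                "fields": data,
            }
    match = ACCESS_RE.match(line)
    if match:
        return {
            "source": source,
            "format": "access",
            "timestamp": match["ts"],
            "message": match["request"],
            "fields": match.groupdict(),
        }
    # Unknown format: keep the text so it stays searchable.
    return {"source": source, "format": "raw", "timestamp": None,
            "message": line, "fields": {}}
```

Keeping an explicit `raw` fallback matters because of the many dynamic formats: a line that cannot be parsed should still be collected and searchable rather than dropped.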
Govern & Secure
Cloud Governance is a key aspect of larger and more dynamic systems, to ensure that things get built and changed in safe, secure, reliable, and approved ways.
This means applying sets of best practice and customized policies to the entire infrastructure and OS/service base, across all the clouds. Users can then get dashboards, summaries, and actionable alerts about their entire system.
Good governance & security also includes procedural, manual, and automated healing, such as closing wide-open security groups or not allowing VMs with unencrypted drives.
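The two examples above can be sketched as simple policy checks over a resource inventory. The resource dictionaries here use a made-up, simplified shape (`type`, `id`, `ingress`, `encrypted`), and the check names are hypothetical; a real tool would read these from each cloud's API and support far more policies.

```python
# CIDR blocks that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def check_security_group(sg):
    """Flag ingress rules that are open to the whole internet."""
    return [
        f"{sg['id']}: port {rule['port']} is open to {rule['cidr']}"
        for rule in sg.get("ingress", [])
        if rule["cidr"] in OPEN_CIDRS
    ]


def check_disk(disk):
    """Flag disks that are not encrypted."""
    if disk.get("encrypted"):
        return []
    return [f"{disk['id']}: disk is not encrypted"]


# One check per resource type; unknown types are skipped.
CHECKS = {"security_group": check_security_group, "disk": check_disk}


def evaluate(resources):
    """Run every applicable check and collect the findings in order."""
    findings = []
    for res in resources:
        check = CHECKS.get(res["type"])
        if check:
            findings.extend(check(res))
    return findings
```

Findings like these can drive a dashboard, an alert, or an automated fix, depending on how much the team trusts the policy.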
Clouds help save money, but can waste a lot of it, too. Tools need to retrieve detailed billing data, ideally tagged, and provide analyses, alerts, budgets, and recommendations for things like Reserved Instances.
Cost control also includes evaluating under-utilized capacity and supporting enterprise chargebacks via tagged cost & resource allocations.
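Both ideas reduce to fairly small calculations once the data is in hand. The sketch below assumes utilization and billing data have already been retrieved into plain dictionaries; the field names (`avg_cpu_percent`, `cost`, `tags`), the `team` tag key, and the 10% CPU threshold are illustrative assumptions, not recommendations.

```python
from collections import defaultdict


def underutilized(vms, cpu_threshold=10.0):
    """Return IDs of VMs whose average CPU is below the threshold."""
    return [vm["id"] for vm in vms if vm["avg_cpu_percent"] < cpu_threshold]


def chargeback(line_items, tag_key="team"):
    """Sum billing line items per tag value; untagged spend is kept visible."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)


if __name__ == "__main__":
    vms = [
        {"id": "vm-1", "avg_cpu_percent": 3.5},
        {"id": "vm-2", "avg_cpu_percent": 62.0},
    ]
    bill = [
        {"cost": 10.0, "tags": {"team": "payments"}},
        {"cost": 5.5, "tags": {"team": "payments"}},
        {"cost": 2.0, "tags": {}},
    ]
    print(underutilized(vms))
    print(chargeback(bill))
```

Reporting an explicit `untagged` bucket is a deliberate choice: it shows how much spend cannot yet be allocated, which is usually the first thing to fix before chargebacks are trusted.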
Clouds are complex beasts, and lots of folks keep many of them around. Thus they need unified tools to tame all the diversity & complexity of dynamic multi-cloud environments.
Common cloud operations and management platforms provide a way to begin to tame these complexities, to streamline user interfaces and training, and to focus on delivering systems for users, rather than fighting in the clouds.