What Comes After the Cloud?


Photo by Sharosh Rajasekher / Unsplash

Prognosticating on the future of things, especially technology, and even more especially in the cloud/devops/infra/SRE world, is fraught with dangers seen and unseen. At best you look like a wizard, a modern-day Nostradamus able to predict the rather fickle winds of change; at worst you come out looking like Rasputin after the whole being made "un-alive" ordeal. But at the risk of losing my street cred, I would like to look forward and see what the future of computing might look like. Why don't you come with me on this magical journey...

What's Wrong With the Cloud Anyway?

There's nothing wrong with the cloud, per se, but there's certainly enough to be unhappy about that might make someone contemplate the next move for their architecture.

Cost

This is probably the most egregious issue with public cloud platforms like AWS, Google Cloud, Azure, and others when compared to a traditional datacenter approach. The cost associated with operating a fully baked cloud environment can be astronomical, especially if your requirements include any "advanced" computing, like AI/ML or literally anything that touches a GPU, bandwidth-intensive applications like video streaming or content delivery, or anything that involves storing large volumes of data. Instances that have access to GPUs are crazy expensive, egress bandwidth is outrageous, and paying for storage on a per-month basis is not cost-effective at all when compared to a solution where storage is purchased once and used until it dies. The same can be said for compute capacity, too.

Complexity

Have you ever taken a gander at the AWS "Well-Architected" documentation? Have you ever Googled how to do a seemingly simple thing only to be met with a spaghetti graph of AWS services, glue code, and a potential price tag that would shock an Emirati oil baron? Yeah, me too, and the complexity of it all is quite bonkers. But beyond the surface-level complexity of simply using cloud-provider services, there's a complexity in the underlying systems that tends to go ignored when trying to figure out how stable an environment is or will be. If we're being honest, we all know why it's a bad idea to run a single-region application in AWS's us-east-1 region: because that region tends to go down more often than the others. The systems that underpin public cloud offerings are large, complex, and require an extraordinary amount of manpower to maintain, yet we treat these abstracted services as discrete physical things: irreducible units of compute, network, and storage. But, as with old models of atomic structure, these units are not irreducible; they are merely collections of resources, drawn from a larger pool, that have been labeled and provisioned for the user. This means that every additional resource in your application stack is another dependency that has to stay up, so you are multiplying uncertainty, not decreasing it, and increasing the likelihood that your application, service, or whatever will have some kind of downtime event due to underlying infrastructure issues.
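
To put rough numbers on that multiplication, here is a minimal sketch. It assumes every service fails independently and that the application needs all of them up at the same time, which is a simplification. The 99.9% figure is an illustrative placeholder, not a published SLA for any provider, and the function names are made up for this example.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def composite_availability(availabilities):
    """Availability of a stack that needs every dependency up at once.

    Assumes failures are independent, which real outages often are not.
    """
    return prod(availabilities)


def expected_downtime_hours(availability, hours=HOURS_PER_YEAR):
    """Expected hours of downtime per period for a given availability."""
    return (1.0 - availability) * hours


# Hypothetical stack: five managed services, each assumed to be up 99.9% of the time.
stack = [0.999] * 5
combined = composite_availability(stack)

print(f"single service: {expected_downtime_hours(0.999):.1f} h/year")
print(f"five in series: {expected_downtime_hours(combined):.1f} h/year")
```

Under those assumptions, one service at 99.9% costs a little under nine hours a year, while five of them chained together cost closer to forty-four. Nothing got less reliable; there are simply more things that all have to work.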

Security

When speaking about complexity it's only natural to flow into a discussion about security. After all, the more complex an environment, the harder it is to properly secure that environment. Not only that, the very nature of public cloud environments ensures that they will continue to be targeted by bad actors looking to gain access to customer environments and data. As long as governments, financial institutions, schools, Fortune 500 companies, and other criminal organizations use providers like AWS, any customer that migrates to those providers is opening themselves up to a large and scary world of security nightmares. As we saw in the Capital One breach in 2019, all it took was one motivated person and a misconfigured firewall that exposed an IAM role's credentials to compromise one of the largest banks in the United States. If it was that easy for them, what other more complicated attack vectors don't we know about?

What The Future May Hold

Looking at the trends of modern computing, the software that we're writing and producing, and the kinds of problems that have yet to be solved, it might seem that cloud computing is here to stay as the dominant force, but I don't think that will be the case. There are a few key emerging technologies that I think will play an outsized role in the seismic shift to come.

ARM

Don't get me wrong, I think Apple really dropped the ball on the switch to ARM-based processors. Apple's historic disregard for enterprise customers and smug smugness when it comes to dealing with software vendors has made the rollout of the Apple Silicon chips in their laptops and desktops frustrating at best. Launching a brand new architecture without support for basic development tools like Docker was a gross oversight, but the move to ARM in general was not a mistake.

As the industry as a whole has shifted towards adopting open standards, the hardware game has been pitifully behind in this adoption of openness. Intel, and by proxy AMD, were free to do whatever they wanted and held software vendors captive because of it. But with the rise in popularity and power of ARM-based processors, and with the ARM architecture being licensed to many chipmakers rather than controlled by one or two, the time is right for the general shift of most computing devices from x86 to ARM. Not only are ARM processors generally less power-hungry, with better performance-per-pound than x86-based ones, but they're simpler to program for because of their reduced instruction sets. Simplicity means speed and stability. And because so many vendors build on the same licensed architecture, secondary effects like more stable virtualization should find their way into new devices too. With macOS being primarily ARM-based now, Windows 10+ being able to support ARM natively, and Linux making its transition, the time is right for ARM to become the de facto processor architecture.

Blockchain

No, it's not what you think. I haven't been abducted and replaced with some weird crypto-bro slovenly slurring about HODL DOGE or some other nonsense. I have, however, been paying attention to how this stuff works, and there's one bit of tech at the heart of blockchain that I think is ready to explode into non-blockchain-based solutions: DHT.

A distributed hash table, or DHT, in the context of a distributed application, is a method for discovering, cataloging, and communicating with peers on a network. By discovering and maintaining a list of all its peers, an application can easily share information across a large network very quickly. While this idea might sound similar to Raft, unlike Raft there is no requirement for a "leader" in the network, which enables developers to write "masterless" distributed applications. These applications can be as simple as using DHT to share some state between them, or as complex as a fully distributed, masterless database. How do I know that a fully distributed, DHT-based key-value database would work? Well, I wrote one, and it works amazingly. Each node broadcasts every change to the rest of the network so that every node gets every update and they all stay in sync. New nodes download copies of the database from their nearest peer and join the update network once they're consistent with their peers. The database I wrote (sorry, the source isn't available just yet) uses the Kademlia module from the Ethereum Golang library, but there's nothing to say that there can't be a standalone DHT-based clustering protocol. And, if all of this is sounding somewhat familiar to you, it's because it's essentially the same tech that P2P file sharing and torrenting are built on.
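
For a feel of how peers find each other without a leader, here is a minimal sketch of the idea at the heart of Kademlia: distance between two IDs is just their XOR. This is not the code from the database mentioned above, and it is not the Ethereum library's API. The function names, the SHA-1 ID scheme, and the peer names are all assumptions made for illustration.

```python
import hashlib


def node_id(name: str) -> int:
    """Derive a 160-bit ID from a name (SHA-1 here purely for illustration)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def bucket_index(own_id: int, other_id: int) -> int:
    """Routing-table bucket for a peer: position of the highest differing bit.

    Returns -1 when the IDs are identical (a node does not store itself).
    """
    return xor_distance(own_id, other_id).bit_length() - 1


def closest_peers(target: int, peers: list[int], k: int = 3) -> list[int]:
    """The k known peers whose IDs are nearest to the target under XOR distance."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]


# Hypothetical peer names; a real network would use keys or random IDs.
peers = [node_id(f"node-{i}") for i in range(8)]
key = node_id("some-key")
for peer in closest_peers(key, peers):
    print(f"{peer:040x}  bucket={bucket_index(key, peer)}")
```

Every node can run the same calculation on its own and arrive at the same answer about who is "closest" to a key, which is why nobody needs to be elected to coordinate. The sketch leaves out everything that makes a real DHT hard: the network lookups, peer churn, and the replication of data between nodes.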

4th Generation Container Orchestration

I believe that containers are the future and will be the fundamental building blocks for platforms and services moving forward. Serverless has its place and serverless functionality can be built on top of container solutions, but I think the container is the best application-level packaging medium available to us. However, I don't think it will happen without another evolution of container orchestration technologies.

First-generation orchestration focused on local machines only, and there was really only one solution: Docker. While Docker by itself was immensely powerful and enabled developers to write and deploy software that was portable, and changed the face of application development in doing so, it really only represented the first step in the evolution of containers.

Second-generation orchestration came along in the form of tools like Docker Swarm and Apache Mesos. While both are powerful tools that enabled organizations to take great strides in containerizing their production workloads, they had some real fundamental shortcomings and missing features that still make running them in production somewhat risky. Swarm and Mesos are schedulers only and don't provide any real services beyond that; networking, storage, routing, provisioning, etc. were all left up to external tools. Moreover, Mesos required a running Apache ZooKeeper cluster in order to properly track and orchestrate containers, adding yet another layer of complexity and potential issues and making it riskier and more complicated to deploy properly. As such, Mesos only saw mild adoption across organizations.

Sitting somewhere between the 2nd and 3rd generations, like a 2.5th-gen, is HashiCorp Nomad. While it doesn't natively manage storage or routing, it at least has the capability to extend and understand providers for those capabilities. However, much like Mesos with ZooKeeper, it typically leans on a Consul cluster to function properly, and it suffers from issues of scaling.

Third-generation orchestrators are a category of one: Kubernetes. Kubernetes, or k8s, has dominated the orchestration landscape for a few years now and will continue to play a pivotal role in the container ecosystem. However, I don't believe that k8s is the be-all/end-all solution to orchestration. It suffers from the same shortcomings that a lot of Google-led products do: lots of YAML, lots of choices, and no opinions. Kubernetes is great in that it allows the freedom to do almost anything you could ever want with it, but it's that same freedom that I think is its downfall. All of the options with no opinions on how to use them is a recipe for overly complex, opaque, frustrating software. This is no more apparent than when looking at Kubernetes manifests – configuration files – and trying to piece together what it is they're trying to do. Terminology is esoteric at times, important options can or must be defined in multiple places, and often it takes multiple configuration files to do a single simple deployment.

This is where I think a fourth-generation orchestrator has an opportunity to really shine. A platform that is powerful enough for the power-users to get what they need out of it, but that provides a "golden path" for the majority of its users. One that leverages the cost savings of ARM-based computers, that can pool resources across a wide array of hardware and intelligently deploy containers or functions to nodes based on multiple strategies, that simplifies its configuration like Nomad but is powerful enough to create advanced configurations without much trouble. An orchestrator that doesn't rely on cluster leaders or election voting because it's using DHT as its network backbone. Additionally, a 4th-generation orchestrator needs to have the ability to understand and communicate with other orchestrator clusters (to be globally aware) and possibly remotely configure and route traffic between them. That Kubernetes is not natively able to manage multiple clusters of orchestrators seems like an oversight and fits right into the vision that Google had for it.
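
As a rough illustration of what "multiple strategies" could mean in practice, here is a minimal sketch of two common placement approaches: bin-packing (fill the busiest node that still fits) and spreading (use the emptiest node). No such orchestrator exists yet, so every name here, from `Node` to `place`, is hypothetical, and a real scheduler would weigh far more than CPU and memory.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_free: float  # free CPU cores
    mem_free: int    # free memory in MiB


def fits(node: Node, cpu: float, mem: int) -> bool:
    """True when the node has enough free CPU and memory for the workload."""
    return node.cpu_free >= cpu and node.mem_free >= mem


def binpack(nodes, cpu, mem):
    """Pick the fitting node with the least free CPU, packing work tightly."""
    candidates = [n for n in nodes if fits(n, cpu, mem)]
    return min(candidates, key=lambda n: n.cpu_free, default=None)


def spread(nodes, cpu, mem):
    """Pick the fitting node with the most free CPU, spreading work out."""
    candidates = [n for n in nodes if fits(n, cpu, mem)]
    return max(candidates, key=lambda n: n.cpu_free, default=None)


STRATEGIES = {"binpack": binpack, "spread": spread}


def place(nodes, cpu, mem, strategy="binpack"):
    """Choose a node with the named strategy and reserve its resources.

    Returns None when no node can fit the workload.
    """
    node = STRATEGIES[strategy](nodes, cpu, mem)
    if node is not None:
        node.cpu_free -= cpu
        node.mem_free -= mem
    return node


# Hypothetical mixed-hardware cluster.
cluster = [Node("arm-a", 4.0, 8192), Node("arm-b", 2.0, 4096), Node("x86-a", 16.0, 65536)]
print(place(cluster, 1.0, 1024, "binpack").name)
print(place(cluster, 1.0, 1024, "spread").name)
```

The "golden path" idea would be a sensible default strategy that most users never have to think about, with the dictionary of strategies left open for the power-users who do.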

So What Comes Next?

As the pendulum swings from on-prem to colo to cloud, I think the next ten years will be defined by a rationalized look at the cloud age, with more emphasis put on smaller distributed computing cells. Not necessarily true "edge" computing at its core, but a scheme where, instead of having 3 regions in AWS to serve their US traffic, organizations pivot to having 6, 7, or 8 compute cells in strategic cities, putting the applications, data, and compute power closer to the customer without losing economies of scale. And I think all of this will be powered by ARM-based servers: they're quieter, cheaper, available from multiple vendors, and can be made smaller due to reduced cooling requirements. This will allow companies of a certain size to focus more on rapid innovation, customer experience, and flexibility without the oversized price tag of a public cloud footprint.
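
To make "closer to the customer" a little more concrete, here is a minimal sketch of picking the nearest compute cell for a client by great-circle distance. The cell list is hypothetical, the coordinates are approximate city centres, and real traffic routing would go by network latency and capacity rather than geography alone.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

# Hypothetical cell locations as (latitude, longitude), approximate.
CELLS = {
    "new-york": (40.71, -74.01),
    "atlanta": (33.75, -84.39),
    "chicago": (41.88, -87.63),
    "dallas": (32.78, -96.80),
    "los-angeles": (34.05, -118.24),
    "seattle": (47.61, -122.33),
}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_cell(client, cells=CELLS):
    """Name of the cell with the shortest great-circle distance to the client."""
    return min(cells, key=lambda name: haversine_km(client, cells[name]))


# A client somewhere around Denver (approximate coordinates).
print(nearest_cell((39.74, -104.99)))
```

With only three cloud regions, a large share of customers would sit far from all of them; with six to eight cells, the worst-case distance shrinks considerably.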


So what do you think? Am I close or way off? Let me hear your opinion in the comments below!