As you may know, at Spreaker we are huge fans of extreme automation. Actually, it goes beyond that: automation is essential to our existence. It’s the only way we can handle our operations efficiently while continuing to deliver features and services to our users, even with a small engineering team. This was one of the big reasons we had to be early cloud adopters, moving our entire infrastructure to AWS early on.
The current state
Spreaker is composed of a universe of services. Each one of these services has a specific job and is critical to delivering the best possible experience to our users. As I mentioned, we currently host all our services (frontend, APIs, databases and a number of workers) on AWS. This allows us to keep a competitive advantage by leveraging what AWS provides: no big upfront commitments, dynamic scaling to accommodate load spikes, load balancing to spread traffic across multiple instances, and advanced monitoring, just to name a few.
This has served our purposes well so far, yet we still have a number of pain points that need to be addressed in order to accommodate business growth and everything that comes with it, like growing traffic and a growing user base.
What could be improved
It only takes a few minutes to scale, which is great when your traffic grows slowly and organically. However, that speed still isn’t enough to efficiently handle sudden, huge spikes: your existing machines suffer while new servers are booted and provisioned.
With a single server acting as the infrastructural atomic unit (where each server delivers a single service), we might find ourselves in the situation where we have to scale a cluster of machines while, at the same time, others don’t do anything. A more practical example: we might need to scale frontend machines while workers are doing little-to-nothing. This leads to unnecessary costs.
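To make the cost argument concrete, here is a minimal sketch with made-up numbers. The service names, CPU demands, and machine size are illustrative assumptions, not our real figures. It compares how many machines are needed when every machine hosts a single service against a shared pool where any machine can run any service.

```python
import math

# Hypothetical CPU demand per service, in cores (illustrative numbers only).
demand = {"frontend": 9.0, "api": 5.0, "workers": 1.5}
CORES_PER_MACHINE = 4  # assumed machine size


def dedicated_machines(demand, cores):
    """One service per machine: every service rounds up on its own."""
    return sum(math.ceil(d / cores) for d in demand.values())


def pooled_machines(demand, cores):
    """Shared pool: round up once on the total demand.

    This is an ideal lower bound; real packing can need more machines.
    """
    return math.ceil(sum(demand.values()) / cores)


dedicated = dedicated_machines(demand, CORES_PER_MACHINE)
pooled = pooled_machines(demand, CORES_PER_MACHINE)
print(f"dedicated: {dedicated} machines, pooled: {pooled} machines")
# prints "dedicated: 6 machines, pooled: 4 machines"
```

With these assumed numbers, the mostly idle workers still occupy a whole machine in the dedicated model, while in the pooled model their spare capacity can absorb part of the frontend load.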
Lower maintenance effort
Automation comes at a cost. Automating releases, processes, and infrastructure provisioning requires you to write a lot of code, and that code needs to be maintained and updated over time.
Due to the way we currently handle updates on our systems, upgrading a single package through a provisioning tool like Puppet results in the upgrade being propagated across our entire infrastructure, affecting multiple services. More often than not, this is less than ideal as it might trigger unwanted side effects.
Docker and Kubernetes
In case you didn’t know, Docker is a containerization technology that allows you to run your applications and services in perfect isolation. While containers are nothing new, Docker provided, for the first time, an easy-to-use interface to implement them. Docker containers are very powerful as they are perfectly deterministic, ephemeral, and lightweight. This means it only takes a few seconds to spawn them and add additional horsepower to deliver a specific service, as well as easily kill them when the job is done.
Unfortunately, containers are only half of the equation: with container-based infrastructures, you have to be ready to face a whole new level of complexity due to their highly dynamic nature. This is where Kubernetes comes into play.
Kubernetes is an open-source orchestration framework for containers, created by Google and built on years of experience running services in their data centers. Kubernetes aims to provide a number of very important features like:
- Self-healing services, meaning that if a container fails, Kubernetes is able to detect the error, kill the container and spawn a new one from a fresh state
- Resilient services, as they are spread across all your machines
- Lightning-fast upscaling and downscaling: thanks to the nature of containers, spawning new instances of a specific service takes mere seconds, and each instance lives only as long as it is needed before being killed
- Efficient use of physical resources. Kubernetes abstracts away the role of a specific server, meaning that every machine becomes capable of running any service
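The self-healing and scaling behavior listed above boils down to a control loop that keeps comparing the desired state with what is actually running. The following is a toy sketch of that idea only: the `Cluster` class, its methods, and the service names are hypothetical and do not use the Kubernetes API or reflect how Kubernetes is implemented internally.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Cluster:
    """Toy stand-in for a cluster (hypothetical, not the Kubernetes API)."""
    desired: dict                                  # service -> replica count
    running: dict = field(default_factory=dict)    # container id -> service
    _ids: count = field(default_factory=count)     # shared id generator

    def spawn(self, service):
        cid = f"{service}-{next(self._ids)}"
        self.running[cid] = service
        return cid

    def kill(self, cid):
        self.running.pop(cid, None)

    def reconcile(self):
        """Bring the running containers back in line with the desired state."""
        for service, want in self.desired.items():
            have = [c for c, s in self.running.items() if s == service]
            for _ in range(want - len(have)):      # too few: spawn fresh ones
                self.spawn(service)
            for cid in have[want:]:                # too many: kill the surplus
                self.kill(cid)


cluster = Cluster(desired={"frontend": 3, "workers": 1})
cluster.reconcile()                  # initial rollout
cluster.kill("frontend-0")           # simulate a container failure
cluster.reconcile()                  # self-healing: a replacement is spawned
cluster.desired["frontend"] = 5      # traffic spike: scale up
cluster.reconcile()
cluster.desired["frontend"] = 2      # spike is over: scale back down
cluster.reconcile()
```

Failure recovery, upscaling, and downscaling all go through the same `reconcile` step here: you only change the desired state, and the loop works out which containers to spawn or kill.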
Podcasting at scale
These are some of the reasons it was worth investing in containers. As always, technology itself is useless if it doesn’t serve a purpose. In our specific case, the main goal is to provide the best possible experience to our users, with an infrastructure intelligent and scalable enough to accommodate every possible scenario and actively support business growth.
There are a number of challenges we are sorting out before rolling the next-generation infrastructure to production. Kubernetes is incredibly powerful but not trivial: in order to integrate it with an existing infrastructure, you have to work through advanced topics like autoscaling policies, custom logging, service monitoring, and cluster provisioning. We are extremely excited about what we are preparing, and we are confident this is going to positively impact the overall quality of our services.
We are working really hard to migrate our entire infrastructure to Docker and Kubernetes. This will allow us to be more reactive to quick traffic changes while optimizing overall efficiency as well, even as we pay the price in complexity (Kubernetes is a pretty hard beast to master). We’ll keep you posted with more in-depth articles on the challenges we’re facing soon!