
On the one hand, great.

On the other hand, one of the main criticisms of Kubernetes is that it has no composition or orchestration capabilities. It's great at defining individual pieces of state, but managing blocks of state & multiple things at once is left almost entirely to external tools.

The ability to compose & sequence multiple containers feels like a very specific example of a much broader general capability. There's bedevilling, near-infinite complexity in trying to figure out a fully expressive state-management system - I get why refining a couple of specialized existing capabilities is the way - but it does make me a little sad to see a lack of appetite for the broader crosscutting system problem at the root here.



Yeah I work on the team that builds Amazon Elastic Container Service so I can't help but compare this implementation with how we solved this same problem in ECS.

Inside of an ECS task you can add multiple containers and on each container you can specify two fields: `dependsOn` and `essential`. ECS automatically manages container startup order to respect the dependencies you have specified, and on shutdown it tears things down in reverse order. Instead of having multiple container types with different hardcoded behaviors there is one container type with flexible, configurable behavior. If you want to chain together 4 or 5 containers to start up one by one in a series you can do that. If you want to run two things in parallel and then once both of them have become healthy start a third you can do that. If you want a container to run to completion and then start a second container only if the first container had a zero exit code you can do that. The dependency tree can be as complex or as simple as you want it to be: "init containers" and "sidecar containers" are just nodes on the tree like any other container.
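For illustration, here is roughly what that looks like in an ECS task definition fragment (container names, images, and the specific chain are hypothetical; `condition` values like `SUCCESS` and `HEALTHY` come from the ECS `dependsOn` API):

```json
{
  "containerDefinitions": [
    {
      "name": "init-db",
      "image": "example/migrate:latest",
      "essential": false
    },
    {
      "name": "app",
      "image": "example/app:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "init-db", "condition": "SUCCESS" }
      ]
    },
    {
      "name": "proxy",
      "image": "example/envoy:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "app", "condition": "HEALTHY" }
      ]
    }
  ]
}
```

Here `init-db` must exit zero before `app` starts, and `proxy` waits for `app` to pass its health check; the "init container" and "sidecar" roles fall out of the dependency conditions rather than being distinct container types.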

In some places I love the Kubernetes design philosophy of more resource types, but in other aspects I prefer having fewer resource types that are just more configurable on a resource by resource basis.


Your approach sounds a lot like systemd's, with explicit dependencies in units coupling them to each other.

It's pretty cool how one can have a .device or whatnot that then wants a service: plug in a device & its service starts. The arbitrary composability enables lots of neat system behaviors.
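A sketch of that pattern, assuming a drive labeled "backup" (the unit name is the systemd-escaped form of /dev/disk/by-label/backup; the script path is hypothetical):

```ini
# backup.service - starts automatically whenever the disk appears
[Unit]
Description=Backup to external drive
BindsTo=dev-disk-by\x2dlabel-backup.device
After=dev-disk-by\x2dlabel-backup.device

[Service]
Type=oneshot
ExecStart=/usr/local/bin/run-backup.sh

[Install]
WantedBy=dev-disk-by\x2dlabel-backup.device
```

After `systemctl enable backup.service`, udev's appearance of the device pulls the service in, and `BindsTo=` stops it again if the disk is yanked.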


As a consumer, ECS + Fargate is my happy path. I appreciate the lack of complexity. Thanks.


Deploying Fargate with CDK has to have been the most pleasant developer experience I have ever had with any product so far.

If image caching becomes a reality with Fargate, I can't imagine a need to ever use anything else.

https://github.com/aws/containers-roadmap/issues/696


So I can give some behind the scenes insight on that. I don't think image caching will be a thing in the way people are explicitly asking, but we are exploring some alternative approaches to speeding up container launch that we think will actually be even more effective than what people are asking for.

First of all we want to leverage some of the learnings from AWS Lambda, specifically some of the research we've done showing that about 75% of container images contain only 5% unique bytes (https://brooker.co.za/blog/2023/05/23/snapshot-loading.html). This makes deduplication incredibly effective, and allows the deployment of a smart cache that holds the 95% of popular recurring files and file chunks from container images, while letting the unique 5% be loaded over the network. There will be outliers of course, but if you base your image off a well-used base image then it will already be in the cache. This is partially implemented. You will notice that if you use certain base images your Fargate tasks seem to start a bit faster. (Unfortunately we do not really publish this list or commit to what base images are in the cache at this time.)
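The dedup idea can be sketched in a few lines: hash each chunk of each image, and only count bytes the first time their chunk is seen. This is a toy model (tiny chunk size, byte strings standing in for image layers), not how the actual cache works:

```python
import hashlib

def dedup_ratio(images, chunk_size=4):
    """Fraction of total bytes that are duplicates of an already-seen chunk.

    `images` is a list of byte strings standing in for container image
    layers; real systems chunk at a much coarser granularity.
    """
    seen = set()
    total = unique = 0
    for blob in images:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(chunk)
    return 1 - unique / total

# Two "images" sharing a common base: most chunks dedupe away.
base = b"AAAABBBBCCCCDDDD" * 4
imgs = [base + b"app1", base + b"app2"]
```

With a shared base layer, only the small unique suffixes (and the first copy of the base) cost anything, which is why popular base images make such a good cache target.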

In another step along this path we are working on SOCI Snapshotter (https://github.com/awslabs/soci-snapshotter) forked off of Stargz Snapshotter. This allows a container image to have an attached index file that actually allows it to start up before all the contents are downloaded, and lazy load in remaining chunks of the image as needed. This takes advantage of another aspect of container images which is that many of them don't actually use all of the bytes in the image anyway.
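Lazy loading reduces, in essence, to fetching a chunk only on first read and caching it, so a container that touches 10% of its image only ever pulls 10% of the bytes. A toy model (class and callback names are made up for illustration):

```python
class LazyImage:
    """Toy model of lazy image loading: chunks are fetched on first read only."""

    def __init__(self, fetch_chunk):
        self.fetch_chunk = fetch_chunk   # callable standing in for a network fetch
        self.cache = {}                  # chunk index -> chunk contents

    def read(self, index):
        if index not in self.cache:
            self.cache[index] = self.fetch_chunk(index)
        return self.cache[index]

fetched = []
img = LazyImage(lambda i: fetched.append(i) or f"chunk-{i}")
img.read(0)
img.read(0)   # cache hit, no second fetch
img.read(7)
# only chunks 0 and 7 were ever fetched
```

The index file in a snapshotter plays the role of the chunk table here: it tells the runtime where each file's chunks live so reads can be mapped to ranged fetches without downloading the whole image first.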

Over time we want to make these two pieces (deduplication and lazy loading) completely behind the scenes so you just upload your image to Elastic Container Registry and AWS Fargate seems to magically start your image dramatically faster than you could locally if downloading the image from scratch.


Ditto. ECS/Fargate has always been the easiest, most flexible, most useful containerization solution. It's the one AWS service with the most value to containerized services, and the least appreciated.


There was a pretty big feature gulf between it and K8s when it first launched. I found myself wishing I had a number of Kubernetes controllers initially (Jobs (with restart policies), CronJobs, volume management, etc.).

From what I've heard they've made a great many quality-of-life improvements, but as is often the case it can be hard to regain share when you've already lost people.


In general, the intent here is to leave open room for just that.

dependsOn was proposed during the KEP review but deferred. But because init containers and regular containers share the same behavior and shape, and differ only in container restart policy, we are taking a step towards “a tree of container nodes” without breaking forward or backward compatibility.
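Concretely, the sidecar mechanism is just an init container whose restart policy is set to Always, so it keeps running alongside the main containers instead of exiting before they start. A minimal sketch (names and images are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
    - name: log-shipper
      image: example/log-shipper:latest
      restartPolicy: Always   # this one field makes an init container a sidecar
  containers:
    - name: app
      image: example/app:latest
```

The sidecar starts (and becomes ready) before `app`, stays up for the Pod's lifetime, and is torn down after the regular containers on shutdown, which is the ordering behavior being compared to ECS's dependsOn above.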

Given the success of mapping workloads to k8s, the original design goal was to not take on that complexity, and it's good to see others making the case for bringing that flexibility back in.


I have a question that I've been wondering about for a while. Why does ECS impose a 10-container limit on a task? It proves very limiting in some cases, and I've had to find hacky workarounds like dividing a task into two when it should all have lived and died together.


I like it this way to be honest. We needed to create a custom controller for Dask clusters consisting of a single scheduler, an auto-scaling set of nodes, an ingress and a myriad of secrets, configmaps and other resources.

It wasn’t simple, but with Metacontroller[1] it was relatively easy to orchestrate the complex state transitions this single logical resource needed and to treat the whole thing as a single unit.

I’m not saying Kubernetes can’t make simple patterns easier, but baking it into core leads to the classic “tragedy of the standard library” problem where it becomes hard to change that implementation. And the k8s ecosystem is definitely all about change.

1. https://metacontroller.github.io/metacontroller/intro.html


This is all true, and if you read the KEPs they were thinking about this. One camp was advocating for solving the problem of specifying the full dependency graph spec (of which sidecars are one case), another advocating for just solving the most needed case with a sidecar-specific solution to get a solution shipped. The latter was complicated by a desire to at least leave the door open for the former.

Pragmatism won out, thankfully IMO.

Edit to add: see this better description from one of the senior k8s maintainers: https://news.ycombinator.com/item?id=36666359


There are lots of tools built on top of K8s to accomplish this, though. For example, Argo, Tekton, Flyte, etc.


Absolutely, no shortage of things atop. Helm is probably the most widely used composition tool.

It seems unideal to me to forever punt on this topic, leaving it out of core indefinitely. Especially when we are slowly adding very specialized composition & orchestration tools in core.


Requiring the user to write their own operators to manage state using the kubernetes api is very much a feature and not something which is missing.


Agreed that is a feature and not a bug.

But! The one thing that custom orchestrators can’t do is easily get the benefit of kubelet isolation of containers and resource management. Part of slowly moving down this path is to allow those orchestrators to get isolation from the node without having to reimplement that isolation. But it will take some time.


Oh, I see: orchestrating runtimes is quite different. Good points!


Helm really solves a different use case than this.

This is about describing the desired coordination among running containers. Helm is about how you template or generate your declarative state. You could certainly add this description to your templates with Helm, but you couldn't actually implement this feature with Helm itself.


I bundled both composition & orchestration under the same header.

It so happens that pods have multiple containers, which is another example of Kubernetes having a specialized specific composition or orchestration implementation. One that started as composition, and here iterates towards orchestration.


Composing blocks of state may not yield more reliable software. Each block of state is managed by independent processes that may interact with each other (example: horizontal pod autoscalers are not directly aware of the cluster-autoscaler). The whole system is more like an ecology or a complex adaptive system than something you can reason about directly with abstractions.

In the Cynefin framework (https://en.wikipedia.org/wiki/Cynefin_framework), you can reason through "complicated" domains the way you are suggesting, but that will not work in the "complex" domain. And I think what Kubernetes helps manage is in the "complex", not the "complicated", domain.


Orchestration of k8s wouldn't be necessary if they had made K8s' operation immutable. As it stands now you just throw some random YAML at it and hope for the best. When that stops working, you can't just revert to the old working version; you have to start throwing more crap at it and running various operations to "fix" the state. So you end up with all these tools that are effectively configuration management tools to continuously "fix" the cluster back to where you want it.

I hope the irony is lost on no one that this is an orchestration tool for an immutable technology, and the orchestrator isn't immutable.


You can use gitops (eg fluxcd) to revert to previous cluster states.


If you wanted to do the opposite of what I'm saying, sure



