
On the one hand, great.

On the other hand, one of the main criticisms of Kubernetes is that it has no composition or orchestration capabilities. It's great at defining individual pieces of state, but managing blocks of state & multiple things at once is left almost entirely to external tools.

The ability to compose & sequence multiple containers feels like a very specific example of a much broader general capability. There's bedevilling, near-infinite complexity in trying to figure out a fully expressive state-management system - I get why refining a couple of specialized existing capabilities is the way - but it does make me a little sad to see a lack of appetite for the broader crosscutting system problem at the root here.



Yeah I work on the team that builds Amazon Elastic Container Service so I can't help but compare this implementation with how we solved this same problem in ECS.

Inside of an ECS task you can add multiple containers and on each container you can specify two fields: `dependsOn` and `essential`. ECS automatically manages container startup order to respect the dependencies you have specified, and on shutdown it tears things down in reverse order. Instead of having multiple container types with different hardcoded behaviors there is one container type with flexible, configurable behavior. If you want to chain together 4 or 5 containers to start up one by one in a series you can do that. If you want to run two things in parallel and then once both of them have become healthy start a third you can do that. If you want a container to run to completion and then start a second container only if the first container had a zero exit code you can do that. The dependency tree can be as complex or as simple as you want it to be: "init containers" and "sidecar containers" are just nodes on the tree like any other container.
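For illustration, here is roughly what that looks like in an ECS task definition fragment (container names, images, and the specific chain are hypothetical; `condition` values like `SUCCESS` and `HEALTHY` come from the ECS `dependsOn` API):

```json
{
  "containerDefinitions": [
    {
      "name": "init-db",
      "image": "example/migrate:latest",
      "essential": false
    },
    {
      "name": "app",
      "image": "example/app:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "init-db", "condition": "SUCCESS" }
      ]
    },
    {
      "name": "proxy",
      "image": "example/envoy:latest",
      "essential": true,
      "dependsOn": [
        { "containerName": "app", "condition": "HEALTHY" }
      ]
    }
  ]
}
```

Here `init-db` must exit zero before `app` starts, and `proxy` waits for `app` to pass its health check; the "init container" and "sidecar" roles fall out of the dependency conditions rather than being distinct container types.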

In some places I love the Kubernetes design philosophy of more resource types, but in other aspects I prefer having fewer resource types that are just more configurable on a resource by resource basis.


Your approach sounds a lot like systemd's, with explicit dependencies in units coupling them to each other.

It's pretty cool how one can have a .device or whatnot that then wants a service: plug in a device & its service starts. The arbitrary composability enables lots of neat system behaviors.
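A sketch of that pattern, assuming a drive labeled "backup" (the unit name is the systemd-escaped form of /dev/disk/by-label/backup; the script path is hypothetical):

```ini
# backup.service - starts automatically whenever the disk appears
[Unit]
Description=Backup to external drive
BindsTo=dev-disk-by\x2dlabel-backup.device
After=dev-disk-by\x2dlabel-backup.device

[Service]
Type=oneshot
ExecStart=/usr/local/bin/run-backup.sh

[Install]
WantedBy=dev-disk-by\x2dlabel-backup.device
```

After `systemctl enable backup.service`, udev's appearance of the device pulls the service in, and `BindsTo=` stops it again if the disk is yanked.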


As a consumer, ECS + Fargate is my happy path. I appreciate the lack of complexity. Thanks.


Deploying Fargate with CDK has to have been the most pleasant developer experience I have ever had with any product so far.

If image caching becomes a reality with Fargate, I can't imagine a need to ever use anything else.

https://github.com/aws/containers-roadmap/issues/696


So I can give some behind the scenes insight on that. I don't think image caching will be a thing in the way people are explicitly asking, but we are exploring some alternative approaches to speeding up container launch that we think will actually be even more effective than what people are asking for.

First of all we want to leverage some of the learnings from AWS Lambda, specifically some of the research we've done showing that about 75% of container images contain only 5% unique bytes (https://brooker.co.za/blog/2023/05/23/snapshot-loading.html). This makes deduplication incredibly effective, and allows the deployment of a smart cache that holds the 95% of popular recurring files and file chunks from container images, while letting the unique 5% be loaded over the network. There will be outliers of course, but if you base your image off a well-used base image then it will already be in the cache. This is partially implemented. You will notice that if you use certain base images your Fargate tasks seem to start a bit faster. (Unfortunately we do not really publish this list or commit to what base images are in the cache at this time.)
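The dedup idea can be sketched in a few lines: hash each chunk of each image, and only count bytes the first time their chunk is seen. This is a toy model (tiny chunk size, byte strings standing in for image layers), not how the actual cache works:

```python
import hashlib

def dedup_ratio(images, chunk_size=4):
    """Fraction of total bytes that are duplicates of an already-seen chunk.

    `images` is a list of byte strings standing in for container image
    layers; real systems chunk at a much coarser granularity.
    """
    seen = set()
    total = unique = 0
    for blob in images:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            total += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(chunk)
    return 1 - unique / total

# Two "images" sharing a common base: most chunks dedupe away.
base = b"AAAABBBBCCCCDDDD" * 4
imgs = [base + b"app1", base + b"app2"]
```

With a shared base layer, only the small unique suffixes (and the first copy of the base) cost anything, which is why popular base images make such a good cache target.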

In another step along this path we are working on SOCI Snapshotter (https://github.com/awslabs/soci-snapshotter) forked off of Stargz Snapshotter. This allows a container image to have an attached index file that actually allows it to start up before all the contents are downloaded, and lazy load in remaining chunks of the image as needed. This takes advantage of another aspect of container images which is that many of them don't actually use all of the bytes in the image anyway.
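Lazy loading reduces, in essence, to fetching a chunk only on first read and caching it, so a container that touches 10% of its image only ever pulls 10% of the bytes. A toy model (class and callback names are made up for illustration):

```python
class LazyImage:
    """Toy model of lazy image loading: chunks are fetched on first read only."""

    def __init__(self, fetch_chunk):
        self.fetch_chunk = fetch_chunk   # callable standing in for a network fetch
        self.cache = {}                  # chunk index -> chunk contents

    def read(self, index):
        if index not in self.cache:
            self.cache[index] = self.fetch_chunk(index)
        return self.cache[index]

fetched = []
img = LazyImage(lambda i: fetched.append(i) or f"chunk-{i}")
img.read(0)
img.read(0)   # cache hit, no second fetch
img.read(7)
# only chunks 0 and 7 were ever fetched
```

The index file in a snapshotter plays the role of the chunk table here: it tells the runtime where each file's chunks live so reads can be mapped to ranged fetches without downloading the whole image first.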

Over time we want to make these two pieces (deduplication and lazy loading) completely behind the scenes so you just upload your image to Elastic Container Registry and AWS Fargate seems to magically start your image dramatically faster than you could locally if downloading the image from scratch.


Ditto. ECS/Fargate has always been the easiest, most flexible, most useful containerization solution. It's the one AWS service with the most value to containerized services, and the least appreciated.


There was a pretty big feature gulf between it and K8s when it first launched. I found myself wishing I had a number of Kubernetes controllers initially (Jobs (with restart policies), CronJobs, volume management, etc.).

From what I've heard they've made a great many quality-of-life improvements, but as is often the case it can be hard to regain share when you've already lost people.


In general, the intent here is to leave open room for just that.

dependsOn was proposed during the KEP review but deferred. But because init containers and regular containers share the same behavior and shape, and differ only in container restart policy, we are taking a step towards “a tree of container nodes” without breaking forward or backward compatibility.
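Concretely, the sidecar mechanism is just an init container whose restart policy is set to Always, so it keeps running alongside the main containers instead of exiting before they start. A minimal sketch (names and images are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  initContainers:
    - name: log-shipper
      image: example/log-shipper:latest
      restartPolicy: Always   # this one field makes an init container a sidecar
  containers:
    - name: app
      image: example/app:latest
```

The sidecar starts (and becomes ready) before `app`, stays up for the Pod's lifetime, and is torn down after the regular containers on shutdown, which is the ordering behavior being compared to ECS's dependsOn above.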

Given the success of mapping workloads to k8s, the original design goal was to not take on that complexity, and it's good to see others making the case for bringing that flexibility back in.


I have a question that I've been wondering about for a while. Why does ECS impose a 10-container limit on a task? It proves very limiting in some cases, and I've had to find hacky workarounds like dividing a task into two when it should all have lived and died together.


I like it this way to be honest. We needed to create a custom controller for Dask clusters consisting of a single scheduler, an auto-scaling set of nodes, an ingress and a myriad of secrets, configmaps and other resources.

It wasn’t simple, but with Metacontroller[1] it was relatively easy to orchestrate the complex state transitions this single logical resource needed and to treat the whole thing as a single unit.

I’m not saying Kubernetes can’t make simple patterns easier, but baking it into core leads to the classic “tragedy of the standard library” problem where it becomes hard to change that implementation. And the k8s ecosystem is definitely all about change.

1. https://metacontroller.github.io/metacontroller/intro.html


This is all true, and if you read the KEPs they were thinking about this. One camp was advocating for solving the problem of specifying the full dependency graph spec (of which sidecars are one case), another advocating for just solving the most needed case with a sidecar-specific solution to get a solution shipped. The latter was complicated by a desire to at least leave the door open for the former.

Pragmatism won out, thankfully IMO.

Edit to add: see this better description from one of the senior k8s maintainers: https://news.ycombinator.com/item?id=36666359


There are lots of tools built on top of K8s to accomplish this, though. For example, Argo, Tekton, Flyte, etc.


Absolutely, no shortage of things atop. Helm is probably the most widely used composition tool.

It seems unideal to me to forever punt on this topic, leaving it out of core indefinitely. Especially when we are slowly adding very specialized composition & orchestration tools in core.


Requiring the user to write their own operators to manage state using the kubernetes api is very much a feature and not something which is missing.


Agreed that is a feature and not a bug.

But! The one thing that custom orchestrators can’t do is easily get the benefit of kubelet isolation of containers and resource management. Part of slowly moving down this path is to allow those orchestrators to get isolation from the node without having to reimplement that isolation. But it will take some time.


Oh, I see: orchestrating runtimes is quite different. Good points!


Helm really solves a different use case than this.

This is about describing the desired coordination among running containers. Helm is about how you template or generate your declarative state. You could certainly add this description to your templates with Helm, but you couldn't actually implement this feature with Helm itself.


I bundled both composition & orchestration under the same header.

It so happens that pods have multiple containers, which is another example of Kubernetes having a specialized specific composition or orchestration implementation. One that started as composition, and here iterates towards orchestration.


Composing blocks of state may not yield more reliable software. Each block of state is managed by independent processes that may interact with each other (example: horizontal pod autoscalers are not directly aware of the cluster-autoscaler). The whole system is more like an ecology or a complex adaptive system than something you can reason about directly with abstractions.

In the Cynefin framework (https://en.wikipedia.org/wiki/Cynefin_framework), you can reason through "complicated" domains the way you are suggesting, but that will not work in the "complex" domain. And I think what Kubernetes helps manage is in the "complex", not the "complicated", domain.


Orchestration of k8s wouldn't be necessary if they had made K8s' operation immutable. As it stands now you just throw some random YAML at it and hope for the best. When that stops working, you can't just revert to the old working version; you have to start throwing more crap at it and running various operations to "fix" the state. So you end up with all these tools that are effectively configuration management tools to continuously "fix" the cluster back to where you want it.

I hope the irony is lost on no one that this is an orchestration tool for an immutable technology, and the orchestrator isn't immutable.


You can use gitops (eg fluxcd) to revert to previous cluster states.


If you wanted to do the opposite of what I'm saying, sure



