After 15 months & more than 100 million requests served by our Phoenix + PostgreSQL app running on Fly.io, I would be hard-pressed to find a reason to complain.
- Some deploys failed, and re-running the pipeline fixed them.
- In early July 2023, 9k requests from Frankfurt returned 503s. The issue lasted 10 seconds.
- While experimenting with machines, after many creations & deletions, one volume could not be deleted. The next day, the volume was gone.
That's about it after 15 months of running production workloads on Fly.io.
I'm sorry to hear that many of you didn't have the best experience. I know that things will continue improving at Fly.io. My hope is that one day, all these hard times will make for great stories. This gives me hope: https://community.fly.io/t/reliability-its-not-great/11253
changelog.com used to be WordPress, then became a Phoenix app because it needed features that were hacky to implement & then manage in WP. It's more of a podcasting platform these days rather than a CMS.
My first Supermicro just turned 9 and it's still running strong, with a fresh install of Ubuntu 20.04 & k3s over the holidays. The second Supermicro turned 5, and has been running FreeBSD all this time like a champ. They are both loft guardians.
A bunch of bare metal hosts run on Scaleway / Online, and different VMs & managed services run in Digital Ocean, Linode, AWS & GCP. I sometimes spin the odd bare metal instance on Equinix Metal (former Packet).
A diverse fleet means that there's always something new to learn and try out. A single large host would make me anxious, as no internet provider or power grid is 100% reliable and available. Also, software upgrades sometimes fail, and things get messed up all the time, which is when I find it most efficient to just start from scratch. A single host makes that less convenient.
Every approach has its pros and cons, which is why my main workstation is a 20 Xeon W with 64GB RAM & 1TB NVMe : ). Yes, there is a backup workstation which doubles as a mobile one, meaning that it can work without power or wired internet for almost a day. Options are good ; )
> Does this imply there is a cloud abstract layer that should come
crossplane.io comes closest afaik
> And is k8s the simplest possible abstraction? And if not - what is?
If you are asking about the simplest possible abstraction for container scheduling and orchestration, then I believe Nomad from HashiCorp or Docker Swarm are simpler. As for managed solutions with wide adoption in all types of environments and the largest investment to date, I am not aware of anything on par with K8S.
We are both! I would also add lazy to that paradox. My surname is a letter off, and that's as close as it gets : )
The devil is in the details: there is more to it than dynamic & static content. We are using Fastly; otherwise we couldn't serve all the traffic that we do.
The best part is that it's all public - https://github.com/thechangelog/changelog.com - and we welcome contributions, especially those that simplify our setup without compromising on resiliency and availability. I'm looking forward to yours ; )
K8S is an API that the majority is agreeing on, which is rare. There is a lot of amazing tooling, a staggering amount of ongoing innovation, all built on solid concepts: declarative models, emitted metrics (the /proc equivalent, but with larger scope) and versioned infrastructure as data (a.k.a. GitOps).
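To make "versioned infrastructure as data" concrete, here is a minimal sketch of a declarative Kubernetes manifest that would live in Git (the app name, image & port are hypothetical, not our actual setup):

```yaml
# A declarative Deployment: you describe the desired state,
# and Kubernetes continuously reconciles reality towards it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: changelog          # hypothetical name
spec:
  replicas: 2              # desired state, not an imperative script step
  selector:
    matchLabels:
      app: changelog
  template:
    metadata:
      labels:
        app: changelog
    spec:
      containers:
        - name: app
          image: ghcr.io/example/app:2021-01-01  # hypothetical image tag
          ports:
            - containerPort: 4000
```

A GitOps tool (or a plain `kubectl apply -f`) turns a commit touching this file into the running state, which is what makes the infrastructure both versioned and data.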
For someone that is known as the King of Bash (self-proclaimed) - https://speakerdeck.com/gerhardlazu/how-to-write-good-bash-c... - and after a decade of Puppet, Chef, Ansible and oh wow that sweet bash https://github.com/gerhard/deliver - even if all my workstations and work servers (yup, all running k3s) are provisioned with Make (bash++), I still think that K8S is the better approach to running production infrastructure. The advantage to using simple and well-defined components (e.g. external-dns, ingress-nginx, prometheus-operator etc.) that adhere to a universal API, and are maintained by many smart people all around the world, is a better proposition than scripting in my opinion.
At the end of the day, I'm in it for the shared mindset, great conversations and a genuine desire to do better, which I had not seen before K8S & the wider CNCF. I will go out on a limb here and assume that I love scripting just as much as you do, but go beyond this aspect and you will discover that there is more to it than "thin install scripts that deploy containers" (which are not just glorified jails or unikernels).
I think you've hit the nail on the head - the point is not just Kubernetes, it's that you can build standard infrastructure on top. Any software can (in theory) be set up with a Helm chart and configured in a standard way through YAML ConfigMaps, rather than some esoteric config files or scripts which are different for every piece of software.
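To illustrate that "standard way" point with a hypothetical example: whatever the software is, its configuration takes the same ConfigMap shape, and a Helm chart just templates values into it.

```yaml
# A hypothetical ConfigMap: one consistent YAML shape for app config,
# regardless of which piece of software consumes it.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config               # hypothetical name
data:
  DATABASE_URL: "ecto://app@db/app_prod"  # hypothetical values
  POOL_SIZE: "10"
```

The container then references it via `envFrom`/`configMapRef`, so swapping the software behind it doesn't change how operators read or diff the configuration.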
The primary reason behind the move was not wanting to manage CI. Since there were no options for a managed Concourse in 2018, we migrated to Circle, one of the Changelog sponsors at the time.
Concourse worked well for us; we didn't have any issues that were big enough to remember. You may be interested in this screenshot that captured the changelog.com pipeline from 2017: https://pipeline.gerhard.io/images/small-oss.png
I missed the simple Concourse pipeline view at first, but CircleCI improved by leaps and bounds in 2020, and the new Circle equivalent of the pipeline view is even better (unlike in Concourse, clicking on jobs always works): https://app.circleci.com/pipelines/github/thechangelog/chang...
The Circle feature which I didn't expect to like as much as I do today is the dashboard view (the list of all pipeline/workflow runs). This is something that Concourse is still missing: https://app.circleci.com/pipelines/github/thechangelog
In 2021, I expect us to spend one migration credit on GitHub Actions, as a Circle replacement. Argo comes a close second, but that requires an innovation credit, which is more precious to us. Because we are already using GitHub Actions for some automation, it would make sense to consolidate, and also leverage the GitHub Container Registry, as a migration from Docker Hub. Watch https://github.com/thechangelog/changelog.com to see what happens : )
Yes, it does make sense to move static files to object storage, especially the mp3s. There is some ffmpeg-related refactoring that we need to do before we can do this though, and it's not a quick & easy task, so we have been deferring it since it's not that high priority, and there are simpler solutions to this particular problem (i.e. improved CDN caching).
Other static files such as css, js & txt make sense to remain bundled with the app image, which is stateless and a prime candidate for horizontal scaling. Also, CDN caching makes small static files that change infrequently a non-issue, regardless of their origin.
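One reason small static assets are a non-issue behind a CDN is content fingerprinting: hash the file contents into the filename, let the CDN cache it forever, and a new deploy simply produces a new URL. Phoenix does this via `mix phx.digest`; the helper below is a hypothetical stand-alone illustration of the same idea:

```python
import hashlib
from pathlib import Path

def fingerprint(path: Path) -> str:
    """Return a content-addressed filename, e.g. app-3f2a9c1b.css.

    Because the digest changes whenever the contents change, the CDN
    can serve the old URL from cache indefinitely while new deploys
    reference a new URL.
    """
    digest = hashlib.md5(path.read_bytes()).hexdigest()[:8]
    return f"{path.stem}-{digest}{path.suffix}"
```

Assets named like this can be served with `Cache-Control: public, max-age=31536000, immutable`, which is what makes their origin (app image or object storage) mostly irrelevant.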
The managed Postgres service from Linode's 2021 roadmap is definitely something that we are looking forward to, but the simplest thing might be to provision Postgres with local volumes instead. We are already using a replicated Postgres via the Crunchy PostgreSQL Operator, so I'm looking forward to trying this approach out first.
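For reference, "Postgres with local volumes" boils down to a local PersistentVolume pinned to one node, with the operator handling replication across nodes. A hypothetical sketch (path, node name & size are made up):

```yaml
# A local PersistentVolume: fast node-local disk for one Postgres replica.
# Local PVs require nodeAffinity, since the data only exists on that node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: postgres-data          # hypothetical name
spec:
  capacity:
    storage: 100Gi             # hypothetical size
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/postgres  # hypothetical path on the node
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values: ["node-1"]   # hypothetical node
```

The trade-off: you lose volume portability across nodes, but the operator's streaming replication is what provides the durability, not the storage layer.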
CockroachDB is on my list of cool tech to experiment with, but that will use an innovation token, and we only have a few left for 2021, so I don't want to spend them all at once.
Yea the small static files that are part of your webapp can stay with it, but media files are best on S3. If you need a block interface though, I recommend something like ObjectiveFS: https://objectivefs.com/
If you're using an operator, then local volumes are a good middle ground, provided it automates the replication already. CockroachDB also has a Kubernetes operator, although it's only for GKE currently. There are also other options like YugabyteDB, which is another cloud-native Postgres-compatible DB.
This is what the GHA workflow currently looks like: https://github.com/thechangelog/changelog.com/blob/c7b8a57b2...
FWIW, you can see how everything fits together in this architecture diagram: https://github.com/thechangelog/changelog.com/blob/master/IN...