Monitoring Elixir Apps on Fly.io with Prometheus and PromEx

Bayart · on July 1, 2021

I've got no real use case for Fly.io at the moment, but ever since their big thread the other day I went through their blog posts and I've got to say they're pretty much all fantastic as far as delivering information I'm interested in. They seem to have a really good team. Having toyed with the idea of deploying some of my endless TO-DO-LIST of tiny projects in Elixir and wondering what monitoring would look like, PromEx just happens to scratch an itch.

mrkurt · on July 1, 2021

If nothing else, we hope to make Fly.io a great place to deploy Elixir side projects. :)

tiffanyh · on July 1, 2021

Maybe I'm misreading into your comment to mean that you're making it easier to deploy Elixir at fly.io, and please don't take this the wrong way, but after skimming the docs - I'm not understanding how exactly hosting Elixir/Phoenix at fly.io is any easier.

Would you mind elaborating. I'm definitely looking for a PasS for Elixir/Phoenix + Postgres.

mrkurt · on July 1, 2021

Oh we're _working_ on it, but we have a ways to go. I can imagine `fly launch` just working for all Elixir apps someday, right now you have to write a Dockerfile. People seem to like our guide but I don't want to suggest it's already the easiest possible place to deploy an Elixir app: https://fly.io/docs/getting-started/elixir/

That said, our stack does make things Elixir folks need very easy. In particular:

* Private network and built in service discovery mean clustering just works

* You can connect to your private network with WireGuard and then use iex or similar to inspect running Elixir processes: https://fly.io/blog/observing-elixir-in-production/

* Postgres setup is pretty magical. "fly postgres attach" gets your app all connected and auth'ed to your Postgres cluster.

* And, most importantly, we actually are the best place to run an Elixir app if you're using Phoenix LiveView. You can push your app servers out close to people and make LiveView incredibly responsive: https://liveview-counter.fly.dev/

jamra · on July 2, 2021

If I were to host Elixir, I don’t think I would opt for docker. This is the one thing that prevents me from trying out fly.io

pablodavila · on July 2, 2021

Why is that? I remember reading about the scheduler not being able to correctly "attach" to each core when using Docker but it seems that's not the case anymore. Is there any other disadvantage you perceive?

tiffanyh · on July 1, 2021

Thanks so much

spondyl · on July 1, 2021

Can confirm I finally launched one side project off the ground thanks to Fly

blutack · on July 1, 2021

Promex is a fantastic and well thought out library which I've deployed into production.

One slightly useful trick that the docs don't highlight is that you can run the /metrics endpoint on a different port to the rest of the application. If you have that firewalled off, a local grafana agent or prom shipper of your choice can happily run against your application without making metrics public.

akoutmos · on July 1, 2021

Thanks for the kind words :)

That standalone HTTP server in PromEx can also be used to expose metrics in non-Phoenix applications. In the coming weeks you'll also be able to run GrafanaAgent in the PromEx supervision tree so you can push Prometheus metrics via remote_write. Stay tuned ;)!

blutack · on July 1, 2021

Excellent, looking forward to it! Right now I run GrafanaAgent separately which isn't too much headache and it works great feeding grafana cloud (especially paired with the Loki docker log driver).

acidity · on July 1, 2021

This is not a fly.io specific question but I always wonder how in these globally distributed system, how are databases handled?

I understand you can put your application server in any location but generally there is only one storage so are these application servers doing cross region database calls?

Having only worked with single cluster setup web apps, I am always curious about this part.

Is the answer always - use a managed replicated database and send read queries to one near your location and all write queries goes to the primary instance?

mrkurt · on July 1, 2021

That's our answer, yes. Read replicas and we redirect requests that need writes to a "primary" region: https://fly.io/blog/globally-distributed-postgres/

We tried cross region database calls for HTTP requests. They were bad. They seem to work ok for long lived connections like websockets, though.

lawik · on July 2, 2021

Using something like CockroachDB you should be able to avoid the single-writer situation. It has it's own complexities I'm sure but seems very compelling.

matthewcford · on July 2, 2021

Would need to commercial licence for this to work though

aswinmohanme · on July 2, 2021

I have all but praise for Fly. It's really easy to deploy an entire Elixir Phoenix production running on multiple regions, and their CLI is really thought out and well put together.

The entire team is active on forums, and the response time are fast and issues get fixed and deployed in hours.

Running a Phoenix LiveView on a fly instance close to your users is the closest you can get to an SPA experience without going full JS.

benatkin · on July 1, 2021

> Fly.io takes Docker containers and converts them into fleets of Firecracker micro-vms running in racks around the world. If you have a working Docker container, you can run it close to your users, whether they're in Singapore or Amsterdam, with just a couple of commands.

FWIW these are OCI images, which can be built with other tools besides Docker. These are not the same thing as containers. I think Docker is overrated and am glad to be using a Containerfile instead of a Dockerfile (it's the same format).

Firecracker is great but there is an alternative: https://katacontainers.io/

There seems to be confusion as to what Firecracker is doing. Docker in Docker isn't really Docker in Docker when it's in a MicroVM - except for being run by an OCI image. https://community.fly.io/t/fly-containers-as-a-ci-runners/10... Docker uses LXC containers, and an LXC container inside an LXC container is likely to have performance problems. MicroVMs, on the other hand, are similar to VPSes.

fly.io suggests that "CPU cores, for instance, should only ever be doing work for one microVM". https://fly.io/docs/reference/architecture/ These MicroVM's are not cheap, and maybe it would be nice to be able to share a VCPU between multiple containers in a lot of situations. Fly.io could be burning a lot of investor money.

I think WASM and Deno are good ways to break down a MicroVM with a whole VCPU into smaller sandboxed entities. Also Docker run by an OCI image makes more sense in a MicroVM that has a whole CPU than it does in an LXC container with less than a whole CPU.

I do think fly.io is pretty cool. I hope they find ways to educate people more about the difference between a MicroVM (not invented by AWS) and a LXC container and to get people making more use of their heavier weight "containers".

Edit: I noticed that in their free tier and low cost plan, they have shared CPUs. https://fly.io/docs/about/pricing/

They are charging quite a bit more for one dedicated CPU than DigitalOcean, Vultr, and Lightsail. The cheapest one is $31.00/mo.

It's quite similar to Google Could Run. It looks like Cloud Run doesn't have an option to run less than a vCPU in memory though. That means you can pay fly.io $6/mo for a MicroVM with a gig of memory and not have a full cold start, but have a shared CPU which will of course cause latency but probably typically less than a cold start, but you can't pay Google $6/mo and get that. https://cloud.google.com/run/pricing Also websockets can stay open with a shared CPU, but can't in between the VM stopping and starting. I don't know if the $6/mo 1GB MicroVM with a shared vCPU is sustainable, but if so, it's a pretty impressive way to host an app that uses websockets for super cheap.

Edit 2: I said "They are charging quite a bit more for one dedicated CPU than DigitalOcean, Vultr, and Lightsail." - I was actually wrong here. It is more for self-hosting if you would run a database like PostgreSQL/MySQL in a VPS (they don't discourage it, and it's pretty simple to do) but wouldn't run a database like PostgreSQL/MySQL in a Fly.io VM (they might discourage it, and it's a bit tricky to do).

mcintyre1994 · on July 1, 2021

They have a pretty cool blog post on OCI images! I think it was the first article of theirs I saw. https://fly.io/blog/docker-without-docker/

mrkurt · on July 1, 2021

vCPU can mean entirely different things between providers, which sometimes makes it hard to compare prices.

In general, we treat "vCPU" as a single hardware threads, which is pretty common. Our hosts use Epyc CPUs with two threads each, which is also pretty common.

So a single CPU dedicated VM on Fly is equivalent to owning a hardware thread, which makes $31/mo comparable to other places. These are roughly the same as DigitalOcean's "general purpose" droplets.

Basic Droplets, and most Vultr VMs, use shared CPUs.

Providers use wildly different CPUs too. You'll sometimes find people surprised how fast Fly VMs are because we standardized on Epyc CPUs pretty early. Much of what you can buy runs on either older Intel or consumer grade processors. Which is actually fine! The people who buy our dedicated CPU VMs just so happen to need a lot of power because they're frequently transcoding or generating images.

benatkin · on July 1, 2021

I took a closer look, and indeed, it's almost the same as DigitalOcean. The 2 CPU plans under basic confused me, and I forgot that the whole section was Shared CPU.

mrkurt · on July 1, 2021

They confuse the heck out of me too. The world of VM CPU pricing is bonkers.

benatkin · on July 1, 2021

Indeed. Still, I think these shared CPU thread VPSes work pretty well up until a certain point, and it's probably true of a fly.io shared CPU thread MicroVM. It's good to be aware of the options and to switch. I would recommend using OCI containers with VPSes, so you can easily switch between plans and providers.

Zababa · on July 1, 2021

> Firecracker is great but there is an alternative: https://katacontainers.io/

What does it offers over firecracker?

mrkurt · on July 1, 2021

Kata containers are QEMU based, so they use similar KVM virtualization as firecracker. QEMU is larger in scope than Firecracker, though, so you can (among other things) expose GPUs and other devices to the QEMU based containers.

We picked Firecracker because the smaller scope makes it easier to understand and, thus, trust for our particular workloads.

benatkin · on July 1, 2021

This is new and may not be used much, but it is possible to use part of Kata with part of Firecracker. https://github.com/kata-containers/documentation/wiki/Initia...

Zababa · on July 1, 2021

Thanks.

xrd · on July 1, 2021

Is it naive to ask why any of this is better than running your own DO droplet with Dokku? I am sure there is a better security story for CloudRun and Fly.io if you are sharing space with other apps, but if you are the only one running apps on it, is there anything to gain with using katacontainers/firecracker?

mrkurt · on July 1, 2021

Firecracker and katacontainers (and other KVM based "containers") are valuable for multiple tenants that don't trust each other. They are less helpful if you trust what you're running.

In fact, a Fly Firecracker VM is not all that different from a DO droplet. We expose them as containers, but you could run Dokku within a Fly app and have a similar set of stuff under the covers.

matthewcford · on July 2, 2021

They are coming out with a Postgres product