I also think about layers when I set up IaC, but I'm more focused on how things connect and relate rather than sticking strictly to the OSI stack model. In my mind, it's all about grouping things that might influence each other. This approach usually leads me to think in three layers: foundation, shared services, and applications.
Starting at the bottom, the foundation layer holds the basics like networking, storage, accounts, and permissions. The shared services layer is where I place tools like certificate managers and secret storage. I keep services that interact closely together, while separating those that work more independently. At the top, I lay out the applications. This is where I slot in services like auto-scaling groups, individual server instances, load balancers (depending on whether they're communal or specific), and pods in platforms like Kubernetes. Depending on the complexity of the environment, there may be one or multiple instances of each layer.
By structuring IaC this way, I find it’s clearer and more intuitive.
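A minimal sketch of what this layering can look like in Terraform (the bucket, key, and resource names here are made up for illustration): each layer lives in its own root module, and a higher layer reads the outputs of a lower one via remote state.

```hcl
# applications/web/main.tf -- hypothetical application-layer stack.
# Reads networking outputs from the foundation layer via remote state.
data "terraform_remote_state" "foundation" {
  backend = "s3"
  config = {
    bucket = "example-tf-state" # assumed state bucket name
    key    = "foundation/terraform.tfstate"
    region = "us-east-1"
  }
}

# The application layer consumes foundation outputs instead of
# redefining subnets itself.
resource "aws_lb" "app" {
  name    = "web-app"
  subnets = data.terraform_remote_state.foundation.outputs.public_subnet_ids
}
```

The same idea works with Pulumi stack references; the point is that the application layer consumes the foundation layer's outputs rather than owning them.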
I'm using something similar at the moment, and it mostly works. There are issues now and again, though: if the developers of a service own the "top" layers and start adding new services, that often requires changes to the core.
For example, if a team were to suddenly start using DynamoDB, then the IAM roles for their application, in their account, suddenly need the permission to add/get records.
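As a hedged illustration of the kind of change that crosses the layer boundary (the role, table, and policy names here are hypothetical), the application's IAM role might need a new inline policy like:

```hcl
# Hypothetical Terraform addition to the application's IAM role,
# assuming aws_iam_role.app and aws_dynamodb_table.orders already
# exist elsewhere in the configuration.
resource "aws_iam_role_policy" "orders_dynamodb" {
  name = "orders-dynamodb-access"
  role = aws_iam_role.app.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["dynamodb:PutItem", "dynamodb:GetItem", "dynamodb:Query"]
      Resource = aws_dynamodb_table.orders.arn
    }]
  })
}
```

Whether this lives in the app stack or the foundation stack is exactly the coordination problem described above.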
Most of the time the layers are distinct and the "ops" team can handle the core, leaving the application/service-specific stuff to the developers, but things do come up that have to be scheduled and coordinated across the layers/owners. It's a pain, but so far a tolerable one.
I was looking for the explanation about how this grouping is like the OSI model, but found none...
Also, I think where OP uses "principal" they mean "principle".
The whole article reads as an advertorial for Pulumi. :|
OP also never bothers to ask themselves questions like "what if I'm wrong?" or "what to do with this obvious claim that doesn't add up?".
For example: why is the "Data" layer below "Compute"? That's the kind of question that's never addressed by OP. Most people in the industry wouldn't think of these as layers, and definitely not as stacked one on top of the other. To convince someone you need to give a very solid argument here... but there's nothing there.
Layers 4 through 6 make some conceptual sense if you consider the lower layers being support/infrastructure for the higher layers. In the old days, before cloud, we used to draw plenty of diagrams that were essentially DB Server -> App Server -> Web Server -> Load Balancer... it's the same kind of thing.
I say some sense because layer 3, "permissions", sticks out to me like a sore thumb. Whenever I work with Terraform I spend 50% of the time on permissions. I'd hesitate to call it a "layer" given the pervasive nature of IAM roles/permissions across all resources.
Config files work fine. But so does expressive code with the right abstractions. And I think code scales up better and down almost as well. Just my opinion.
Recent years often remind me of Alan Kay talking about objects made up of objects talking to objects. I wonder if IaC, among other trends, isn't an incarnation of that on a wide scale.
Cool, I skipped that link not knowing what it was. There really seems to be a strange conceptual redundancy: you used to wire bits of assembly into classes and plug objects together into cohesive graphs, and now it's the same thing at a higher layer, with containers as the unit of logic and network bridging as the information transport... and you suffer from the same issues: ad hoc state in your system config files, inability to initialize a sub-component without the rest, lack of logical interfaces.
This has to be the worst take on IaC organization I have ever seen. I would never have thought someone would try to apply the OSI model to infra code management.
How long does it take to deploy a new service with this approach? A week?
Normally I would absolutely agree, but this article is the equivalent of one called "Flying a Boeing 737" in which the author advises you to invert the plane, fly at sea level and turn off all instruments. That is how little the proposed approach in this article makes sense.
Given that, how is one supposed to reply critically to such a post? I'm genuinely curious and open to suggestions, as it's something I'm clearly not good at.
>Given that, how is one supposed to reply critically to such a post? I'm genuinely curious and open to suggestions, as it's something I'm clearly not good at.
Link to or describe a better approach and explain specifically why it's better.
Your only specific point was that it's slow to deploy a new service, which for most organizations is somewhere at the bottom of their priority list. In fact, many organizations probably prefer slow deployments, as that implicitly discourages unnecessary services and infrastructure bloat. That in turn lowers the long-term maintenance burden and technical debt. Five hundred services that are in reality owned by no team, or that no one knows about, are not what you want in an organization.
Do you know what tone policing is? HN is one of the most conservative echo chambers I visit (yes, I should stop), because everyone is constantly being tut-tut-tutted.
I disagree; in a lot of cases a new service would require an update to only 1 or 2 stacks, and only those stacks need deploying.
In some cases some version of this is required, because the vendor API doesn't give proper feedback about when an operation is complete, so it needs time to settle.
Or building Docker images that get rebuilt on every run and unnecessarily consume time and resources (I believe Pulumi has a fix for some version of this).
If your stacks are deployed via CI/CD, it's not really a big deal to deploy 10 stacks in sequence, or just the ones that changed.
This may be overkill for a lot of projects but it’s valuable insight from a respected organization / individual.
I'd say don't do any layering and stick with the standard naming convention of "stacks". Start with a common stack for all your shared stuff, and application stacks for stuff that's specific to some application: say, all the resources for a microservice, or everything for a BI system, etc.
Avoid splitting this up further, as it introduces too much complexity. The IaC code should be simple enough that any dev fresh off the tutorials can pick it up.
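One possible layout under this convention (the directory names are just examples, not a prescription):

```
infra/
├── common/        # VPC, DNS zones, shared certificates: one stack
├── billing-api/   # everything for the billing microservice
└── bi/            # everything for the BI system
```

Each application stack references the common stack's outputs; nothing else is shared between them.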
The company I'm at has 3 layers and dozens of stacks, and it's made the whole thing impossible to reason about. No one wants to touch it anymore, which means we now have a Platform team that screws around with it for months on end.
Note: Lee Briggs works for Pulumi as a Principal Platform Engineer, so it's in their interest to make this complicated.
Global namespace shared resources, regional namespace shared resources, then each app provisions its own bit, consuming/linking the two aforementioned layers.
Everyone gets here eventually, and then you can just fight over stuff like "is an ALB shared-regional or app-specific".
In what mature organization does it take more than a week? My current org is quite mature: you probably use it. It's also not a cloud provider: our AWS bill is in the high 8 figures a month. And yet launching a new service not directly pingable from the internet, deployed in, say, 5 regions, is a matter of 3 PRs, adding the service to CI included. I've gone from having no repo at all to deployment in 4 days, because we were in a big hurry. All the infra-defining PRs get eyes from an SRE or three, but the team writing the service writes the PRs.
I bet we have far more instances under our name than the people that write this article, and yet we have nowhere near that level of complexity in our IaC definitions. And yet, somehow we manage. I guess we are immature?
There’s a huge difference between “identical clone of an existing service” and “a new service”.
My challenge at $dayjob is that it takes months to spin up a new cloud service because they’re new.
Either a new app that wasn’t on the cloud before — in which case the templates need extensive customisation.
Or, new app in the sense that the devs just cracked open Visual Studio and have no idea yet what they actually need from the cloud.
I get maybe 10-20 copies of a template (dev/tst/prd + ha/dr), and then I have to start from the beginning.
Guidance on how to maximise reusability would actually be very useful.
Unfortunately, in the real world, this seems difficult. Many small variations in requirements tends to make abstractions leaky.
For example, one vendor requires active-passive load balancing for licensing reasons. Millions of dollars worth of licensing reasons. Neither AWS nor Azure support anything but active-active in any of their load balancers. (They do in DNS, but for various reasons that won’t work for us.)
Another “new” app (industrial air quality monitoring) is actually from the stone ages and doesn’t support PaaS databases or even 3 of the 4 clustering modes available in IaaS. So a custom load balancer solution is required… just for it.
This is the issue. Everyone that loves the cloud and says it’s simple has easy mode turned on: cookie cutter clones they can stamp out for many identical customers or whatever.
Some people play the game with “big government” difficulty.
I could have elaborated. When I say "generic service template", I mean that a service with any cloud requirements (that we've had at any point before) can be assembled from building blocks (TF modules) in ~20 min. This of course doesn't work if every new service has new, unique requirements.
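For a rough idea of what "assembled from building blocks" can mean in Terraform (the module paths, names, and inputs below are invented for illustration, not anyone's actual modules):

```hcl
# Hypothetical "generic service" composed from reusable local modules.
module "network" {
  source      = "./modules/service-network" # assumed module path
  environment = var.environment
}

module "service" {
  source     = "./modules/fargate-service" # assumed module path
  name       = "reporting-api"
  image      = var.image
  subnet_ids = module.network.private_subnet_ids
}
```

A new service is then mostly a matter of instantiating the same modules with different inputs, rather than writing resources from scratch.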
Happy to talk through the setup we're using and some other setups I've seen work if you're interested, but it's quite extensive.
Our current setup is heavily inspired by the work Gruntwork (not affiliated) is doing. I highly recommend taking a look at how they do it, and even subscribing to their service if the need is there. They provide pre-built modules for basically any use case.