Being able to run a test suite in parallel on your own machine is massively more useful than having to run it across a bunch of machines in the cloud. First of all, clouds go down (the external-dependency problem) or become unreachable when you have no Internet connection, and they cost extra money. Secondly, you get more-or-less instant feedback on whether you broke anything, which lets you maintain velocity without having to backpedal. Thirdly, if the suite runs concurrently in a deterministically-random order (printing the seed before the suite starts), you can often reproduce concurrency bugs that will NOT (or are very unlikely to) show up on your multi-VM cloud test runner, which runs all the tests single-threaded on separate machines. Fourthly, if we're talking about a functional, immutable language, those kinds of bugs rarely arise to begin with. (They are some of the worst and most costly bugs to track down and fix. I know this from experience: a nondeterministic session-loss bug at Desk that took me a month to fix likely lost thousands of customers, not to mention the cost of essentially benching me for a month, plus the benching cost of the previous devs who took a stab at it and failed.)
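The seeded-ordering idea above can be sketched in a few lines. This is a hypothetical illustration (the PRNG choice and test names are mine, not from any particular test framework): print the seed up front, shuffle the test order with a PRNG seeded from it, and any interleaving-dependent failure can be replayed exactly by passing the same seed back in.

```javascript
// mulberry32: a tiny seedable PRNG (Math.random can't be seeded).
function mulberry32(seed) {
  return function () {
    seed = (seed + 0x6d2b79f5) | 0;
    let t = Math.imul(seed ^ (seed >>> 15), 1 | seed);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded PRNG: same seed, same order.
function shuffledOrder(tests, seed) {
  const rand = mulberry32(seed);
  const order = tests.slice();
  for (let i = order.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [order[i], order[j]] = [order[j], order[i]];
  }
  return order;
}

// Print the seed BEFORE running, so a failing run is reproducible.
const seed = Date.now() % 100000; // or read a --seed flag to replay a failure
console.log(`Running suite with seed ${seed}`);
const tests = ["auth_test", "cart_test", "session_test", "billing_test"];
console.log(shuffledOrder(tests, seed).join(", "));
```

Test frameworks like ExUnit do essentially this for you (randomized order, seed printed, `--seed` to replay); the sketch just shows why it makes concurrency failures reproducible.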
Every one of these leads to either more productive employees, lower ongoing costs, or happier customers. How are these not tangible business results?
Oh I agree with the potential technical benefits. I think the business justification is pretty weak, though.
The business benefits you listed are defensive. That is, if the business is doing well, those are things you might optimize for. The same way you'd, say, try and decrease power consumption.
There are companies where an order-of-magnitude decrease in ongoing costs allows a whole new kind of product. But order-of-magnitude increases in the things you just listed rarely come from developer process improvements or refactors.
Customer happiness is a fun one. If you have a buggy or unreliable app, you'll frustrate customers (I am very aware of this, if you check my post history). You won't _make them happy_ just by building reliable software though.
As someone whose #1 priority is reliability right now – fixing bugs faster is very low on the list of things that matter. The problems users have with our platform are a combination of the wrong architecture and nonexistent tooling for customer communications.
Which gets back to the original post. We want to hire devs who can identify what our users need most today, figure out what we can do about it (if anything), and then implement what makes sense.
I see your points, and were I at a company currently (I'm looking), I would prioritize the business-leading things, of course.
> #1 priority is reliability
Yeah, mine is too. (In fact, I think we're approaching a crisis of reliability out there, if we're not there already.) That's why I went with Elixir, and it's paid off so far. The only downtime the site's had in years has been due to my hosting provider, not me... (So of course I'm looking for another hosting provider...) I have to deal with the occasional 500 errors as well, which, as it turns out, are also largely due to my hosting provider and not my code (but sometimes my code...).
The thing I realized about reliability way too late is this: the utility of a service (or a person) drops off VERY quickly if you dip below 99% reliability (or arguably, 99.9%). Something (or someone) that is 90% reliable is practically useless, because it is bound to fail at the worst possible time. So in all the software I write, I focus on this, and in all my relationships with people (or orgs, or bosses, or my son, or my S.O.), I know that if I commit to them, I must be as reliable as humanly possible, because the costs of every slip-up are huge and rise exponentially as they accrue.
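The dropoff is easy to see with a back-of-envelope calculation: if each use of a service succeeds independently with probability r, the chance of at least one failure over n uses is 1 - r^n. A quick sketch:

```javascript
// Probability of at least one failure across n independent uses,
// given per-use reliability r.
function pAtLeastOneFailure(r, uses) {
  return 1 - Math.pow(r, uses);
}

for (const r of [0.9, 0.99, 0.999]) {
  const p = pAtLeastOneFailure(r, 100) * 100;
  console.log(`${r * 100}% reliable -> ${p.toFixed(1)}% chance of a failure in 100 uses`);
}
// 90% reliable    -> ~99.997% chance of at least one failure in 100 uses
// 99% reliable    -> ~63.4%
// 99.9% reliable  -> ~9.5%
```

Which is the point above in numbers: at 90% per-use reliability, failure over any realistic number of interactions is a near certainty; even 99% fails you more often than not over a hundred uses.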
An example of a service that is just unreliable enough to be a huge annoyance (which has also caused bad communication failures as well in my experience) is SMS. SMS failures are the worst.
I am too stuck in single-threaded front-end integration tests to be able to relate. Are you actually talking about testing the full stack, or just some backend tech? We almost never have problems with our cloud testing, though it is expensive. We need it to have some control over the front-end work; the benefit is that running the back-end stuff is trivial and fast.
I have a design bias: I pretty much refuse to build anything for the front end that is not easily testable without using a headless browser as part of my test suite. Driving a headless browser adds so much resource consumption and time to the test suite over time that it defeats the entire purpose of a test suite, which is to assess, as quickly as possible, whether any logic in the app has begun violating expectations, so you can proceed with building the next thing quickly without backpedaling and without waiting.
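One way that design bias tends to look in practice (a made-up example, not from any specific codebase): keep the decisions in pure functions that take data in and return data out, and leave rendering as a thin dumb layer, so the fast suite never needs a browser.

```javascript
// Pure view logic: all the decisions live here, testable with plain asserts.
function cartSummary(items) {
  const total = items.reduce((sum, it) => sum + it.price * it.qty, 0);
  return {
    count: items.reduce((n, it) => n + it.qty, 0),
    total,
    label: total === 0 ? "Your cart is empty" : `${items.length} line item(s)`,
  };
}

// Rendering is a thin layer over the pure logic; only this step would ever
// touch a DOM, and it contains no decisions worth a browser test.
function renderCartBadge(summary) {
  return `<span class="cart-badge">${summary.count}</span>`;
}

const summary = cartSummary([{ price: 5, qty: 2 }, { price: 3, qty: 1 }]);
console.log(summary.total);            // 13
console.log(renderCartBadge(summary)); // <span class="cart-badge">3</span>
```

The whole thing runs in plain Node in milliseconds, which is the point: the logic-violation check stays fast, and browser idiosyncrasies get pushed out to QA.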
I completely understand that individual browsers may have this or that problem or idiosyncrasy, and that may need to be tested, but I'd put that in QA, not your official standard app test suite. At most I'd test whether endpoints actually spit out HTML, CSS, etc., but not actually check the rendering step. I do have a JS suite as part of my full test run that uses jsdom to check DOM manipulation, but only because it is not resource-intensive and runs fast.