ragona

What the hell is engineering?

2020-July-3

I'm not sure when the switch happened, but we call ourselves "engineers" now instead of programmers. Software engineers. Web development engineers. Full stack engineers. QA engineers. Front end engineers.

We're all doing engineering, apparently. We have colleges and boot camps pumping out engineers as fast as possible, but I don't think we have a shared understanding of what that part of the title means.

My unscientific guess of what happened is that software engineers started commanding unusual salaries. Those of us programmers with enough aptitude for algorithmic hoop jumping just learned the secret handshake 1 and became engineers, at least in title.

So is there a difference at all between a programmer and an engineer? I'm going to draw an arbitrary line and declare that yes, there is a difference, but it has nothing to do with data structures and algorithms, or even software.

Defining engineer

I'd like to take a stab at a definition. Mine will be as wrong as any definition and wronger than many, but here goes.

engineer: a person who designs systems to solve problems

Let's put this through its paces a bit with different types of engineering. A bridge feels comfortably like a system to me. It solves the problem of crossing a body of water.

Agricultural engineering? Sure, biological systems to solve production problems. Mechanical engineering? Yep. Even a shovel is a novel bit of engineering; it produces a full loop that dramatically amplifies the digging power of a human. If you made a shovel that couldn't be held it wouldn't be very useful. UX engineering? Absolutely, UX patterns are subtle, dangerous tools that produce highly counter-intuitive human behavior.

So this is my current line of thinking. Engineering is about almost everything but programming, which is just a single tool in the systems we create that ultimately combine hardware, software, and human effort to produce results.

Engineering is about considering and carefully designing the consequences of the system you're unleashing. Not just how to create it but, even more critically, how you will update, maintain, and eventually deprecate it.

What about good engineering?

When we call ourselves engineers we're using a word with deep tradition. Engineering covers dozens of branches of deep academic study. A history of triumphs and tragedies of great human proportions. There is a wealth of institutional knowledge for us to draw on, but before we get there, I want to talk about why this matters.

There are plenty of stories of professional engineering teams getting major physical public infrastructure dead wrong.2 I think that has led to some of the most useful stories about real-life, on the job, critical engineering that we can learn from. In some situations engineers pull off miracle fixes. In others people die.

I propose that accepting this responsibility, owning it, and taking it seriously is what good engineering is about. Serious, good engineering is about understanding what the human impact of system failure is and building mitigations, be they technical or human. Stewardship of critical systems is critical work, because of the real human impact that problems in those systems can cause.

Humans aren't good at software engineering

The term "software engineering" wasn't coined until 1965. Mechanical engineering, to contrast, is as old as civilization itself, and experienced a real boom during the Islamic Golden Age. We've been learning how to make crankshafts since the twelfth century, but unit testing was invented in the last 20 years and let's be honest, we mostly didn't do it for most of that time.

Of course we aren't good at software engineering. It's still a brand new discipline, and we're still learning how to do it well.

Software often forgets about humans

In our rush to deliver code we LOVE to forget about the humans who will interact with that code, both our customers and ourselves. Imagine the impact of engineering a bridge without a way for crews to maintain it. Digging a mine and forgetting to install ventilation for your miners. Building a building that required humans to hold up walls because you ran out of time.

I assert that software teams get this wrong more often than they get it right, and I think backend engineering teams are especially guilty here. We're often working so hard to deliver additional functionality that we neglect to design a functioning system.

I have seen more teams get oncall rotations wrong than I have seen get them right. Even having an oncall rotation puts you ahead of most backend teams. Managing to create a healthy, well trained, well documented, well distributed oncall rotation that doesn't burn your team out is damned near magic that almost no one gets right. A well-written and maintained runbook is a rare and beautiful thing.

In software we ask engineers to perform ridiculous acts of dangerous individual heroism all the time to maintain critical infrastructure all over the world, and we have to do better. It's not only morally shit to treat your humans like robots, it produces unstable systems.3

Why do professional engineering teams do this?

I think this is so common mostly because software engineering is a new discipline and there isn't actually any consensus on how to do it well. We're mostly not even sure what it means to do it well, and we're printing money doing a mostly mediocre job, so why really do it well?

One of the most powerful and dangerous things about how malleable software is, as compared to more physical types of engineering, is that we really can just ship a mediocre system and then swap it out with a better one if we end up having enough success to need to. We do this all the time; in fact, it's the standard path for software companies. We create something sort-of-okay, we launch it, it starts breaking, we furiously fix it, repeat ad nauseam.

But it is constant careful work to stay ahead of the growth of your own systems, and it involves a lot of attempting to predict the future. Get this balancing act wrong and you cause yourself and your customers quite a bit of trouble.

Conclusion

I suggest that for solutions to our new field's issues we turn to the title we've all decided to wear. Engineering has a deep background of knowledge to draw from, and taking these things seriously allows us to build massive systems that span the entire world. It's magic.

Let's improve our discipline, understand and accept responsibility for the consequences of our creations, and spend time learning from the broader field of engineering.

Postscript: A personal example

I personally work on massive distributed systems. I've been reading a lot about industrial control systems and the academic literature around PID loops. This stuff is all about keeping systems stable within an unstable environment, and there are lessons everywhere.

In terms of other fields of engineering that are directly applicable to what we do day to day as backend engineers, this is just a buffet of useful stuff. I'm using a PID loop and control theory in a software project here if you're curious how I'm personally applying it.
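For anyone who hasn't met a PID loop before, the idea fits in a few lines. What follows is a generic textbook-style sketch, not the code from the project mentioned above: the class name, the gain values, and the toy "leaky" system it controls are all assumptions made up for illustration. The controller looks at the error between where the system is and where you want it, and combines the present error (P), the accumulated error (I), and the rate of change of the error (D) into a correction.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Minimal PID controller (illustrative; names and gains are hypothetical)."""
    kp: float          # proportional gain: reacts to the current error
    ki: float          # integral gain: reacts to accumulated error
    kd: float          # derivative gain: reacts to how fast the error changes
    setpoint: float    # the value we want the system to hold
    _integral: float = 0.0
    _prev_error: Optional[float] = None

    def update(self, measurement: float, dt: float) -> float:
        """Return a control output for one time step of length dt."""
        error = self.setpoint - measurement
        self._integral += error * dt
        # No derivative on the first call, since there is no previous error yet.
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def simulate(steps: int = 200, dt: float = 0.1) -> float:
    """Drive a toy system that constantly 'leaks' toward zero up to a setpoint."""
    pid = PID(kp=2.0, ki=1.0, kd=0.1, setpoint=10.0)
    level = 0.0
    for _ in range(steps):
        control = pid.update(level, dt)
        # Toy plant (an assumption): input adds to the level, a leak drains it.
        level += (control - 0.5 * level) * dt
    return level


if __name__ == "__main__":
    print(f"final level: {simulate():.3f}")
```

The leak stands in for the unstable environment: left alone, the level drifts away from the target, and the integral term is what lets the loop hold the setpoint anyway. Swap "level" for queue depth or worker count and the shape of the problem should look familiar to backend engineers, though real systems need far more care around tuning, delays, and limits than this sketch shows.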


  1. See: Shibboleth, From Hebrew shibboleth, meaning uncertain, perhaps either 'stream in flood' or 'ear of corn'. The English use originates in the Bible, in the Book of Judges 12: 5-6, where the Gileadites defeat the Ephraimites at the River Jordan: 'And the Gileadites tooke the passages of Iordan before the Ephraimites: and it was so that when those Ephraimites which were escaped saide, Let me go ouer, that the men of Gilead said vnto him, Art thou an Ephraimite? If he said, Nay: Then said they vnto him, Say now, Shibboleth: and he said, Sibboleth: for hee could not frame to pronounce it right. Then they tooke him, and slewe him at the passages of Iordan.' OED

  2. Links:

  3. If you want to cause service outages simply understaff your engineering teams while pushing them for new features. Works every time.