Overloaded Titles
by shoowa
Confusion about the term DevOps and several titles reigns in corporations adjacent to the software industry:
- Site Reliability Engineer
- Production Engineer
- Platform Engineer
- DevOps Engineer
I heard an HR recruiter state that SREs are not software engineers, and that the company pays them less. Another suggested that SREs are simply glorified SysAdmins. A third thought that Production and Platform worked on the same thing. They all struggled to find suitable candidates.
They all believed that only developers creating features visible to the customers were valuable. That’s like saying only the sailors on a ship are important, and none of the shipwrights or dockworkers contribute to an economy.
The cause of this confusion is either ignorance or a conflation of aspirations. Many old corporations with stagnant software departments borrow phrases from the newer, pure software corporations in the Bay Area. They believe changing a few words will attract top talent. They don’t re-evaluate the organization and reshape it to align with market trends, industry standards, and a new goal. They don’t prepare a sortie.
This post will sidestep the debate about “engineer” versus “developer.” And I won’t rant about the difference between a data engineer and a data scientist.
DevOps
This portmanteau is a culture. And culture is the unwritten way we do things around here. What does that really mean?
In the Texan Hill Country, a few people still kill, skin, quarter, butcher, and smoke a deer. There are plenty of people in Manhattan who believe red meat is always wrapped in plastic. Across the nation, many people buy a jar of tomato sauce, and others roast tomatoes and simmer them with onions & garlic for pasta on Sunday. On the Upper East Side, people hire attorneys to resolve disputes. In Bay Ridge, some people resolve matters with pugilism. The FBI trains with firearms. The CIA doesn’t carry sidearms, because they hire people who wield guns.
Rarely do any of these people think about these facts. They’re living day to day, assuming this is the best way to handle small tasks and big goals. They are living in different cultures.
DevOps is a culture of shared responsibility. Everyone helps when a problem emerges. And it is a culture of preventative action. Everyone prepares to make the next deployment easier and safer. It isn’t a duty that can be assigned to a single laborer. It must be led by a group of conscientious people, and the rest must follow.
Convergence of Duties
The other three types of programmers all touch servers in the production environment in various ways. I suspect laymen conflate these roles, because of this shared quality. The three roles differ in their concerns and audiences.
Production engineers usually deploy, configure, maintain, upgrade, and decommission servers. They handle hardware and the operating system. They can swap out a NIC and tune kernel parameters with sysctl. They also respond to incidents, so they usually investigate networking errors. And they consider how different servers fulfilling different purposes can be arranged to satisfy consumers.
Platform engineers build a set of programs that help their colleagues. Usually, the programs shepherd code from laptops to production servers. Additionally, these programs might provide valuable feedback about the development teams’ various applications. In our industry, a “platform” is often a digital marketplace in which two or more parties can gather to exchange value. Inside a software firm, a “platform” is a set of apps that allow two or more teams to quickly & precisely create other apps.
SREs reveal the performance of hosts, applications, and databases by providing logs, metrics, and traces. The data is useless if no one looks at it, so it is offered to all programmers. SREs study this data when responding to an incident, and they rely on the same signals to identify weaknesses in the configuration and arrangement of servers and in the design of the applications. Any solution to a repetitive problem is written as code.
SREs can halt the deployment of an application. Much like a sheriff maintaining order in a Wild West town, an SRE can maintain order in a software organization by denying a reckless development team the privilege of releasing an application to the public. Most organizations outside of Google that employ an “SRE” team never grant this authority, and without it a team truly isn’t able to sustain reliability. The organization chart can reveal the political importance of reliability. Is the SRE Director on the same level as the Engineering Director? Is the champion of reliability able to disagree with the champion of features? When an organization places the SRE Director under the Engineering Director, expect new features to trump error budgets and freezes.
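For readers who haven’t met an error budget, here is a minimal sketch of the idea in Python. It assumes a request-based SLO over some fixed window, and every name in it (`ErrorBudget`, `may_deploy`) is hypothetical, made up for illustration rather than taken from any real tool. The budget is simply the number of failures the SLO tolerates; once a service has spent it, releases freeze.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests should succeed
    total_requests: int   # requests served so far in the window
    failed_requests: int  # requests that violated the SLO in the window

    @property
    def allowed_failures(self) -> float:
        # The budget: the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining(self) -> float:
        # Positive while budget is left, zero or negative once it is spent.
        return self.allowed_failures - self.failed_requests


def may_deploy(budget: ErrorBudget) -> bool:
    """Allow a release only while some error budget remains."""
    return budget.remaining > 0


if __name__ == "__main__":
    healthy = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    exhausted = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=1_500)
    print("healthy service may deploy:", may_deploy(healthy))
    print("exhausted service may deploy:", may_deploy(exhausted))
```

The arithmetic is trivial on purpose. The hard part is the one described above: whether anyone in the organization is allowed to act on a `False`.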
The Bottom Line
A smart company allows anyone from these three teams to float around and join the other teams for a couple of projects. Or even switch roles for a year to really understand another side of software and the business.
When any of these components break down, will several people volunteer to help? Will anyone feel comfortable helping? Will anyone feel comfortable resolving a problem that is caused by the interaction of the various components? If the answer to all of these questions is yes, then you are enjoying a DevOps culture.
#corporate #industry