During Bruno Connelly’s six years (and counting) as Vice President of Engineering at LinkedIn, the social networking behemoth has grown from around 80 million to over 450 million users. How is that sort of growth possible? One word– or, more accurately, one portmanteau: DevOps.
In an interview that Connelly gave to InfoWorld he provided some valuable lessons for any modern organization that needs to scale its business.
When Connelly joined LinkedIn in 2010, the company was really taking off and they were struggling just to keep the site up. At the time there were just six or seven software engineers on Connelly’s team, developers had no access to production. New versions of the entire LinkedIn.com site were deployed every two weeks using a branch-based model with the site rollout taking eight hours.
A Model To Emulate
As Connelly tells it, the first priority across the company was to stop the bleeding and get everyone to agree that site reliability trumped everything else, including new product features.
Along with that imperative came a plan to make operations “engineering focused.” Instead of being stuck in a reactive, break-fix role, operations would take charge of building the automation, instrumentation, and monitoring necessary to create a hyperscale Internet platform.
In this new model, operations people would also need to be coders, which dramatically changed hiring practices. The language of choice was Python – for building everything from systems-level automation to a wide array of homegrown monitoring and alerting tools. The title SRE (Site Reliability Engineer) was created to reflect the new mandate.
Many of these new tools were created to enable self-service for developers. Indeed, it may be said that in large part, it was this drive that paved the way for LinkedIn’s meteoric ascent.
Today, this goal is being met by operations all over the globe through no shortage of tools and tactics. Not only can developers provision their own dev and test environments, but there’s also an automated process by which new applications or services can be nominated to the live site.
Using such monitoring tools, developers can see how their code is performing in production – but they need to do their part, too
Top Takeaways For You: Successfully Leveraging DevOps
While the term “DevOps” may not have been shouted from the rooftops of LinkedIn HQ in those days, you can be sure that – whether they realized it or not – everything they did was in the service of building a well-oiled DevOps machine.
All too often, developers build what they’re told to build and hand it off to production, at which point operations takes on all responsibility. In the ownership model, developers retain responsibility for what they’ve created – improving code already in production as needed. Pride in software craftsmanship became an important part of the ethos at LinkedIn.
At its core, this is the takeaway that we should all look to emulate from LinkedIn. Successfully leveraging DevOps means instilling ownership and professional pride in the workplace, across departmental boundaries; a big picture mentality coupled with a profound appreciation for fine detail.
The most important lesson from LinkedIn’s journey is that the old divisions between development and operations become non-starters at Internet scale. Developers need to be empowered through self-service tools, and operations needs a seat at the table as applications or services are being developed – to ensure reliability and to inform the creation of appropriate tooling.
Call it DevOps if you like, but anything less and you’re liable to find yourself on shaky ground.