Equifax Inc. is a global leader in consumer, commercial and workforce information solutions that provide businesses of all sizes and consumers with insight and information they can trust. Equifax organizes and assimilates data on more than 820 million consumers and 91 million businesses worldwide.
In this article, we talk with Jim Grill, Senior Director of IT Software Engineering and Automation at Equifax, about how the company uses Chef server as part of its release pipeline.
The Equifax environment
The Equifax internal cloud has two separate virtualization platforms. It uses Hyper-V largely as the hosting platform for Web apps that were originally on bare metal. It uses OpenStack for its 12-factor apps. There are thousands of virtual machines (VMs).
The Equifax workflow
The Equifax workflow relies on the Chef server, the Chef Automate compliance feature, Jenkins, testing tools, and Git for version control, hosted on Atlassian Bitbucket. Because of its automated pipeline, Equifax can deploy its applications and the supporting infrastructure together through a series of environments designed to catch errors well before production. In this article, we'll concentrate on how Chef cookbooks are developed. Here is a diagram of the Equifax workflow.
Jim explained that, for new functionality, a developer will either clone an existing cookbook or create a new one. Every cookbook must have a set of tests associated with it, but whether the tests are written before or after the cookbook is up to the developer.
Next, the developer issues a pull request (PR), which triggers an automatic build. There is also a peer review, performed by the cookbook maintainers. These are largely people on Jim's team but there are also people from some of the app teams. When someone evaluates the PR to consider merging it, they can see if the automated tests have passed.
After the tests have passed and the changes have been approved, they are merged into the development branch and tested again. At this point, team members have a sense of what a release might look like so they select from the approved pull requests, which may have come from multiple developers, to create a release candidate. Creating a release candidate triggers a set of integration tests because, up to now, the changes have only been tested independently of each other.
If all goes well, team members look at the changes to decide what category they fall into, such as bug fixes or features, and assign an appropriate semantic version. They then issue a PR to merge the changes into a release branch. At this point, in general, teams follow the GitFlow model for handling branches.
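The version assignment described above can be sketched in a few lines. This is a minimal illustration, not Equifax's tooling: the category names ("breaking", "feature", "bugfix") and the function `next_version` are hypothetical, and the mapping of categories to major, minor, and patch bumps follows the usual semantic versioning convention.

```python
# Hypothetical category names; the article only mentions bug fixes and features.
BUMP_FOR_CATEGORY = {"breaking": "major", "feature": "minor", "bugfix": "patch"}
PRECEDENCE = ["major", "minor", "patch"]


def next_version(current: str, categories: list[str]) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a set of change categories."""
    if not categories:
        raise ValueError("a release candidate needs at least one change")
    major, minor, patch = (int(part) for part in current.split("."))
    bumps = {BUMP_FOR_CATEGORY[category] for category in categories}
    # The most significant change in the release decides the bump.
    bump = next(level for level in PRECEDENCE if level in bumps)
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

For example, a release candidate that bundles a bug fix and a feature on top of 1.4.2 would become 1.5.0 under this scheme, because the feature outranks the fix.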
Builds are tested again because there may be other changes that have come through the development branches to make up the release. If these tests pass, the CI tool creates a tag and pushes that tag back. "We've automated that step," says Jim. "We've had problems in the past where people remember to bump the version but they forget to create a tag. We like to have a tag in Git because it always gives us a place where we can go back and reference it, no matter where it is. It always stays the same, just like a bookmark."
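As a rough sketch of what an automated tagging step might look like, the snippet below builds the two Git commands involved: creating an annotated tag and pushing it to the remote. The function names, the `v` prefix on the tag, and the `origin` remote are assumptions made for illustration; the article does not describe how the Equifax CI job is implemented.

```python
import subprocess


def tag_commands(version: str, remote: str = "origin") -> list[list[str]]:
    """Build the git commands that create an annotated tag and push it."""
    tag = f"v{version}"  # assumed naming convention, not taken from the article
    return [
        ["git", "tag", "-a", tag, "-m", f"Release {version}"],
        ["git", "push", remote, tag],
    ]


def tag_release(version: str, remote: str = "origin") -> None:
    """Run the commands in order; check=True raises on the first failure."""
    for command in tag_commands(version, remote):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the execution makes the step easy to test without touching a real repository, and having the CI tool run it removes the chance that someone bumps the version but forgets the tag.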
At this point, the team has a potentially shippable product. What happens next depends on whether the cookbook is an application cookbook or an environment cookbook. If it's an application cookbook, the team assumes they can publish it, which means they push it to the private Supermarket. If it's an environment cookbook the process is a little different.
Environment cookbooks aren't published to Supermarket. Instead, the team starts its continuous delivery (CD) process for delivering the cookbook to a series of environments. "We view cookbooks as artifacts," Jim says. "They're building blocks. If you have Tomcat and Apache, those are separate cookbooks. If you have Java, maybe that's separate, too. But at some point, you're going to bring the three together. Maybe you'll configure Apache as a reverse proxy and install Tomcat, which needs Java. So the first time all these three things see each other and are put together in a final solution, that's where we start our CD process.
"The first environment is a Phoenix build, meaning it's built from scratch. If everything passes, then we move to the next phase, which is an environment where we will converge. Obviously, there's a VM or a server in that environment that's already been converged on previous versions and it stays there. The reason we do that is we want to make sure that not only will this Chef code work from zero, from the ashes to completely done, but we also want to make sure that incremental changes will converge on their own.
"There are rare instances where someone will create major, breaking changes and there's no way a converge will work. In that case, you have to stop what you're doing and redeploy. We try to avoid that if we can but there are just some times where there's a major change or a major improvement that's actually worth it. You know you can do a Phoenix deploy but there's no way you can converge from that point.
"I've seen things break for all kinds of reasons and that's why I wanted both a Phoenix and a converge environment in the pipeline. We need to make sure that we can do more than converge deltas of change over long periods of time.
"The last phase is the staging environment. What happens there depends on whether the deployment is a converge or a Phoenix deploy. Either the job creates a brand new VM or we just let the job converge an existing VM. It will create a snapshot before it does anything else because this VM should be exactly like whatever's in production right now. It's very important that we capture that state, then let Chef do its thing.
"If the deployment fails, it automatically rolls back because we want to be able to capture whatever the problem was and recreate it and make sure that it doesn't do that again. This rarely happens but that extra step is the ultimate in confidence. Any more work that needs to be done at this point, maybe people have something that's part of a larger cluster, it's a little more complex so they actually want to check, they actually want to log in and look at it and poke it with a stick, whatever it is they have to do, this is the stage where you do that because this is right before production.
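The snapshot-then-converge pattern Jim describes for staging can be sketched as follows. This is an illustration under stated assumptions: `take_snapshot`, `converge`, and `restore` are hypothetical callables standing in for whatever the virtualization platform and Chef run actually provide, and the article does not say how the real job is written.

```python
from typing import Callable


def deploy_with_rollback(
    take_snapshot: Callable[[], str],
    converge: Callable[[], None],
    restore: Callable[[str], None],
) -> bool:
    """Snapshot first, then converge; restore the snapshot if the converge fails."""
    snapshot_id = take_snapshot()  # capture the production-like state before any change
    try:
        converge()
    except Exception as error:
        # Report the failure so it can be investigated, then roll back.
        print(f"converge failed ({error}); restoring {snapshot_id}")
        restore(snapshot_id)
        return False
    return True
```

The ordering is the important part: the snapshot is taken before anything else runs, so a failed converge always has a known state to return to.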
"With our workflow, everything must pass one phase before you can go to the next. There's no way to bypass anything unless you know the system intimately. People on my team can help in the event of an emergency or a serious problem but, for the most part, for the average person, there's no way to circumvent the system.
"Our pipeline's pretty cool and I think most developers are excited about being able to deliver their code and the supporting infrastructure at the same time. They're interested. It's something they've wanted to do."
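The rule that every phase must pass before the next one starts can be expressed as a simple gated loop. The sketch below is only a model of that idea: the phase names follow the article, while `run_pipeline`, `is_shippable`, and the job callables are hypothetical stand-ins for the jobs a CI tool such as Jenkins would run.

```python
from typing import Callable

# Phase names follow the article; the callables passed in are hypothetical
# stand-ins for the real CI jobs.
PHASES = ["phoenix", "converge", "staging"]


def run_pipeline(jobs: dict[str, Callable[[], bool]]) -> list[str]:
    """Run the phases in order and stop at the first failure.

    Returns the names of the phases that passed.
    """
    passed = []
    for phase in PHASES:
        if not jobs[phase]():
            break  # later phases never run, so no gate can be skipped
        passed.append(phase)
    return passed


def is_shippable(passed: list[str]) -> bool:
    """A cookbook is ready for production only if every phase passed."""
    return passed == PHASES
```

Because the loop breaks on the first failing phase, a cookbook that fails the converge environment never reaches staging, which mirrors the "no way to bypass anything" behavior described above.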
The benefits of Chef
"I think the challenge we have, and it's getting easier, is explaining the benefits of automation to business and really selling it to them," says Jim. "They're used to doing things as fast as they can and hitting some short-term release date rather than focusing on long-term manageability and strategy.
"We explain that it takes a little bit longer up front to develop Chef code and automation and fix your releases. It's a significant investment but we have enough teams doing it now that it's easy to point to what the benefits are. Obviously, the first benefit is better quality. The second is speed. We can deploy environments very quickly. We can do A/B testing if we want to. We can do Phoenix deploys where we just tear down an environment and deploy all new stuff, switch load balancers around or something like that. Those activities really aren't possible when you're doing the things the old way, by hand.
"Plus, we have a much better disaster recovery story. Before, if there was some kind of disaster or you lost some servers or VMs, everybody was terrified because nobody remembered who set them up. With Chef, we run all of our tests once a day whether they need it or not so we don't have any code rot. We know with absolute certainty and confidence that we can redeploy our servers immediately.
"Another advantage is that people have a lot more autonomy. We have a self-service portal now that integrates with Chef and we're constantly improving. People don't even really need us anymore for a lot of things. They don't need to submit a ticket and wait or hope that someone has time to get to it. They can just go ahead and do it right now.
"Before Chef, we did everything by hand. You had to create a ticket, that ticket went into a queue, and it could take 3 weeks, 4 weeks, 6 weeks, 2 months, who knows. It depended on the complexity of the request. Also, you had to hope that you gave the right requirements and that the server got configured correctly. There's always a process. 'Hey, I need a server.' 'Okay, here's a server.' 'Oh, I need you to add this.' 'Okay, that's done.' 'Wait, I forgot to tell you, I need this, too.' People really don't think things through. Chef helps you to do that. You've written it all out and you've had time to iterate.
"We found that value stream mapping really proved our points. We saw that there was a huge amount of queue time, manual processes, and rework. That's wasted time that adds no value. Now, we're always adding value. We went from many weeks of wasted time to spinning up and converging a VM in as few as 10 or 12 minutes. It's certainly less than 30 minutes, depending on how many VMs you want.
"Value stream mapping is a good place to start when discussing the value proposition of automation to stakeholders. Being able to visualize the current value stream's future state is quite powerful. That's how you get funding for automation and tools like Chef. You have to be able to articulate what the value is and show them what the process looks like without Chef and what it could be like with Chef. When looking at a process as complex as fulfilling requests for infrastructure, you need to take baby steps and attack the worst parts of that value stream first. Keep doing that and frequently measure progress against your baseline. It's a very powerful way to tell a story and improve process."