We all know that, when working with important information, we should keep a copy of that information as a backup. This backup could be stored right alongside the original, but a better idea is to store it in another, secure location.
When you start treating your infrastructure as code, you need to follow some of the same practices that application developers do, and that includes keeping copies of your work in a safe place and keeping track of what's happened to the code as the project changes.
Imagine that you and your team are working on a cookbook. Developing and deploying the first release of the cookbook is only the beginning. Shortly after the release you begin working on the next version but people discover issues with the currently deployed version. Repairing these issues requires you to stop development on the new release and go back to the older release.
To do that effectively requires an entire history of the changes you made. That history:
- provides a way to return to a previous version.
- lets you know who made each change.
- describes why the change was made.
One way to safely store versions of your code, keep track of its complete history and involve the people on your team is to use a distributed version control system. Here's an example of a typical workflow.
When you first start developing changes you work on a local copy of the code. As you make changes, you'll come to various points where you haven't completed an entire feature or fix but have reached an important milestone. For instance, you may find a fix for a bug but discover that your solution is too slow or doesn't adhere to standards defined by the team. You save your work and, because you have your own local copy just as everyone else on the team has theirs, your work remains private. You aren't saving your work to a single copy that exists on a server and that everyone shares.
At the point that you think your code is ready for other people to evaluate, you send your local changes to a central repository. Once you've sent your changes to the repository, other team members can retrieve them, inspect them, test them, and merge them. They now have their own up-to-date copy of the software on their local machine, where they're free to make their own changes without impacting anyone else.
The Chef community primarily uses Git as its distributed version control system. Using Git isn't required to write Chef code but it's required if you want to share your code with the rest of the Chef community. We should make clear that we're going to be talking about both Git and GitHub in this module. Git is a version control system that allows you to manage your source code. GitHub is a hosting service for Git repositories. You use these remote repositories for backup and collaboration.
In the next section we're going to help you get started with Git and GitHub by exploring their core concepts and explaining some of the commonly used commands.
This module only explores a small subset of the concepts and commands you'll use as you gain experience. To learn more, read the documentation at https://git-scm.com/doc
The first step to using Git for version control is to include your current projects in a Git repository, often called a 'repo' for short.
Within the parent directory of your project, run the git init command to initialize it as a Git repository. This adds a .git directory to that parent directory. This directory stores the history of the files and the changes that you commit.
If a project is already a Git repository, running git init again is safe: Git does not overwrite the history that is already there.
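As a minimal sketch of this step, the commands below initialize a repository in a scratch directory and confirm that the .git directory exists. The temporary directory and the metadata.rb placeholder file are assumptions made for illustration; in practice you would run git init in your own project directory.

```shell
set -e
# Assumption: a throwaway directory stands in for your project.
project=$(mktemp -d)
cd "$project"
echo "name 'demo'" > metadata.rb   # placeholder project file

git init -q          # -q suppresses the informational output
ls -d .git           # the repository data lives in this directory
git init -q          # running it again does not overwrite existing history
```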
You'll often want to know the current status of a Git repository. Running git status tells you which files are or are not currently stored in the repository. Files not in the repository are called untracked. Initially, all the files in a repository are untracked.
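To illustrate, here is a small sketch that creates one file in a fresh repository and asks for its status. The scratch directory and the README.md file name are hypothetical. The --short flag prints a compact, one-line-per-file form of the status.

```shell
set -e
cd "$(mktemp -d)"                  # assumption: throwaway project directory
git init -q
echo "Demo cookbook" > README.md   # hypothetical file
git status --short                 # untracked files are prefixed with ??
```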
To start tracking a file you add it to a staging area with the git add FILEPATH command. The staging area is a temporary location that allows you to collect all your changes before you're ready to commit the work.
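The following sketch stages one of two new files and then checks the status, so you can see a staged file next to an untracked one. Both file names are hypothetical.

```shell
set -e
cd "$(mktemp -d)"                  # assumption: throwaway project directory
git init -q
echo "Demo cookbook" > README.md
echo "scratch" > notes.txt

git add README.md                  # stage only README.md
git status --short                 # "A " marks a staged new file, "??" an untracked one
```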
When you ask again for the current state of the repository with git status, the output shows the untracked files alongside the files currently in the staging area. To save the changes in the staging area, use the git commit command. This launches your default text editor, which allows you to write a commit message that is associated with those changes.
Once the commit message is saved and you exit the editor, the changed files are copied locally to the .git directory. You can review the history of the commits you made with the git log command. Each commit in the log shows a commit ID, the author, the date and the commit message.
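Here is a rough sketch of a first commit followed by a look at the log. To keep the example non-interactive it passes the message with the -m flag instead of opening an editor. The identity values and the file name are placeholders, not anything the module prescribes.

```shell
set -e
# Placeholder identity so the commit works on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"                  # assumption: throwaway project directory
git init -q
echo "Demo cookbook" > README.md
git add README.md
git commit -q -m "Add README"      # -m supplies the commit message inline
git log                            # shows the commit ID, author, date, and message
```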
When you make changes to files that you've previously committed, you'll notice that these files are reported as modified. You can compare the current state of the files and the previously saved state by running the git diff command. Every new, deleted, or modified line is displayed.
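The sketch below changes a committed file and compares it with the saved version. The file contents and identity values are made up for the example.

```shell
set -e
# Placeholder identity so the commit works on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"                  # assumption: throwaway project directory
git init -q
echo "version one" > README.md
git add README.md
git commit -q -m "Add README"

echo "version two" > README.md     # edit a file that is already committed
git status --short                 # " M" marks a modified, unstaged file
git diff                           # removed lines start with -, added lines with +
```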
When you are satisfied with the new changes, you again add these files to the staging area (git add FILEPATH) and then commit them (git commit). A new entry will appear in the history that you can view (git log).
All the changes that you've committed are stored locally in the repository on your workstation so they're still private. To share this repository and all the changes that you've made requires you to set up a remote repository. Initially, a local repository has no remote repositories. You can see if there are any remote repositories with the git remote command.
There are many ways to create a remote repository. A common way is to create a new repository in GitHub. GitHub's new-repository page is an interactive form that creates the remote repository for you. You only need to supply a name for the repository. Separately, within your local repository, you refer to each remote by a short name. By convention, the name of the primary remote is origin but you can use whatever name makes the most sense to you.
Now, from within your local Git repository, you need to set up the remote repository through the git remote add REMOTE_NAME CONNECTION_INFORMATION command. You must provide the name of the remote and the connection information. The connection information is the remote URL, such as https://github.com/user/repo.git. Here's an example of the command:
git remote add origin https://github.com/user/repo.git
After the remote repository is set up you can verify that the remote repository is present by running the git remote command again.
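As a sketch of the before-and-after check, the commands below list the remotes of a fresh repository, register the example URL from the text under the name origin, and list the remotes again. Adding a remote only records the URL; nothing is contacted until you fetch or push.

```shell
set -e
cd "$(mktemp -d)"                  # assumption: throwaway project directory
git init -q

git remote                         # prints nothing: no remotes are configured yet
git remote add origin https://github.com/user/repo.git   # example URL from the text
git remote                         # now lists the remote's name
```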
You can now start sharing your work by sending your local, committed changes to the remote repository with git push REMOTE_NAME BRANCH_NAME. The name of the remote is the same name you used when you configured it. By default, the branch on your local repository, where you've done your work, is called master (repositories created on GitHub, or with a differently configured Git, may call it main instead). Assuming that everything is the default, the command is git push origin master.
There's a discussion about branches later in this module.
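The sketch below shows a push end to end without needing a GitHub account. As an assumption for illustration, a local bare repository (remote.git) stands in for the GitHub repository, and the branch name is pinned to master so the example matches the text regardless of local Git configuration.

```shell
set -e
# Placeholder identity so the commit works on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git                      # assumption: local stand-in for GitHub
git -c init.defaultBranch=master init -q demo      # pin the branch name to master
cd demo
echo "Demo cookbook" > README.md
git add README.md
git commit -q -m "Add README"

git remote add origin "$work/remote.git"
git push -q origin master                          # send local commits to the remote
git ls-remote origin                               # list the refs the remote now holds
```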
When you push to GitHub over HTTPS you may be prompted to enter your GitHub username and a credential. GitHub no longer accepts account passwords for Git operations, so the credential is a personal access token. This confirms that you have permission to push to the repository. If you don't want to be prompted for your credentials on every push, you can give GitHub the public part of your SSH key pair and use the repository's SSH URL as the remote. The Git command line tool then uses the private part of your SSH key pair to prove your identity.
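If you go the SSH route, the remote has to point at the SSH form of the repository address. The sketch below assumes your public key is already registered with GitHub and reuses the placeholder user/repo from the earlier example; it only rewrites the stored URL and does not contact GitHub.

```shell
set -e
cd "$(mktemp -d)"                  # assumption: throwaway project directory
git init -q
git remote add origin https://github.com/user/repo.git   # example URL from the text

# Switch the existing remote to the SSH form of the same (placeholder) repository.
git remote set-url origin git@github.com:user/repo.git
git remote -v                      # fetch and push URLs now use SSH
```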
In this section, you initialized a Git repository, added files to the staging area, committed them, viewed your history of commits, created a remote repository, and then pushed your changes. These few commands make up the core of the Git workflow but it's also important to understand how changes are merged with one another.
Merging with Git
Once you've pushed your changes to a remote repository, you've made them available to other team members. As we said, one of the big advantages of remote repositories is that it makes it easy for people to collaborate. This means that not only do you make changes available to other people but those people are also making changes that they want you to use. Incorporating the work other people have done with your own work is called merging.
Merging new commits from a remote repository
Let's begin with a common scenario. Now that your work is on the remote repository, other people have started making their own contributions. You're interested in seeing what they've done.
You navigate to your project directory and look at your local commit history with git log. The log, of course, only shows commits made by you. This is because your local copy of the Git repository is in the same state that you left it after you did a git push. To get the changes other people have done, you have to fetch them from the remote repository and merge them.
First, you make sure that you're working with the correct remote repository with the git remote -v command. The -v (verbose) flag shows the URL that each remote uses for fetching other people's changes and for pushing your own. To retrieve the changes from a remote repository, you execute the git fetch REMOTE_NAME BRANCH_NAME command. If you're using the default names, the command is git fetch origin master. Fetching from the remote repository retrieves the work and stores it locally within the .git directory.
These changes are now in your local repository but still haven't been merged. Merging the changes means that the history of the changes from your commits is going to merge with the history of the remote branch. Merging is usually straightforward when the changes in the remote repository happen after the changes that you pushed.
To merge the changes, use the git merge BRANCH_NAME command, where the branch name is the name of the branch on the remote repository. The name of your local branch, by default, is 'master' as is the name of the remote branch. To differentiate between them, in the command, you preface the name of the branch with the name of the remote. In this case, you want to merge the 'origin/master' branch into your own 'master' branch. So the command is git merge origin/master.
If the merge completes successfully, the commit history is updated. Use git log to show the new commits.
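The fetch, inspect, and merge sequence can be sketched as follows. Everything about the setup is an assumption for illustration: a local bare repository stands in for GitHub, and a second clone named teammate plays the role of a colleague who pushes a commit.

```shell
set -e
# Placeholder identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare remote.git   # stand-in for GitHub

# You: create the project and push the first commit.
git -c init.defaultBranch=master init -q you
cd you
echo "one" > README.md; git add README.md; git commit -q -m "Initial commit"
git remote add origin "$work/remote.git"
git push -q origin master

# Teammate: clone, commit, and push a change.
git clone -q "$work/remote.git" "$work/teammate"
cd "$work/teammate"
echo "helper" > helper.rb; git add helper.rb; git commit -q -m "Add helper"
git push -q origin master

# You: check the remote, fetch, look at what arrived, then merge.
cd "$work/you"
git remote -v
git fetch -q origin master
git log --oneline master..origin/master   # fetched commits that are not merged yet
git merge -q origin/master                # your branch has no new commits, so it just moves forward
git log --oneline
```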
Fetching and merging happen together so often that the git pull REMOTE_NAME BRANCH_NAME command is a shortcut that performs a fetch and then a merge. However, some people prefer using the separate commands so that they can examine the changes after they fetch them and before they merge them.
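For comparison, here is the same scenario handled with a single pull. The local bare repository and the teammate clone are again stand-ins invented for the example. The --no-rebase flag is added only to make explicit that this pull merges, since pull can also be configured to rebase.

```shell
set -e
# Placeholder identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare remote.git   # stand-in for GitHub

git -c init.defaultBranch=master init -q you
cd you
echo "one" > README.md; git add README.md; git commit -q -m "Initial commit"
git remote add origin "$work/remote.git"
git push -q origin master

git clone -q "$work/remote.git" "$work/teammate"
cd "$work/teammate"
echo "helper" > helper.rb; git add helper.rb; git commit -q -m "Add helper"
git push -q origin master

cd "$work/you"
git pull -q --no-rebase origin master     # fetch and merge in one step
git log --oneline
```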
Handling merge conflicts
In another scenario, you push your commits to the remote repository, inform your teammates, and then head off for an international flight. During the flight, your teammates develop a few new features and fix an issue they find. They push their own changes up to the remote repository. Meanwhile, you continue working and committing your changes locally.
When you finally get a chance to connect to the Internet, you push your commits to the remote repository. Instead of everything going smoothly, you get an error message that informs you that the repository cannot accept the commits.
Errors like this are common when multiple people can access a repository. The history of commits on the remote repository is not the same as the history of commits on your local repository: the remote contains commits from your teammates that you don't have yet. Git rejects the push rather than merging the two histories on the remote for you, so you have to combine them locally first and, if your changes conflict with what your teammates have done, resolve the conflicts yourself. There are two ways to solve the problem.
The first approach is to use the same merge process that you saw in the previous example. From your local repository fetch and merge these changes (or do a pull). However, when you do this, Git has to weave together the history of commits that you wrote with those of your teammates. This usually means that Git prompts you to write a new commit that describes the merger of these two histories.
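A sketch of the merge approach follows. The setup is hypothetical: a local bare repository stands in for GitHub and a teammate clone pushes while you commit locally. The --no-edit flag accepts Git's default merge message so that the example runs without opening an editor.

```shell
set -e
# Placeholder identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q --bare remote.git   # stand-in for GitHub

git -c init.defaultBranch=master init -q you
cd you
echo "one" > README.md; git add README.md; git commit -q -m "Initial commit"
git remote add origin "$work/remote.git"
git push -q origin master

# Teammate pushes while you are offline.
git clone -q "$work/remote.git" "$work/teammate"
cd "$work/teammate"
echo "helper" > helper.rb; git add helper.rb; git commit -q -m "Add helper"
git push -q origin master

# You commit locally, then try to push.
cd "$work/you"
echo "notes" > notes.md; git add notes.md; git commit -q -m "Add notes"
if ! git push -q origin master 2>/dev/null; then
  echo "push rejected: the remote has commits you do not have"
fi

git pull -q --no-rebase --no-edit origin master   # merge the two histories locally
git log --oneline --graph                         # the merge commit joins both lines of work
git push -q origin master                         # now the push is accepted
```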
The second approach is to use a process called rebasing. Rebasing means that your local commits, which are different from the remote commits, are taken out of your local commit history, the remote commits are merged in and then your local commits are reapplied. When you review the commit history, you'll see that all of your commits appear as the most recent ones.
When should you choose merge? Merging maintains the commit history and uses a commit to merge the existing commits with your commits. Using a merge means that you want to ensure that the commits remain in a specific order in the commit history.
When should you choose rebase? Rebasing moves your commits and recreates them. This means that the code will be the same as if you merged it but it saves you an additional commit. A rebase maintains a linear project history.
One of the most common cases of a merge conflict is when two changes affect the same file on the same lines. Here's an example.
Here are lines that are either unchanged from the common ancestor, or cleanly resolved because only one side changed.
<<<<<<< yours:sample.txt
Conflict resolution is hard; let's go shopping.
=======
Git makes conflict resolution easy.
>>>>>>> theirs:sample.txt
And here is another line that is cleanly resolved or unmodified.
Your local changes are contained between the '<<<<<<<' and the '======='. The changes that are being merged appear after the '=======' until the '>>>>>>>'. To resolve the conflict you need to choose the result that's correct. You might have to talk to your teammate to decide what's best. You can:
- choose all of your changes.
- choose all of their changes.
- choose both sets of changes.
Once you know what to do, it's possible to open the raw file in your text editor and fix it but many people prefer to use a merge tool, which can make the job much easier.
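To make the resolution step concrete, the sketch below produces a conflict on one line of sample.txt and resolves it by taking the other side's version. The branch name teammate is hypothetical, and choosing their version is just one of the options listed above; git checkout --ours would keep your version instead, and editing the file by hand lets you combine both.

```shell
set -e
# Placeholder identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"                          # assumption: throwaway project directory
git -c init.defaultBranch=master init -q
echo "Conflict resolution is untested." > sample.txt
git add sample.txt; git commit -q -m "Add sample"

git checkout -q -b teammate                # hypothetical branch with their change
echo "Git makes conflict resolution easy." > sample.txt
git commit -q -a -m "Their wording"

git checkout -q master
echo "Conflict resolution is hard; let's go shopping." > sample.txt
git commit -q -a -m "Your wording"

git merge teammate || true                 # stops and reports a conflict in sample.txt
cat sample.txt                             # shows the conflict markers

git checkout --theirs sample.txt           # take their version of the file
git add sample.txt                         # mark the conflict as resolved
git commit -q -m "Merge teammate, keeping their wording"
```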
Branching with Git
Up until now, we've talked about writing a single commit history into a single branch. This branch's default name is master. The problem is that, when everyone works on the same branch, you have to fetch and merge and, probably, resolve conflicts fairly often. This is because this one branch maintains the entire commit history. Using a single branch can also cause problems if you want to use the remote repository as a backup so you push work that's not ready for other people to see.
To solve these problems, Git lets you create additional branches so that everyone can develop their solutions on branches with isolated commit histories. These branches can be merged, as we saw in the last section, whenever the work is ready to be shared.
By default you start with a branch called master. You see the name of this branch when you run the git status command. To create a new branch, use git branch BRANCH_NAME. The name can be anything that makes sense to you.
You can see all local branches with git branch. To switch over to a particular branch, use git checkout BRANCH_NAME. A new branch starts from the commit you were on when you created it; from that point on, its commit history is its own.
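Here is a brief sketch of creating a branch and switching to it. The branch name fix-typo is hypothetical. A repository needs at least one commit before a branch can be created from it, so the example makes one first.

```shell
set -e
# Placeholder identity so the commit works on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"                          # assumption: throwaway project directory
git -c init.defaultBranch=master init -q
echo "Demo cookbook" > README.md
git add README.md; git commit -q -m "Add README"

git branch fix-typo                        # create the branch (hypothetical name)
git branch                                 # list local branches; * marks the current one
git checkout -q fix-typo                   # switch to the new branch
git status --short --branch                # the first line names the current branch
```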
Files you create, change, and delete on this branch are committed just as you did with the master branch. These new commits appear in the commit history of the new branch. When you want to push these changes to the remote repository, use git push REMOTE_NAME BRANCH_NAME. The branch name is the name of your local branch. The command automatically creates a new branch on the remote repository and pushes all the new commits.
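The sketch below pushes a new branch and then lists the branches the remote holds. A local bare repository stands in for GitHub, and fix-typo is a made-up branch name; both are assumptions for illustration.

```shell
set -e
# Placeholder identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git                      # assumption: local stand-in for GitHub
git -c init.defaultBranch=master init -q demo
cd demo
echo "one" > README.md; git add README.md; git commit -q -m "Initial commit"
git remote add origin "$work/remote.git"
git push -q origin master

git branch fix-typo                                # hypothetical branch name
git checkout -q fix-typo
echo "two" > README.md; git commit -q -a -m "Fix typo"
git push -q origin fix-typo                        # creates fix-typo on the remote
git ls-remote --heads origin                       # lists the remote's branches
```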
Eventually, when you're ready to share your work, you will merge these changes back to the shared branch, which is often the master branch. First, you check out the master branch with git checkout master. You then merge the changes with git merge BRANCH_NAME.
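As a final sketch, the commands below do some work on a branch and merge it back into master. The branch name fix-typo and the CHANGELOG.md file are hypothetical. Because master has no new commits of its own in this example, Git simply moves master forward to the branch's latest commit.

```shell
set -e
# Placeholder identity so commits work on a machine with no Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

cd "$(mktemp -d)"                          # assumption: throwaway project directory
git -c init.defaultBranch=master init -q
echo "Demo cookbook" > README.md
git add README.md; git commit -q -m "Add README"

git branch fix-typo                        # hypothetical branch name
git checkout -q fix-typo
echo "Fixed a typo" > CHANGELOG.md
git add CHANGELOG.md; git commit -q -m "Add changelog"

git checkout -q master                     # return to the shared branch
git merge -q fix-typo                      # bring the branch's commits into master
git log --oneline
```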
Services such as GitHub allow you to create pull requests from one branch to another. To set up a pull request through GitHub you first create the new branch, make your commits, and then push the branch to the repository on GitHub. You can use the GitHub web interface to initiate a pull request from your branch to the shared branch. The interface lets you write a summary of the changes, notes, and other thoughts. Your teammates can then review the differences, comment, and merge the changes.