Last Updated: 08/16/2010 10:33:00 PM

GIT: Ruling the Tri-State Area!

    Dr. Doofenshmirtz: [showing an acrostic poster] Anyway, today is the day we reveal to the Tri-State Area the existence of the "League Of Villainous Evildoers Maniacally United For Frightening Investments in Naughtiness!"
    Dr. Bloodpudding: You want us to be called "L.O.V.E.M.U.F.F.I.N."
    Dr. Doofenshmirtz: Oh, good grief. It doesn't matter what we're called. What's important is that we get our evil message out to the people.

Please don't think of GIT as some sort of event gateway that just watches what happens to your files.  I did and it really set me back for a while.  

GIT is at its core a content-addressable file system that has a version control system laid on top of it.  Before GIT version 1.5 it emphasized its file system nature more than its version control nature, and people who used GIT back in those days complained that the version control user interface was too difficult to use.  That is a nasty (but untrue) rumor that still floats around today.

Awesome, so I just said some things that sound very important, but what the heck does it mean in practical terms? A content-addressable file system means that GIT cares about what is IN the file more than the file itself.  Just like any emo-guy tells the girl he is interested in: "It is what's inside that counts", but GIT means it.

When you first clone a repository or create a new one and then add/commit files to it, the status of all those files is Committed and Tracked.  What does that mean? It means that GIT as a file system knows the content of every single file in the repository.  GIT also creates a SHA-1 key as a unique shorthand way to refer to specific content in the file - not the file name - but the actual content of the file.  Change one space or letter in that file and the SHA-1 key will change too, so GIT will know that it is different content.

For instance, the current README.txt file that I got from the FW/1 project has the following SHA-1 key (the 40-character string in the second column):

$ git ls-files -s README.txt
100644 228a04c2a987e40e2c167ba40df6e87d1acc26f3 0       README.txt

Every single FW1 README.txt file that is pulled down from GitHub with that same content has the same hash key.   I can change the name of the file and it won't make one bit of difference to the key.  I will change the name of the file to foo.txt, then add/commit it back to GIT and then ask for the key:

$ git ls-files -s foo.txt
100644 228a04c2a987e40e2c167ba40df6e87d1acc26f3 0       foo.txt

Same key. Why? It is the same content. 

Now I will open foo.txt and add one teeny, tiny little space character.

$ git ls-files -s foo.txt
100644 d2e259e8e28b787fbe0a2e45eb6ea1afe61fdd1d 0       foo.txt

Wow! What a difference a space makes. That key above is TOTALLY different from the one prior to it.  All that from one little, bitty space added.
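
If you want to try the same experiment without touching a real project, here is a minimal sketch. It assumes git is installed and uses a throwaway temp folder and made-up file content (neither comes from the FW/1 project). It relies on git hash-object, which prints the key GIT would assign to a file's content.

```shell
# Throwaway folder and sample content, both made up for this sketch.
workdir=$(mktemp -d)
printf 'some sample content\n' > "$workdir/README.txt"

# Same content under a different name.
cp "$workdir/README.txt" "$workdir/foo.txt"

# --no-filters hashes the bytes exactly as they are on disk.
hash_readme=$(git hash-object --no-filters "$workdir/README.txt")
hash_foo=$(git hash-object --no-filters "$workdir/foo.txt")

# Append one single space to foo.txt and hash it again.
printf ' ' >> "$workdir/foo.txt"
hash_space=$(git hash-object --no-filters "$workdir/foo.txt")

echo "README.txt:        $hash_readme"
echo "foo.txt:           $hash_foo"
echo "foo.txt + space:   $hash_space"
```

The first two keys should match and the third should not, which is the same behavior as the ls-files listings above.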

Ok, so why am I telling you all this? I introduced the article saying that I would talk about modified, staged, and committed files.  I also just showed you how GIT only cares about the inside of your file and then gives you a SHA-1 key based on the content.  Ask yourself, how would I know if I were a GIT (careful) when to spit out the key? When a character is added? When a sentence is added? When so many KB of data has changed? Maybe when the file is saved? So would you say when the file is saved, hmm?  But think about it, if you are a smart developer you should be saving often. I know you are smart, so yes, yes, you are saving often.  Do you really need to track the content difference between save number 20 and save number 50?  Probably not.  See how hard it is to know when to take the snapshot? That is why Linus Torvalds put that power in your hands.  You need to stage and commit the files yourself.

Tristate Area: Modified

When you do change a file, GIT is not ignorant of it; it sees it as "modified." Test this.  I changed the foo.txt file back to README.txt and added some text to it. In my case I put "FW/1 is just one file!!" Save the file.

Now use the git status command.  It will inform you of any changes made to any files in your work area.

Notice, GIT knows that README.txt is modified. However, behind the scenes the SHA-1 key that GIT has recorded for the file still hasn't changed.  This file is now considered to be in the Modified state.  You can continue to save to this file forever and it will forever be modified until something else is done.  Right now in this state if someone clones this repository they are NOT going to see the "FW/1 is just one file!!" comment.
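
Here is a small sketch of that Modified state, assuming git is installed. It builds a scratch repository in a temp folder instead of using the FW/1 checkout, and the user name and email passed to commit are placeholders I made up. It compares the recorded key before and after an edit and asks git status what it thinks.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'original text\n' > README.txt
git add README.txt
# Placeholder identity so the commit works on a machine with no git config.
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Add README"

# Key recorded by GIT before the edit.
before=$(git ls-files -s README.txt)

# Edit and save the file, but do not stage or commit it.
printf 'FW/1 is just one file!!\n' >> README.txt

# Key recorded by GIT after the edit, plus the short status listing.
after=$(git ls-files -s README.txt)
state=$(git status --porcelain)

echo "$before"
echo "$after"
echo "$state"
```

The two ls-files lines should be identical, while the status line flags README.txt with an M in the second column, meaning modified but not staged.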

What about adding a new file?  Would GIT know about it? Let us try!

In the FW1 folder type: git ls-files

You should see a list of ALL the files the GIT repository knows about.  For me the last file in the listing is: skeleton/views/main/error.cfm

So let's add a new file in that folder named error2.cfm and save it.

Then type: git ls-files 

error2.cfm is NOT there! GIT doesn't have it in the repository yet.  But GIT does know about it: run git status and it will list the new file.

So GIT has flagged our error2.cfm file as "Untracked."  Untracked means exactly what it sounds like: GIT is not tracking the contents of this newly added file.  So let's get on with figuring out how to put our Modified README.txt and our Untracked error2.cfm into the repository.
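
To see Untracked for yourself, here is a minimal sketch in a scratch repository (a temp folder, not the FW/1 checkout, and git is assumed to be installed). It shows that git ls-files stays silent about the new file, while git ls-files --others and git status both point it out.

```shell
# Scratch repository with one brand new, never-added file.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p skeleton/views/main
printf '<!--- placeholder --->\n' > skeleton/views/main/error2.cfm

# Files GIT is tracking: nothing has been added yet.
tracked=$(git ls-files)

# Files GIT can see but is not tracking.
untracked=$(git ls-files --others --exclude-standard)

# Short status; --untracked-files=all lists the file, not just its folder.
state=$(git status --porcelain --untracked-files=all)

echo "tracked:   [$tracked]"
echo "untracked: [$untracked]"
echo "$state"
```

In the short status format, ?? is how GIT spells "Untracked."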

Tristate Area: Staged

Full disclosure: This is the one part of the "Tri-state Area" in which you may not spend much time.  In fact most people use commands or scripts to completely bypass the staging area.  But it won't kill you to learn a little bit about it, will you?

Think of the staging area a bit like a loading dock.  You have a pallet that can fit 25 boxes, so you go into the warehouse, get the boxes you need, arrange them on the pallet and wait for the truck.

In this illustration the boxes are your files, the dock is the staging area, and the truck is the repository.  So let's think about it now in terms of workflow.  You may have a ticketing system that you use to tell you what to work on for the day, so your GIT workflow may be something like this:

  • Get Ticket SUP 54342
  • Open the code in your editor, modify 3 files and add 1 new file.
  • You test your work, unless your code is always perfect.
  • Everything looks good, you stage the files by doing $ git add .    (the . after the add indicates you want to stage any modified and untracked file)
  • Now put it in your local repository, with the ticket number as a commit comment: $ git commit -m "SUP 54342"
  • You can then use the push command to put it on a remote repository, maybe the shared development server so the QA folks can review it.
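
The steps above can be sketched end to end like this. It is only an illustration under some assumptions: git is installed, a bare repository in a temp folder stands in for the shared development server, page.cfm is a made-up file name, and the user name and email are placeholders.

```shell
# A local bare repository plays the part of the remote server.
remote=$(mktemp -d)
git init -q --bare "$remote"

# Your working repository, wired up to that stand-in remote.
work=$(mktemp -d)
cd "$work"
git init -q
git remote add origin "$remote"

# Do the work for the ticket (page.cfm is a hypothetical file).
printf 'work for the ticket\n' > page.cfm

# Stage everything, then commit with the ticket number as the comment.
git add .
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "SUP 54342"

# Push the current branch to the remote under the same name.
git push -q origin HEAD

# Ask the remote what its newest commit comment is.
pushed=$(git --git-dir="$remote" log -1 --format=%s --all)
echo "remote now has: $pushed"
```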

Now let's throw a common kink in the link.  You are gladly working on SUP 54342, when your manager comes in and says, "Hey, I got a severity one issue, ticket SUP 9566. I need you to stop what you are doing and fix it."  It is in the same project that you are working on, so you start modifying files relating to this new issue.  You find the issue and fix it. Now when you do $ git status you have files for the first ticket and second ticket marked as modified, but you don't want to commit them together because they really are two different issues.

Instead of using $ git add .  you can name each file you want to stage and then commit. So let's say the files for the critical issue are join.cfm and merge.cfm.  You stage just those two with $ git add join.cfm merge.cfm and then commit them with the ticket number as the comment: $ git commit -m "SUP 9566"

Notice above I used the "git add" command to add join.cfm and merge.cfm to the staging area and then committed them.  GIT returned that only 2 files changed, that I added 8 lines and deleted no lines.
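
Here is a sketch of that two-ticket situation in a scratch repository, assuming git is installed. report.cfm is a made-up file standing in for the unfinished SUP 54342 work, and the user name and email are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Small helper so every commit uses the same placeholder identity.
commit() {
  git -c user.name="Example" -c user.email="example@example.com" commit -q -m "$1"
}

# Starting point: three tracked files.
printf 'v1\n' > join.cfm
printf 'v1\n' > merge.cfm
printf 'v1\n' > report.cfm
git add .
commit "Initial import"

# Half-finished work for the first ticket, then the urgent fix.
printf 'SUP 54342 work in progress\n' >> report.cfm
printf 'severity one fix\n' >> join.cfm
printf 'severity one fix\n' >> merge.cfm

# Stage only the two files for the urgent ticket and commit them.
git add join.cfm merge.cfm
commit "SUP 9566"

# Which files went into that commit, and what is still waiting?
committed=$(git diff-tree --no-commit-id --name-only -r HEAD)
remaining=$(git status --porcelain)

echo "$committed"
echo "$remaining"
```

Only join.cfm and merge.cfm land in the SUP 9566 commit, and report.cfm is still sitting there as modified for later.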

But let's say that you accidentally staged everything?  Muscle memory can be a pain, and I got used to typing $ git add .     No worries!

How to Unstage a Staged File

If you type: git status after having added files to the staging area, GIT is kind enough to tell you the command to unstage them (don't worry, I didn't see it till AFTER having googled it, either.)

So my otherfile.cfm was not supposed to be in my staging area.  So typing: git reset HEAD skelton/view/main/ took it out of the staging area, then typing:  git status again shows that this file is untracked. I can safely commit and only the files that should be in this commit will be.
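
A minimal sketch of the unstage step, again in a scratch repository with git assumed to be installed. The file names are hypothetical and live at the top level of the scratch repo (not the path shown above), and the user name and email are placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'v1\n' > join.cfm
git add join.cfm
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Initial import"

# A real fix plus a file that should not go into this commit.
printf 'fix\n' >> join.cfm
printf 'scratch notes\n' > otherfile.cfm

# Muscle memory strikes: everything gets staged.
git add .
staged_before=$(git status --porcelain)

# Take otherfile.cfm back out of the staging area; the file itself is untouched.
git reset -q HEAD -- otherfile.cfm
staged_after=$(git status --porcelain)

echo "before reset:"
echo "$staged_before"
echo "after reset:"
echo "$staged_after"
```

After the reset, join.cfm is still staged and otherfile.cfm is back to untracked (??), so a commit at this point picks up only join.cfm.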


Tristate Area: Committed

This last part of the Tri-State area is the easiest to sum up.  When a file is committed it is safely stored in your local repository.  With GIT you can be sure that what you saved is what was committed.  GIT uses the SHA-1 hash to ensure that the content you submitted is what is there, so if there is any disk corruption or memory corruption, you are going to know about it when you commit, not days later when it is too late.  If any file within the object database is corrupted by a disk error, then its hash will no longer match, alerting us to the problem. By hashing hashes of other objects, we maintain integrity at all levels.  Also note that at its core, GIT is designed to be easy to ADD content to, but very hard to truly REMOVE content from.
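
If you are curious what "hashing hashes" looks like, here is a small sketch in a scratch repository (git assumed installed, placeholder user name and email). It checks that the key stored for a committed file matches a fresh hash of the file on disk, prints the tree object that records that key, and runs git fsck, which re-verifies the objects in the database.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
printf 'FW/1 is just one file!!\n' > README.txt
git add README.txt
git -c user.name="Example" -c user.email="example@example.com" commit -q -m "Add README"

# Key stored in the commit for README.txt versus a fresh hash of the file.
stored=$(git rev-parse HEAD:README.txt)
recomputed=$(git hash-object README.txt)
kind=$(git cat-file -t "$stored")

echo "stored:     $stored ($kind)"
echo "recomputed: $recomputed"

# The tree lists the file's key; the commit in turn records the tree's key.
git cat-file -p 'HEAD^{tree}'

# Verify the object database; exit status 0 means no problems were found.
git fsck --no-progress
fsck_status=$?
echo "fsck exit status: $fsck_status"
```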

Congrats!  By the transitive powers of GIT you now rule the Tristate area!

Next let's learn about remotes!