2. Why Git?
Linus Torvalds hated CVS and SVN.
● Speed
● Strong support for non-linear development (thousands of parallel branches)
● Fully distributed
● Data size
– SVN's metadata took 143MB for a 140MB project
● Used by Perl, Eclipse, Qt, Ruby on Rails, Android...
3. Key concepts
● Nearly every operation is local
● Objects are identified by hash value, not by file name
● The three states
● Snapshots, not differences
● Branches are cheap
4. Nearly every operation is local
● After cloning a repository, you have the entire history locally.
● No network connection is required.
– Except for: clone, pull, push, and fetch.
5. Not by file name but by hash value
● Git uses four types of objects (blobs, trees, commits, and tags) to store all of its information, and each object is identified by a unique 20-byte SHA-1 key.
● What if two files have identical content but different file names?
– Both map to the same blob, so the content is stored only once.
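To see why the file name does not matter, here is a small Python sketch (not Git's own code) of how a blob ID is derived: Git takes the SHA-1 of a header "blob <size>" plus a NUL byte, followed by the file's bytes. The file names and contents below are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 of the blob header plus content, as Git computes blob IDs.

    The file name is not part of the input, so it cannot affect the ID.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical working directory: two files share the same content.
files = {
    "readme.txt": b"hello\n",
    "readme_copy.txt": b"hello\n",
    "notes.txt": b"something else\n",
}

ids = {name: git_blob_id(data) for name, data in files.items()}
for name, blob_id in ids.items():
    print(blob_id, name)

# Identical content gives an identical ID, so only one blob is stored.
print(len(set(ids.values())), "blobs for", len(files), "files")
```

With Git installed, `git hash-object <file>` should print the same ID for either of the two identical files.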
7. The three states
● Files that were in the last snapshot are called tracked; all other files are untracked.
● Tracked files can be in one of three states: modified, staged, and committed.
– Modified (working directory): you have changed the file but have not yet put it into the Git database.
– Staged (index): you have marked a modified file to go into the next commit.
– Committed (objects): the data is safely stored in your local Git database as part of a snapshot.
● What is committed is what is currently in the index, not what is in your working directory.
8. The three states
[Diagram: Working directory (modified) → git add → Staging area / index (staged) → git commit → Git repository / objects (committed); git checkout copies files from the repository back into the working directory.]
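The three areas can be modelled with three dictionaries. This is a toy Python sketch: the class ToyRepo and its methods are hypothetical, and Git's real index is a binary file, not a dict. It shows that a commit records the index, not the working directory.

```python
class ToyRepo:
    """Hypothetical toy model of the three areas; not how Git stores them."""

    def __init__(self):
        self.working = {}   # path -> content in the working directory
        self.index = {}     # path -> content staged for the next commit
        self.commits = []   # committed snapshots, oldest first

    def head(self):
        """Snapshot of the last commit (empty before the first commit)."""
        return self.commits[-1] if self.commits else {}

    def add(self, path):
        """git add: copy the working-directory version into the index."""
        self.index[path] = self.working[path]

    def commit(self):
        """git commit: record exactly what is in the index as a new snapshot."""
        self.commits.append(dict(self.index))

    def status(self, path):
        if path not in self.index and path not in self.head():
            return "untracked"
        if self.working.get(path) != self.index.get(path):
            return "modified"
        if self.index.get(path) != self.head().get(path):
            return "staged"
        return "committed"


repo = ToyRepo()
repo.working["play.c"] = "v1"
print(repo.status("play.c"))   # untracked
repo.add("play.c")
print(repo.status("play.c"))   # staged
repo.commit()
print(repo.status("play.c"))   # committed
repo.working["play.c"] = "v2"  # edit the file again without git add
print(repo.status("play.c"))   # modified
print(repo.index["play.c"])    # v1: this is what the next commit would record
```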
9. Snapshots, not differences
● Unlike other VCSs, which store each file as a list of changes, Git works more like a mini file system: each commit is a snapshot of all tracked files.
[Diagram: snapshots along a time line; each snapshot contains all of the tracked files (File 1 … File N).]
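A rough Python sketch of "snapshots, not differences": each commit records the ID of every tracked file, and content that has not changed maps to an object that is already stored. The object_store dictionary stands in for .git/objects and is an assumption of this sketch; real Git also builds tree objects and later delta-compresses pack files.

```python
import hashlib

object_store = {}  # object ID -> content; stands in for .git/objects


def blob_id(content: bytes) -> str:
    """Blob ID: SHA-1 of 'blob <size>\\0' followed by the content."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).hexdigest()


def snapshot(files):
    """Record every tracked file for one commit, storing only new content."""
    tree = {}
    for path, content in files.items():
        oid = blob_id(content)
        object_store.setdefault(oid, content)  # unchanged content is reused
        tree[path] = oid
    return tree


v1 = snapshot({"a.txt": b"one\n", "b.txt": b"two\n", "c.txt": b"three\n"})
v2 = snapshot({"a.txt": b"one\n", "b.txt": b"TWO\n", "c.txt": b"three\n"})

print(len(object_store))           # 4: three blobs from v1, one new blob in v2
print(v1["a.txt"] == v2["a.txt"])  # True: same blob, not a diff
```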
10. Branches are cheap
● Git is a content-addressable file system, and a branch is just a pointer to a commit.
– Creating a branch is just writing a 41-byte file (40 hex characters plus a newline).
● Git does not track individual files; it tracks commits.
● The HEAD file points to the branch you are on.
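A branch really is just a small file. The Python sketch below writes a loose ref (40 hex characters plus a newline, 41 bytes) into a temporary directory standing in for .git, and reads HEAD as a symbolic ref. The commit ID is made up, and newer Git versions may keep refs in a packed-refs file instead of one file per branch.

```python
import os
import tempfile


def create_branch(git_dir, name, commit_id):
    """Write a loose ref: refs/heads/<name> holding 40 hex chars + newline."""
    path = os.path.join(git_dir, "refs", "heads", name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(commit_id + "\n")
    return path


def current_branch(git_dir):
    """Read HEAD; normally it is a symbolic ref such as 'ref: refs/heads/master'."""
    with open(os.path.join(git_dir, "HEAD"), encoding="ascii") as f:
        content = f.read().strip()
    prefix = "ref: refs/heads/"
    # Anything else (a bare SHA-1) would mean a detached HEAD.
    return content[len(prefix):] if content.startswith(prefix) else None


with tempfile.TemporaryDirectory() as git_dir:                # stands in for .git
    fake_commit = "0123456789abcdef0123456789abcdef01234567"  # made-up ID
    ref_path = create_branch(git_dir, "new", fake_commit)
    ref_size = os.path.getsize(ref_path)
    with open(os.path.join(git_dir, "HEAD"), "w", encoding="ascii", newline="\n") as f:
        f.write("ref: refs/heads/new\n")                      # like git checkout new
    on_branch = current_branch(git_dir)

print(ref_size)   # 41
print(on_branch)  # new
```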
11. A common case
An engineer is doing his normal job:
– He works on a project.
– He creates a branch from the codebase he is working on.
– He works on this branch to implement a new feature.
Then he receives a phone call from an angry customer who asks him to fix a terrible issue:
– Switch back to the production branch.
– Create a branch for the fix.
– After the fix passes testing, merge it back.
– Switch back to the branch he was working on.
– Merge the new feature into the production branch.
12. Get a repository
● First of all, you need a repository.
$ git init
• Creates an empty repository in your working folder.
• Files become tracked once you git add them; the first commit records them in history.
$ git clone [url]
• Creates a working folder with a .git/ directory inside.
• Downloads the whole history from the server.
• Checks out the newest code into your workspace.
13. Add some modifications
● After modifying your files, use git add to stage them.
– git add -A: stage all changes, tracked and untracked
– git add -u: stage changes to tracked files only
– git add -i: select changes interactively
– Git also handles binary files.
– Flexible ignore rules: .gitignore
● A .gitignore file can live in the Git root or in any directory of the project, and can itself be committed.
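The difference between -A and -u can be pictured with a simplified Python model. The function names are hypothetical, and the ignore check is only a rough stand-in that matches file names against glob patterns; real .gitignore rules also handle directories, negation, and anchoring.

```python
from fnmatch import fnmatchcase


def is_ignored(path, patterns):
    """Very rough .gitignore check: match the file name against glob patterns."""
    name = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(name, p) for p in patterns)


def paths_to_stage(mode, changed_tracked, untracked, ignore_patterns=()):
    """Which paths would `git add <mode>` stage, in this simplified model?

    changed_tracked: tracked files that were modified or deleted
    untracked:       files Git does not track yet
    """
    if mode == "-u":   # update what Git already tracks; new files are skipped
        return sorted(changed_tracked)
    if mode == "-A":   # everything, except untracked files matched by .gitignore
        new = [p for p in untracked if not is_ignored(p, ignore_patterns)]
        return sorted(set(changed_tracked) | set(new))
    raise ValueError("only -u and -A are modelled here")


changed = ["src/play.c"]
untracked = ["src/new_api.c", "build/play.o"]
print(paths_to_stage("-u", changed, untracked))           # ['src/play.c']
print(paths_to_stage("-A", changed, untracked, ["*.o"]))  # ['src/new_api.c', 'src/play.c']
```

Ignore patterns only apply to untracked files here, which matches Git: adding a pattern to .gitignore does not stop tracking a file that is already tracked.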
14. Commit changes
● After you commit, your change is safe.
– If something went wrong, use $ git commit --amend
– Using $ git commit -a is not a good habit.
[Diagram: git diff compares the workspace with the stage area; git diff --cached compares the stage area with the HEAD branch; git diff HEAD compares the workspace with the HEAD branch.]
The better way:
Get used to running $ git add after every change.
When the work is done,
- $ git diff to check for missed changes
- $ git diff --cached to check what will be committed
- $ git commit -m [message]
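The three diff commands compare different pairs of areas. This Python sketch uses difflib on three hypothetical {path: text} snapshots to mimic which pair each command looks at; the output format is difflib's unified diff, not byte-for-byte Git output.

```python
import difflib


def diff(label_a, a, label_b, b):
    """Unified diff between two {path: text} snapshots."""
    out = []
    for path in sorted(set(a) | set(b)):
        old = a.get(path, "").splitlines(keepends=True)
        new = b.get(path, "").splitlines(keepends=True)
        out.extend(difflib.unified_diff(old, new, f"{label_a}/{path}", f"{label_b}/{path}"))
    return "".join(out)


head = {"play.c": "int x = 1;\n"}      # last commit
index = {"play.c": "int x = 2;\n"}     # staged with git add
working = {"play.c": "int x = 3;\n"}   # edited again, not staged yet

print(diff("index", index, "work", working))  # like git diff: not yet staged
print(diff("HEAD", head, "index", index))     # like git diff --cached: will be committed
print(diff("HEAD", head, "work", working))    # like git diff HEAD: all changes since the commit
```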
15. Step One, create a branch
● Create a branch to work on the new feature:
[Diagram: commits C0 ← C1 ← C2; both master and new point to C2.]
$ git checkout -b new
This is shorthand for:
$ git branch new
$ git checkout new
16. Step Two, commit something
● Commit something on the new feature:
[Diagram: C0 ← C1 ← C2 (master) ← C3 (new).]
After completing some functions, he wants to make a commit:
$ git commit -a -m 'add a new api'
This is shorthand for:
$ git add -u
$ git commit -m 'add a new api'
17. Step Three, receive an urgent issue
● Switch back to the production version and create a branch for the urgent issue:
[Diagram: C0 ← C1 ← C2 (master); C3 (new) and C4 (issue) both branch off C2.]
To work on the issue, you first have to save your current changes (for example, by committing them) and switch back to the stable production branch:
$ git checkout master
and then create a new branch:
$ git checkout -b issue
After fixing it:
$ git commit -a -m 'fixed the issue'
18. Step Four, the fix passes testing
● After the fix passes testing, merge it into the master branch:
[Diagram: master and issue both point to C4; new still points to C3.]
$ git checkout master
$ git merge issue
When the branch being merged is directly ahead of the current branch, Git just moves the pointer forward; this is called a fast-forward.
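Whether a merge can fast-forward is a question about the commit graph. This Python sketch models the graph from the slides (C0 to C4) as a parent dictionary; it is a model of the idea, not Git's merge implementation.

```python
# Commit graph after step three: commit -> list of parent commits.
parents = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"],   # on branch new
    "C4": ["C2"],   # on branch issue
}
branches = {"master": "C2", "new": "C3", "issue": "C4"}


def ancestors(commit):
    """Every commit reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def can_fast_forward(current, other):
    """Merging `other` into `current` needs no merge commit when `current`
    is already an ancestor of `other`: the pointer can simply move forward."""
    return current in ancestors(other)


# $ git checkout master ; git merge issue
if can_fast_forward(branches["master"], branches["issue"]):
    branches["master"] = branches["issue"]   # no new commit is created
print(branches["master"])                    # C4
print(can_fast_forward("C4", "C3"))          # False: new has diverged from master
```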
19. Step Five, finish the feature
● Delete the issue branch and switch back to the new-feature branch:
[Diagram: master points to C4; new now points to C5, on top of C3.]
Delete the merged branch:
$ git branch -d issue
Switch back to the work-in-progress branch and finish it:
$ git checkout new
$ git commit -a -m 'finish it'
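20. The whole commit history
[Diagram: C0 ← C1 ← C2 ← C4 on master; C2 ← C3 ← C5 on new; merge commit C6 has parents C4 and C5.]
After running:
$ git checkout master
$ git merge new
master (C4) and new (C5) have diverged, so Git performs a three-way merge using their common ancestor C2 and records the result as merge commit C6; master now points to C6.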
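The three-way merge can be sketched in Python at the level of whole files. The graph matches the slides, but the file names and contents are hypothetical, and real Git also merges inside files line by line and chooses merge bases with its own algorithm; this only shows the idea.

```python
# Commit graph after step five: commit -> list of parent commits.
parents = {
    "C0": [], "C1": ["C0"], "C2": ["C1"],
    "C3": ["C2"], "C5": ["C3"],   # branch new
    "C4": ["C2"],                 # master, after the fast-forward
}


def ancestors(commit):
    """Every commit reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def merge_bases(a, b):
    """Common ancestors that are not ancestors of another common ancestor."""
    common = ancestors(a) & ancestors(b)
    return {c for c in common
            if not any(c != d and c in ancestors(d) for d in common)}


def three_way(base, ours, theirs):
    """File-level three-way merge of {path: content} snapshots.

    A side that left a file unchanged (equal to base) yields to the side
    that changed it. None means "file absent".
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:
            result = o
        elif o == b:
            result = t
        else:
            conflicts.append(path)   # both sides changed it differently
            continue
        if result is not None:
            merged[path] = result
    return merged, conflicts


# Hypothetical file contents for the three commits involved.
snapshots = {
    "C2": {"api.c": "old api", "main.c": "bug"},
    "C4": {"api.c": "old api", "main.c": "fixed"},   # the urgent fix
    "C5": {"api.c": "new api", "main.c": "bug"},     # the new feature
}

(base,) = merge_bases("C4", "C5")
merged, conflicts = three_way(snapshots[base], snapshots["C4"], snapshots["C5"])
parents["C6"] = ["C4", "C5"]   # a merge commit has two parents
print(base, merged, conflicts)  # C2 {'api.c': 'new api', 'main.c': 'fixed'} []
```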
21. Undo
● Git provides several mechanisms for developers to change their minds:
– The latest commits are no longer needed.
– A specific commit should be rolled back.
– You want to undo your modifications or discard the data in the staging area.
● Reset, revert, and checkout are easy to misuse.
22. Undo – use Reset
● Reset
– Lets the developer reset the commit history, the staging area, or the working directory.
● $ git reset --soft HEAD~N
– Moves the current branch back N commits without changing the index or any files.
● $ git reset HEAD~N
– The default (--mixed) mode: also resets the index, undoing git add; working files are kept.
e.g. $ git reset HEAD play.c unstages play.c.
● $ git reset --hard HEAD~N
– Resets not only commits but also the staging area and the working files.
[Diagram: --soft moves only master/HEAD (objects); the default mode also resets the index; --hard also resets the working directory.]
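The three reset modes differ only in how many areas they overwrite. In this toy Python model (the reset function and the repo dictionary layout are hypothetical), every mode moves the branch, mixed and hard also reset the index, and only hard touches the working files.

```python
import copy


def reset(repo, target, mode="mixed"):
    """Toy model of git reset --soft / (default) --mixed / --hard.

    repo is a dict with 'branch' (commit the current branch points to),
    'index', 'working' and 'snapshots' (commit -> {path: content}).
    """
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(mode)
    snapshot = repo["snapshots"][target]
    repo["branch"] = target                # every mode moves the branch
    if mode in ("mixed", "hard"):
        repo["index"] = dict(snapshot)     # mixed and hard also reset the index
    if mode == "hard":
        repo["working"] = dict(snapshot)   # only hard overwrites your files


start = {
    "snapshots": {"C1": {"play.c": "v1"}, "C2": {"play.c": "v2"}},
    "branch": "C2",
    "index": {"play.c": "v3"},     # v3 has been staged with git add
    "working": {"play.c": "v3"},
}

for mode in ("soft", "mixed", "hard"):
    repo = copy.deepcopy(start)
    reset(repo, "C1", mode)        # like git reset --<mode> HEAD~1
    print(mode, repo["branch"], repo["index"]["play.c"], repo["working"]["play.c"])
# soft C1 v3 v3
# mixed C1 v1 v3
# hard C1 v1 v1
```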
23. Undo – use Checkout
● Checkout
– Moves the HEAD pointer and checks out code.
● $ git checkout [file]
– Overwrites the file in the working directory with its staged version.
● $ git checkout [branch_name]
– Points HEAD at that branch and updates the staging area and workspace to match its commit.
– 'reset' changes the SHA-1 that a branch points to, but 'checkout' only moves HEAD.
24. Undo – use Revert
● Revert
– Rolls back changes by creating a new commit that undoes them.
– Reset goes backward; revert goes forward.
● $ git revert HEAD~N
– Creates a new commit that undoes the changes made by commit HEAD~N (the Nth commit before HEAD).
● $ git revert SHA-1
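Revert adds history instead of removing it. This simplified Python sketch (the revert function, commit names, and the caller-chosen new_id are hypothetical; real Git computes the ID by hashing) restores, for every path the target commit changed, the version from the target's parent, and records the result as a new commit on top of HEAD. Real Git does a three-way merge here and can report conflicts; this sketch assumes later commits did not touch the same paths.

```python
def revert(snapshots, parents, head, target, new_id):
    """Create commit `new_id` on top of `head` that undoes commit `target`."""
    before = snapshots[parents[target][0]]   # target's parent
    after = snapshots[target]
    result = dict(snapshots[head])
    for path in set(before) | set(after):
        if before.get(path) != after.get(path):   # target changed this path
            if path in before:
                result[path] = before[path]       # put the old version back
            else:
                result.pop(path, None)            # target added it: remove it
    snapshots[new_id] = result
    parents[new_id] = [head]   # history only grows: revert goes forward
    return new_id


snapshots = {
    "C1": {"a.txt": "1"},
    "C2": {"a.txt": "1", "b.txt": "1"},   # C2 added b.txt
    "C3": {"a.txt": "2", "b.txt": "1"},   # C3 changed a.txt
}
parents = {"C1": [], "C2": ["C1"], "C3": ["C2"]}

new = revert(snapshots, parents, head="C3", target="C2", new_id="C4")  # like git revert HEAD~1
print(snapshots[new])   # {'a.txt': '2'}
print(parents[new])     # ['C3']
```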
25. Rescue mechanism
● Git records every move of HEAD and of your branches (the reflog), so don't worry.
– $ git reflog show master
– $ git reset --hard master@{N}
● But some commands are still dangerous, because uncommitted changes they discard cannot be recovered from the reflog; use them with care:
– $ git reset --hard
– $ git checkout HEAD
Branches are cheap!
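The reflog is essentially a list of the values a ref has had. The Python class below is hypothetical and only mirrors the idea behind `git reflog` and the `ref@{N}` syntax; it also shows why committed work survives a hard reset while uncommitted files do not: only ref values are recorded.

```python
class ToyReflog:
    """Hypothetical model: remembers every value a ref has had, newest first."""

    def __init__(self, initial):
        self.entries = [initial]   # entries[0] is ref@{0}, the current value

    def move(self, new_commit):
        self.entries.insert(0, new_commit)

    def at(self, n):
        """Value of ref@{n}: where the ref pointed n moves ago."""
        return self.entries[n]


master = ToyReflog("C4")
master.move("C5")        # a new commit
master.move("C2")        # oops: git reset --hard HEAD~2 moved master back
print(master.at(0))      # C2
print(master.at(1))      # C5: the "lost" commit is still listed
master.move(master.at(1))  # like git reset --hard master@{1}
print(master.at(0))      # C5
```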
26. Git communication
[Diagram: one local repository connected by push and pull to three remote repositories, each added with "add remote" under a remote name: v_a (git://a...), v_b (git://b...), and v_c (git://c...).]
If a local repository exists, you can add a remote (a remote name plus the remote repository URL), then use git pull or git fetch to get data from the remote repository, and git push to publish your contributions to it.
What you obtain is the latest branch and the whole history.
A remote name identifies where the project comes from, whereas tags and branches identify points in the project's history (snapshots).
27. Look inside .git
● Git: "the stupid content tracker"
– Stores everything as compressed objects named by their SHA-1.
– objects: full objects (commits, trees, blobs, tags).
– refs: pointers to all of the branches and tags.
– logs: a history of where your branches have been.
– Current pointers: HEAD, ORIG_HEAD, FETCH_HEAD, and the index.
28. Object folder
● objects: stores all of the commit, tag, tree, and blob objects.
00/  6d/  9b/  ac/  b0/  info/  pack/
Loose objects
Stored in files named like:
a9dca9a0fe0c031c996d308ab8a781ab7f358f
inside the two-character directories; each file holds one object compressed with zlib. The directory name (2 hex characters, 1 byte) plus the file name (38 hex characters, 19 bytes) make up the full 20-byte SHA-1.
Packed objects
Stored in files named like:
pack-a9dca9a0fe0c031c996d308ab8a781ab7f358f.idx
pack-a9dca9a0fe0c031c996d308ab8a781ab7f358f.pack
.pack: the contents of all the objects that were moved out of the loose objects.
.idx: offsets into the pack file.
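The loose-object layout can be reproduced in a few lines of Python. This sketch writes into a temporary directory standing in for .git (the helper names are hypothetical): it zlib-compresses "blob <size>\0<content>" and stores it under objects/<first 2 hex chars>/<remaining 38 hex chars>. Git's chosen compression level may differ, so the compressed bytes need not match Git's exactly, but the layout and the object ID do.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_blob(git_dir, content: bytes) -> str:
    """Store `content` as a loose blob and return its 40-hex-digit ID."""
    data = b"blob %d\x00" % len(content) + content
    oid = hashlib.sha1(data).hexdigest()
    obj_dir = os.path.join(git_dir, "objects", oid[:2])   # 2 hex chars = 1 byte
    os.makedirs(obj_dir, exist_ok=True)
    path = os.path.join(obj_dir, oid[2:])                  # 38 hex chars = 19 bytes
    if not os.path.exists(path):                           # same content is stored once
        with open(path, "wb") as f:
            f.write(zlib.compress(data))
    return oid


def read_loose_object(git_dir, oid: str):
    """Decompress a loose object and return (type, body)."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as f:
        data = zlib.decompress(f.read())
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode("ascii").split()
    if int(size) != len(body):
        raise ValueError("corrupt object: size mismatch")
    return obj_type, body


with tempfile.TemporaryDirectory() as git_dir:   # stands in for .git
    oid = write_loose_blob(git_dir, b"test content\n")
    print(oid[:2], oid[2:])                      # directory name, then file name
    print(read_loose_object(git_dir, oid))       # ('blob', b'test content\n')
```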
29. Refs folder
● refs: stores all of the pointers.
heads/  remotes/  tags/
– heads/: one file per branch; its content is the SHA-1 the branch points to.
– remotes/: one folder per remote, storing the branch pointers fetched from that remote.
– tags/: creating a tag creates a file named after the tag here, whose content is the SHA-1 the tag points to. For an annotated tag, a tag object is also created in the objects folder.
31. Current status
● HEAD: points to the currently active branch.
● ORIG_HEAD: stores the previous HEAD before operations such as git pull or git merge.
– $ git reset --hard ORIG_HEAD undoes such an operation.
● FETCH_HEAD: records the branches you last fetched.
● index: stores the staged data.
– The proposed next commit snapshot.
32. Add and commit in low level
$ git add
● Stores blobs for the changed files
– Adds loose objects to .git/objects/
● Updates the index
– Writes the binary file .git/index
$ git commit
● Writes tree objects (.git/objects/)
● Writes a commit object that references the top-level tree
● Updates the branch pointer that HEAD refers to, and its log (.git/refs/heads/ and .git/logs/refs/heads/)
● Stores the commit message (.git/COMMIT_EDITMSG)
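The objects written by git add and git commit can be built by hand. This Python sketch (helper names hypothetical) serialises a blob, a one-file tree, and a commit the way Git does, then hashes each with the "<type> <size>\0" header. The author name, e-mail, timestamp, and message are made-up placeholders, so the commit ID itself is only illustrative.

```python
import hashlib


def hash_object(obj_type: str, body: bytes) -> str:
    """Object ID = SHA-1 of '<type> <size>\\0' followed by the body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries):
    """Serialise (mode, name, hex_id) entries for a tree of plain files.

    Each entry is '<mode> <name>\\0' plus the 20 raw bytes of the ID. Entries
    are sorted by name bytes; real Git sorts sub-directories as if their name
    ended with '/', which this files-only sketch ignores.
    """
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1].encode()):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def commit_body(tree_id, parent_ids, person, timestamp, message):
    """Text of a commit object: tree, parents, author/committer, message."""
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parent_ids]
    lines.append(f"author {person} {timestamp} +0000")
    lines.append(f"committer {person} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


# What git add + git commit would write for a single file test.txt.
blob_id = hash_object("blob", b"version 1\n")                              # git add
tree_id = hash_object("tree", tree_body([("100644", "test.txt", blob_id)]))  # git commit
commit_id = hash_object("commit", commit_body(
    tree_id, [], "Alice <alice@example.com>", 1700000000, "first commit"))
print("blob  ", blob_id)
print("tree  ", tree_id)
print("commit", commit_id)
```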
33. Compare it with SVN again
● Centralized vs. distributed
– SVN is one repository with many clients; Git is one repository with many client repositories.
– Checking out a working copy vs. cloning the whole repository.
● Sequential revision numbers vs. lots of branches
– Corporate work or distributed version control?
● Consistency vs. flexibility
– SVN makes everyone work on the same thing.
● Ref: Please Stop Bugging Linus Torvalds About Subversion
34. Repo
● There are over 160 projects in the Android source.
● repo init: sets up the clone script
● repo sync
● repo start: creates a local branch
● repo upload
The definition of open:
"mkdir android ; cd android ; repo init -u git://android.git.kernel.org/platform/manifest.git ; repo sync ; make"
(Andy Rubin)
repo init -u ssh://.../manifest.git -b xxx -m android.xml
- Verifies your SSH public key
- Gets repo
- Gets a manifest.xml
- Clones the projects listed in manifest.xml
- To sync a specific project: repo sync kernel/linux
- repo upload transmits the branch to Gerrit over an SSH connection.
- Gerrit reviews each commit, so it is better to run git rebase -i before repo upload.
35. Resources
● Install git
– Cygwin or msysGit
● Get repo
– curl -l -k http://android.git.kernel.org/repo
● Reference
– Official: http://git-scm.com/
– Git Reference: http://gitref.org/
– Pro Git: http://progit.org/book/
– Repo: http://source.android.com/source/using-repo.html#init
– Projects to practice on: http://www.kernel.org/pub/
– Linus Torvalds' talk on Git: http://www.youtube.com/watch?v=4XpnKHJAok8
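Linus Torvalds: initiator of the Linux kernel project.
Speed is limited by the network.
At first, Linus Torvalds used diff, patch, and tar to maintain Linux.
Then he used BitKeeper, where everyone has a repository.
In fact, the need arose from the requirements of large open-source projects.
Fully distributed: so that developers can manage their own versions locally.
The 143MB of metadata does not even include the history of changes.
Almost no command depends on the network.
A commit object is simple: it specifies the top-level tree for the snapshot (it stores the root-directory information at a specific point in time).
A "tag" is a way to mark a specific commit as special in some way. A tag cannot be changed, and tags can mark any kind of object.
The SHA-1 is the address and a branch is a pointer: wherever you move the pointer, that is what you see.
Git uses branches to trace the history of commits.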
The HEAD ref is special in that it actually points to another ref. It is a pointer to the currently active branch.
All your history is stored in the Git Directory; the working directory is simply a temporary checkout place where you can modify the files until your next commit.
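Creating a new branch is as quick and simple as writing 41 bytes to a file (40 characters and a newline). The file is stored in .git/refs/heads.
Git is designed to let users develop conveniently on their local machine and then publish their contributions, or push them to the main server; for Git, the user's local control matters just as much.
SVN, by contrast, is strict version control: it does not care about a developer's local changes; what matters is recording how the code on the server changes.
Git accommodates more of the developers' ideas, while SVN strives for simplicity and uniformity.
In SVN, a change affects everyone; in Git, it only affects those who reference your work.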