Friday, January 20, 2012

git-fix-whitespace series 1: Knowing about `git diff -p`

In the last post, the first requirement of the project ( has been settled. The next requirement of this project is to read the git-diff patch in order to find any line changes that violate the whitespace rules specified in git config.

In a typical git-diff patch (see below), there are a few major parts.
  1. Line 1 provides metadata about the modified file. It gives the file path to the file, rooted at the git repository.
  2. Line 2 provides metadata about the git index and the file object discretionary access control list.
  3. Line 3 and 4 provides metadata about which file path is old and which is new.
  4. Line 5, 14, and 24 are metadata that tells which part of the file is modified. Let's take an example to explain this - "@@ -33,8 +33,7 @@ def sanitize_diff(git_diff):". "-33,8" tells that there is a diff hunk with 8 lines starts at line 33 of the old version of the file. "+33,7" tells that there is a diff hunk with 7 lines starts at line 33 of the new version of the file. As a result, the new version of the file gets one line fewer than the old version.
  5. The rest of the patch is the content in the modified file. Those lines are either prefixed by ' ', '-', or '+'. ' ' means no modification, '-' means removed line, and '+' means added line. Note, there is no line replacement since it is represented by '-' lines followed by '+' lines (see line 28-31).
After knowing the structure of a git-diff patch, the next step is to read and write the modified file.

Sunday, January 15, 2012

git-fix-whitespace series 0: GitPython vs libgit2

This is the first post of the git-fix-whitespace series. In this series, I will put some notes about working on the project - (NOTE: This series is a by-product of the git-fix-whitespace project, since my blog need some updates :P)

First of all, let me introduce the rationale of working on this git-fix-whitespace project. Most of the time, as a Python developer, I hate tab indentation and trailing whitespaces (read pep8). I know there are existing tools to "proof-read" a file using certain whitespacing rules. However, I insist to create my own tool to achieve the goal. Reason: I just wanna have a pet project that I LOVE to keep working on.

The very first requirement of this project is to support the configuration directives of git (see `man git-config` and look for `core.whitespace`). Hence, I need to find a tool that can read the git configuration files (both the user level git config file ~/.gitconfig and repository level git config file /.git/config). By using Google, I got GitPython and libgit2/pygit2. At first, I try to read the libgit2 python binding to see if I can read whitespace configuration by simple api calls. But it seems that it did not support that yet (as of 2012-Jan-15). Then I move on to GitPython. Gotcha! There's simple api call to read the core.whitespace configurations :) As a result, git-fix-whitespace got a dependency on GitPython at the moment.
© 2009 Emptiness Blogging. All Rights Reserved | Powered by Blogger
Design by psdvibe | Bloggerized By LawnyDesignz