Tuesday, March 8, 2011

Why Mercurial’s Revision Storage Model Is Superior To Git’s

Git and Mercurial are both DVCSs inspired by Monotone. They have a lot in common, but still differ a lot: Mercurial has named branches, Git has the index, and so on. But there’s one big difference in the implementation layer which is barely visible to end users: the way revisions are stored.

Git’s model is based on four objects: blob, tree, commit and tag. Your commit object is basically a full state of your project tree at commit time: it references a tree object, which in turn references blob objects (and subtrees) containing the corresponding versions of your files. It also references parent commits. This means that to build the log of a particular file, git must scan through all commits to the project to find the ones where the referenced blob for the file in question changes. Basically, git doesn’t have per-file history at all and has to calculate it. Linus is pretty religious about this and considers per-file history a bug: http://permalink.gmane.org/gmane.comp.version-control.git/39358.
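To make the cost concrete, here is a minimal sketch of a git-like store in Python. It is a toy, not git’s actual data structures: the `Commit` class, the flat path-to-blob mapping standing in for trees, the single-parent history and all ids are assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    tree: Dict[str, str]      # path -> blob id (toy: a flat tree, no subtrees)
    parent: Optional[str]     # parent commit id (toy: at most one parent)
    message: str


def file_log(commits: Dict[str, Commit], head: str, path: str) -> List[str]:
    """Walk every commit from head to root and keep those where the
    blob recorded for `path` differs from the one in the parent."""
    touched = []
    cid: Optional[str] = head
    while cid is not None:
        commit = commits[cid]
        parent = commits[commit.parent] if commit.parent is not None else None
        blob_now = commit.tree.get(path)
        blob_before = parent.tree.get(path) if parent is not None else None
        if blob_now != blob_before:
            touched.append(cid)
        cid = commit.parent
    return touched


# Hypothetical three-commit history.
commits = {
    "c1": Commit({"a.py": "blob-a1", "b.py": "blob-b1"}, None, "initial"),
    "c2": Commit({"a.py": "blob-a1", "b.py": "blob-b2"}, "c1", "edit b"),
    "c3": Commit({"a.py": "blob-a2", "b.py": "blob-b2"}, "c2", "edit a"),
}

print(file_log(commits, "c3", "a.py"))
```

Note that the loop visits every commit in the history even though only some of them touched the file, which is the point made above.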

Meanwhile, Mercurial stores the history of each file separately in a so-called revlog, so to read all changes to a particular file one only has to make one continuous read from its revlog. However, this architecture makes building the log for the entire project more expensive than git’s.
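The per-file idea can be sketched in a few lines of Python. This is only a toy model of the claim above, not Mercurial’s on-disk format: the `filelogs` dictionary, the changeset numbers and the contents are all invented for illustration, and real Mercurial additionally keeps a changelog and a manifest, which this sketch leaves out.

```python
from typing import Dict, List, Tuple

# Toy model: one append-only list per file.
# Each entry is (changeset number, file content at that changeset).
Revlog = List[Tuple[int, str]]

filelogs: Dict[str, Revlog] = {
    "a.py": [(0, "a v1"), (2, "a v2")],
    "b.py": [(0, "b v1"), (1, "b v2")],
}


def file_log(path: str) -> List[int]:
    """History of one file: a single sequential read of its own revlog."""
    return [rev for rev, _content in filelogs[path]]


def project_log() -> List[int]:
    """Project-wide history built from per-file logs alone:
    every file's revlog has to be visited and the results merged."""
    revs = set()
    for revlog in filelogs.values():
        revs.update(rev for rev, _content in revlog)
    return sorted(revs)


print(file_log("a.py"))
print(project_log())
```

In this toy, `file_log` never looks at any other file, while `project_log` has to touch all of them.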

Still, I argue that Mercurial’s approach is much better for modern software projects with lots of refactorings, where the code you’re interested in can travel freely across the repository. At first glance it might seem that this makes git’s model the better choice, but I don’t think so.

Imagine that you don’t know beforehand where the code you’re looking for lies (e.g. the function was here in the previous commit, but now it’s not). In this case you’ll have to scan your repository to look for that code every time you build a log for it. A better approach would be to build an index: store a versioned file which tells you in which file particular classes and functions reside. This kind of information is kept by any modern IDE, so you only need to store it and make sure it fully reflects the committed state of the repository.

If you have such a file under version control, Mercurial’s seeming weakness becomes a strength: the first pass reads the revlog of the index file to determine which file your code is located in, and a second read of that file’s revlog gets you all the details. Bingo!
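The two passes can be sketched like this in Python. Everything here is hypothetical: the index format (a symbol-to-path mapping per changeset), the `index_revlog` and `filelogs` structures and the `parse` function are assumptions for illustration, not something Mercurial or any IDE provides out of the box.

```python
from typing import Dict, List, Optional, Tuple

# Toy revlog of the versioned index file: (changeset, {symbol: path}).
index_revlog: List[Tuple[int, Dict[str, str]]] = [
    (0, {"parse": "util.py"}),
    (2, {"parse": "parser.py"}),   # the function moved in changeset 2
]

# Toy per-file revlogs: (changeset, content).
filelogs: Dict[str, List[Tuple[int, str]]] = {
    "util.py": [(0, "def parse v1"), (1, "def parse v2"), (2, "parse removed")],
    "parser.py": [(2, "def parse v2"), (3, "def parse v3")],
}


def symbol_history(symbol: str) -> List[Tuple[int, str]]:
    # Pass 1: read the index revlog to learn where the symbol lived
    # and for which range of changesets.
    spans: List[Tuple[str, int, Optional[int]]] = []
    for i, (rev, mapping) in enumerate(index_revlog):
        path = mapping.get(symbol)
        if path is None:
            continue
        end = index_revlog[i + 1][0] if i + 1 < len(index_revlog) else None
        spans.append((path, rev, end))

    # Pass 2: read only the revlogs of the files found in pass 1.
    history = []
    for path, start, end in spans:
        for rev, _content in filelogs[path]:
            if rev >= start and (end is None or rev < end):
                history.append((rev, path))
    return history


print(symbol_history("parse"))
```

Only the index revlog and the revlogs of files that actually held the symbol are read; no other file in the repository is touched.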

Monday, January 31, 2011

Promising and committing

Commit messages suck. If you don’t trust me, enter “svn/git/hg log” in your command line and see for yourself. I’m personally tired of seeing endless “Fixes”, “Tests”, etc. in TeamCity build history. Nonetheless, when Mercurial asks me what the hell I was doing for the last hour before it agrees to share my changes with the team, I stare at the text field for a couple of seconds and, honestly, sometimes end up with another “Fixes” poo.

Commit messages should generally follow the same guidelines as status reports at your daily scrum. They must be short, clear and (sometimes) inspiring. This is often not the case because of the huge stream of bugs, change requests, code smells, etc. pumping into your brain at an incredible pace.

Experienced developers know that microcommits work best: a small change is easier to describe meaningfully. The problem is that controlling yourself is hard: when you start hacking your code you normally have a very broad ‘make this stuff even better’ vision in your head. And only after half an hour or more does the system ask you to put a name tag on your actions.

Wouldn’t it be better if you could give your commit message before making the change? You open up your favorite IDE and see a nice Twitter-style ‘What are you doing:’ text field. You browse your issue tracker, take a look at the scrum board, reflect for a minute and write ‘Implement masterpiece wizard in Photoshop’. And this text is up there at the top of your code editor, always reminding you of what you’re doing. Now, whenever you want to switch to something else, you finish this thing first, commit and then change the message to start a new thing.

Thursday, January 27, 2011

LOD in Scrum Product Backlogs

We have recently started using Scrum in our teams and one of the things that immediately changed was the way we store features. We used to keep them along with issues in the bug tracking software (YouTrack in our case), but this doesn’t work well with Scrum as we need to arbitrarily order them (three priorities to sort by are just not enough).

So we switched to Checkvist (I should dedicate a separate post to this awesome piece of software) which is basically a very usable outlining tool.

At first we struggled with the lack of structure in a simple hierarchy of text labels: how to plan sprints and versions, how to group issues by user scenarios, etc.

Over the course of several weeks I came up with the following system which you might find helpful. It is based on the LOD (Level-Of-Detail) approach used in computer games to determine how much detail to put into rendered objects depending on the distance to them: the monster you’re standing next to is very beautifully rendered with hi-res textures, bump-mapping and lots of vertices in the mesh. Trees and a church on the horizon are rendered with little detail and smaller textures so you don’t waste CPU cycles (if you’re killed by the monster, you won’t reach the church, so why spend resources on it now).

The backlog is divided into five sections: INCOMING, SCENARIOS, DEFINED, PLANNED FOR SPRINT and DONE.

All incoming requests are put at the end of the INCOMING section. These are very basic requests from team members and end-users. If the request was submitted through the issue tracker, I attach a link to the request. I use tags in this area to mark requests with areas of interest: #performance, #overview, #code-intelligence, etc.

The next section, SCENARIOS, is where I create end user scenarios to implement: these are combinations of improvements which enable users to do more cool stuff in a specific area. In these stories I group requests from the INCOMING section and add missing pieces. An example of a story would be: “Support AppEngine” or “Taggable work items”.

The next section, DEFINED, is where I get very specific about how to implement a particular story: all items in the DEFINED section have good HOWTO-DEMO descriptions. This section is where team members should look quite often to check whether something is missing or should be done differently. This is where ideas turn into actions. This is also the place we take items from for planning, so items in this section are ordered by priority.

The remaining two sections, PLANNED FOR SPRINT and DONE, are easy: after the planning meeting all items planned for the upcoming sprint are moved from DEFINED to PLANNED FOR SPRINT. All items which are completed as part of the sprint come back to DEFINED.

Pretty simple and easy to understand. So far it works very well for us.