How Wired gets it wrong on the problem of code forking
February 21st, 2012 by ravi

GitHub is, for very good reasons, immensely popular these days. So it is understandable that Wired decides to shine a light on the service, but lamentable that they chose to do so under the link bait headline “How GitHub Tamed Free Software”, because it is debatable whether Free Software is in need of taming and even more tendentious to claim that Git or GitHub is the solution for this imaginary problem (interestingly, Wired’s thesis is the converse of Adam Martin’s claim that GitHub is killing Open Source, discussed earlier on this blog). Let us dig in.

First, the problem as laid out by Wired, using as an example the large number of Linux patches received by Linus Torvalds that withered away in his inbox:

This was the dirty little secret of open-source software. With the average free software project, large amounts of code — maybe even most code — never actually got used. It was often just too hard for casual users to show developers the changes they’d made and then easily merge those changes back into the open-source code base.

True. The core contributors, often a very small group, have neither the time nor, often, the inclination to wade through all proposed patches. Poring over other people’s code is annoying, especially, I am guessing, when you are an ace coder yourself and could be solving more interesting problems. Visit Mozilla’s Bugzilla bug database for a sampling of the bugs with posted patches that those with approval powers have flat out ignored.

How to solve this?

Wired would like to make the case that you do so by inventing a new version control system. In doing so, they cast the problem as a technological one. Here’s the argument:

So in 2005, Torvalds created Git, version control software specifically designed to take away the busywork of managing a software project. Using Git, anybody can tinker with their own version of Linux — or indeed any software project — and then, with a push of a button, share those changes with Torvalds or anyone else. There is no gatekeeper. In practical terms, Torvalds created a tool that makes it easy for someone to create an alternative to his Linux project. In technical terms, that’s called a “fork”.

Back in the 1990s, forking was supposed to be a bad thing. It’s what created all of those competing, incompatible versions of Unix. For a while, there was a big fear that someone would somehow create their own fork of Linux, a version of the operating system that wouldn’t run the same programs or work in the same way. But in the Git world, forking is good. The trick was to make sure the improvements people worked out could be shared back with the community. It’s better to let people fork a project and tinker away with their own changes, than to shut them out altogether by only letting a few trusted authorities touch the code.

On a rare sunny February day in Portland, Torvalds demonstrates Git for Wired at his home office. With a few keystrokes, he quickly spots two new kernel submissions that change the same kernel code in different ways, a potential problem source.

The old regime “makes it very hard to start radical new branches because you generally need to convince the people involved in the status quo up-front about their need to support that radical branch,” Torvalds says. “In contrast, Git makes it easy to just ‘do it’ without asking for permission, and then come back later and show the end result off — telling people ‘look what I did, and I have the numbers to show that my approach is much better.’”

[emphasis added]

This is poorly reasoned. Here’s why.

Linus is talking about a particular SCM / release engineering process. If you are using centralised version control (CVS, Perforce, SVN and so on), anyone can still pull the sources into their own private space, i.e., people can create their “own private version” of the codebase. They can then edit that code to their heart’s content, thus effectively forking the project/codebase. Nobody is shut out. Anyone can touch the code, not just trusted authorities. That’s the very idea behind Free Software.

Two problems arise with managing such a fork. First, there is the issue of maintaining the forked code, i.e., keeping it under version control. Centralised VCSs typically facilitate large code changes through the process of creating “branches” (hence Linus’s reference to “radical new branches”). But these branches reside on the server, to which the underprivileged new coder has no write access. So she adds chunks of code over potentially long periods of time in the obscurity of her private view of the codebase, with no version history of her own, which violates all sorts of best practices. And then, second, when she ventures to submit these changes for inclusion, the process can be a tedious one involving manual examination and application of patches by one or more gatekeepers. This is the issue highlighted by Wired, and in a narrower sense by Linus.

Git solves the problem of managing development within a fork: the forked code in the private space of the new developer is still under the purview of Git. Git also takes a shot at making the merging of such forks easier. GitHub, as advertised, makes that process even easier.
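To make concrete what “making the merging easier” can and cannot mean, here is a minimal sketch in Python of a line-by-line three-way merge. This is emphatically not how Git merges; the function name `merge_lines` and the sample lines are made up for illustration, and the sketch assumes that forks only modify existing lines and never add or remove any. Changes that touch different lines merge mechanically. Changes that touch the same line cannot, and somebody has to look at them.

```python
def merge_lines(base, ours, theirs):
    """Naive three-way merge (illustrative only, not Git's algorithm).

    Assumes all three versions have the same number of lines, i.e. forks
    only modify lines and never insert or delete them.
    Returns (merged_lines, indices_of_conflicting_lines).
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles same-length versions")

    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both forks agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only their fork changed this line
            merged.append(t)
        elif t == b:      # only our fork changed this line
            merged.append(o)
        else:             # both changed it differently: a human must decide
            merged.append(b)
            conflicts.append(i)
    return merged, conflicts


base = ["init()", "run()", "stop()"]
fork_a = ["init()", "run(fast=True)", "stop()"]
fork_b = ["init()", "run()", "stop(force=True)"]
fork_c = ["init()", "run(safe=True)", "stop()"]

# Forks A and B touch different lines, so they merge without anyone looking.
clean, clean_conflicts = merge_lines(base, fork_a, fork_b)

# Forks A and C both rewrite line 1, so the tool can only flag it.
flagged, flagged_conflicts = merge_lines(base, fork_a, fork_c)
```

The tool can tell you that line 1 is in dispute. It cannot tell you whether `fast=True` or `safe=True` is the better change. That judgement still belongs to whoever reviews the code.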

The flaw is the assumption that the toughest problem with code forking is the technicalities of merging. If only, the argument goes, we could take the pain out of sending patches around in email or in attachments in Bugzilla… if only we could make forking and merging to and from private spaces a built-in feature of the version control system… then the problem of the “Trusted Gatekeeper model of open source” would go away. This, I submit, is wishful thinking. For one thing, the Trusted Gatekeeper is not going anywhere (try sneaking your changes into the Linux kernel without the approval of Linus or his lieutenants!).

No doubt Linus finds it easier to pull in changes with Git than before Git. The question is whether this is the significant pain. Linus reasons that with the Git model, the new developer can back up her changes by saying ‘look what I did, and I have the numbers to show that my approach is much better’. However, this was already possible with any decent Free Software project with well-defined test cases and metrics. The new developer could fork the code in private, build a new version of the software, run the test suite and advertise the improvement. Even in a pre-Git environment. In fact, to my knowledge, Git does not even address such interaction.

The significant pain in the “Trusted Gatekeeper model” is, I think, that trusted gatekeepers are an unfortunate necessity for maintaining any codebase. Poring over hundreds of lines of patch code, or maintaining a test suite with enough coverage to substitute for or support such review, likely lies in the NP part of any Complexity Venn Diagram. And it is a pain that all parties will therefore try to avoid. Hence Adam Martin’s contrary fear that GitHub is in fact making matters worse by over-enabling forking. Because merging will never be easy. And if I am right, then the progression needs reversal! Perhaps the Free Software model can tame the “I can ‘just do it’ without asking for ‘permission’” jungle?
