Effects of Centralized and Distributed Version Control on Commit Granularity
by Jochen Wuttke, Ivan Beschastnikh, Yuriy Brun
Abstract:

Version control systems are critical for coordinating work in large software engineering teams. Recently, distributed version control (DVC) systems have become popular, as they have many advantages over their centralized (CVC) counterparts. DVC allows for more frequent commits, and simplifies branching and merging. These features encourage developers to make smaller, finer-grained commits that do not interleave changes related to different development tasks. Such commits improve accountability and ease certain tasks, such as reverting changes that later cause problems.

DVC systems are also better suited for repository mining techniques, making available more useful information about the development process. For example, approaches that infer collaboration patterns can benefit from the more detailed attribution of data in DVC. This can be used by an integration server to send email about failed test cases to just the subset of developers who authored the relevant code. DVC may also lead to smaller and more focused commits, which could benefit mining techniques that identify changes relevant to specific development tasks, such as refactorings.

However, to date, there has been no explicit evaluation of the practical differences in mining DVC over CVC, though some work acknowledges that developers might use DVC and CVC differently. We report on such an evaluation with one counterintuitive finding that raises doubts about certain DVC promises and opens research questions into what causes DVC and CVC differences. Further, our finding indicates that repository type should be controlled for in repository mining experiments.
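
As a rough illustration of what measuring commit granularity can look like, the following minimal sketch counts files touched and lines changed per commit. It is not the measurement procedure used in the paper, which the abstract does not describe. It assumes the input is text of the kind produced by `git log --numstat --format=@%H`, where the `@` prefix is a marker chosen here only to make commit boundaries easy to find; the function name `commit_sizes` and the sample data are hypothetical.

```python
from statistics import median


def commit_sizes(numstat_log: str) -> dict[str, tuple[int, int]]:
    """Map each commit id to (files touched, lines added + deleted).

    Assumes each commit starts with a line "@<id>" (our own marker) and is
    followed by numstat lines of the form "<added>\t<deleted>\t<path>".
    """
    sizes: dict[str, tuple[int, int]] = {}
    current = None
    for line in numstat_log.splitlines():
        if line.startswith("@"):
            current = line[1:].strip()
            sizes[current] = (0, 0)
            continue
        if not line.strip() or current is None:
            continue  # blank separator lines, or text before the first commit
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # not a numstat line
        added, deleted, _path = parts
        # git prints "-" for both counts when the file is binary
        churn = 0 if added == "-" else int(added) + int(deleted)
        files, lines = sizes[current]
        sizes[current] = (files + 1, lines + churn)
    return sizes


# Hypothetical sample: three commits, one of which touches a binary file.
SAMPLE = (
    "@a1\n\n3\t1\tsrc/main.c\n10\t0\tsrc/util.c\n"
    "@b2\n\n-\t-\tlogo.png\n"
    "@c3\n\n2\t2\tREADME\n"
)

sizes = commit_sizes(SAMPLE)
median_files = median(files for files, _ in sizes.values())
print(sizes, median_files)
```

Comparing such per-commit distributions between repositories of each type is one simple way to check whether commits really are smaller under DVC; a real study would also need to control for project size, team size, and history conversion artifacts.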

Citation:
Jochen Wuttke, Ivan Beschastnikh, and Yuriy Brun, Effects of Centralized and Distributed Version Control on Commit Granularity, Tiny Transactions on Computer Science, vol. 1, September 2012.
BibTeX:
@article{Wuttke12tinytocs,
  author = {Jochen Wuttke and Ivan Beschastnikh and Yuriy Brun},
  title =
  {\href{http://people.cs.umass.edu/brun/pubs/pubs/Wuttke12tinytocs.pdf}{Effects of
  Centralized and Distributed Version Control on Commit Granularity}},
  year = {2012},
  journal = {Tiny Transactions on Computer Science},
  volume = {1},
  venue = {TinyToCS},
  month = {September},
  note = {},
  accept = {$\frac{50}{64} = 78\%$},
  
  abstract = {<p>Version control systems are critical for coordinating work in
  large software engineering teams. Recently, distributed version control (DVC)
  systems have become popular, as they have many advantages over their
  centralized (CVC) counterparts. DVC allows for more frequent commits, and
  simplifies branching and merging. These features encourage developers to make
  smaller, finer-grained commits that do not interleave changes related to
  different development tasks. Such commits improve accountability and ease
  certain tasks, such as reverting changes that later cause problems.</p>
  
  <p>DVC systems are also better suited for repository mining techniques, making
  available more useful information about the development process. For example,
  approaches that infer collaboration patterns can benefit from the more
  detailed attribution of data in DVC. This can be used by an integration server
  to send email about failed test cases to just the subset of developers who
  authored the relevant code. DVC may also lead to smaller and more focused
  commits, which could benefit mining techniques that identify changes relevant
  to specific development tasks, such as refactorings.</p>
  
  <p>However, to date, there has been no explicit evaluation of the practical
  differences in mining DVC over CVC, though some work acknowledges that
  developers might use DVC and CVC differently. We report on such an evaluation
  with one counterintuitive finding that raises doubts about certain DVC
  promises and opens research questions into what causes DVC and CVC
  differences. Further, our finding indicates that repository type should be
  controlled for in repository mining experiments.</p>},
}