Effects of Centralized and Distributed Version Control on Commit Granularity
Version control systems are critical for coordinating work in large software engineering teams. Recently, distributed version control (DVC) systems have become popular, as they have many advantages over their centralized (CVC) counterparts. DVC allows for more frequent commits, and simplifies branching and merging. These features encourage developers to make smaller, finer-grained commits that do not interleave changes related to different development tasks. Such commits improve accountability and ease certain tasks, such as reverting changes that later cause problems.
DVC systems are also better suited for repository mining techniques, making available more useful information about the development process. For example, approaches that infer collaboration patterns can benefit from the more detailed attribution of data in DVC. This can be used by an integration server to send email about failed test cases to just the subset of developers who authored the relevant code. DVC may also lead to smaller and more focused commits, which could benefit mining techniques that identify changes relevant to specific development tasks, such as refactorings.
However, to date, there has been no explicit evaluation of the practical differences in mining DVC over CVC, though some work acknowledges that developers might use DVC and CVC differently. We report on such an evaluation with one counterintuitive finding that raises doubts about certain DVC promises and opens research questions into what causes DVC and CVC differences. Further, our finding indicates that repository type should be controlled for in repository mining experiments.
@article{Wuttke12tinytocs,
author = {Jochen Wuttke and Ivan Beschastnikh and Yuriy Brun},
title = {Effects of Centralized and Distributed Version Control on Commit Granularity},
year = {2012},
journal = {Tiny Transactions on Computer Science},
volume = {1},
venue = {TinyToCS},
month = {September},
note = {},
accept = {$\frac{50}{64} = 78\%$},
abstract = {Version control systems are critical for coordinating work in
large software engineering teams. Recently, distributed version control (DVC)
systems have become popular, as they have many advantages over their
centralized (CVC) counterparts. DVC allows for more frequent commits, and
simplifies branching and merging. These features encourage developers to make
smaller, finer-grained commits that do not interleave changes related to
different development tasks. Such commits improve accountability and ease
certain tasks, such as reverting changes that later cause problems.
DVC systems are also better suited for repository mining techniques, making
available more useful information about the development process. For example,
approaches that infer collaboration patterns can benefit from the more
detailed attribution of data in DVC. This can be used by an integration server
to send email about failed test cases to just the subset of developers who
authored the relevant code. DVC may also lead to smaller and more focused
commits, which could benefit mining techniques that identify changes relevant
to specific development tasks, such as refactorings.
However, to date, there has been no explicit evaluation of the practical
differences in mining DVC over CVC, though some work acknowledges that
developers might use DVC and CVC differently. We report on such an evaluation
with one counterintuitive finding that raises doubts about certain DVC
promises and opens research questions into what causes DVC and CVC
differences. Further, our finding indicates that repository type should be
controlled for in repository mining experiments.},
}