Tesseract Write Up
Demo:
You can access a demo of Tesseract at:
http://crc.maccherone.com/tesseract/mpv.html
Description of the tool:
Tesseract is designed to provide a high-level view of the entire software project
and its evolution. Specifically, it combines information about project
activities (frequency of commits), file dependencies (logical coupling
based on when files have been checked in together), social dependencies
(dependencies among developers based on underlying dependencies among
artifacts that they are editing), and bug history.
Tesseract is designed to allow a user to investigate a project through different
perspectives that are linked together to present a holistic view of the
project.
Figure 1
Tool Walk-through
-
Each project in the combo list
(Fig 1 (a)) represents a GNOME project. Select a project (e.g.,
“gnome/rhythmbox”) from the drop-down list to populate the other panels.
-
The search bar (Fig 1 (b)) allows you to search for a specific file
name, developer name, or bug text. Matching nodes and text are
highlighted in yellow.
-
The date slider (see Fig 1 (c)) displays the date range over which the
project was active. For example, the project “gnome/rhythmbox” had
commits between 2002-03-02 and 2007-01-03. The slider also shows the
distribution of the number of commits and the communication frequency
over that period. By default, the slider is set to cover a six-month
period beginning at the start date of the project. The file network,
developer network, and bug data display information for this time
slice. Either thumb of the slider can be dragged to set a different
start or end date.
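As a rough illustration of the default time slice, the sketch below
computes a six-month window from a project's first commit date and keeps
only the commits that fall inside it. The function names (add_months,
default_window, commits_in_window), the inclusive end date, and the
(date, files) commit representation are assumptions made for this
example; they are not taken from Tesseract's code.

```python
import calendar
from datetime import date


def add_months(day, months):
    """Shift `day` forward by `months`, clamping to the last day of the month."""
    index = day.month - 1 + months
    year = day.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def default_window(project_start, project_end, months=6):
    """Default slider range: `months` from the project start, capped at the project end."""
    return project_start, min(add_months(project_start, months), project_end)


def commits_in_window(commits, start, end):
    """Keep (commit_date, files) pairs dated inside [start, end] (inclusive, by assumption)."""
    return [(when, files) for when, files in commits if start <= when <= end]


# Dates quoted above for gnome/rhythmbox.
start, end = default_window(date(2002, 3, 2), date(2007, 1, 3))
print(start, end)
```

For the gnome/rhythmbox dates quoted above, this yields a window from
2002-03-02 to 2002-09-02.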
Network Data:
-
The file-to-file network (Figure 2) shows the network of interdependent
files in the project. Currently, two files are considered coupled when
they have been changed and committed together within the selected time
range. The graph considers only changes to code files (i.e., it
disregards .gif, .log, .txt, etc.). Hovering over a file node displays a
tooltip with its file name and highlights its neighbors by darkening
them. Tesseract provides two extra controls that let the user fine-tune
which commits and file pairs are taken into account.
i. The Numeric combo box (top
left in panel) lets users set a threshold on the number of files per
commit. For example, if the threshold is set to 10, commits in which
more than 10 files have changed are not considered. This filters out
changes to a large set of files that may be due to a licensing or
authorship change rather than to actual coupling, and it also keeps the
file network graph scalable.
ii. The File Numeric stepper (top
right in panel) allows the user to specify how many times two files
must be checked in together to be deemed coupled.
Figure 2: File network
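The coupling rule and its two controls can be sketched as follows. This
is a minimal illustration, not Tesseract's implementation: the function
name file_coupling, its parameter names, and the list-of-file-lists
commit format are hypothetical, and applying the files-per-commit
threshold before non-code files are dropped is an assumption.

```python
from collections import Counter
from itertools import combinations


def file_coupling(commits, max_files_per_commit=10, min_co_commits=1,
                  ignored_extensions=(".gif", ".log", ".txt")):
    """Count how often each pair of code files was committed together.

    commits: iterable of file-name lists, one list per commit in the
    selected time range.
    """
    pair_counts = Counter()
    for files in commits:
        unique_files = set(files)
        # Skip very large commits (e.g., licensing or authorship changes).
        if len(unique_files) > max_files_per_commit:
            continue
        # Only code files take part in the coupling graph.
        code_files = sorted(
            f for f in unique_files if not f.lower().endswith(ignored_extensions)
        )
        pair_counts.update(combinations(code_files, 2))
    # Keep only pairs that were co-committed often enough to count as coupled.
    return {pair: n for pair, n in pair_counts.items() if n >= min_co_commits}


commits = [
    ["a.c", "b.c", "logo.gif"],
    ["a.c", "b.c"],
    ["a.c", "c.c"],
    [f"f{i}.c" for i in range(11)],  # 11 files: above the threshold of 10
]
edges = file_coupling(commits, max_files_per_commit=10, min_co_commits=2)
print(edges)
```

With the sample commits, only the pair (a.c, b.c) survives: it was
committed together twice, the (a.c, c.c) pair only once, and the
11-file commit is ignored entirely.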
-
The developer-to-developer network
(Figure 3) displays the congruence in the social network of the project.
Congruence is defined as a match between the coordination requirements
and the coordination behavior of a team: developers who work on
interdependent artifacts are expected to coordinate with each other. We
calculate coordination requirements based on the methodology developed
by Cataldo et al. [], where the developer-to-developer dependency is
derived from the underlying logical coupling among the artifacts (i.e.,
files that have been committed together). The communication behavior in
the project is based on activity in the mailing lists and the bug
database. Specifically, developers are considered to have communicated
with each other when they participate in the same email discussion,
comment on the same bug/issue in the Bugzilla database, or work on the
same bug/issue. This communication link is then compared with the
coordination requirement link to calculate congruence. When the
communication link matches the coordination requirement link, the edge
between the two nodes is colored “green”. When a required communication
link is missing, the edge is colored “red”, representing a gap. When
there is an extra communication link (i.e., two developers have
communicated but have not worked on coupled artifacts), the edge is
colored “grey”. The developer network panel provides two controls.
i. Developers
may discuss a bug or feature before editing the files concerning that
bug/feature and committing them to the repository. To take such
discussions into account, the Numerical stepper (top left of the panel
in Figure 3) lets the user select the number of days before the file
edits within which discussions are counted.
ii. Communication
selection checkboxes (top right of the panel in Figure 3) allow users to
select which communication channels (email, bug activity, bug comment)
are used for the congruence calculation.
Figure 3: Developer network
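The edge coloring and the two panel controls can be sketched as follows.
This is a simplified illustration under stated assumptions, not the
matrix formulation of Cataldo et al. and not Tesseract's code: a
coordination requirement is assumed to exist whenever two developers
edited files that form a coupled pair, the "number of days" control is
modeled by extending the start of the time slice backwards, and all
function names and data shapes are hypothetical.

```python
from datetime import date, timedelta
from itertools import combinations


def coordination_requirements(dev_files, coupled_pairs):
    """Developer pairs who edited files that are coupled to each other.

    dev_files: {developer: set of files committed in the time slice}
    coupled_pairs: set of frozenset({file_a, file_b})
    """
    required = set()
    for dev_a, dev_b in combinations(sorted(dev_files), 2):
        if any(frozenset((fa, fb)) in coupled_pairs
               for fa in dev_files[dev_a] for fb in dev_files[dev_b]):
            required.add(frozenset((dev_a, dev_b)))
    return required


def communication_links(events, channels, start, end, lookback_days=0):
    """Developer pairs who communicated on a selected channel.

    events: iterable of (date, channel, dev_a, dev_b).
    Discussions up to `lookback_days` before `start` are also counted
    (an assumed reading of the "number of days" control).
    """
    earliest = start - timedelta(days=lookback_days)
    return {frozenset((a, b)) for when, channel, a, b in events
            if channel in channels and earliest <= when <= end}


def classify_edges(required, communicated):
    """green: required and present; red: required but missing; grey: extra."""
    colors = {}
    for pair in required | communicated:
        if pair in required and pair in communicated:
            colors[pair] = "green"
        elif pair in required:
            colors[pair] = "red"
        else:
            colors[pair] = "grey"
    return colors


# Hypothetical sample data.
dev_files = {"alice": {"a.c"}, "bob": {"b.c"}, "carol": {"c.c"}}
coupled = {frozenset(("a.c", "b.c"))}
events = [
    (date(2002, 2, 25), "email", "alice", "bob"),
    (date(2002, 3, 10), "bug comment", "bob", "carol"),
]
required = coordination_requirements(dev_files, coupled)
talked = communication_links(events, {"email", "bug comment"},
                             date(2002, 3, 2), date(2002, 9, 2), lookback_days=7)
colors = classify_edges(required, talked)
```

In this sample, the alice/bob edge is green only because the seven-day
lookback picks up an email sent before the time slice begins; with a
lookback of zero the same edge would be red. The bob/carol edge is grey
in both cases, since the two communicated without working on coupled
files.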
NOTES
-
Hovering
over any node in either graph (file-to-file network or
developer-to-developer network) highlights its neighboring nodes by
darkening them.
-
Clicking a particular node
in the developer-to-developer graph highlights (in yellow) the files
that the developer committed during the time period under
consideration. Similarly, clicking a node in the file-to-file graph
highlights (in yellow) the developers who committed that file in the
time period under consideration. More than one node can be selected
with Ctrl+click.
-
Please wait until the networks have stabilized before dragging nodes.
-
Sometimes nodes move outside the window panes. In such cases you can
zoom out in the graph using the scroll button on the mouse.
-
The entire graph can also be panned/dragged by placing the mouse in the center of the graph and moving it.
Bug Data:
For the time range selected by the date slider (Fig 1 (c)), bug
information is shown if (1) the bug was opened during this time period
or (2) an open bug was closed during this period:
-
The stacked area chart (Fig 1 (f)) displays the number of open bugs in
the selected time range, classified (and colored) according to bug
severity.
-
The table (Fig 1 (f)) provides further information on the bugs shown in
the stacked area chart. The status that a user provides when reporting a
bug is called “priority”; the status that is assigned by the core
developers in the project is called “severity”. The status of the bug as
reported by the developer working on it is shown as “status”, and the
final decision about how the bug was resolved is shown as “resolution”.
i. The
user can use the checkboxes (Fig 1 (f)) to filter which bugs are
displayed based on the severity of the bug. Enhancement requests are
included in this list because they appear in the bug database.
ii. Clicking
on a particular bug in the table selects (colors yellow) the developer
to whom the bug was assigned in the developer-to-developer network.
Developers who communicated about that bug (bug activity or comment)
can be found by hovering over that developer node; they are shown in a
deeper color in the developer-to-developer network.
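The date-range rule stated at the start of this section, together with
the severity checkboxes, can be sketched as follows. The function name
bugs_for_range, the dictionary keys ("opened", "closed", "severity"),
and the sample bugs are hypothetical and serve only to illustrate the
rule; inclusive range boundaries are an assumption.

```python
from datetime import date


def bugs_for_range(bugs, start, end, severities=None):
    """Select bugs opened in [start, end] or closed in [start, end].

    bugs: iterable of dicts with "opened", "severity", and an optional
    "closed" date (missing or None while the bug is still open).
    severities: optional set of severities to keep (the checkboxes).
    """
    selected = []
    for bug in bugs:
        opened_in_range = start <= bug["opened"] <= end
        closed = bug.get("closed")
        closed_in_range = closed is not None and start <= closed <= end
        if not (opened_in_range or closed_in_range):
            continue
        if severities is not None and bug["severity"] not in severities:
            continue
        selected.append(bug)
    return selected


# Hypothetical sample data.
sample_bugs = [
    {"id": 1, "opened": date(2002, 4, 1), "closed": None, "severity": "major"},
    {"id": 2, "opened": date(2002, 1, 5), "closed": date(2002, 5, 9), "severity": "minor"},
    {"id": 3, "opened": date(2002, 1, 5), "closed": date(2003, 2, 1), "severity": "major"},
    {"id": 4, "opened": date(2002, 6, 1), "closed": None, "severity": "enhancement"},
]
shown = bugs_for_range(sample_bugs, date(2002, 3, 2), date(2002, 9, 2))
print([bug["id"] for bug in shown])
```

Bug 3 is left out because it was both opened before and closed after the
selected range; passing severities={"major"} would further reduce the
result to bug 1.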