Comparing gene annotations from alternative sources is an important task that many biologists perform frequently. Manually comparing text files or visually assessing differences in a genome browser is tedious and error-prone. We developed ParsEval to facilitate comparative analysis of gene annotations at the genome scale. ParsEval reports include a variety of similarity statistics that highlight the differences between the two sets of annotations.
ParsEval is implemented in ANSI C and is designed to run on all POSIX-compliant UNIX systems (Linux, Mac OS X, Cygwin, Solaris, etc.). Aside from a C compiler with OpenMP support (such as GCC 4.2 or higher), ParsEval's only external dependency is the GenomeTools library.
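Because ParsEval needs a compiler with OpenMP support, it can help to confirm that OpenMP is actually enabled before building. The sketch below is not part of ParsEval, and its function names are hypothetical. It relies only on the standard `_OPENMP` macro, which OpenMP-capable compilers define (as a yyyymm version date) when OpenMP is turned on, for example with GCC's `-fopenmp` flag.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Returns the OpenMP version date (yyyymm) advertised through the standard
 * _OPENMP macro, or 0 if OpenMP was not enabled at compile time
 * (e.g. GCC invoked without -fopenmp). */
long openmp_version(void)
{
#ifdef _OPENMP
    return (long)_OPENMP;
#else
    return 0L;
#endif
}

/* Number of threads a parallel region would use by default,
 * or 1 when compiled without OpenMP support. */
int default_thread_count(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* Prints a one-line summary of the compiler's OpenMP status. */
void report_openmp_support(void)
{
    long v = openmp_version();
    if (v > 0)
        printf("OpenMP enabled (version %ld), %d threads by default\n",
               v, default_thread_count());
    else
        printf("OpenMP not enabled; try compiling with gcc -fopenmp\n");
}
```

Calling `report_openmp_support()` from a small test program compiled once with and once without `-fopenmp` should show the difference.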
The latest version of ParsEval is available from the download button on the navigation bar. Installation instructions are provided in the README file included with the source code distribution.
In addition to a single summary report, which aggregates comparison statistics across all annotations, ParsEval provides the same statistics in an individual report for each distinct gene locus. The summary report gives a high-level view of the similarity between the two sets of annotations, while the locus reports provide a detailed breakdown of the precise differences in gene structure within each genomic region.
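The relationship between the per-locus reports and the summary report can be illustrated with one simple statistic. The sketch below is hypothetical. The `LocusCounts` type and the function names are invented for illustration, and the single statistic shown (a nucleotide-level matching coefficient, i.e. the fraction of nucleotides labeled the same way in both annotations) stands in for ParsEval's fuller set of statistics. It assumes the summary is built by pooling raw counts across loci and then computing the ratio, rather than by averaging per-locus ratios.

```c
#include <stddef.h>

/* Hypothetical per-locus tally: how many nucleotides in the locus carry the
 * same label (e.g. coding vs. non-coding) in both annotation sets. */
typedef struct {
    unsigned long agree;   /* nucleotides labeled identically in both sets */
    unsigned long length;  /* total nucleotides in the locus */
} LocusCounts;

/* Locus-level statistic: fraction of nucleotides on which the two
 * annotation sets agree. Returns 0.0 for an empty locus. */
double matching_coefficient(LocusCounts c)
{
    return c.length ? (double)c.agree / (double)c.length : 0.0;
}

/* Summary-level counts: pool the raw tallies of all loci so the same
 * statistic can be computed over the whole genome. Longer loci therefore
 * carry proportionally more weight in the summary. */
LocusCounts aggregate_counts(const LocusCounts *loci, size_t n)
{
    LocusCounts total = {0, 0};
    size_t i;
    for (i = 0; i < n; i++) {
        total.agree += loci[i].agree;
        total.length += loci[i].length;
    }
    return total;
}
```

For example, a 100 bp locus with 50 agreeing nucleotides and a 300 bp locus with complete agreement have per-locus coefficients of 0.5 and 1.0. Pooling gives 350/400 = 0.875, whereas a plain average of the two ratios would give 0.75.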
ParsEval uses the term "locus" in a way that may differ from how most biologists use it. At runtime, ParsEval treats each gene annotation as a node in an interval graph G, in which two nodes are joined by an edge if the corresponding gene annotations overlap. Each connected component of G then corresponds to a distinct gene locus, which we define as the smallest genomic region containing every gene annotation in that component. Defining a gene locus in this way makes no assumptions about the relative quality of the two sets of annotations and ensures that no potentially relevant data are discarded. Furthermore, under this definition the gene loci are independent of one another, so the subsequent comparative analysis tasks can run in parallel.
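For intervals on a single sequence, the connected components of an interval graph can be found without building the graph explicitly. Sort the annotations by start coordinate and sweep, starting a new component whenever an annotation begins after the furthest end seen so far. The sketch below uses this standard sort-and-sweep technique; it is not necessarily how ParsEval computes loci. The `Annot` type, `assign_loci`, and `analyze_loci` are hypothetical. The sketch assumes closed intervals on one sequence and ignores strand. The per-locus "analysis" is a placeholder (locus length) that only shows how independent loci can be processed with an OpenMP parallel loop.

```c
#include <stdlib.h>

/* Hypothetical annotation reduced to a closed interval [start, end] on a
 * single sequence; strand and sequence ID are ignored in this sketch. */
typedef struct {
    unsigned long start;
    unsigned long end;
    int locus;            /* output: index of the locus (connected component) */
} Annot;

static int cmp_start(const void *a, const void *b)
{
    const Annot *x = a, *y = b;
    if (x->start < y->start) return -1;
    if (x->start > y->start) return 1;
    return 0;
}

/* Sorts `a` in place, labels each annotation with its locus, writes each
 * locus's bounds (smallest region covering its members) into `loci`
 * (capacity >= n), and returns the number of loci. An annotation that starts
 * at or before the current component's furthest end overlaps the member
 * reaching that end, so it joins the component; otherwise nothing sorted
 * before it can overlap it, so it opens a new component. */
size_t assign_loci(Annot *a, size_t n, Annot *loci)
{
    size_t i, k = 0;
    if (n == 0)
        return 0;
    qsort(a, n, sizeof *a, cmp_start);
    loci[0].start = a[0].start;
    loci[0].end = a[0].end;
    loci[0].locus = 0;
    a[0].locus = 0;
    for (i = 1; i < n; i++) {
        if (a[i].start > loci[k].end) {
            /* No overlap with the current component: open a new locus. */
            k++;
            loci[k].start = a[i].start;
            loci[k].end = a[i].end;
            loci[k].locus = (int)k;
        } else if (a[i].end > loci[k].end) {
            /* Overlaps the current component and extends it rightward. */
            loci[k].end = a[i].end;
        }
        a[i].locus = (int)k;
    }
    return k + 1;
}

/* Placeholder per-locus task; it reads only its own locus. */
static unsigned long locus_length(const Annot *locus)
{
    return locus->end - locus->start + 1;
}

/* Loci are independent, so iterations can run concurrently. Without
 * -fopenmp the pragma is ignored and the loop simply runs serially. */
void analyze_loci(const Annot *loci, size_t nloci, unsigned long *result)
{
    long i;  /* signed index, as older OpenMP versions require */
#pragma omp parallel for schedule(dynamic)
    for (i = 0; i < (long)nloci; i++)
        result[i] = locus_length(&loci[i]);
}
```

With annotations at 100-200, 150-300, 290-310, and 500-600, the first three form one locus spanning 100-310 through a chain of overlaps (100-200 does not overlap 290-310 directly). The last one forms a second locus.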