Hagfish

For detailed explanations, see the Hagfish page.

Hagfish is a tool that builds on the concept of coverage plots based on paired-end read data. It aims to assist in quality control of de novo genome assemblies and in identifying structural variation in genome re-sequencing experiments. In particular, it can be used to check scaffolds for mis-assemblies.

Hagfish coverage plots were generated for those scaffolds of our N. benthamiana v0.3 assembly for which enough paired-end data was available. An example plot is shown below.

Hagfish plot

Three stacked coverage plots are shown: "high" in red, "ok" in green, and "low" in blue. Two kinds of coverage score are plotted.

ECP: a regular (exclusive) coverage score. Each read that covers a nucleotide increases the coverage score for that nucleotide by one.

ICP: like the ECP, but nucleotides between the two reads of a mapped read pair also receive a coverage score increment.
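The difference between the two scores can be illustrated with a minimal Python sketch. This is not Hagfish's own code: the `coverage_scores` function and the tuple layout for a read pair (0-based, end-exclusive coordinates) are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical read-pair layout (not Hagfish's format):
# (left_start, left_end, right_start, right_end), 0-based, end-exclusive.
ReadPair = Tuple[int, int, int, int]


def coverage_scores(pairs: List[ReadPair], length: int) -> Tuple[List[int], List[int]]:
    """Return (ecp, icp) per-nucleotide scores for a sequence of `length` bases."""
    ecp = [0] * length
    icp = [0] * length
    for left_start, left_end, right_start, right_end in pairs:
        # Exclusive score: only nucleotides inside a read are incremented.
        for pos in range(max(left_start, 0), min(left_end, length)):
            ecp[pos] += 1
        for pos in range(max(right_start, 0), min(right_end, length)):
            ecp[pos] += 1
        # Inclusive score: the whole span of the pair is incremented,
        # including the nucleotides between the two reads.
        for pos in range(max(left_start, 0), min(right_end, length)):
            icp[pos] += 1
    return ecp, icp


pairs = [(0, 3, 7, 10), (2, 5, 8, 10)]
ecp, icp = coverage_scores(pairs, 10)
print(ecp)  # [1, 1, 2, 1, 1, 0, 0, 1, 2, 2]
print(icp)  # [1, 1, 2, 2, 2, 2, 2, 2, 2, 2]
```

In this toy example, positions 5 and 6 are covered by no read, so their ECP score is zero, but both read pairs span them, so their ICP score is two.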

Any region with "sufficient" green ECP coverage (below the x-axis) is completely covered by reads mapping to it. Any region with "sufficient" green ICP coverage (above the x-axis) is structurally sound, as judged by the read pairs mapped to the genome.

The top plot shows two major aberrant features. The first is the first gap, at around 3.5 kb. The ICP shows coverage in the "high" category bridging the gap, and the ECP has a small bump in the "high" category on each side of the gap. This signature indicates that the assembly is correctly scaffolded at this location (i.e. the contigs to the left and right of the gap belong together), but that the size of the gap (the yellow block) is overestimated. The reasoning is as follows: there is ICP coverage spanning the gap, so read pairs span the gap, but those pairs fall in the "high" category, meaning they map further apart than expected. Hence the gap is too large.
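The three categories can be thought of as a classification of each read pair by its mapped distance. The sketch below is a simplified assumption, not Hagfish's actual procedure: the `classify_pair` function, the fixed `expected` insert size, and the symmetric `tolerance` are hypothetical, and the text above only states the meaning of "high" (pairs mapping further apart than expected). Treating "low" as the opposite case is an assumption.

```python
from collections import Counter


def classify_pair(insert_size: int, expected: int, tolerance: int) -> str:
    """Classify a read pair by its mapped insert size.

    `expected` and `tolerance` are hypothetical parameters chosen for
    illustration; they are not values taken from Hagfish.
    """
    if insert_size > expected + tolerance:
        return "high"  # reads map further apart than expected
    if insert_size < expected - tolerance:
        return "low"   # assumed: reads map closer together than expected
    return "ok"


# Toy insert sizes for pairs spanning a gap whose size is overestimated:
# most spanning pairs appear stretched and land in the "high" category.
spanning_inserts = [880, 910, 905, 520, 895]
counts = Counter(classify_pair(size, expected=500, tolerance=100)
                 for size in spanning_inserts)
print(counts["high"], counts["ok"], counts["low"])  # 4 1 0
```

Under these assumptions, a gap bridged mostly by "high" pairs matches the signature described above: the contigs are linked, but the assembled distance between them is larger than the read pairs support.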