A high definition network visualization approach to detect fine-scale population structures from genome-wide patterns of variation
Markus Neuditschko, Mehar S Khatkar and Herman W Raadsma
What is NETVIEW: NETVIEW is a fast and efficient way for identifying and visualizing fine-scale population structures from a genetic relationship matrix among individuals/populations. It identifies and visualizes individual ancestry in an effective way for a range of population structure analyses including the identification of founder individuals, family structures and admixture. A detailed description of the method is provided in:
M Neuditschko, MS Khatkar, HW Raadsma. 2012. A high definition network visualization approach to detect fine-scale population structures from genome-wide patterns of variation. PLoS One submitted
NETVIEW is an analysis pipeline which combines three different software tools to generate a high-definition network visualization of population structures. The tools included in the NETVIEW procedure are:
- Super Paramagnetic Clustering (SPC) as implemented in the software SPIN, which is free for academic use upon request
- Network Analysis Tools (NeAT)
- CYTOSCAPE
The working steps of the three different software tools including the various input and output formats/styles are described in the following sections with relevant example files:
About SPIN (SPC)
SPC is the main tool within the NETVIEW procedure. This algorithm is a powerful tool to identify population clusters from any relationship matrix and provides the network structure for a range of further analysis and the final network visualization.
The program SPIN supports several input formats through the options Eucl (Euclidean distance), Cor (correlation) and Jacc (Jaccard). Once an option is selected, the input matrix is converted to a distance matrix and an appropriate scatter plot is calculated. If none of the options is selected, the program expects a distance matrix as input. Because our example matrix (Example Matrix) describes correlations between individuals of five different populations, the Cor option is selected before the matrix is loaded into the GUI using the Load button.
After loading the relationship matrix into the SPIN sorter, press the Run SPC button, as shown in Image 1. This prompts you to set the SPC parameters (set K=10 and min size=2, as described in the paper). Pressing the OK button performs the clustering and generates the various output files.
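As an illustration of what converting a correlation matrix to a distance matrix can look like, here is a minimal Python sketch. It assumes the common convention d = 1 - r; the exact transformation applied by SPIN is not stated here, so this should not be read as SPIN's own implementation. The function name `correlation_to_distance` and the toy matrix are hypothetical.

```python
import numpy as np


def correlation_to_distance(corr):
    """Convert a square correlation matrix to distances using d = 1 - r.

    Assumption: d = 1 - r is one common convention, not necessarily
    the conversion that SPIN performs internally.
    """
    corr = np.asarray(corr, dtype=float)
    if corr.ndim != 2 or corr.shape[0] != corr.shape[1]:
        raise ValueError("expected a square matrix")
    dist = 1.0 - corr
    # Each individual is at distance 0 from itself.
    np.fill_diagonal(dist, 0.0)
    return dist


# Toy correlation matrix for three individuals (made-up values).
example = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
distances = correlation_to_distance(example)
print(distances)
```

Highly correlated individuals end up close together (small distance), which is the property a clustering step relies on.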
SPC outputs include the following images and files:
- Re-ordered relationship matrix (Sort Matrix).
- Heat map image of the re-ordered relationship matrix, showing the clusters and the relationships within and between clusters (Image 2).
- Data scatter plot of the respective relationship matrix comparing PCA1 vs PCA2 and PCA3 (Image 3).
- SPC tree, as shown in Image 4. The tree is also available as a text file (Example Tree Lab), which can be used to draw and label the SPC tree with any other tool, such as R.
- A binary edge file showing the edges/connections identified through the clustering process. The file from this example is Example mst10edges; it contains the pairs of individuals connected by edges. The edge file is combined with the relationship matrix file to generate a weighted relationship matrix (W Matrix Example). A simple R script to generate this file is provided (R manual for NeAT R).
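The combination of edge file and relationship matrix described in the last item can be sketched in Python as follows. This is not the R script referred to above. It assumes that the edge file lists pairs of 1-based individual indices and that the weighted matrix keeps the relationship value for connected pairs and is zero everywhere else; the function name `weighted_matrix` and the toy data are hypothetical.

```python
import numpy as np


def weighted_matrix(relationship, edges, one_based=True):
    """Keep relationship values only where an edge connects two individuals.

    Assumptions: `edges` is an iterable of (i, j) index pairs, 1-based by
    default, and unconnected pairs receive a weight of 0.
    """
    relationship = np.asarray(relationship, dtype=float)
    weighted = np.zeros_like(relationship)
    offset = 1 if one_based else 0
    for i, j in edges:
        a, b = i - offset, j - offset
        # Copy the value in both directions so the result stays symmetric.
        weighted[a, b] = relationship[a, b]
        weighted[b, a] = relationship[b, a]
    return weighted


# Toy relationship matrix and edge list (made-up values).
rel = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])
edge_pairs = [(1, 2), (2, 3)]
w = weighted_matrix(rel, edge_pairs)
print(w)
```

If the edge file is a plain whitespace-separated text file with two integer columns, the pairs could be read with `np.loadtxt(path, dtype=int)`; that layout is also an assumption.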
About network analysis tools (NeAT)
The network analysis toolbox provides user-friendly web access to a collection of tools for the analysis of networks, including format conversion, layout calculation and the calculation of node statistics such as the “degree centrality” statistic. On the NeAT page, the weighted relationship matrix from SPC (W Matrix Example) can be transformed into the GML file required by CYTOSCAPE by first selecting the “Format conversion” tab in the left-hand panel and then selecting “Adjacency matrix” as the input format and “GML” as the output format, with the option “Edge width proportional to weight” selected. The degree centrality of each individual is calculated by selecting the “Node topology statistics” tab in the left-hand panel of the NeAT page. An example of the transformed GML file is given in Example Network, and the use of NeAT is shown in Image 5.
CYTOSCAPE is an open source software platform for visualizing complex network structures. To visualize example_network.gml in CYTOSCAPE, the file has to be imported: File > Import > Network (multiple file types). Once the network is imported, various visualization manipulations can be performed. For example, to generate a network as presented in the paper, use the option Layout > yFiles > Circular or Organic, as shown in Image 6.
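For readers who want to see what these two NeAT steps compute, the following Python sketch derives the degree of each node from a weighted adjacency matrix and writes a minimal GML description of the network. It is an offline illustration, not NeAT's implementation: the function names, the `weight` edge attribute and the exact GML layout are assumptions, and the file produced by NeAT may carry additional attributes (for example graphics settings for edge width).

```python
import numpy as np


def node_degrees(weighted):
    """Count, for each node, how many other nodes it is connected to."""
    w = np.asarray(weighted, dtype=float)
    adjacency = w != 0                  # new boolean array, True where an edge exists
    np.fill_diagonal(adjacency, False)  # ignore self-connections
    return adjacency.sum(axis=1)


def to_gml(weighted, labels):
    """Write an undirected weighted network as a minimal GML string."""
    w = np.asarray(weighted, dtype=float)
    lines = ["graph [", "  directed 0"]
    for idx, label in enumerate(labels):
        lines.append(f'  node [ id {idx} label "{label}" ]')
    n = w.shape[0]
    for i in range(n):
        for j in range(i + 1, n):       # upper triangle: each edge once
            if w[i, j] != 0:
                lines.append(
                    f"  edge [ source {i} target {j} weight {float(w[i, j]):g} ]"
                )
    lines.append("]")
    return "\n".join(lines)


# Toy weighted matrix for three individuals (made-up values).
w_example = np.array([
    [0.0, 0.8, 0.0],
    [0.8, 0.0, 0.2],
    [0.0, 0.2, 0.0],
])
degrees = node_degrees(w_example)
gml_text = to_gml(w_example, ["A", "B", "C"])
print(degrees)
print(gml_text)
```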
There are a number of additional options for setting labels, colour and size for nodes and edges. The manipulation of these attributes can be done locally, but we found it more convenient to also import attributes from text files using: File > Import > Node or Edge Attributes. The example files for node colour and node size are given here:
Degree (for node size) and Node fill Color (for node colour). To scale the node size according to the number of connections (degree), the degree values have to be associated with node size, which is done by selecting: VizMapper > Visual Mapping Browser > Node Size (double click to create) > Select Degree > Continuous Mapper. Once all these steps are applied, the final network visualization should look like Image 7. The file of this network visualization is also provided (Network cys) and can be opened in CYTOSCAPE.
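The idea behind a continuous mapping from degree to node size can be sketched as a linear interpolation between a minimum and a maximum size. The Python snippet below is only an illustration of that idea; the function name, the size range of 20 to 80 and the toy degrees are hypothetical, and CYTOSCAPE's own mapper lets the user set the control points interactively.

```python
def continuous_node_size(degree, min_degree, max_degree,
                         min_size=20.0, max_size=80.0):
    """Linearly map a degree value onto a node size range.

    Assumption: a simple two-point linear mapping; the size limits are
    illustrative values, not defaults taken from CYTOSCAPE.
    """
    if max_degree == min_degree:
        # All nodes have the same degree: use the middle of the range.
        return (min_size + max_size) / 2.0
    fraction = (degree - min_degree) / (max_degree - min_degree)
    fraction = min(1.0, max(0.0, fraction))  # clamp values outside the range
    return min_size + fraction * (max_size - min_size)


# Toy degree values for three nodes (made-up).
node_degree = {"A": 1, "B": 2, "C": 5}
lowest = min(node_degree.values())
highest = max(node_degree.values())
sizes = {
    name: continuous_node_size(deg, lowest, highest)
    for name, deg in node_degree.items()
}
print(sizes)
```

Nodes with many connections are drawn larger, which makes highly connected individuals stand out in the final network.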