Notice: Due to my upcoming relocation, I have decided to maintain all programs and source code on Google Code. This page will be left as it is and will not be updated.

Google Code page for this project: http://code.google.com/p/genetic-ensemble-snpx/

GEsnpx: A genetic ensemble approach for gene-gene interaction identification

  • Description

    GEsnpx is an implementation of a hybrid algorithm developed for identifying gene-gene interactions in complex diseases. The system uses a multi-objective genetic algorithm with an ensemble of five nonlinear classifiers to capture gene-gene interactions through SNP markers. SNP subsets are evaluated and selected in a combinatorial manner, and potential interactions are identified by a combinatorial ranking procedure.

    The current version supports case-control association studies. Besides offering detection power for SNP pairs (two-SNP interactions) comparable to that of many other state-of-the-art programs, GEsnpx provides parallel support for identifying higher-order gene-gene interactions, which sets it apart from single-SNP or pairwise SNP screening algorithms. Please refer to reference [1] for more details on the implementation and evaluation of GEsnpx.

  • News!

    Note that in the current implementation, classifiers are evaluated using the Area Under the ROC Curve (AUC) to address imbalanced case-control datasets. This may increase computation time, depending on the machine you are using. A random oversampling procedure has also been added for datasets whose case-control ratio is highly imbalanced; it must be enabled explicitly. We expect these changes to increase detection power when the data are imbalanced.
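
To illustrate the two ideas in this note, here is a minimal Java sketch of a pairwise AUC computation and of random oversampling. It is not taken from the GEsnpx source: the class name `ImbalanceSketch`, the method names, and the choice to resample the smaller class until both classes are the same size are all assumptions made for illustration. The actual program may compute AUC and resample differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper class for illustration only; not part of GEsnpx.
class ImbalanceSketch {

    // AUC as the fraction of (case, control) pairs in which the case gets
    // the higher score; tied pairs count as one half.
    static double auc(double[] scores, boolean[] isCase) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!isCase[i]) continue;          // i ranges over cases
            for (int j = 0; j < scores.length; j++) {
                if (isCase[j]) continue;       // j ranges over controls
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one case and one control");
        }
        return wins / pairs;
    }

    // Random oversampling (assumed variant): returns sample indices in which
    // the smaller class is redrawn with replacement until both classes have
    // the same number of samples.
    static List<Integer> randomOversample(boolean[] isCase, Random rng) {
        List<Integer> cases = new ArrayList<>();
        List<Integer> controls = new ArrayList<>();
        for (int i = 0; i < isCase.length; i++) {
            if (isCase[i]) cases.add(i); else controls.add(i);
        }
        if (cases.isEmpty() || controls.isEmpty()) {
            throw new IllegalArgumentException("need at least one case and one control");
        }
        List<Integer> minority = cases.size() < controls.size() ? cases : controls;
        int deficit = Math.abs(cases.size() - controls.size());
        List<Integer> result = new ArrayList<>(cases);
        result.addAll(controls);
        for (int k = 0; k < deficit; k++) {
            result.add(minority.get(rng.nextInt(minority.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.4, 0.3, 0.2};
        boolean[] isCase = {true, false, true, false, false};
        System.out.println("AUC = " + auc(scores, isCase));
        System.out.println("Balanced indices = " + randomOversample(isCase, new Random(42)));
    }
}
```

Unlike plain accuracy, the pairwise AUC does not improve when a classifier simply predicts the majority class, which is why it suits imbalanced data. The double loop is quadratic in the number of samples, which is consistent with the longer running time mentioned above, although a rank-based computation would be faster.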

    A new diversity measure, "kappa diversity", is implemented and used by default. The original "double fault diversity" can still be selected through the program options.
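
For readers unfamiliar with these two measures, the following Java sketch shows one common pairwise formulation of each, computed from whether two classifiers classify each sample correctly. This is an assumption for illustration: the class `DiversitySketch` and its methods are hypothetical, and the exact formulas used in GEsnpx (for example, how pairwise values are aggregated over the ensemble) are described in reference [1] and may differ.

```java
// Hypothetical helper class for illustration only; not part of GEsnpx.
final class DiversitySketch {

    // Double fault: fraction of samples that both classifiers get wrong.
    static double doubleFault(boolean[] correctA, boolean[] correctB) {
        checkLengths(correctA, correctB);
        int bothWrong = 0;
        for (int i = 0; i < correctA.length; i++) {
            if (!correctA[i] && !correctB[i]) bothWrong++;
        }
        return (double) bothWrong / correctA.length;
    }

    // Cohen's kappa on the correct/incorrect outcomes of two classifiers:
    // (observed agreement - chance agreement) / (1 - chance agreement).
    static double kappa(boolean[] correctA, boolean[] correctB) {
        checkLengths(correctA, correctB);
        int n = correctA.length;
        int agree = 0, aCorrect = 0, bCorrect = 0;
        for (int i = 0; i < n; i++) {
            if (correctA[i] == correctB[i]) agree++;
            if (correctA[i]) aCorrect++;
            if (correctB[i]) bCorrect++;
        }
        double observed = (double) agree / n;
        double pA = (double) aCorrect / n;
        double pB = (double) bCorrect / n;
        double expected = pA * pB + (1.0 - pA) * (1.0 - pB);
        // Degenerate case: both classifiers are always right or always wrong,
        // so kappa is 0/0; treat it as perfect agreement by convention.
        if (1.0 - expected < 1e-12) return 1.0;
        return (observed - expected) / (1.0 - expected);
    }

    private static void checkLengths(boolean[] a, boolean[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        boolean[] a = {true, true, false, false};
        boolean[] b = {true, false, true, false};
        System.out.println("double fault = " + doubleFault(a, b));
        System.out.println("kappa = " + kappa(a, b));
    }
}
```

Under this formulation, a lower kappa or a lower double fault value indicates that the two classifiers make more complementary errors, that is, a more diverse pair.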

  • Availability
    • GEsnpx 1.1 [download]
    • test dataset1 [20 SNPs] [download]
    • test dataset2 [100 SNPs] [download]
    • source code, now available to academic (non-commercial) users, as requested by many people [download]

    Test dataset1 and test dataset2 were obtained from the study in [2].

    * Java 5.0 is required to run the program.

    To obtain general information about the program, run the following command on the command line (without parameters):

    java -jar GEsnpx.jar

    To test the program, run it with the example dataset as follows:

    java -jar GEsnpx.jar -f balanced_200_0.2_20.arff

    To run the same test with verbose output, add the "-v" option as follows:

    java -jar GEsnpx.jar -f balanced_200_0.2_20.arff -v

    We welcome any help in improving the quality of the software. To report bugs, please email the following address:

  • References
    [1] Pengyi Yang, Joshua W.K. Ho, Albert Y. Zomaya, Bing B. Zhou, "A genetic ensemble approach for gene-gene interaction identification", BMC Bioinformatics, 2010, 11:524. [fulltext]
    [2] Jason H. Moore et al., "Application of genetic algorithms to the discovery of complex models for simulation studies in human genetics", In: Proceedings of the Genetic and Evolutionary Computation Conference, 1150-1155, 2002. [