
 

iLOCi

iLOCi: a SNP interaction prioritization technique for detecting epistasis in genome-wide association studies

 


iLOCi software calculates gene-gene interactions from GWAS data. The software is implemented with the OpenCL framework.

Download : iLOCi software and README

 

Supplement files :

  • Additional file 1 – The mathematical details of ρdiff value and its relation with LD (iLOCi_details.pdf)
  • Additional file 2 – Penetrance tables for dataset simulation (Penetrance_tables.pdf)
  • Additional file 3 – Top 1000 SNP pairs from analyses of complete SNP set of WTCCC (TopPairs_Complete.xls)
  • Additional file 4 – Top 1000 SNP pairs from analyses of gene-only SNP set of WTCCC (TopPairs_GeneOnly.xls)
  • Additional file 5 – Pathway enrichment analysis of WTCCC datasets (Pathway_analysis.xls)

 

Software requirements :


1. Linux or Mac operating system

2. Python 2.4 or later

3. Java runtime version 1.6 or later

4. OpenCL driver and libraries

4.1 Linux x86 or x86_64 : available from the Intel website (OpenCL 2.0 GPU/CPU driver package)

4.2 Mac OS X 10.7 Lion : built-in OpenCL framework

5. Optional : a queue management system for cluster computing environments, e.g. SGE (Sun Grid Engine), for parallel calculation.

 

 iLOCi_flow.png

iLOCi workflow

 

iLOCi software comprises three parts :


1. Pre-processing step : the Python script (MergeInput.py) prepares the iLOCi input file. This input file is composed from three files : the control and case genotype files and the SNP annotation file.

The input genotypes of both cases and controls cannot contain any missing data. Users must screen out or impute the missing data beforehand. An example of genotype data is shown below.

02011110211011111000

11011000001111000101

00011001011010201011

00200010000000110211

12110110201010000100

01000100101201100102

This sample has 20 individuals (columns) and 6 SNPs (rows). Genotypes are encoded as "0" : homozygous wild type, "1" : heterozygous and "2" : homozygous variant type.

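The encoding rules above can be checked before running the pre-processing step. The following is a minimal sketch, not part of the iLOCi distribution; the function name read_genotypes is hypothetical and the code is written for Python 3. It rejects any symbol other than 0, 1 or 2 (which also catches missing-data markers) and any row whose length differs from the first row.

```python
def read_genotypes(lines):
    """Return genotype rows (one SNP per row, one individual per column)."""
    rows = []
    for number, line in enumerate(lines, start=1):
        row = line.strip()
        if not row:
            continue  # ignore blank lines between rows
        unexpected = set(row) - set("012")
        if unexpected:
            raise ValueError("line %d: unexpected symbols %s"
                             % (number, sorted(unexpected)))
        if rows and len(row) != len(rows[0]):
            raise ValueError("line %d: expected %d individuals, found %d"
                             % (number, len(rows[0]), len(row)))
        rows.append(row)
    return rows


# The six example rows shown above.
example = [
    "02011110211011111000",
    "11011000001111000101",
    "00011001011010201011",
    "00200010000000110211",
    "12110110201010000100",
    "01000100101201100102",
]
snps = read_genotypes(example)
print(len(snps), len(snps[0]))  # number of SNPs (rows), individuals (columns)
```

A file can be checked the same way by passing an open file object instead of the list.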
The SNP annotation file is in tab-delimited format. An example is shown below.

rs3094315       Chr1:742429     FLJ22639 | geneID:79854 | near-gene-3_10k

rs4075116       Chr1:993492     LOC401934 | geneID:401934 | near-gene-3_10k

rs9442385       Chr1:1087198    MIRN200B | geneID:406984 | near-gene-5_10k

rs10907175      Chr1:1120590    TNFRSF18 | geneID:8784 | near-gene-3_10k

rs2887286       Chr1:1145994    SDF4 | geneID:51150 | intron

rs6603781       Chr1:1148494    SDF4 | geneID:51150 | coding-synon
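
As an illustration of the annotation layout, the sketch below splits one line into its parts. It assumes three tab-separated columns with the third column divided by " | ", as in the example lines; the field names (rsid, chromosome, position, gene, gene_id, function_class) are hypothetical labels chosen here, not names used by iLOCi.

```python
from collections import namedtuple

# Field names are illustrative assumptions; the file itself has no header.
SnpAnnotation = namedtuple(
    "SnpAnnotation",
    "rsid chromosome position gene gene_id function_class")


def parse_annotation(line):
    """Split one tab-delimited annotation line into its components."""
    rsid, location, description = line.rstrip("\n").split("\t")
    chromosome, position = location.split(":")
    gene, gene_id, function_class = [
        part.strip() for part in description.split("|")]
    return SnpAnnotation(rsid, chromosome, int(position), gene,
                         gene_id.replace("geneID:", ""), function_class)


record = parse_annotation(
    "rs6603781\tChr1:1148494\tSDF4 | geneID:51150 | coding-synon")
print(record.rsid, record.chromosome, record.position, record.gene)
```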

Example data for 1963 cases and 2938 controls at 8000 SNPs are stored in the files Gty_Cases_8000snps.txt, Gty_Ctrls_8000snps.txt and SNPs_8000.txt.

The MergeInput.py script combines the case and control genotype data with the SNP identifiers from the SNP annotation file into the iLOCi input file. An example file is shown below.

iloci_input.png

The first line contains the numbers of individuals in the case and control groups, in that order, separated by a tab. Each following line contains the genotype data of the cases followed by the controls. The genotype data starts exactly at the 21st character of each line.

Users can prepare the iLOCi input file with the following sample command.

> ./MergeInput.py Gty_Cases_8000snps.txt Gty_Ctrls_8000snps.txt SNPs_8000.txt Combined_Gty.txt

Gty_Cases_8000snps.txt is the genotype data of cases.
Gty_Ctrls_8000snps.txt is the genotype data of controls.
SNPs_8000.txt is the SNPs annotation file.
Combined_Gty.txt is the file name to save the output.
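
The layout of the combined file described above can be sketched as follows. This is not the code of MergeInput.py; it only illustrates the format under one assumption, namely that the first 20 characters of each data line hold the SNP identifier padded with spaces (the function name merge_rows and its id_width parameter are hypothetical).

```python
def merge_rows(case_rows, control_rows, snp_ids, id_width=20):
    """Build the lines of a combined input file from per-SNP genotype rows."""
    if not (len(case_rows) == len(control_rows) == len(snp_ids)):
        raise ValueError("cases, controls and annotations must list the same SNPs")
    # Header: number of cases, a tab, number of controls.
    lines = ["%d\t%d" % (len(case_rows[0]), len(control_rows[0]))]
    for snp_id, cases, controls in zip(snp_ids, case_rows, control_rows):
        if len(snp_id) > id_width:
            raise ValueError("identifier longer than %d characters: %s"
                             % (id_width, snp_id))
        # Assumed layout: padded identifier, then cases, then controls,
        # so the genotypes begin at character 21.
        lines.append(snp_id.ljust(id_width) + cases + controls)
    return lines


merged = merge_rows(["0201", "1101"], ["110", "002"],
                    ["rs3094315", "rs4075116"])
for line in merged:
    print(line)
```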

 

2. Processing step runs the same program with different options to calculate the ρdiff values of SNP pairs. The file "iloci-main.jar" was implemented using the OpenCL framework and JOCL (Java OpenCL wrapper).

Users can provide parameters to the "iloci-main.jar" program as shown below.

 

> java -Xmx2000m -jar iloci-main.jar -i Combined_0 -j Combined_1 -x 0 -y 1 -r 1000 -f 2000 -o Toprank_0_1 -h Histogram_0_1 -p 1

-Xmx2000m reserves 2000 MB of memory for the Java virtual machine.
-i Combined_0 the first input file.
-j Combined_1 the second input file.
-x 0 the block position of input file 1.
-y 1 the block position of input file 2. The default value is 0.
-r 1000 the number of ranges (bins) used to store the histogram of ρdiff values (bin width 2.0/1000). The default value is 100.
-f 2000 the number of top-ranked ρdiff values to store in the output file. The default value is 10000.
-o Toprank_0_1 the name of the output file that stores the SNP pairs and their ρdiff values.
-h Histogram_0_1 the name of the output file that stores the histogram of the ρdiff values.
-p 1 the device used to run the iloci-main.jar program. This option is useful on machines with a heterogeneous environment (multiple CPUs and GPUs). The default value is 0 (the first device), which depends on the machine configuration.
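
When the program has to be started for many block pairs, the argument list can be generated rather than typed. The sketch below uses only the options listed above; the function name build_command is hypothetical, and the file naming pattern (Combined_N, Toprank_X_Y, Histogram_X_Y) is an assumption taken from the sample command.

```python
def build_command(block_x, block_y, bins=1000, top=2000, device=0,
                  heap_mb=2000):
    """Return the argument list for one run of iloci-main.jar."""
    return [
        "java", "-Xmx%dm" % heap_mb, "-jar", "iloci-main.jar",
        "-i", "Combined_%d" % block_x,   # first input file
        "-j", "Combined_%d" % block_y,   # second input file
        "-x", str(block_x),              # block position of input file 1
        "-y", str(block_y),              # block position of input file 2
        "-r", str(bins),                 # number of histogram ranges
        "-f", str(top),                  # number of top-ranked values kept
        "-o", "Toprank_%d_%d" % (block_x, block_y),
        "-h", "Histogram_%d_%d" % (block_x, block_y),
        "-p", str(device),               # OpenCL device index
    ]


print(" ".join(build_command(0, 1, device=1)))
```

The returned list can be passed directly to subprocess.call from the Python standard library to start the run.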

 

iLOCi_block.png

The example above shows the full data set divided into 4 small blocks (0, 1, 2 and 3). The block combinations to process are “00”, “01”, “02”, “03”, “11”, “12”, “13”, “22”, “23” and “33”.
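
The list of block combinations is the set of unordered pairs of blocks, including each block paired with itself, so n blocks give n(n+1)/2 runs. A minimal sketch of enumerating them (the function name block_pairs is hypothetical):

```python
from itertools import combinations_with_replacement


def block_pairs(n_blocks):
    """Return every unordered pair of block indices, self-pairs included."""
    return list(combinations_with_replacement(range(n_blocks), 2))


pairs = block_pairs(4)
print(["%d%d" % pair for pair in pairs])
# ['00', '01', '02', '03', '11', '12', '13', '22', '23', '33']
```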

We provide an example script for processing a large data set in the file “Job_submit.py”. Users can modify the script to run on their own system.

 

3. Post-processing step : the Python script “Combined_Toprank.py” collects all the results and selects the top-ranked pairs. The other script, “Combined_Histogram.py”, collects the ρdiff values and their frequencies for plotting the histogram.

Create a list file that contains the names of all result files by using a simple UNIX command. For example, if the processing step produced three top rank output files, Toprank_0_1, Toprank_0_2 and Toprank_1_2, users can run the “ls” command to create the list file as shown below.

> ls Toprank_* > Toprank_TestData_list.txt

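The same list file can be written from Python. One detail worth noting is that the list file name in the command above also matches the pattern Toprank_*, so rerunning the command once the list file exists would list the list file itself. The sketch below (the function name write_list_file is hypothetical) excludes it explicitly and is demonstrated in a scratch directory with three empty files.

```python
import glob
import os
import tempfile


def write_list_file(pattern, list_path):
    """Write the names matching `pattern` to `list_path`, one per line."""
    own_path = os.path.abspath(list_path)
    # Skip the list file itself in case its name matches the pattern.
    names = sorted(name for name in glob.glob(pattern)
                   if os.path.abspath(name) != own_path)
    with open(list_path, "w") as handle:
        for name in names:
            handle.write(name + "\n")
    return names


# Demonstration with three empty result files in a temporary directory.
workdir = tempfile.mkdtemp()
for name in ("Toprank_0_1", "Toprank_0_2", "Toprank_1_2"):
    open(os.path.join(workdir, name), "w").close()
list_path = os.path.join(workdir, "Toprank_TestData_list.txt")
listed = write_list_file(os.path.join(workdir, "Toprank_*"), list_path)
print([os.path.basename(name) for name in listed])
```
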
Use “Combined_Toprank.py” to collect the required result.

> ./Combined_Toprank.py SNPs_8000.txt Toprank_TestData_list.txt 1000 Top_1000.txt

The parameters are explained below.

SNPs_8000.txt SNP annotation file.
Toprank_TestData_list.txt File containing the list of top rank output files.
1000 The number of top rank pairs.
Top_1000.txt The final top rank output file.
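
The idea of collecting the partial results and keeping only the best pairs can be sketched as below. This is not the code of Combined_Toprank.py. The record layout of the top rank files is not described here, so the three whitespace-separated columns (two SNP identifiers and a ρdiff value) and the rule that larger values rank higher are assumptions made only for illustration; the SNP identifiers in the demonstration are made up.

```python
import heapq


def top_pairs(result_lines, count):
    """Return the `count` highest-scoring (score, snp_a, snp_b) records."""
    records = []
    for line in result_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # Assumed layout: SNP A, SNP B, rho_diff value.
        records.append((float(fields[2]), fields[0], fields[1]))
    return heapq.nlargest(count, records)


# Lines as they might be gathered from several partial result files.
lines = ["rs1 rs2 0.91", "rs3 rs4 0.42", "rs5 rs6 1.37"]
best = top_pairs(lines, 2)
print(best)
```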

 

Create the list file of histogram results with the “ls” command, in the same way as for the Toprank files.

> ./Combined_Histogram.py Histogram_list.txt 500 0.8 Histogram_500.txt

The parameters are explained below.

Histogram_list.txt File containing the list of histogram output files.
500 The range of the ρdiff values; this generates the bin values as 2.0/2000.
0.8 The top ρdiff values for which the frequency is collected.
Histogram_500.txt The final output file of the ρdiff histogram.

 

The file Run_test.py is a Python script for running the example files. This script performs the complete process, including pre-processing, processing (by calling Job_submit.py) and post-processing (Combined_Toprank.py and Combined_Histogram.py).

> ./Run_test.py

 
