
 

ToNER

What is ToNER?

ToNER is a tool for statistical modeling of enrichment from RNA-seq data comprising enriched and unenriched control libraries.

Figure 1. ToNER flowchart

Supplement files

Requirements

  1. Linux, Mac or Windows operating system
  2. Python version 2.7 with the modules below
    • matplotlib version 1.8.0 or higher
    • numpy version 1.4.3 or higher
    • scipy version 0.17.0 or higher
    Recommended : install Anaconda 4.2.0 for Python version 2.7, which comes with these modules.
  3. samtools version 1.2 or higher

Downloading ToNER

You can download the latest source release here: ToNER program

Using ToNER

Usage : python ToNER.1.0.py -i <folder_input> [options]*

The following is a detailed description of the options used to control the ToNER script.

Arguments:

-i <folder_input> Directory containing the input files. The data in this directory must already be aligned to a reference genome sequence and stored in BAM format.

Each dataset consists of 2 libraries. The file name of the enriched library must end with _1.bam and the file name of the control library must end with _2.bam, for example <name of dataset>_1.bam and <name of dataset>_2.bam.

If the input directory contains multiple datasets, ToNER can analyze them together in a meta-analysis to increase the power of detection.
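
The naming convention above can be checked before starting a run. The following is a minimal sketch in Python 3, not part of ToNER itself; the function name pair_libraries is hypothetical, and it takes a plain list of file names so that it can be tried without real BAM files.

```python
def pair_libraries(file_names):
    """Group BAM file names into datasets by their _1.bam / _2.bam suffix.

    Returns (complete, incomplete): complete maps each dataset name to its
    enriched and control file; incomplete lists datasets missing one library.
    """
    datasets = {}
    for name in file_names:
        for suffix, role in (("_1.bam", "enriched"), ("_2.bam", "control")):
            if name.endswith(suffix):
                dataset = name[: -len(suffix)]
                datasets.setdefault(dataset, {})[role] = name
    complete = {d: libs for d, libs in datasets.items() if len(libs) == 2}
    incomplete = sorted(d for d, libs in datasets.items() if len(libs) != 2)
    return complete, incomplete
```

For a real directory, the list could come from os.listdir(<folder_input>). More than one complete dataset means the meta-analysis options apply.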

Options:

-h/--help Prints the help message and exits
-o/--output <string> Sets the name of the output directory. The default is ./ToNER_year_month_day_time
-r/--read <string>

Specifies which positions of each read are used to calculate the enrichment ratio. Choose the option below that suits your analysis. The default is all.

  • start  use only the first position, at the 5'-end of each read, to find upstream enrichment ratios.
  • all      use all positions of each read. (Default)
  • end      use only the last position, at the 3'-end of each read, to find downstream enrichment ratios.
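
As an illustration of the three settings, the sketch below (Python 3) lists the reference positions that a single read would contribute. It is not taken from the ToNER source. It assumes an ungapped alignment given as 1-based inclusive coordinates, and it assumes that for a reverse-strand read the 5'-end is the rightmost coordinate; the function name counted_positions is hypothetical.

```python
def counted_positions(start, end, strand, mode="all"):
    """Return the reference positions one read contributes under a -r setting.

    start and end are 1-based inclusive alignment coordinates (assumed,
    ungapped); strand is "+" or "-".
    """
    if mode == "all":
        return list(range(start, end + 1))
    # Assumption: on the reverse strand the read's 5'-end is its rightmost base.
    five_prime, three_prime = (start, end) if strand == "+" else (end, start)
    if mode == "start":
        return [five_prime]
    if mode == "end":
        return [three_prime]
    raise ValueError("mode must be 'start', 'all' or 'end'")
```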
-t/--total_reads <integer> Specifies the minimum summed read count at each position across both libraries (enriched and unenriched), used to filter out noisy positions.
-e/--each_library <integer> Specifies the minimum read count at each position in either library, used to filter out noisy positions.
-p/--p_value <float> Specifies the maximum p-value threshold (range 0-1) for calling an enrichment value significant. If the -c/--combine option is used, this threshold is also applied to the combined p-value. The default is 0.05
--none_pseudo Do not use a pseudo count. By default, a pseudo count of 1 is added to the read depth at each position in both libraries (enriched and unenriched), which avoids the problem that arises when the depth in either library is zero.
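
The effect of the filters and the pseudo count on one position can be sketched as follows (Python 3). This is an illustration under stated assumptions, not ToNER's implementation: the ratio is assumed to be enriched depth divided by unenriched depth, the -e filter is read as "at least one library reaches the minimum", and the handling of a zero denominator without a pseudo count is a choice made for this sketch only. The function name enrichment_ratio is hypothetical.

```python
def enrichment_ratio(depth_1, depth_2, min_total=0, min_each=0, pseudo=True):
    """Return the enrichment ratio for one position, or None if filtered out.

    depth_1 is the enriched library depth, depth_2 the unenriched depth.
    """
    if depth_1 + depth_2 < min_total:        # -t/--total_reads
        return None
    if max(depth_1, depth_2) < min_each:     # -e/--each_library (assumed reading)
        return None
    if pseudo:                               # default; --none_pseudo disables it
        depth_1, depth_2 = depth_1 + 1, depth_2 + 1
    if depth_2 == 0:
        # Without a pseudo count the ratio is undefined or unbounded here.
        return float("inf") if depth_1 > 0 else None
    return depth_1 / depth_2
```

With the pseudo count, a position with depths 9 and 0 gets the finite ratio (9 + 1) / (0 + 1) = 10 instead of a division by zero.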
-g/--gene <string>

Adds gene annotation information from a GFF file (GFF format) to annotate significantly enriched positions. The GFF file must be placed in the input directory. The output file then gains 7 columns giving the distance of the position from the gene and the region, feature, name, strand, start and end of the gene, as given in the GFF file. Choose the option below that suits your analysis.

  • start  focus on annotating upstream of genes, at the 5'-end of genes (red color block in the figure).
  • end      focus on annotating downstream of genes, at the 3'-end of genes (red color block in the figure).
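
The region and distance values that this option adds to the result file (Upstream, Downstream or Intra, described under ToNER Output) can be illustrated with a small Python 3 sketch. It is not ToNER's code. It assumes that upstream and downstream are interpreted relative to the gene's strand, that gene coordinates are 1-based with start <= end as in GFF, and that a position inside the gene gets distance 0; the function name classify_region is hypothetical.

```python
def classify_region(position, gene_start, gene_end, gene_strand):
    """Return (region, distance) of a position relative to one gene."""
    if gene_start <= position <= gene_end:
        return "Intra", 0
    before_start = position < gene_start
    distance = gene_start - position if before_start else position - gene_end
    # Assumption: on the reverse strand the gene's 5'-end is at gene_end,
    # so "before the gene" means a coordinate larger than gene_end.
    upstream = before_start if gene_strand == "+" else not before_start
    return ("Upstream" if upstream else "Downstream"), distance
```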
-c/--combine If there is more than 1 dataset, the meta-analysis uses Fisher's combined probability test to calculate combined p-values. The dataset files in the input directory are named, for example, <name of dataset1>_1.bam, <name of dataset1>_2.bam, <name of dataset2>_1.bam and <name of dataset2>_2.bam.
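
Fisher's combined probability test takes the k p-values of one position, one per dataset, computes X = -2 * sum(ln p_i), and compares X with a chi-square distribution with 2k degrees of freedom. The sketch below (Python 3, standard library only) shows the calculation; it is an independent illustration rather than ToNER's code, and the function name fisher_combined_p is hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method."""
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    # Chi-square survival function for 2k degrees of freedom, which has the
    # closed form exp(-X/2) * sum_{i<k} (X/2)^i / i! when the dof is even.
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

For two datasets that each give p = 0.05 at a position, the combined p-value is about 0.0175, which is how combining datasets can increase the power of detection.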
-s/--consensus <integer> Specifies the minimum number of replicates. If there is more than 1 dataset, the meta-analysis uses this number of replicates to filter significantly enriched positions.
-d/--distribution <string> Specifies the type of distribution method. By default, ToNER uses the boxcox function to transform the data to a normal distribution and checks the result with the R-square value of the qq-plot. If the R-square value does not pass the threshold (default is 0.9), ToNER terminates (see -q/--qqplot to set the R-square threshold). Choose the option below that suits your analysis. The default is normal
  • normal    use the boxcox function to transform the data to a normal distribution. (Default)
  • toprank  use the percentage of scores in their frequency distribution.

Notice : If toprank is used, the -c/--combine option cannot be used because ToNER does not calculate combined p-values.
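
One way to read the toprank setting is as an empirical p-value: the fraction of all ratios that are at least as large as the ratio at a given position. The Python 3 sketch below shows that reading only; whether ToNER computes the percentage in exactly this way is an assumption, and the function name top_rank_p_values is hypothetical.

```python
from bisect import bisect_left


def top_rank_p_values(ratios):
    """Empirical p-value per ratio: fraction of all ratios that are >= it."""
    ordered = sorted(ratios)
    n = len(ordered)
    # bisect_left gives the number of ratios strictly smaller than r.
    return [(n - bisect_left(ordered, r)) / n for r in ratios]
```

Under this reading, a threshold of 0.05 keeps the positions whose ratios fall in the top 5% of the distribution.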

-q/--qqplot <float> Specifies the R-square threshold of the qq-plot that is used to test for a normal distribution (range 0-1). The default is 0.9
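
The normality check behind -d normal and -q can be sketched as follows (Python 3.8 or later, standard library only). The sketch applies a Box-Cox transformation for a given lambda and then computes the R-square of the normal qq-plot as the squared correlation between the sorted values and the matching normal quantiles. It is an illustration, not ToNER's code: the plotting positions (i + 0.5) / n are one common choice and an assumption here, lambda is passed in rather than estimated, and the function names are hypothetical.

```python
import math
from statistics import NormalDist, mean


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox needs strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def qq_r_squared(values):
    """Squared correlation between sorted values and normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed plotting positions: (i + 0.5) / n for i = 0 .. n-1.
    quantiles = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    mx, my = mean(quantiles), mean(ordered)
    sxy = sum((x - mx) * (y - my) for x, y in zip(quantiles, ordered))
    sxx = sum((x - mx) ** 2 for x in quantiles)
    syy = sum((y - my) ** 2 for y in ordered)
    return (sxy * sxy) / (sxx * syy)


def passes_normality(values, lam, threshold=0.9):
    """True if the transformed values reach the qq-plot R-square threshold."""
    return qq_r_squared(box_cox(values, lam)) >= threshold
```

In scipy, scipy.stats.boxcox estimates lambda by maximum likelihood when no lambda is supplied, so in practice lambda would usually come from such an estimate rather than being fixed by hand.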
--less_memory If memory is a problem, this option uses less memory than the default, at the cost of a longer run time.
--history_data Also writes the output files from each step of the ToNER program.
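
When ToNER is launched from another script, the options above can be assembled into an argument list. The helper below is a hypothetical Python 3 sketch, not something shipped with ToNER; it only uses the flags documented on this page, and it rejects the one combination that the Notice above rules out.

```python
def build_toner_command(input_dir, script="ToNER.1.0.py", **options):
    """Build an argument list (for example for subprocess.run) from options."""
    value_flags = {
        "output": "-o", "read": "-r", "total_reads": "-t",
        "each_library": "-e", "p_value": "-p", "gene": "-g",
        "consensus": "-s", "distribution": "-d", "qqplot": "-q",
    }
    switch_flags = {
        "combine": "-c", "none_pseudo": "--none_pseudo",
        "less_memory": "--less_memory", "history_data": "--history_data",
    }
    if options.get("distribution") == "toprank" and options.get("combine"):
        raise ValueError("-c/--combine cannot be used with -d toprank")
    command = ["python", script, "-i", input_dir]
    for name, value in options.items():
        if name in value_flags:
            command += [value_flags[name], str(value)]
        elif name in switch_flags:
            if value:  # switches take no value; add them only when enabled
                command.append(switch_flags[name])
        else:
            raise ValueError("unknown option: " + name)
    return command
```

For example, build_toner_command("data", read="start", p_value=0.01, combine=True) corresponds to the command line python ToNER.1.0.py -i data -r start -p 0.01 -c.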

ToNER Output:

The ToNER program produces a number of files in the output directory. The following is a detailed description of the output files.

  1. <name of dataset>_result.txt
     This text file includes the significant positions, with the following columns :

    • column 1     position : significant position on the reference genome
    • column 2     strand : strand of the reference genome
    • column 3     chromosome : chromosome of the reference genome
    • column 4*   distance : distance from the enriched position to the gene
    • column 5*   region : region relative to the gene
        •  Upstream - The position is before the gene.
        •  Downstream - The position is after the gene.
        •  Intra - The position is inside the gene.
    • column 6*   feature : feature of the gene (gene, mRNA, tRNA and so on, from the GFF file)
    • column 7*   f_gene : name of the gene
    • column 8*   f_strand : strand of the gene
    • column 9*   f_start : start position of the gene
    • column 10* f_end : end position of the gene
    • column 11   depth_1 : depth of reads in the enriched library
    • column 12   depth_2 : depth of reads in the unenriched library
    • column 13^ ratio(before Box-Cox) : ratio at this position before the box-cox transformation
    • column 14^ ratio(after Box-Cox) : ratio at this position after the box-cox transformation
    • column 15   p-value : p-value of the ratio

    Notice : * The -g/--gene option adds columns 4-10.

                ^ If ToNER does not use the box-cox transformation, columns 13 and 14 are not shown as before and after; the result shows only 1 ratio.

  2. <name of dataset>_forward.gff and <name of dataset>_reverse.gff
     These files are in GFF format and include the significant positions, separated into forward strand and reverse strand. The columns are as follows :

    • column 1    chromosome : chromosome of the reference genome
    • column 2    source : program name
    • column 3    feature : dataset name
    • column 4    start : start position of the ratio
    • column 5    end : end position of the ratio
    • column 6    score : not used
    • column 7    strand : strand of the reference genome
    • column 8    frame : not used
    • column 9    attribute : attributes of the position : depth of sequences in the enriched (depth_1) and unenriched (depth_2) libraries; ratio at this position before and after the box-cox transformation; p-value of this position
  3. <name of dataset>_type_of_read.png
     This file is a pie chart that shows the percentage and number of reads in 3 groups :

    • both (filled red color)       : positions that have reads in both the unenriched and enriched libraries
    • only_1 (filled blue color)  : positions that have reads only in the enriched library
    • only_2 (filled green color): positions that have reads only in the unenriched library
  4. <name of dataset>_Box-Cox_transformation.png
     This file contains the qq-plot images before and after the box-cox transformation. The values shown in the image are as follows :

    • p-value : maximum p-value threshold for calling an enrichment value significant.
    • R-square value : R-square value of the qq-plot, used to test for a normal distribution.
    • lambda value : lambda value of the boxcox transformation.
    • cutoff value : ratio value that corresponds to the specified p-value, used as the minimum ratio to filter out noise.

    Notice : If the -d/--distribution toprank option is selected, ToNER produces top-rank.png instead of transform.png, which shows only the density plot and p-value.

  5. <name of dataset>_reads_per_position.png
     This file is a graph of the probability density of the number of reads per position. Library 1 (red color) is the enriched library and Library 2 (blue color) is the unenriched library.

  6. depth_1.txt and depth_2.txt
     If the --history_data option is selected, all depth values are reported in the following files :
    • depth_1.txt contains the depth values of the enriched library.
    • depth_2.txt contains the depth values of the unenriched library.
  7. ratio.txt
     If the --history_data option is selected, all ratio values are reported in this file.

  8. log.txt
     This file reports the status of the ToNER program while it runs.
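
For downstream processing, the <name of dataset>_forward.gff and <name of dataset>_reverse.gff files can be read line by line. The sketch below (Python 3) splits one GFF record into the 9 columns listed above. It is a minimal example under assumptions: the attribute column is assumed to hold key=value pairs separated by semicolons, and the sample line and its attribute keys are made up for illustration. The function name parse_gff_line is hypothetical.

```python
def parse_gff_line(line):
    """Split one tab-separated GFF record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("a GFF record has 9 tab-separated columns")
    names = ("chromosome", "source", "feature", "start", "end",
             "score", "strand", "frame", "attribute")
    record = dict(zip(names, fields))
    record["start"], record["end"] = int(record["start"]), int(record["end"])
    # Assumption: attributes look like "key=value;key=value".
    attributes = {}
    for item in record["attribute"].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attributes[key.strip()] = value.strip()
    record["attribute"] = attributes
    return record


# Made-up record, for illustration only.
sample = "chr1\tToNER\tsample\t1200\t1200\t.\t+\t.\tdepth_1=40;depth_2=3"
record = parse_gff_line(sample)
```

Lines that start with # are comments in GFF and should be skipped before calling the function.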
