Commands reference

From PileLine

Revision as of 11:41, 14 February 2011

Processing and Annotation Commands

  • pileline-fastseek

Prints a given range of a GP file.

Usage: pileline-fastseek -p <GP_file> -s <range> [--seq-col <int>] [--pos-col <int>]

Option                                  Description                            
------                                  -----------                            
-p, --gp-file <File>                    SORTED genome position file to seek  [required]                           
--pos-col <Integer>                     position column for the gp-file. The first is 1 (default: 2)              
-s                                      seek position in the form of seq:start[:end] [required]             
--seq-col <Integer>                     sequence column for gp-file. The first is 1 (default: 1)

Example of use:

pileline-fastseek -p <GP_file> -s chr10:100:10000
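
To illustrate how a seek string of the form seq:start[:end] breaks down, here is a minimal Python sketch. The parse_range helper and the SeekRange type are hypothetical and not part of PileLine. Treating a missing end as a single-position range is an assumption.

```python
from typing import NamedTuple


class SeekRange(NamedTuple):
    seq: str
    start: int
    end: int


def parse_range(text: str) -> SeekRange:
    """Split 'seq:start[:end]' into its parts (hypothetical helper)."""
    parts = text.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected seq:start[:end], got {text!r}")
    seq, start = parts[0], int(parts[1])
    # Assumption: a missing end means a single position.
    end = int(parts[2]) if len(parts) == 3 else start
    if end < start:
        raise ValueError("end must not be smaller than start")
    return SeekRange(seq, start, end)
```

With this sketch, parse_range("chr10:100:10000") yields the sequence name chr10 with start 100 and end 10000.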


  • pileline-sort

Sorts a GP file by position coordinate.

Usage: pileline-sort -i <GP_file> -o <outfile> [OPTIONS]

Option                                  Description                            
------                                  -----------                            
-T, --temp-dir                          Directory for temporary files [default is the system's temp dir]            
-i, --input-file                        Input file to sort. Use - for stdin  [required]                           
--max-chars-chunk <Long>                max chars per temporary file (default: 2000000)
-o, --output-file                       Output sorted file. Use - for stdout [required]                           
--pos-col <Integer>                     position column in the input file. The first is 1 (default: 2)              
--seq-col <Integer>                     sequence column in the input file. The first is 1 (default: 1)  

Example of use:

pileline-sort -i <GP_file> -o <outfile>
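
The ordering that the other commands expect can be pictured with a small in-memory Python sketch. It is not PileLine's implementation (the --temp-dir and --max-chars-chunk options suggest the real tool sorts in chunks on disk). The sort_gp_lines function is hypothetical, and comparing sequence names as plain strings is an assumption.

```python
def sort_gp_lines(lines, seq_col=1, pos_col=2):
    """Sort tab-delimited lines by sequence name, then by numeric position.

    Columns are 1-based, matching the --seq-col and --pos-col defaults.
    """
    def key(line):
        fields = line.rstrip("\n").split("\t")
        # Assumption: sequence names compare as plain strings.
        return fields[seq_col - 1], int(fields[pos_col - 1])

    return sorted(lines, key=key)
```

Note that the position is compared as a number, so position 3 sorts before position 20 on the same sequence.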


  • pileline-fastjoin

Joins two sorted GP files. Note: if your GP files are not sorted, sort them with pileline-sort before running pileline-fastjoin.

Usage: 
pileline-fastjoin.sh -a <left_file> -b <right_file> [--right-outer-join | --left-outer-join][--noprint-a | --noprint-b][--seq-col-a <int>][--pos-col-a <int>][--seq-col-b <int>][--pos-col-b <int>]

Option                                  Description                            
------                                  -----------                            
-a, --left-file <File>                 left tab-delimited AND SORTED genome position file [required]             
-b, --right-file <File>                right tab-delimited AND SORTED genome position file [required]             
--left-outer-join                      performs a left outer join: all A records will be in output, missing B records are shown as NULL
--noprint-a                            prints only data fields of A           
--noprint-b                            prints only data fields of B           
--pos-col-a <Integer>                  position column for the left file. The first is 1 (default: 2)              
--pos-col-b <Integer>                  position column for the right file. The first is 1 (default: 2)          
--right-outer-join                     performs a right outer join: all B records will be in output, missing A records are shown as NULL
--seq-col-a <Integer>                  sequence column for the left file. The first is 1 (default: 1)              
--seq-col-b <Integer>                  sequence column for the right file. The first is 1 (default: 1)

Example of use:

pileline-fastjoin -a <GP_file> -b <GP_file>           
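
The join semantics described above (inner join by default, NULL for missing records in outer joins) can be sketched as a merge over two sorted inputs. This is a minimal Python illustration, not PileLine's code. The merge_join function is hypothetical, and it assumes each (sequence, position) key appears at most once per file.

```python
def merge_join(a_rows, b_rows, left_outer=False, right_outer=False):
    """Join two lists of (seq, pos, data) tuples, both sorted by (seq, pos).

    Assumes each (seq, pos) key appears at most once per input.
    """
    out = []
    i = j = 0
    while i < len(a_rows) and j < len(b_rows):
        key_a, key_b = a_rows[i][:2], b_rows[j][:2]
        if key_a == key_b:
            out.append((*key_a, a_rows[i][2], b_rows[j][2]))
            i += 1
            j += 1
        elif key_a < key_b:
            # Record only in A: emitted only for a left outer join.
            if left_outer:
                out.append((*key_a, a_rows[i][2], "NULL"))
            i += 1
        else:
            # Record only in B: emitted only for a right outer join.
            if right_outer:
                out.append((*key_b, "NULL", b_rows[j][2]))
            j += 1
    # Flush whatever remains on either side for outer joins.
    if left_outer:
        out.extend((*r[:2], r[2], "NULL") for r in a_rows[i:])
    if right_outer:
        out.extend((*r[:2], "NULL", r[2]) for r in b_rows[j:])
    return out
```

Because both inputs are read once from start to end, the sketch only works on sorted files, which is why the command requires sorted input.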


  • pileline-rfilter.sh

Filters (or annotates) a positional file using range-based annotations (in BED, GFF or custom formats). Positions that fall inside a range are kept or, with --annotate, every line is kept and labeled with the ranges that contain it.

Usage: 
pileline-rfilter [--annotate] -A <GP_file> [-b <bed> | -g <gff> | -i <intervals_file>] [-w <int>] [--seq-col-input <int>] [--pos-col-input <int>]
                 [--seq-col-intervals <int>] [--start-col-intervals <int>] [--end-col-intervals <int>]

Option                                  Description                            
------                                  -----------                            
-A, --input-file                        SORTED genome position file. Use - for stdin [required]                     
--annotate                              Do not filter. Annotate the lines with the ranges (last column)             
-b, --intervals-bed-file <File>         intervals file in BED format [required -b or -g]                            
--end-col-intervals <Integer>           end position column in the intervals file. The first is 1 (default: 3)    
-g, --intervals-gff-file <File>         intervals file in GFF format [required -b or -g]                            
-i, --intervals-gp-file <File>          intervals file in any other format     
--pos-col-input <Integer>               position column in the input file. The first is 1 (default: 2)              
--seq-col-input <Integer>               sequence column in the input file. The first is 1 (default: 1)              
--seq-col-intervals <Integer>           sequence column in the intervals file. The first is 1 (default: 1)          
--start-col-intervals <Integer>         start position column in the intervals file. The first is 1 (default: 2)    
-w, --window <Integer>                  expand each interval with <window> size at both sides (default: 0)

Examples of use:

#on target filtering
pileline-rfilter -A <GP_file.txt> -i <targets.bed>
#simple annotation 
pileline-rfilter --annotate -A <GP_file.txt> -i <annotations.bed>
#multiple annotation (combining UNIX commands)
cat <GP_file.txt> | pileline-rfilter.sh --annotate -A - -i <annotations1.bed> | pileline-rfilter.sh --annotate -A - -i <annotations2.bed> > <myfullyannotated_GP_file.txt> 
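
The difference between filtering and annotating, and the effect of the -w window, can be sketched in a few lines of Python. This is an illustration only. The on_target function is hypothetical, it scans intervals linearly instead of exploiting sorted input, and it assumes interval start and end are both inclusive and share the coordinate system of the positions.

```python
def on_target(positions, intervals, window=0, annotate=False):
    """Filter or annotate (seq, pos) pairs against (seq, start, end, name) intervals.

    Assumption: start and end are both inclusive and use the same
    coordinate system as the positions.
    """
    result = []
    for seq, pos in positions:
        hits = [
            name
            for iseq, start, end, name in intervals
            if iseq == seq and start - window <= pos <= end + window
        ]
        if annotate:
            # Keep every position; the last column lists matching ranges.
            result.append((seq, pos, ",".join(hits)))
        elif hits:
            # Filter mode: keep only positions inside some range.
            result.append((seq, pos))
    return result
```

With a window of 5, a position five bases upstream of an interval start is still reported as on target.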


  • pileline-genindex.sh

Indexes a FASTA genome and then performs range-based queries on that genome.

Usage: pileline-genindex [OPTIONS]

Option                                  Description                            
------                                  -----------                            
-g, --genome-file <File>                genome file to index, as a single FASTA file (in index mode) [required in --index]
-i, --index-file <File>                 index file to create (in index mode) or to access (in seek mode) [required]
--index                                 Index mode                             
-s                                      Seek position in the form of seq:start[:end] [required]             
--seek                                  Seek mode [default if no --index]

Examples of use:

pileline-genindex --index -g <fasta> -i <new_index>
pileline-genindex --seek -i <index> -s chr1:1000:2000


  • pileline-pileup2sift.sh

Generates a SIFT-compatible change column for each variant line in pileup files.

Usage: pileline-pileup2sift -i <pileup>

Option                                  Description                            
------                                  -----------                            
-i, --pileup-file                       variants pileup (pileup -c) file to annotate. Use - for stdin.

Example of use:

pileline-pileup2sift -i <pileup_file> 


  • pileline-pileup2polyphen.sh

Generates a PolyPhen-compatible change column for each variant line in pileup files.

Usage: pileline-pileup2polyphen -i <pileup>

Option                                  Description                            
------                                  -----------                            
-i, --pileup-file                       variants pileup (pileup -c) file to annotate. Use - for stdin.

Example of use:

pileline-pileup2polyphen -i <pileup_file>


  • pileline-pileup2firestar.sh

Generates a Firestar-compatible input for each variant line in pileup files.

Usage: pileline-pileup2firestar -i <pileup>

Option                                  Description                            
------                                  -----------                            
-i, --pileup-file                       variants pileup (pileup -c) file to annotate. Use - for stdin.

Example of use:

pileline-pileup2firestar -i <pileup_file>

Analysis Commands

  • pileline-2smc

Looks for discrepancies between the genotypes of two samples (e.g., case vs. control) in pileup format files. It can also annotate each output position with a user-provided BED file containing custom annotations.

Usage: pileline-2smc -a <pileup> -b <pileup> --variants-a <pileup> --variants-b <pileup> [OPTIONS]

Option                                  Description                            
------                                  -----------                            
--AdiscrepantB                          Calculate variants present in sample A  (-v) and in sample B (-w), but with different genotype                   
-a, --genotype-a <File>                 Whole genotype pileup (with MAQ consensus) file of sample A [required]          
--all                                   Calculate all mutations (onlyA, onlyB, AdiscrepantB and both) [default]     
--annotate <File>                       Annotate positions with those in the provided BED file
-b, --genotype-b <File>                Whole genotype pileup (with MAQ consensus) file of sample B [required]          
--both                                  Calculate variants present in sample A (-v) and in sample B (-w) and with the same genotype                    
--cq-column <Integer>                   MAQ consensus quality column in variants and genotype files (default: 5)                         
-d, --genotype-depth-filter-threshold <Integer>  genotype depth filter threshold (default: 10)
-o, --out-prefix <File>                 Output files prefix [required]         
--onlyA                                 Calculate mutations which are variants in sample A (-v) and are homozygous-reference in B
--onlyB                                 Calculate mutations which are variants in sample B (-w) and are homozygous-reference in A
-r, --reference-column <Integer>        reference genotype column in genotype files (options -a and -b) (default: 3)
-t, --genotype-depth-filter-column <Integer>  genotype depth column (default: 8)
-v, --variants-a <File>                 Variants of interest in pileup format (with MAQ consensus) of sample A [required]                           
-w, --variants-b <File>                 Variants of interest in pileup format (with MAQ consensus) of sample B [required]

Example of use:

pileline-2smc -a <pileup> -b <pileup> --variants-a <pileup> --variants-b <pileup> --annotate <bed> -d 30
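
The four categories (--onlyA, --onlyB, --AdiscrepantB and --both) can be summarized with a short Python sketch. It is a simplification, not PileLine's implementation. The classify function is hypothetical, and it ignores the depth filter and consensus quality columns that the real command takes into account.

```python
def classify(variant_a, variant_b):
    """Classify one position from the variant genotypes of samples A and B.

    Each argument is a genotype string when a variant was called in that
    sample, or None when the sample is homozygous-reference there.
    Assumption: depth and quality filtering have already been applied.
    """
    if variant_a and variant_b:
        # Variant in both samples: same genotype or a discrepant one.
        return "both" if variant_a == variant_b else "AdiscrepantB"
    if variant_a:
        return "onlyA"
    if variant_b:
        return "onlyB"
    return None  # No variant in either sample.
```

For example, a heterozygous call in A against a homozygous-reference B falls in the onlyA category.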


  • pileline-nsmc

Takes the output of several 2smc comparisons and reports where variants are reproduced. It can operate in two modes: by exact position or by intervals (e.g., genes). For intervals mode, you have to provide an additional intervals file (.bed, .gff or custom).

Usage (by position): pileline-nsmc -a <GP_file> -a <GP_file> -a <GP_file>... -b <GP_file> -b <GP_file> -b <GP_file>... [OPTIONS] -o <OUTFILE>
Usage (by intervals): pileline-nsmc [-B <bed_file> | -G <gff_file>] -a <GP_file> -a <GP_file> -a <GP_file>... -b <GP_file> -b <GP_file> -b <GP_file>... [OPTIONS] -o <OUTFILE>

Option                                  Description                            
------                                  -----------                            
-B, --intervals-bed-file <File>         intervals file in BED format.          
                                         Intervals are taken as 0-based and   
                                         the end-position is exclusive        
-G, --intervals-gff-file <File>         intervals file in GFF format.          
                                         Intervals are taken as 1-based and   
                                         the end-position is inclusive        
-I, --intervals-gp-file <File>          intervals file in custom format. You   
                                         can provide the columns for          
                                         sequence, start and stop in the      
                                         appropriate parameters               
-a                                      variant pileup (pileup -c) files for
                                         sample A (one or more, e.g.: -a file1
                                         -a file2 -a file3...) [required]
-b                                      variant pileup (pileup -c) files for
                                         sample B (one or more, e.g.: -b file4
                                         -b file5 -b file6...) [required]
-e, --expand-cells                      In exact position mode: fill each cell
                                         in the output with the corresponding
                                         pileup line if it exists, separated
                                         by '|' (by default, YES or NO appears
                                         in the cell). In intervals mode:
                                         show how many entries in the
                                         variants file are within the
                                         interval (this is not taken into
                                         account in the Fisher test)
--end-col-intervals <Integer>           end position column in the intervals   
                                         file. The first is 1 (default: 3)    
-o                                      output file. Use - for stdout          
                                         [required]                           
--seq-col-intervals <Integer>           sequence column in the intervals file. 
                                         The first is 1 (default: 1)          
--start-col-intervals <Integer>         start position column in the intervals 
                                         file. The first is 1 (default: 2)    

Examples of use:

pileline-nsmc -a <GP_file> -a <GP_file> -b <GP_file> -b <GP_file> -o <OUTFILE>
pileline-nsmc -G <gff_file> -a <GP_file> -a <GP_file> -b <GP_file> -b <GP_file> -o <OUTFILE>
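
The options table notes that BED intervals are 0-based with an exclusive end, while GFF intervals are 1-based with an inclusive end. The following Python sketch shows how the two conventions map onto the same 1-based positions used in pileup files. The function names are hypothetical and only illustrate the arithmetic.

```python
def bed_to_one_based(start, end):
    """BED (0-based, end exclusive) -> 1-based, end inclusive."""
    return start + 1, end


def gff_to_one_based(start, end):
    """GFF is already 1-based with an inclusive end."""
    return start, end


def contains(interval, pos):
    """True if a 1-based position falls inside a 1-based inclusive interval."""
    start, end = interval
    return start <= pos <= end
```

So a BED line with start 99 and end 200 covers the same bases as a GFF line with start 100 and end 200.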


  • pileline-genotest.sh

Measures NGS genotyping performance by surveying a set of genomic positions whose genotype in the sample is known.

Usage: 
pileline-genotest --create-genotest-file <new_genotest> -p <pileup> -g <gold> -r <reference>
pileline-genotest -a <new_genotest> -t <int> [--print-help-table] [--depth-filter <int>]
pileline-genotest -a <new_genotest> --roc
pileline-genotest -a <new_genotest> --batch-t 0,255,1

Option                                  Description                            
------                                  -----------                            
-a, --genotest-file <File>              the genotest intermediate file to analyze [required if no -c]          
--batch-t                               A sequence of thresholds to test, specified as <start>,<end>,<step> [required -t or --roc or --batch-t]  
-c, --create-genotest-file <File>       creates the genotest intermediate file for further analysis                 
--depth-filter <Integer>                treat positions below this depth as not base-called (default: 0)
-g, --gold-genotype <File>              gold genotype file, one chr<tab>pos<tab>genotype per line (genotype is two letters, including NN) [required if -c]
-p, --pileup <File>                     complete pileup [required if -c]       
--print-help-table                      print measures help table              
-r, --ref-genome <File>                 index of the reference genome (created with pileline-genindex) [required if -c]
--roc                                   output ROC values [required -t or --roc or --batch-t]
--simple-output                         print only the performance measures in a single line. Useful to include in scripts
-t, --threshold <Double>                SNPq threshold to report variant [required -t or --roc or --batch-t] (default: 1.0)

Example of use:

# Warning: Check that your alleles in the <gold_genotype.sorted> file are expressed in the same strand as the 
#          reference genome sequence used in your NGS experiment. Typically forward (+) strand. 

## Step 1.

# Create the reference index <ref_genome.pileline> using the pileline-genindex command.
pileline-genindex --index -i  <ref_genome.pileline> -g <ref_genome.fa>

## Step 2.
# Create the genotest file (required).
pileline-genotest --create-genotest-file <experiment.genotest> -p <GP_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline>

## Step 3. QC analysis.

# Generate a table of performance metrics at a given threshold.
pileline-genotest -a <experiment.genotest> -t <snpq_threshold>

# Generate all performance metrics for several thresholds.
pileline-genotest -a <experiment.genotest> --batch-t 0,255,1

# Generate values for a ROC curve plot (output compatible with the ROCR R package).
pileline-genotest -a <experiment.genotest> --roc
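
The --batch-t argument is given as <start>,<end>,<step>. A minimal Python sketch of how such a specification expands into a list of thresholds follows. The expand_thresholds function is hypothetical, and whether PileLine includes the end value itself is an assumption (exposed here as a parameter).

```python
def expand_thresholds(spec, inclusive_end=True):
    """Expand '<start>,<end>,<step>' into a list of threshold values.

    Assumption: the end value is included unless inclusive_end is False.
    """
    start, end, step = (float(x) for x in spec.split(","))
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    n = 0
    while True:
        # Multiply instead of accumulating to limit rounding drift.
        value = start + n * step
        if value > end or (value == end and not inclusive_end):
            break
        values.append(value)
        n += 1
    return values
```

Under the inclusive assumption, the specification 0,255,1 used in the example above expands to 256 thresholds.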