Commands reference

From PileLine

Revision as of 11:26, 28 June 2010

Processing and Annotation Commands

  • pileline-fastseek

Prints a given range of a GP file.

Usage: pileline-fastseek -p <GP_file> -s <range> [--seq-col <int>] [--pos-col <int>]

Option                                  Description                            
------                                  -----------                            
-p, --gp-file <File>                    SORTED genome position file to seek  [required]                           
--pos-col <Integer>                     position column for the gp-file. The first is 1 (default: 2)              
-s                                      seek position in the form of seq:start[:end] [required]             
--seq-col <Integer>                     sequence column for gp-file. The first is 1 (default: 1)

Example:

pileline-fastseek -p <GP_file> -s chr10:100:10000
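
To make the seq:start[:end] range form concrete, here is a minimal sketch that uses standard awk rather than PileLine itself. It scans the file linearly, treats both bounds as inclusive, and reads a missing end as "start only". All three are assumptions, since the option table above does not specify them, and the helper name seek_range is hypothetical.

```shell
# Hypothetical helper: emulate a seq:start[:end] query with a linear awk scan.
# Assumptions (not confirmed by the PileLine docs): tab-delimited input,
# inclusive bounds, and a missing end meaning "start only".
seek_range() {
    range=$1; file=$2; seq_col=${3:-1}; pos_col=${4:-2}
    awk -F '\t' -v r="$range" -v sc="$seq_col" -v pc="$pos_col" '
        BEGIN {
            n = split(r, part, ":")
            seq = part[1]; start = part[2] + 0
            end = (n >= 3) ? part[3] + 0 : start
        }
        $sc == seq && $pc + 0 >= start && $pc + 0 <= end
    ' "$file"
}

# Tiny sorted GP file for demonstration.
gp=$(mktemp)
printf 'chr10\t50\tA\nchr10\t100\tC\nchr10\t9000\tG\nchr10\t20000\tT\nchr2\t100\tA\n' > "$gp"
seek_range chr10:100:10000 "$gp"   # prints the chr10 lines at 100 and 9000
```

The third and fourth arguments mirror --seq-col and --pos-col, with the same defaults of 1 and 2.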


  • pileline-sort

Sorts a GP file by position coordinate.

Usage: pileline-sort -i <GP_file> -o <outfile> [OPTIONS]

Option                                  Description                            
------                                  -----------                            
-T, --temp-dir                          Directory for temporary files [default is the system's temp dir]            
-i, --input-file                        Input file to sort. Use - for stdin  [required]                           
--max-chars-chunk <Long>                max chars per temporary file (default: 2000000)
-o, --output-file                       Output sorted file. Use - for stdout [required]                           
--pos-col <Integer>                     position column in the input file. The first is 1 (default: 2)              
--seq-col <Integer>                     sequence column in the input file. The first is 1 (default: 1)  

Example:

pileline-sort -i <GP_file> -o <outfile>
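
For comparison only, this sketch sorts a small tab-delimited GP file by coordinate with the standard sort utility. The ordering it uses (sequence name as text, then position as a number) is an assumption about what "sorted by position coordinate" means. pileline-sort may order sequences differently, so prefer it for files that other PileLine commands will read. The helper name sort_gp is hypothetical.

```shell
# Sketch with POSIX sort, not pileline-sort. Assumed ordering: column 1
# (sequence) compared as text, then column 2 (position) compared numerically.
sort_gp() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1"
}

unsorted=$(mktemp)
printf 'chr2\t300\tA\nchr10\t1000\tC\nchr10\t99\tG\n' > "$unsorted"
sort_gp "$unsorted"   # chr10:99, chr10:1000, chr2:300
```

The numeric key on the position column matters: a plain text sort would place position 1000 before position 99.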


  • pileline-fastjoin

Joins two sorted GP files. Note: if your GP files are not sorted, run pileline-sort on them before using pileline-fastjoin.

Usage: 
pileline-fastjoin -a <left_file> -b <right_file> [--right-outer-join | --left-outer-join] [--noprint-a | --noprint-b] [--seq-col-a <int>] [--pos-col-a <int>] [--seq-col-b <int>] [--pos-col-b <int>]

Option                                  Description                            
------                                  -----------                            
-a, --left-file <File>                 left tab-delimited AND SORTED genome position file [required]             
-b, --right-file <File>                right tab-delimited AND SORTED genome position file [required]             
--left-outer-join                      performs a left outer join: all A records appear in the output; missing B records are shown as NULL
--noprint-a                            prints only data fields of A           
--noprint-b                            prints only data fields of B           
--pos-col-a <Integer>                  position column for the left file. The first is 1 (default: 2)              
--pos-col-b <Integer>                  position column for the right file. The first is 1 (default: 2)          
--right-outer-join                     performs a right outer join: all B records appear in the output; missing A records are shown as NULL
--seq-col-a <Integer>                  sequence column for the left file. The first is 1 (default: 1)              
--seq-col-b <Integer>                  sequence column for the right file. The first is 1 (default: 1)

Example:

pileline-fastjoin -a <GP_file> -b <GP_file>
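
The difference between the default join and --left-outer-join can be sketched with awk. This is an illustration, not pileline-fastjoin: it loads the right file into memory and ignores sort order, whereas a join over sorted inputs can stream both files. The helper name join_gp, the single data column, and the exact output layout are assumptions.

```shell
# Sketch with awk, not pileline-fastjoin. Joins two tab-delimited GP files on
# (sequence, position) in columns 1 and 2. mode is "inner" or "left".
# Assumptions: one data column (column 3) per file, and unmatched B data
# written as the literal string NULL.
join_gp() {
    mode=$1; a=$2; b=$3
    awk -F '\t' -v OFS='\t' -v mode="$mode" '
        FILENAME == ARGV[1] { right[$1 FS $2] = $3; next }   # B is read first
        ($1 FS $2) in right { print $0, right[$1 FS $2]; next }
        mode == "left"      { print $0, "NULL" }
    ' "$b" "$a"
}

left=$(mktemp); right=$(mktemp)
printf 'chr1\t10\tA\nchr1\t20\tC\n' > "$left"
printf 'chr1\t20\tT\nchr1\t30\tG\n' > "$right"
join_gp inner "$left" "$right"   # only chr1:20, present in both files
join_gp left  "$left" "$right"   # chr1:10 padded with NULL, then chr1:20
```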


  • pileline-rfilter

Filters (or annotates) a positional file with range-based annotations (in BED, GFF or custom formats). Each position that falls inside a range is annotated.

Usage: 
pileline-rfilter [--annotate] -A <GP_file> [-b <bed> | -g <gff> | -i <intervals_file>] [-w <int>] [--seq-col-input <int>] [--pos-col-input <int>] 
                 [--seq-col-intervals <int>] [--start-col-intervals <int>] [--end-col-intervals <int>]

Option                                  Description                            
------                                  -----------                            
-A, --input-file                        SORTED genome position file. Use - for stdin [required]                     
--annotate                              Do not filter. Annotate the lines with the ranges (last column)             
-b, --intervals-bed-file <File>         intervals file in BED format [required -b or -g]                            
--end-col-intervals <Integer>           end position column in the intervals file. The first is 1 (default: 3)    
-g, --intervals-gff-file <File>         intervals file in GFF format [required -b or -g]                            
-i, --intervals-gp-file <File>          intervals file in any other format     
--pos-col-input <Integer>               position column in the input file. The first is 1 (default: 2)              
--seq-col-input <Integer>               sequence column in the input file. The first is 1 (default: 1)              
--seq-col-intervals <Integer>           sequence column in the intervals file. The first is 1 (default: 1)          
--start-col-intervals <Integer>         start position column in the intervals file. The first is 1 (default: 2)    
-w, --window <Integer>                  expand each interval with <window> size at both sides (default: 0)

Examples:

pileline-rfilter -A <GP_file.txt> -i <targets.bed>
pileline-rfilter --annotate -A <GP_file.txt> -i <annotations.bed>
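
The effect of the -w window on range filtering can be sketched with awk. This is an illustration, not pileline-rfilter. It assumes intervals with sequence, start and end in columns 1 to 3 (the documented defaults) and closed, inclusive bounds. How PileLine treats BED's own coordinate convention is not stated here, so do not rely on this sketch for boundary positions. The helper name range_filter is hypothetical.

```shell
# Sketch with awk, not pileline-rfilter. Keeps input positions that fall in
# any interval widened by `window` on both sides.
# Assumptions: tab-delimited files, interval columns seq/start/end = 1/2/3,
# and closed (inclusive) interval bounds.
range_filter() {
    positions=$1; intervals=$2; window=${3:-0}
    awk -F '\t' -v w="$window" '
        FILENAME == ARGV[1] {                 # intervals file is read first
            n++; iseq[n] = $1; lo[n] = $2 - w; hi[n] = $3 + w; next
        }
        {
            for (i = 1; i <= n; i++)
                if ($1 == iseq[i] && $2 + 0 >= lo[i] && $2 + 0 <= hi[i]) {
                    print; break
                }
        }
    ' "$intervals" "$positions"
}

pos=$(mktemp); targets=$(mktemp)
printf 'chr1\t95\tA\nchr1\t150\tC\nchr1\t300\tG\n' > "$pos"
printf 'chr1\t100\t200\n' > "$targets"
range_filter "$pos" "$targets"        # window 0: only chr1:150
range_filter "$pos" "$targets" 10     # window 10: chr1:95 and chr1:150
```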


  • pileline-genindex

Indexes a FASTA genome so that range-based queries can then be run against it.

Usage: pileline-genindex [OPTIONS]

Option                                  Description                            
------                                  -----------                            
-g, --genome-file <File>                genome file to index, as a single FASTA file (in index mode) [required with --index]
-i, --index-file <File>                 index file to create (on index mode) or to access (on seek mode) [required]                           
--index                                 Index mode                             
-s                                      Seek position in the form of seq:start[:end] [required]             
--seek                                  Seek mode [default if no --index]

Examples:

pileline-genindex --index -g <fasta> -i <new_index>
pileline-genindex --seek -i <index> -s chr1:1000:2000


  • pileline-pileup2sift

Generates a SIFT-compatible change column for each variant line in pileup files.

Usage: pileline-pileup2sift -i <pileup>

Option                                  Description                            
------                                  -----------                            
-i, --pileup-file                       variants pileup (pileup -c) file to annotate. Use - for stdin.

Example:

pileline-pileup2sift -i <pileup_file> 


  • pileline-pileup2polyphen

Generates a PolyPhen-compatible change column for each variant line in pileup files.

Usage: pileline-pileup2polyphen -i <pileup>

Option                                  Description                            
------                                  -----------                            
-i, --pileup-file                       variants pileup (pileup -c) file to annotate. Use - for stdin.

Example:

pileline-pileup2polyphen -i <pileup_file>

Analysis Commands

  • pileline-2smc

Looks for discrepancies between the genotypes of two samples (e.g. case vs control) in pileup format files. It can also annotate each output position using a user-provided BED file of custom annotations.

Usage: pileline-2smc -a <pileup> -b <pileup> --variants-a <pileup> --variants-b <pileup> [OPTIONS]

Option                                  Description                            
------                                  -----------                            
--AdiscrepantB                          Calculate variants present in sample A (-v) and in sample B (-w), but with different genotype
-a, --genotype-a <File>                 Whole genotype pileup (with MAQ consensus) file of sample A [required]
--all                                   Calculate all mutations (onlyA, onlyB, AdiscrepantB and both) [default]
--annotate <File>                       Annotate positions with those of the provided BED file
-b, --genotype-b <File>                 Whole genotype pileup (with MAQ consensus) file of sample B [required]
--both                                  Calculate variants present in sample A (-v) and in sample B (-w) and with the same genotype
--cq-column <Integer>                   MAQ consensus quality column in variants and genotype files (default: 5)
-d, --genotype-depth-filter-threshold <Integer>  genotype depth filter threshold (default: 10)
-o, --out-prefix <File>                 Output files prefix [required]
--onlyA                                 Calculate mutations which are variants in sample A (-v) and are homozygous-reference in B
--onlyB                                 Calculate mutations which are variants in sample B (-w) and are homozygous-reference in A
-r, --reference-column <Integer>        reference genotype column in genotype files (options -a and -b) (default: 3)
-t, --genotype-depth-filter-column <Integer>  genotype depth column (default: 8)
-v, --variants-a <File>                 Variants of interest in pileup format (with MAQ consensus) of sample A [required]
-w, --variants-b <File>                 Variants of interest in pileup format (with MAQ consensus) of sample B [required]

Example:

pileline-2smc -a <pileup> -b <pileup> --variants-a <pileup> --variants-b <pileup> --annotate <bed> -d 30
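
The four categories named by --onlyA, --onlyB, --both and --AdiscrepantB can be sketched as a decision over one position. This is only a reading of the option descriptions above, not the tool's implementation: it assumes each genotype is a single consensus letter (IUPAC codes for heterozygotes), treats "homozygous-reference" as "consensus equals reference", and ignores the depth and quality filters. The helper name classify_pair and the label "none" are hypothetical.

```shell
# Sketch of the 2smc categories for a single position.
# Arguments: reference base, consensus genotype of A, consensus genotype of B.
# Assumptions: single-letter consensus calls; no depth or quality filtering.
classify_pair() {
    ref=$1; gt_a=$2; gt_b=$3
    if   [ "$gt_a" != "$ref" ] && [ "$gt_b" = "$ref" ];  then echo onlyA
    elif [ "$gt_a" = "$ref" ]  && [ "$gt_b" != "$ref" ]; then echo onlyB
    elif [ "$gt_a" = "$ref" ];                           then echo none
    elif [ "$gt_a" = "$gt_b" ];                          then echo both
    else                                                      echo AdiscrepantB
    fi
}

classify_pair A R A   # onlyA: A is heterozygous A/G, B matches the reference
classify_pair A R R   # both: same variant genotype in A and B
classify_pair A R G   # AdiscrepantB: both are variants, genotypes differ
```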


  • pileline-nsmc

Takes the output of several 2smc comparison commands and reports where variants are reproduced.

Usage: pileline-nsmc -a <GP_file> -a <GP_file> -a <GP_file>... -b <GP_file> -b <GP_file> -b <GP_file>... [OPTIONS] -o <OUTFILE>

Option                                  Description                            
------                                  -----------                            
-a                                      variant pileup (pileup -c) files for sample A (one or more, e.g. -a file1 -a file2 -a file3...) [required]
-b                                      variant pileup (pileup -c) files for sample B (one or more, e.g. -b file4 -b file5 -b file6...) [required]
-e, --expand-cells                      fill each cell in the output with the corresponding pileup line if it exists, separated by '|' (by default, each cell shows YES or NO)
-o                                      output file [required]   

Examples:

pileline-nsmc -a <GP_file> -a <GP_file> -b <GP_file> -b <GP_file> -o <OUTFILE>
pileline-nsmc -a <GP_file> -a <GP_file> -a <GP_file> -o <OUTFILE>
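
The idea of a YES/NO table that shows where a variant is reproduced can be sketched with awk. This is not pileline-nsmc: it ignores the -a/-b grouping, and the column layout (sequence, position, then one YES/NO cell per input file) is an assumption about the output. The helper name presence_table is hypothetical, and the sketch expects non-empty input files.

```shell
# Sketch with awk, not pileline-nsmc. For every (sequence, position) seen in
# any input file, prints one YES/NO cell per file, in argument order.
# Assumptions: tab-delimited, non-empty files with seq/pos in columns 1/2.
presence_table() {
    awk -F '\t' -v OFS='\t' '
        FNR == 1 { nfiles++ }                      # a new input file starts
        { key = $1 OFS $2; seen[key, nfiles] = 1; keys[key] = 1 }
        END {
            for (k in keys) {
                line = k
                for (f = 1; f <= nfiles; f++)
                    line = line OFS (((k, f) in seen) ? "YES" : "NO")
                print line
            }
        }
    ' "$@" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n
}

f1=$(mktemp); f2=$(mktemp); f3=$(mktemp)
printf 'chr1\t10\tA\nchr1\t20\tC\n' > "$f1"
printf 'chr1\t20\tC\n'              > "$f2"
printf 'chr1\t20\tC\nchr1\t30\tG\n' > "$f3"
presence_table "$f1" "$f2" "$f3"   # chr1:20 is YES in all three files
```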


  • pileline-genotest

Calculates the NGS performance on genotyping, surveying a set of genomic positions whose genotype is known in the sample.

Usage: 
pileline-genotest --create-genotest-file <new_genotest> -p <pileup> -g <gold> -r <reference>
pileline-genotest -a <new_genotest> -t <int> [--print-help-table] [--depth-filter <int>]
pileline-genotest -a <new_genotest> --roc
pileline-genotest -a <new_genotest> --batch-t 0,255,1

Option                                  Description                            
------                                  -----------                            
-a, --genotest-file <File>              the genotest intermediate file to analyze [required if no -c]          
--batch-t                               A sequence of thresholds to test, specified as <start>,<end>,<step> [required -t or --roc or --batch-t]  
-c, --create-genotest-file <File>       creates the genotest intermediate file for further analysis                 
--depth-filter <Integer>                consider as no base called positions below this depth filter (default: 0) 
-g, --gold-genotype <File>              gold genotype (chr<tab>pos<tab>genotype; two letters, including NN) [required if -c]
-p, --pileup <File>                     complete pileup [required if -c]       
--print-help-table                      print measures help table              
-r, --ref-genome <File>                 index of the reference genome (created  with gentools-genindex) [required if -c]                                  
--roc                                   output ROC values [required -t or --roc or --batch-t]
--simple-output                         print only the performance measures in a single line. Useful to include in scripts                              
-t, --threshold <Double>                SNPq threshold to report variant [required -t or --roc or --batch-t] (default: 1.0)

Example:

## Step1. 

#Create genotest file (required).
pileline-genotest --create-genotest-file <experiment.genotest> -p <GP_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline>

## Step2. QC analysis.

#Generate a metrics table of performance at a given threshold.
pileline-genotest -a <experiment.genotest> -t <snpq_threshold>

#Generate all performance metrics for several thresholds
pileline-genotest -a <experiment.genotest> --batch-t 0,255,1

#Generate values for a ROC curve plot (output compatible with the ROCR R package)
pileline-genotest -a <experiment.genotest> --roc
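
The <start>,<end>,<step> form taken by --batch-t can be read as an arithmetic sequence of thresholds. The sketch below expands such a specification with awk. It assumes numeric values and an inclusive end, which the option table does not confirm, and the helper name expand_thresholds is hypothetical.

```shell
# Sketch: list the thresholds described by a <start>,<end>,<step> spec.
# Assumption (not confirmed by the PileLine docs): the end value is included.
expand_thresholds() {
    spec=$1
    start=${spec%%,*}; rest=${spec#*,}
    end=${rest%%,*};   step=${rest#*,}
    awk -v s="$start" -v e="$end" -v d="$step" \
        'BEGIN { for (t = s; t <= e; t += d) print t }'
}

expand_thresholds 0,10,5   # prints 0, 5 and 10 on separate lines
```

Under the same assumption, the specification 0,255,1 used in the example above would cover 256 thresholds.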