Commands reference
From PileLine
Latest revision as of 12:18, 31 August 2011
Processing and Annotation Commands
- pileline-fastseek
Prints a given range of a GP file.
Usage: pileline-fastseek -p <GP_file> -s <range> [--seq-col <int>] [--pos-col <int>]
Option                      Description
------                      -----------
-f, --regions-file <File>   File with seek positions in the form of seq:start[:end], one per line [required if no -s]. Please note: if several regions are provided, they will not be merged, and the output will follow the order of the input intervals
-p, --gp-file <File>        SORTED genome position file to seek [required]
--pos-col <Integer>         position column for the gp-file. The first is 1 (default: 2)
-s                          seek positions in the form of seq:start[:end] [required if no -f]. Please note: if several regions are provided, they will not be merged, and the output will follow the order of the input intervals
--seq-col <Integer>         sequence column for the gp-file. The first is 1 (default: 1)
Example of use:
pileline-fastseek -p <GP_file> -s chr10:100:10000
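For small files, the region semantics of -s can be approximated with a plain linear scan. The sketch below is a hypothetical helper (gp_region), not how pileline-fastseek works internally. It assumes tab-delimited input with the sequence in column 1 and the position in column 2, treats both region ends as inclusive, and, as a further assumption, reads a region without :end as the single position start.

```shell
# gp_region REGION FILE: print the lines of a tab-delimited GP file that fall
# inside REGION, given as seq:start[:end]. Hypothetical helper (linear scan).
# Assumptions: sequence in column 1, position in column 2, inclusive ends,
# and a missing :end means the single position "start".
gp_region() {
  awk -F '\t' -v region="$1" '
    BEGIN {
      n = split(region, r, ":")
      seq = r[1]
      start = r[2] + 0
      end = (n >= 3) ? r[3] + 0 : start
    }
    # Keep lines on the requested sequence whose position is in [start, end]
    $1 == seq && $2 + 0 >= start && $2 + 0 <= end
  ' "$2"
}
```

If the assumptions hold, gp_region chr10:100:10000 <GP_file> should print the same lines as the example above, only more slowly on large files.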
- pileline-sort
Sorts a GP file by position coordinate.
Usage: pileline-sort -i <GP_file> -o <outfile> [OPTIONS]
Option                     Description
------                     -----------
-T, --temp-dir             Directory for temporary files [default is the system's temp dir]
-i, --input-file           Input file to sort. Use - for stdin [required]
--max-chars-chunk <Long>   max chars per temporary file (default: 2000000)
-o, --output-file          Output sorted file. Use - for stdout [required]
--pos-col <Integer>        position column in the input file. The first is 1 (default: 2)
--seq-col <Integer>        sequence column in the input file. The first is 1 (default: 1)
Example of use:
pileline-sort -i <GP_file> -o <outfile>
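Because several commands require SORTED input, it can help to check a file before running them. The hypothetical helper below wraps sort -c and assumes pileline orders lines by sequence name as plain text (C locale) and then by position numerically. The exact sequence ordering used by pileline-sort is not stated here, so if in doubt, simply re-sort with pileline-sort.

```shell
# is_sorted_gp FILE [SEQ_COL] [POS_COL]: exit status 0 if FILE is ordered by
# sequence (text, C locale) and then position (numeric). Columns are 1-based,
# defaulting to 1 and 2 like the --seq-col/--pos-col defaults above.
# Assumption: this key matches the order pileline expects.
is_sorted_gp() (
  seq_col=${2:-1}
  pos_col=${3:-2}
  tab=$(printf '\t')
  # -c only checks the order; -s (GNU/BSD extension) stops sort from falling
  # back to a whole-line comparison when two lines share the same key
  LC_ALL=C sort -c -s -t "$tab" -k "${seq_col},${seq_col}" -k "${pos_col},${pos_col}n" "$1" 2>/dev/null
)
```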
- pileline-fastjoin
Joins two sorted GP files. Note: if your GP files are not sorted, sort them with pileline-sort before running pileline-fastjoin.
Usage: pileline-fastjoin.sh -a <left_file> -b <right_file> [--right-outer-join | --left-outer-join] [--noprint-a | --noprint-b] [--seq-col-a <int>] [--pos-col-a <int>] [--seq-col-b <int>] [--pos-col-b <int>]
Option                    Description
------                    -----------
-a, --left-file <File>    left tab-delimited AND SORTED genome position file [required]
-b, --right-file <File>   right tab-delimited AND SORTED genome position file [required]
--left-outer-join         performs a left outer join: all A records will be in the output; missing B records are shown as NULL
--noprint-a               prints only data fields of A
--noprint-b               prints only data fields of B
--pos-col-a <Integer>     position column for the left file. The first is 1 (default: 2)
--pos-col-b <Integer>     position column for the right file. The first is 1 (default: 2)
--right-outer-join        performs a right outer join: all B records will be in the output; missing A records are shown as NULL
--seq-col-a <Integer>     sequence column for the left file. The first is 1 (default: 1)
--seq-col-b <Integer>     sequence column for the right file. The first is 1 (default: 1)
Example of use:
pileline-fastjoin -a <GP_file> -b <GP_file>
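The effect of --left-outer-join can be illustrated with a hash join in awk. This is a swapped-in technique: pileline-fastjoin works on two sorted files, while this sketch loads the whole right file into memory, so it only suits small inputs. It assumes tab-delimited files keyed on columns 1 and 2 and appends B's remaining columns (or NULL) to each A line; that output layout is an assumption, not necessarily fastjoin's. The function name is hypothetical.

```shell
# left_outer_join A_FILE B_FILE: print every line of A, followed by the
# non-key columns of the B line with the same sequence and position, or NULL.
left_outer_join() {
  awk -F '\t' -v OFS='\t' '
    # First file read is B: store its data columns keyed by sequence+position
    # (note: the NR == FNR idiom misbehaves if B is empty)
    NR == FNR {
      key = $1 OFS $2
      data = ""
      for (i = 3; i <= NF; i++) data = data (i > 3 ? OFS : "") $i
      b[key] = data
      next
    }
    # Second file read is A: look each position up in B
    {
      key = $1 OFS $2
      print $0, ((key in b) ? b[key] : "NULL")
    }
  ' "$2" "$1"
}
```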
- pileline-fulljoin
Merges two or more GP files, printing, for each genome position, the corresponding line of each input file (if any).
Usage: pileline-fulljoin -i <GP_file> -i <GP_file2> [-i <GP_file3> ...] [--seq-col <int>] [--pos-col <int>]
Option                 Description
------                 -----------
-i, --gp-file <File>   SORTED genome position files to full join [required 2 or more]
--pos-col <Integer>    position column for all gp-files. The first column is 1 (default: 2)
--pos-cols             comma-separated position columns for each input gp-file (--pos-col will be ignored). The first column is 1
--seq-col <Integer>    sequence column for all gp-files. The first column is 1 (default: 1)
--seq-cols             comma-separated sequence columns for each input gp-file (--seq-col will be ignored). The first column is 1
Example of use:
pileline-fulljoin -i <GP_file1> -i <GP_file2> -i <GP_file3>
- pileline-rfilter.sh
Filters (or annotates) a positional file with range-based annotations (in BED, GFF or custom formats). Each position that falls inside a range is annotated.
Usage: pileline-rfilter [--annotate] -A <GP_file> [-b <bed> | -g <gff> | -i <intervals_file>] [-w <int>] [--seq-col-input <int>] [--pos-col-input <int>] [--seq-col-intervals <int>] [--start-col-intervals <int>] [--end-col-intervals <int>]
Option                            Description
------                            -----------
-A, --input-file                  SORTED genome position file. Use - for stdin [required]. Positions are considered 1-based
--annotate                        Do not filter. Annotate the lines with the ranges (last column)
-b, --intervals-bed-file <File>   intervals file in BED format [required -b or -g]. Intervals are taken as 0-based and the end-position is exclusive
--end-col-intervals <Integer>     end position column in the intervals file. The first is 1 (default: 3)
-g, --intervals-gff-file <File>   intervals file in GFF format [required -b or -g]. Intervals are taken as 1-based and the end-position is inclusive
-i, --intervals-gp-file <File>    intervals file in any other format
--pos-col-input <Integer>         position column in the input file. The first is 1 (default: 2)
--seq-col-input <Integer>         sequence column in the input file. The first is 1 (default: 1)
--seq-col-intervals <Integer>     sequence column in the intervals file. The first is 1 (default: 1)
--start-col-intervals <Integer>   start position column in the intervals file. The first is 1 (default: 2)
-v, --inverse                     inverse filtering, that is, output lines that are OUTSIDE of the provided intervals
-w, --window <Integer>            expand each interval by <window> positions on both sides (default: 0)
Examples of use:
# On-target filtering
pileline-rfilter -A <GP_file.txt> -i <targets.bed>
# Simple annotation
pileline-rfilter --annotate -A <GP_file.txt> -i <annotations.bed>
# Multiple annotation (combining UNIX commands)
cat <GP_file.txt> | pileline-rfilter.sh --annotate -A - -i <annotations1.bed> | pileline-rfilter.sh --annotate -A - -i <annotations2.bed> > <myfullyannotated_GP_file.txt>
Please note: if you are experiencing memory issues, provide the intervals file compressed and indexed with tabix. (Please see: Compress/index input with bgzip+tabix)
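The coordinate conventions above decide which 1-based input positions fall inside an interval: a BED interval with start s and end e covers positions s+1 to e, while a GFF interval covers start to end as written. The hypothetical helpers below encode those two rules; applying -w by simply widening both bounds is an assumption about how the window behaves.

```shell
# in_bed POS START END [WINDOW]: true if 1-based POS lies in the BED interval
# [START, END) (0-based start, exclusive end), widened by WINDOW on each side.
in_bed() (
  w=${4:-0}
  [ "$1" -gt $(( $2 - w )) ] && [ "$1" -le $(( $3 + w )) ]
)

# in_gff POS START END [WINDOW]: true if 1-based POS lies in the GFF interval
# [START, END] (1-based, inclusive end), widened by WINDOW on each side.
in_gff() (
  w=${4:-0}
  [ "$1" -ge $(( $2 - w )) ] && [ "$1" -le $(( $3 + w )) ]
)
```

With these rules, the BED line chr1 0 10 and the GFF interval chr1 1 10 cover the same ten positions.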
- pileline-genindex.sh
Indexes a FASTA genome and then performs range-based queries on that genome.
Usage: pileline-genindex [OPTIONS]
Option                     Description
------                     -----------
-g, --genome-file <File>   genome file to index, as one single FASTA file (in index mode) [required in --index]
-i, --index-file <File>    index file to create (in index mode) or to access (in seek mode) [required]
--index                    Index mode
-s                         Seek position in the form of seq:start[:end] [required in seek mode]
--seek                     Seek mode [default if no --index]
Examples of use:
pileline-genindex --index -g <fasta> -i <new_index>
pileline-genindex --seek -i <index> -s chr1:1000:2000
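Without an index, a range query on a FASTA genome amounts to reading the sequence until the requested region. The hypothetical fasta_range helper below shows that naive approach; it is not how pileline-genindex builds or reads its index, and it assumes that the start and end of a seq:start:end query are 1-based and inclusive.

```shell
# fasta_range FASTA SEQ START END: print bases START..END (1-based, inclusive)
# of record SEQ by scanning the whole file sequentially (no index).
fasta_range() {
  awk -v seq="$2" -v start="$3" -v end="$4" '
    /^>/ {
      # Record name is the header text up to the first whitespace
      name = substr($1, 2)
      inrec = (name == seq)
      next
    }
    inrec { s = s $0 }
    END { print substr(s, start, end - start + 1) }
  ' "$1"
}
```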
- pileline-pileup2sift.sh
Generates a SIFT-compatible change column for each variant line in pileup files.
Usage: pileline-pileup2sift -i <pileup>
Option              Description
------              -----------
-i, --pileup-file   variants pileup (pileup -c) file to annotate. Use - for stdin.
Example of use:
pileline-pileup2sift -i <pileup_file>
- pileline-pileup2polyphen.sh
Generates a PolyPhen-compatible change column for each variant line in pileup files.
Usage: pileline-pileup2polyphen -i <pileup>
Option              Description
------              -----------
-i, --pileup-file   variants pileup (pileup -c) file to annotate. Use - for stdin.
Example of use:
pileline-pileup2polyphen -i <pileup_file>
- pileline-pileup2firestar.sh
Generates a Firestar-compatible input for each variant line in pileup files.
Usage: pileline-pileup2firestar -i <pileup>
Option              Description
------              -----------
-i, --pileup-file   variants pileup (pileup -c) file to annotate. Use - for stdin.
Example of use:
pileline-pileup2firestar -i <pileup_file>
Analysis Commands
- pileline-2smc
Looks for discrepancies in the genotypes of two samples (e.g. case vs control) in pileup format files. It can also annotate each output position with a user-provided BED file containing custom annotations. The INPUT FILES MUST BE SORTED.
Usage: pileline-2smc -a <pileup> -b <pileup> --variants-a <pileup> --variants-b <pileup> [OPTIONS]
Option                                            Description
------                                            -----------
--AdiscrepantB                                    Calculate variants present in sample A (-v) and in sample B (-w), but with a different genotype
-a, --genotype-a <File>                           Whole genotype pileup (with MAQ consensus) file of sample A [required]
--all                                             Calculate all mutations (onlyA, onlyB, AdiscrepantB and both) [default]
--annotate <File>                                 Annotate positions with those of the provided BED file
-b, --genotype-b <File>                           Whole genotype pileup (with MAQ consensus) file of sample B [required]
--both                                            Calculate variants present in sample A (-v) and in sample B (-w) with the same genotype
--cq-column <Integer>                             MAQ consensus quality column in variants and genotype files (default: 5)
-d, --genotype-depth-filter-threshold <Integer>   genotype depth filter threshold (default: 10)
-o, --out-prefix <File>                           Output files prefix [required]
--onlyA                                           Calculate mutations which are variants in sample A (-v) and are homozygous-reference in B
--onlyB                                           Calculate mutations which are variants in sample B (-w) and are homozygous-reference in A
-r, --reference-column <Integer>                  reference genotype column in genotype files (options -a and -b) (default: 3)
-t, --genotype-depth-filter-column <Integer>      genotype depth column (default: 8)
-v, --variants-a <File>                           Variants of interest in pileup format (with MAQ consensus) of sample A [required]
-w, --variants-b <File>                           Variants of interest in pileup format (with MAQ consensus) of sample B [required]
Example of use:
pileline-2smc -a <pileup> -b <pileup> --variants-a <pileup> --variants-b <pileup> --annotate <bed> -d 30
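The four comparison categories can be summarised as a decision on each sample's consensus call versus the reference base. The hypothetical function below only illustrates those rules: it calls a sample variant when its consensus differs from the reference, and it ignores the depth filter (-d, -t) and the separate variant (-v, -w) and whole-genotype (-a, -b) files that pileline-2smc actually reads.

```shell
# classify_2smc REF CONS_A CONS_B: print the 2smc category of one position.
# Assumption: a sample is variant when its consensus base differs from REF.
classify_2smc() (
  ref=$1 a=$2 b=$3
  if [ "$a" != "$ref" ] && [ "$b" = "$ref" ]; then
    echo onlyA          # variant in A, homozygous-reference in B
  elif [ "$b" != "$ref" ] && [ "$a" = "$ref" ]; then
    echo onlyB          # variant in B, homozygous-reference in A
  elif [ "$a" != "$ref" ] && [ "$a" != "$b" ]; then
    echo AdiscrepantB   # variant in both, different genotype
  elif [ "$a" != "$ref" ]; then
    echo both           # variant in both, same genotype
  else
    echo none           # reference in both samples
  fi
)
```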
- pileline-nsmc
Takes the output of several 2smc comparisons and reports where variants are reproduced. It can operate in two modes: by exact position or by intervals (e.g. genes). For intervals mode, you have to provide an additional intervals file (.bed, .gff or custom). The INPUT FILES MUST BE SORTED.
Usage (by position): pileline-nsmc -a <GP_file> -a <GP_file> -a <GP_file>... -b <GP_file> -b <GP_file> -b <GP_file>... [OPTIONS] -o <OUTFILE>
Usage (by intervals): pileline-nsmc [-B <bed_file> | -G <gff_file>] -a <GP_file> -a <GP_file> -a <GP_file>... -b <GP_file> -b <GP_file> -b <GP_file>... [OPTIONS] -o <OUTFILE>
Option                             Description
------                             -----------
-B, --intervals-bed-file <File>    intervals file in BED format. Intervals are taken as 0-based and the end-position is exclusive
-G, --intervals-gff-file <File>    intervals file in GFF format. Intervals are taken as 1-based and the end-position is inclusive
-I, --intervals-gp-file <File>     intervals file in custom format. You can provide the columns for sequence, start and stop in the appropriate parameters
-a                                 variant pileup (pileup -c) files for sample A (one or more, e.g. -a file1 -a file2 -a file3...) [required]
--allele-col <Integer>             allele (variant) column in sample files. The first is 1 (default: 4)
-b                                 variant pileup (pileup -c) files for sample B (one or more, e.g. -b file4 -b file5 -b file6...) [required]
-c, --expand-cells-col <Integer>   When using -e, fill the cell with the info of the specified column (default: 0)
-e, --expand-cells                 In exact position mode: fill each cell in the output with the corresponding pileup line if it exists, separated by '|' (by default, YES or NO appears in the cell). In intervals mode: show how many entries in the variants file are within the interval (this is not taken into account in the Fisher test)
--end-col-intervals <Integer>      end position column in the intervals file. The first is 1 (default: 3)
-o                                 output file. Use - for stdout [required]
--ref-col <Integer>                reference genome column in sample files. The first is 1 (default: 3)
--seq-col-intervals <Integer>      sequence column in the intervals file. The first is 1 (default: 1)
--start-col-intervals <Integer>    start position column in the intervals file. The first is 1 (default: 2)
Examples of use:
pileline-nsmc -a <GP_file> -a <GP_file> -b <GP_file> -b <GP_file> -o <OUTFILE>
pileline-nsmc -G <gff_file> -a <GP_file> -a <GP_file> -a <GP_file> -o <OUTFILE>
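In exact position mode without -e, each output cell holds YES or NO depending on whether a sample file contains the position. The hypothetical presence_matrix helper below sketches that matrix. It assumes the sequence and position are in columns 1 and 2, prints one column per input file in argument order and positions in first-seen order, and does not separate -a from -b samples or run the nsmc Fisher test, so the real output layout will differ.

```shell
# presence_matrix FILE...: for every position seen in any file, print
# seq, pos and one YES/NO column per file (in argument order).
presence_matrix() {
  awk -F '\t' -v OFS='\t' '
    # Index of the current file (note: an empty file would not be counted)
    FNR == 1 { f++ }
    {
      key = $1 OFS $2
      if (!(key in seen)) { seen[key] = 1; order[++n] = key }
      has[key, f] = 1
    }
    END {
      for (i = 1; i <= n; i++) {
        line = order[i]
        for (j = 1; j <= f; j++)
          line = line OFS (((order[i], j) in has) ? "YES" : "NO")
        print line
      }
    }
  ' "$@"
}
```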
- pileline-genotest.sh
Measures NGS genotyping performance by surveying a set of genomic positions whose genotype in the sample is already known.
Usage:
pileline-genotest --create-genotest-file <new_genotest> -p <pileup> -g <gold> -r <reference>
pileline-genotest -a <new_genotest> -t <int> [--print-help-table] [--depth-filter <int>]
pileline-genotest -a <new_genotest> --roc
pileline-genotest -a <new_genotest> --batch-t 0,255,1
Option                              Description
------                              -----------
-a, --genotest-file <File>          the genotest intermediate file to analyze [required if no -c]
--batch-t                           a sequence of thresholds to test, specified as <start>,<end>,<step> [required -t or --roc or --batch-t]
-c, --create-genotest-file <File>   creates the genotest intermediate file for further analysis
--depth-filter <Integer>            treat positions below this depth as having no base called (default: 0)
-g, --gold-genotype <File>          gold genotype (chr<tab>pos<tab>genotype, two letters, including NN) [required if -c]
-p, --pileup <File>                 complete pileup [required if -c]
--print-help-table                  print measures help table
-r, --ref-genome <File>             index of the reference genome (created with pileline-genindex) [required if -c]
--roc                               output ROC values [required -t or --roc or --batch-t]
--simple-output                     print only the performance measures in a single line. Useful to include in scripts
-t, --threshold <Double>            SNPq threshold to report variant [required -t or --roc or --batch-t] (default: 1.0)
Example of use:
# Warning: check that the alleles in your <gold_genotype.sorted> file are expressed on the same strand as the
# reference genome sequence used in your NGS experiment (typically the forward (+) strand).
## Step 1.
# Create the reference index <ref_genome.pileline> with the pileline-genindex command.
pileline-genindex --index -i <ref_genome.pileline> -g <ref_genome.fa>
## Step 2.
# Create the genotest file (required).
pileline-genotest --create-genotest-file <experiment.genotest> -p <GP_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline>
## Step 3. QC analysis.
# Generate a table of performance metrics at a given threshold.
pileline-genotest -a <experiment.genotest> -t <snpq_threshold>
# Generate all performance metrics for several thresholds.
pileline-genotest -a <experiment.genotest> --batch-t 0,255,1
# Generate values for a ROC curve plot (output compatible with the ROCR R package).
pileline-genotest -a <experiment.genotest> --roc
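The gold genotype file uses two-letter genotypes (for example AG or NN), whereas the MAQ consensus in pileup files is a single IUPAC code. Comparing the two needs a mapping like the one sketched below. The codes are the standard IUPAC nucleotide codes, but the helper name to_iupac is hypothetical and this is not necessarily how pileline-genotest compares genotypes internally.

```shell
# to_iupac GENOTYPE: map a two-letter genotype (either allele order, any case)
# to its single-letter IUPAC code; prints ? and fails on unsupported input.
to_iupac() {
  case "$(printf '%s' "$1" | tr 'acgtn' 'ACGTN')" in
    AA) echo A ;; CC) echo C ;; GG) echo G ;; TT) echo T ;;
    AG|GA) echo R ;; CT|TC) echo Y ;;
    CG|GC) echo S ;; AT|TA) echo W ;;
    GT|TG) echo K ;; AC|CA) echo M ;;
    NN) echo N ;;
    *) echo '?'; return 1 ;;
  esac
}
```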