From PileLine

Revision as of 16:16, 8 June 2010

Welcome to PileLine Wiki

PileLine (Pileup pipeLine) is a flexible command-line toolkit for efficient handling, filtering, and comparison of locus text files produced by next-generation sequencing experiments (e.g. pileup files from SAMtools). PileLine is designed to be memory efficient: it operates directly on sorted locus files on disk rather than loading them into memory.

PileLine is available for downloading at: http://sourceforge.net/projects/pileline

Main Features

Filtering and comparison of locus text files.
Full annotation of locus files with human dbSNP, HGNC Gene Symbol and Ensembl IDs. Custom annotations are also allowed and may be supplied through standard .BED or .GFF files.
SIFT and PolyPhen-2 compatible outputs to facilitate the biological interpretation of huge lists of variants.
Genotyping quality control functionality to estimate performance metrics (Harismendy et al. 2009) for detecting homozygous/heterozygous variants against a given gold standard genotype.

PileLine Commands

Processing Commands

pileline-fastseek.sh

Prints a given range of a locus file.

pileline-fastjoin.sh

Joins two positional files.

pileline-rfilter.sh

Filters (or annotates) a positional file with range-based annotations (in BED format). Each position that falls inside a given range is annotated.

pileline-sort.sh

Sorts a locus text file by coordinate.

pileline-genindex.sh

Indexes a FASTA genome so that range-based queries can then be performed on it.
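To make the range-based filtering described for pileline-rfilter.sh more concrete, here is a minimal sketch in plain awk. It is not PileLine's implementation: the function name filter_by_bed is hypothetical, it loads the BED intervals into memory (unlike PileLine's on-disk approach), and it assumes tab-separated input with 1-based locus positions and BED intervals whose start is 0-based and whose end is exclusive. Whether PileLine interprets BED coordinates in exactly this way is an assumption.

```shell
# Hypothetical helper, NOT part of PileLine: keep the lines of a locus file
# whose position falls inside any interval of a BED file.
# Assumptions: tab-separated files; locus columns 1-2 are chromosome and
# 1-based position; BED columns 1-3 are chromosome, 0-based start, and
# exclusive end. The BED file must not be empty.
filter_by_bed() {
    bed_file=$1
    locus_file=$2
    awk -F '\t' '
        # First file (BED): remember every interval, grouped by chromosome.
        NR == FNR {
            n[$1]++
            lo[$1, n[$1]] = $2 + 0
            hi[$1, n[$1]] = $3 + 0
            next
        }
        # Second file (loci): print the line if any interval covers it.
        {
            pos = $2 + 0
            for (i = 1; i <= n[$1]; i++) {
                if (pos > lo[$1, i] && pos <= hi[$1, i]) {
                    print
                    break
                }
            }
        }
    ' "$bed_file" "$locus_file"
}
```

Called as `filter_by_bed exons.bed locus_file.txt`, the sketch prints only the loci covered by at least one interval. The linear scan over intervals is kept for readability and would be slow on large annotation sets.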

Analysis Commands

pileline-2smc.sh

Looks for discrepancies between the genotypes of two samples (e.g. case vs. control). It can also annotate each output position using a positional file containing custom annotations (e.g. dbSNP), and it produces SIFT and PolyPhen-2 compatible output files.

pileline-nsmc.sh

Takes the output of several 2smc comparison commands and reports where variants are reproduced.

pileline-genotest.sh

Calculates NGS genotyping performance by surveying a set of genomic positions whose genotype in the sample is known.
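The core idea behind a two-sample comparison such as pileline-2smc.sh, finding positions where two samples carry different genotype calls, can be sketched with awk. This is only an illustration and not how PileLine works internally: the function name genotype_discrepancies is hypothetical, the sketch ignores depth thresholds and variant files, and it assumes tab-separated input whose columns 1, 2 and 4 hold chromosome, position and consensus genotype (the layout of SAMtools consensus pileup output is assumed here).

```shell
# Hypothetical helper, NOT PileLine code: report positions present in both
# files whose genotype calls differ.
# Assumptions: tab-separated pileup-like lines with chromosome in column 1,
# position in column 2, and the consensus genotype in column 4.
genotype_discrepancies() {
    sample_a=$1
    sample_b=$2
    awk -F '\t' -v OFS='\t' '
        # First file: store the genotype of sample A by chromosome and position.
        NR == FNR { geno[$1, $2] = $4; next }
        # Second file: print loci called in both samples with different genotypes.
        (($1, $2) in geno) && geno[$1, $2] != $4 {
            print $1, $2, geno[$1, $2], $4
        }
    ' "$sample_a" "$sample_b"
}
```

Each output line holds the chromosome, the position, and the genotype in the first and second sample. Positions present in only one of the files are skipped.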

Use Cases

PileLine, coupled with SAMtools, facilitates pileup handling.

Perform a two-sample comparison.

pileline-2smc.sh
-a <locusfile_A.txt> -b <locusfile_B.txt>
-v <variants_locusfile_A.txt> -w <variants_locusfile_B.txt>
-o <out.txt> -d <min_depth>

Perform an n-sample comparison.

pileline-nsmc.sh
--a-samples <locusfile_a1>,<locusfile_a2>,<locusfile_a3>
--b-samples <locusfile_b1>,<locusfile_b2>,<locusfile_b3>

Annotate a locus file with dbSNP.

pileline-fastjoin.sh -a <locus_file.txt> -b dbSNP130.txt --left-outer-join

Annotate a locus file with genes.

pileline-rfilter.sh --annotate -A <locus_file.txt> -b <genes.bed>

Filter pileup to exon loci.

pileline-rfilter.sh -A <locus_file.txt> -b <exons.bed>

Perform a genotyping test for quality control.

pileline-genotest.sh -p <locus_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline> -t <snpq_threshold>

Perform a genotyping test and display the measures table.

pileline-genotest.sh -p <locus_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline> --depth-filter 10 --print-help-table
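The use cases above operate on coordinate-sorted inputs (note the .sorted gold genotype file). As a rough illustration of what "sorted by coordinate" means, the following sketch orders a tab-separated locus file with the standard sort utility. The function name sort_loci is hypothetical, and the chromosome ordering used here (plain byte order) is an assumption; the ordering PileLine itself expects should be checked against the command-line help, and pileline-sort.sh remains the supported way to sort.

```shell
# Hypothetical helper, NOT part of PileLine: sort a tab-separated locus file
# by chromosome name (plain byte order, an assumption) and then by numeric
# position within each chromosome.
sort_loci() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1"
}
```

For example, `sort_loci locus_file.txt > locus_file.sorted` writes the sorted copy to a new file. With byte ordering, chr10 sorts before chr2, which is consistent but differs from natural chromosome order.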
