Main Page



Revision as of 10:25, 28 June 2010


Welcome to PileLine Wiki

PileLine is a flexible command-line toolkit for efficient handling, filtering, and comparison of genomic position (GP) files produced by next-generation sequencing experiments (e.g. pileup, BED, GFF, or VCF files). PileLine is designed to be memory efficient: it operates directly on sorted GP files on disk rather than loading them into memory.
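To illustrate why sorted input allows memory-efficient processing, here is a minimal Python sketch of a streaming left outer join over two inputs sorted by the same key order. It is not PileLine's implementation; the function name, the record layout, and the identifiers in the sample data are hypothetical, and keys are assumed to be unique within each input.

```python
from typing import Iterable, Iterator, Optional, Tuple

Key = Tuple[str, int]      # (chromosome, 1-based position)
Record = Tuple[Key, str]   # key plus the rest of the line


def left_outer_join(
    left: Iterable[Record], right: Iterable[Record]
) -> Iterator[Tuple[Key, str, Optional[str]]]:
    """Join two streams sorted by the same key order.

    Only one record from each stream is held in memory at a time.
    """
    right_iter = iter(right)
    current = next(right_iter, None)
    for key, payload in left:
        # Advance the right stream until it catches up with the left key.
        while current is not None and current[0] < key:
            current = next(right_iter, None)
        matched = current is not None and current[0] == key
        yield key, payload, current[1] if matched else None


# Made-up sample data; the annotation identifiers are placeholders.
sample = [(("chr1", 100), "A>G"), (("chr1", 250), "C>T"), (("chr2", 40), "G>A")]
annotations = [
    (("chr1", 100), "id_0001"),
    (("chr1", 300), "id_0002"),
    (("chr2", 40), "id_0003"),
]

joined = list(left_outer_join(sample, annotations))
for row in joined:
    print(row)
```

Because both inputs are consumed strictly front to back, the same loop works when the iterables are lazily parsed file handles instead of lists.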

PileLine is available for downloading at: http://sourceforge.net/projects/pileline

Other useful tools for handling GP files: SAMtools, BEDtools, Picard

Main Features

  1. Filtering and comparison of GP files.
  2. Full annotation of GP files with human dbSNP, HGNC Gene Symbol and Ensembl IDs. Custom annotations are also allowed and may be supplied through standard .BED or .GFF files.
  3. SIFT, PolyPhen-2 and Firestar compatible outputs to facilitate the biological interpretation of huge lists of variants.
  4. Genotyping quality control functionality to estimate performance metrics (Harismendi et al. 2009) for detecting homozygous and heterozygous variants against a given gold standard genotype.

PileLine Commands

Processing and Annotation Commands

  • pileline-fastseek

Prints a given range of a GP file.

  • pileline-sort

Sorts GP files by coordinate.

  • pileline-fastjoin

Joins two sorted GP files.

  • pileline-rfilter

Filters (or annotates) a positional file using range-based annotations (in BED format). Each position that falls inside one of the ranges is kept or annotated.

  • pileline-genindex

Indexes a FASTA genome so that range-based queries can then be run against it.

  • pileline-pileup2sift

Generates SIFT-compatible output files from pileup files.

  • pileline-pileup2polyphen

Generates PolyPhen-2-compatible output files from pileup files.

  • pileline-pileup2firestar

Generates Firestar-compatible output files from GP files.
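The range-based filtering and annotation described for pileline-rfilter can be sketched in Python as follows. This is an illustration under stated assumptions, not PileLine's code: the function and variable names are hypothetical, positions are taken to be 1-based (as in pileup files), and intervals follow the BED convention of 0-based starts with exclusive ends.

```python
from typing import Dict, List, Tuple

# (start, end, name) in BED style: 0-based start, end-exclusive.
Interval = Tuple[int, int, str]


def annotate_positions(
    positions: List[Tuple[str, int]],
    intervals: Dict[str, List[Interval]],
) -> List[Tuple[str, int, List[str]]]:
    """Attach to each 1-based position the names of all intervals containing it."""
    result = []
    for chrom, pos in positions:
        zero_based = pos - 1  # convert to BED coordinates before comparing
        names = [
            name
            for start, end, name in intervals.get(chrom, [])
            if start <= zero_based < end
        ]
        result.append((chrom, pos, names))
    return result


# Made-up example data.
genes = {"chr1": [(99, 200, "geneA"), (150, 400, "geneB")]}
positions = [("chr1", 100), ("chr1", 175), ("chr1", 401), ("chr2", 10)]

annotated = annotate_positions(positions, genes)
# Filtering keeps only the positions that fall inside at least one range.
filtered = [(chrom, pos) for chrom, pos, names in annotated if names]
print(annotated)
print(filtered)
```

The linear scan over each chromosome's intervals keeps the sketch short; with sorted inputs, a single sweep over both files would avoid rescanning.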

Analysis Commands

  • pileline-2smc

Looks for discrepancies between the genotypes of two samples (e.g. case vs. control). It can also annotate each output position using a user-provided BED file containing custom annotations.

  • pileline-nsmc

Compares n samples, reporting consistent variants.

  • pileline-genotest

Calculates NGS genotyping performance by surveying a set of genomic positions whose genotype is already known in the sample.
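As a rough illustration of a two-sample comparison, the Python sketch below reports positions where two samples carry different genotypes and both are covered by at least a minimum read depth. The data layout, the function name, and the exact way the depth cutoff is applied are assumptions made for the example; they are not taken from PileLine.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]   # (chromosome, 1-based position)
Call = Tuple[str, int]       # (genotype, read depth)


def discrepancies(
    sample_a: Dict[Position, Call],
    sample_b: Dict[Position, Call],
    min_depth: int,
) -> List[Tuple[Position, str, str]]:
    """Return positions with differing genotypes and sufficient depth in both samples."""
    found = []
    for pos in sorted(sample_a.keys() & sample_b.keys()):
        genotype_a, depth_a = sample_a[pos]
        genotype_b, depth_b = sample_b[pos]
        well_covered = depth_a >= min_depth and depth_b >= min_depth
        if well_covered and genotype_a != genotype_b:
            found.append((pos, genotype_a, genotype_b))
    return found


# Made-up example data.
case = {("chr1", 100): ("AG", 30), ("chr1", 200): ("CC", 5), ("chr2", 50): ("TT", 40)}
control = {("chr1", 100): ("AA", 25), ("chr1", 200): ("CT", 50), ("chr2", 50): ("TT", 33)}

differences = discrepancies(case, control, min_depth=10)
print(differences)
```

The position at chr1:200 also differs between the samples but is skipped here because one sample has a depth of only 5.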

Use Cases

Figure: PileLine coupled to SAMtools to facilitate pileup handling (NS: non-synonymous).
  • Perform a two-sample comparison
pileline-2smc.sh \
  -a <file_A.pileup> -b <file_B.pileup> \
  -v <variants_file_A.pileup> -w <variants_file_B.pileup> \
  -o <out.txt> -d <min_depth>
  • Perform an n-sample comparison
pileline-nsmc.sh \
  --a-samples <GPfile_a1>,<GPfile_a2>,<GPfile_a3> \
  --b-samples <GPfile_b1>,<GPfile_b2>,<GPfile_b3>
  • Sort GP files
pileline-sort.sh -i <input_GP_file.txt> -o <outfile.sorted.txt>
  • Annotate a GP file with dbSNP
pileline-fastjoin.sh -a <GP_file.txt> -b dbSNP130.txt --left-outer-join
  • Annotate a GP file with genes
pileline-rfilter.sh --annotate -A <GP_file.txt> -b <genes.bed>
  • Filter pileup to exon loci
pileline-rfilter.sh -A <GP_file.txt> -b <exons.bed>
  • Generate columns compatible with SIFT input
pileline-pileup2sift.sh -i <file.pileup>
  • Perform a genotyping test for quality control
## Step 1. Create the genotest file (required).
pileline-genotest --create-genotest-file <experiment.genotest> -p <GP_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline>

## Step 2. QC analysis.

# Generate a table of performance metrics at a given threshold.
pileline-genotest -a <experiment.genotest> -t <snpq_threshold>

# Generate all performance metrics for several thresholds.
pileline-genotest -a <experiment.genotest> --batch-t 0,255,1

# Generate values for a ROC curve plot (output file compatible with the ROCR R package).
pileline-genotest -a <experiment.genotest> --roc
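
To make the idea of threshold-based performance metrics concrete, here is a small Python sketch that computes sensitivity and specificity against a gold standard at several SNP quality thresholds. It is only an assumption-laden illustration: the exact metrics PileLine reports are not listed on this page, and the function name, data layout, and the rule "call a variant when quality >= threshold" are hypothetical.

```python
from typing import Dict, List, Tuple

# (SNP quality of the call, True if the gold standard marks the position as a variant)
Observation = Tuple[int, bool]


def metrics_at(observations: List[Observation], threshold: int) -> Dict[str, float]:
    """Count calls against the gold standard, calling a variant when quality >= threshold."""
    tp = sum(1 for quality, truth in observations if quality >= threshold and truth)
    fp = sum(1 for quality, truth in observations if quality >= threshold and not truth)
    fn = sum(1 for quality, truth in observations if quality < threshold and truth)
    tn = sum(1 for quality, truth in observations if quality < threshold and not truth)
    return {
        "threshold": threshold,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Made-up example data.
observations = [(10, False), (20, True), (35, False), (50, True), (90, True), (5, False)]

# One row per threshold; plotting sensitivity against (1 - specificity) gives a ROC curve.
table = [metrics_at(observations, t) for t in (0, 30, 60)]
for row in table:
    print(row)
```

Raising the threshold trades sensitivity for specificity, which is the relationship a ROC plot summarizes.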