Main Page


Revision as of 13:28, 10 June 2010

Welcome to PileLine Wiki

PileLine (Pileup pipeLine) is a flexible command-line toolkit for efficient handling, filtering, and comparison of genomic position (GP) files produced by next-generation sequencing experiments (e.g. pileup files from SAMtools). PileLine is designed to be memory efficient: it performs on-disk operations directly over sorted GP files instead of loading them into memory.
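To illustrate why coordinate-sorted input matters, here is a minimal sketch using only standard `sort` and `awk`. It is not PileLine's implementation: the helper names `gp_sort` and `gp_range` are hypothetical, and the sketch assumes a tab-separated file with the chromosome in column 1 and a 1-based position in column 2. Once the file is sorted, a range query can stream it line by line and stop as soon as it passes the requested range, so memory use stays constant regardless of file size.

```shell
# Sort a tab-separated GP file by chromosome (byte order), then numeric position.
gp_sort() {
    # usage: gp_sort FILE
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$1"
}

# Print the lines of a sorted GP file that fall in CHROM:START-END (inclusive).
gp_range() {
    # usage: gp_range SORTED_FILE CHROM START END
    awk -F '\t' -v chrom="$2" -v start="$3" -v end="$4" '
        $1 == chrom {
            seen = 1
            if ($2 + 0 > end + 0) exit        # past the range: stop reading
            if ($2 + 0 >= start + 0) print    # inside the range
            next
        }
        seen { exit }                         # left the chromosome block
    ' "$1"
}
```

For example, `gp_sort sample.pileup > sample.sorted.pileup` followed by `gp_range sample.sorted.pileup chr1 1000 2000` prints only the requested slice (the file names are placeholders).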

PileLine is available for download at: http://sourceforge.net/projects/pileline

Main Features

  1. Filtering and comparison of GP files.
  2. Full annotation of GP files with human dbSNP, HGNC Gene Symbol and Ensembl IDs. Custom annotations are also allowed and may be supplied through standard .BED or .GFF files.
  3. SIFT and PolyPhen-2 compatible outputs to facilitate the biological interpretation of huge lists of variants.
  4. Genotyping quality control functionality to estimate performance metrics (Harismendi et al. 2009) for detecting homozygous/heterozygous variants against a given gold standard genotype.

PileLine Commands

Processing Commands

  • pileline-fastseek.sh

Prints a given range of a GP file.

  • pileline-fastjoin.sh

Joins two positional files.

  • pileline-rfilter.sh

Filters (or annotates) a positional file with range-based annotations (in BED format). Each position that falls inside a given range is annotated.

  • pileline-sort.sh

Sorts GP files by coordinate.

  • pileline-genindex.sh

Indexes a FASTA genome so that range-based queries can then be performed on it.
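The range-based filtering described for pileline-rfilter.sh can be sketched with standard `awk`. This is an illustration of the idea only, not PileLine's code: the helper name `gp_filter_bed` is hypothetical, the GP file is assumed to be tab-separated with chromosome and 1-based position in columns 1 and 2, and the BED file is assumed to have no `track` or `browser` header lines. BED intervals are 0-based and half-open, so a 1-based position `p` lies inside an interval when `start < p <= end`.

```shell
# Keep only the GP lines whose position falls inside some BED interval.
gp_filter_bed() {
    # usage: gp_filter_bed GP_FILE BED_FILE
    awk -F '\t' '
        FILENAME == ARGV[1] {                 # first operand: load BED intervals
            n[$1]++
            s[$1, n[$1]] = $2 + 0             # 0-based start (exclusive for 1-based p)
            e[$1, n[$1]] = $3 + 0             # end (inclusive for 1-based p)
            next
        }
        {                                     # second operand: test each GP line
            pos = $2 + 0
            for (i = 1; i <= n[$1]; i++)
                if (pos > s[$1, i] && pos <= e[$1, i]) { print; break }
        }
    ' "$2" "$1"
}
```

The sketch holds all intervals in memory and scans them linearly for each position, which is fine for small annotation sets but does not reflect the on-disk approach PileLine is designed around.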

Analysis Commands

  • pileline-2smc.sh

Looks for genotype discrepancies between two samples (e.g. case vs. control). It can also annotate each output position using a positional file containing custom annotations (e.g. dbSNP), and it produces SIFT- and PolyPhen-2-compatible output files.

  • pileline-nsmc.sh

Takes the output of several 2smc comparisons and reports where variants are reproduced.

  • pileline-genotest.sh

Calculates NGS genotyping performance by surveying a set of genomic positions whose genotype is known in the sample.
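As a rough illustration of what a two-sample genotype comparison involves, the following sketch reports positions present in both files whose consensus calls differ. It is not how pileline-2smc.sh works internally: the helper name `gp_discrepancies` is hypothetical, the input is assumed to be in the SAMtools consensus pileup layout (chromosome, position, reference base, consensus call in columns 1 to 4), and depth or quality filtering such as the `-d <min_depth>` option is ignored.

```shell
# Print chrom, pos, call in A, call in B for shared positions with differing calls.
gp_discrepancies() {
    # usage: gp_discrepancies PILEUP_A PILEUP_B
    awk -F '\t' -v OFS='\t' '
        FILENAME == ARGV[1] { call[$1, $2] = $4; next }   # remember calls from A
        ($1, $2) in call && call[$1, $2] != $4 {          # shared position, different call
            print $1, $2, call[$1, $2], $4
        }
    ' "$1" "$2"
}
```

Unlike PileLine, this sketch keeps every call from the first file in memory, so it is only suitable for small inputs.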

Use Cases

PileLine, coupled with SAMtools, facilitates pileup handling.
  • Perform a two-sample comparison
    pileline-2smc.sh \
      -a <GPfile_A.txt> -b <GPfile_B.txt> \
      -v <variants_GPfile_A.txt> -w <variants_GPfile_B.txt> \
      -o <out.txt> -d <min_depth>
  • Perform an n-sample comparison
    pileline-nsmc.sh \
      --a-samples <GPfile_a1>,<GPfile_a2>,<GPfile_a3> \
      --b-samples <GPfile_b1>,<GPfile_b2>,<GPfile_b3>
  • Sort GP files
    pileline-sort.sh -i <input_GP_file.txt> -o <outfile.sorted.txt>
  • Annotate a GP file with dbSNP
    pileline-fastjoin.sh -a <GP_file.txt> -b dbSNP130.txt --left-outer-join
  • Annotate a GP file with genes
    pileline-rfilter.sh --annotate -A <GP_file.txt> -b <genes.bed>
  • Filter a pileup file to exon loci
    pileline-rfilter.sh -A <GP_file.txt> -b <exons.bed>
  • Perform a genotyping test for quality control
    ## Step 1. Create the genotest file (required).
    pileline-genotest.sh --create-genotest-file <experiment.genotest> -p <GP_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline>

    ## Step 2. QC analysis.

    # Generate a table of performance metrics at a given threshold.
    pileline-genotest.sh -a <experiment.genotest> -t <snpq_threshold>

    # Generate all performance metrics for several thresholds.
    pileline-genotest.sh -a <experiment.genotest> --batch-t 0,255,1

    # Generate values for a ROC curve plot (output file compatible with the ROCR R package).
    pileline-genotest.sh -a <experiment.genotest> --roc
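
The dbSNP annotation use case above relies on a left outer join: every line of the GP file is kept, and annotation is appended where a matching position exists. A minimal sketch of that semantics with standard `awk` follows. It is an illustration under assumptions, not PileLine's implementation: the helper name `gp_left_join` is hypothetical, the annotation file is assumed to hold chromosome, 1-based position, and an identifier in columns 1 to 3 (the real layout of `dbSNP130.txt` may differ), and `.` is an arbitrary placeholder for unmatched positions.

```shell
# Append the annotation identifier to each GP line, or "." when there is no match.
gp_left_join() {
    # usage: gp_left_join GP_FILE ANNOTATION_FILE
    awk -F '\t' -v OFS='\t' '
        FILENAME == ARGV[1] { ann[$1, $2] = $3; next }    # load annotations by position
        { print $0, (($1, $2) in ann ? ann[$1, $2] : ".") }
    ' "$2" "$1"
}
```

Because every GP line is printed exactly once, the output has the same number of lines as the input GP file, which makes the result easy to sanity-check.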