Main Page
Revision as of 13:28, 10 June 2010
Welcome to PileLine Wiki
PileLine (Pileup pipeLine) is a flexible command-line toolkit for efficient handling, filtering, and comparison of genomic position (GP) files produced by next-generation sequencing experiments (e.g. pileup files from SAMtools). PileLine is designed to be memory efficient: it performs on-disk operations directly over sorted GP files.
PileLine is available for download at: http://sourceforge.net/projects/pileline
Main Features
- Filtering and comparison of GP files.
- Full annotation of GP files with human dbSNP, HGNC Gene Symbol and Ensembl IDs. Custom annotations are also allowed and may be supplied through standard .BED or .GFF files.
- SIFT and PolyPhen-2 compatible outputs to facilitate the biological interpretation of huge lists of variants.
- Genotyping quality control functionality to estimate performance metrics (Harismendy et al. 2009) for detecting homozygous/heterozygous variants against a given gold standard genotype.
PileLine Commands
Processing Commands
- pileline-fastseek.sh
Prints a given range of a GP file.
- pileline-fastjoin.sh
Joins two positional files.
- pileline-rfilter.sh
Filters (or annotates) a positional file with range-based annotations (in BED format). Each position that falls inside a specified range is annotated.
- pileline-sort.sh
Sorts GP files by coordinate.
- pileline-genindex.sh
Indexes a FASTA genome so that range-based queries can then be performed on it.
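To make "sorted by coordinate" and "a given range" concrete, the sketch below uses only standard Unix tools on a tiny, made-up tab-separated file (chromosome, position, reference base). It is an illustration of the concepts, not PileLine's implementation: the file layout, the byte-order sorting of chromosome names, and the variable names are all assumptions.

```shell
# Build a tiny, unsorted GP-like file (hypothetical 3-column layout:
# chromosome, position, reference base; tab-separated).
gp_dir=$(mktemp -d)
printf 'chr2\t150\tA\nchr1\t900\tG\nchr1\t20\tT\nchr1\t100\tC\n' > "$gp_dir/sample.txt"

# Sort by chromosome name (byte order), then numerically by position.
LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n "$gp_dir/sample.txt" > "$gp_dir/sample.sorted.txt"

# Print the positions that fall inside chr1:50-1000 (both ends inclusive).
range_hits=$(awk -F '\t' '$1 == "chr1" && $2 >= 50 && $2 <= 1000' "$gp_dir/sample.sorted.txt")
printf '%s\n' "$range_hits"
```

The awk scan reads the whole file; a tool working on a sorted file can instead seek to the requested range, which is presumably why sorted input matters here.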
Analysis Commands
- pileline-2smc.sh
Looks for discrepancies between the genotypes of two samples (e.g. case vs. control). It can also annotate each output position using a positional file containing custom annotations (e.g. dbSNP), and it produces SIFT- and PolyPhen-2-compatible output files.
- pileline-nsmc.sh
Takes the output of several 2smc comparison commands and reports where variants are reproduced.
- pileline-genotest.sh
Estimates NGS genotyping performance by surveying a set of genomic positions whose genotype in the sample is known.
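The idea behind a two-sample comparison can be sketched with plain awk: match positions between two files and report those where the genotype calls differ. This is only a conceptual sketch under stated assumptions. The 3-column layout (chromosome, position, genotype call), the file names, and the in-memory lookup are hypothetical, and it ignores the variant files and minimum depth that pileline-2smc.sh takes as input.

```shell
work=$(mktemp -d)
# Hypothetical layout: chromosome, position, genotype call (IUPAC code).
printf 'chr1\t100\tA\nchr1\t200\tR\nchr1\t300\tG\n' > "$work/case.txt"
printf 'chr1\t100\tA\nchr1\t200\tG\nchr1\t300\tG\n' > "$work/control.txt"

# First file (control) is loaded into a lookup table keyed by chromosome and position;
# second file (case) is scanned and positions whose call differs are printed as:
# chromosome, position, case call, control call.
discrepancies=$(awk -F '\t' '
  NR == FNR { control[$1 FS $2] = $3; next }
  (($1 FS $2) in control) && control[$1 FS $2] != $3 {
    print $1 FS $2 FS $3 FS control[$1 FS $2]
  }' "$work/control.txt" "$work/case.txt")
printf '%s\n' "$discrepancies"
```

Unlike this sketch, which holds one sample in memory, PileLine is described above as working on disk over sorted files.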
Use Cases
- Perform 2 samples comparison
pileline-2smc.sh -a <GPfile_A.txt> -b <GPfile_B.txt> -v <variants_GPfile_A.txt> -w <variants_GPfile_B.txt> -o <out.txt> -d <min_depth>
- Perform n samples comparison
pileline-nsmc.sh --a-samples <GPfile_a1>,<GPfile_a2>,<GPfile_a3> --b-samples <GPfile_b1>,<GPfile_b2>,<GPfile_b3>
- Sort GP files
pileline-sort.sh -i <input_GP_file.txt> -o <outfile.sorted.txt>
- Annotate a GP file with dbSNP
pileline-fastjoin.sh -a <GP_file.txt> -b dbSNP130.txt --left-outer-join
- Annotate a GP file with genes
pileline-rfilter.sh --annotate -A <GP_file.txt> -b <genes.bed>
- Filter pileup to exon loci
pileline-rfilter.sh -A <GP_file.txt> -b <exons.bed>
- Perform a genotyping test for quality control
## Step 1.
# Create genotest file (required).
pileline-genotest --create-genotest-file <experiment.genotest> -p <GP_file.txt> -g <gold_genotype.sorted> -r <ref_genome.pileline>
## Step 2. QC analysis.
# Generate a metrics table of performance at a given threshold.
pileline-genotest -a <experiment.genotest> -t <snpq_threshold>
# Generate all performance metrics for several thresholds.
pileline-genotest -a <experiment.genotest> --batch-t 0,255,1
# Generate values for a ROC curve plot (outfile compatible with the ROCR R package).
pileline-genotest -a <experiment.genotest> --roc
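As a rough illustration of what performance "at a given threshold" can mean, the sketch below counts true and false positives and negatives for a made-up set of calls, then derives sensitivity and specificity. The two-column input (SNP quality, gold-standard status), the rule "call a variant when quality >= threshold", and the choice of metrics are assumptions for illustration; the metrics actually reported by pileline-genotest are not listed on this page.

```shell
qc_dir=$(mktemp -d)
# Hypothetical columns: SNP quality of the call, gold-standard status
# (1 = true variant, 0 = not a variant).
printf '30\t1\n10\t1\n50\t1\n5\t0\n40\t0\n0\t0\n' > "$qc_dir/calls.txt"
threshold=20

# A position counts as "called" when its SNP quality reaches the threshold.
metrics=$(awk -F '\t' -v t="$threshold" '
  { called = ($1 >= t) }
  called && $2 == 1   { tp++ }
  called && $2 == 0   { fp++ }
  !called && $2 == 0  { tn++ }
  !called && $2 == 1  { fn++ }
  END {
    printf "TP=%d FP=%d TN=%d FN=%d sensitivity=%.3f specificity=%.3f\n",
           tp, fp, tn, fn, tp / (tp + fn), tn / (tn + fp)
  }' "$qc_dir/calls.txt")
printf '%s\n' "$metrics"
```

Repeating the same count over a range of thresholds yields the pairs of sensitivity and false positive rate (1 - specificity) from which a ROC curve is drawn.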


