Quick Start
Revision as of 11:23, 22 June 2010
PileLine Input Files
PileLine can handle, filter, and compare genomic position (GP) files, including standard pileup, BED, GFF, and VCF files.
GP files are tabular files whose first two columns contain the chromosome name and the position coordinate, respectively. PileLine accepts additional optional fields after these two columns. An example GP input file:
10 118829 optional1 optional2 optional3 ...
10 121207 optional1 optional2 optional3 ...
10 121337 optional1 optional2 optional3 ...
10 121636 optional1 optional2 optional3 ...
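As a quick sanity check before running PileLine, the two-column requirement above can be tested with a few lines of shell. This is only a sketch: `is_gp_file` is a hypothetical helper, not part of PileLine, and it assumes whitespace-separated columns and a purely numeric position in column 2.

```shell
# Hypothetical helper (not part of PileLine).
# Succeeds only if every line has at least two whitespace-separated columns
# and the second column (the position) is a non-negative integer.
is_gp_file() {
    awk 'NF < 2 || $2 !~ /^[0-9]+$/ { bad = 1 }
         END { exit (bad ? 1 : 0) }' "$1"
}
```

For example, `is_gp_file ./Control1.pileup && echo "looks like a GP file"` prints the message only when every line passes the check. Files with header or comment lines (as some BED, GFF, or VCF files have) would fail this simple test.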
PileLine Guided Example
1. Download GP example files (pileup format) to your working directory:
- Experiment 1.
- Experiment 2.
Each .zip file contains 2 pileup files:
- Whole pileup file (.pileup)
- Variants against reference genome pileup file (.variants.pileup).
2. Compare Case1 with Control1 at the variant level using pileline-2smc.sh. Use these commands:
$ cd DOWNLOADED_FILES_DIRECTORY
$ sh YOUR_PATH_TO_PILELINE/cmd/pileline-2smc.sh -a ./Control1.pileup -b ./Case1.pileup -v ./Control1varfilter.pileup -w ./Case1.variants.pileup -o ./myoutput1.txt
Running this command produces 4 output files:
- myoutput1.txt.onlyA: Variants found in Control1 but not in Case1 (i.e. reverted germ-line mutations or SNPs).
- myoutput1.txt.onlyB: Variants found in Case1 but not in Control1 (i.e. somatic mutations or SNPs).
- myoutput1.txt.both: Case1 and Control1 carry the same allele, and it differs from the reference genome allele (i.e. germ-line mutations or SNPs).
- myoutput1.txt.AdiscrepantB: Case1 and Control1 carry different alleles, and both differ from the reference genome allele (i.e. germ-line mutations that have mutated again, or SNPs).
The following table gives an example of each case:
Ref genome | Control (-a file) | Case (-b file) | Output File Name |
---|---|---|---|
T | A | T | myoutput1.txt.onlyA |
T | T | G | myoutput1.txt.onlyB |
T | A | A | myoutput1.txt.both |
T | C | G | myoutput1.txt.AdiscrepantB |
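The logic of the table can be sketched as a small shell function. This is an illustration of the four categories only, not PileLine's actual implementation: `classify_position` is a hypothetical name, and it assumes a single allele per sample at each position.

```shell
# Hypothetical illustration of the table above; not PileLine's code.
# Arguments: reference allele, Control (-a) allele, Case (-b) allele.
# Prints the suffix of the output file the position would fall into.
classify_position() {
    ref=$1; control=$2; case_allele=$3
    if [ "$control" = "$ref" ] && [ "$case_allele" = "$ref" ]; then
        echo "none"            # neither sample differs from the reference
    elif [ "$case_allele" = "$ref" ]; then
        echo "onlyA"           # variant in Control only
    elif [ "$control" = "$ref" ]; then
        echo "onlyB"           # variant in Case only
    elif [ "$control" = "$case_allele" ]; then
        echo "both"            # same non-reference allele in both samples
    else
        echo "AdiscrepantB"    # two different non-reference alleles
    fi
}
```

Calling `classify_position T C G` corresponds to the last row of the table. The `none` branch is an addition for completeness; the table does not list positions where both samples match the reference.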
Now, run pileline-2smc.sh to compare Case2 vs Control2:
$ sh YOUR_PATH_TO_PILELINE/cmd/pileline-2smc.sh -a ./Control2.pileup -b ./Case2.pileup -v ./Control2.variants.pileup -w ./Case2.variants.pileup -o ./myoutput2.txt
3. You can also compare multiple samples to report consistent variants. This is particularly useful for finding variants shared by biological replicates. Use the pileline-nsmc.sh command:
$ sh YOUR_PATH_TO_PILELINE/cmd/pileline-nsmc.sh -a ./Case1.variants.pileup -b ./Case2.variants.pileup -o ./mycommonvariants.txt
This example compares 2 samples (the Case1 and Case2 variants), but pileline-nsmc.sh can be used with n samples.
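To make the idea of "consistent variants" concrete, the sketch below reports the positions that appear in every one of n GP files. It rests on assumptions: `common_positions` is a hypothetical helper, and it matches on chromosome and position only, whereas pileline-nsmc.sh may also compare the alleles themselves.

```shell
# Hypothetical helper (not part of PileLine).
# Prints "chromosome<TAB>position" for every position present in ALL of the
# whitespace-separated GP files given as arguments.
common_positions() {
    awk -v n="$#" '
        FNR == 1 { nfile++ }                    # a new input file starts
        {
            key = $1 "\t" $2
            if (!((nfile, key) in seen)) {      # count each position once per file
                seen[nfile, key] = 1
                count[key]++
            }
        }
        END { for (key in count) if (count[key] == n) print key }
    ' "$@" | sort -k1,1 -k2,2n
}
```

A call such as `common_positions ./Case1.variants.pileup ./Case2.variants.pileup` would list the positions shared by both replicates. Empty input files are not counted by this sketch, so it should only be used with non-empty files.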
4. At this point it can be useful to annotate known SNPs among the variants found between Case1 and Control1, so that they can be discarded.
To do so, run the pileline-fastjoin.sh command as follows:
$ sh YOUR_PATH_TO_PILELINE/cmd/pileline-fastjoin.sh -a ./myoutput1.txt -b YOUR_PATH_TO_PILELINE/dbSNP130.txt --left-outer-join > ./mydbSNPannotation1.txt
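The effect of a left outer join on two GP files can be sketched in plain shell. This assumes the usual meaning of the term (every row of the left file is kept, and the matching row of the right file is appended when the chromosome and position agree); `left_outer_join` is a hypothetical helper, and the exact output layout of pileline-fastjoin.sh may differ.

```shell
# Hypothetical helper (not part of PileLine).
# $1: left GP file (every row is kept), $2: right GP file (annotations).
# Rows are matched on column 1 (chromosome) and column 2 (position).
left_outer_join() {
    awk '
        FILENAME == ARGV[1] {                   # first file read: the right file
            right[$1 "\t" $2] = $0
            next
        }
        {
            key = $1 "\t" $2
            if (key in right) print $0 "\t" right[key]   # annotated row
            else              print $0                   # no match: row kept as-is
        }
    ' "$2" "$1"
}
```

Rows that come out without an appended annotation are the variants with no match in the right file, which is what makes this kind of join useful for separating known SNPs from the remaining candidates. The sketch assumes the two files have different names.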