Scripts used in GBS pipeline

PERLJ.pl (written by Xuewen Wang)

Joins forward and reverse reads from paired-end data.  See manual for further information.

MERL.pl (written by Debkanta Chakraborty)

Pads reads with As at the end until they reach a desired length.  As are used because Ns would be treated as low-quality bases.

Usage: perl MERL.pl filename.fq N

N is the total read length required
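The padding step can be illustrated with a minimal Python sketch.  This is not the MERL.pl code: the function names, the 4-line FASTQ record layout, and the quality character used for the padded bases ('I', Phred+33) are assumptions made for illustration only, since how MERL.pl treats the quality line is not described here.

```python
def pad_read(seq, qual, target_len, pad_base="A", pad_qual="I"):
    """Pad a read and its quality string up to target_len.

    Reads already at or above target_len are returned unchanged.
    pad_qual is an assumed placeholder quality (Phred+33 'I').
    """
    missing = target_len - len(seq)
    if missing <= 0:
        return seq, qual
    return seq + pad_base * missing, qual + pad_qual * missing


def pad_fastq(lines, target_len):
    """Yield FASTQ lines with every read padded (assumes 4-line records)."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = (line.rstrip("\n") for line in lines[i:i + 4])
        seq, qual = pad_read(seq, qual, target_len)
        yield from (header, seq, plus, qual)
```

The quality string is padded together with the sequence so that each record stays a valid FASTQ entry.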

RIOL.pl (written by Debkanta Chakraborty)

Prepares the reads for the script RSR.pl.  The output file (filename.fq.one.line) is the input for RSR.pl.

Usage: perl RIOL.pl filename.fq

RSR.pl (written by Debkanta Chakraborty)

Removes reads shorter than a defined length.  RIOL.pl must be run on the reads first.

Usage: perl RSR.pl filename.fq.one.line N

N: minimum length; all reads shorter than N bp are removed

The output file 'filename.CapturedLengths.fq' contains all reads with a length of N bp or more

The output file 'filename.IgnoredLengths.fq' contains all reads with a length smaller than N
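The split into captured and ignored reads can be sketched in Python as follows.  This is an illustration only, not the RSR.pl code: the function name and the representation of a read as a (header, sequence, plus, quality) tuple are assumptions.

```python
def split_by_length(records, min_len):
    """Split FASTQ records into (captured, ignored) by sequence length.

    records: iterable of (header, seq, plus, qual) tuples (assumed layout).
    Reads with len(seq) >= min_len are captured; shorter reads are ignored.
    """
    captured, ignored = [], []
    for record in records:
        seq = record[1]
        if len(seq) >= min_len:
            captured.append(record)
        else:
            ignored.append(record)
    return captured, ignored
```

A read of exactly N bp ends up in the captured set, matching the description of the two output files above.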

FCT.pl (written by Debkanta Chakraborty)

Selects Cstack tags that are uniquely present in a specified number of accessions.  

Usage: perl FCT.pl batch_1.catalog.tags.tsv N X

N: minimum number of accessions that have contributed to a Cstack tag

X: maximum number of accessions that have contributed to a Cstack tag (typically this will be the total number of samples analyzed by Ustacks).
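The N and X thresholds can be illustrated with a short Python sketch.  This is not the FCT.pl code, and it does not parse batch_1.catalog.tags.tsv: it assumes the file has already been read into a dictionary mapping each tag ID to the list of accessions that contributed to it.  It also assumes that "uniquely present" means no accession contributes more than once to a tag; both the function name and that interpretation are assumptions.

```python
def select_tags(tag_samples, min_acc, max_acc):
    """Return tag IDs whose number of contributing accessions is in [min_acc, max_acc].

    tag_samples: dict of tag ID -> list of accession IDs (assumed, pre-parsed).
    Tags in which any accession appears more than once are skipped
    (assumed meaning of "uniquely present").
    """
    selected = []
    for tag_id, samples in tag_samples.items():
        distinct = set(samples)
        if len(distinct) != len(samples):
            continue  # an accession contributed more than one stack
        if min_acc <= len(distinct) <= max_acc:
            selected.append(tag_id)
    return selected
```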

SNPs_ISL.pl (written by Debkanta Chakraborty)

Prepares the .vcf file (output from GATK) for the script Rm_adj_SNPs.pl

Usage: perl SNPs_ISL.pl input_file.vcf

Rm_adj_SNPs.pl (written by Debkanta Chakraborty)

Removes adjacent SNPs from the .vcf file (output from SNPs_ISL.pl script).

Usage: perl Rm_adj_SNPs.pl filename
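One way to picture the removal of adjacent SNPs is the Python sketch below.  It is not the Rm_adj_SNPs.pl code.  It assumes that SNPs are given as (chromosome, position) pairs sorted by chromosome and position, that "adjacent" means within min_gap bp of another SNP on the same chromosome (1 bp by default), and that every SNP in an adjacent group is dropped; the function name and the min_gap parameter are hypothetical.

```python
def remove_adjacent(snps, min_gap=1):
    """Drop every SNP that lies within min_gap bp of a neighbouring SNP.

    snps: list of (chrom, pos) tuples, sorted by chrom and then pos (assumed).
    """
    kept = []
    for i, (chrom, pos) in enumerate(snps):
        prev_close = (
            i > 0
            and snps[i - 1][0] == chrom
            and pos - snps[i - 1][1] <= min_gap
        )
        next_close = (
            i + 1 < len(snps)
            and snps[i + 1][0] == chrom
            and snps[i + 1][1] - pos <= min_gap
        )
        if not (prev_close or next_close):
            kept.append((chrom, pos))
    return kept
```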

SNP_genotyper.py

Scores the SNPs as A, B, H, C and D from a filtered GATK file

SNP_cosegregation.py

Retains only a single SNP when two or more SNPs cosegregate
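The idea can be sketched in Python as follows.  This is not the SNP_cosegregation.py code: it assumes that SNPs cosegregate when their scores are identical across all progeny, that the first SNP of each group is the one retained, and that missing data needs no special handling.  The function name and the input dictionary are hypothetical.

```python
def drop_cosegregating(snp_scores):
    """Keep the first SNP for each distinct score pattern.

    snp_scores: dict of SNP ID -> sequence of scores across progeny (assumed).
    """
    seen = set()
    kept = []
    for snp_id, scores in snp_scores.items():
        pattern = tuple(scores)
        if pattern not in seen:
            seen.add(pattern)
            kept.append(snp_id)
    return kept
```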

SNP_selectByParent.py

Removes all SNPs that are not homozygous for different alleles in the parents
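The parental filter can be sketched in Python as follows.  This is not the SNP_selectByParent.py code: it assumes that parental genotypes are available as VCF-style GT strings such as "0/0" or "1|1", and that a SNP with a missing parental call is removed.  The function name is hypothetical.

```python
import re


def is_informative(parent1_gt, parent2_gt):
    """Return True if both parents are homozygous for different alleles.

    Genotypes are VCF-style GT strings, e.g. "0/0" or "1|1" (assumed input).
    A missing call (".") in either parent makes the SNP uninformative.
    """
    alleles1 = re.split(r"[/|]", parent1_gt)
    alleles2 = re.split(r"[/|]", parent2_gt)
    if "." in alleles1 or "." in alleles2:
        return False
    homozygous1 = len(set(alleles1)) == 1
    homozygous2 = len(set(alleles2)) == 1
    return homozygous1 and homozygous2 and alleles1[0] != alleles2[0]
```

For example, is_informative("0/0", "1/1") is True, while a SNP with a heterozygous parent ("0/1") or with both parents carrying the same allele would be removed.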