Germline - Genetic Error-tolerant Regional Matching with LINear-time Extension

About

GERMLINE is a program for discovering long segments shared identical by descent (IBD) between pairs of individuals in a large population. It takes as input genotype or haplotype marker data for individuals (as well as an optional known pedigree) and generates a list of all pairwise shared segments.

GERMLINE uses a novel hashing and extension algorithm that identifies segments in haplotype data in time proportional to the number of individuals. Presently, GERMLINE executes on phased data only. GERMLINE can identify shared segments of any specified minimum length and can tolerate any specified number of mismatching markers.
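The hashing and extension idea can be illustrated with a small sketch. This is not GERMLINE's actual implementation, only a minimal Python illustration under stated assumptions: haplotypes are equal-length allele strings, markers are cut into fixed-size slices, haplotypes that are identical within a slice are bucketed together by hashing (the seed), and each seed is extended into neighbouring slices as long as a slice has at most max_err mismatches. The names find_shared_segments, slice_size and min_slices are hypothetical; max_err mirrors the -max_err option.

```python
from collections import defaultdict
from itertools import combinations


def slice_mismatches(a, b, start, end):
    """Count positions in [start, end) where two haplotype strings differ."""
    return sum(1 for x, y in zip(a[start:end], b[start:end]) if x != y)


def find_shared_segments(haplotypes, slice_size=4, max_err=1, min_slices=2):
    """Return (id_a, id_b, start_marker, end_marker) tuples, end exclusive.

    haplotypes maps a haplotype id to a string of alleles, all of equal
    length. All parameter names here are illustrative assumptions.
    """
    n_markers = len(next(iter(haplotypes.values())))
    n_slices = n_markers // slice_size  # a trailing partial slice is ignored
    bounds = [(i * slice_size, (i + 1) * slice_size) for i in range(n_slices)]

    covered = set()  # (pair, slice index) already examined by an extension
    segments = []
    for s, (lo, hi) in enumerate(bounds):
        # Hashing step: bucket haplotypes by their exact content in this slice.
        buckets = defaultdict(list)
        for hap_id, seq in haplotypes.items():
            buckets[seq[lo:hi]].append(hap_id)

        for ids in buckets.values():
            for a, b in combinations(sorted(ids), 2):
                if ((a, b), s) in covered:
                    continue
                # Extension step: grow the seed while the neighbouring slice
                # differs at no more than max_err markers.
                left = s
                while left > 0 and slice_mismatches(
                        haplotypes[a], haplotypes[b], *bounds[left - 1]) <= max_err:
                    left -= 1
                right = s
                while right + 1 < n_slices and slice_mismatches(
                        haplotypes[a], haplotypes[b], *bounds[right + 1]) <= max_err:
                    right += 1
                for k in range(left, right + 1):
                    covered.add(((a, b), k))
                if right - left + 1 >= min_slices:
                    segments.append((a, b, bounds[left][0], bounds[right][1]))
    return segments


example = {
    "h1": "001101011100",
    "h2": "001101111100",  # differs from h1 at one marker in the middle slice
    "h3": "110010100011",
}
print(find_shared_segments(example, slice_size=4, max_err=1))
# prints [('h1', 'h2', 0, 12)]
```

Only haplotypes that land in the same bucket are ever compared, so the bucketing pass touches each haplotype once per slice; the remaining work grows with the number of seed matches rather than with all possible pairs.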

The program has been developed in Itsik Pe'er's Lab of Computational Genetics at Columbia University. It is built in C++ and tested in the Red Hat Linux environment; the source is distributed here in a tar.gz package under the GPL license.

  Download: germline 1.3 (09.03.08)

Usage

From the command line, extract germline with tar xzvf germline-X-X.tar.gz, enter the extracted directory, and compile germline with make. The executable is run as germline <options>; it prompts the user for input/output file information and then runs the algorithm.
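For scripted runs, the option list can be assembled programmatically. The following is a hedged sketch, not part of GERMLINE: build_germline_command is a hypothetical helper, the default path ./germline assumes the executable sits in the current directory after make, and it assumes that flags with a value take that value as the next argument (for example -min_m 5), which this document does not spell out. The flag names themselves come from the Options section.

```python
def build_germline_command(executable="./germline", map_file=None, min_m=None,
                           max_err=None, from_snp=None, to_snp=None,
                           haps=False, print_haplotypes=False):
    """Assemble an argument list from the flags in the Options section.

    Assumption: valued flags are followed by their value as a separate
    argument; check this against the program before relying on it.
    """
    cmd = [executable]
    valued = [
        ("-map", map_file),
        ("-min_m", min_m),
        ("-max_err", max_err),
        ("-from_snp", from_snp),
        ("-to_snp", to_snp),
    ]
    for flag, value in valued:
        if value is not None:
            cmd += [flag, str(value)]
    # Switches without a value are appended only when requested.
    if haps:
        cmd.append("-haps")
    if print_haplotypes:
        cmd.append("-print")
    return cmd


print(build_germline_command(min_m=5, max_err=2, haps=True))
# prints ['./germline', '-min_m', '5', '-max_err', '2', '-haps']
```

The resulting list can be handed to subprocess.run. Because the program prompts for input/output file information, those answers still have to be supplied interactively or on standard input.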

Input

GERMLINE accepts as input the following formats:

Output

Upon completion, GERMLINE generates a .match and .log file in the specified location. Each line in the .match file corresponds to a pairwise shared segment, with the following fields:

Options

The program has several command line options to direct the segmental sharing process:

Flag       Default  Description
-map       -        File location for genetic distance map. Uses the PLINK map format.
-min_m     5        Minimum length for match to be used for imputation (in cM or Mb).
-max_err   2        The maximum number of mismatching markers for a slice to still be considered part of a match.
-from_snp  -        Indicate the ID of the first SNP to start processing from.
-to_snp    -        Indicate the ID of the last SNP to end processing with.
-haps      -        Print all haplotypes with imputed information in a separate *.haps file.
-print     -        Print the haplotype sequence for each match along with match information (Warning: This may require a large amount of free space).
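To make the interaction between -map and -min_m concrete, the sketch below measures a segment against a PLINK-format map and applies a minimum-length threshold. It is an illustration only, not GERMLINE's code. The PLINK map format has four whitespace-separated columns (chromosome, SNP ID, genetic distance, base-pair position); the sketch assumes the genetic distance column is in cM. The function names and the SNP IDs rs1, rs2 and rs3 are hypothetical.

```python
def read_plink_map(lines):
    """Parse PLINK .map lines: chromosome, SNP ID, genetic distance, bp position.

    Returns {snp_id: (chromosome, genetic_distance, bp_position)}.
    Assumption: the genetic distance column is given in cM.
    """
    positions = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        chrom, snp_id, genetic, bp = fields
        positions[snp_id] = (chrom, float(genetic), int(bp))
    return positions


def segment_length(positions, first_snp, last_snp, unit="cM"):
    """Length between two SNPs on the same chromosome, in cM or Mb."""
    chrom_a, cm_a, bp_a = positions[first_snp]
    chrom_b, cm_b, bp_b = positions[last_snp]
    if chrom_a != chrom_b:
        raise ValueError("SNPs are on different chromosomes")
    if unit == "cM":
        return abs(cm_b - cm_a)
    return abs(bp_b - bp_a) / 1e6  # base pairs to megabases


def passes_min_m(positions, first_snp, last_snp, min_m=5.0, unit="cM"):
    """True when the segment is at least min_m long in the chosen unit."""
    return segment_length(positions, first_snp, last_snp, unit) >= min_m


map_lines = [
    "1 rs1 0.0 1000000",
    "1 rs2 3.2 4000000",
    "1 rs3 6.5 9000000",
]
positions = read_plink_map(map_lines)
print(passes_min_m(positions, "rs1", "rs3"))  # prints True  (6.5 cM >= 5)
print(passes_min_m(positions, "rs1", "rs2"))  # prints False (3.2 cM < 5)
```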

Changes

1.3 (09.03.08)
Output format has changed to provide more detailed SNP information (see above).
Can now iteratively process multi-chromosomal data (for PLINK / PED format only).
Genotype calling has been removed for the time being.
Genetic map restructured (see above) and processed as a parameter.

1.2 (08.12.08)
Updated the HapMap format input - auto-detection of trio or unrelated input.

1.1 (06.09.08)
Added options to perform analysis on a specific region (see -from_snp, -to_snp flags).
Added options to print haplotypes and matches (see -haps, -print flags).

Contact

For any questions or comments regarding this program, please contact the developers at:
{gusev,itsik}@cs.columbia.edu