Heterogeneous Stock Mice

This page contains data resources relating to the population of 2000 heterogeneous stock (HS) mice phenotyped for over 100 traits, and used in the papers:

Genome-wide genetic association of complex traits in heterogeneous stock mice. Nat Genet 2006 38(8):879-87

Genetic and environmental effects on complex traits in mice. Genetics. 2006 174(2):959-84.

A protocol for high-throughput phenotyping, suitable for quantitative trait analysis in mice. Mamm Genome. 2006 17(2):129-46.

In addition, we provide gene expression data from hippocampus, liver and lung, measured on subsets of these mice and published in:

High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Res. 2009 19(6):1133-40.

The data have also been used in a number of other studies.

These data were previously available from the defunct web site mus.well.ox.ac.uk/GSCAN.


  • The dataset we recommend for analysis comprises 10168 SNP genotypes mapped to build 37 of the mouse genome. The data are in this directory and are arranged by chromosome.
  • Each chromosome is represented by two files formatted for use by the R HAPPY package.
  • Files are named chrN.Build37.data and chrN.Build37.alleles, where N is the chromosome (1..19, X).
  • .data files are in ped format and contain the SNP genotypes.
  • .alleles files are in HAPPY alleles format and contain the genotypes of the eight founder strains of the HS at these SNPs.
  • Missing data are coded as NA.
  • The file mapfile.txt contains the bp coordinates of these SNPs relative to build 37.
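
The naming scheme above can be expanded into the full list of per-chromosome file pairs. The following is a minimal Python sketch, not part of the original resource: the function name happy_file_pairs and the directory argument are hypothetical, and it only builds paths without checking that the files exist.

```python
# Chromosome labels as described in the list above: autosomes 1..19 plus X.
CHROMOSOMES = [str(n) for n in range(1, 20)] + ["X"]


def happy_file_pairs(directory="."):
    """Return a (data_path, alleles_path) tuple for each chromosome.

    `directory` is a hypothetical local folder holding the downloaded files.
    """
    pairs = []
    for chrom in CHROMOSOMES:
        stem = f"{directory}/chr{chrom}.Build37"
        pairs.append((stem + ".data", stem + ".alleles"))
    return pairs


if __name__ == "__main__":
    for data_path, alleles_path in happy_file_pairs("genotypes"):
        print(data_path, alleles_path)
```

Each pair would then be passed to whatever tool reads the HAPPY input format; that step is not shown here.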


  • The raw phenotypes, together with relevant covariates, are in this directory as a collection of tab-delimited text files. Each file, for example Glucose.txt, contains a group of related phenotypes (in this case, measurements from the Glucose Tolerance Test) together with the covariates relevant to them (i.e. significantly associated at P<0.05). The phenotypes are combined in the file CombinedPhenotypes.txt.
  • Residual phenotypes corrected for relevant covariates are in this directory. We recommend using these for analysis. Each trait is in a separate file with extension .resid.
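
As an illustration of reading one of the tab-delimited phenotype files, here is a minimal Python sketch using only the standard library. It rests on assumptions not stated on this page: that each file has a header row, and that missing values are coded as NA as in the genotype files. The column names in the example (ID, TraitA, CovariateB) are hypothetical placeholders, not the real headers.

```python
import csv
import io


def read_phenotypes(handle, missing="NA"):
    """Read a tab-delimited table into a list of dicts, one per mouse.

    Assumes a header row; values equal to `missing` are replaced by None.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    rows = []
    for row in reader:
        rows.append({key: (None if value == missing else value)
                     for key, value in row.items()})
    return rows


# Hypothetical in-memory example standing in for a file such as Glucose.txt.
example = io.StringIO("ID\tTraitA\tCovariateB\nm1\t5.2\tF\nm2\tNA\tM\n")
rows = read_phenotypes(example)
```

For a real file, the same function can be called on an open file handle, for example `with open("Glucose.txt", newline="") as fh: rows = read_phenotypes(fh)`. Values are returned as strings and would need converting to numbers before analysis.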

Gene Expression Data

This directory contains transformed gene expression data for hippocampus, liver and lung. Each is held as an RData matrix. Rows are mice and columns are expression traits. Note that each expression trait has been transformed by a Box-Cox transformation.
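
For readers unfamiliar with the Box-Cox transformation, the standard one-parameter form is sketched below in Python. This shows the general formula only: the lambda values actually used for each expression trait are not given on this page, so the parameter `lam` here is purely illustrative.

```python
import math


def box_cox(y, lam):
    """Standard one-parameter Box-Cox transform of a positive value y.

    Returns log(y) when lam is 0, otherwise (y**lam - 1) / lam.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires a strictly positive value")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

Because the data provided are already transformed, this function is not needed to use them; it only documents what the transformation does.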