Breeders have always used pedigrees to manage the genetics of Thoroughbred horses. Pedigree analysis is firmly rooted in our understanding of classical, Mendelian genetics (Gregor Mendel, through his work on pea plants, discovered the fundamental laws of inheritance and deduced that genes come in pairs and are inherited as distinct units, one from each parent).
However, pedigrees only track relationships. The actual genetic variation present in any animal can only be assessed by reading the DNA itself. In 2006, the whole genome sequence of the horse was first reported. It was an expensive undertaking, but the cost of this work has since fallen so dramatically that sequencing the whole genome of a horse is now a routine activity in research laboratories.
The DNA sequence contains information about the extent of genetic variation, relationships to other horses and other breeds, and levels of inbreeding. It can even provide raw material for the discovery of potentially deleterious (harmful) genes that interfere with the success of the breeder.
On August 6, 2021, Teruaki Tozaki and colleagues from the Laboratory of Racing Chemistry and the Japan Racing Association published a landmark paper describing whole genome sequencing of 101 Thoroughbreds in Japan. While scientists have been identifying DNA variation in short regions for the last three decades, this study is unique in that these scientists collected data on all 2.41 billion DNA nucleotides (the basic structural units of DNA) of the 101 horses.
This work provides a baseline for comparison of different populations of Thoroughbreds, as well as a benchmark to assess changes over time. In addition, the variation found in this study can be compared to variation found in other breeds. Jagannathan and colleagues (2019 Anim. Genet. 50, 74–77) sequenced the whole genomes of 88 horses of diverse breeds including Warmblood, Standardbred, Quarter Horse, Arabian, Morgan, Franches-Montagnes, Paint, Icelandic, Shetland, Akhal-Teke, Noriker, Welsh ponies, and one Thoroughbred. The two studies were similar in design; their results are directly compared in the table below:
The total number of single nucleotide variants (SNVs) found in the Jagannathan study was almost twice as large as that found in the Tozaki study. This illustrates the great amount of diversity existing among horses of all breeds. However, when we examine the number of SNVs found in each horse (Max-Min), Thoroughbred horses fall within the range seen across the diverse breeds: the Jagannathan study reported 4.4 million to 6.6 million SNVs per animal, and the Thoroughbred counts, 4.8 million to 5.3 million, sit inside that range.
Two technical caveats bear mentioning here. First, SNVs are just one type of DNA variant. Other types exist, including DNA insertions, deletions, and repeats, so the total number of variants across all categories is certainly greater than the number of SNVs reported. Second, and perhaps more consequentially, the number of SNVs was determined by comparison with the genome of a single Thoroughbred mare (the equine reference genome). If a different breed were used as the reference, say a Shire horse, Thoroughbred horses would show a larger number of variants against this new reference, and Shires would show fewer.
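The reference-genome caveat can be illustrated with a toy calculation. The sketch below is only an illustration, not the method used in either study: the 12-letter sequences are invented, already aligned, and far shorter than any real genome, and the helper name `count_snvs` is hypothetical. It simply counts the positions at which a sample differs from whichever sequence is chosen as the reference.

```python
def count_snvs(sample: str, reference: str) -> int:
    """Count single-letter differences between two pre-aligned sequences."""
    if len(sample) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for s, r in zip(sample, reference) if s != r)


# Invented toy sequences (assumptions for illustration, not real horse DNA).
thoroughbred_ref = "ACGTACGTACGT"  # stands in for the Thoroughbred reference genome
shire_ref = "ACCTACGTTCGA"         # stands in for a hypothetical Shire reference
thoroughbred = "ACGTACGAACGT"      # a Thoroughbred sample
shire = "ACCTACGTTCGT"             # a Shire sample

for name, sample in [("Thoroughbred", thoroughbred), ("Shire", shire)]:
    vs_tb = count_snvs(sample, thoroughbred_ref)
    vs_shire = count_snvs(sample, shire_ref)
    print(f"{name}: {vs_tb} SNVs vs Thoroughbred reference, "
          f"{vs_shire} SNVs vs Shire reference")
```

With these made-up sequences, the Thoroughbred sample shows 1 difference against the Thoroughbred reference but 4 against the Shire reference, while the Shire sample shows 2 and 1, respectively. The animals have not changed; only the yardstick has.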
Regardless, these results suggest that the amount of variation found among Thoroughbreds is not exceptionally depleted when compared to the range of variation among other horse breeds.
Arguably, some of the most important outcomes of this study are the yet-to-be-generated products of these data. For instance:
- The information serves as a baseline of diversity to assess and model/predict changes in the future population resulting from current and evolving breeding practices.
- The 12.1 million genetic variants identified among these 101 Thoroughbreds can be assessed to determine which may cause fitness problems and which are desirable for health and racing performance.
- These data can be applied to assist in detecting inappropriate modifications of DNA, called gene doping, done in order to enhance racing performance.
~ Ernest Bailey, PhD / Ted Kalbfleisch, PhD / Jessica Petersen, PhD