GenLang Consortium


1. Who conducted this study and why?

The researchers of this study have expertise in the areas of genetics and/or reading and language skills, including difficulties in developing these skills. They include scientists who are members of the Genetics of Language consortium, an international research consortium which aims to facilitate large-scale genomic investigations of speech, language, reading, and related skills. The researchers have teamed up with the consumer genetics company, 23andMe, in order to study the world's largest database of DNA and dyslexia diagnosis information. The team is interested in understanding how the genes that a person carries might help to explain why some individuals have trouble with reading and spelling while others do not. In this study, the researchers applied a powerful gene discovery analysis method called genome-wide association, which uses information from unrelated individuals in the population to test for association between common DNA variations called single nucleotide polymorphisms (SNPs) and whether or not a person has had a dyslexia diagnosis. For studies of this kind to give reliable results, they must be done in very large samples, such as that available in 23andMe. Besides understanding which genetic variants may contribute to dyslexia propensity, the results of this analysis can point to relevant biological pathways (like those involved in the development and wiring of the brain). The findings can also tell us what other behaviours are genetically related to dyslexia, leading to further research to disentangle these relationships.

2. What is dyslexia?

Dyslexia is a common learning difficulty that mainly causes problems with reading, writing and spelling. These kinds of difficulties were first recognised in the 1870s, when it was also noticed that they tend to cluster in families. But it wasn't until the 1960s, when a consensus definition of dyslexia was reached, that we saw more widespread identification of children with dyslexia. Dyslexia is typically given as a diagnosis if reading and spelling abilities are poor and much lower than a person's other academic skills or cognitive abilities. A clinical diagnosis would thus require testing the person on a range of cognitive tests and comparing these results. Further, to give a diagnosis of dyslexia, the poor reading and spelling should not be due to problems with vision, hearing or other obvious underlying health conditions. In this study, because information was needed from many hundreds of thousands of unrelated people, the researchers did not carry out new tests of reading, spelling and other relevant skills, but instead relied on a self-report questionnaire that asked whether or not the person had received a diagnosis of dyslexia.

3. Can dyslexia be inherited?

Early observations of dyslexia found that it tends to run in families. In other words, you are more likely to have dyslexia if you have a close relative who has it. But that is not enough to show involvement of genetic factors, because families typically share not only their genes but also their environments. Later twin studies compared genetically identical and non-identical twins growing up in the same environment, where at least one of the twins had dyslexia. They found that the likelihood of the second twin also having dyslexia was higher for identical than non-identical twins. These kinds of studies confirmed that dyslexia was strongly heritable. Heritability is a term that refers to the proportion of differences between people in a population that are accounted for by genetic variations in that particular population. Importantly, a high heritability doesn't imply a simple genetic explanation. It doesn't tell you anything about the numbers of different genes contributing to the trait (which could be very large), nor can it identify which genes are involved.

4. What is a GWAS?

A genome-wide association study (GWAS) tests whether individual DNA variations at different sites across the entire genome (each of the nucleotide bases where people differ: adenine, A; cytosine, C; guanine, G; thymine, T) are statistically associated with an outcome of interest, in this case whether or not the person has been diagnosed with dyslexia. A person carries a pair of these individual nucleotide bases (one inherited from each biological parent). For example, if a particular site in the genome has two alternative versions of DNA sequence (known as alleles), one with an A and another with a G, a person can have the following pairs: AA, AG, GG. Because there is variation in the genotype at this sequence position we can test whether this variation relates to variation in the outcome. For instance, do people with dyslexia more often carry an A than a G allele at this position? There are millions of SNPs spanning the genome where people commonly differ and each of these can be tested for their association with an outcome, in a systematic search across all 23 pairs of chromosomes. Since we know that a single common SNP by itself is likely to make only a tiny contribution to the overall outcome, a GWAS must analyse very large numbers of people to be able to detect associations with confidence.
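The single-site comparison described above (do people with dyslexia more often carry an A than a G?) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genotype counts are invented toy data, the function names are hypothetical, and the simple allele-count chi-square test shown here stands in for the regression models with additional covariates that large studies typically use.

```python
import math

def allele_counts(genotypes, allele="A"):
    """Count copies of `allele` and of every other allele in genotype strings such as 'AG'."""
    target = sum(g.count(allele) for g in genotypes)
    total = sum(len(g) for g in genotypes)
    return target, total - target

def allelic_association(case_genotypes, control_genotypes, allele="A"):
    """Return (odds ratio, chi-square statistic, p-value) for one SNP."""
    a, b = allele_counts(case_genotypes, allele)     # cases: A alleles, other alleles
    c, d = allele_counts(control_genotypes, allele)  # controls: A alleles, other alleles
    odds_ratio = (a * d) / (b * c)
    n = a + b + c + d
    # Pearson chi-square for a 2x2 table (1 degree of freedom, no continuity correction)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Toy data (invented): 100 people with a dyslexia diagnosis, 100 without
cases = ["AA"] * 30 + ["AG"] * 50 + ["GG"] * 20
controls = ["AA"] * 20 + ["AG"] * 50 + ["GG"] * 30

odds_ratio, chi2, p_value = allelic_association(cases, controls)
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi2:.2f}, p = {p_value:.3f}")
```

In a real GWAS this comparison is repeated for millions of SNPs, so a much stricter significance threshold is needed than for a single test, which is one reason why such large samples are required.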

5. If a SNP shows association in a GWAS, does that mean it causes the trait?

No. A GWAS is a purely correlational approach, so it cannot establish causation. And because SNPs located close together on a chromosome tend to be inherited together, they will show similar associations with an outcome even though only one (or a few) of these SNPs might be having a biological effect. Indeed, the critical DNA variant might be very rare and not even tested for association, but because it has been inherited alongside the common (SNP) variant, that SNP will show an association signal. Other important types of genetic differences between people (e.g., stretches of DNA sequence that are missing, known as deletions) may also be inherited alongside common SNPs, and again, such neighbouring SNPs will show association although it is the deletion that is impacting on the trait. Another reason why associations from a GWAS should not be considered causal is related to potential differences in effects across different environments (e.g. language, country, socio-economics). SNP associations found in one particular population at a certain time in history might not be found in other populations and/or at other times. That is, a gene might have differential influences on human biology (e.g. on brain development and cognitive functions) depending on the environment in which it is found.

Additionally, a GWAS has the potential to find spurious (i.e. untrue) associations between a SNP and outcome simply because the genetic ancestry of a sample hasn't been properly controlled for. If an outcome differs between people with different ancestral history due to systematic cultural/environmental reasons, then an association will arise with any allele that differs in frequency between ancestry groups, irrespective of the contributions of the gene to that outcome. So, GWAS associations are the starting point for further investigations of why the association exists.
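The ancestry problem described above can be made concrete with a small worked example in Python. All numbers are invented for illustration: two hypothetical ancestry groups differ both in how common the A allele is and in how often dyslexia is diagnosed, but within each group the allele is equally common in people with and without a diagnosis.

```python
def odds_ratio(case_a, case_g, control_a, control_g):
    """Odds ratio for carrying allele A, from allele counts in cases and controls."""
    return (case_a * control_g) / (case_g * control_a)

# Allele counts (invented): (case A, case G, control A, control G)
# Group 1: A allele frequency 0.8, 20% of people report a diagnosis
group_1 = (320, 80, 1280, 320)
# Group 2: A allele frequency 0.2, 5% of people report a diagnosis
group_2 = (20, 80, 380, 1520)

# Pooling the groups without accounting for ancestry
pooled = tuple(x + y for x, y in zip(group_1, group_2))

or_group_1 = odds_ratio(*group_1)
or_group_2 = odds_ratio(*group_2)
or_pooled = odds_ratio(*pooled)

print(f"group 1: {or_group_1:.2f}, group 2: {or_group_2:.2f}, pooled: {or_pooled:.2f}")
```

Within each group the odds ratio is exactly 1 (no association), yet the pooled analysis suggests that the A allele more than doubles the odds of a diagnosis. This is why GWAS analyses take steps to account for genetic ancestry.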

6. Could associated SNPs from a GWAS predict whether a person will have dyslexia?

Dyslexia is a complex trait and is likely influenced by thousands of different DNA variations (located in a great many different genes) each with very small effect. These variants operate probabilistically, meaning that if you were to carry many or even all of the common SNP alleles associated with dyslexia there is still no certainty that you would develop dyslexia. Moreover, since learning to read and spell is a product of the intertwining of nature with nurture, and because genes interact with the environments they find themselves in, associated SNPs might only be useful in indicating greater likelihood of dyslexia in specific environments. Therefore, if one were to use SNPs to try to predict the likelihood of a person being dyslexic, there would be a large amount of error around this prediction. Currently, family history of dyslexia would be better than SNPs for estimating a child's genetic predisposition to dyslexia.

7. What is a polygenic score?

A polygenic score (also called a polygenic index) combines the effects of relevant SNPs from the GWAS of the outcome of interest to give a total value for an individual based on all the common DNA variation in that individual's genome. A higher polygenic index for an individual would indicate greater genetic propensity for that outcome, in this case dyslexia. The polygenic index is a better 'predictor' (in a statistical sense) of outcomes than a single SNP, because each person carries a mixture of SNPs that decrease and increase the likelihood of an outcome, so the net effect of these gives a more reliable estimate of overall genetic predisposition. A polygenic index can be informative with regard to characterising general tendencies in the population; that is, people with lower polygenic scores tend not to have dyslexia. However, this is not reliable enough to predict an individual person's likelihood of being dyslexic. Furthermore, because most GWAS efforts have so far focused on people of white European ancestry, the polygenic index would be even less reliable if used to investigate people from different backgrounds.
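In its simplest form, a polygenic score is a weighted sum: for each SNP, the number of copies of the associated allele that a person carries (0, 1 or 2) is multiplied by that SNP's effect size from the GWAS, and the products are added up. The Python sketch below shows the idea. The SNP names, effect sizes and allele counts are hypothetical, and real scores use far more SNPs and usually adjust the weights to account for SNPs that are inherited together.

```python
def polygenic_score(allele_counts, effect_sizes):
    """Sum of (copies of the effect allele) x (GWAS effect size) over the SNPs in the score."""
    return sum(
        effect_sizes[snp] * count
        for snp, count in allele_counts.items()
        if snp in effect_sizes
    )

# Hypothetical effect sizes: positive values increase and negative values
# decrease the likelihood of the outcome
effect_sizes = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.02}

# Hypothetical individuals: copies of the effect allele carried at each SNP
person_a = {"snp1": 2, "snp2": 0, "snp3": 1}
person_b = {"snp1": 0, "snp2": 2, "snp3": 1}

score_a = polygenic_score(person_a, effect_sizes)
score_b = polygenic_score(person_b, effect_sizes)
print(f"person A: {score_a:.2f}, person B: {score_b:.2f}")
```

Here person A has the higher score, but as explained above, a higher score only shifts the likelihood of the outcome and does not determine it.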

8. What did we know about genetic associations with dyslexia prior to this study?

Prior to this study, a handful of genetic associations with dyslexia had been discovered, mainly through family studies. These studies typically focussed on families where parents and siblings had been diagnosed with dyslexia; genetic regions that had been transmitted from a dyslexic parent to a dyslexic child, for instance, could be identified. A number of these genetic regions were studied further to find associations with DNA variants in particular genes. For the genes that were identified, replication studies, which attempt to reproduce the original finding, gave inconsistent results. Therefore, it was unclear whether these genetic associations with dyslexia were true or not. And even if true, their effects would still leave the vast majority of the genetics of dyslexia unresolved.

9. What did we find in this GWAS of dyslexia?

In around 50,000 adults self-reporting a dyslexia diagnosis and over 1 million adults reporting none, we found 42 SNPs that were reliably associated with dyslexia. This is the first GWAS of dyslexia to discover statistically significant SNP associations from a genome-wide scan. As expected for a genetically complex trait like dyslexia, we showed that the effect of any individual SNP is very small. For the largest SNP effect in this study, a person with dyslexia would be just 1.12 times more likely to carry the predisposing allele than a person without dyslexia. We were also able to test association at the level of a whole gene, meaning the combined association results of all SNPs located in a gene, and here we found 173 different genes that were significantly associated with dyslexia. However, our results did not show any clear biological pathways, like specific neurotransmitters or specific processes of brain cells, that might underlie our significant associations. Importantly, while uncovering many new genes involved in dyslexia, we also found there was very little evidence in support of previously reported genetic associations with this trait, which were based on smaller investigations, meaning that this study changes our perspective on dyslexia genetics in a big way.
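To give a feel for how small an effect of 1.12 is, the sketch below converts it into allele frequencies. It reads the 1.12 figure as an odds ratio for carrying the allele, and it assumes that the allele has a frequency of 30% in people without dyslexia. Both are assumptions made for illustration and are not values taken from the study.

```python
def case_allele_frequency(control_frequency, odds_ratio):
    """Allele frequency in cases implied by the control frequency and an allelic odds ratio."""
    control_odds = control_frequency / (1 - control_frequency)
    case_odds = control_odds * odds_ratio
    return case_odds / (1 + case_odds)

control_frequency = 0.30  # assumed frequency in people without dyslexia
case_frequency = case_allele_frequency(control_frequency, 1.12)
print(f"controls: {control_frequency:.1%}, cases: {case_frequency:.1%}")
```

Under these assumptions the allele would be found at a frequency of about 32% in people with dyslexia compared with 30% in people without, a difference that can only be detected reliably in a very large sample.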

The association of SNPs with dyslexia was very similar in males and females, suggesting that sex-specific genetic effects are small. A key finding to support our use of self-reported dyslexia as a reliable and valid outcome was the very strong genetic correlation between dyslexia and reading test scores, based on data from a recent study by the GenLang Consortium which investigated these in around 34,000 people. We also found that many other variables were genetically correlated with dyslexia; one of the strongest correlations was with Attention Deficit Hyperactivity Disorder (ADHD). We already knew from prior studies that dyslexia and ADHD often co-occur; our findings show that this may be (in part) because some of the same genes are involved in both conditions. A polygenic index using our GWAS results was created in independent cohorts of children and adults (not included in the original GWAS), including samples of children that were enriched for reading difficulties. We found that people with higher polygenic scores tended to have lower reading test scores. While statistically robust, the overall effects are still relatively small, with the polygenic index accounting for up to 6% of the variability in reading test scores in the new samples.

A major limitation of our work is that, for practical reasons, the GWAS was based on individuals with white European genetic ancestry, so more work is needed in populations with other ancestries to understand if the same genes and DNA variations contribute. Also, our study focusses only on DNA variations that are common in the population. We found that these kinds of genetic variants explain perhaps a third of the heritability of dyslexia. Hence, most of the genetic variation underlying dyslexia is still undiscovered.

10. Are there any practical uses of the results of this study?

The current study does not provide results that can be used within an education or clinical setting for identification of dyslexia in individual people. The polygenic index is far too inaccurate to offer reliable individual predictions, and we do not understand yet whether these genetic effects (of which we have identified only a small proportion) may contribute in different ways given different learning environments. It could be that certain reading instruction environments would minimise or completely negate genetic effects, for example. Scientists will be able to use our findings to test such hypotheses, towards new insights into the ways that nature and nurture interact when we learn to read.

The genetic correlations observed in this study point to other outcomes and aspects of behaviour that one might wish to be mindful of when supporting a person with dyslexia. But these will mostly be of use to other scientists who are seeking to understand why dyslexia co-occurs with ADHD, for example, or why there is a correlation with heightened pain sensitivity. The 42 common genetic variants that we associated with dyslexia, and the genes that they implicate, will need to be studied in more detail to understand their functions and the potential mechanisms through which they influence cognitive processes. Such knowledge might inform future clinical interventions, although there is already a very good evidence base around effective interventions for dyslexia.

11. Dyslexia and GWAS resources