Sunday, June 9, 2019

Introduction - why is this here?

I have been teaching undergraduate biology courses since 2001, and one area that I find is consistently difficult is math. Students seem to come into our biology program with the expectation that math doesn't matter. Sadly, they often receive a rude awakening very early on! One area of applied mathematics that they encounter is the Hardy-Weinberg equation. This is an equation that is used as a null model, representing what happens when evolution is NOT occurring. By determining if a population does or does not meet the conditions of the Hardy-Weinberg equation, you can potentially learn a lot about the population and how evolution might be acting on it.

For this reason we teach this topic to our students, but they often have a hard time understanding what the equation says and what it means. One promising approach seemed to be giving them additional chances to practice using the equation, with feedback to help them improve. This gave me the idea to develop an app that would generate Hardy-Weinberg problems and check whether students could answer them. From this, the Hardy-Weinberg quiz app was born.

This site functions as an explanation of and introduction to the Hardy-Weinberg equation and how to apply it. It also serves as a help system for the app. If you didn't reach this website from the app, you can find the app in the Google Play Store. If you are an iPhone user, I'm sorry, but I don't have the ability to program for Apple devices, so the app is Android only. You might still find this website useful for understanding Hardy-Weinberg even if you cannot (or don't want to) use the app. Click on the links on the right for more information (in particular, the "Instructions" link explains how to use the app), and if you have questions, you can reach me at utilitybeltsoftware@gmail.com

Saturday, June 8, 2019

What is the Hardy-Weinberg equation?


The Hardy-Weinberg equation was developed in the early 20th century. It describes how allele frequencies change (or do not change) over generations. It is generally used to examine one trait with two alleles: a dominant allele, whose frequency is represented by p in the equation, and a recessive allele, whose frequency is represented by q. It is also common to represent the dominant allele with a capital letter (e.g., "A") and the recessive allele with the same letter in lower case (e.g., "a"). To test this equation, we examine the gene pool, which consists of all the alleles that are present in the population. Evolution is occurring when the frequencies of these alleles change from one generation to the next. Because there are only two possible alleles, their frequencies must add up to a total of 1:

$p + q = 1$ (equation 1)

This means that if you know the frequency of the dominant allele, you can calculate the frequency of the recessive allele (and vice versa). This must always be true, whether evolution is happening or not. The real test of the equation is how well it predicts the genotype frequencies in the population. If we are dealing with a diploid population and a gene with only two alleles, then there are three possible genotypes: homozygous dominant (an individual with two copies of the dominant allele, abbreviated as "AA"), heterozygous (one dominant and one recessive allele, represented as "Aa"), and homozygous recessive (two recessive alleles, represented as "aa"). The standard way to present the Hardy-Weinberg equation is to start with equation 1 and square it, because diploid individuals carry two alleles:

$(p + q)^2 = 1$ (equation 2)

This can then be expanded to

$p^2 + 2pq + q^2 = 1$ (equation 3)
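The step from equation 2 to equation 3 is ordinary algebra. Writing it out shows where the 2 in 2pq comes from: a heterozygote can inherit A from its mother and a from its father, or the other way around, so the two cross terms combine.

$$
(p + q)^2 = (p + q)(p + q) = p^2 + pq + qp + q^2 = p^2 + 2pq + q^2
$$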

In this equation, each term corresponds to a genotype: $p^2$ represents the homozygous dominant frequency, $2pq$ the heterozygote frequency, and $q^2$ the homozygous recessive frequency. These are the genotype frequencies you expect to see if the Hardy-Weinberg equation accurately describes your population. The equation describes a situation where evolution is not occurring, meaning that the allele frequencies remain unchanged from one generation to the next. This requires several assumptions:
  1. There is no natural selection: no factor allows individuals of some genotypes to survive or reproduce more successfully than others
  2. The population size is large, so chance changes in allele frequency (genetic drift) are small, and there are no large-scale random events affecting it
  3. There is no net mutation in the population
  4. There is no net movement of individuals into or out of the population (no gene flow)
  5. Mating is random
These assumptions do not have to be perfectly met, but the more closely they are met, the better the equation will apply. Even when the assumptions are met, there will tend to be some variation over time, because some randomness cannot be avoided. The figure below shows how the frequencies of two particular alleles change over time when the Hardy-Weinberg assumptions are met. Note that the two allele frequencies are tied together: when one goes up, the other goes down by the same amount, because p + q = 1.


These assumptions need to apply to the particular trait being studied, not to every trait in the organism. This means that some traits might be evolving while others are not. The equation is also generally applied to autosomal traits (those controlled by genes on non-sex chromosomes), although this is not required. The app focuses only on autosomal traits to minimize complications. See other sections on this website for specifics on how to calculate the values and how the app tests your understanding.
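To see why some generation-to-generation wobble is unavoidable even when the assumptions hold, here is a minimal Python sketch. It assumes a simple random-sampling model of reproduction in a population of fixed size (often called a Wright-Fisher model), with no selection, mutation, or migration. The function `simulate_allele_frequency` and its parameters are hypothetical illustrations; this is not necessarily how the figure on this page was produced.

```python
import random


def simulate_allele_frequency(p0, n_individuals, generations, seed=None):
    """Track the frequency of allele A (p) over a number of generations.

    Hypothetical model: constant population of n_individuals diploid
    organisms, random mating, no selection, mutation, or migration.
    Each generation, the 2N alleles of the next generation are drawn at
    random from the current gene pool, so chance is the only source of change.
    """
    rng = random.Random(seed)
    n_alleles = 2 * n_individuals
    p = p0
    history = [p]
    for _generation in range(generations):
        # Each new allele is A with probability p, otherwise a.
        count_A = sum(1 for _allele in range(n_alleles) if rng.random() < p)
        p = count_A / n_alleles
        history.append(p)
    return history


if __name__ == "__main__":
    ps = simulate_allele_frequency(p0=0.5, n_individuals=1000, generations=20, seed=1)
    for gen, p in enumerate(ps):
        # q is always 1 - p: when one frequency rises, the other falls.
        print(f"generation {gen:2d}: p = {p:.3f}, q = {1 - p:.3f}")
```

Increasing `n_individuals` makes the wobble smaller, which is why the second assumption asks for a large population.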

Friday, June 7, 2019

Calculating allele frequencies

Generally, when working with the H-W equation, you are given information on your population and must calculate the required values. The first values you are likely to need are the allele frequencies (p for the dominant allele, q for the recessive). To determine these values, you will be given the number of individuals in the population with each genotype, and you can calculate the frequencies directly from those counts. For example, suppose a population has 252 AA, 640 Aa, and 382 aa individuals. What are the values of p and q for this population? Because p and q represent the allele frequencies in the gene pool, you first need to determine how many alleles are in the entire gene pool. To do this, take the total number of individuals and multiply by 2 (because the individuals are diploid). Thus:

252 + 640 + 382 = 1274 individuals.

1274 x 2 = 2548 alleles

To calculate an allele's frequency you need to take into account how many total copies are present for that allele in the entire gene pool. This means you need to take into account two different types of individuals: heterozygotes and one type of homozygous individual. If you are calculating p, then you would count all the homozygous dominant individuals and the heterozygotes. However, you have to take into account the fact that a homozygous dominant individual has two dominant alleles while heterozygotes have only one. So in order to calculate p, you need to take

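$p = \dfrac{2 \times (\text{number of AA}) + (\text{number of Aa})}{2 \times (\text{total number of individuals})}$ (equation 1)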

Substituting the actual values into equation 1, we get

$p = \dfrac{2 \times 252 + 640}{2548} = \dfrac{1144}{2548}$ (equation 2)

This means that p = 0.449 (your number may vary slightly depending on how many decimal places you keep). It follows that q must be 0.551, because p + q = 1. You can also calculate q directly as a way to check your math (always a good idea!): use equation 2, but replace 252 (the number of AA individuals) with 382 (the number of aa individuals), which gives 1404/2548 = 0.551.
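Equation 1 translates directly into a few lines of Python. This is a minimal sketch; the function name `allele_frequencies` is illustrative and not part of the app. Computing q from its own counts rather than as 1 - p plays the same double-checking role described above.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) from genotype counts in a diploid population."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each individual carries two alleles
    # AA individuals carry two A alleles; Aa individuals carry one.
    p = (2 * n_AA + n_Aa) / total_alleles
    # aa individuals carry two a alleles; Aa individuals carry one.
    q = (2 * n_aa + n_Aa) / total_alleles
    return p, q


p, q = allele_frequencies(252, 640, 382)
print(f"p = {p:.3f}, q = {q:.3f}, p + q = {p + q:.3f}")
# p = 0.449, q = 0.551, p + q = 1.000
```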

This is an important skill to cultivate - pretty much every other calculation you need for the Hardy-Weinberg equation requires that you know what the allele frequencies are!

Thursday, June 6, 2019

Calculating genotype frequencies

Once you know the allele frequencies you can calculate the expected genotype frequencies using those values (the values given here for p and q are from the topic on calculating allele frequencies).

Frequency of homozygous dominant = $p^2 = 0.449^2 \approx 0.202$

Frequency of heterozygous = $2pq = 2 \times 0.449 \times 0.551 \approx 0.495$

Frequency of homozygous recessive = $q^2 = 0.551^2 \approx 0.304$

Many genetic diseases are recessive, meaning they are expressed only in homozygous recessive individuals. In such cases, heterozygous individuals are called "carriers": they can pass the disease allele to their offspring, but they usually don't have the disease themselves. These genotype frequencies are useful in their own right, but they mainly matter because they are used in the calculations that test whether the H-W equation applies to a particular population.
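The three genotype calculations can be sketched in Python as follows. The function name `expected_genotype_frequencies` is a hypothetical illustration. Note that the unrounded frequencies always sum to exactly 1, while the rounded values above (0.202 + 0.495 + 0.304) sum to 1.001; that small gap is purely a rounding effect.

```python
def expected_genotype_frequencies(p):
    """Return expected (AA, Aa, aa) frequencies under Hardy-Weinberg."""
    q = 1 - p  # only two alleles, so p + q = 1
    return p ** 2, 2 * p * q, q ** 2


f_AA, f_Aa, f_aa = expected_genotype_frequencies(0.449)
print(f"AA: {f_AA:.3f}, Aa: {f_Aa:.3f}, aa: {f_aa:.3f}, sum: {f_AA + f_Aa + f_aa:.3f}")
# AA: 0.202, Aa: 0.495, aa: 0.304, sum: 1.000
```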

Wednesday, June 5, 2019

Calculating the number of individuals of each genotype

Once you have calculated the genotype frequencies you expect if the H-W equation is true, you need to calculate how many individuals those frequencies correspond to. To do this, multiply each genotype frequency by the size of the population. For our starting example, this means:

Expected homozygous dominant (AA) individuals = $p^2 \times 1274$ (the number of individuals in the population). This is 0.202 x 1274 = 257.4

Expected heterozygous (Aa) individuals = $2pq \times 1274$ = 0.495 x 1274 = 630.6

Expected homozygous recessive (aa) individuals = $q^2 \times 1274$ = 0.304 x 1274 = 387.3

NOTE: your expected values will typically not be whole numbers, which seems a little odd, because you cannot have a fraction of an individual. However, for the purposes of the statistical test we're going to use, fractional values are reasonable. You can check whether your numbers are reasonable by adding up your expected values. The total may be off slightly due to rounding, but it should be close to the population size of 1274. If there is a large difference, check your work; there may be an error somewhere!

Now that you have your expected genotype numbers, you can move on to perform the statistical test that will let you determine if the population meets the Hardy-Weinberg requirements or not.
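Here is a minimal Python sketch that goes straight from observed counts to expected counts; `expected_counts` is a hypothetical name. Because it never rounds intermediate values, it gives about 256.8, 630.4, and 386.8 rather than the hand-rounded 257.4, 630.6, and 387.3, which is exactly the kind of rounding difference mentioned in the note above. The unrounded expected counts add up to the population size exactly.

```python
def expected_counts(n_AA, n_Aa, n_aa):
    """Return expected (AA, Aa, aa) counts if the population fits Hardy-Weinberg."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p
    # Multiply each expected genotype frequency by the population size.
    return p ** 2 * n, 2 * p * q * n, q ** 2 * n


e_AA, e_Aa, e_aa = expected_counts(252, 640, 382)
print(f"AA: {e_AA:.1f}, Aa: {e_Aa:.1f}, aa: {e_aa:.1f}, total: {e_AA + e_Aa + e_aa:.1f}")
# AA: 256.8, Aa: 630.4, aa: 386.8, total: 1274.0
```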

Tuesday, June 4, 2019

Calculating if the population meets the Hardy-Weinberg requirements

Once you have your expected values, you can compare them to the observed ones, but doing so based on intuition isn't reliable. Instead, you need to perform a statistical test that tells us whether the observed and expected values are too far apart to be explained by chance alone. The test we use is called the chi-square goodness-of-fit test (abbreviated as χ²). It is calculated from the difference between the actual number of individuals of each genotype in the population and the number of each genotype you would expect if the Hardy-Weinberg equation is true. You compare those numbers using this formula:

$\chi^2 = \sum \dfrac{(\text{observed} - \text{expected})^2}{\text{expected}}$

For an H-W test, you calculate three values (one for each genotype) and sum them to get the χ² value. You then compare this value to a critical value to see whether there is a significant difference, or whether the differences between the observed and expected values are just due to random noise. Normally, you need a statistical table to see whether your value is significant, but that isn't necessary for us. For a two-allele H-W test at the usual 0.05 significance level, the comparison value is always 3.841. Briefly, this is the critical value for one degree of freedom: three genotype classes, minus one, minus one more because the allele frequency p is estimated from the same data. Note: a common misunderstanding is to use a value of 5.991, which is the critical value for two degrees of freedom. This is based on an incorrect interpretation of how the χ² test is done (see this website for more explanation). If the χ² value is greater than 3.841, then the differences between the observed and expected values are large enough to say that the H-W equation probably doesn't apply to that population. In other words, the population is probably evolving for this gene because at least one of the H-W assumptions is not being met. Further work would be needed to determine which assumption(s) are being violated.
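The formula and the decision rule can be sketched in a few lines of Python. `chi_square` and `CRITICAL_VALUE` are illustrative names, not part of the app, and the 3.841 cutoff assumes a two-allele test at the 0.05 significance level. The example reuses the observed counts and the rounded expected counts from the worked example on this site.

```python
CRITICAL_VALUE = 3.841  # chi-square critical value for 1 degree of freedom, alpha = 0.05


def chi_square(observed, expected):
    """Sum (observed - expected)^2 / expected over all genotype classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2 = chi_square([252, 640, 382], [257.4, 630.6, 387.3])
print(f"chi-square = {chi2:.3f}")  # chi-square = 0.326
print("significant" if chi2 > CRITICAL_VALUE else "not significant")  # not significant
```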

For the example data we have been using, recall that there were 252 AA, 640 Aa, and 382 aa individuals. These are the observed values. If the H-W equation is true, we expect 257.4, 630.6, and 387.3 individuals of each genotype, respectively. This gives the following calculation:
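$\chi^2 = \dfrac{(252 - 257.4)^2}{257.4} + \dfrac{(640 - 630.6)^2}{630.6} + \dfrac{(382 - 387.3)^2}{387.3}$

Solving each term gives us

$\chi^2 = 0.113 + 0.140 + 0.073$

So the total is χ² = 0.326.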

Because this value is less than 3.841, there is no significant difference between the expected and observed values. We conclude that this population does not appear to be evolving for this gene: the data are consistent with the assumptions required by the H-W equation.
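Putting all the steps together, here is a minimal end-to-end sketch in Python; `hardy_weinberg_test` is a hypothetical name, and it assumes one autosomal gene with two alleles and the 3.841 cutoff. Because it carries full precision through every step, it reports χ² of about 0.30 instead of the hand-rounded 0.326; the conclusion is the same.

```python
CRITICAL_VALUE = 3.841  # 1 degree of freedom, alpha = 0.05


def hardy_weinberg_test(n_AA, n_Aa, n_aa):
    """Return (chi_square, fits_hw) for observed genotype counts at one gene."""
    observed = (n_AA, n_Aa, n_aa)
    n = sum(observed)
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele frequencies from the counts
    q = 1 - p
    expected = (p ** 2 * n, 2 * p * q * n, q ** 2 * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Not exceeding the critical value means no significant departure from H-W.
    return chi2, chi2 <= CRITICAL_VALUE


chi2, fits = hardy_weinberg_test(252, 640, 382)
print(f"chi-square = {chi2:.2f}, consistent with H-W: {fits}")
# chi-square = 0.30, consistent with H-W: True
```

For contrast, a population with 500 AA, 0 Aa, and 500 aa individuals gives χ² = 1000, far above 3.841, because H-W predicts 500 heterozygotes and none are present.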

Monday, June 3, 2019

Instructions - introduction to the H-W Quiz App

When you start the app, you should see this screen. The "Make pop." button randomly chooses one of the question types you have selected (check Settings in the menu to see which types of questions are available) and generates a population whose numbers will be used for the problem. The menu at the top right gives access to a number of options (discussed in separate items on this website). The hint button activates the hint system, in which the app guides you through the calculations needed to reach the correct answer; this is discussed in more detail on this site. If you need general instructions for the app, open the menu and choose "Instructions".

Once you have generated a population, you will see details on the population (the number of individuals of each genotype) and a place to answer the question. The app asks two types of questions: multiple-choice questions, where the app gives you a list of possible answers, and fill-in questions, where you have to enter a number you calculated. Once you have made your choice or entered your answer, click the check answer button to learn whether your answer was correct.

For multiple-choice questions, the exact correct answer will be among the options. For fill-in questions, the app permits some deviation from the correct answer to allow for rounding. Depending on the type of question, the app indicates how many decimal places to use in your answer. You aren't required to use that many, but if you do, you're more likely to be within the acceptable limits for that type of question. The amount of "wiggle room" varies with the type of question, but it isn't very large for any type.

By providing correct answers quickly enough, you earn points that can be used to increase your rank from its starting point (Novice) all the way up to the maximum (Ph.D.). Answers that aren't given quickly enough will not earn points (but won't cost you any). Wrong answers will cost you points, so be sure you have the correct answer before you select it! See the section on the rankings for more information on this aspect.