In the past, election forecasts have been made on a country-wide basis. A sample of people is carefully selected to represent the population as a whole and they are asked who they will vote for. The percentages for each party are normally pretty good, but how do you convert that to seats in parliament? Just because a party gets x% of the vote doesn’t mean that they get the same percentage of seats, a fact well known to the smaller parties. In the past there have been rules of thumb, but these are very crude.
Now there are forecasts by constituency based on a technique called MRP. This stands for “multi-level regression and post-stratification”. How does this work?
Let us start with “regression”. Regression is a technique for estimating the relationship between things. For example, height and weight are obviously related. Generally speaking, taller people are heavier than shorter people. But you can have short chubby people and skinny tall people. The relationship is not exact. If I figure out a formula (so many kilograms for each centimetre of height) it’s not going to be absolutely exact. So that makes it useless, right?
Wrong. As long as you have a good idea of how inexact it could be, it is still useful. Say 95% of people fall within five kilograms of your estimate. This gives us a range that we can use.
Regression works by taking a sample and working out a formula that minimises the differences between each of the actual values (weights in my example) and the calculated values given by the formula. When I say “minimises the differences”, I mean that it minimises the sum of the squares of the differences. This gives the best-fitting formula. It also allows you to make statements about how accurate it is, essentially a range of values that a certain percentage of actual results will fall within. There are a whole bunch of assumptions underlying this which may or may not be true. I won’t go into the detail but they are listed here.
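To make the sum-of-squares idea concrete, here is a minimal sketch in Python using the height and weight example. The numbers are made up for illustration and the function names are my own. It fits a straight line by least squares and then works out a rough range, on the assumption that the differences are spread roughly like a bell curve.

```python
# Hypothetical sample (made-up numbers): heights in cm, weights in kg.
heights_cm = [150, 160, 165, 170, 175, 180, 185, 190]
weights_kg = [52, 58, 63, 66, 72, 77, 80, 88]


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_of_squares(intercept, slope, xs=heights_cm, ys=weights_kg):
    """Sum of squared differences between actual and calculated values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


intercept, slope = fit_line(heights_cm, weights_kg)

# Typical size of a difference (residual standard deviation).
n = len(heights_cm)
resid_sd = (sum_of_squares(intercept, slope) / (n - 2)) ** 0.5

print(f"weight = {intercept:.1f} + {slope:.2f} * height")
# If the usual assumptions hold, roughly 95% of people should fall
# within about two residual standard deviations of the formula.
print(f"rough 95% range: plus or minus {2 * resid_sd:.1f} kg")
```

Any other choice of slope or intercept gives a larger sum of squares on this sample, which is what “best-fitting” means here.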
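That accounts for “regression”; what about “multi-level”? Part of it is that the regression formula is based on several values, not just one (e.g. height and waist size). Strictly, “multi-level” also means the data are grouped at more than one level, such as voters within constituencies within regions, so that an estimate for a group with few respondents can lean on the data from similar groups. Regression on several values has been around for a while: one of the first computer programs I coded back in the 1960s did just that.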
Now back to election estimates. In this case the polling company carries out a poll and uses regression to work out a formula that relates voting intention to demographics and location. These are very broad-brush estimates but, for example, educated young professionals in the south of England might be more likely to vote Green than others.
Having worked out the formula, the polling company looks at the demographics in each constituency and uses the formula to calculate the likely result. That is the “post-stratification” bit. As with all good statistical estimates it comes with a range of values. This estimate gives the Conservatives 115 seats, with a range of 99 to 123. It is important to interpret this as an estimate of voting intentions, NOT the result on election day. Intentions can change.
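The post-stratification step described above is really just a weighted average. Here is a minimal sketch in Python. The demographic groups, the support figures, the constituency names and the head counts are all invented for illustration; a real model would use many more groups and census data.

```python
# Hypothetical output of the regression: the predicted share of each
# demographic group intending to vote for one party (made-up numbers).
predicted_support = {
    ("18-34", "degree"): 0.30,
    ("18-34", "no degree"): 0.20,
    ("35+", "degree"): 0.15,
    ("35+", "no degree"): 0.10,
}

# Hypothetical head counts for each group in two invented constituencies.
constituencies = {
    "Northtown": {
        ("18-34", "degree"): 10000,
        ("18-34", "no degree"): 15000,
        ("35+", "degree"): 20000,
        ("35+", "no degree"): 35000,
    },
    "Southville": {
        ("18-34", "degree"): 30000,
        ("18-34", "no degree"): 20000,
        ("35+", "degree"): 20000,
        ("35+", "no degree"): 10000,
    },
}


def poststratify(support, counts):
    """Weight each group's predicted support by its size in the constituency."""
    total = sum(counts.values())
    return sum(support[group] * size for group, size in counts.items()) / total


for name, counts in constituencies.items():
    share = poststratify(predicted_support, counts)
    print(f"{name}: estimated support {share:.1%}")
```

The same formula gives different answers in the two constituencies purely because their populations are made up differently. Doing this for every party in every constituency is what turns a national poll into a seat forecast.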
This is a useful technique, but remember that the range is based on a whole set of assumptions. One example is the assumption that the individual demographic values are independent of each other. If two of the demographics used were, say, college degree and current salary, these are clearly related to each other (statistically: yes, I know there are rich non-graduates). There are ways of reducing these problems by using variants on the sum of squares, each designed to correct for a different issue.
This brings us on to “Stacked Regression and Poststratification (SRP)”. This uses several formulae based on different variants on the sum of squares and mashes them together to give a refined estimate. The result is more accurate, but the error estimate gets lost in the process. This polling company uses this technique. They have the Conservatives on 105 seats.
SRP is definitely interesting, but I suspect it is open to further development. The way the different methods are mashed together must be open to all sorts of research.
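To give a feel for the “mashing together”, here is a toy sketch in Python of one simple way to stack two formulae: try different blends of their predictions and keep the blend with the smallest sum of squares on data held back for checking. All the numbers are made up, the function names are my own, and this is not a description of what any particular polling company does.

```python
# Hypothetical held-back data: actual vote shares in five groups, and
# the predictions of two different formulae (all numbers made up).
actual = [0.20, 0.35, 0.10, 0.40, 0.25]
model_a = [0.25, 0.30, 0.15, 0.35, 0.30]
model_b = [0.15, 0.38, 0.08, 0.46, 0.22]


def squared_error(predictions, observed):
    """Sum of squared differences between predictions and observed values."""
    return sum((p - o) ** 2 for p, o in zip(predictions, observed))


def blend(weight, first, second):
    """Mix two sets of predictions: weight on the first, the rest on the second."""
    return [weight * a + (1 - weight) * b for a, b in zip(first, second)]


def best_weight(first, second, observed, steps=100):
    """Try weights 0.00, 0.01, ..., 1.00 and keep the one with the least error."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda w: squared_error(blend(w, first, second), observed),
    )


weight = best_weight(model_a, model_b, actual)
stacked = blend(weight, model_a, model_b)

print(f"weight on model A: {weight:.2f}")
print(f"error of A alone: {squared_error(model_a, actual):.4f}")
print(f"error of B alone: {squared_error(model_b, actual):.4f}")
print(f"error of blend:   {squared_error(stacked, actual):.4f}")
```

On the held-back data the blend can never do worse than either formula on its own, because putting all the weight on one formula is among the options tried. Notice that nothing in this process produces a range for the final estimate.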