Were the Scottish and EU Referendums driven by the same things - Brexit vs Indie: Model Notes and Links

Graham Stark

4th December 2016

This note accompanies a short paper on Brexit vs IndieRefs. This is incomplete but should contain all the relevant links to code and data.


To quote the BES website:

“The British Election Study (BES) is one of the longest running election studies world-wide and the longest running social science survey in the UK. It has made a major contribution to the understanding of political attitudes and behaviour over nearly sixty years. Surveys have taken place immediately after every general election since 1964.”

We’re going to use the inter-election internet panel part of the study. The ‘panel’ here indicates that the same group of people is contacted repeatedly. There are now 9 contact attempts (‘waves’ in the jargon) over the last 5 years - including pre- and post-general election waves, a European Elections wave, and a Scottish referendum wave. The panel aspect is nice for our purposes as we have records of how some of the sample voted in both Indie and Brexit. The BES is a very rich dataset with information on attitudes, political allegiances, incomes, and demographics (age, sex, education level, race and so on). Anyone can download this data: Wave 1-9 Internet Panel.

Most voting data comes from polling organisations, and the public only gets to see summaries of it, usually after it has been processed in very opaque ways. Our BES dataset is different in that we get the individual records for each person interviewed. So we can make detailed analyses, and do it in a transparent way.

We want to build a statistical model that predicts how each person votes given their circumstances (age, gender, race, education, income and so on). There is a tradition of building such models in the social sciences - long but not always glorious. The models we’re building here are known as Binomial Probits - we take a specialised statistical computer program, and feed it observations from the dataset on each person’s vote, and on factors (“explanatory variables”) that might explain this vote. The program spits out (amongst other things) a number (“coefficient”) for each explanatory variable showing the apparent effect of that variable on voting behaviour. Of course, the model can’t predict every vote perfectly - inevitably there will be some people modelled as highly likely to vote one way who will nonetheless do the opposite. But, if done well, it can accurately capture the things that have a systematic effect.
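To make the mechanics a little more concrete, here is a minimal Python sketch (the regressions in this note were actually run in R) of how a probit model turns coefficients into a predicted probability: each person's characteristics are multiplied by the coefficients and summed into a single index, which is then passed through the standard normal cumulative distribution function. The coefficient values, the intercept and the example people are all made up for illustration; they are not estimates from this note.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(coefficients, person):
    """Predicted probability that the outcome is 1 under a probit model."""
    index = coefficients["intercept"]
    for name, value in person.items():
        index += coefficients[name] * value
    return normal_cdf(index)


# Hypothetical coefficients, for illustration only.
coefficients = {"intercept": 0.5, "log_income": -0.1, "has_degree": -0.5}

# Two hypothetical people who differ only in whether they hold a degree.
with_degree = {"log_income": math.log(30000), "has_degree": 1}
without_degree = dict(with_degree, has_degree=0)

p_degree = probit_probability(coefficients, with_degree)
p_no_degree = probit_probability(coefficients, without_degree)
print(round(p_degree, 3), round(p_no_degree, 3))
```

Because the normal CDF is always increasing, a negative coefficient always lowers the predicted probability, but the size of the change in probability depends on where the person starts.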

As well as voting choices, models like this are routinely used to model, for example, choices of education courses, jobs, fertility and much more [BIG FOOTNOTE Ref Santos|Howard and Me|Benefit Takeup]. They are especially prevalent in Economics; since my day-job is as an economist, I’ve approached this rather as an economist would in exactly how I’ve built my model; the models from political scientists I’ve mentioned above sometimes do things a little differently (notably in how they include age and income). I would of course defend my choices over theirs, but my story for Brexit is ultimately similar.

There are two key advantages of this approach.

Firstly, we can capture the effect of each variable (education, gender, income and so on) on voting intention, all else equal. For example, London voted to Remain in the EU Ref; that could be because Londoners on average have higher incomes, education levels, and so on, and those things are driving voting, or else it could be because of some (perhaps cultural) “London Effect”, or perhaps a bit of both. This is hard to work out from studies based on aggregate data, such as the Resolution Foundation study. But our individual-level data will inevitably include some poor Londoners and some rich North-Easterners, and we can use these variations to isolate these different effects.
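A toy Python tabulation shows the idea behind "all else equal". The records below are entirely invented: London has a lower overall Leave share than elsewhere, but only because it has more high-income people; within each income band the two areas vote identically. The probit does the same kind of thing for many variables at once, so this is only a two-variable sketch of the principle, not the method used in this note.

```python
from collections import defaultdict


def leave_share_by_group(records, keys):
    """Share voting Leave within each combination of the given keys."""
    totals = defaultdict(lambda: [0, 0])  # group -> [leave votes, people]
    for record in records:
        group = tuple(record[key] for key in keys)
        totals[group][0] += record["leave"]
        totals[group][1] += 1
    return {group: leave / n for group, (leave, n) in totals.items()}


def make_records(region, income, n, n_leave):
    """Build n hypothetical respondents, n_leave of whom voted Leave."""
    return [
        {"region": region, "income": income, "leave": 1 if i < n_leave else 0}
        for i in range(n)
    ]


# Invented data: same behaviour within income bands, different income mix.
records = (
    make_records("London", "high", 60, 18)
    + make_records("London", "low", 40, 24)
    + make_records("Elsewhere", "high", 40, 12)
    + make_records("Elsewhere", "low", 60, 36)
)

by_region = leave_share_by_group(records, ["region"])
by_region_and_income = leave_share_by_group(records, ["region", "income"])
print(by_region)
print(by_region_and_income)
```

In this invented example the apparent regional gap disappears once income is held fixed, which is the kind of distinction aggregate data cannot make.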

Secondly, our procedure can tell us how much confidence we can have that our findings are real and not just some fluke result that’s in this particular sample of people but isn’t actually there in the population as a whole. We can never be absolutely sure, of course, but we can put a number on how likely it is that there really is some relationship there.
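As a rough sketch of how that number is usually produced: each coefficient comes with a standard error, the ratio of the two is compared with a standard normal distribution, and the resulting two-sided p-value is what the stars in the tables in this note summarise (* p<0.1, ** p<0.05, *** p<0.01). The Python below assumes that standard approach; the standard error used in the example is hypothetical, since standard errors are not reproduced in this note.

```python
import math


def two_sided_p_value(coefficient, standard_error):
    """P(|Z| > |z|) for a standard normal Z, where z = coefficient / s.e."""
    z = coefficient / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


def stars(p_value):
    """Significance stars using the thresholds in the table notes."""
    if p_value < 0.01:
        return "***"
    if p_value < 0.05:
        return "**"
    if p_value < 0.1:
        return "*"
    return ""


# Hypothetical standard error of 0.04 attached to a coefficient of -0.131.
p = two_sided_p_value(-0.131, 0.04)
print(round(p, 4), stars(p))
```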

This kind of modelling is tricky, however. There’s rarely any very systematic way to choose which explanatory variables to include [Ref Hendry]. And, inevitably, relative to the whole population, the dataset has too many of some types of people, and too few of other types - this can be because of design (the BES oversamples Scotland, for example), because of non-response (the rich, the old and the sick are less likely to respond, for example), or other reasons [FOOTNOTE PANEL ATTRITION]. Dealing with unrepresentative samples is not always easy. And the BES doesn’t always ask the questions you’d really like (there’s very limited information on incomes, for example).
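One common correction for an unrepresentative sample is to weight each respondent by the ratio of their group's population share to its sample share. The Python sketch below illustrates that idea only; it is not necessarily the correction used for the results in this note, and the sample sizes, population shares and vote counts are invented.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Weight for each group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_share(values, groups, weights):
    """Weighted mean of a 0/1 variable."""
    total = sum(weights[g] * v for v, g in zip(values, groups))
    return total / sum(weights[g] for g in groups)


# Invented example: Scotland is 30% of the sample but 9% of the population.
groups = ["Scotland"] * 30 + ["Rest of GB"] * 70
leave = [1] * 12 + [0] * 18 + [1] * 35 + [0] * 35  # hypothetical votes
shares = {"Scotland": 0.09, "Rest of GB": 0.91}

weights = poststratification_weights(groups, shares)
unweighted = sum(leave) / len(leave)
weighted = weighted_share(leave, groups, weights)
print(weights, round(unweighted, 3), round(weighted, 3))
```

Over-represented groups get weights below one and under-represented groups get weights above one, so the weighted share moves towards what the population as a whole would show.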

Usually, the best thing to do is to try things in a few ways (use different corrections for representativeness, different explanatory variables, and so on) and see if the most important results persist (they do). And openness is important; just as you can download the data, you can [download the statistical software], and the scripts I’ve written to drive it.

The other big problem is not drowning in information. Below I’ve tried to boil down the huge output from our statistics program into one simple(ish) table; the full outputs are available in formatted and raw forms, and of course anyone interested should be able to replicate the full results using the links below.

the effects in the table are relative to these.

See the [data creation script] for full details.

Main Results

Table 1 is the full table for the three main regressions summarised in the paper.

Table 2 is the complete set of regressions, showing groups of variables being added progressively.


Table 1

| Dependent variable: | Voted Yes in Scottish Referendum (Scotland Only) | Voted Leave in EU Ref (All GB) | Voted Leave in EU Ref (Scotland Only) |
|---|---|---|---|
| Log of Household Gross Income (£p.a) | -0.131*** | -0.214*** | -0.124** |
| Age Squared | -0.001*** | -0.0004*** | -0.0003 |
| Highest Education: A Level/Higher Grade | -0.019 | -0.259*** | -0.348*** |
| Highest Education: Non-Degree Further | 0.126 | -0.316*** | -0.390*** |
| Highest Education: Degree or Equivalent | 0.136 | -0.508*** | -0.701*** |
| Ethnic Minority | 0.276 | -0.124* | 0.171 |
| Has Children | 0.075 | 0.104*** | 0.215** |
| Has a Partner | -0.053 | 0.099*** | 0.015 |
| Identifies Conservative | -1.360*** | 0.246*** | 0.271** |
| Identifies Libdem | -0.883*** | -0.784*** | -0.829*** |
| Identifies Labour | -0.506*** | -0.533*** | -0.478*** |
| Identifies Green | 0.789*** | -0.994*** | -0.801*** |
| Identifies UKIP | -0.700*** | 2.003*** | 5.625 |
| Identifies SNP | 1.890*** | -0.491*** | -0.456*** |
| Religion: Catholic | 0.193* | 0.058 | -0.073 |
| Religion: Any Protestant | -0.238*** | 0.168*** | 0.289*** |
| Big5: Openness | 0.059*** | -0.016** | 0.029 |
| North East | | 0.034 | |
| North West of England | | -0.005 | |
| Yorkshire and Humberside | | 0.067 | |
| South of England | | -0.117*** | |
| Log Likelihood | -866.535 | -5,934.568 | -855.068 |
| Akaike Inf. Crit. | 1,773.070 | 11,923.140 | 1,750.136 |

Note: *p<0.1; **p<0.05; ***p<0.01


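Table 2

Dependent variable: Voted Yes in Scottish Referendum, Scotland Only (columns 1-4); Voted Leave in EU Ref, All GB (columns 5-9); Voted Leave in EU Ref, Scotland Only (columns 10-15).

| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | (10) | (11) | (12) | (13) | (14) | (15) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Log of Household Gross Income (£p.a) | -0.225*** | -0.135*** | -0.124*** | -0.131*** | -0.128*** | -0.202*** | -0.201*** | -0.215*** | -0.214*** | -0.064 | -0.116** | -0.116** | -0.124** | -0.190*** | -0.170*** |
| Age Squared | -0.0004*** | -0.001*** | -0.001*** | -0.001*** | -0.0003*** | -0.0004*** | -0.0004*** | -0.0004*** | -0.0004*** | -0.00001 | -0.0002 | -0.0003* | -0.0003 | -0.0003* | -0.0002 |
| Highest Education: A Level/Higher Grade | 0.044 | 0.041 | -0.0003 | -0.019 | -0.352*** | -0.323*** | -0.308*** | -0.277*** | -0.259*** | -0.326*** | -0.410*** | -0.377*** | -0.348*** | -0.349*** | -0.338*** |
| Highest Education: Non-Degree Further | 0.016 | 0.164 | 0.149 | 0.126 | -0.433*** | -0.372*** | -0.361*** | -0.331*** | -0.316*** | -0.384*** | -0.418*** | -0.397*** | -0.390*** | -0.390*** | -0.415*** |
| Highest Education: Degree or Equivalent | 0.067 | 0.165* | 0.155* | 0.136 | -0.653*** | -0.573*** | -0.556*** | -0.522*** | -0.508*** | -0.720*** | -0.719*** | -0.698*** | -0.701*** | -0.695*** | -0.728*** |
| Ethnic Minority | 0.409** | 0.287 | 0.267 | 0.276 | -0.152** | -0.149** | -0.131** | -0.114* | -0.124* | 0.226 | 0.196 | 0.188 | 0.171 | 0.109 | 0.164 |
| Has Children | 0.020 | 0.076 | 0.067 | 0.075 | 0.110*** | 0.120*** | 0.108*** | 0.103*** | 0.104*** | 0.202** | 0.229** | 0.203** | 0.215** | 0.199** | 0.200** |
| Has a Partner | 0.034 | -0.025 | -0.060 | -0.053 | 0.081*** | 0.091*** | 0.087*** | 0.107*** | 0.099*** | -0.053 | -0.026 | -0.022 | 0.015 | 0.061 | 0.056 |
| Identifies Conservative | | -1.453*** | -1.373*** | -1.360*** | | 0.320*** | 0.291*** | 0.274*** | 0.246*** | | 0.320*** | 0.269** | 0.271** | 0.228** | |
| Identifies Libdem | | -0.901*** | -0.910*** | -0.883*** | | -0.737*** | -0.753*** | -0.765*** | -0.784*** | | -0.773*** | -0.825*** | -0.829*** | -0.973*** | |
| Identifies Labour | | -0.497*** | -0.506*** | -0.506*** | | -0.498*** | -0.500*** | -0.505*** | -0.533*** | | -0.485*** | -0.482*** | -0.478*** | -0.611*** | |
| Identifies Green | | 0.887*** | 0.803*** | 0.789*** | | -1.066*** | -1.033*** | -1.007*** | -0.994*** | | -0.792*** | -0.790*** | -0.801*** | -0.860*** | |
| Identifies UKIP | | -0.700*** | -0.718*** | -0.700*** | | 2.005*** | 1.992*** | 2.011*** | 2.003*** | | 5.587 | 5.580 | 5.625 | 5.382 | |
| Identifies SNP | | 1.862*** | 1.895*** | 1.890*** | | -0.684*** | -0.680*** | -0.687*** | -0.491*** | | -0.445*** | -0.439*** | -0.456*** | -0.506*** | |
| Religion: Catholic | | | 0.170 | 0.193* | | | 0.080* | 0.069 | 0.058 | | | -0.008 | -0.073 | 0.151 | |
| Religion: Any Protestant | | | -0.260*** | -0.238*** | | | 0.180*** | 0.177*** | 0.168*** | | | 0.263*** | 0.289*** | 0.186** | |
| Big5: Openness | | | | 0.059*** | | | | -0.017** | -0.016** | | | | 0.029 | -0.002 | |
| North East | | | | | | | | | 0.034 | | | | | | |
| North West of England | | | | | | | | | -0.005 | | | | | | |
| Yorkshire and Humberside | | | | | | | | | 0.067 | | | | | | |
| South of England | | | | | | | | | -0.117*** | | | | | | |
| Voted Yes in IndieRef | | | | | | | | | | | | | | -0.036 | -0.281*** |
| Log Likelihood | -1,656.315 | -925.873 | -878.954 | -866.535 | -7,499.412 | -6,319.826 | -6,279.390 | -5,968.492 | -5,934.568 | -1,013.896 | -898.282 | -887.847 | -855.068 | -1,020.807 | -1,149.595 |
| Akaike Inf. Crit. | 3,334.630 | 1,885.745 | 1,795.908 | 1,773.070 | 15,020.830 | 12,673.650 | 12,596.780 | 11,976.990 | 11,923.140 | 2,049.793 | 1,830.564 | 1,813.694 | 1,750.136 | 2,083.614 | 2,323.190 |

Note: *p<0.1; **p<0.05; ***p<0.01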

Equations 1-4 show IndieRef estimates, based on the Scottish subsample, with regressors added progressively: basic regressors (age, sex, education, income), then politics, religion and psychology.

Equations 5-9 show Brexit estimates on the whole GB sample, using the same regressors, plus (Eqn. 9) regional dummies.

Equations 10-13 are the same as 5-8, but for the Scottish subsample only.

Finally, 14 and 15 use the vote in the IndieRef as a predictor. With all the other variables in (14), it’s not a good predictor, since the variables we use themselves predict the IndieRef vote. Without the full variable set (15), the IndieRef vote is a good predictor.
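A quick consistency check on the tables: the Akaike Information Criterion is defined as AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the log likelihood, so the two reported figures together imply how many parameters each equation has. The short Python sketch below applies this to the three Table 1 equations; the function name and labels are mine, and k counts every estimated parameter (including the constant and any variables not listed in the table).

```python
def parameters_from_aic(log_likelihood, aic):
    """Solve AIC = 2k - 2 ln L for k, the number of estimated parameters."""
    return (aic + 2.0 * log_likelihood) / 2.0


# (log likelihood, AIC) pairs as reported in Table 1.
table1 = {
    "Yes in IndieRef, Scotland Only": (-866.535, 1773.070),
    "Leave in EU Ref, All GB": (-5934.568, 11923.140),
    "Leave in EU Ref, Scotland Only": (-855.068, 1750.136),
}

implied = {
    name: round(parameters_from_aic(ll, aic))
    for name, (ll, aic) in table1.items()
}
print(implied)  # 20, 27 and 20 parameters respectively
```

Lower AIC is better when comparing models fitted to the same observations, which is worth bearing in mind when reading across columns estimated on different subsamples.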

We make the pretty-ish arrows table in the article from Table 1 above using a little Ruby script.

Voted Yes in Scottish Referendum Voted Leave in EU Ref
Scotland Only All GB Scotland Only
(1) (2) (3) (4) (5) (6)
Log of Household Gross Income (£p.a) 🢃 🢃 🢃 🡳 🢃 🢃
Age 🢁 🢁 🢁 🡱 🡱 🡡
Age Squared 🢃 🢃 🢃 🡣
Female 🢃 🡣 🡣
Highest Education: A Level/Higher Grade 🢃 🢃 🢃 🢃 🢃
Highest Education: Non-Degree Further 🢃 🢃 🢃 🢃 🢃
Highest Education: Degree or Equivalent 🢃 🢃 🢃 🢃 🢃
Ethnic Minority 🡣 🡣
Has Children 🢁 🢁 🡱 🡱 🡱
Has a Partner 🢁 🢁
Identifies Conservative 🢃 🢁 🢁 🡱 🡱
Identifies Libdem 🢃 🢃 🢃 🢃 🢃
Identifies Labour 🢃 🢃 🢃 🢃 🢃
Identifies Green 🢁 🢃 🢃 🢃 🢃
Identifies UKIP 🢃 🢁 🢁
Identifies SNP 🢁 🢃 🢃 🢃 🢃
Religion: Catholic 🡡
Religion: Any Protestant 🢃 🢁 🢁 🢁 🡱
Big5: Openness 🢁 🡳 🡳
North East
North West of England
Yorkshire and Humberside
London 🡣
South of England 🢃
Wales 🢃
Scotland 🢃
Voted Yes in IndieRef 🢃
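The original arrows table was produced with a Ruby script; the Python sketch below only illustrates the kind of mapping involved. The rule it uses (direction from the sign of the coefficient, weight of the arrow from the number of significance stars, nothing at all when there are no stars) is inferred from comparing the arrows with the regression tables, so treat it as an assumption rather than a description of the actual script.

```python
# Assumed mapping: (direction, number of stars) -> arrow glyph.
ARROWS = {
    ("up", 3): "🢁", ("up", 2): "🡱", ("up", 1): "🡡",
    ("down", 3): "🢃", ("down", 2): "🡳", ("down", 1): "🡣",
}


def arrow_for(cell):
    """Turn a table cell such as '-0.131***' into an arrow ('' if no stars)."""
    text = cell.strip()
    number = text.rstrip("*")
    n_stars = min(len(text) - len(number), 3)
    if n_stars == 0:
        return ""
    value = float(number.replace(",", ""))
    direction = "up" if value > 0 else "down"
    return ARROWS[(direction, n_stars)]


for cell in ["-0.131***", "0.215**", "0.193*", "0.276"]:
    print(cell, arrow_for(cell))
```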

Notes To The Article

The Commonspace CMS system unfortunately chopped out the article’s footnotes, so I’ll reproduce them here:

Matthew Goodwin and Oliver Heath Study: The study was completed after the vote but before the edition of the BES with the actual data on the Brexit vote had been released; instead, Goodwin and Heath model intention to vote (6 months before), which is a good though not perfect correlate of the actual vote. I’m using the version (9) of the BES with the actual vote in it.

Kauffmann Study: this was carried out even earlier, before the actual Brexit vote: instead of explaining the vote he is modelling the extent to which people disapprove of the EU, which, too, is a good predictor of the final vote.

Paragraph on ‘Main Findings’: We say ‘Great Britain’ rather than ‘United Kingdom’ since Northern Ireland is not in the BES data.

Relationship between age and voting: in the IndieRef case (but not the Brexit vote), this is actually a bit of a simplification - ‘Yes’ is modelled to rise till about age 40 and fall thereafter, whereas ‘Leave’ rises steadily at all ages. This is ‘all else equal’, and of course incomes generally rise between those ages, which may explain why simple tabulations from polling data don’t show quite this pattern. See the Technical Note for more detail.
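The rise-then-fall pattern in the note on age comes from including both age and age squared: a positive coefficient on age and a negative one on age squared give a curve that peaks at -b_age / (2 × b_age_squared). The Python sketch below shows the arithmetic. The age coefficient of 0.08 is hypothetical and chosen only so that the peak lands at 40; the age coefficient is not reported in the tables in this note, and the age-squared figures there are heavily rounded.

```python
def turning_point_age(b_age, b_age_squared):
    """Age at which b_age * age + b_age_squared * age**2 stops rising."""
    return -b_age / (2.0 * b_age_squared)


def age_contribution(age, b_age, b_age_squared):
    """Contribution of age to the probit index."""
    return b_age * age + b_age_squared * age ** 2


# Hypothetical coefficients, for illustration only.
b_age, b_age_squared = 0.08, -0.001

peak = turning_point_age(b_age, b_age_squared)
print(round(peak, 1))
for age in (20, 40, 60):
    print(age, round(age_contribution(age, b_age, b_age_squared), 2))
```

Since the probit probability always moves in the same direction as the index, the predicted probability peaks at the same age as the index does.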

Political Parties: These are the parties people identify with; they needn’t be members of them.

Strength of statistical relationships: A technical aside: I’m using statistical significance for this (the p-values) rather than, for example, effect size. So I give more weight in that table to a small but certain influence than to a potentially large effect which has more uncertainty attached to it. The American Statistical Association has a good short paper on this.

On Modelling Brexit: I report above a version of the model without these regional dummies, showing that adding or removing these variables doesn’t affect our other conclusions in any major way. So the model in column (2) is still comparable with the IndieRef one even though it has more variables.

Analysis of Scottish Voters: ‘Scottish’, as is customary, simply means people living (or at least interviewed) in Scotland. Adding country of birth as an explanatory variable would be interesting.

London (non) Effect: The Resolution Foundation study finds this, too.

Code and Further Reading

The regressions show how the probability of voting yes to Scottish Independence or Brexit varies with age, income, gender, political and religious affiliation, holding all else constant.

Regressions are Binomial Probits. Variables used are in the standard forms used in the economics literature (age and age-squared, log of household income..). [Reference that paper]

More on the techniques used here:

Regressions were done in R.

All the code and full output are available on the GitHub code sharing site.

Please refer to these for data construction.

TODO (at least).

Some Crude Summary Crosstabs

EU Referendum vote by party (each row sums to 1):

         Remain     Leave
  Con    0.36609426 0.63390574
  Lab    0.65898280 0.34101720
  Lib    0.76721079 0.23278921
  SNP    0.69285714 0.30714286
  Plaid  0.77570093 0.22429907
  UKIP   0.01440144 0.98559856
  Green  0.85496183 0.14503817
  Other  0.56000000 0.44000000
  None   0.47735072 0.52264928

EU Referendum vote by party (each column sums to 1):

         Remain      Leave
  Con    0.220656243 0.401527424
  Lab    0.396608677 0.215690812
  Lib    0.119026646 0.037954177
  SNP    0.053402334 0.024878500
  Plaid  0.009138956 0.002777135
  UKIP   0.001761726 0.126706781
  Green  0.036996256 0.006595695
  Other  0.009249064 0.007637121
  None   0.153160097 0.176232354

Scottish Referendum vote by party (each row sums to 1):

         No         Yes
  Con    0.95893224 0.04106776
  Lab    0.78534031 0.21465969
  Lib    0.85294118 0.14705882
  SNP    0.05899420 0.94100580
  UKIP   0.84090909 0.15909091
  Green  0.24806202 0.75193798
  BNP    0.50000000 0.50000000
  Other  0.12500000 0.87500000
  None   0.59541985 0.40458015

Scottish Referendum vote by party (each column sums to 1):

         No          Yes
  Con    0.279808268 0.013063357
  Lab    0.359496705 0.107119530
  Lib    0.069502696 0.013063357
  SNP    0.036548832 0.635532332
  UKIP   0.044337927 0.009144350
  Green  0.019173158 0.063357283
  BNP    0.001797484 0.001959504
  Other  0.002396645 0.018288700
  None   0.186938286 0.138471587
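The crosstabs come in two flavours: proportions within each party (rows summing to one) and proportions within each voting choice (columns summing to one). The Python sketch below shows how both are derived from the same table of counts. The counts and party labels are invented for illustration; the tables above were produced in R from the BES data.

```python
def row_proportions(counts):
    """Each cell divided by its row total: how each party's supporters split."""
    result = {}
    for party, votes in counts.items():
        total = sum(votes.values())
        result[party] = {choice: n / total for choice, n in votes.items()}
    return result


def column_proportions(counts):
    """Each cell divided by its column total: where each side's votes came from."""
    totals = {}
    for votes in counts.values():
        for choice, n in votes.items():
            totals[choice] = totals.get(choice, 0) + n
    return {
        party: {choice: n / totals[choice] for choice, n in votes.items()}
        for party, votes in counts.items()
    }


# Invented counts for two hypothetical parties.
counts = {
    "Party A": {"Remain": 30, "Leave": 70},
    "Party B": {"Remain": 60, "Leave": 40},
}

by_row = row_proportions(counts)
by_column = column_proportions(counts)
print(by_row)
print(by_column)
```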


Copyright © 2005-2016 | Virtual Worlds Research | Company No: 09208432 | 0776 330 9602

Licenced under the Creative Commons Attribution-Share Alike Licence