Policy & Internet Blog

Understanding public policy online
Updated: 3 years 10 weeks ago

Digital Disconnect: Parties, Pollsters and Political Analysis in #GE2015

11 May 2015

The Oxford Internet Institute undertook some live analysis of social media data over the night of the 2015 UK General Election. See more photos from the OII’s election night party, or read about the data hack

Counts of public Facebook posts mentioning any of the party leaders’ surnames. Data generated by social media can be used to understand political behaviour and institutions on an ongoing basis.

‘Congratulations to my friend @Messina2012 on his role in the resounding Conservative victory in Britain’ tweeted David Axelrod, campaign advisor to Miliband, to his former colleague Jim Messina, Cameron’s strategy adviser, on May 8th. The former was Obama’s communications director and the latter campaign manager of Obama’s 2012 campaign. Along with other consultants and advisors and large-scale data management platforms from Obama’s hugely successful digital campaigns, Conservative and Labour used an arsenal of social media and digital tools to interact with voters throughout, as did all the parties competing for seats in the 2015 election.

The parties ran very different kinds of digital campaigns. The Conservatives used advanced data science techniques borrowed from the US campaigns to understand how their policy announcements were being received and to target groups of individuals. They spent ten times as much as Labour on Facebook, using ads targeted at Facebook users according to their activities on the platform, geo-location and demographics. This was a top-down strategy that involved working out what was happening on social media and responding with targeted advertising, particularly for marginal seats. It was supplemented by the mainstream media, such as the Telegraph, which contacted its database of readers and subscribers to services such as Telegraph Money, urging them to vote Conservative. As Andrew Cooper tweeted after the election, ‘Big data, micro-targeting and social media campaigns just thrashed “5 million conversations” and “community organizing”’.

He has a point. Labour took a different approach to social media. Widely acknowledged to have the most boots on the real ground, knocking on doors, they took a similar ‘ground war’ approach to social media in local campaigns. Our own analysis at the Oxford Internet Institute shows that of the 450K tweets sent by candidates of the six largest parties in the month leading up to the general election, Labour party candidates sent over 120,000 while the Conservatives sent only 80,000, no more than the Greens and not much more than UKIP. But the greater number of Labour tweets was no more productive in terms of impact (measured in terms of mentions generated, and indeed the final result).

Both parties’ campaigns were tightly controlled. Ostensibly, Labour generated far more bottom-up activity from supporters using social media, through memes like #votecameronout, #milibrand (responding to Miliband’s interview with Russell Brand), and what Miliband himself termed the most unlikely cult of the 21st century in his resignation speech, #milifandom, none of which came directly from Central Office. These produced peaks of activity on Twitter that at some points exceeded even discussion of the election itself on the semi-official #GE2015 hashtag used by the parties, as the figure below shows. But the party remained aloof from these conversations, fearful of mainstream media mockery.

The Brand interview was agreed to out of desperation and can have made little difference to the vote (partly because Brand endorsed Miliband only after the deadline for voter registration: young voters suddenly overcome by an enthusiasm for participatory democracy after Brand’s public volte-face on the utility of voting will have remained disenfranchised). But engaging with the swathes of young people who spend increasing amounts of their time on social media is a strategy that all parties ought to consider. YouTubers like PewDiePie have tens of millions of subscribers and billions of video views – their videos may seem unbelievably silly to many, but it is here that a good chunk of the next generation of voters is to be found.

Use of emergent hashtags on Twitter during the 2015 General Election. Volumes are estimates based on a 10% sample with the exception of #ge2015, which reflects the exact value. All data from Datasift.

Only one of the leaders had a presence on social media that managed anything like the personal touch and universal reach that Obama achieved in 2008 and 2012 based on sustained engagement with social media – Nicola Sturgeon. The SNP’s use of social media, developed in last September’s referendum on Scottish independence, had spawned a whole army of digital activists. All SNP candidates started the campaign with a Twitter account. When we look at the 650 local campaigns waged across the country, by far the most productive in the sense of generating mentions were those of the SNP: 100 tweets from SNP local candidates generated 10 times more mentions (1,000) than 100 tweets from (for example) the Liberal Democrats.
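The ‘productivity’ comparison above is a simple rate: mentions generated per tweet sent. A minimal sketch of that arithmetic follows, using only the illustrative figures quoted in the paragraph (100 SNP tweets generating 1,000 mentions, and so 100 mentions for 100 Liberal Democrat tweets). The function name and data layout are assumptions made for illustration, not the OII’s actual analysis code.

```python
def mentions_per_tweet(tweets: int, mentions: int) -> float:
    """Return the average number of mentions generated by one tweet."""
    if tweets <= 0:
        raise ValueError("tweets must be a positive count")
    return mentions / tweets


# (tweets sent, mentions generated), taken from the illustrative
# figures in the text rather than from the full dataset.
campaigns = {
    "SNP": (100, 1000),
    "Liberal Democrats": (100, 100),
}

rates = {
    party: mentions_per_tweet(tweets, mentions)
    for party, (tweets, mentions) in campaigns.items()
}

# How many times more productive SNP tweets were on this measure.
ratio = rates["SNP"] / rates["Liberal Democrats"]
```

Comparing rates rather than raw totals is what allows a party that tweeted less to come out as more productive.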

Scottish Labour’s failure to engage with Scottish people in this kind of way illustrates how difficult it is to suddenly develop relationships on social media – followers on all platforms are built up over years, not in the short space of a campaign. In strong contrast, advertising on these platforms as the Conservatives did is instantaneous, and based on the data science understanding (through advertising algorithms) of the platform itself. It doesn’t require huge databases of supporters – it doesn’t build up relationships between the party and supporters – indeed, they may remain anonymous to the party. It’s quick, dirty and effective.

The pollsters’ terrible night

So neither of the two largest parties really did anything with social media, or the huge databases of interactions that their platforms will have generated, to generate long-running engagement with the electorate. The campaigns were disconnected from their supporters, from their grass roots.

But the differing use of social media by the parties could lend a clue to why the opinion polls throughout the campaign got it so wrong, underestimating the Conservative lead by an average of five per cent. The social media data that may be gathered from this or any campaign is a valuable source of information about what the parties are doing, how they are being received, and what people are thinking or talking about in this important space – where so many people spend so much of their time. Of course, it is difficult to read from the outside; Andrew Cooper labeled the Conservatives’ campaign of big data to identify undecided voters, and micro-targeting on social media, as ‘silent and invisible’ and it seems to have been so to the polls.

Many voters were undecided until the last minute, or decided not to vote, which is impossible to predict with polls (bar the exit poll) – but possibly observable on social media, such as the spikes in attention to UKIP on Wikipedia towards the end of the campaign, which may have signaled their impressive share of the vote. As Jim Messina put it to msnbc news following up on his May 8th tweet that UK (and US) polling was ‘completely broken’ – ‘people communicate in different ways now’, arguing that the Miliband campaign had tried to go back to the 1970s.

Surveys – such as polls — give a (hopefully) representative picture of what people think they might do. Social media data provide an (unrepresentative) picture of what people really said or did. Long-running opinion surveys (such as the Ipsos MORI Issues Index) can monitor the hopes and fears of the electorate in between elections, but attention tends to focus on the huge barrage of opinion polls at election time – which are geared entirely at predicting the election result, and which do not contribute to more general understanding of voters. In contrast, social media are a good way to track rapid bursts in mobilization or support, which reflect immediately on social media platforms – and could also be developed to illustrate more long running trends, such as unpopular policies or failing services.

As opinion surveys face more and more challenges, there is surely good reason to supplement them with social media data, which reflect what people are really thinking on an ongoing basis – like a video rather than the irregular snapshots taken by polls. As a leading pollster, João Francisco Meira, director of Vox Populi in Brazil (which is doing innovative work in using social media data to understand public opinion), put it in conversation with one of the authors in April – ‘we have spent so long trying to hear what people are saying – now they are crying out to be heard, every day’. It is a question of pollsters working out how to listen.

Political big data

Analysts of political behaviour – academics as well as pollsters — need to pay attention to this data. At the OII we gathered large quantities of data from Facebook, Twitter, Wikipedia and YouTube in the lead-up to the election campaign, including mentions of all candidates (as did Demos’s Centre for the Analysis of Social Media). Using this data we will be able, for example, to work out the relationship between local social media campaigns and the parties’ share of the vote, as well as modeling the relationship between social media presence and turnout.

We can already see that the story of the local campaigns varied enormously – while at the start of the campaign some candidates were probably requesting new passwords for their rusty Twitter accounts, some already had an ongoing relationship with their constituents (or potential constituents), which they could build on during the campaign. One of the candidates to take over the Labour party leadership, Chuka Umunna, joined Twitter in April 2009 and now has 100K followers, which will be useful in the forthcoming leadership contest.

Election results inject data into a research field that lacks ‘big data’. Data-hungry political scientists will analyse these data in every way imaginable for the next five years. But data in between elections, for example relating to democratic or civic engagement or political mobilization, have traditionally been in woefully short supply in our discipline. Analysis of the social media campaigns in #GE2015 will start to provide a foundation to understand patterns and trends in voting behaviour, particularly when linked to other sources of data, such as the actual constituency-level voting results and even discredited polls, which may yet yield insight, even having failed to achieve their predictive aims. As the OII’s Jonathan Bright and Taha Yasseri have argued, we need ‘a theory-informed model to drive social media predictions, that is based on an understanding of how the data is generated and hence enables us to correct for certain biases’.

A political data science

Parties, pollsters and political analysts should all be thinking about these digital disconnects in #GE2015, rather than burying them with their hopes for this election. As I argued in a previous post, let’s use data generated by social media to understand political behaviour and institutions on an ongoing basis. Let’s find a way of incorporating social media analysis into polling models, for example by linking survey datasets to big data of this kind. The more such activity moves beyond the election campaign itself, the more useful social media data will be in tracking the underlying trends and patterns in political behavior.

And for the parties, these ways of understanding and interacting with voters need to be institutionalized in party structures, from top to bottom. On 8th May, the VP of a policy think-tank tweeted to both Axelrod and Messina ‘Gentlemen, welcome back to America. Let’s win the next one on this side of the pond’. The UK parties are on their own now. We must hope they use the time to build an ongoing dialogue with citizens and voters, learning from the success of the new online interest group barons, such as 38 Degrees and Avaaz, by treating all internet contacts as ‘members’ and interacting with them on a regular basis. Don’t wait until 2020!

Helen Margetts is the Director of the OII, and Professor of Society and the Internet. She is a political scientist specialising in digital era governance and politics, investigating political behaviour, digital government and government-citizen interactions in the age of the internet, social media and big data. She has published over a hundred books, articles and major research reports in this area, including Political Turbulence: How Social Media Shape Collective Action (with Peter John, Scott Hale and Taha Yasseri, 2015).

Scott A. Hale is a Data Scientist at the OII. He develops and applies techniques from computer science to research questions in the social sciences. He is particularly interested in the area of human-computer interaction and the spread of information between speakers of different languages online and the roles of bilingual Internet users. He is also interested in collective action and politics more generally.

Political polarization on social media: do birds of a feather flock together on Twitter?

05 May 2015

Are tweeters mainly seeking to reinforce their own viewpoints and link with likeminded persons, or is there a basis for widening and thoughtful exposure to a variety of perspectives? Image by Pete Simon (Flickr).

Twitter has exploded in recent years, now boasting half a billion registered users. Like blogs and the world’s largest social networking platform, Facebook, Twitter has actively been used for political discourse during the past few elections in the US, Canada, and elsewhere but it differs from them in a number of significant ways. Twitter’s connections tend to be less about strong social relationships (such as those between close friends or family members), and more about connecting with people for the purposes of commenting and information sharing. Twitter also provides a steady torrent of updates and resources from individuals, celebrities, media outlets, and any other organization seeking to inform the world as to its views and actions.

This may well make Twitter particularly well suited to political debate and activity. Yet important questions emerge in terms of the patterns of conduct and engagement. Chief among them: are users mainly seeking to reinforce their own viewpoints and link with likeminded persons, or is there a basis for widening and thoughtful exposure to a variety of perspectives that may improve the collective intelligence of the citizenry as a result?

Conflict and Polarization

Political polarization often occurs in a so-called ‘echo chamber’ environment, in which individuals are exposed to only information and communities that support their own viewpoints, while ignoring opposing perspectives and insights. In such isolating and self-reinforcing conditions, ideas can become more engrained and extreme due to lack of contact with contradictory views and the exchanges that could ensue as a result.

On the web, political polarization has been found among political blogs, for instance. American researchers have found that liberal and conservative bloggers in the US tend to link to other bloggers who share their political ideology. For Kingwell, a prominent Canadian philosopher, the resulting dynamic is one that can be characterized by a decline in civility and a lessening ability for political compromise to take hold. He laments the emergence of a ‘shout doctrine’ that corrodes the civic and political culture, in the sense that divisions are accentuated and compromise becomes more elusive.

Such a dynamic is not the result of social media alone – but rather it reflects for some the impacts of the Internet generally and the specific manner by which social media can lend itself to broadcasting and sensationalism, rather than reasoned debate and exchange. Traditional media and journalistic organizations have thus become further pressured to act in kind, driven less by a patient and persistent presentation of all sides of an issue and more by near-instantaneous reporting online. In a manner akin to Kingwell’s view, one prominent television news journalist in the US, Ted Koppel, has lamented this new media environment as a danger to the republic.

Nonetheless, the research is far from conclusive as to whether the Internet increases political polarization. Some studies have found that among casual acquaintances (such as those that can typically be observed on Twitter), it is common to observe connections across ideological boundaries. In one such study, funded by the Pew Internet and American Life Project and the National Science Foundation, findings suggest that people who often visit websites that support their ideological orientation also visit websites that support divergent political views. As a result, greater sensitivity and empathy for alternative viewpoints could potentially ensue, improving the likelihood of political compromise – even on a modest scale that would otherwise not have been achievable without this heightened awareness and debate.

Early Evidence from Canada

The 2011 federal election in Canada was dubbed by some observers in the media as the country’s first ‘social media election’ – as platforms such as Facebook and Twitter became prominent sources of information for growing segments of the citizenry, and evermore strategic tools for political parties in terms of fundraising, messaging, and mobilizing voters. In examining Twitter traffic, our own intention was to ascertain the extent to which polarization or cross-pollinization was occurring across the portion of the electorate making use of this micro-blogging platform.

We gathered nearly 6000 tweets pertaining to the federal election made by just under 1500 people during a three-day period in the week preceding election day (this time period was chosen because it was late enough in the campaign for people to have an informed opinion, but still early enough for them to be persuaded as to how they should vote). Once the tweets were retrieved, we used social network analysis and content analysis to analyze patterns of exchange and messaging content in depth.

We found that overall people do tend to cluster around shared political views on Twitter. Supporters of each of the four major political parties identified in the study were more likely to tweet to other supporters of the same affiliation (this was particularly true of the ruling Conservatives, the most inwardly networked of the four major political parties). Nevertheless, in a significant number of cases (36% of all interactions) we also observed a cross-ideological discourse, especially among supporters of the two most prominent left-of-centre parties, the New Democratic Party (NDP) and the Liberal Party of Canada (LPC). The cross-ideological interactions among supporters of left-leaning parties tended to be agreeable in nature, but often at the expense of the party in power, the Conservative Party of Canada (CPC). Members from the NDP and Liberal formations were also more likely to share general information and updates about the election as well as debate various issues around their party platforms with each other.
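The share of cross-ideological interactions reported above is the fraction of interactions whose two participants support different parties. A minimal sketch of that calculation follows, assuming each interaction has already been coded as a (sender’s party, receiver’s party) pair. The function name and the toy data are hypothetical and are not taken from the study.

```python
def cross_party_share(interactions):
    """Return the fraction of interactions that cross party lines.

    `interactions` is a list of (sender_party, receiver_party) pairs.
    """
    if not interactions:
        raise ValueError("at least one interaction is required")
    crossing = sum(1 for sender, receiver in interactions if sender != receiver)
    return crossing / len(interactions)


# Invented toy data: five coded interactions between party supporters.
toy_interactions = [
    ("CPC", "CPC"),
    ("NDP", "LPC"),
    ("LPC", "LPC"),
    ("NDP", "CPC"),
    ("CPC", "CPC"),
]

share = cross_party_share(toy_interactions)  # 2 of 5 pairs cross party lines
```

In practice the hard part is the coding step itself, that is, deciding which party each account supports; the sketch takes that as given.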

By contrast, interactions between parties that are ideologically distant seemed to denote a tone of conflict: nearly 40% of tweets between left-leaning parties and the Conservatives tended to be hostile. Such negative interactions between supporters of different parties have been shown to reduce enthusiasm about political campaigns in general, potentially widening the cleavage between highly engaged partisans and less affiliated citizens who may view such forms of aggressive and divisive politics as distasteful.

For Twitter sceptics, one concern is that the short length of Twitter messages does not allow for meaningful and in-depth discussions around complex political issues. While it is certainly true that expression within 140 characters is limited, one third of tweets between supporters of different parties included links to external sources such as news stories, blog posts, or YouTube videos. Such indirect sourcing can thereby constitute a means of expanding dialogue and debate.

Accordingly, although it is common to view Twitter as largely a platform for self-expression via short tweets, there may be a wider collective dimension to both users and the population at large as a steady stream of both individual viewpoints and referenced sources drive learning and additional exchange. If these exchanges happen across partisan boundaries, they can contribute to greater collective awareness and learning for the citizenry at large.

As the next federal election approaches in 2015, with younger voters gravitating online – especially via mobile devices, and with traditional polling increasingly under siege as less reliable than in the past, all major parties will undoubtedly devote more energy and resources to social media strategies including, perhaps most prominently, an effective usage of Twitter.

Partisan Politics versus Politics 2.0

In a still-nascent era likely to be shaped by the rise of social media and a more participative Internet on the one hand, and the explosion of ‘big data’ on the other hand, the prominence of Twitter in shaping political discourse seems destined to heighten. Our preliminary analysis suggests an important cleavage between traditional political processes and parties – and wider dynamics of political learning and exchange across a changing society that is more fluid in its political values and affiliations.

Within existing democratic structures, Twitter is viewed by political parties as primarily a platform for messaging and branding, thereby mobilizing members with shared viewpoints and attacking opposing interests. Our own analysis of Canadian electoral tweets both amongst partisans and across party lines underscores this point. The nexus between partisan operatives and new media formations will prove to be an increasingly strategic dimension to campaigning going forward.

More broadly, however, Twitter is a source of information, expression, and mobilization across a myriad of actors and formations that may not align well with traditional partisan organizations and identities. Social movements arising during the Arab Spring, amongst Mexican youth during that country’s most recent federal elections and most recently in Ukraine are cases in point. Across these wider societal dimensions – especially consequential in newly emerging democracies, the tremendous potential of platforms such as Twitter may well lie in facilitating new and much more open forms of democratic engagement that challenge our traditional constructs.

In sum, we are witnessing the inception of new forms of what can be dubbed ‘Politics 2.0’, which denotes a movement of both opportunities and challenges likely to play out differently across democracies at various stages of socio-economic, political, and digital development. Whether Twitter and other similar social media platforms enable inclusive and expansionary learning, or instead engrain divisive polarized exchange, has yet to be determined. What is clear, however, is that on Twitter, in some instances, birds of a feather do flock together as they do on political blogs. But in other instances, Twitter can play an important role in fostering cross-party communication in online political arenas.

Read the full article: Gruzd, A., and Roy, J. (2014) Investigating Political Polarization on Twitter: A Canadian Perspective. Policy and Internet 6 (1) 28-48.

Also read: Gruzd, A. and Tsyganova, K. Information wars and online activism during the 2013/2014 crisis in Ukraine: Examining the social structures of Pro- and Anti-Maidan groups. Policy and Internet. Early View April 2015: DOI: 10.1002/poi3.91

Anatoliy Gruzd is Associate Professor in the Ted Rogers School of Management and Director of the Social Media Lab at Ryerson University, Canada. Jeffrey Roy is Professor in the School of Public Administration at Dalhousie University’s Faculty of Management. His most recent book was published in 2013 by Springer: From Machinery to Mobility: Government and Democracy in a Participative Age.

Tracing our every move: Big data and multi-method research

30 Apr 2015

There is a lot of excitement about ‘big data’, but the potential for innovative work on social and cultural topics far outstrips current data collection and analysis techniques. Image by IBM Deutschland.

Using anything digital always creates a trace. The more digital ‘things’ we interact with, from our smart phones to our programmable coffee pots, the more traces we create. When collected together these traces become big data. These collections of traces can become so large that they are difficult to store, access and analyze with today’s hardware and software. But as a social scientist I’m interested in how this kind of information might be able to illuminate something new about societies, communities, and how we interact with one another, rather than engineering challenges.

Social scientists are just beginning to grapple with the technical, ethical, and methodological challenges that stand in the way of this promised enlightenment. Most of us are not trained to write database queries or regular expressions, or even to communicate effectively with those who are trained. Ethical questions arise with informed consent when new analytics are created. Even a data scientist could not know the full implications of consenting to data collection that may be analyzed with currently unknown techniques. Furthermore, social scientists tend to specialize in a particular type of data and analysis, surveys or experiments and inferential statistics, interviews and discourse analysis, participant observation and ethnomethodology, and so on. Collaborating across these lines is often difficult, particularly between quantitative and qualitative approaches. Researchers in these areas tend to ask different questions and accept different kinds of answers as valid.

Yet trace data does not fit into the quantitative / qualitative binary. The trace of a tweet includes textual information, often with links or images and metadata about who sent it, when and sometimes where they were. The traces of web browsing are also largely textual with some audio/visual elements. The quantity of these textual traces often necessitates some kind of initial quantitative filtering, but it doesn’t determine the questions or approach.

The challenges are important to understand and address because the promise of new insight into social life is real. Large-scale patterns become possible to detect. For example, according to one study of mobile phone location data, one’s future location is 93% predictable (Song, Qu, Blumm & Barabási, 2010), despite great variation in the individual patterns. This new finding opens up further possibilities for comparison and understanding the context of these patterns. Are locations more or less predictable among people with different socio-economic circumstances? What are the key differences between the most and least predictable?

Computational social science is often associated with large-scale studies of anonymized users such as the phone location study mentioned above, or participation traces of those who contribute to online discussions. Studies that focus on limited information about a large number of people are only one type, which I call horizontal trace data. Other studies that work in collaboration with informed participants can add context and depth by asking for multiple forms of trace data and involving participants in interpreting them — what I call the vertical trace data approach.

In my doctoral dissertation I took the vertical approach to examining political information gathering during an election, gathering participants’ web browsing data with their informed consent and interviewing them in person about the context (Menchen-Trevino 2012). I found that access to websites with political information was associated with self-reported political interest, but access to election-specific pages was not. The most active election-specific browsing came from those who were undecided on election day, while many of those with high political interest had already decided whom to vote for before the election began. This is just one example of how digging further into such data can reveal that what is true for larger categories (political information in general) may not be true, and in fact can be misleading, for smaller domains (election-specific browsing). Vertical trace data collection is difficult, but it should be an important component of the project of computational social science.

Read the full article: Menchen-Trevino, E. (2013) Collecting vertical trace data: Big possibilities and big challenges for multi-method research. Policy and Internet 5 (3) 328-339.


Menchen-Trevino, E. (2013) Collecting vertical trace data: Big possibilities and big challenges for multi-method research. Policy and Internet 5 (3) 328-339.

Menchen-Trevino, E. (2012) Partisans and Dropouts?: News Filtering in the Contemporary Media Environment. Northwestern University, Evanston, Illinois.

Song, C., Qu, Z., Blumm, N., & Barabási, A.-L. (2010) Limits of Predictability in Human Mobility. Science 327 (5968) 1018–1021.

Erica Menchen-Trevino is an Assistant Professor at Erasmus University Rotterdam in the Media & Communication department. She researches and teaches on topics of political communication and new media, as well as research methods (quantitative, qualitative and mixed).

After dinner: the best time to create 1.5 million dollars of ground-breaking science

24 Apr 2015

Count this! In celebration of the International Year of Astronomy 2009, NASA’s Great Observatories – the Hubble Space Telescope, the Spitzer Space Telescope, and the Chandra X-ray Observatory – collaborated to produce this image of the central region of our Milky Way galaxy. Image: NASA Marshall Space Flight Center

Since it first launched as a single project called Galaxy Zoo in 2007, the Zooniverse has grown into the world’s largest citizen science platform, with more than 25 science projects and over 1 million registered volunteer citizen scientists. While initially focused on astronomy projects, such as those exploring the surfaces of the moon and the planet Mars, the platform now offers volunteers the opportunity to read and transcribe old ship logs and war diaries, identify animals in nature capture photos, track penguins, listen to whales communicating and map kelp from space.

These projects are examples of citizen science; collaborative research undertaken by professional scientists and members of the public. Through these projects, individuals who are not necessarily knowledgeable about or familiar with science can become active participants in knowledge creation (such as in the examples listed in the Chicago Tribune: Want to aid science? You can Zooniverse).

Although science-public collaborative efforts have long existed, the Zooniverse is a predominant example of citizen science projects that have enjoyed particularly widespread popularity and traction online. In addition to making science more open and accessible, online citizen science accelerates research by leveraging human and computing resources, tapping into rare and diverse pools of expertise, providing informal scientific education and training, motivating individuals to learn more about science, and making science fun and part of everyday life.

While online citizen science is a relatively recent phenomenon, it has attracted considerable academic attention. Various studies have been undertaken to examine and understand user behaviour, motivation, and the benefits and implications of different projects for them. For instance, Sauermann and Franzoni’s analysis of seven Zooniverse projects (Solar Stormwatch, Galaxy Zoo Supernovae, Galaxy Zoo Hubble, Moon Zoo, Old Weather, The Milkyway Project, and Planet Hunters) found that 60 percent of volunteers never return to a project after finishing their first session of contribution. By comparing contributions to these projects with those of research assistants and Amazon Mechanical Turk workers, they also calculated that these voluntary efforts amounted to an equivalent of $1.5 million in human resource costs.

Our own project on the taxonomy and ecology of contributions to the Zooniverse examines the geographical, gendered and temporal patterns of contributions and contributors to 17 Zooniverse projects between 2009 and 2013. Our preliminary results show that:

  • The geographical distribution of volunteers and contributions is highly uneven, with the UK and US contributing the bulk of both. Quantitative analysis of 130 countries shows that of three factors – population, GDP per capita and number of Internet users – the number of Internet users is most strongly correlated with the number of volunteers and number of contributions. However, when population is controlled for, GDP per capita is found to have a greater correlation with numbers of users and volunteers. The correlations are positive, suggesting that wealthier (or more developed) countries are more likely to be involved in the citizen science projects.

The global distribution of contributions to the projects within our dataset of 35 million records. The number of contributions from each country is normalized to the population of the country.

  • Female volunteers are underrepresented in most countries. Very few countries have gender parity in participation. In many other countries, women make up less than one-third of the volunteers whose gender is known. The female ratio of participation in the UK and Australia, for instance, is 25 per cent, while the figures for the US, Canada and Germany are between 27 and 30 per cent. These figures are notable when compared with the percentage of academic jobs in the sciences held by women. In the UK, women make up only 30.3 per cent of full-time researchers in Science, Technology, Engineering and Mathematics (STEM) departments (UKRC report, 2010), and 24 per cent in the United States (US Department of Commerce report, 2011).
  • Our analysis of user preferences and activity shows that, in general, there is a strong subject preference among users, with two main clusters evident among users who participate in more than one project. One cluster revolves around astrophysics projects. Volunteers in these projects are more likely to take part in other astrophysics projects, and when one project ends, volunteers are more likely to start a new project within this cluster. Similarly, volunteers in the other cluster, which is concentrated around life and Earth science projects, have a higher likelihood of being involved in other life and Earth science projects than in astrophysics projects. There is less cross-project involvement between the two main clusters.

Dendrogram showing the overlap of contributors between projects. The scale indicates the similarity between the pools of contributors to pairs of projects. Astrophysics (blue) and Life-Earth Science (green and brown) projects create distinct clusters. Old Weather 1 and WhaleFM are exceptions to this pattern, and Old Weather 1 has the most distinct pool of contributors.

  • In addition to a tendency for cross-project activity to be contained within the same clusters, there is also a gendered pattern of engagement in various projects. Females make up more than half of gender-identified volunteers in life science projects (in Snapshot Serengeti, Notes from Nature and WhaleFM, more than 50 per cent of contributors are women). In contrast, the proportions of women are lowest in astrophysics projects (in Galaxy Zoo Supernovae and Planet Hunters, fewer than 20 per cent of contributors are female). These patterns suggest that science subjects in general are gendered, a finding consistent with those of the US National Science Foundation (2014). According to an NSF report, there are relatively few women in engineering (13 per cent) and in computer and mathematical sciences (25 per cent), but they are well-represented in the social sciences (58 per cent) and biological and medical sciences (48 per cent).
  • For the 20 most active countries (led by the UK, US and Canada), the most productive hours in terms of user contributions are between 8pm and 10pm. This suggests that citizen science is an after-dinner activity (presumably, reflecting when most people have free time before bed). This general pattern corresponds with the idea that many types of online peer-production activities, such as citizen science, are driven by ‘cognitive surplus’, that is, the aggregation of free time spent on collective pursuits (Shirky, 2010).
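
The contributor overlap between projects described above can be illustrated with a small sketch. It assumes a Jaccard similarity between the sets of volunteers in each pair of projects; the study does not say which similarity measure it used, so this choice, the function names and the toy data are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity: shared contributors divided by all contributors."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(contributors_by_project):
    """Return {(project_a, project_b): similarity} for every pair of projects."""
    return {
        (p, q): jaccard(contributors_by_project[p], contributors_by_project[q])
        for p, q in combinations(sorted(contributors_by_project), 2)
    }


# Hypothetical toy data (not from the study): project name -> volunteer IDs.
toy = {
    "Galaxy Zoo": {"u1", "u2", "u3", "u4"},
    "Planet Hunters": {"u2", "u3", "u4", "u5"},
    "Snapshot Serengeti": {"u5", "u6", "u7"},
}

overlap = pairwise_overlap(toy)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

In this toy example the two astrophysics projects share most of their volunteers, while the life science project shares few or none with them. A matrix of such pairwise similarities is the kind of input from which a hierarchical clustering, and hence a dendrogram, could be built.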

These are just some of the results of our study, which has found that despite being informal and relatively more open and accessible, online citizen science exhibits geographical and gendered patterns of knowledge production similar to those of professional, institutional science. In other ways, citizen science is different. Unlike institutional science, the bulk of citizen science activity happens late in the day, after the workday has ended and people are winding down after dinner and before bed.

We will continue our investigations into the patterns of activity in citizen science and the behaviour of citizen scientists, in order to help improve ways to make science more accessible in general and to tap into the resources of the public for scientific knowledge production. It is anticipated that upcoming projects on the Zooniverse will be more diversified and include topics from the humanities and social sciences. We therefore also aim to examine the implications of a wider range of projects for the user base (in terms of age, gender and geographical coverage) and for user behaviour.


Sauermann, H., & Franzoni, C. (2015). Crowd science user contribution patterns and their implications. Proceedings of the National Academy of Sciences, 112(3), 679-684.

Shirky, C. (2010). Cognitive surplus: Creativity and generosity in a connected age. Penguin: London.

Taha Yasseri is the Research Fellow in Computational Social Science at the OII. Prior to coming to the OII, he spent two years as a Postdoctoral Researcher at the Budapest University of Technology and Economics, working on the socio-physical aspects of the community of Wikipedia editors, focusing on conflict and editorial wars, along with Big Data analysis to understand human dynamics, language complexity, and popularity spread. He has interests in analysis of Big Data to understand human dynamics, government-society interactions, mass collaboration, and opinion dynamics.

How can big data be used to advance dementia research?

16 Mar 2015

Image by K. Kendall of “Sights and Scents at the Cloisters: for people with dementia and their care partners”; a program developed in consultation with the Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Alzheimer’s Disease Research Center at Columbia University, and the Alzheimer’s Association.

Dementia affects about 44 million individuals, a number that is expected to nearly double by 2030 and triple by 2050. With an estimated annual cost of USD 604 billion, dementia represents a major economic burden for both industrial and developing countries, as well as a significant physical and emotional burden on individuals, family members and caregivers. There is currently no cure for dementia or a reliable way to slow its progress, and the G8 health ministers have set the goal of finding a cure or disease-modifying therapy by 2025. However, the underlying mechanisms are complex, and influenced by a range of genetic and environmental factors that may have no immediately apparent connection to brain health.

Of course, medical research relies on access to large amounts of data, including clinical, genetic and imaging datasets. Making these widely available across research groups helps reduce data collection efforts, increases the statistical power of studies and makes data accessible to more researchers. This is particularly important from a global perspective: Swedish researchers say, for example, that they are sitting on a goldmine of excellent longitudinal and linked data on a variety of medical conditions including dementia, but that they have too few researchers to exploit its potential. Other countries will have many researchers, and less data.

‘Big data’ adds new sources of data and ways of analysing them to the repertoire of traditional medical research data. This can include (non-medical) data from online patient platforms, shop loyalty cards, and mobile phones (made available, for example, through Apple’s ResearchKit, announced just last week). As dementia is believed to be influenced by a wide range of social, environmental and lifestyle-related factors (such as diet, smoking, fitness training, and people’s social networks), this behavioural data has the potential to improve early diagnosis, as well as allow retrospective insights into events in the years leading up to a diagnosis. For example, data on changes in shopping habits (accessible through loyalty cards) may provide an early indication of dementia.

However, there are many challenges to using and sharing big data for dementia research. The technology hurdles can largely be overcome, but there are also deep-seated issues around the management of data collection, analysis and sharing, as well as underlying people-related challenges in relation to skills, incentives, and mindsets. Change will only happen if we tackle these challenges at all levels jointly.

As data are combined from different research teams, institutions and nations — or even from non-medical sources — new access models will need to be developed that make data widely available to researchers while protecting the privacy and other interests of the data originator. Establishing robust and flexible core data standards that make data more sharable by design can lower barriers for data sharing, and help avoid researchers expending time and effort trying to establish the conditions of their use.

At the same time, we need policies that protect citizens against undue exploitation of their data. Consent needs to be understood by individuals (including the complex and far-reaching implications of providing genetic information) and should provide effective enforcement mechanisms to protect them against data misuse. Privacy concerns about digital, highly sensitive data are important and should not be de-emphasised as a subordinate goal to advancing dementia research. Beyond releasing data in protected environments, allowing people to voluntarily “donate data”, and making consent understandable and enforceable, we also need governance mechanisms that safeguard appropriate data use for a wide range of purposes. This is particularly important as the significance of data changes with its context of use, and data will never be fully anonymisable.

We also need a favourable ecosystem with stable and beneficial legal frameworks, and links between academic researchers and private organisations for exchange of data and expertise. Legislation needs to take account of the growing importance of global research communities in terms of funding and making best use of human and data resources. Also important is sustainable funding for data infrastructures, as well as an understanding that funders can have considerable influence on how research data, in particular, are made available. One of the most fundamental challenges in terms of data sharing is that there are relatively few incentives or career rewards that accrue to data creators and curators, so ways to recognise the value of shared data must be built into the research system.

In terms of skills, we need more health-/bioinformatics talent, as well as collaboration with those disciplines researching factors “below the neck”, such as cardiovascular or metabolic diseases, as scientists increasingly find that these may be associated with dementia to a larger extent than previously thought. Linking in engineers, physicists or innovative private sector organisations may prove fruitful for tapping into new skill sets to separate the signal from the noise in big data approaches.

In summary, everyone involved needs to adopt a mindset of responsible data sharing, collaborative effort, and a long-term commitment to building two-way connections between basic science, clinical care and healthcare in everyday life. Fully capturing the health-related potential of big data requires “out of the box” thinking in terms of how to profit from the huge amounts of data being generated routinely across all facets of our everyday lives. This sort of data offers ways for individuals to become involved, by actively donating their data to research efforts, participating in consumer-led research, or engaging as citizen scientists. Empowering people to be active contributors to science may help alleviate the common feeling of helplessness faced by those whose lives are affected by dementia.

Of course, to do this we need to develop a culture that promotes trust between the people providing the data and those capturing and using it, as well as an ongoing dialogue about new ethical questions raised by the collection and use of big data. Technical, legal and consent-related mechanisms to protect individuals’ sensitive biomedical and lifestyle-related data against misuse may not always be sufficient, as the recent Nuffield Council on Bioethics report has argued. For example, we need a discussion around the direct and indirect benefits to participants of engaging in research, when it is appropriate for data collected for one purpose to be put to others, and to what extent individuals can make decisions particularly on genetic data, which may have more far-reaching consequences for their own and their family members’ professional and personal lives if health conditions, for example, can be predicted by others (such as employers and insurance companies).

Policymakers and the international community have an integral leadership role to play in informing and driving the public debate on responsible use and sharing of medical data, as well as in supporting the process through funding, incentivising collaboration between public and private stakeholders, creating data sharing incentives (for example, via taxation), and ensuring stability of research and legal frameworks.

Dementia is a disease that concerns all nations in the developed and developing world, and just as diseases have no respect for national boundaries, neither should research into dementia (and the data infrastructures that support it) be seen as a purely national or regional priority. The high personal, societal and economic importance of improving the prevention, diagnosis, treatment and cure of dementia worldwide should provide a strong incentive for establishing robust and safe mechanisms for data sharing.

Read the full report: Deetjen, U., E. T. Meyer and R. Schroeder (2015) Big Data for Advancing Dementia Research. Paris, France: OECD Publishing.

Should we use old or new rules to regulate warfare in the information age?

09 Mar 2015

Critical infrastructures such as electric power grids are susceptible to cyberwarfare, leading to economic disruption in the event of massive power outages. Image courtesy of Pacific Northwest National Laboratory.

Before the pervasive dissemination of Information and Communication Technologies (ICTs), the use of information in war waging referred to intelligence gathering and propaganda. In the age of the information revolution things have radically changed. Information has now acquired a pivotal role in contemporary warfare, for it has become both an effective target and a viable means. These days, we use ‘cyber warfare’ to refer to the use of ICTs by state actors to disruptive (or even destructive) ends.

As contemporary societies grow increasingly dependent on ICTs, any form of attack that involves their informational infrastructures poses serious risks and raises the need for adequate defence and regulatory measures. However, such a need contrasts with the novelty of this phenomenon, with cyber warfare posing a radical shift in the paradigm within which warfare has been conceived so far. In the new paradigm, impairment of functionality, disruption, and reversible damage substitute for bloodshed, destruction, and casualties. At the same time, the intangible environment (the cyber sphere), targets, and agents substitute for flesh-and-blood beings, firearms, and physical targets (at least in the non-kinetic instances of cyber warfare).

The paradigm shift raises questions about the adequacy and efficacy of existing laws and ethical theories for the regulation of cyber warfare. Military experts, strategy planners, law- and policy-makers, philosophers, and ethicists all participate in discussions around this problem. The debate is polarised around two main approaches: (1) the analogy approach, and (2) the discontinuous approach. The former stresses that the regulatory gap concerning cyber warfare is only apparent, insofar as cyber conflicts are not radically different from other forms of conflicts. As Schmitt put it, “a thick web of international law norms suffuses cyber-space. These norms both outlaw many malevolent cyber-operations and allow states to mount robust responses”. On this view, the UN Charter, the NATO Treaty, the Geneva Conventions, the first two Additional Protocols, and the Convention restricting or prohibiting the use of certain conventional weapons are more than sufficient to regulate cyber warfare; all that is needed is an in-depth analysis of such laws and an adequate interpretation. This is the approach underpinning, for example, the so-called Tallinn Manual.

The opposite position, the discontinuous approach, stresses the novelty of cyber conflicts and maintains that existing ethical principles and laws are not adequate to regulate this phenomenon. Just War Theory is the main object of contention in this case. Those defending this approach argue that Just War Theory is not the right conceptual tool to address non-kinetic forms of warfare, for it assumes bloody and violent warfare occurring in the physical domain. This view sees cyber warfare as one of the most compelling signs of the information revolution; as Luciano Floridi has put it, “those who live by the digit, die by the digit”. As such, it claims that any successful attempt to regulate cyber warfare cannot ignore the conceptual and ethical changes that such a revolution has brought about.

These two approaches have proceeded in parallel over the last decade, stalling rather than fostering a fruitful debate. There is therefore a clear need to establish a coordinated interdisciplinary approach that allows experts with different backgrounds to collaborate and find common ground to overcome the polarisation of the discussion. This is precisely the goal of a project, financed by the NATO Cooperative Cyber Defence Centre of Excellence (NATO CCD COE), that I co-led with Lt Glorioso, a representative of the Centre. The project has convened a series of workshops gathering international experts in the fields of law, military strategies, philosophy, and ethics to discuss the ethical and regulatory problems posed by cyber warfare.

The first workshop was held in 2013 at the Centro Alti Studi Difesa in Rome and had the goal of launching an interdisciplinary and coordinated approach to the problems posed by cyber warfare. The second event was hosted last November at Magdalen College, Oxford. It relied on the approach established in 2013 to foster an interdisciplinary discussion on issues concerning attribution, the principle of proportionality, the distinction between combatant and non-combatant, and the one between pre-emption and prevention. A report on the workshop has now been published, surveying the main positions and the key discussion points that emerged during the meeting.

One of the most relevant points concerned the risks that cyber warfare poses for the established political equilibrium and for maintaining peace. The risk of escalation, both in the nature and in the number of conflicts, was perceived as realistic by both the speakers and the audience attending the workshop. Deterrence therefore emerged as one of the most pressing challenges posed by cyber warfare – and one that experts need to take into account in their efforts to develop new forms of regulation in support of peace and stability in the information age.

Read the full report: Corinne J.N. Cath, Ludovica Glorioso, Maria Rosaria Taddeo (2015) Ethics and Policies for Cyber Warfare [PDF, 400kb]. Report on the NATO CCD COE Workshop on ‘Ethics and Policies for Cyber Warfare’, Magdalen College, Oxford, 11-12 November 2014.

Dr Mariarosaria Taddeo is a researcher at the Oxford Internet Institute, University of Oxford. Her main research areas are information and computer ethics, philosophy of information, philosophy of technology, ethics of cyber-conflict and cyber-security, and applied ethics. She also serves as president of the International Association for Computing and Philosophy.

How do the mass media affect levels of trust in government?

04 Mar 2015

The South Korean Government, as well as the Seoul Metropolitan Government, have gone to great lengths to enhance their openness, using many different ICTs. Seoul at night by jonasginter.

Ed: You examine the influence of citizens’ use of online mass media on levels of trust in government. In brief, what did you find?

Greg: As I explain in the article, there is a common belief that mass media outlets, and especially online mass media outlets, often portray government in a negative light in an effort to pique the interest of readers. This tendency of media outlets to engage in ‘bureaucracy bashing’ is thought, in turn, to detract from the public’s support for their government. The basic assumption underpinning this relationship is that the more negative information on government there is, the more negative public opinion. However, in my analyses, I found evidence of a positive indirect relationship between citizens’ use of online mass media outlets and their levels of trust in government. Interestingly, however, the more frequently citizens used online mass media outlets for information about their government, the weaker this association became. These findings challenge conventional wisdom that suggests greater exposure to mass media outlets will result in more negative perceptions of the public sector.

Ed: So you find that the particular positive or negative spin of the actual message may not be as important as the individuals’ sense that they are aware of the activities of the public sector. That’s presumably good news, both for government and for efforts to ‘open it up’?

Greg: Yes, I think it can be. However, a few important caveats apply. First, the positive relationship between online mass media use and perceptions of government tapers off as respondents make more frequent use of online mass media outlets. In the study, I interpreted this to mean that exposure to mass media had less of an influence upon those who were more aware of public affairs, and more of an influence upon those who were less aware of public affairs. Therefore, there is something of a diminishing returns aspect to this relationship. Second, this study was not able to account for the valence (ie how positive or negative the information is) of information respondents were exposed to when using online mass media. While some attempts were made to control for valence by adding different control variables, further research drawing upon experimental research designs would be useful in substantiating the relationship between the valence of information disseminated by mass media outlets and citizens’ perceptions of their government.

Ed: Do you think governments are aware of this relationship — ie that an indirect effect of being more open and present in the media, might be increased citizen trust — and that they are responding accordingly?

Greg: I think that there is a general idea that more communication is better than less communication. However, at the same time there is a lot of evidence to suggest that some of the more complex aspects of the relationship between openness and trust in government go unaccounted for in current attempts by public sector organizations to become more open and transparent. As a result, this tool that public organizations have at their disposal is not being used as effectively as it could be, and in some instances is being used in ways that are counterproductive–that is, actually decreasing citizen trust in government. Therefore, in order for governments to translate greater openness into greater trust in government, more refined applications are necessary.

Ed: I know there are various initiatives in the UK — open government data / FoIs / departmental social media channels etc. — aimed at a general opening up of government processes. How open is the Korean government? Is a greater openness something they might adopt (or are adopting?) as part of a general aim to have a more informed and involved — and therefore hopefully more trusting — citizenry?

Greg: The South Korean Government, as well as the Seoul Metropolitan Government, have gone to great lengths to enhance their openness. Their strategy has made use of different ICTs, such as e-government websites, social media accounts, non-emergency call centers, and smart phone apps. As a result, many now say that attempts by the Korean Government to become more open are more advanced than in many other areas of the developed world. However, the persistent issue in South Korea, as elsewhere, is whether these attempts are having the intended impact. A lot of empirical research has found, for example, that various attempts at becoming more open by many governments around the world have fallen short of creating a more informed and involved citizenry.

Ed: Finally — is there much empirical work or data in this area?

Greg: While there is a lot of excellent empirical research from the field of political science that has examined how mass media use relates to citizens’ perceptions of politicians, political preferences, or their levels of political knowledge, this topic has received almost no attention at all in public management/administration. This lack of discussion is surprising, given mass media has long served as a key means of enhancing the transparency and accountability of public organizations.

Read the full article: Porumbescu, G. (2013) Assessing the Link Between Online Mass Media and Trust in Government: Evidence From Seoul, South Korea. Policy & Internet 5 (4) 418-443.

Greg Porumbescu was talking to blog editor David Sutcliffe.

Gregory Porumbescu is an Assistant Professor at the Northern Illinois University Department of Public Administration. His research interests primarily relate to public sector applications of information and communications technology, transparency and accountability, and citizens’ perceptions of public service provision.

Don’t knock clicktivism: it represents the political participation aspirations of the modern citizen

01 Mar 2015

Following a furious public backlash in 2011, the UK government abandoned plans to sell off 258,000 hectares of state-owned woodland. The public forest campaign by 38 Degrees gathered over half a million signatures.

How do we define political participation? What does it mean to say an action is ‘political’? Is an action only ‘political’ if it takes place in the mainstream political arena, involving government, politicians or voting? Or is political participation something that we find in the most unassuming of places, in sports, home and work? This question, ‘what is politics’, is one that political scientists seem to have a lot of trouble dealing with, and with good reason. If we use an arena definition of politics, then we marginalise the politics of the everyday; the forms of participation and expression that develop between the cracks, through need and ingenuity. However, if we broaden our approach so as to adopt what is usually termed a process definition, then everything can become political. The problem here is that saying that everything is political is akin to saying nothing is political, and that doesn’t help anyone.

Over the years, this debate has plodded steadily along, with scholars on both ends of the spectrum fighting furiously to establish a working understanding. Then, the Internet came along and drew up new battle lines. The Internet is at its best when it provides a home for the disenfranchised, an environment where like-minded individuals can wipe free the dust of societal disassociation and connect and share content. However, the Internet brought with it a shift in power, particularly in how individuals conceptualised society and their role within it. The Internet, in addition to this role, provided a plethora of new and customisable modes of political participation. From the outset, a lot of these new forms of engagement were extensions of existing forms, broadening the everyday citizen’s participatory repertoire. There was a move from voting to e-voting, petitions to e-petitions, face-to-face communities to online communities; the Internet took what was already there and streamlined it, removing those pesky elements of time, space and identity.

Yet, as the Internet continues to develop, and we move into the ultra-heightened communicative landscape of the social web, new and unique forms of political participation take root, drawing upon those customisable environments and organic cyber migrations. The most prominent of these is clicktivism, sometimes also, unfairly, referred to as slacktivism. Clicktivism takes the fundamental features of browsing culture and turns them into a means of political expression. Quite simply, clicktivism refers to the simplification of online participatory processes: one-click online petitions, content sharing, social buttons (e.g. Facebook’s ‘Like’ button) etc.

For the most part, clicktivism is seen in derogatory terms, with the idea that the streamlining of online processes has created a societal disposition towards feel-good, ‘easy’ activism. From this perspective, clicktivism is a lazy or overly-convenient alternative to the effort and legitimacy of traditional engagement. Here, individuals engaging in clicktivism may derive some sense of moral gratification from their actions, but clicktivism’s capacity to incite genuine political change is severely limited. Some would go so far as to say that clicktivism has a negative impact on democratic systems, as it undermines an individual’s desire and need to participate in traditional forms of engagement; those established modes which mainstream political scholars understand as the backbone of a healthy, functioning democracy.

This idea that clicktivism isn’t ‘legitimate’ activism is fuelled by a general lack of understanding about what clicktivism actually involves. As a recent development in observed political action, clicktivism has received its fair share of attention in the political participation literature. However, for the most part, this literature has done a poor job of actually defining clicktivism. As such, clicktivism is not so much a contested notion, as an ill-defined one. The extant work continues to describe clicktivism in broad terms, failing to effectively establish what it does, and does not, involve. Indeed, as highlighted, the mainstream political participation literature saw clicktivism not as a specific form of online action, but rather as a limited and unimportant mode of online engagement.

However, to disregard emerging forms of engagement such as clicktivism because they are at odds with long-held notions of what constitutes meaningful ‘political’ engagement is a misguided and dangerous road to travel. Here, it is important that we acknowledge that a political act, even if it requires limited effort, has relevance for the individual, and, as such, carries worth. And this is where we see clicktivism challenging these traditional notions of political participation. To date, we have looked at clicktivism through an outdated lens; an approach rooted in traditional notions of democracy. However, the Internet has fundamentally changed how people understand politics, and, consequently, it is forcing us to broaden our understanding of the ‘political’, and of what constitutes political participation.

The Internet, in no small part, has created a more reflexive political citizen, one who has been given the tools to express dissatisfaction across all facets of their life, not just those tied to the political arena. Collective action underpinned by a developed ideology has been replaced by project-oriented identities and connective action. Here, an individual’s desire to engage does not derive from the collective action frames of political parties, but rather from the individual’s self-evaluation of a project’s worth and their personal action frames.

Simply put, people now pick and choose which projects they participate in and feel little generalized commitment to continued involvement. And it is clicktivism that is in the vanguard here. Clicktivism, as an impulsive, non-committed online political gesture that can be easily replicated and does not require any specialized knowledge, is shaped by, and reinforces, this change. It affords the project-oriented individual an efficient means of political participation, without the hassles involved in traditional engagement.

This is not to say, however, that clicktivism serves the same functions as traditional forms. Indeed, much more work is needed to understand the impact and effect that clicktivist techniques can have on social movements and political issues. However, and this is the most important point, clicktivism is forcing us to reconsider what we define as political participation. It does not overtly engage with the political arena, but provides avenues through which to do so. It does not incite genuine political change, but it makes people feel as if they are contributing. It does not politicize issues, but it fuels discursive practices. It may not function in the same way as traditional forms of engagement, but it represents the political participation aspirations of the modern citizen. Clicktivism has been bridging the dualism between the traditional and contemporary forms of political participation, and in its place establishing a participatory duality.

Clicktivism, and similar contemporary forms of engagement, are challenging how we understand political participation, and to ignore them because of what they don’t embody, rather than what they do, is to move forward with eyes closed.

Read the full article: Halupka, M. (2014) Clicktivism: A Systematic Heuristic. Policy and Internet 6 (2) 115-132.

Max Halupka is a PhD candidate at the ANZSOG Institute for Governance, University of Canberra. His research interests include youth political participation, e-activism, online engagement, hacktivism, and fluid participatory structures.