Wednesday, June 18, 2014

Sharing the Cup with the World

On June 17th, Ochoa was the One.

The 2014 World Cup represents a fundamentally new stage in the history of World Cups.  World Cups have long been some of the largest international events of mass culture and broadcasting, and they have also been celebrated at the local community level.  However, to a greater degree than ever before, the local, micro-level cultural experiences of the World Cup have the potential to accumulate, spread, and influence others well beyond the local.

This photoshopped mash-up of Neo with Ochoa's face illustrates one of the ways that creative digital culture invites participation and shapes people's experience of the World Cup.  Someone created the image during an impromptu Photoshop contest in an online community.  It was picked up and posted to another online community, where I saw it and captured it with my phone.  I then posted it to my Facebook feed, and I have reposted it here.  Social media and the culture of creative participation have become sufficiently widespread that "everybody" can participate.  This changes what it is like to experience the World Cup, making it more participatory, less scripted, and more unpredictable.

Way back in 2007, Clay Shirky wrote that "Communications tools don't get socially interesting until they get technologically boring."  When these ordinary tools are in everyone's hands, we do ordinary things differently.  Just watching a game and posting a message about a goal on FB changes how we and others experience that event.  In millions of small ways, more of the cultural creation related to the World Cup is being done by all of us: between friends, within social networks, and across online communities.  More than ever before, hundreds of millions of people live part of their lives online.  The World Cup is part of that digital life, and so the participation of all those people becomes part of the World Cup.  Here are some of the ways that participatory culture is shaping the World Cup:
  • National supporters groups like the American Outlaws reduce the transaction costs of traveling and increase the commitment and participation of fans.
  • Sport specific online communities (like r/Soccer on reddit)  hasten the enculturation of new fans, and create a platform for ordinary people to participate in the narrative and events of the World Cup.  
  • Social media like Facebook and Twitter make it easier to organize watch parties and for folks to comment on and share about the World Cup.  This includes individual participation as well as hundreds of local community soccer pages.
  • Soccer blogs and other semi-grassroots soccer media sources create and motivate greater levels of fan interaction in the mainstream media, through sites like Men in Blazers (also on ESPN FC for the World Cup).
  • Attend local watch parties at a local bar.   Soccer provides an international index of soccer bars to see the next match.  
The World Cup is still a giant international media event, organized from the top down by a hugely wealthy sports organization of mixed repute (FIFA).  And watching a soccer game (football match) is still an individual act of mass media cultural consumption.  However, our experiences of the World Cup will increasingly be produced by how we share the cup with our friends, our communities, and the world beyond.

Monday, March 31, 2014

Harbinger of change in the digital metropolis?

How does this threadlet reveal a glimmer of social change?  This AskReddit thread asks the teens of reddit what is cool nowadays.  User Pseudologiac notes that a cool thing about teenagers today is how open-minded they are, in contrast to what the kids of the '80s and '90s might have experienced.

Now, let's take this observation as an anecdotal harbinger and accept, for the sake of argument, that it is a legitimate observation of a form of social change taking place now.  What is it about contemporary life that is causing some populations of teenagers to move beyond the narrow-minded, balkanized subgroup mentality?  Simmel and I would argue that the experience of interacting in crosscutting social circles changes our mentality and makes us more cosmopolitan.  The city, and by extension the even larger and more interactive digital metropolis, has the potential to change us, and makes the intergroup hostility captured by movies like The Breakfast Club seem old and out of touch.

Of course, there are also stories of bullying and other dark sides of the teenage experience related to social computing.  So the change, if real, is not automatic, but it may have to do with the ways that people use social media and how they interact.  This brings us back to Rheingold's argument about the social importance of digital literacies, and the specific literacies that he recommends (managing attention, critical evaluation skills, sharing, collaboration, and network smarts).  Thesis:  populations where these digital literacies are more fully developed and more universally distributed will be those where increased participation in social media is more likely to produce movement towards a cosmopolitan mentality.

Monday, November 5, 2012

Aggregation sites should standardize vote rates from polls

     The Pollster site operated through Huffington Post employs a thoughtful model for estimating a summary trendline from a wide variety of polls.  It deals with house effects, one-off poll weight discounting, timing range, sample size differences, integration of regional effects and relations between state and national polls.  For all of these smart modifications, it seems that Pollster and other poll aggregation systems (like RCP) still need to address a rather fundamental measurement issue:  a poll statistic is not always the same thing as the key population parameter that we want to estimate.  Rather than modeling the trajectory of the poll statistic, we should strive to measure the rate at which voters prefer one candidate over the other.  However, because respondents have the option to declare 'undecided' or 'other', the poll statistic can differ dramatically from the preference rate.

Consider these recent polls from Colorado.

firm     dates     obama     romney    undecided   other
PPP   11/3 11/4      52        46         3          0
Lake  10/31 11/4     45        44         9          3                                

     Based simply on the poll statistics, the comparison between these two surveys seems to show a pronounced 7 point jump for Obama (from 45 to 52).  This is not actually the case, though.  Rather than averaging across poll statistics, we should standardize values into vote rates before plotting trends or otherwise summarizing values across different polls.  One way is to take the polling statistic for candidate A and standardize it by the sum of the polling statistics for both candidates.  We can do the same for candidate B, although the two measures are symmetrical.  This standardization prevents variation in the proportion 'undecided' or 'other' from distracting from our focus on estimating the population parameter of interest.  It shows that the preference for Obama shifted from about 50.6 to 53.1 percent, a much more modest change in preference.  Read another way, the raw 45 for Obama in the Lake poll looks like a major drop, and comparing across the two polls it can even seem as though Romney polled higher than Obama (46 is bigger than 45, after all).  But, in fact, voters preferred Obama in both polls, just by different margins.

  A / (A+B)   B / (A+B)

This works out in the case of these two polls as:

         Obama        Romney
firm     A/(A+B)      B/(A+B)    diff 
PPP     .530612      .469388   .061224
Lake    .505618      .494382   .011236
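
The same arithmetic can be sketched in a few lines of Python.  This is only an illustration using the two Colorado polls from the table above; the function name vote_rates is made up for this example and is not anything Pollster uses.

```python
def vote_rates(a, b):
    """Standardize two poll percentages into two-candidate vote rates."""
    total = a + b  # 'undecided' and 'other' are left out of the denominator
    return a / total, b / total

# Colorado figures from the table above: (Obama, Romney)
polls = {"PPP": (52, 46), "Lake": (45, 44)}

for firm, (obama, romney) in polls.items():
    o, r = vote_rates(obama, romney)
    print(f"{firm:5s} {o:.6f} {r:.6f} {o - r:.6f}")
```

Because the two rates always sum to one, the difference column is just twice the distance of either rate from 0.5.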

     Standardizing poll statistics into vote rates gets us closer to the rate at which voters prefer candidate A compared to candidate B.  When unstandardized poll results are aggregated, changes in vote rates get mixed together with changes in the rates of 'undecided' or 'other' responses.  This can lead to faulty interpretations.  For instance, compare the two plots below (the first plot reports vote rates, the second is from the Pollster site).  Both use the same polling data, but the top plot shows vote rates between the top two candidates while the second reports poll percentages (along with the many model improvements mentioned earlier).

    It is hard to compare these plots directly because the Pollster trendline is based on many more observation events, is smoothed nicely, and reports an X axis based on a consistent time metric.  In contrast, my simplistic plot registers one unit of X as one poll.  The main result of these defects is that comparisons of temporal trends are difficult because of the distortion inherent in the first plot.  Despite that difficulty we can read a couple of helpful lessons.  Mainly, basing model values on the poll statistics rather than on some standardized measure seems to run the risk of getting both the size of the preference gap and the direction of the trends wrong.  We can see that in the most recent weeks, where the gap in vote rate has increased slightly and the size of the gap is actually much larger than the raw poll data seems to imply.  The poll data puts the race very close (about 0.6%), while the vote rate puts the gap at 1.9%.
     These are very different lessons to derive from the same underlying data (state level polls, for the most part).  The vote rate seems to suggest that Obama is ahead by about 2 percentage points and that his advantage is increasing.  The poll statistics seem to suggest that the rates are nearly identical and Romney is gaining.  If we take these plots as predictions, the vote rate seems to predict a win by about 2% in Colorado, while the poll statistics suggest a much closer race.  Would a Pollster model based on standardized vote rates still register Colorado as a toss-up, or as leaning Obama?

You can create plots similar to the top plot too.

To create this one I simply followed these steps:

1.  Go to the state page for the pollster data:
2.  Select more data until it maxes out.
3.  Select, copy, and paste into Excel
4.  Reverse the order of the polls to put them in chronological order
5.  Create a denominator by adding the poll statistics of the two candidates together
6.  Divide each poll statistic by the denominator
7.  Calculate some type of moving average of the polls (I used 7 poll averages)
8.  Plot!
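
If you would rather script it than use a spreadsheet, steps 4 through 7 can be sketched in Python roughly as follows.  This is a sketch under assumptions: the function names are made up, the sample numbers are hypothetical, and the moving average is a simple trailing one (early polls average whatever is available so far), which may not match exactly what a spreadsheet formula would do.

```python
def moving_average(values, window=7):
    """Trailing moving average; early entries average what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def vote_rate_trend(polls, window=7):
    """polls: list of (obama, romney) percentages, newest poll first."""
    chronological = list(reversed(polls))             # step 4
    rates = [a / (a + b) for a, b in chronological]   # steps 5 and 6
    return moving_average(rates, window)              # step 7

# Hypothetical numbers, newest poll first, purely for illustration
sample = [(52, 46), (45, 44), (48, 48), (47, 49)]
trend = vote_rate_trend(sample, window=2)
print(trend)
```

The result is one smoothed Obama vote rate per poll, in chronological order; the Romney line is simply one minus each value.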

I pasted my data into a google spreadsheet.  Take a look if you are curious.

     The second sheet on the google spreadsheet is from Ohio poll data, and the plot is below.  It is not really necessary to include lines for both candidates, but the symmetric values highlight the trends in the data.  The growth in the gap is a little easier to observe in the vote rate plot than in the poll summary, but otherwise the current gaps are similar (Pollster summary difference = 3.3; vote rate difference = 3.28).

     The other major differences seem to be the timing and scale of the major fluctuations.  The timing difference is actually due to my hasty construction of the vote rate plot, where the polls are simply ordered and do not represent the actual time elapsed.  This visually compresses the early time period where polls were rare and stretches the recent weeks where polls are common.  The second difference is that the peaks and valleys are more extreme in the standardized vote rate plot.  This is due either to differences in the smoothing rules or to differences between the poll statistics and the vote rates.  Or, probably, some combination of both.
     In the case of Ohio, key aspects of both plots are pretty similar:  the current trend direction (a slight increase in Obama's edge) and the approximate size of that edge (3.3%).  However, in Colorado both the trend and the gap are strikingly different in the two plots.  It would be interesting to note which other state level cases show potentially influential errors related to the unstandardized rates.  It would be even more interesting to see a comparison from Pollster between their current model and plots that start from vote rates rather than poll statistics.
     The Pollster site has a long-standing tradition of depicting the raw data in the plots.  However, it seems likely that the points, and any trendline based on those points, can lead to erroneous conclusions.  In an email discussion, Dr. Simon Jackman indicated that he is well aware of the deficiencies of using raw poll data, but the legacy of earlier plots seems to be an institutional constraint with Pollster.  Perhaps Pollster could institute an option that allows users to switch between the standardized and nonstandardized plots, similar to switching between linear and logged axes in other data visualizations (like Gapminder).  Doing so might lead to better understanding of poll trends and add to the numerate discussion of political and social data.

Tuesday, December 21, 2010

First-Digit Law and Google Ngram

The first-digit law [Benford's Law] describes how the leading digit in count data will tend to over-represent "1", and to a decreasing extent "2", "3", and so on, with each digit less common than the one before.  Google Ngram counts of number frequencies in their book corpus show a similar trend, which is interesting, since these values arise from such heterogeneous sources.
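
For reference, the first-digit law predicts that digit d leads with probability log10(1 + 1/d), which runs from about 30.1% for "1" down to about 4.6% for "9".  A minimal Python sketch of those expected shares follows; the function name benford_expected is made up for illustration, and the Ngram counts themselves are not reproduced here.

```python
import math

def benford_expected(d):
    """Expected share of leading digit d (1-9) under the first-digit law."""
    return math.log10(1 + 1 / d)

# Print the expected share for each possible leading digit
for d in range(1, 10):
    print(d, round(benford_expected(d), 3))
```

These shares are the benchmark that observed counts of leading digits would be compared against.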

I ran the same set for the hundreds, and the results are similar, although "800" behaves differently than expected.  One possible explanation is that our surplus of "800"s comes from 1-800 phone numbers.  Running the same thing but substituting "101" for "100", etc., eliminates the 800 bulge that starts in the 1980s.

It is exciting to think about the potential to ask more socially interesting questions of this data.  Note that I stopped the graph at the default (2000).  Although the data set extends to 2008, it seems that data must be missing after 2000, because many values that should not drop in concert are dropping together.

Friday, December 3, 2010

It takes a digital metropolis to create a belly teddy bear. . . .

It may take a village to raise a child, but it takes a digital metropolis to create a belly teddy bear.  I posted briefly about this before, but there is much more to the story than it might seem at first.

My daughter knows the alien cartoon logo from reddit as the "Belly Teddy Bear."  The cute little image adorns the header, and occasionally appears in advertisement space for the site, which I read pretty regularly in the evenings.   Sydney was and remains struck by how cute the image is.   I hunted for a plush version on sale from reddit, but I quickly learned that no such toy existed.  Mass production was out.  I gave up on the gift idea.  But she kept remarking about the "belly teddy bear" so I tried to think creatively.  Lacking skills or friends with the necessary skills I had to think outside the constraints of my geographic village and my personal social network.

I learned about Etsy from a student in my group processes seminar and searched there until I found someone [Ning Ning Gong] who made knit stuffed animals of her own design (among other things).  I contacted her, we discussed the design, and she agreed to make the "belly teddy bear".   In fact, she worked on it right away and shipped it early, which made Sydney very happy on her birthday.

So, via email alerts and messages on a webpage, I contracted the production of a unique, hand crafted gift for $28 plus shipping, with a person I had never met, who lived in another country and with whom I was not likely to interact again.

How did the digital metropolis make this all possible?  In other words, what digital and social infrastructure did we rely on?  At the very least, we needed:

  • Secure monetary transactions at a distance:  Paypal
  • Efficient distributed web hosting for craft producers:   Etsy
  • Source for stylized but not fully commercialized image:  reddit

Yet these three examples are the tips of branches in the infrastructure;  these in turn, rely upon the lowered transaction costs, ease of search, and distribution of information that is generic to the internet; plus we need the communities and commercial entities that drove the creation of the tools and data management systems behind the scenes that make possible the digital systems we use.   We needed monetization and we needed free information.  We needed property rights, contracts, and we needed a bit of trust.  We needed digital cameras, structured query language, and free and open source software.  We needed online community, reputation systems, and socially based information aggregation.

In "The Internet? Bah!"  Clifford Stoll (1995) famously missed the mark on the potential for the internet to change our lives and alter the way we do things.  Rather than consider the ways that he has been proven wrong by developments in the last 15 years, I would like to consider how we can study current capacities, events, and changes in light of the trends we already see developing.  What does the production of "the belly teddy bear" reveal?  How will that thread of the future contribute to the fabric of everyday life that we will take for granted in the next 15 years?