March 1, 2013

Tales of Big Data

Filed under: digital humanities,profession,writing — jrice @ 12:36 pm


It’s no secret that ours is a world of pattern formation. I often introduced the concept of pattern formation when I was a WAC director (and later used it as exigence in “Networked Assessment”) by drawing attention to the infamous underwear bomber, a situation where pattern formation would have revealed his intent (instead of catching him while in the act).

The Digital Humanities, too, is interested in patterns. I’ve understood the literary/historical side of its work (even in the call to “build”) as the re-energized study of the novel. Either the novel itself or some variation of Google’s Ngram (the novel over time) will reveal new patterns previously unidentified in whatever kind of hermeneutical work was done without algorithms. The 19th century, as our inside joke goes, is suddenly discovered to be all about transportation once such a pattern is identified over a novel or series of novels. Obviously, the work is important to people who study texts in order to produce interpretive meaning.

Big data is a sports issue as well. Henry Abbott, on what is basically the Moneyball question, asks how stats produce patterns which, in turn, can inform general managers about a player’s likelihood of contributing to a team’s success. Big data, here as well, is about interpreting. The payoff of that interpretation, though, is financial. Find the right pattern, and you may find the right mix of players for success.

Out of all the recent discussion of MOOCs, the one pattern I notice is the utopian/dystopian dichotomy. This pattern attracts my attention for a number of reasons. One is that it’s obvious (how many anti-MOOC pieces note the benefits of face-to-face education? how many pro-MOOC pieces note the open access issue?). The second reason is that I am basically hearing the exigence I discovered for writing Digital Detroit: Detroit is in ruins/Detroit is on the verge of rejuvenation. It was not too difficult to see that no space (city or otherwise) could be reduced to such a simple binary. MOOCs, on the other hand, are treated as such a binary. We might call this interpretative gesture small data since it is concerned with only pro/con positions or the re-circulation of commonplaces.

Thus, there is one text to mine: the commonplace. We can have a library of commonplaces (and we always do: Middle East, Detroit, presidential campaign, gun control...). Big data, as the Digital Humanities likes to call it, is really just building a library of texts so that they may be efficiently mined. And I’m not sure that Moneyball is the same thing. The Moneyball move involves the specific study of statistics (a library of texts) in order to predict future performance. The literary mining of big data is often a return to practices already in circulation, practices that can be simplistically reduced to “what happened back then?”

I prefer to look ahead. Examining data in order to better understand how an audience acts, how an idea can be invented, how a message will be delivered: this seems to me the essence of locating and using big data. Part of that examination is understanding how individuals might act in terrorist situations. Part of that examination might involve persuasion (arguing for positions within the university, for instance, as I see my colleagues often do without data of any sort). Part of that examination might be getting students to take courses in a department that is not yet a department (our own dilemma in WRD). Part of that examination might involve activity tracing (as assessment studies claims for itself, though typically with a pre-determined goal to declare relevance).

There are tales of big data. Moneyball is one such tale. In the film Moneyball, Peter Brand says:

It’s about getting things down to one number. Using the stats the way we read them, we’ll find value in players that no one else can see. People are overlooked for a variety of biased reasons and perceived flaws. Age, appearance, personality. Bill James and mathematics cut straight through that. Billy, of the 20,000 notable players for us to consider, I believe that there is a championship team of twenty-five people that we can afford, because everyone else in baseball undervalues them.

Value, of course, is a shifting concept. But think about this quote for a second. “We’ll find value in players that no one else can see.” What I want out of big data is what I and everyone else can’t yet see. I know that a committee of the most predictable computer people on campus will yield a very predictable response on MOOCs. I know it because the committee itself does not reflect big data (its value is based on a commonplace, a circulated role in the university as the familiar “ASK THAT PERSON”). I know that the search through a novel will produce endless issues of “what we thought was x is really y” because this is a commonplace among literary approaches to interpretative works. Big data, exemplified by Moneyball, is not about re-finding the commonplace. But we are good at mining commonplaces. Our politics, our university policies, our approaches to scholarship – so often we are back with a commonplace. And in this mining, we produce not what no one else can see, but what we already saw. Face-to-face education is superior. Of course. Or as NPR told me this morning while I was driving to yoga, Detroit is in ruins. Of course. Are there tales of big data? Or is there just one tale, a commonplace mined again and again?





