The promises and perils of large-scale data extraction
Dmitri Williams
University of Southern California
The Virtual World Exploratorium Project was the first to gain access to the large databases controlled by game developers. However, actually using those data came with unforeseen risks and rewards. This paper discusses the lessons learned for future large-scale data projects of virtual worlds. It covers the process of hosting, formatting, and ultimately using the data sets. These various projects included longitudinal analyses, cross-sectional designs, collapsed time-series, and the coordination of behavioral and attitudinal data, along with the need to understand the context of the data from a more anthropological point of view. The intent of the chapter is to offer the reader a sense of the challenges and potential for using such large datasets.
The promises and perils of large-scale data extraction
In most respects, this chapter is an illustration of the aphorism "Be careful what you ask for, because you might get it." I have spent the last three years setting up and coordinating a team of social scientists who are all interested in exploiting data from virtual worlds. We believe that these spaces are important to study for two reasons. First, they are simply large and popular, with at least 47 million subscriptions in the West (White, 2008), and perhaps twice that many in Asia. Second, they present spaces in which to test human behaviors on scales usually unavailable to social scientists. These behaviors quickly move beyond why someone would want to slay a dragon (although many of us find this important as well). For example, virtual spaces represent real economies, and they might be able to serve as high-quality simulators for those interested in "what if" scenarios. And as I write this in the midst of the greatest financial crisis the US has seen since the Great Depression, it strikes me that some insight into such "what if" scenarios of public policy (Should we bail out banks? Should we protect large companies from failing? etc.) would be highly valuable. Wouldn't it be useful to see what a real population of many millions of people would do if the system changed in X way? So, it is perhaps unsurprising that the scientists interested in using virtual world data run the gamut from anthropologists (Nardi & Harris, 2006) to economists (Castronova, 2006) to educators (Steinkuehler, 2006) to my own group, communication scientists.
We're all interested in the human experience and in testing theories using data from these worlds, but the chief problem has always been access (Kafai's work [this volume] being a notable exception). Nearly all of the work in this space has been conducted without the cooperation of virtual world operators, or through the intensive effort of creating a world from scratch (see Barab [this volume]). For most quantitative methodologies this lack of cooperation is a big obstacle because it is difficult to gain systematic access to the users of these spaces. Without a master list, it is difficult to sample intelligently from the player population. Without players' patterns of use and access, it is difficult to know whom to reach, or even how to reach them. This makes survey and experimental work very challenging, and all of us in the space have had to settle for self-selected or snowball samples (Griffiths, Davies, & Chappell, 2003; Seay, Jerome, Lee, & Kraut, 2004; Yee, 2006). Probably the most systematic work to date used social networking tools to construct networks for sampling (Ducheneaut, Yee, Nickell, & Moore, 2006; Williams et al., 2006), but even this approach was unable to use a master census-like list.
For experimentalists like myself, this has meant resorting to recruiting subjects via referral or through forum pages and in-world settings (yes, I have paid a research assistant to walk around virtual cities looking for subjects) (Williams, 2006; Williams, Caplan, & Xiong, 2007; Williams & Xiong, in press). What's worse, we've all had to settle for self-reports of actual behaviors. To take the most obvious case, instead of knowing how much people play, we have to ask them how much they play. And of course they answer incorrectly, for a wide range of reasons ranging from ego protection to simple recall difficulty (Cook & Campbell, 1979). This has always been frustrating because we know that the actual answers not only exist, but that they are complete and accurate. The companies that run these virtual worlds typically store all, or some time range, of the actions carried out within the space. If these could be accessed, they would be nearly perfect unobtrusive data (Webb, Campbell, Schwartz, & Sechrest, 1966). This chapter is a post-mortem from a team that gained access to these data. In getting everything we asked for, we were confronted with several unforeseen challenges, ranging from where to put the data, to how we should use it, to how we should manage a virtual team ourselves. The hope is that by sharing what went right and what didn't, future work in this area (work we think will be both important and inevitable) will be easier.
Setting up the study
Thanks to the leadership of key players at Sony Online Entertainment, we were able to gain access to data from their MMO EverQuest II (EQ2). This began with a large (n = 7,000), original survey of the player base done in conjunction with Sony (Williams, Yee, & Caplan, 2008). The key advantage of this survey was that it was conducted within the game world, and with the approval of the operator. In prior survey work, I have encountered not only difficulty in reaching players, but significant skepticism among them over the veracity of the study. Many assume that any study is a hoax, and others are hostile towards the academic establishment because of its continued focus on antisocial effects (Williams & Xiong, in press). With the study blessed and run by Sony, these problems vanished. Better still, Sony was able to create a virtual item to use as an incentive for the survey. This yielded a response rate roughly two to three times what cash incentives have produced in similar work.
With demographic and psychological profiles in hand, we moved on to the data collected by the game servers themselves. We did not know what size or shape these data would take, or precisely what we would do with them. All we knew was that we probably wouldn't be able to deal with all of it at once, so we had to make decisions about cutting down the total amount of incoming data. This inevitably meant deciding which servers to use. In EQ2, as in most virtual worlds (Second Life and Eve being notable exceptions), there are multiple copies of the virtual world in operation. Each is typically called a server, although in reality it is often run by multiple pieces of hardware. This practice maintains the right ratio of population to virtual space so that cities and countrysides don't get too crowded (or too empty) as the overall number of players fluctuates. Game operators add or subtract servers, offering transfers and mergers to manage the populations. It also allows different versions of the game to exist, since the managers have learned that different players gravitate to slightly different rule sets. For most MMOs, these variations include PvE servers, where players battle the environment; PvP servers, where they may also battle other players; and Role Play servers, where they are encouraged to perform in character. Additionally, EQ2 offers Exchange servers, which allow virtual goods to be bought and sold for real money. So, with four server types possible, we asked for data from one of each.
When it arrived (compressed on external disk drives), we were surprised by many things. First, it was far, far bigger than we had initially suspected. Without divulging trade secrets or violating a non-disclosure agreement, it is safe to say that these spaces generate terabytes of data per server per year of operation. So, it instantly becomes clear to anyone looking at these data that this is not going to be an operation that can take place on standard PCs, most of which can store half a terabyte at most. We quickly realized not only that we could not handle the volume, but that accessing and using the data would be beyond our expertise as well. So, it was time for computer scientists to get involved.
Working with computer scientists and CS PhD students at the National Center for Supercomputing Applications (NCSA) and the University of Minnesota, we discovered that simply hosting and accessing the data was going to be an immense database challenge. Like most social scientists, I had no expertise (and didn't want any) in large-scale database design. Nevertheless, gaining a working knowledge of the basics of databases is a necessary step for dealing with large-scale data like these. Without a grasp of at least the basics, it would be impossible to identify and buy the right hardware, or to know what sort of personnel would be needed to manage the data. Several questions quickly emerged from the initial conversations: What kind of machine do you want? What do you want it to be able to do? How much data will you need access to at once? What format are the data in? And of course, the big one: who is going to pay for all of this?
Most quantitative social scientists, if they are like me, are familiar with basic statistics packages like SPSS, SAS, and STATA. We are comfortable tackling datasets of 20,000 subjects or more from surveys, and accustomed to running complex models through these packages on desktop computers. Advances in computing power have enabled us to run regressions, ANOVAs, time series and the like in seconds or minutes on these large samples. Unfortunately, these tools do not scale well. As of this writing, there is simply no supercomputing version of SPSS ready to tackle datasets 10, 50, 100 or 1,000 times the typical size. It is one thing to ask SPSS to calculate the mean and standard deviation on a dataset with 20,000 rows. It is quite another when that dataset has 20 billion rows. It is another thing further to try a regression model on those 20 billion rows with, say, just four variables (80 billion values at once). At some future date, these tasks will no doubt be possible on a desktop or a mobile device, but not any time soon. For now, operations like these require immense processors, immense RAM, and of course, immense storage. This forces a quick reality check: do you spend many hundreds of thousands or millions of dollars on a system that can tackle these issues quickly, or do you give up on some of the possible forms of analysis right away? In other words, performance becomes a crucial, and very expensive, variable in planning. It is possible to get a bare-bones infrastructure to tackle these data, but then simple queries like a mean or a regression might take weeks to run. Throwing many millions of dollars at the problem would enable these same operations to run in seconds. We ultimately chose a middle path, spending over $70,000 on infrastructure and allowing operations to last for several days at the outside, with the most basic ones taking a few minutes.
Yet another issue is how complex the operations will be. We used an Oracle database design, which allows for means and simple regressions, but not, say, hierarchical linear models. One of the first questions we had to answer was "Do you want the machine to run these queries, or do you just want it to retrieve smaller versions for you to run on your desktops?" We opted for a limited form of the former to take advantage of the large processors we'd purchased, but also knew that we would pull down segments of data for local processing. One nice option with a system like this is that it allows for random sampling: a multi-billion-row table can be randomly sampled to yield a mere 50,000 representative entries that a desktop machine can handle. Storage was immense, and also expensive. In order to enable speedy searching and processing, the base data need to be indexed and organized by the system, and there must be a lot of extra space in which to handle any calculations. Indexing takes about the same space as the source data, and another equal amount is needed for the calculations. The bottom line is that storage has to exist at a three-to-one ratio: for every terabyte of data, the system actually needs three terabytes of space.
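As a concrete illustration of that sampling option, here is a minimal sketch of pulling a random slice of a very large table down to a desktop-sized file. It assumes an Oracle-style database reachable through a standard Python driver such as cx_Oracle; the connection string, table name ("experience"), and column names are hypothetical stand-ins rather than the project's actual schema.

# Sketch: sample a huge table down to something a desktop can handle.
# Assumes an Oracle database and the cx_Oracle driver; names are hypothetical.
import csv
import cx_Oracle  # any DB-API 2.0 driver would look much the same

conn = cx_Oracle.connect("user/password@dbhost/EQ2DATA")
cur = conn.cursor()

# Oracle's SAMPLE clause reads a random fraction of the table, so the query
# never touches all of the multi-billion-row table at once. Sampling 0.001
# percent of roughly five billion rows yields on the order of 50,000 entries.
cur.execute("""
    SELECT event_time, char_id, pc_class, pc_level, amount
    FROM experience SAMPLE (0.001)
""")

with open("experience_sample.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow([d[0] for d in cur.description])  # column headers
    for row in cur:
        writer.writerow(row)

conn.close()

The sampled file can then be analyzed locally in SPSS, STATA, or a scripting language, which is exactly the "retrieve smaller versions for your desktop" option described above.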
Still another unforeseen challenge was data formatting. Most social science datasets are neatly organized affairs created by online survey tools or professional survey organizations. When a political communication expert taps a dataset from the ICPSR archives, it arrives as a perfect matrix, organized and separated by tabs or delimiting characters. These are read into neat rows and columns with variable names at the top, typically linked to a codebook. To give you a sense of the data generated by this MMO, here is a single entry on a row in a table called "experience":
2006-02-17 00:00:00 zone.exp01_rgn_pillars_of_flame_epic01_cazel account=xxxxxxxxx, amount=109, char_id=xxxxxxxx, character=xxxxxxxxxxx, pc_class=conjuror, pc_effective_level=57, pc_group_level=58, pc_group_size=6, pc_level=57, pc_trade_class=scholar, pc_trade_level=15, reason=combat -- killed npc [a crag tarantula/L61/T8], type=exp given, zone=exp01_rgn_pillars_of_flame_epic01_cazel
This couldn't be further from the norm in social science, and it's not readable by any statistics package in existence. How does that fit into a matrix set up to run regressions? We quickly learned that it didn't. In fact, this raw form of data not only didn't work in a matrix, it needed some translation before it could even go into a database. What's worse, the entries varied in format from row to row and table to table. Each entry showed something a character did, as often as every second over years of play, but for different kinds of actions.
The solution to making this messy data source usable required two key ingredients. First, it took domain-specific knowledge. In other words, making sense of this entry took the equivalent of a native guide through a strange and foreign land. For example, the entry above says that a player, while acting as part of a six-person group of very high-level characters adventuring in an area called the Pillars of Flame, killed a big and very difficult spider and gained some experience points. We were fortunate to have the assistance of EQ2's Senior Producer at the time (Scott Hartsman), who helped us write a codebook for the various tables and entry types. With over 500 different variable types, this was a laborious task. Second, to translate the data into a readable format, we needed the assistance of a professional database manager. This is not a task to be delegated to an RA. This is hard-core programming, and it is expensive and time-consuming.
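To give a flavor of what that translation step involves, here is a minimal sketch that turns the raw "experience" entry shown above into a flat record a database loader or statistics package could read. It is based solely on that single example line; the real tables varied in format from one to the next, so a production loader needed the full codebook and a professional's hand.

# Sketch: flatten one raw "experience" log entry into a tabular record.
# The layout is inferred from the single example above and is illustrative only.
import csv
import sys

def parse_experience_line(line: str) -> dict:
    # The entry begins with a timestamp and an event/zone label, followed by
    # a comma-separated list of key=value pairs.
    timestamp = line[:19]                      # "2006-02-17 00:00:00"
    rest = line[19:].strip()
    label, _, kv_part = rest.partition(" ")    # event label, then the fields
    record = {"timestamp": timestamp, "event": label}
    for pair in kv_part.split(", "):
        if "=" in pair:
            key, _, value = pair.partition("=")
            record[key.strip()] = value.strip()
    return record

if __name__ == "__main__":
    sample = ("2006-02-17 00:00:00 zone.exp01_rgn_pillars_of_flame_epic01_cazel "
              "account=x, amount=109, char_id=x, pc_class=conjuror, "
              "pc_effective_level=57, pc_group_size=6, pc_level=57, "
              "reason=combat -- killed npc [a crag tarantula/L61/T8], "
              "type=exp given, zone=exp01_rgn_pillars_of_flame_epic01_cazel")
    row = parse_experience_line(sample)
    writer = csv.DictWriter(sys.stdout, fieldnames=row.keys())
    writer.writeheader()
    writer.writerow(row)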
While the preceding paragraphs might seem technical, they are a very brief summary of a process that took over a year and a half to complete. When all was said and done, our data sat on servers that our students could access remotely and could perform simple queries on. It may be obvious, but if this effort had been part of the virtual world operator's plan from the start, this costly translation step would have been unnecessary. So, a real advance over the current method would be for researchers to help with the database design before the game launches. This may sound like a benefit only for the researchers, but a more systematic plan would also help the game operators themselves with their own analysis and retrieval.
The total cost for this set-up effort ran into six figures, a couple of times over, and that was before hiring any RAs to tackle the actual analysis. We were fortunate enough to gain the support of the National Science Foundation and the Army Research Institute, both of which saw the potential for learning about human behaviors from these unique data. The eventual price tag for the entire process, including the team described below and lasting for three years, was roughly $1.5 million.
Now what do we do with it?
So, the good news was that there was a lot of data ready for analysis. If there was bad news, it's that it was difficult to know where to start or how to make sense of the data for testing theories, and that some of the analyses would require CS students just to run in the first place. Let's start with the kinds of data that were possible, and should be possible in any virtual space. These fall into three categories. First is the survey data. These are of course cross-sectional, and do not represent the entire player base. They may be a representative sample, but they are constrained to one portion of the players and to one point in time. Next come the longitudinal data, which are the kind given in the example above. These show, in second-by-second resolution, every action, transaction and interaction that takes place within the world: questing, killing, dying, chatting, buying, selling, etc. In other words, they cover everything that happens that has any impact on the virtual world, however small, though not literally everything: a player moving from point A to point B isn't recorded, but a player selling a sword at point B is. Last is a cumulative data source listing all accounts and all of the total accomplishments by character. This gives the total number of things any player had done in a category, plus some basic description of their profile.
A quick examination of these three data types will demonstrate that they allow for several kinds of analysis. To those of us schooled in regression approaches for survey data, the thinking tends to skew towards collapsing time into means. This approach misses the potential for time-series analysis and survival analysis. These latter approaches are often better tailored to data where you can create sequential batches, be they by the second or sequences of collapsed minutes, days, weeks or months. Many of the most powerful techniques are derived from marketing research, which focuses on how to retain and attract customers, i.e. what makes some people stay and others leave? This is particularly germane to the business models of the game operators, who only make money when the players stay and keep paying a monthly subscription fee.
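As a minimal sketch of what creating those sequential batches looks like in practice, the snippet below collapses second-level event rows into one row per character per week, the shape that time-series or survival models can then take as input. It assumes the translated events sit in a flat file with timestamp, character id, and experience-amount columns; the file and column names are hypothetical.

# Sketch: collapse second-by-second events into weekly batches per character.
# Assumes a CSV of parsed events with hypothetical column names.
import pandas as pd

events = pd.read_csv("experience_sample.csv", parse_dates=["timestamp"])
events["amount"] = pd.to_numeric(events["amount"], errors="coerce")

# One row per character per calendar week: total experience earned and the
# number of logged events (a rough proxy for activity in that week).
weekly = (events
          .set_index("timestamp")
          .groupby("char_id")["amount"]
          .resample("W")
          .agg(["sum", "count"])
          .rename(columns={"sum": "exp_gained", "count": "events_logged"})
          .reset_index())

print(weekly.head())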
To make sense of these options, the team first had to decide what unit of analysis was of interest, and what role time should play. For every test, there has to be some guiding theory, and some fit with the data. In many cases, we were interested in testing theories for which there was no obvious measurement, and we were wary of trying to force the data to fit our ideas (Kuhn, 1961). In those cases, we either found appropriate proxies (e.g. the more social people may be the ones who send more chat messages per hour) or abandoned the tests. With so many variables, it would be easy to think that any theory could be tested easily, but it is important to adhere to good, rigorous methods.
We started our tests with an economic question because it required the simplest possible form: collapsing a lot of time and looking at a whole population, rather than at individuals. We felt that a proof-of-concept test would be a wise place to start. Our question was whether or not we could derive the kinds of statistics that economists use in the real world: GDP, inflation, and a price index. We did this at one-month intervals, by simply asking the system to sum and average the sales values of all of the transactions that fit the theory. This gave us the total amount of virtual money flowing into and out of a server, and thus showed a surplus or deficit of cash flow. That in turn allowed us to predict prices when combined with population levels. This simple test covered four months of data and two servers, yet it took nearly a month to organize, model and run the data queries. At the end, we were able to track sales, the monetary supply, and how much the average player contributed to the overall economy, perfectly. And although our population was in the tens of thousands rather than the millions, we did not have to resort to the sampling schemes that the US Government does, i.e. the data were complete and accurate, not estimated.
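A hedged sketch of that proof-of-concept aggregation is shown below. It assumes a flat file of transaction records with a timestamp, an amount, a direction flag marking money entering or leaving the world, and a kind field marking player-to-player sales; all of these names are hypothetical and not SOE's actual schema.

# Sketch: monthly money flows and a crude price index from transaction rows.
# Column names ("direction", "kind", etc.) are hypothetical stand-ins.
import pandas as pd

tx = pd.read_csv("transactions_sample.csv", parse_dates=["timestamp"])
tx["month"] = tx["timestamp"].dt.to_period("M")

# Money entering the world (loot, quest rewards) minus money leaving it
# (vendor fees, repairs) shows whether the server ran a surplus or a deficit.
flows = tx.pivot_table(index="month", columns="direction",
                       values="amount", aggfunc="sum").fillna(0)
flows["net"] = flows.get("in", 0) - flows.get("out", 0)

# A crude price index: mean player-to-player sale price each month,
# normalized to the first month in the window.
sales = tx[tx["kind"] == "player_sale"].groupby("month")["amount"].mean()
flows["price_index"] = sales / sales.iloc[0]

print(flows)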
Tests of individual and group-based behaviors are much more complex. Let's say the research question involves the effectiveness of different kinds of groups, perhaps based on their size or composition. What would the level of analysis be? Would it be the individual player or the group? In other words, would the entries into the model be all of the players who played in a given time period? If so, would each task outcome be an entry, or would we collapse each player's outcomes into one value? Or would we collapse everything into group sizes, e.g. groups of one, two, three, etc.? Each of these choices implies a very different theoretical test and a very different mathematical model.
One fruitful approach stemmed from the social networks experts on our team. Rather than looking at what people did, they were first interested in with whom they did these things. Basic tests of network homophily (Monge & Contractor, 2003) suggest that players of similar backgrounds, and even offline proximity, would group together more often. The networks were constructed with three kinds of behavioral data: who grouped together to go out and kill monsters, who traded with whom, and who sent messages to whom. Each of these networks offered ways of finding patterns of frequency and centrality to test the homophily hypotheses. A second step would be to check on the outcomes of these groups.
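As a minimal sketch of one such test, the snippet below builds the who-grouped-with-whom network and checks homophily as attribute assortativity, with degree centrality as a simple frequency/centrality pattern. The edge list and attribute file are hypothetical stand-ins for the behavioral tables, and it is assumed that every character in the edge list also appears in the attribute file.

# Sketch: build a "grouped together" network and test homophily on an attribute.
# Edge and attribute files are hypothetical.
import networkx as nx
import pandas as pd

edges = pd.read_csv("grouping_edges.csv")                      # char_a, char_b, n_sessions
attrs = pd.read_csv("character_attrs.csv", index_col="char_id")  # e.g. pc_class, region

G = nx.Graph()
for row in edges.itertuples(index=False):
    G.add_edge(row.char_a, row.char_b, weight=row.n_sessions)

# Attach an attribute (here, a hypothetical offline region) to every node.
nx.set_node_attributes(G, attrs["region"].to_dict(), name="region")

# Values near +1 indicate strong homophily on the attribute; near 0, none.
r = nx.attribute_assortativity_coefficient(G, "region")
print(f"region assortativity: {r:.3f}")

# Degree centrality as one simple measure of who plays with the most partners.
centrality = nx.degree_centrality(G)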
As with the effectiveness question posed above, the dependent measures were rarely clear-cut. What exactly is the right operationalization of effectiveness in a virtual world? Is it simply the raw amount of experience or stuff acquired? Or, since that privileges those who play more, is it more appropriate to use a rate? And how do we equate tasks that might be of different levels of difficulty? Each of these questions again implies different assumptions and tests. Ultimately, we defined things like success based on data patterns from the game itself. Things that were hard were the things that generated the most experience points. The computer scientists call this kind of measurement recursive, since it depends on the actions and outcomes of others. Player A's effectiveness rating depends on what players B and C did, while player B's depends on A and C, and so on, over many tens of thousands of cases. If a player continually attacked low-level challenges, his success rating would be adjusted accordingly.
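A toy sketch of that relative-scoring idea follows; it is not the team's actual metric, and the column names are hypothetical. Each kill is scored against what the whole population typically earned for a challenge of the same relative difficulty, so farming easy targets is discounted automatically and every player's score shifts as everyone else's outcomes shift.

# Toy sketch of a population-relative effectiveness score (illustrative only).
import pandas as pd

kills = pd.read_csv("kill_events.csv")   # char_id, npc_level, pc_level, exp

# Difficulty bucket: how far above or below the player the target was.
kills["difficulty"] = kills["npc_level"] - kills["pc_level"]

# Population baseline: mean experience earned server-wide for each bucket.
baseline = kills.groupby("difficulty")["exp"].transform("mean")

# A player's score is their outcome relative to that baseline, averaged over
# all of their kills, so the rating moves whenever the population's does.
kills["relative_score"] = kills["exp"] / baseline
effectiveness = kills.groupby("char_id")["relative_score"].mean()

print(effectiveness.sort_values(ascending=False).head())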
The computer science contribution to these issues is immense. As noted above, the computational challenges for the hardware are large, and making the best use of it while still testing the right data is difficult. The CS team managed the data loads with two tricks. The first is developing computational algorithms that are one-pass, that is, they need to go through the data only one time to make a calculation. The second is developing repeated series queries, which essentially break a large task into smaller chunks and then handle them one at a time. Note that these are issues that social scientists do not have to confront when using packages like SPSS; that program's algorithms don't have to be as efficient because the data loads are so much smaller.
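A minimal sketch of both tricks together, on a hypothetical flat-file export: a one-pass (Welford-style) running mean and variance, fed by reading the table in fixed-size chunks so that only a small slice is ever in memory. In production the per-row update would be vectorized, but the logic is the same.

# Sketch: one-pass mean/variance over a table too large to load at once.
# File and column names are hypothetical.
import pandas as pd

count, mean, m2 = 0, 0.0, 0.0   # Welford accumulators

for chunk in pd.read_csv("experience_full.csv", usecols=["amount"],
                         chunksize=1_000_000):
    values = pd.to_numeric(chunk["amount"], errors="coerce").dropna()
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count          # running mean
        m2 += delta * (x - mean)       # running sum of squared deviations

variance = m2 / (count - 1) if count > 1 else float("nan")
print(f"n={count}, mean={mean:.2f}, variance={variance:.2f}")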
The final challenge with this analysis is the very real danger of working in a contextual vacuum. I have sped through several quantitative and technical challenges, but a team that focused solely on these would inevitably make serious interpretive errors. These systems, after all, are not based on physics and mechanical gears. The moving pieces here are individuals, communities, and arguably entire societies. This means that they come with the complexity, messiness and culture that any group of humans will have. To make things more difficult, they also exist within a virtual space that has its own values and affordances built in. So, not only are there cultures and communities to deal with, there are the intended and unintended consequences of computer code shaping behaviors. This is Lessig's "code is law" hypothesis (Lessig, 1999), and it lies at the heart of any analysis I undertake. I recognize that the system itself can impact the outcomes, and that one virtual system might be quite unlike the next. To understand the impacts that code and culture can have, I insist that any working team have first-hand experience within the virtual world. My rule is simple: if you haven't played it, you can't study it. It is as obvious to me as a film scholar needing to see the movie first, only much more complicated. Scholars who skip this crucial participant-observation step will ask the wrong questions in surveys, miss obvious explanations, and make incorrect assumptions about outcomes. It might sound unreasonable to ask computer science students or experimentalists to play a computer game in order to do their jobs properly, but it is non-negotiable. And, to be blunt, if I can learn the difference between SQL and Oracle, these people can play a game. These are interdisciplinary spaces, and they will take interdisciplinary and multimethodological approaches to study. As with any large team, if there is an expertise gap, it must be filled. This might mean hiring an outside expert (I hired an ethnographer for one phase of the work), but each member should have at least a working knowledge of the different approaches.
There is one final note on the management of the team. Ironically for a group studying virtual behaviors, we were a virtual team ourselves. Because we were studying subjects of such a broad range, we wanted to tap subject-matter expertise from a wide pool. This led to a team with four PIs at four universities, nearly 20 PhD students, and outside collaborators at three more universities. This sprawling conglomeration of interests and expertise meant that we ourselves would need to conduct our research through virtual networks. We accomplished this by enabling remote access to the data, as noted, but also by scheduling regular meetings based on research topics rather than on location. Groups studying, for example, trust in teams would be composed of a PI or two and a mix of graduate students from a subset of schools, and would meet once a week via teleconference. For continuity, and to share results and get feedback, the larger group would teleconference every other week. Additionally, we met once a year in person, but the expenses involved kept our teams physically separate for the most part. Our groupware also included liberal use of Google's shared-documents functions (e.g. to share a codebook or SQL queries) and Internet-based desktop-sharing programs like Adobe's Connect Pro.
Conclusion
The Virtual World Exploratorium Project has been (and continues to be) a large and challenging team effort, but one with the potential to study human behaviors on unusually large scales and with unusually good data. The promise of the project is that it establishes these methods as workable for future teams. And, as processing power increases, such efforts should become easier. But regardless of the amount of RAM required or the database design, there are two obvious takeaway lessons that will apply to future work. First, it is expensive to do work on very large populations with a lot of data. For many researchers and their theory tests, the approach might be overkill, and their questions may be better answered with smaller-scale efforts. Second, a project like this is inevitably interdisciplinary and multi-methodological. We have benefitted immensely from the combined teamwork of network scientists, computer scientists, social psychologists and anthropologists. The benefits come at some cost, of course, given that these disparate groups speak different languages and report back to different masters. As a team, we've found these costs worth paying, but a large-scale effort like this shouldn't be entered into lightly. Still, as virtual worlds continue to grow and become a larger part of mainstream life, we expect efforts like this one to become more common as well.
References
Castronova, E. (2006). On the research value of large games: Natural experiments in Norrath and Camelot. Games and Culture, 1, 163-186.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Boston: Houghton Mifflin Company.
Ducheneaut, N., Yee, N., Nickell, E., & Moore, R. (2006). Building an MMO with mass appeal: A look at gameplay in World of Warcraft. Games and Culture, 1(4), 281-317.
Griffiths, M., Davies, M. N., & Chappell, D. (2003). Breaking the stereotype: The case of online gaming. CyberPsychology & Behavior, 6(1), 81-91.
Kuhn, T. (1961). The Function of Measurement in Modern Physical Science. Isis, 52, 161-190.
Lessig, L. (1999). Code and other laws of cyberspace. New York: Basic Books.
Monge, P. S., & Contractor, N. S. (2003). Theories of communication networks. Oxford: Oxford University Press.
Nardi, B., & Harris, J. (2006, November). Strangers and friends: Collaborative play in World of Warcraft. Paper presented at the Conference on Computer Supported Cooperative Work, Banff, Canada.
Seay, A. F., Jerome, W. J., Lee, K. S., & Kraut, R. E. (2004, April 24-29). Project Massive: A study of online gaming communities. Paper presented at the CHI 2004, Vienna, Austria.
Steinkuehler, C. (2006). Massively multiplayer online videogaming as participation in a Discourse. Mind, Culture, & Activity, 13(1), 38-52.
Webb, E., Campbell, D., Schwartz, R., & Sechrest, L. (1966). Unobtrusive measures: Non-reactive research in the social sciences. Chicago: Rand McNally and Company.
White, P. (2008). MMOGData: Charts. Gloucester, United Kingdom. Retrieved July 29, 2008, from http://mmogdata.voig.com/
Williams, D. (2006). Groups and goblins: The social and civic impact of an online game. Journal of Broadcasting and Electronic Media, 50(4), 651-670.
Williams, D., Caplan, S., & Xiong, L. (2007). Can you hear me now? The social impact of voice on internet communities. Human Communication Research, 33(4), 427-449.
Williams, D., Ducheneaut, N., Xiong, L., Zhang, Y., Yee, N., & Nickell, E. (2006). From tree house to barracks: The social life of guilds in World of Warcraft. Games & Culture, 1(4), 338-361.
Williams, D., & Xiong, L. (in press). Herding cats online. In E. Hargittai (Ed.), From the Trenches. Ann Arbor: University of Michigan Press.
Williams, D., Yee, N., & Caplan, S. (2008). Who plays, how much, and why? A behavioral player census of a virtual world. Journal of Computer Mediated Communication, 13(4), 993-1018.
Yee, N. (2006). The demographics, motivations and derived experiences of users of massively-multiuser online graphical environments. PRESENCE: Teleoperators and Virtual Environments, 15, 309-329.
Acknowledgments
Among many others at SOE, our thanks go to Raph Koster, Scott Hartsman, and Bruce Ferguson, the three executive producers we have worked with.