Editor's note: Alan Mislove is an assistant professor at the College of Computer and Information Science at Northeastern University. His research is focused on online social networks.
(CNN) -- How are fads started and spread? Do certain influential people play a key role, or is it truly random? How does a trend go from new and exciting to old and passe so quickly? Does having happy friends have an effect on our own happiness?
Maybe Twitter can tell us.
Every day, millions of people across the world share their thoughts in the form of 140-character messages using Twitter. The "tweets" range from the mundane to the profound, and convey, for example, what people are doing, thinking and reading at any moment.
The amount of information in any individual tweet is highly variable, but in aggregate the more than 65 million tweets composed per day represent a detailed, real-time trace of the collective thoughts and feelings of a significant fraction of the population -- potentially offering valuable information to everyone from politicians to advertisers to social researchers.
To demonstrate the unique power of Twitter data, our research group at Northeastern University and Harvard Medical School recently began a study to infer the mood of Twitter users in the United States from their public tweets.
We can observe very distinct patterns over the course of the day, as well as weekly patterns that match conventional wisdom, such as the tendency of users to write happier tweets on weekends.
We also observe geographic variations, with users from Hawaii, California, and Florida, for example, using happier words in their tweets.
With Twitter, we now not only know which users are communicating, but we also know what they are saying. From a research perspective, Twitter is more than just a new tool; it's an entirely new kind of tool. Never before have academic researchers had access to this much real-time public information about what people are thinking and saying.
It is analogous to being allowed to tap into millions of water-cooler conversations, school rooms and other public conversations across the globe.
Our study is preliminary; we need more data to do a proper evaluation, and the results are subject to any number of biases (for example, people using language differently across the United States, and different demographics using Twitter at different times). Our approach simply looks for occurrences of, for example, "happy" or "unhappy" words in tweets. Because we take words out of context, however, our approach will not correctly interpret tweets like "I am not happy."
Even so, initial results demonstrate that Twitter data contain a wealth of information, and that even relatively simplistic approaches such as ours can extract interesting results.
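The word-counting idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word lists and the `mood_score` function are hypothetical and are not the lexicon or scoring used in the study.

```python
import re

# Hypothetical word lists for illustration only; not the study's lexicon.
HAPPY_WORDS = {"happy", "love", "great", "glad"}
UNHAPPY_WORDS = {"unhappy", "sad", "hate", "awful"}


def mood_score(tweet):
    """Count happy words minus unhappy words, ignoring all context."""
    words = re.findall(r"[a-z']+", tweet.lower())
    happy = sum(w in HAPPY_WORDS for w in words)
    unhappy = sum(w in UNHAPPY_WORDS for w in words)
    return happy - unhappy


print(mood_score("So happy it is the weekend"))  # prints 1
print(mood_score("Sad and unhappy today"))       # prints -2
print(mood_score("I am not happy"))              # prints 1: the negation is missed
```

The last call shows the limitation noted above: because each word is scored in isolation, "I am not happy" counts as a happy tweet.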
In fact, other research groups have also begun to examine Twitter data and have demonstrated that it can be used to predict the box-office success of an upcoming movie. And Twitter data could yield much more detailed polling than traditional methods, enabling real-time feedback on issues of local, national or international interest.
In the past, researchers studying traces of human communication, such as phone records, have shown that the social network that connects us has rich hidden complexity. But legal and privacy concerns have caused these previous studies to almost universally omit the content of the communication. Because most users leave their tweets public, Twitter represents an unprecedented opportunity.
This is why researchers owe a debt of gratitude to Twitter for its policy of open access to public tweets (exemplified by the recent donation of its entire public tweet history to the Library of Congress).
As with any scientific tool, the ability to use the data for research is subject to caveats and limitations. Unlike with many existing tools, such as surveys and polls, researchers cannot ask Twitter users a question directly; instead, they must determine whether the question is one that the Twitter data can answer.
If we can determine how to extract information reliably, which we are working on, the potential applications of the data are almost endless. For example, monitoring the mood of the public chatter on Twitter could allow businesses to quickly identify and respond to incidents, mitigating the effect of negative publicity on their brand. The data could also be used to inform public policy, allowing public officials and politicians to receive feedback from their constituents in real time.
From a scientific standpoint, Twitter data can shed light on how information spreads through society. Researchers can also investigate network effects: How does what our friends discuss influence what we discuss?
In short, the data that is now becoming available from Twitter and related websites offers a new lens through which we can view society. It promises new approaches to understanding social phenomena, a field some colleagues have dubbed "computational social science."
This new kind of data presents new challenges, such as privacy, anonymity and legality, but developing a science around it has the potential to do enormous good.
The opinions expressed in this commentary are solely those of Alan Mislove.