- What does a social media sentiment score "mean"?
A sentiment score is measured using advanced analysis techniques and provides insight into public opinion on the issues that matter most to energy firms. Sentiment can be calculated both at wide scale and for specific groups, including by demographic category and by topic.
- How do you deal with "bots" on Twitter?
Twitter actively combats automated accounts, commonly known as "bots", and evidence suggests that they currently account for a very small proportion of total Twitter volume. Furthermore, the most common content in bot tweets is URLs and hashtagged words, and neither of these is sentiment-scored as part of the methodology.
- What about the effect of sarcasm in tweets?
We cannot detect sarcasm, and so these tweets would be rated as positive. However, we find that sarcasm is used in a minority of tweets. More importantly, the use of sarcasm is presumably constant over periods of time measured in years (rather than decades). Therefore, while the use of sarcasm may bias the base level of the sentiment score slightly upward, it is unlikely to materially affect variation in the index (i.e. the degree of change in sentiment on a particular day or hour is unlikely to be affected by sarcasm).
- How do you determine gender of a Twitter user?
We assign a gender to a Twitter user by examining their self-created username and comparing it against a large list of male and female first names. Usernames that contain a male first name are assigned to the male category, and similarly for the female category. First names that have poor predictive capacity because both genders use them frequently are not used.
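The name-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the production method: the name lists are tiny hypothetical stand-ins, the function name `infer_gender` is invented here, and splitting the username on non-letter characters is an assumption about how a first name is located inside it.

```python
import re

# Hypothetical, tiny name lists for illustration; a real list would be far larger.
MALE_NAMES = {"james", "john", "robert"}
FEMALE_NAMES = {"mary", "linda", "susan"}
# Names used frequently by both genders are excluded from matching.
AMBIGUOUS_NAMES = {"taylor", "jordan"}


def infer_gender(username):
    """Return 'male', 'female', or None when no unambiguous first name is found."""
    # Assumption: split the username on anything that is not a letter.
    tokens = [t for t in re.split(r"[^a-z]+", username.lower()) if t]
    usable = [t for t in tokens if t not in AMBIGUOUS_NAMES]
    is_male = any(t in MALE_NAMES for t in usable)
    is_female = any(t in FEMALE_NAMES for t in usable)
    if is_male == is_female:
        # Either no name matched or both lists matched; leave unassigned.
        return None
    return "male" if is_male else "female"


print(infer_gender("John_Smith42"))  # male
print(infer_gender("taylor_99"))     # None
```

Users for whom the function returns `None` would simply be left out of gender-specific measures under this sketch.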
- How do you determine/segregate tweets by location?
About 25% of all Twitter users self-report a location in their user profile, and we use this information to sort tweets geographically. Users who do not report a location that corresponds to a real point on the globe are not included in location-specific index values, such as the USSI. Note also that a much smaller number of Twitter users (about 1%) enable geolocation tracking, which stamps each tweet with the latitude and longitude of the user at the time the tweet is published. Geo-tagged data are available separately.
- Can sentiment be tracked by city/state/region?
Yes. Sentiment can be calculated for any desired sub-national region as needed.
- What about re-tweets?
Messages created using the Twitter "retweet" function, as well as messages that have been copied by other means, are treated the same as all other tweets, although we do have the ability to identify and filter them if desired. A number of recent research studies suggest that retweets account for 3-5% of total tweet volume. It isn't clear to what extent the sentiment of a user who is retweeting matches the sentiment of the original author, as reflected in the tweet.
- What is the relationship between the sentiment measured by Twitter and the sentiment of the general population? Aren't Twitter demographics skewed?
A recent Pew study indicates that a sixth of US adult internet users also use Twitter, representing tens of millions of Americans. The Pew study indicates that the Twitter user base skews toward urban residents and toward persons in the 18-29 and 30-49 age brackets (persons younger than 18 were not surveyed), but has a relatively even distribution across gender, education, and income levels. Nevertheless, recent research indicates that the measured sentiment of Twitter users correlates highly with measures generated by traditional representative surveys.
- What proportion of all tweets is being scored?
IHS Markit scores all tweets that it takes in via an official, Twitter-provided "garden hose" feed, which supplies a random sample of 10% of all tweets. Since world average volume is about 500 million tweets per day, IHS Markit processes approximately 50 million tweets per day. Of this subset, approximately 4 million tweets per day have a reported location in the United States, and these are scored to create the sentiment measures.
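The arithmetic behind these figures can be checked with a few lines. The inputs are the round numbers quoted above; the two share values are derived here for illustration and are not figures reported in this FAQ.

```python
# Round figures quoted in the answer above.
TOTAL_TWEETS_PER_DAY = 500_000_000
SAMPLE_PERCENT = 10
US_LOCATED_PER_DAY = 4_000_000

# Tweets received through the 10% sample each day.
sampled_per_day = TOTAL_TWEETS_PER_DAY * SAMPLE_PERCENT // 100

# Derived shares (illustrative only).
us_share_of_sample = US_LOCATED_PER_DAY / sampled_per_day
us_share_of_all = US_LOCATED_PER_DAY / TOTAL_TWEETS_PER_DAY

print(sampled_per_day)                 # 50000000
print(f"{us_share_of_sample:.1%}")     # 8.0%
print(f"{us_share_of_all:.1%}")        # 0.8%
```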
- How is your sentiment evaluation method different from that of competitors in the marketplace?
Most other methods apply existing word lists that have sentiment values attached to them, looking for these words within tweets, and scoring the tweets accordingly. However, the effectiveness of these approaches is limited by many particular characteristics of Twitter messages, such as their short length (limited to 140 characters), use of abbreviations, use of neologisms and acronyms (e.g. OMG, LOL) and Twitter-specific syntax (e.g. hashtags like #fail).
Our evaluations of this type of sentiment inference indicate that these methods suffer from two problems: their accuracy is relatively low when compared to evaluations of sentiment by human assessors, and the fraction of messages that they can actually evaluate is low.
IHS Markit has developed a proprietary method that measures the sentiment associated with words, based on the frequency with which they occur in messages containing four different types of emoticons that convey sentiment. Relative to other methods, the resulting scoring word list is generated automatically from the Twitter data alone (no human ratings required), is substantially larger than existing lists (allowing more words in a tweet, and more tweets overall, to be scored), is specific to the Twitter lexicon (it includes abbreviations, neologisms, etc.), and can easily be applied in a consistent manner to other languages. Evaluation of the effectiveness of our method shows that it very closely approaches the degree of agreement that human raters have on the sentiment of a particular tweet.
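The general idea of deriving word scores from emoticon co-occurrence can be illustrated with a toy sketch. This is not the proprietary method: the four emoticons, their grouping into positive and negative classes, the scoring formula, the `min_count` threshold, and the function names are all assumptions made for illustration.

```python
from collections import Counter

# Assumption: four emoticon types, two treated as positive and two as negative.
POSITIVE = {":)", ":D"}
NEGATIVE = {":(", ":'("}


def build_word_scores(tweets, min_count=2):
    """Score each word in [-1, 1] by how often it appears alongside
    positive versus negative emoticons."""
    pos, neg = Counter(), Counter()
    for text in tweets:
        tokens = text.split()
        has_pos = any(t in POSITIVE for t in tokens)
        has_neg = any(t in NEGATIVE for t in tokens)
        if has_pos == has_neg:
            continue  # no emoticon, or mixed signals: skip this tweet
        # Count each word once per tweet, excluding the emoticons themselves.
        words = {t.lower() for t in tokens if t not in POSITIVE | NEGATIVE}
        (pos if has_pos else neg).update(words)
    scores = {}
    for word in set(pos) | set(neg):
        total = pos[word] + neg[word]
        if total >= min_count:  # ignore words seen too rarely
            scores[word] = (pos[word] - neg[word]) / total
    return scores


def score_tweet(text, scores):
    """Average the scores of the known words; None if no word is known."""
    values = [scores[w] for w in text.lower().split() if w in scores]
    return sum(values) / len(values) if values else None


sample = [
    "love this sunny day :)",
    "love the beach :D",
    "hate this traffic :(",
    "hate mondays :'(",
    "this is fine",
]
word_scores = build_word_scores(sample)
print(score_tweet("I love this", word_scores))  # 0.5
```

Because the word list is learned from the tweets themselves, abbreviations and neologisms are scored the same way as ordinary words, and no human-labelled training data is needed.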
- Is the USSI adjusted to account for changes in sentiment caused by broad environmental factors like time of day, day of the week, or season of the year?
No. Because actual sentiment is affected by times and seasons, the USSI is left unmodified. However, adjustments can be made to account for these regular patterns when the aim is to focus on other factors affecting sentiment.
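One simple way such an adjustment could be made is to subtract the average score for each weekday-and-hour slot, leaving only the deviation from the regular pattern. The sketch below assumes that approach; the function name and the slot definition are hypothetical, and this FAQ does not specify the adjustment actually used.

```python
from collections import defaultdict
from datetime import datetime


def remove_regular_pattern(observations):
    """Subtract the mean score of each (weekday, hour) slot.

    observations: list of (datetime, score) pairs.
    Returns a list of (datetime, adjusted_score) pairs in the same order.
    """
    by_slot = defaultdict(list)
    for ts, score in observations:
        by_slot[(ts.weekday(), ts.hour)].append(score)
    slot_mean = {slot: sum(v) / len(v) for slot, v in by_slot.items()}
    return [
        (ts, score - slot_mean[(ts.weekday(), ts.hour)])
        for ts, score in observations
    ]


# Illustrative values only: two Mondays at 09:00 and one Saturday at 20:00.
data = [
    (datetime(2024, 1, 1, 9), 0.2),
    (datetime(2024, 1, 8, 9), 0.4),
    (datetime(2024, 1, 6, 20), 0.7),
]
adjusted = remove_regular_pattern(data)
```

Here the two Monday readings become roughly -0.1 and +0.1 around their shared slot mean, while the single Saturday reading becomes 0 because it defines its own slot mean. A seasonal adjustment would need a longer history and an additional month or week-of-year term.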