Data Essays

Feedback Analysis and Summarization for a Computer Game.


The popular online game ‘New World’ was released in September 2021, and in the game’s first month alone its main subreddit saw 28,480 posts and 386,767 comments.

While developers of large-scale software products may struggle to keep abreast of so much feedback, machine-learning techniques hold great potential for helping anyone interested in analyzing, summarizing, and understanding user experiences.

Players of New World frequently visit the subreddit r/newworldgame to share their opinions on the state of the game and their hopes for features the developers may implement in the future. Players have discussed bugs, fishing, battling monsters, furnishing their in-game homes, and the game's taxation systems, among many other topics.

I’ll start my analysis with basic questions: what are players talking about, and how do they feel about the different activities within the game? To analyze user sentiment I chose TextBlob, a popular Python NLP library. TextBlob is easy to use and interpret and can quickly gauge sentiment for large numbers of comments. While not as accurate as a BERT-style transformer-based model, TextBlob is good enough to show overall trends and can process the 1.8 million comments collected much more quickly.
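As a minimal sketch of this scoring step (not necessarily the exact code used for the essay), each comment can be turned into a polarity score between -1.0 and 1.0 via TextBlob's `sentiment.polarity`. The helper names `textblob_polarity` and `score_comments` are hypothetical, and the scorer is passed in as a parameter so that a different model could be swapped in later.

```python
from typing import Callable, Iterable


def textblob_polarity(text: str) -> float:
    """Return TextBlob's polarity for a text: -1.0 (negative) to 1.0 (positive)."""
    # Imported lazily so score_comments can also be used with other scorers.
    from textblob import TextBlob
    return TextBlob(text).sentiment.polarity


def score_comments(comments: Iterable[str],
                   scorer: Callable[[str], float] = textblob_polarity) -> list[float]:
    """Score every non-blank comment with the given scorer (TextBlob by default)."""
    return [scorer(comment) for comment in comments if comment.strip()]
```

Calling `score_comments(["I love the fishing in this game.", "The queue times are awful."])` would return one polarity value per comment, assuming the `textblob` package is installed.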

When tested against 64 positive comments and 64 negative comments taken from r/newworldgame, TextBlob gave positive comments a score greater than zero 57/64 = 89% of the time, while negative comments received a score below zero only 39/64 = 61% of the time.
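The check described above can be sketched as a small function that compares the sign of each score with a hand-assigned label. The function name `sign_agreement` and the label strings "pos" and "neg" are assumptions for illustration; a score of exactly zero is counted as a miss for either label, matching the "greater than zero" and "below zero" wording.

```python
def sign_agreement(scores, labels):
    """Per-label fraction of comments whose score sign matches the label.

    labels are "pos" or "neg"; a score of exactly 0 counts as a miss.
    """
    hits = {"pos": 0, "neg": 0}
    totals = {"pos": 0, "neg": 0}
    for score, label in zip(scores, labels):
        totals[label] += 1
        if (label == "pos" and score > 0) or (label == "neg" and score < 0):
            hits[label] += 1
    # Skip labels that never occur to avoid dividing by zero.
    return {label: hits[label] / totals[label]
            for label in totals if totals[label]}
```

With the counts reported here, the function would return 57/64 for "pos" and 39/64 for "neg". Taken together, that is 96 of 128 comments, or 75%, scored with the expected sign.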

When tested against a mixed set of 64 positive and 64 negative comments, TextBlob produced an overall score of 0.0533; when 32 of these negative comments were removed, the score rose to 0.116.

Clearly, TextBlob will not do a particularly good job of diagnosing whether an individual comment is positive or negative. However, we can still use it to gauge the overall positivity of a larger set of comments, giving us a ‘temperature reading’ on how users are feeling at a given time or about a particular topic.
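One way to take such a ‘temperature reading’ is to average the scores within each group of interest, such as a month or a topic. The sketch below assumes the scores have already been computed and paired with a group key; the function name `mean_polarity_by_group` and the "YYYY-MM" key format are illustrative choices, not part of the original analysis.

```python
from collections import defaultdict
from statistics import mean


def mean_polarity_by_group(records):
    """Average sentiment per group.

    records: iterable of (group_key, polarity) pairs, e.g. ("2021-10", 0.2).
    Returns a dict mapping each key to its mean polarity, sorted by key.
    """
    groups = defaultdict(list)
    for key, polarity in records:
        groups[key].append(polarity)
    return {key: mean(values) for key, values in sorted(groups.items())}
```

Because individual scores are noisy, the averages are only meaningful for groups with a reasonably large number of comments.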

Reddit’s official API does not let you request posts by historical timeframe (say, all posts made to a subreddit during January 2021). Fortunately, Pushshift.io collects this data from Reddit and has an API capable of fulfilling such requests. Once the post IDs have been collected, the comments can be retrieved separately from either Pushshift’s API or the official Reddit API.
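As a rough sketch of such a request, the function below builds a query URL for one calendar month of submissions. The endpoint and the `subreddit`, `after`, `before`, and `size` parameters follow Pushshift's publicly documented search API as I understand it, but access to Pushshift has changed over time, so treat them as assumptions to verify. The function name `month_query_url` is hypothetical, and the sketch only builds the URL; it does not send the request.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed endpoint; check current Pushshift documentation before relying on it.
PUSHSHIFT_SUBMISSIONS = "https://api.pushshift.io/reddit/search/submission/"


def month_query_url(subreddit, year, month, size=100):
    """Build a query URL for all submissions in one calendar month (UTC)."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    # First day of the following month, rolling over into the next year.
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    params = {
        "subreddit": subreddit,
        "after": int(start.timestamp()),   # epoch seconds, window start
        "before": int(end.timestamp()),    # epoch seconds, window end
        "size": size,                      # results per page
    }
    return PUSHSHIFT_SUBMISSIONS + "?" + urlencode(params)
```

Since each response is capped at `size` results, a full month would need repeated requests, for example by moving `after` forward to the timestamp of the last post received.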

New World’s fabled land of Aeternum enticed millions of players with stunning visuals and beautiful landscapes replete with rolling forests and mysterious ancient ruins. Unfortunately, they stepped into a world plagued with repetitive gameplay, bugs, and other issues that I’ll cover in more detail later. The above chart shows a sharp drop in player count and subreddit activity shortly after launch.

While r/newworldgame represents just a part of the game’s playerbase, activity on the subreddit closely tracks with the game’s performance as a whole. When activity in the game increases, so does subreddit traffic – and when players in the game are upset and therefore quitting, sentiment on the subreddit will be likewise gloomy.

Since launch, in-game player count has had a remarkably strong Pearson correlation of about 0.80 with subreddit activity.
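For reference, the Pearson correlation between two equally long series, such as daily player counts and daily subreddit comment counts, can be computed as in the sketch below. The function name `pearson` and the idea of pairing the series day by day are assumptions; the essay does not state the time granularity that was used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is no linear relationship.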

Feedback Summary

The sheer scale of the feedback is almost overwhelming. Thankfully, we can leverage LLMs to quickly interpret large numbers of comments. Since each API query is limited to about 3,000 words, we will need to feed in smaller ‘segments’ of feedback one at a time and then recursively ask the model to summarize its own output.
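The segment-and-recurse idea can be sketched as follows. The `summarize` argument is a placeholder for whatever LLM call is used (no particular API is assumed), and the 3,000-word default mirrors the approximate limit mentioned above. The names `chunk_words` and `recursive_summarize` are hypothetical.

```python
def chunk_words(text, max_words):
    """Split text into consecutive segments of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def recursive_summarize(comments, summarize, max_words=3000):
    """Summarize segments, then summarize the summaries until one query fits.

    summarize: callable taking a string and returning a shorter string
    (a stand-in for an LLM API call).
    """
    text = " ".join(comments)
    while len(text.split()) > max_words:
        segments = chunk_words(text, max_words)
        combined = " ".join(summarize(segment) for segment in segments)
        # Guard against looping forever if summaries do not get shorter.
        if len(combined.split()) >= len(text.split()):
            raise ValueError("summaries are not shrinking; cannot converge")
        text = combined
    return summarize(text)
```

One trade-off of this design is that detail is lost at every level of recursion, so topics mentioned in only a few comments may not survive to the final summary.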

To be continued...

We have already uncovered some interesting historical trends, but we need to know more. Just what did people get so upset about at launch? Were major game issues not discussed or discovered until after launch, or were they already known to some players during the game’s beta period? What topics are players concerned about right now? In the next part of this data essay, I will begin delving much deeper into the data in order to understand what topics players care about the most and what exactly they are saying.