
Model Predicts The Size And Shape Of Online Comment Threads

The Comment Thread Prediction Model (CTPM) uses a Hawkes process to predict how successful a post will be on social media.

If you want to get involved in a constructive debate, or find creative and interesting people you might never have come across otherwise, head over to sites like Hacker News, Twitter or Reddit. Through their comment threads, these social platforms allow you to voice your opinions and engage in conversation on a wide array of topics with other online users.

In computer science, this form of communication (the comment thread) is often modeled as a “tree”, with nodes representing the post and its subsequent comments, and directed edges representing “reply-to” relationships.
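To make the tree picture concrete, here is a minimal Python sketch of a comment thread stored as “reply-to” edges, with its size (number of nodes) and shape (here, just the depth of the longest reply chain) computed from them. The identifiers `post`, `c1` to `c4` and the helper names are invented for illustration; they are not taken from the researchers' code.

```python
from collections import defaultdict


def build_thread(replies):
    """Build a parent -> children map from (comment_id, parent_id) pairs."""
    children = defaultdict(list)
    for comment_id, parent_id in replies:
        children[parent_id].append(comment_id)
    return children


def thread_size(children, root):
    """Count every node in the thread, including the root post."""
    return 1 + sum(thread_size(children, c) for c in children.get(root, []))


def thread_depth(children, root):
    """Length of the longest reply chain below the root (0 if no comments)."""
    kids = children.get(root, [])
    if not kids:
        return 0
    return 1 + max(thread_depth(children, c) for c in kids)


# Hypothetical thread: two top-level comments, one of which starts a chain.
replies = [("c1", "post"), ("c2", "post"), ("c3", "c1"), ("c4", "c3")]
thread = build_thread(replies)
size = thread_size(thread, "post")
depth = thread_depth(thread, "post")
```

In this toy thread the post has two direct comments, and the chain post, c1, c3, c4 gives the thread its depth. Real systems would track richer shape measures (such as breadth per level), but the underlying structure is the same.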

Now a duo of researchers from the University of Notre Dame has created a model that predicts a post’s popularity on social media sites by estimating the size and shape of its comment thread when viewed as a tree. The model, known as the Comment Thread Prediction Model (CTPM), is described in a paper pre-published on arXiv.

“Our main research goal is to predict the size and shape of a comment thread on social media sites,” Tim Weninger, author of the study, told TechXplore in a statement. “These sites allow users to post news or images or other content. Then other users like, share or comment on the post. We are interested mostly in comment threads, where a user can comment on the post itself or reply to comments like on Reddit and Twitter (but not Facebook or YouTube).”

With financial backing from the US Defense Advanced Research Projects Agency (DARPA), Weninger and his colleague Rachel Krohn centered their work on social simulation.

Earlier studies suggest that a post’s trajectory is determined by how users react during the first few hours after posting. Posts that draw a lot of attention early and are promptly commented on generally spur further engagement, while those that start slowly tend to receive little to no attention later on.

In fact, most existing tools intended to predict the size and shape of comment threads work by observing the first several comments added to a post and then building a predictive model based on the nature of that engagement. But because most comment threads tend to be relatively small, waiting for those early comments to arrive can hurt the model’s predictive accuracy.

Hawkes branching process: the red node represents a social media post, while green and blue nodes represent ‘immigrant’ and ‘offspring’ events respectively. [Image: Krohn & Weninger via TechXplore]

The Comment Thread Prediction Model (CTPM), however, analyzes the words in a Reddit post’s title, the posting user and the subreddit to which it was submitted. It uses these variables to set up a Hawkes process, a statistical model of events occurring over time in which each event can trigger further events. When run on thousands of real user discussions on Reddit, the CTPM did a better job of predicting the size and shape of comment threads than existing models.
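As a rough illustration of the branching view of a Hawkes process, the sketch below simulates a single thread in Python: ‘immigrant’ events stand in for direct comments on the post, and every comment spawns a random number of ‘offspring’ replies after a random delay. All parameter names and values here (`immigrant_mean`, `branching_ratio`, `mean_delay`) and the exponential delays are assumptions made for the example; the article does not describe how the CTPM actually derives its parameters from the title, user and subreddit, so this is not the authors' implementation.

```python
import math
import random


def poisson(rng, mean):
    """Sample a Poisson count with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_thread(immigrant_mean, branching_ratio, mean_delay, seed=0):
    """Simulate one comment thread as a branching process.

    Returns a time-sorted list of (time, comment_id, parent_id) tuples,
    where parent_id is None for a direct comment on the post.
    Keep branching_ratio below 1, otherwise the cascade may never die out.
    """
    rng = random.Random(seed)
    events, pending, next_id = [], [], 0

    # 'Immigrant' events: comments made directly on the post.
    for _ in range(poisson(rng, immigrant_mean)):
        t = rng.expovariate(1.0 / mean_delay)
        events.append((t, next_id, None))
        pending.append((t, next_id))
        next_id += 1

    # 'Offspring' events: each comment triggers replies after a random delay.
    while pending:
        parent_time, parent_id = pending.pop()
        for _ in range(poisson(rng, branching_ratio)):
            t = parent_time + rng.expovariate(1.0 / mean_delay)
            events.append((t, next_id, parent_id))
            pending.append((t, next_id))
            next_id += 1

    return sorted(events)


# Illustrative values only, not fitted to any real subreddit.
thread = simulate_thread(immigrant_mean=5.0, branching_ratio=0.5, mean_delay=1.0, seed=1)
```

With a branching ratio below 1, the expected number of comments in this toy model is `immigrant_mean / (1 - branching_ratio)`, so larger and deeper threads come from raising either parameter. A predictive model in this spirit would estimate such parameters from a post's features before any comments arrive.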

“To me the most meaningful contribution of this work is the ability of our model to predict the size and shape of online conversations,” explained Weninger. “This is important to US law enforcement and defense agencies because being able to predict the future in cyberspace enables these agencies to prepare effective defenses against cyber-attacks and other events which frequently move from the cyber world to the physical world.”

Going forward, the team says the CTPM could be used to predict how successful posts on Twitter or Reddit will be, based solely on the title. They also plan to add features for investigating how people consume and curate information online, including their interactions with others’ posts through likes, shares and retweets.

“The likes, shares, upvotes, and retweets provided by users are the single most important thing to social media companies because they indicate which content to promote and which content might be spam or low quality,” Weninger said. “We study these processes and how they can be corrupted by individuals or groups with bad intentions. Our future work in this area will look at manipulations of social content (e.g. image alterations, photoshops, deepfakes, etc.), as we can learn a lot about people and their culture by watching how they alter images in social media.”

Although the CTPM outperforms the other models in the test, particularly for new posts, its accuracy has so far been shown on just nine varied subreddits. So it isn’t perfect. I hope they also figure out a way to measure the Circlejerk factor (on Reddit), where posts shared by clique-ish members get upvoted, while the ones submitted by ciphers get downvoted to oblivion.

Reference: Modelling structure and predicting dynamics of discussion threads in online boards (Journal of Complex Networks)
