Online feedback left by software users can help software teams improve their products. This feedback appears in app store reviews, tweets, and many other sources. Understanding the topics it raises is vital to maintaining a responsive, attractive software product, because it tells development teams where to focus their improvements. One way to detect topics automatically is text clustering, a common natural language processing technique that groups similar feedback together.
However, the groups that clustering techniques identify can often be large. To understand the common topic across all feedback in one of these groups, some form of summarisation is needed. While multiple ways have been used to summarise or characterise these groups of feedback, there is little consensus on which way is best. Should the summary contain a set of common words across the feedback and, if so, how many words? Or would a short phrase summarising the feedback be better? For example, should a group of app reviews all complaining about a problem on the “sign-in” page be summarised by displaying five common words such as “login”, “error”, “password”, “authentication”, “connectivity” or a phrase like “Won’t let me log in”?
HASEL members Peter Devine, James Tizard, Sunny Wang, and Dr Kelly Blincoe, together with Dr Yun Sing Koh from the School of Computer Science at The University of Auckland, recently studied several ways to summarise this feedback so that it is as easy as possible to understand.
We looked at user feedback on a variety of apps from five sources: Google Play Store reviews, Apple App Store reviews, tweets, Reddit posts, and product forum posts. We generated clusters of feedback from this text. We then created several kinds of summaries for these clusters (single words, two consecutive words, three consecutive words, and sentences). Finally, we examined which of these best helped people choose the correct summary for a given group of feedback and which best helped them understand the software product improvement content of the cluster.
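To make the difference between these summary types concrete, here is a minimal Python sketch. It is not the code from our paper or repository: the function names (tokenize, top_ngrams, representative_sentence), the example reviews, and the simple scoring rule used to pick a sentence are all assumptions made for illustration. It takes one cluster of feedback and produces word-based summaries (the most common runs of one, two, or three words) alongside a sentence-based summary (the single piece of feedback that shares the most vocabulary with the rest of the cluster).

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a piece of feedback and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def top_ngrams(cluster, n, k=5):
    """Return the k most frequent runs of n adjacent words in a cluster."""
    counts = Counter()
    for text in cluster:
        tokens = tokenize(text)
        # Count each n-gram once per piece of feedback so that a single
        # long review cannot dominate the summary.
        ngrams = {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        counts.update(ngrams)
    return [ngram for ngram, _ in counts.most_common(k)]


def representative_sentence(cluster):
    """Pick the feedback item whose words are, on average, most widely
    shared across the cluster (an illustrative heuristic only)."""
    token_sets = [set(tokenize(text)) for text in cluster]
    doc_freq = Counter()
    for tokens in token_sets:
        doc_freq.update(tokens)

    def score(tokens):
        # Average number of cluster items containing each word.
        return sum(doc_freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    best = max(range(len(cluster)), key=lambda i: score(token_sets[i]))
    return cluster[best]


# Hypothetical cluster of reviews about a sign-in problem.
cluster = [
    "Won't let me log in",
    "Can't log in after the update",
    "The app won't let me log in anymore",
    "Log in fails every time",
]

print("Single words:", top_ngrams(cluster, 1))
print("Two words:   ", top_ngrams(cluster, 2))
print("Three words: ", top_ngrams(cluster, 3))
print("Sentence:    ", representative_sentence(cluster))
```

Even on this tiny example, the single-word summary is dominated by generic terms like "log" and "in", while the sentence keeps the complaint intact. A real pipeline would likely also remove stop words and use a more robust way of choosing the sentence.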
We found that sentences were better at conveying the complex requirements that users have for a piece of software than numerous shorter summaries using only common keywords. This opens up interesting future work into the possibility of characterising clusters using abstractive summarisation models such as PEGASUS, which can take information from multiple pieces of feedback and then potentially generate a short, descriptive summary.
Want to learn more, including all of the details of how we did this? Check out the full paper at https://kblincoe.github.io/publications/2022_RE_Clusters.pdf
Are you working on a software team and want to try clustering the feedback for your product? Please see our repository here with code to help: https://zenodo.org/record/6585857