Evaluating the Accuracy and Utility of Recommender Systems

Alan Said. 2013. "Evaluating the Accuracy and Utility of Recommender Systems." Doctoral dissertation.


Recommender systems have become a ubiquitous feature of the World Wide Web. Today, most websites use some form of recommendation to enhance their users’ experience. Over the last decade, recommendation algorithms have advanced considerably; the processes used to evaluate these systems, however, have not kept pace. The evaluation methods and metrics currently in use originated in related fields, such as information retrieval and statistics. In most cases, these evaluation processes can show how well a recommender system performs, but only up to a point: beyond a certain threshold, it is often unclear whether a lower error or a higher accuracy score corresponds to an actual improvement in quality. This dissertation addresses the research question of how we can estimate whether a measured accuracy level actually corresponds to a quality improvement from the user’s perspective, or whether the measured improvement is lost on the end user. We introduce concepts related to recommendation quality and user perception, and show that currently widely used evaluation metrics fail to capture recommendation quality when an algorithm is specifically tuned to offer recommendations of higher diversity. We then present a formalization of the upper limit of recommendation quality, the magic barrier of recommendation, and evaluate it in a real-world movie recommendation setting. The work presented in this dissertation concludes that recommendation quality has outgrown the methods and metrics used to evaluate these systems. Instead, we show how qualitative approaches can be used, with minimal interference for the user, to correctly estimate the actual quality of recommender systems.
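As a rough illustration of the magic-barrier idea, the upper limit on achievable accuracy can be estimated from the noise in users' own ratings, e.g. by asking users to re-rate items they have rated before. The sketch below (the data and helper function are hypothetical, not taken from the dissertation) treats the deviation of each rating from the per-pair mean as noise and computes the corresponding RMSE floor:

```python
import math

# Hypothetical example data: each tuple pairs a user's original rating of an
# item with a later re-rating of the same item. The spread between the two
# reflects the inherent inconsistency (noise) in the user's own judgments.
re_ratings = [
    (4.0, 5.0),
    (3.0, 3.0),
    (2.0, 3.0),
    (5.0, 4.0),
    (4.0, 4.0),
]

def magic_barrier(pairs):
    """Estimate the magic barrier as the root mean squared deviation of
    ratings from their per-pair mean, i.e. the RMSE of the rating noise."""
    squared_noise = []
    for first, second in pairs:
        mean = (first + second) / 2.0
        # Each rating deviates from the (assumed) true rating by noise.
        squared_noise.append((first - mean) ** 2)
        squared_noise.append((second - mean) ** 2)
    return math.sqrt(sum(squared_noise) / len(squared_noise))

barrier = magic_barrier(re_ratings)
# A recommender whose measured RMSE falls below this value is fitting
# user noise rather than delivering a real quality improvement.
print(round(barrier, 3))
```

The exact estimator used in the dissertation may differ; the point of the sketch is only that once rating noise is quantified, error metrics below that floor stop being meaningful.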