TV operators and content providers use recommender systems to connect consumers directly with content that fits their needs, their different devices, and the context in which the content is consumed. Choosing the right recommender algorithms is critical, and becomes more difficult as content offerings continue to expand rapidly. Because algorithms perform differently depending on the use case, including the content and the consumer base, theoretical estimates of performance are not sufficient; evaluation must instead be carried out in a realistic environment. The Reference Framework described here is an evaluation platform that enables TV operators to compare impartially not only the qualitative aspects of recommendation algorithms, but also the non-functional requirements of complete recommendation solutions. The Reference Framework is being created by the CrowdRec project, which brings together innovative recommender system vendors and university researchers specializing in recommender systems and their evaluation. It currently provides batch-based evaluation modes, with support for stream-based modes planned. It can also encapsulate open source recommender and evaluation frameworks, making it suitable for a wide range of evaluation needs.
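To make the batch-based evaluation mode concrete, the following is a minimal sketch of how such a mode typically works: a recommender exposing a plug-in interface is trained on a fixed historical split and scored offline against held-out interactions. The interface and names here (PopularityRecommender, precision_at_k) are illustrative assumptions, not the Reference Framework's actual API.

```python
"""Minimal sketch of a batch evaluation mode. The recommender
interface and metric are assumptions for illustration only."""
from collections import defaultdict


class PopularityRecommender:
    """Baseline recommender: suggests the globally most-viewed items."""

    def train(self, interactions):
        # interactions: list of (user, item) pairs from a fixed training split
        counts = defaultdict(int)
        for _user, item in interactions:
            counts[item] += 1
        # Rank items by view count, most popular first
        self.ranked = sorted(counts, key=counts.get, reverse=True)

    def recommend(self, user, k):
        # This baseline ignores the user; a real algorithm would personalize
        return self.ranked[:k]


def precision_at_k(recommender, test_interactions, k=5):
    """Batch evaluation: score a trained model against a held-out split."""
    relevant = defaultdict(set)
    for user, item in test_interactions:
        relevant[user].add(item)
    hits, total = 0, 0
    for user, items in relevant.items():
        recs = recommender.recommend(user, k)
        hits += len(set(recs) & items)
        total += k
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Toy train/test splits standing in for real TV consumption logs
    train = [("u1", "news"), ("u2", "news"), ("u2", "sports"), ("u3", "film")]
    test = [("u1", "sports"), ("u3", "news")]
    model = PopularityRecommender()
    model.train(train)
    print(f"precision@2 = {precision_at_k(model, test, k=2):.2f}")
```

A stream-based mode, by contrast, would feed interactions to the recommender one at a time and interleave recommendation and model updates, which is why it requires separate support in the platform.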