Evaluating LLM outputs: building an eval set before you ship | TechTrio Blog