nick.dev
$ ls papers/

Papers

Research publications on LLM evaluation, AI creativity, and natural language processing.

Evaluation Framework for AI Creativity: A Case Study Based on Story Generation

NLP2026 (Annual Meeting of the Association for Natural Language Processing) · 2026

We propose an evaluation framework for assessing creative text generation, a setting that conventional reference-based metrics handle poorly. The framework comprises four main dimensions (Novelty, Value, Adherence, and Resonance) and eleven sub-components. Through controlled story generation with Spike Prompting and a crowdsourced evaluation with 115 readers, we find that creativity is judged hierarchically rather than cumulatively, with different dimensions becoming salient at different stages of judgment. Reflective evaluation substantially shifts both ratings and inter-rater agreement, demonstrating the framework's capacity to surface creativity dimensions that reference-based metrics overlook.

Pharath Sathya, Yin Jou Huang, Fei Cheng

Creativity Is Not Enjoyment: Rethinking Human Evaluation of AI Story Generation

ANLP (Association for Natural Language Processing) · 2026

Large language models are often optimized to generate creative text, yet it remains unclear whether creativity translates into user satisfaction. We propose a framework that evaluates creativity and enjoyment as separate dimensions. Through a controlled study with diverse AI-generated stories, we show that creativity judgments rely primarily on novelty, whereas enjoyment depends on emotional resonance. Optimizing for novelty alone increases perceived creativity but can reduce user satisfaction, revealing a fundamental trade-off in current generation methods.

Pharath Sathya, Yin Jou Huang, Fei Cheng