nick.dev
reinforcement-learning · reward-modeling · python · benchmarking

RMSearch: Reward Model Training & LLM Benchmarking

An end-to-end reinforcement learning pipeline for training custom reward models, paired with a benchmarking suite that evaluates LLM coding performance on BigCodeBench.

Context

At KyotoAI, a government-subsidized startup, we needed reliable reward models for retrieval optimization and a way to validate LLM coding capabilities for B2B enterprise deployments.

Reward Model Training

  • RL Pipeline: Implemented end-to-end reinforcement learning loops to train custom reward models
  • RMSearch Algorithm: Optimized the custom "RMSearch" algorithm for efficient, high-accuracy retrieval
  • Iterative Refinement: Improved the reward models continuously by integrating human feedback
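
To make the human-feedback step above concrete, here is a minimal sketch of one common way to fit a reward model from preference pairs: a Bradley-Terry pairwise loss over a linear scorer. This is an illustration under stated assumptions, not the RMSearch implementation. The function names (`pairwise_loss`, `train_reward_model`), the feature vectors, and the hyperparameters are all hypothetical, and the page does not say which loss or model class was actually used.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def score(weights: list[float], features: list[float]) -> float:
    """Linear reward: dot product of weights and features (assumed model class)."""
    return sum(w * x for w, x in zip(weights, features))


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def train_reward_model(pairs, dim: int, lr: float = 0.1, epochs: int = 200) -> list[float]:
    """Fit weights by SGD so preferred items score higher than rejected ones.

    `pairs` is a list of (chosen_features, rejected_features) tuples, i.e. the
    human preference labels (hypothetical data format).
    """
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = score(weights, chosen) - score(weights, rejected)
            # Gradient of the loss w.r.t. the margin is -(1 - sigmoid(margin)).
            grad_margin = -(1.0 - sigmoid(margin))
            weights = [
                w - lr * grad_margin * (c - r)
                for w, c, r in zip(weights, chosen, rejected)
            ]
    return weights


# Toy preference data: annotators preferred items with a larger first feature.
toy_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.9, 0.2], [0.1, 0.8]),
]
toy_weights = train_reward_model(toy_pairs, dim=2)
```

In a retrieval setting, the learned `score` could be used to rerank candidate documents. Each new batch of human labels extends `pairs` and triggers another training round.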

Evaluation Engine

  • BigCodeBench Integration: Built a benchmarking suite that evaluates LLM coding performance on BigCodeBench tasks
  • Regression Testing: Strict regression testing pipeline ensuring B2B deployment reliability
  • Multi-Model Comparison: Systematic evaluation across open-weight and proprietary models
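
The scoring and regression checks described above can be sketched roughly as follows. The example uses the widely used unbiased pass@k estimator for code-generation benchmarks and a simple per-task regression gate. Whether this suite used pass@k, and the names `pass_at_k` and `find_regressions`, the `tolerance` parameter, and the task IDs, are all assumptions made for illustration. None of them comes from the project itself.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: chance that at least one of k samples passes.

    n: samples generated for a task, c: samples that passed the tests.
    Computed as 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> dict[str, tuple[float, float]]:
    """Return tasks where the candidate scores worse than baseline by more than tolerance.

    A task missing from `candidate` is treated as a score of 0.0 (a design
    choice for this sketch: a silently dropped task should fail the gate).
    """
    regressions = {}
    for task, base_score in baseline.items():
        cand_score = candidate.get(task, 0.0)
        if base_score - cand_score > tolerance:
            regressions[task] = (base_score, cand_score)
    return regressions


# Hypothetical per-task scores for the deployed model and an updated model.
baseline_scores = {"task_a": 0.80, "task_b": 0.50, "task_c": 0.65}
candidate_scores = {"task_a": 0.70, "task_b": 0.50, "task_c": 0.66}
regressed = find_regressions(baseline_scores, candidate_scores)
```

A deployment gate could fail the release whenever `regressed` is non-empty. Running the same functions over every model's results gives a like-for-like comparison between open-weight and proprietary models.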

Impact

  • Custom reward models achieving high-accuracy retrieval for production search
  • Benchmarking suite used for enterprise model selection decisions
  • Regression testing preventing quality degradation across model updates