reinforcement-learning · reward-modeling · python · benchmarking
RMSearch: Reward Model Training & LLM Benchmarking
End-to-end reinforcement learning pipeline for training custom reward models, with a comprehensive benchmarking suite evaluating LLM coding performance against BigCodeBench standards.
Context
At KyotoAI (a government-subsidized startup), we needed to build reliable reward models for retrieval optimization and validate LLM coding capabilities for B2B enterprise deployments.
Reward Model Training
- RL Pipeline: Implemented end-to-end reinforcement learning loops to train custom reward models
- RMSearch Algorithm: Optimized the custom "RMSearch" algorithm for efficient, high-accuracy retrieval
- Iterative Refinement: Continuously improved the reward models by folding human feedback back into training
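The training and retrieval steps above can be illustrated with a minimal sketch. It is not the RMSearch implementation, whose internals are not described here. It assumes a linear reward model trained on pairwise human preferences with a Bradley-Terry style loss, then used to rerank retrieval candidates. The feature vectors and the names `train_reward_model` and `rerank` are hypothetical.

```python
import math
import random


def score(weights, features):
    """Linear reward: dot product of weights and a candidate's feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_reward_model(pairs, dim, lr=0.1, epochs=200, seed=0):
    """Fit weights so that preferred items outscore rejected ones.

    pairs: list of (preferred_features, rejected_features) from human feedback.
    Minimizes -log(sigmoid(r_preferred - r_rejected)) with plain SGD.
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    pairs = list(pairs)  # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(pairs)
        for preferred, rejected in pairs:
            margin = score(weights, preferred) - score(weights, rejected)
            # Loss gradient w.r.t. weights is
            # -(1 - sigmoid(margin)) * (preferred - rejected), so step the other way.
            step = lr * (1.0 - sigmoid(margin))
            for i in range(dim):
                weights[i] += step * (preferred[i] - rejected[i])
    return weights


def rerank(weights, candidates):
    """Order retrieval candidates by learned reward, highest first."""
    return sorted(candidates, key=lambda c: score(weights, c["features"]), reverse=True)


if __name__ == "__main__":
    # Hypothetical features: [query-document similarity, verbosity]
    feedback = [
        ([0.9, 0.2], [0.1, 0.8]),
        ([0.8, 0.5], [0.3, 0.4]),
        ([0.7, 0.1], [0.2, 0.3]),
    ]
    weights = train_reward_model(feedback, dim=2)
    candidates = [
        {"id": "a", "features": [0.2, 0.5]},
        {"id": "b", "features": [0.9, 0.5]},
        {"id": "c", "features": [0.5, 0.5]},
    ]
    print([c["id"] for c in rerank(weights, candidates)])
```

Each new batch of human preference pairs can be appended to `feedback` and the model retrained, which is one simple way to realize the iterative refinement loop. A production reward model would typically replace the linear scorer with a learned neural encoder.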
Evaluation Engine
- BigCodeBench Integration: Built a benchmarking suite that scores LLM coding performance on BigCodeBench tasks
- Regression Testing: Built a strict regression testing pipeline that gates B2B deployments on benchmark results
- Multi-Model Comparison: Systematic evaluation across open-weight and proprietary models
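A minimal sketch of how a regression gate over benchmark results might look. It assumes a harness has already produced, for each task, the number of generated samples and the number that passed the tests. It does not call the BigCodeBench tooling itself. The pass@k formula is the standard unbiased estimator, 1 - C(n-c, k) / C(n, k). The function names, task IDs, and the `tolerance` threshold are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task with n samples, c of them correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def check_regression(baseline, candidate, k=1, tolerance=0.01):
    """Compare two models on the tasks they share.

    baseline, candidate: dict mapping task_id -> (n_samples, n_correct).
    Returns mean scores, the tasks whose score dropped by more than
    `tolerance`, and whether the aggregate score stayed within tolerance.
    """
    shared = sorted(baseline.keys() & candidate.keys())
    if not shared:
        raise ValueError("no tasks in common")
    base = {t: pass_at_k(*baseline[t], k) for t in shared}
    cand = {t: pass_at_k(*candidate[t], k) for t in shared}
    base_mean = sum(base.values()) / len(shared)
    cand_mean = sum(cand.values()) / len(shared)
    regressed = [t for t in shared if cand[t] < base[t] - tolerance]
    return {
        "baseline": base_mean,
        "candidate": cand_mean,
        "regressed_tasks": regressed,
        "passed": cand_mean >= base_mean - tolerance,
    }


if __name__ == "__main__":
    # Hypothetical per-task results: (samples generated, samples passing tests)
    current_model = {"t1": (5, 5), "t2": (5, 3), "t3": (5, 0)}
    updated_model = {"t1": (5, 5), "t2": (5, 1), "t3": (5, 0)}
    report = check_regression(current_model, updated_model)
    print(report["passed"], report["regressed_tasks"])
```

Running the same comparison for each candidate against a fixed baseline gives a simple multi-model comparison table. Listing per-task regressions alongside the aggregate score helps catch cases where gains on some tasks hide losses on others.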
Impact
- Custom reward models achieving high-accuracy retrieval for production search
- Benchmarking suite used for enterprise model selection decisions
- Regression testing preventing quality degradation across model updates