Autoregressive Models for Offline Policy Evaluation and Optimization (Michael Zhang)
Session 7: Off Policy Actor Critic for Recommender Systems (ACM RecSys)
[QA] Bootstrapping Language Models with DPO Implicit Rewards (Arxiv Papers)
[2024 Best AI Paper] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Mod (Paper With Video)
[2024 Best AI Paper] ReFT: Reasoning with Reinforced Fine-Tuning (Paper With Video)
[2024 Best AI Paper] SimPO: Simple Preference Optimization with a Reference-Free Reward (Paper With Video)
Controlled Decoding from Language Models (Arxiv Papers)
Data Efficient Methods for Reinforcement Learning (Simons Institute)
[2024 Best AI Paper] Training Language Models to Self-Correct via Reinforcement Learning (Paper With Video)
Accelerating exploration and representation learning with offline pre-training - ArXiv:2 (Academia Accelerated)