To learn from experience, a reinforcement learning (RL) agent needs four key elements:

:blobcoffee: State: What situation is the agent in?
:blobcoffee: Actions: What are the possible moves from here?
:blobcoffee: Reward: What does the agent receive after an action?
:blobcoffee: Value function: How good is a state (or action), both now and in the future?

This is the foundation of how an RL agent learns to make better decisions over time.
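The four elements above can be sketched as a toy Q-learning loop. Everything here is illustrative: a hypothetical 5-state corridor environment where the agent earns a reward of 1 for reaching the rightmost state, and the hyperparameter values are arbitrary.

```python
import random

random.seed(0)

# State: a corridor of positions 0..4; reaching state 4 pays reward 1.
N_STATES, GOAL = 5, 4
ACTIONS = [+1, -1]                      # Actions: index 0 = right, 1 = left

def step(state, action):
    """Environment feedback: the next state and the Reward received."""
    nxt = min(max(state + action, 0), GOAL)
    return nxt, (1.0 if nxt == GOAL else 0.0)

# Value function: Q[s][a] estimates how good action a is in state s,
# counting both the immediate reward and discounted future rewards.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1       # learning rate, discount, exploration

for _ in range(200):                    # 200 episodes of trial and error
    s = 0
    while s != GOAL:
        # Mostly take the best-known action, occasionally explore.
        a = random.randrange(2) if random.random() < eps \
            else max((0, 1), key=lambda i: Q[s][i])
        s2, r = step(s, ACTIONS[a])
        # Temporal-difference update toward reward + discounted future value.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(GOAL)]
print(policy)                           # learned action index per state
```

After training, the greedy policy should pick "right" (index 0) in every state, because the value function has propagated the goal reward backwards through the corridor.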

Sarah Lea<p>Can you remember learning to walk as a baby? You didn’t read a manual. Neither does an AI agent.</p><p>Reinforcement Learning (RL) isn’t about knowing the correct answer.<br>It’s about learning through trial and error, by interacting with an environment &amp; receiving feedback.</p><p>That’s how AlphaGo defeated a world champion:<br>It first learned from expert games. Then it played against itself, millions of times, using RL to get better with each game. That’s how it mastered Go.</p><p><a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>machinelearning</span></a> <a href="https://techhub.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://techhub.social/tags/ki" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ki</span></a> <a href="https://techhub.social/tags/google" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>google</span></a> <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://techhub.social/tags/alphago" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>alphago</span></a> <a href="https://techhub.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascience</span></a> <a href="https://techhub.social/tags/datascientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascientist</span></a></p>
Sarah Lea<p>Which strategy do you use when learning something new?</p><p>3 strategies AI agents use to learn what works:<br>:blobcoffee: Greedy: Stick with what has worked best so far.<br>:blobcoffee: ε-Greedy: Mostly stick with the best. But try something new every now and then.<br>:blobcoffee: Optimistic Start: Assume everything is great until proven otherwise.</p><p>They all come from something called the “Multi-Armed Bandit” problem.</p><p>But they show up in real life too:<br>→ Trying a new café.<br>→ Deciding what to study <br>→ Choosing which project to pursue at work.</p><p>Which one do you use most often?<br>And should you change it?</p><p>Curious to dive deeper? I covered both topics in my latest two articles: <a href="https://towardsdatascience.com/author/schuerch_sarah/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">towardsdatascience.com/author/</span><span class="invisible">schuerch_sarah/</span></a></p><p><a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://techhub.social/tags/KI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>KI</span></a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>machinelearning</span></a> <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://techhub.social/tags/learning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>learning</span></a> <a href="https://techhub.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascience</span></a></p>
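The ε-greedy strategy from the post above can be sketched in a few lines on a hypothetical 3-armed bandit. The arm payout probabilities and the value of ε are made up for illustration.

```python
import random

random.seed(1)

# Hypothetical 3-armed bandit: true mean payouts, unknown to the agent.
true_means = [0.3, 0.5, 0.8]
eps = 0.1                          # fraction of steps spent exploring

estimates = [0.0, 0.0, 0.0]        # the agent's running reward estimates
counts = [0, 0, 0]

for _ in range(5000):
    if random.random() < eps:
        arm = random.randrange(3)                        # explore
    else:
        arm = max(range(3), key=lambda a: estimates[a])  # exploit (greedy)
    reward = 1.0 if random.random() < true_means[arm] else 0.0
    counts[arm] += 1
    # Incremental mean: nudge this arm's estimate toward the observed reward.
    estimates[arm] += (reward - estimates[arm]) / counts[arm]

print(counts, [round(e, 2) for e in estimates])
```

Because a tenth of the pulls are random, the agent keeps sampling all arms, so its estimates converge near the true means and the best arm ends up pulled far more often than the others.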
michabbb<p><a href="https://social.vivaldi.net/tags/ART" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ART</span></a> Agent Reinforcement Trainer: <a href="https://social.vivaldi.net/tags/Opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Opensource</span></a> <a href="https://social.vivaldi.net/tags/RL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RL</span></a> <a href="https://social.vivaldi.net/tags/framework" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>framework</span></a> for building reliable <a href="https://social.vivaldi.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> agents 🤖</p><p>🎯 Improved email agent success rate from 74% to 94% using <a href="https://social.vivaldi.net/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> on <a href="https://social.vivaldi.net/tags/Qwen" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Qwen</span></a> model 💰 Reduced costs from $55 to $0.80 per 1,000 requests 🧵👇</p>
Sarah Lea<p>🍕 Imagine trying two pizzerias and always going back to the one that seemed better. Sounds simple? Maybe.</p><p>But what if there’s another one in the city that’s even better – and you never tried it?</p><p>That’s what the greedy strategy does. It sticks to what worked best so far.</p><p>In Multi-Armed Bandits, a classic problem that helps us understand Reinforcement Learning, there are three common strategies to deal with this dilemma:<br>:blobcoffee: Greedy<br>:blobcoffee: ε-Greedy<br>:blobcoffee: Optimistic Initial Values</p><p><a href="https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">towardsdatascience.com/simple-</span><span class="invisible">guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/</span></a></p><p><a href="https://techhub.social/tags/DataScientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataScientist</span></a> <a href="https://techhub.social/tags/datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascience</span></a> <a href="https://techhub.social/tags/data" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>data</span></a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>machinelearning</span></a> <a href="https://techhub.social/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://techhub.social/tags/ki" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ki</span></a> <a href="https://techhub.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://techhub.social/tags/agents" class="mention hashtag" 
rel="nofollow noopener" target="_blank">#<span>agents</span></a> <a href="https://techhub.social/tags/agenticai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>agenticai</span></a> <a href="https://techhub.social/tags/technology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>technology</span></a> <a href="https://techhub.social/tags/tech" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tech</span></a></p>
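The pizzeria dilemma above is easy to reproduce in code. A minimal sketch, assuming a hypothetical two-pizzeria bandit: pure greedy selection run twice, once with pessimistic (zero) initial estimates and once with optimistic ones. All numbers are illustrative.

```python
import random

random.seed(2)

# Two hypothetical pizzerias of unknown quality; the better one is index 1.
true_means = [0.6, 0.9]

def run(initial):
    """Pure greedy choice; only the initial value estimates differ."""
    est = [initial, initial]
    counts = [1, 1]     # the initial estimate counts as one pseudo-sample,
                        # so optimism decays gradually instead of instantly
    for _ in range(500):
        arm = max(range(2), key=lambda a: est[a])        # always exploit
        reward = 1.0 if random.random() < true_means[arm] else 0.0
        counts[arm] += 1
        est[arm] += (reward - est[arm]) / counts[arm]    # incremental mean
    return [counts[0] - 1, counts[1] - 1]                # real pulls per arm

pessimistic = run(0.0)   # greedy locks onto the first pizzeria it tries
optimistic = run(5.0)    # inflated estimates force trying both before settling
print(pessimistic, optimistic)
```

With zero initialization, greedy ties break to the first pizzeria and never visit the second at all; with optimistic initialization, both get sampled until their estimates decay toward their true quality, after which the genuinely better one wins.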
Sarah Lea<p>Reinforcement Learning starts with a simple but powerful idea:<br>Trial &amp; Error. Learning what works.</p><p>The Multi-Armed Bandit problem is a first step into this world.<br>It's not just about slot machines. It's about how AI (and humans) learn to choose.</p><p><a href="https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">towardsdatascience.com/simple-</span><span class="invisible">guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/</span></a></p><p><a href="https://techhub.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://techhub.social/tags/CognitiveScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CognitiveScience</span></a> <a href="https://techhub.social/tags/Psychology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Psychology</span></a> <a href="https://techhub.social/tags/Behavior" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Behavior</span></a> <a href="https://techhub.social/tags/DecisionMaking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DecisionMaking</span></a> <a href="https://techhub.social/tags/Bandits" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Bandits</span></a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>machinelearning</span></a> <a href="https://techhub.social/tags/KI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>KI</span></a> <a 
href="https://techhub.social/tags/Datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Datascience</span></a> <a href="https://techhub.social/tags/datascientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascientist</span></a></p>
Sarah Lea<p>Do you always go to the same café? Or do you try something new?</p><p>That’s the exploration vs. exploitation dilemma: Decision under uncertainty.</p><p>Multi-armed bandits model exactly that.</p><p>And this dilemma shows up everywhere: Recommender systems, A/B tests, online ads, even in human psychology.</p><p>Nobel Prize winner Daniel Kahneman called this one of the most fundamental cognitive patterns.</p><p>🎰 I explain what it is, why it matters, and how AI systems handle it. </p><p>:blobcoffee: Full article here: <a href="https://towardsdatascience.com/simple-guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">towardsdatascience.com/simple-</span><span class="invisible">guide-to-multi-armed-bandits-a-key-concept-before-reinforcement-learning/</span></a></p><p><a href="https://techhub.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://techhub.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://techhub.social/tags/CognitiveScience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CognitiveScience</span></a> <a href="https://techhub.social/tags/Kahneman" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Kahneman</span></a> <a href="https://techhub.social/tags/Psychology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Psychology</span></a> <a href="https://techhub.social/tags/Behavior" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Behavior</span></a> <a href="https://techhub.social/tags/DecisionMaking" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DecisionMaking</span></a> <a href="https://techhub.social/tags/Bandits" class="mention 
hashtag" rel="nofollow noopener" target="_blank">#<span>Bandits</span></a> <a href="https://techhub.social/tags/machinelearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>machinelearning</span></a> <a href="https://techhub.social/tags/KI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>KI</span></a> <a href="https://techhub.social/tags/Datascience" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Datascience</span></a> <a href="https://techhub.social/tags/datascientist" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datascientist</span></a></p>
Hacker News<p>Shoggoth Mini – A soft tentacle robot powered by GPT-4o and RL</p><p><a href="https://www.matthieulc.com/posts/shoggoth-mini" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">matthieulc.com/posts/shoggoth-</span><span class="invisible">mini</span></a></p><p><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/ShoggothMini" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ShoggothMini</span></a> <a href="https://mastodon.social/tags/SoftRobot" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SoftRobot</span></a> <a href="https://mastodon.social/tags/GPT4o" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPT4o</span></a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://mastodon.social/tags/TechInnovation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TechInnovation</span></a></p>
hlfshell<p>I had an idea scratching at the back of my head based on some of my recent talks given on AI research.</p><p>It's essentially hoping to train better generalist long-horizon robotic policies via a generative reward model and multi-turn credit assignment.</p><p><a href="https://hlfshell.ai/posts/proposal-robotic-complex-task-rl-training/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">hlfshell.ai/posts/proposal-rob</span><span class="invisible">otic-complex-task-rl-training/</span></a></p><p><a href="https://hachyderm.io/tags/research" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>research</span></a> <a href="https://hachyderm.io/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a> <a href="https://hachyderm.io/tags/robotics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotics</span></a> <a href="https://hachyderm.io/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a></p>
Hacker News<p>How to scale RL to 10^26 FLOPs</p><p><a href="https://blog.jxmo.io/p/how-to-scale-rl-to-1026-flops" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">blog.jxmo.io/p/how-to-scale-rl</span><span class="invisible">-to-1026-flops</span></a></p><p><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/scaleRL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>scaleRL</span></a> <a href="https://mastodon.social/tags/FLOPs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FLOPs</span></a> <a href="https://mastodon.social/tags/reinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementLearning</span></a> <a href="https://mastodon.social/tags/AIresearch" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIresearch</span></a> <a href="https://mastodon.social/tags/optimization" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>optimization</span></a></p>
Hacker News<p>The upcoming GPT-3 moment for RL</p><p><a href="https://www.mechanize.work/blog/the-upcoming-gpt-3-moment-for-rl/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">mechanize.work/blog/the-upcomi</span><span class="invisible">ng-gpt-3-moment-for-rl/</span></a></p><p><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/GPT3" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GPT3</span></a> <a href="https://mastodon.social/tags/RL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RL</span></a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/Innovation" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Innovation</span></a></p>
hlfshell<p>The idea of using the multi-turn credit assignment combined with a more flexible verifier system - like Deepseek's Generative Reward Model - creates a potentially FASCINATING idea that I wish I had the time and compute to explore.</p><p>Imagine a robotic system with a judge dedicated to scoring it (in simulation) on a set of small piecemeal sub-tasks for any long-running task. You could in theory create a more robust generalized model from this.</p><p><a href="https://hachyderm.io/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://hachyderm.io/tags/robotics" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>robotics</span></a> <a href="https://hachyderm.io/tags/ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ai</span></a></p>
Hacker News<p>RULER – Easily apply RL to any agent</p><p><a href="https://openpipe.ai/blog/ruler" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">openpipe.ai/blog/ruler</span><span class="invisible"></span></a></p><p><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/RULER" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RULER</span></a> <a href="https://mastodon.social/tags/RL" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RL</span></a> <a href="https://mastodon.social/tags/agents" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>agents</span></a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://mastodon.social/tags/OpenAI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>OpenAI</span></a> <a href="https://mastodon.social/tags/AItools" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AItools</span></a></p>
LLMs: A Professional Guide to Reinforcement Learning Models in Machine Learning By Moustafa Mohamed AI Developer | Machine Learning, Deep Learning, LLM Engineering LinkedIn | GitHub | Portfolio | Kaggle ...<br><br><a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/data-science" target="_blank">#data-science</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/reinforcement-learning" target="_blank">#reinforcement-learning</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/machine-learning" target="_blank">#machine-learning</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/ai" target="_blank">#ai</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/deep-learning" target="_blank">#deep-learning</a><br><br><a href="https://moustafamohamed01.medium.com/a-professional-guide-to-reinforcement-learning-models-in-machine-learning-2197bfdf5e5d?source=rss------machine_learning-5" rel="nofollow noopener" target="_blank">Origin</a> | <a href="https://awakari.com/sub-details.html?id=LLMs" rel="nofollow noopener" target="_blank">Interest</a> | <a href="https://awakari.com/pub-msg.html?id=N5ITRf6rnm6lHoZm21YAd5tWUCG&amp;interestId=LLMs" rel="nofollow noopener" target="_blank">Match</a>
Georg Weissenbacher<p>A nice <a href="https://fediscience.org/tags/arstechnica" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>arstechnica</span></a> article on <a href="https://fediscience.org/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> in <a href="https://fediscience.org/tags/LLMs" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLMs</span></a> </p><p><a href="https://arstechnica.com/ai/2025/07/how-a-big-shift-in-training-llms-led-to-a-capability-explosion/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/07/how</span><span class="invisible">-a-big-shift-in-training-llms-led-to-a-capability-explosion/</span></a></p>
Ars Technica News<p>How a big shift in training LLMs led to a capability explosion <a href="https://arstechni.ca/FerG" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">arstechni.ca/FerG</span><span class="invisible"></span></a> <a href="https://c.im/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a> <a href="https://c.im/tags/imitationlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>imitationlearning</span></a> <a href="https://c.im/tags/explainers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>explainers</span></a> <a href="https://c.im/tags/Features" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Features</span></a> <a href="https://c.im/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a></p>
Erik Jonker<p>Good article how reinforcement learning improved current AI models. Also illustrates that LLMs today are not just imitating.<br><a href="https://arstechnica.com/ai/2025/07/how-a-big-shift-in-training-llms-led-to-a-capability-explosion/?utm_brand=arstechnica&amp;utm_social-type=owned&amp;utm_source=mastodon&amp;utm_medium=social" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">arstechnica.com/ai/2025/07/how</span><span class="invisible">-a-big-shift-in-training-llms-led-to-a-capability-explosion/?utm_brand=arstechnica&amp;utm_social-type=owned&amp;utm_source=mastodon&amp;utm_medium=social</span></a><br><a href="https://mastodon.social/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://mastodon.social/tags/reinforcementlearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>reinforcementlearning</span></a></p>
Hacker News<p>Reinforcement Learning from Human Feedback (RLHF) in Notebooks</p><p><a href="https://github.com/ash80/RLHF_in_notebooks" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/ash80/RLHF_in_noteb</span><span class="invisible">ooks</span></a></p><p><a href="https://mastodon.social/tags/HackerNews" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HackerNews</span></a> <a href="https://mastodon.social/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://mastodon.social/tags/HumanFeedback" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HumanFeedback</span></a> <a href="https://mastodon.social/tags/RLHF" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RLHF</span></a> <a href="https://mastodon.social/tags/Notebooks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Notebooks</span></a> <a href="https://mastodon.social/tags/AIResearch" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AIResearch</span></a></p>
LLMs: ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models. The key idea: Reinforcement learning (RL) has had a resurgence in LLMs with application to ...<br><br><a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/LLMs" target="_blank">#LLMs</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/training-dynamics" target="_blank">#training-dynamics</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/fine-tuning" target="_blank">#fine-tuning</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/reasoning" target="_blank">#reasoning</a> <a rel="nofollow noopener" class="mention hashtag" href="https://mastodon.social/tags/reinforcement-learning" target="_blank">#reinforcement-learning</a><br><br><a href="https://graphcore-research.github.io/prorl/" rel="nofollow noopener" target="_blank">Origin</a> | <a href="https://awakari.com/sub-details.html?id=LLMs" rel="nofollow noopener" target="_blank">Interest</a> | <a href="https://awakari.com/pub-msg.html?id=9N5KR0r2wXZVbgWCE3Sv9k3kf6e&amp;interestId=LLMs" rel="nofollow noopener" target="_blank">Match</a>
US<p><a href="https://www.europesays.com/us/41491/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">europesays.com/us/41491/</span><span class="invisible"></span></a> Multi objective reinforcement learning driven task offloading algorithm for satellite edge computing networks <a href="https://pubeurope.com/tags/AerospaceEngineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AerospaceEngineering</span></a> <a href="https://pubeurope.com/tags/Computing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Computing</span></a> <a href="https://pubeurope.com/tags/DQN" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DQN</span></a> <a href="https://pubeurope.com/tags/Engineering" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Engineering</span></a> <a href="https://pubeurope.com/tags/HumanitiesAndSocialSciences" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HumanitiesAndSocialSciences</span></a> <a href="https://pubeurope.com/tags/multidisciplinary" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>multidisciplinary</span></a> <a href="https://pubeurope.com/tags/ReinforcementLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ReinforcementLearning</span></a> <a href="https://pubeurope.com/tags/SatelliteEdgeComputing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SatelliteEdgeComputing</span></a> <a href="https://pubeurope.com/tags/Science" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Science</span></a> <a href="https://pubeurope.com/tags/SpaceBasedInformationNetwork" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SpaceBasedInformationNetwork</span></a> <a href="https://pubeurope.com/tags/TaskOffloading" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>TaskOffloading</span></a> <a href="https://pubeurope.com/tags/Technology" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Technology</span></a> <a href="https://pubeurope.com/tags/UnitedStates" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>UnitedStates</span></a> <a href="https://pubeurope.com/tags/UnitedStates" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>UnitedStates</span></a> <a href="https://pubeurope.com/tags/US" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>US</span></a></p>