How rewards teach reinforcement learning agents to behave

Curated from: thenextweb.com

Ideas, facts & insights covering these topics:

Artificial Intelligence

Technology & The Future


Reinforcement Learning

In reinforcement learning (RL), a software agent learns through trial and error. When it takes a desired action, the agent receives a reward.

Over time, the agent works out how to execute the task in a way that maximizes its total reward.
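The trial-and-error loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular system: the function name `train_agent`, the two-action setup, the fixed reward values, and the epsilon-greedy exploration rule are all assumptions chosen for the example.

```python
import random

def train_agent(rewards, steps=2000, epsilon=0.1, seed=0):
    """Learn by trial and error which action pays best.

    `rewards[a]` is the (hypothetical) reward for taking action `a`.
    The agent does not see this list; it only observes the reward
    returned after each action it tries.
    """
    rng = random.Random(seed)
    n = len(rewards)
    estimates = [0.0] * n  # the agent's running estimate of each action's reward
    counts = [0] * n       # how many times each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n), key=lambda a: estimates[a])

        reward = rewards[action]  # feedback from the environment
        counts[action] += 1
        # Incremental average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts

# Assumed toy task: action 1 is the desired action and earns a reward of 1.
estimates, counts = train_agent([0.0, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
```

The agent starts out with no preference, stumbles on the rewarded action while exploring, and from then on chooses it most of the time. Real RL systems replace the fixed reward list with an environment whose state changes in response to the agent's actions.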

The technique can be applied to a vast array of tasks, from controlling autonomous vehicles to improving energy efficiency. But its most celebrated achievements have come in the world of games.

The AlphaGo Milestone

In March 2016, reinforcement learning had a landmark moment.

A DeepMind system called AlphaGo became the first computer program to defeat a world champion in Go, a famously complex board game.

The victory was reportedly watched by over 200 million people.

The original AlphaGo was first trained on games played by human experts, then improved by playing against different versions of itself thousands of times, learning incrementally through trial and error. Its successor, AlphaGo Zero, learned the game from scratch through self-play alone. This left it free to learn the game for itself, unconstrained by orthodox thinking.

How a Reward Function Works

In AI systems, rewards and punishments are calculated mathematically. A self-driving system could receive a -1 when it hits a wall, and a +1 when it safely passes another car. These signals allow the agent to evaluate its performance.

The algorithm then learns through trial and error to maximize the reward and, ultimately, to complete the task in the most desirable manner.
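As a rough illustration of the self-driving example, a reward function can be as simple as a mapping from events to numbers. The event names, the neutral reward of 0 for other events, and the helper `episode_return` are assumptions made for this sketch; only the -1 and +1 values come from the example above.

```python
def reward(event):
    """Map a driving event to a numeric reward signal."""
    if event == "hit_wall":
        return -1  # punishment for a collision
    if event == "passed_car_safely":
        return 1   # reward for a safe overtake
    return 0       # assumption: every other event is neutral

def episode_return(events):
    """Total reward collected over one sequence of events."""
    return sum(reward(e) for e in events)

# Hypothetical sequence of events from one drive.
episode = ["cruise", "passed_car_safely", "cruise", "hit_wall", "passed_car_safely"]
total = episode_return(episode)  # 0 + 1 + 0 - 1 + 1
```

The total is the number the agent tries to push higher from one attempt to the next. Choosing which events to reward, and by how much, is a design decision that shapes what the agent ends up learning.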

The Bottom Line

There are still major challenges to overcome. RL agents struggle to maximize rewards in complex environments and to assess the long-term repercussions of their actions. Nonetheless, proponents of the "reward is enough" hypothesis believe the algorithms’ adaptability could pave a path to artificial general intelligence (AGI).

IDEAS CURATED BY

Theodore H.

@theodorexh
