
Course catalog: Prediction and Control with Function Approximation Training
Course outline:

Welcome to the Course!

Welcome to the third course in the Reinforcement Learning Specialization:

Prediction and Control with Function Approximation, brought to you by the University of Alberta,

Onlea, and Coursera.

In this pre-course module, you'll be introduced to your instructors,

and get a flavour of what the course has in store for you.

Make sure to introduce yourself to your classmates in the "Meet and Greet" section!

On-policy Prediction with Approximation

This week you will learn how to estimate a value function for a given policy,

when the number of states is much larger than the memory available to the agent.

You will learn how to specify a parametric form of the value function,

how to specify an objective function, and how stochastic gradient descent can be used to estimate values from interaction with the world.
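
As a rough illustration of the ideas in this module, the sketch below estimates values with semi-gradient TD(0) and a linear value function. The environment (a 20-state random walk with exits worth -1 and +1), the state-aggregation features, and every name and parameter value here are assumptions made for the example, not material taken from the course.

```python
import random

def semi_gradient_td0(num_states=20, group_size=5, episodes=2000,
                      alpha=0.02, gamma=1.0, seed=0):
    """Semi-gradient TD(0) for a random-walk policy (hypothetical toy task)."""
    rng = random.Random(seed)
    num_groups = num_states // group_size
    w = [0.0] * num_groups  # one weight per group of neighbouring states

    def value(s):
        # v_hat(s, w) = w[group(s)], i.e. linear in a one-hot group feature
        return w[s // group_size]

    for _ in range(episodes):
        s = num_states // 2  # every episode starts near the middle
        while True:
            s_next = s + rng.choice((-1, 1))  # the policy being evaluated
            if s_next < 0:
                reward, done = -1.0, True     # left exit
            elif s_next >= num_states:
                reward, done = 1.0, True      # right exit
            else:
                reward, done = 0.0, False
            target = reward if done else reward + gamma * value(s_next)
            # The gradient of v_hat w.r.t. w is the one-hot feature, so only
            # the weight of the current state's group is updated.
            w[s // group_size] += alpha * (target - value(s))
            if done:
                break
            s = s_next
    return w

weights = semi_gradient_td0()
print(weights)
```

Four weights stand in for twenty states here, which is the situation the module describes: the agent has less memory than there are states, so states within a group must share one estimate.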

Constructing Features for Prediction

The features used to construct the agent’s value estimates are perhaps the most crucial part of a successful learning system.

In this module we discuss two basic strategies for constructing features: (1) a fixed basis that forms an exhaustive partition of the input,

and (2) adapting the features while the agent interacts with the world via Neural Networks and Backpropagation.
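
To make the first strategy concrete, here is a minimal sketch of a one-dimensional tile coder, one common way to build a fixed basis from overlapping partitions of the input. The function name, the input range, and the numbers of tilings and tiles are all assumptions chosen for illustration.

```python
def tile_features(x, low=0.0, high=1.0, num_tilings=4, tiles_per_tiling=8):
    """Return the indices of the active tiles for a scalar x in [low, high]."""
    tile_width = (high - low) / tiles_per_tiling
    active = []
    for t in range(num_tilings):
        # Each tiling is shifted by a fraction of one tile width.
        offset = t * tile_width / num_tilings
        idx = int((x - low + offset) / tile_width)
        # Each tiling carries one extra tile so the shifted grid still covers high.
        idx = min(idx, tiles_per_tiling)
        active.append(t * (tiles_per_tiling + 1) + idx)
    return active

print(tile_features(0.0))
print(tile_features(0.50), tile_features(0.51))
```

Each tiling on its own is an exhaustive partition of the input, so exactly one tile per tiling is active. Nearby inputs activate mostly the same tiles, which is what lets a linear value estimate generalize between them.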

In this week’s graded assessment you will solve a simple but infinite-state prediction task with a Neural Network and

TD learning.

Control with Approximation

This week,

you will see that the concepts and tools introduced in modules two and three allow straightforward extension of classic

TD control methods to the function approximation setting. In particular,

you will learn how to find the optimal policy in infinite-state MDPs by simply combining semi-gradient

TD methods with generalized policy iteration, yielding classic control methods like Q-learning and Sarsa.

We conclude with a discussion of a new problem formulation for RL, average reward, which will undoubtedly

be used in many applications of RL in the future.
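
As a rough sketch of how semi-gradient TD combines with generalized policy iteration, the code below runs episodic semi-gradient Sarsa with an epsilon-greedy policy. The corridor task (reward of -1 per step, goal at the right end), the aggregated state-action features, and all names and parameter values are assumptions made for the example.

```python
import random

def semi_gradient_sarsa(num_states=12, group_size=3, episodes=300, alpha=0.1,
                        gamma=1.0, epsilon=0.1, seed=0, max_steps=10_000):
    """Episodic semi-gradient Sarsa on a hypothetical corridor task."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 steps left, action 1 steps right
    # w[g][a] is the weight for (state group, action); q_hat is linear in
    # a one-hot feature over these pairs.
    w = [[0.0, 0.0] for _ in range(num_states // group_size)]

    def q(s, a):
        return w[s // group_size][a]

    def choose(s):
        # Epsilon-greedy action selection with random tie-breaking.
        if rng.random() < epsilon:
            return rng.randrange(2)
        best = max(q(s, 0), q(s, 1))
        return rng.choice([a for a in range(2) if q(s, a) == best])

    for _ in range(episodes):
        s = 0
        a = choose(s)
        for _ in range(max_steps):  # safety cap on episode length
            s_next = max(0, s + moves[a])
            reward = -1.0
            if s_next == num_states - 1:
                # Terminal transition: the target is the reward alone.
                w[s // group_size][a] += alpha * (reward - q(s, a))
                break
            a_next = choose(s_next)
            target = reward + gamma * q(s_next, a_next)
            w[s // group_size][a] += alpha * (target - q(s, a))
            s, a = s_next, a_next
    return w

q_weights = semi_gradient_sarsa()
print(q_weights)
```

Replacing the value of the sampled next action in the target with the maximum over next actions would give a Q-learning style update under the same features.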

Policy Gradient

Every algorithm you have learned about so far estimates

a value function as an intermediate step towards the goal of finding an optimal policy.

An alternative strategy is to directly learn the parameters of the policy.

This week you will learn about these policy gradient methods and their advantages over value-function-based methods.

You will also learn how policy gradient methods can be used

to find the optimal policy in tasks with both continuous state and action spaces.
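
To give a flavour of learning policy parameters directly, here is a minimal sketch of a softmax policy trained with a REINFORCE-style gradient step. It assumes a stateless two-action task with fixed rewards, which is far simpler than the continuous state and action settings mentioned above; all names and values are illustrative.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_bandit(rewards=(0.0, 1.0), steps=2000, alpha=0.1, seed=0):
    """Adjust softmax preferences theta by stochastic gradient ascent."""
    rng = random.Random(seed)
    theta = [0.0] * len(rewards)  # policy parameters, one per action
    for _ in range(steps):
        pi = softmax(theta)
        action = rng.choices(range(len(rewards)), weights=pi)[0]
        reward = rewards[action]  # hypothetical deterministic reward
        for a in range(len(theta)):
            # d/d theta[a] of log pi(action) for a softmax policy
            grad_log_pi = (1.0 if a == action else 0.0) - pi[a]
            theta[a] += alpha * reward * grad_log_pi
    return softmax(theta)

probs = policy_gradient_bandit()
print(probs)
```

No value function is stored here: the update shifts probability toward the rewarded action through the gradient of the log-probability alone. Methods covered under this heading typically add a baseline or a learned critic to reduce the variance of this update.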
