Tag | Ind1 | Ind2 | Content
---|---|---|---
000 | | | 01842nam a22005297a 4500
999 | | | _c13885 _d13884
003 | | | OSt
005 | | | 20231103135354.0
006 | | | m\|\|\|\|\|o\|\|d\| 00\| 0
007 | | | cr \|\| auc\|\|a\|a
008 | | | 210422s2018 maua\|\|\|\|sb\|\|\| 001 0 eng d
040 | | | _cQCPL
100 | 1 | | _91469 _aSutton, Richard S. _eauthor
245 | 1 | 0 | _aReinforcement learning : _ban introduction / _cRichard S. Sutton and Andrew G. Barto
250 | | | _aSecond edition
264 | | 1 | _aCambridge, Massachusetts : _bMIT Press, _c[2018]
300 | | | _a1 online resource : _billustrations
336 | | | _atext _2rdacontent
337 | | | _acomputer _2rdamedia
338 | | | _aonline resource _2rdacarrier
490 | | | _aAdaptive computation and machine learning
504 | | | _aIncludes bibliographical references and index.
505 | 0 | | _aReinforcement learning
505 | 0 | | _aPart I. Tabular solution methods
505 | 0 | | _aMulti-armed bandits
505 | 0 | | _aFinite Markov decision processes
505 | 0 | | _aDynamic programming
505 | 0 | | _aMonte Carlo methods
505 | 0 | | _aTemporal-difference learning
505 | 0 | | _an-step bootstrapping
505 | 0 | | _aPlanning and learning with tabular methods
505 | 0 | | _aPart II. Approximate solution methods
505 | 0 | | _aOn-policy prediction with approximation
505 | 0 | | _aOn-policy control with approximation
505 | 0 | | _aOff-policy methods with approximation
505 | 0 | | _aEligibility traces
505 | 0 | | _aPolicy gradient methods
505 | 0 | | _aPart III. Looking deeper
505 | 0 | | _aPsychology
505 | 0 | | _aNeuroscience
505 | 0 | | _aApplications and case studies
505 | 0 | | _aFrontiers
650 | | | _aReinforcement learning
655 | | | _aElectronic books
700 | 1 | | _91470 _aBarto, Andrew G. _eauthor
856 | | | _amit.edu _uhttps://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
942 | | | _2ddc _cEBOOK
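
Each data row above follows the MARC 21 layout: a three-digit tag, two single-character indicators, and `_x`-prefixed subfields. As a rough illustration of that structure, here is a minimal sketch in plain standard-library Python (not a real MARC library such as pymarc); the `MarcField` class and helper names are ad hoc, control fields (00X, whose content may itself contain `|`) are ignored, and the naive `_` split assumes subfield values contain no underscores:

```python
from dataclasses import dataclass, field

@dataclass
class MarcField:
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: dict = field(default_factory=dict)  # subfield code -> value

def parse_subfields(text: str) -> dict:
    """Split '_aReinforcement learning : _ban introduction /' into
    {'a': 'Reinforcement learning :', 'b': 'an introduction /'}."""
    out = {}
    for chunk in text.split("_")[1:]:  # text starts with '_', so piece 0 is empty
        out[chunk[0]] = chunk[1:].strip()
    return out

def parse_row(row: str) -> MarcField:
    # Rows look like 'TAG | IND1 | IND2 | SUBFIELDS'.
    tag, ind1, ind2, content = (cell.strip() for cell in row.split("|", 3))
    return MarcField(tag, ind1 or " ", ind2 or " ", parse_subfields(content))

rows = [  # sample data rows taken from the record above
    "100 | 1 |   | _91469 _aSutton, Richard S. _eauthor",
    "245 | 1 | 0 | _aReinforcement learning : _ban introduction / _cRichard S. Sutton and Andrew G. Barto",
    "700 | 1 |   | _91470 _aBarto, Andrew G. _eauthor",
]
record = [parse_row(r) for r in rows]

title = next(f for f in record if f.tag == "245")
print(title.subfields["a"], title.subfields["b"], title.subfields["c"])
# Reinforcement learning : an introduction / Richard S. Sutton and Andrew G. Barto
print([f.subfields["a"] for f in record if f.tag in ("100", "700")])
# ['Sutton, Richard S.', 'Barto, Andrew G.']
```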