Project Background
HalfCheetah is a classic control problem in reinforcement learning: the goal is to make the robot run forward as far as possible within a fixed number of steps while keeping energy consumption low. D4RL provides standard offline reinforcement learning datasets, and its halfcheetah-medium-v2 dataset is used here to train and evaluate control policies. This project uses the REVIVE SDK to optimize the HalfCheetah control policy with a data-driven approach built on historical data.

Challenges and Solutions
- The dataset does not directly provide the delta_x information needed to compute the reward function, so it has to be recovered from the recorded rewards (a reward-function sketch follows this list).
- The raw halfcheetah-medium-v2 dataset has to be converted into the format required by the REVIVE SDK with a data processing script.
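Once delta_x is recovered, the reward can be recomputed for states produced by the virtual environment. Below is a minimal sketch of such a reward function; the function name get_reward, the dict-style data argument, and the field names delta_x / action are assumptions rather than the REVIVE SDK's required interface, and the coefficients (dt = 0.05, control-cost weight 0.1) are the standard Gym HalfCheetah values.

```python
# Minimal sketch of a delta_x-based reward; not the project's actual reward file.
# get_reward, the "data" dict, and its keys are assumptions for illustration.
import numpy as np

DT = 0.05               # HalfCheetah simulation timestep
CTRL_COST_WEIGHT = 0.1  # control-cost weight used by the Gym environment

def get_reward(data: dict) -> np.ndarray:
    """Recompute the HalfCheetah reward from delta_x and the action."""
    delta_x = np.asarray(data["delta_x"]).reshape(-1)  # forward displacement per step
    action = np.asarray(data["action"])
    forward_reward = delta_x / DT
    ctrl_cost = CTRL_COST_WEIGHT * np.sum(action ** 2, axis=-1)
    return forward_reward - ctrl_cost
```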
Project Implementation

- Data conversion: using the halfcheetah-medium-v2 dataset, run the data processing script data/generate_data.py to convert the data into .npz format.
- Trajectory splitting: check whether the obs at step t+1 matches the next_obs at step t; mismatches mark trajectory boundaries, which are used to split trajectories and generate the index information.
- Recovering delta_x: delta_x is reconstructed from the reward information in the dataset, where \[delta\_x := x_{t+1} - x_{t}\] is the robot's forward displacement in one step (see the data-processing sketch after this list).
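The project's actual data/generate_data.py is not reproduced here; the sketch below only illustrates the steps above, assuming the D4RL dataset exposes the usual observations / next_observations / actions / rewards arrays and follows the standard Gym HalfCheetah reward, reward_t = delta_x / dt - 0.1 * ||a_t||^2 with dt = 0.05, so that delta_x can be read back from the stored rewards and actions. The .npz field names are placeholders.

```python
# Minimal sketch of the data-processing step; not the project's actual
# data/generate_data.py.  Field names and the .npz layout are assumptions.
import gym
import d4rl  # noqa: F401  -- registers the halfcheetah-medium-v2 dataset
import numpy as np

env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()

obs = dataset["observations"]
next_obs = dataset["next_observations"]
actions = dataset["actions"]
rewards = dataset["rewards"]

# Trajectory splitting: a new trajectory starts wherever obs[t+1] does not
# match next_obs[t], i.e. the data jumps to a different rollout.
breaks = ~np.all(np.isclose(obs[1:], next_obs[:-1]), axis=1)
starts = np.where(breaks)[0] + 1
index = np.concatenate(([0], starts, [len(obs)]))  # trajectory boundaries

# Recover delta_x, assuming the standard HalfCheetah reward
# reward_t = delta_x / dt - 0.1 * ||a_t||^2 with dt = 0.05.
DT, CTRL_COST_WEIGHT = 0.05, 0.1
delta_x = (rewards + CTRL_COST_WEIGHT * np.sum(actions ** 2, axis=1)) * DT

# Save in an .npz layout for the training step (names illustrative).
np.savez(
    "data/halfcheetah.npz",
    obs=obs,
    action=actions,
    delta_x=delta_x[:, None],  # keep a 2-D (batch, 1) shape
    index=index,
)
```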
In the examples/task/HalfCheetah directory, run the REVIVE SDK training command to train the virtual environment model and then the policy model (a representative invocation is sketched below).
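The original command was not preserved in this post; the line below is only a representative REVIVE SDK invocation, assuming the usual train.py entry point with -df (data file), -cf (decision-flow YAML), and -rf (reward function file) arguments. The file names are placeholders, and the actual flags used in examples/task/HalfCheetah may differ.

```bash
# Representative invocation only; concrete file names and extra flags may differ.
# -df: processed offline data, -cf: decision-flow YAML, -rf: reward function
python train.py -df data/halfcheetah.npz -cf data/halfcheetah.yaml \
    -rf data/halfcheetah_reward.py --run_id halfcheetah
```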
Results and Impact

Code and Resource Links
Policy control comparison animation