
bug in reinforce with baseline #37

Open
@hlhang9527

Description


The value-network update should build its optimizer over `s_value_func.parameters()`:

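    import torch.optim as optim
    from torch.nn.utils import clip_grad_norm_

    alpha_w = 1e-3  # initialize the value-network step size

    optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)
    optimizer_w.zero_grad()
    policy_loss_w = -delta
    policy_loss_w.backward(retain_graph=True)
    # clip_grad_norm_ expects the parameters whose gradients to clip, not the loss tensor
    clip_grad_norm_(s_value_func.parameters(), 0.1)
    optimizer_w.step()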
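For comparison, here is a minimal PyTorch sketch of the baseline (value-network) step in REINFORCE with baseline. It assumes, without confirmation from the repository, that `delta` is meant to be the baseline error delta = G_t - v_hat(S_t, w), for which Sutton and Barto give the update w <- w + alpha_w * delta * grad v_hat(S_t, w). The names `state_dim` and `update_value_net` and the network architecture are hypothetical. Under that assumption, a loss of 0.5 * delta^2 produces exactly that gradient direction, while a loss of `-delta` alone has gradient grad v_hat regardless of the sign of delta, so the sign and form of the loss may be worth checking too.

```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils import clip_grad_norm_

torch.manual_seed(0)

# Hypothetical stand-ins: a 4-dimensional state and a small value network used as the baseline.
state_dim = 4
s_value_func = nn.Sequential(nn.Linear(state_dim, 16), nn.ReLU(), nn.Linear(16, 1))

alpha_w = 1e-3
# Build the optimizer once, over the value network's parameters only,
# so Adam's moment estimates persist across updates.
optimizer_w = optim.Adam(s_value_func.parameters(), lr=alpha_w)


def update_value_net(states: torch.Tensor, returns: torch.Tensor, max_norm: float = 0.1) -> torch.Tensor:
    """One baseline update on a batch of (S_t, G_t); returns delta detached for the policy step."""
    values = s_value_func(states).squeeze(-1)  # v_hat(S_t, w), shape (batch,)
    delta = returns - values                   # delta_t = G_t - v_hat(S_t, w)
    # The gradient of 0.5 * delta^2 w.r.t. w is -delta * grad v_hat (G_t is a constant),
    # so with plain SGD a descent step is w <- w + alpha_w * delta * grad v_hat.
    # Adam follows the same gradient but rescales the step size per parameter.
    loss_w = 0.5 * delta.pow(2).mean()
    optimizer_w.zero_grad()
    loss_w.backward()
    clip_grad_norm_(s_value_func.parameters(), max_norm)  # clip the value net's gradients
    optimizer_w.step()
    return delta.detach()
```

Returning `delta.detach()` lets a separate policy step weight its log-probabilities by delta as a constant, for example `-(delta * log_probs).sum()`, without needing `retain_graph=True`. Creating the optimizer outside the update function also avoids resetting Adam's running statistics on every call, which happens when `optim.Adam(...)` is constructed inside the update.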

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
