I finally implemented backpropagation through a camera.
We optimized a controller for the pendulum-cart system by backpropagating through the physics, the vision and the controller. By treating the whole system as a single differentiable function, we could update the controller parameters directly until it solved the problem. In only 2420 update steps (that is, 2420 trials in the simulator), it learned both the vision and the control task and solved the pendulum-cart system. I hope the ICLR paper will be accepted soon, so that I can publish the code and see what other people do with it.
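To give a feel for what "backpropagating through the physics, vision and controller" means, here is a minimal sketch in JAX. It is not the code from the paper. Everything in it is an assumption made for illustration: the pendulum is dropped and only a point-mass cart remains, the "camera" is a 1-D Gaussian blob renderer, the controller is a single linear layer, and all names (`render`, `controller`, `rollout_loss`, `sgd_step`) are hypothetical.

```python
import jax
import jax.numpy as jnp

PIXELS = jnp.linspace(-2.0, 2.0, 16)  # pixel centres of a toy 1-D camera (assumption)
DT = 0.05                             # simulation time step (assumption)
STEPS = 100                           # rollout length (assumption)


def render(x):
    """Differentiable 'camera': a Gaussian blob centred on the cart position x."""
    return jnp.exp(-0.5 * ((PIXELS - x) / 0.3) ** 2)


def controller(params, frame, prev_frame):
    """Linear policy on two consecutive frames, squashed to a force in (-1, 1)."""
    features = jnp.concatenate([frame, prev_frame])
    return jnp.tanh(features @ params["w"] + params["b"])


def rollout_loss(params, x0):
    """Simulate the cart under vision-based control; return the mean squared position."""
    x0 = jnp.asarray(x0, dtype=jnp.float32)
    v0 = jnp.zeros((), dtype=jnp.float32)

    def step(carry, _):
        x, v, prev_frame = carry
        frame = render(x)                            # vision
        force = controller(params, frame, prev_frame)  # control
        v = v + DT * force                           # physics (unit-mass cart)
        x = x + DT * v
        return (x, v, frame), x ** 2

    _, costs = jax.lax.scan(step, (x0, v0, render(x0)), None, length=STEPS)
    return jnp.mean(costs)


def init_params():
    """Start from an all-zero controller, which applies no force."""
    return {"w": jnp.zeros(2 * PIXELS.shape[0]), "b": jnp.zeros(())}


def sgd_step(params, x0, lr):
    """One gradient step; the gradient flows through time, camera and physics."""
    grads = jax.grad(rollout_loss)(params, x0)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)


params = init_params()
loss_before = rollout_loss(params, 1.0)
params = sgd_step(params, 1.0, lr=1e-3)
loss_after = rollout_loss(params, 1.0)
print(float(loss_before), float(loss_after))
```

The point of the sketch is that `jax.grad` differentiates the whole rollout at once, so the controller never needs the true state, only the rendered frames. Repeating `sgd_step` in a loop would be the toy analogue of the update steps mentioned above. The learning rate here is deliberately small, and a larger one may need tuning.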
Next step: evaluating this method on the real setup!
Control the pendulum-cart setup with only visual perception. Optimized deep controller with backpropagation through time, camera and physics pic.twitter.com/6gu7GHpQBv
— 317070 (@317070) January 16, 2017