In an earlier blog post, I mentioned how we had 3 interns working at our Applied Research and Innovation Lab at Pier 9. A few weeks ago, we had our end-of-internship presentations. I have already shared the story of Jack Reinke and Brice Dudley's presentation.
Here is another story.
Hi, my name is Charlott, and I'm a mechanical engineering Ph.D. student at UC Berkeley. This summer, I was a machine learning intern in the Applied Research and Innovation Lab, working with Dr. Hui Li to teach virtual robots by demonstration in virtual reality (VR).
My summer project was part of a bigger project in the lab. The lab has a Kuka industrial robot arm, and we want to teach it how to do tasks using a machine learning technique called teaching by demonstration. As the name implies, the process requires demonstrating to the robot how to solve a particular task and then training a neural network from those demonstrations. This is an appealing training method because it's much more intuitive to show how a task should be completed than to describe it in equations and numbers. The lab also has a digital copy of the Kuka robot in Autodesk Stingray, and the big idea here is to try to teach the virtual robot by demonstration in a VR setting, and then use what the neural network learned from the virtual demonstrations to control the real robot. This is an enticing strategy because:
- Unlike the physical robot, the VR robot is accessible from anywhere in the world; all you need is access to the VR system, and there are no time restrictions on when you can train the robot.
- Operating a robot in virtual reality is faster than in the real world because you don't have to worry about safety issues and don't have to spend time coding.
- You don't need any technical robot knowledge to operate a VR system, and your movements when demonstrating a task can be much more intuitive.
After demonstrating how to solve the task, you can have a neural network learn from that, and allow the network to control the robot. The way that control works is that during runtime, the trained neural network receives the current state of the system — typically the positions of the robot and any relevant objects — and then suggests what the next position of the robot should be, based on the task demonstrations it has seen. Before we send that next state to the robot, we pass it through an inverse kinematics solver, which calculates whether the suggested next state is even reachable for the robot. If it is, we send the command for the robot to move to that new state. If it isn't, we tweak the output slightly so that it is reachable, and then send the edited command. That command leads to a new current state that becomes the next input to the neural network, and you continue this cycle until your task is complete.
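To make that loop concrete, here is a minimal sketch in Python. Everything in it is illustrative: the helper functions (read_current_state, ik_reachable, clamp_to_reachable, send_to_robot, task_complete) and the model interface are hypothetical placeholders, not the lab's actual code.

```python
# Illustrative sketch of the runtime control loop described above.
# All helper functions and names here are hypothetical placeholders.

def run_task(model, max_steps=1000):
    state = read_current_state()  # robot pose plus block and goal positions
    for _ in range(max_steps):
        # The trained network suggests the robot's next position.
        proposed = model.predict_next_state(state)

        # An inverse kinematics solver checks whether that pose is reachable.
        if ik_reachable(proposed):
            target = proposed
        else:
            # If not, nudge the suggestion to the nearest reachable pose.
            target = clamp_to_reachable(proposed)

        send_to_robot(target)         # command the (virtual or real) robot
        state = read_current_state()  # the new state feeds the next prediction

        if task_complete(state):
            break
```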
Now my goal for the summer was to tackle the first part of this project: teaching the virtual Kuka robot by demonstration in Stingray. Stingray had never been used as a machine learning tool before, so one of the goals of this project was to see how it would fare. My first step was to set up the Stingray VR environment around the virtual Kuka robot. I decided to train the robot to push a block into a goal area marked on the floor, so I added those two objects to the existing Stingray environment. The block and goal were coded so that they would spawn in random new locations around the robot whenever the block reached the goal, so that a new demonstration could start as soon as the previous one finished.
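As a rough illustration of that respawn behavior (the real logic lives inside the Stingray environment, so the names, units, and radii below are assumptions), the idea is simply to pick a fresh random floor position for both the block and the goal whenever an episode ends:

```python
import math
import random

# Rough sketch of the respawn logic; positions and radii are made-up values.
SPAWN_RADIUS = (0.4, 0.8)  # assumed reachable band around the robot base, in meters

def random_floor_position():
    """Pick a random spot on the floor within the assumed reachable band."""
    angle = random.uniform(0.0, 2.0 * math.pi)
    radius = random.uniform(*SPAWN_RADIUS)
    return (radius * math.cos(angle), radius * math.sin(angle))

def reset_episode(block, goal):
    """Move the block and goal to new random spots so a new demo can start right away."""
    block.position = random_floor_position()
    goal.position = random_floor_position()
```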
For training the neural network, we had to collect the demonstrations of how the robot should push the block into the goal. We used a Vive controller tethered to the virtual robot arm so that when we moved our arms in real life, the robot arm moved along with us in the VR world as we pushed the block into the goal area to complete the task. Lab members took turns training the robot, and overall, we collected around 10 hours' worth of demonstrations.
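Here is a hedged sketch of how each demonstration might be logged: while the Vive controller drives the arm, the environment's state is recorded every frame, and the finished trajectory is saved once the block reaches the goal. The env and controller objects here are invented for illustration, not the lab's recording code.

```python
# Hypothetical logging loop for one demonstration; env and controller are
# illustrative objects, not the lab's actual recording code.

def record_demonstration(env, controller):
    trajectory = []
    while not env.block_in_goal():
        env.set_robot_target(controller.pose())  # virtual arm follows the Vive controller
        trajectory.append(env.observe())         # e.g. robot pose, block and goal positions
        env.step()                               # advance the simulation one frame
    return trajectory
```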
We used this collected data to train a neural network. The training data was unrolled for 50 time steps, and then, building upon previously published work, we used a recurrent neural network with long short-term memory (LSTM) to encode the training data. This type of network is especially suited to learning trajectory-based tasks because its recurrent connections between time steps let it use both the current state and previous states as input.
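A minimal Keras sketch of that kind of model follows. The sizes (STATE_DIM, HIDDEN_UNITS), the stand-in data, and the exact layer stack are assumptions, not the architecture the lab actually used; only the 50-step unrolling comes from the description above.

```python
import numpy as np
from tensorflow.keras import layers, models

SEQ_LEN = 50        # demonstrations unrolled for 50 time steps
STATE_DIM = 10      # assumed size of the state vector (robot pose + block + goal)
HIDDEN_UNITS = 64   # assumed LSTM width

# LSTM encoder that predicts the next state at every time step.
model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, STATE_DIM)),
    layers.LSTM(HIDDEN_UNITS, return_sequences=True),
    layers.TimeDistributed(layers.Dense(STATE_DIM)),
])
model.compile(optimizer="adam", loss="mse")

# Training pairs: the input is the state at time t, the target is the state at t + 1.
demos = np.random.rand(100, SEQ_LEN + 1, STATE_DIM).astype("float32")  # stand-in data
model.fit(demos[:, :-1], demos[:, 1:], epochs=1, batch_size=32)
```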
Over the course of a month, we slowly teased out what the model needed from us in terms of training data. After trying various combinations of adjustments to the demonstrations and the training setup, we trained a model that works! This model has a 65% success rate.
Hey, that's very good, Charlott. I just wanna live in a world where robots are my assistants. Consider it the anthem of my calling.
A virtual learning experience is alive in the lab.