Problem with testing in another environment #148
Sorry, the top two curves in this graph are at 2000 real_time_rate and the bottom one is raw time.
Hi,
Hello Mr. Reiniscimurs,
Are you using the main branch? The laser sensor is only used to detect collisions. It will not affect how the state is represented.
Yes, I use the main branch, and I use state[8:-12]. Additionally, because the map I created has many cylinders, every time I train I find that after a while almost half of the posts fall down. I suspected that the position of the robots was reset to the cardboard box, which caused them to crash violently and knock over the posts, so I set tags for every one of them.
I think I am missing the overall picture of what changes you have made to the code here. Can you please explain all changes in detail in a structured way? It is quite hard for me to follow all the details here. For instance, you mention changing the max distance in the laser sensor setup. However, the laser sensor is not used to get state information. For that you would have to update the supporting Velodyne Puck sensor files, but that is not mentioned, so it is unclear to me whether it was done or not. It is also unclear to me what you mean by OU noise.
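(For reference, "OU noise" usually refers to an Ornstein-Uhlenbeck process added to actions for exploration in DDPG-style training. A minimal illustrative sketch of that process, not code from this repository:)

```python
import numpy as np

class OUNoise:
    """Discrete-time Ornstein-Uhlenbeck process (dt = 1)."""

    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(dim, mu, dtype=np.float64)

    def sample(self):
        # Mean-reverting step toward mu, plus Gaussian noise
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(*self.state.shape)
        self.state = self.state + dx
        return self.state
```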
Hello Mr. Reiniscimurs, I have finished this task. The state information I used is completely original and I kept all the configuration in its original state, but I created a new 8x8 environment for testing and found it difficult for the robot to adapt. So I load the already trained model into the new environment for secondary training, because I find it very difficult to train from scratch in a new environment.
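A minimal sketch of that kind of secondary training setup, assuming PyTorch and a checkpoint saved with `torch.save(state_dict())`; the Actor architecture, dimensions, and checkpoint path below are placeholders that must match whatever was saved during the original run, not necessarily this repository's exact classes:

```python
import torch
import torch.nn as nn

# Placeholder actor: layer sizes must match the saved checkpoint exactly.
class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 800), nn.ReLU(),
            nn.Linear(800, 600), nn.ReLU(),
            nn.Linear(600, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

actor = Actor(state_dim=24, action_dim=2)  # assumed dimensions from the original run
actor.load_state_dict(torch.load("./pytorch_models/TD3_velodyne_actor.pth"))
actor.train()  # stay in training mode so fine-tuning in the new environment continues
```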
Again, please provide as much information as possible, with a full list of changes (code snippets, images and so on). There should not be any major issues with moving to another environment for testing or training. You should just make sure to cap the laser readings and that state values are capped at the maximum values seen during training.
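A minimal sketch of such capping, assuming numpy; the 10 m maximum range here is an assumption, and should be whatever maximum the policy actually saw during training:

```python
import numpy as np

MAX_LASER_RANGE = 10.0  # assumption: the maximum range seen during training

def cap_laser(readings):
    # Replace inf returns (no hit) with the max range, then clamp everything
    # so state values never exceed what the policy saw during training.
    readings = np.array(readings, dtype=np.float32)
    readings[np.isinf(readings)] = MAX_LASER_RANGE
    return np.clip(readings, 0.0, MAX_LASER_RANGE)
```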
Okay, the following image is the model I trained on the map you provided, using the method you posted and the unmodified configuration, and then tested on the map I created. I adjusted the environment file to ensure that random goal points do not appear within obstacles. I used your method and did not change the configuration of the Velodyne sensor. The reward value eventually converged. Most of the time, the robot can execute tasks normally, but sometimes it gets stuck as shown in the video: Kazam_screencast_00000.webm. The current results come from directly testing the trained model in the new environment; there were 64 collisions out of 800 tasks (excluding being trapped), and I am quite satisfied with this result.
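For reference, one common way to keep random goals out of obstacles is rejection sampling; a hedged sketch, where the obstacle rectangles and map bounds are illustrative rather than the actual map geometry:

```python
import random

# Illustrative obstacle footprints as (x_min, x_max, y_min, y_max) rectangles;
# the real map geometry would have to be listed here by hand.
OBSTACLES = [(-2.0, -1.0, 3.0, 4.0), (1.0, 2.5, -0.5, 0.5)]

def inside_obstacle(x, y, margin=0.3):
    return any(x_min - margin <= x <= x_max + margin and
               y_min - margin <= y <= y_max + margin
               for x_min, x_max, y_min, y_max in OBSTACLES)

def random_goal(bound=4.0):
    # Rejection sampling: redraw until the goal lands in free space
    while True:
        x, y = random.uniform(-bound, bound), random.uniform(-bound, bound)
        if not inside_obstacle(x, y):
            return x, y
```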
I see. Is this the only kind of scenario that the model struggles with? Meaning, does it struggle when you place the goal point quite close to an obstacle? If so (and it is what is happening here), then it seems to me the issue is the same as I explain in another issue: #157 (comment). Essentially, the goal point is located too close to the obstacle, so when the model evaluates the state-action pair, it will base that more on the experience of collisions with obstacles than on reaching a goal. It would be entirely about the Q-value for the goal placement and not so much the environment itself.
Hello Mr. Reiniscimurs,
Yes, that would be one option:
Hello, Mr. Reiniscimurs, I have done the test after training; below is the average reward function generated by my training of the improved method. In addition, I conducted tests on the original map and basically reached the target, although there was a collision in a few cases. I then created a 30x30 map with some sparse and regular obstacles in it and replaced the TD3.world file, and I modified the velodyne_env.py file so that the goal would not fall inside the obstacles. The question is: when everything is running properly, the robot will stay in place and rotate at a small angle to the left and right, and will not reach its destination. The answer seems obvious, since the robot has never been trained beyond the present situation. So, does it need to be trained in the new environment until the positive reward is stable? I also wonder if the new environment requires more parameters for the TD3 network. If I increase the number of states, do I need a much larger number of network parameters? Looking forward to your reply. Thank you so much!
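(On the parameter question: in a fixed-width MLP, the state dimension only changes the first layer's width, so a larger state adds relatively few parameters. An illustrative back-of-the-envelope check, where the 800/600 hidden widths and the dimensions are assumptions:)

```python
# Count parameters of a state_dim -> h1 -> h2 -> action_dim MLP
# (weights plus biases per layer).
def mlp_params(state_dim, action_dim=2, h1=800, h2=600):
    return (state_dim * h1 + h1) + (h1 * h2 + h2) + (h2 * action_dim + action_dim)

print(mlp_params(24))  # baseline state size: 501,802 parameters
print(mlp_params(40))  # larger state: 514,602 -- only the input layer grows
```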