Benchmark Discussion #1
We plan to reuse the benchmarks from 2023, but feel free to propose new benchmarks or modifications to existing ones.
Compared to last year, we have modified the specifications (properties) of three benchmarks:
I have some suggestions:
Hello, we would like to propose a cartpole balancing benchmark. A single pole is mounted on a movable cart and starts in an upright position. The controller's goal is to balance the pole in the center of the track. The safety goal of this 4-dimensional problem would be to reach a stable position after some swing-in time. We will upload the required files shortly.
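To make "a stable position after some swing-in time" more concrete, here is a minimal Python sketch of such a check on a single simulated trajectory. It is not the benchmark's actual specification: the state ordering (cart position, cart velocity, pole angle, angular velocity), the swing-in time `t_settle`, and the box `bounds` are hypothetical placeholders. A verification tool would have to establish the same property for the whole reachable set, not just for sampled points.

```python
from typing import Sequence

# Hypothetical state layout: (x, x_dot, theta, theta_dot)
State = tuple[float, float, float, float]


def stable_after(times: Sequence[float], states: Sequence[State],
                 t_settle: float, bounds: State) -> bool:
    """Return True if every sampled state at or after t_settle lies in the
    symmetric box |s_i| <= bounds[i]. Samples before t_settle (the
    swing-in phase) are ignored."""
    if len(times) != len(states):
        raise ValueError("times and states must have the same length")
    checked = False
    for t, s in zip(times, states):
        if t < t_settle:
            continue  # still swinging in; no constraint yet
        checked = True
        if any(abs(v) > b for v, b in zip(s, bounds)):
            return False  # left the stable region after swing-in
    # If no sample lies after t_settle, there is nothing to certify.
    return checked
```

For example, with samples at t = 0, 1, 2, 3 and `t_settle = 2.0`, only the last two samples are compared against the box.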
Thanks for the suggestions.
I also want to bring up another point:
I prefer numbers because
Yes, I wanted to have more formats, not fewer. My main goal is to have some formats supported across all benchmarks so that you do not need a different parser for each benchmark.
That sounds exciting. I guess I have to wait until the page is online again to see whether the format is clear. Can you provide the converter from JSON to TikZ as a standalone tool? That would be generally useful and would help with writing such a JSON writer.
@schillic I have attached the parser from MATLAB figures to JSON and vice versa. I hope they are helpful to you.
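To illustrate what a shared, benchmark-independent figure format enables, here is a minimal Python sketch of a JSON-to-TikZ conversion. The schema (a list of `polygons`, each with 2-D `vertices` and an optional `color`) is a hypothetical stand-in, not the format used by the attached MATLAB parsers; the point is only that one converter can serve every benchmark once they share a format.

```python
import json


def json_to_tikz(text: str) -> str:
    """Convert a figure in a hypothetical JSON schema into a TikZ picture.

    Assumed schema (illustrative only):
      {"polygons": [{"vertices": [[x, y], ...], "color": "blue"}, ...]}
    Each polygon is drawn as a closed, lightly filled path.
    """
    fig = json.loads(text)
    lines = [r"\begin{tikzpicture}"]
    for poly in fig.get("polygons", []):
        verts = poly["vertices"]
        if len(verts) < 3:
            continue  # skip degenerate shapes in this sketch
        color = poly.get("color", "gray")
        # Join vertices as TikZ coordinates: (x,y) -- (x,y) -- ...
        path = " -- ".join(f"({x:g},{y:g})" for x, y in verts)
        lines.append(rf"  \draw[fill={color}!30, draw={color}] {path} -- cycle;")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)
```

Keeping the converter independent of any particular benchmark means a tool only has to emit the shared JSON format once.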
I added a pull request for the new navigation benchmark. I also organized the benchmarks into a table in the README, as I think this makes it clearer which benchmarks were modified, etc. I was also debating whether to specify the dimensions to plot for the figures here, but I guess they belong in the respective benchmark instead. Let me know if that is sensible.
Thanks! I have merged the pull request.
As mentioned before, we would like to propose a balancing benchmark. I created a pull request containing all files needed for this benchmark. Feel free to comment and/or request any changes needed to complete this benchmark.
This question came up from @SophieGruenbacher, and as the repeatability server (and its instructions) still seems to be down, @toladnertum, can you please answer? In general, the typical repeatability instructions have been to provide an installation script as a Dockerfile and a script to reproduce everything. I did not use the repeatability server last year, so I am not sure about its interface (detailed instructions are here; we typically did not create tables automatically in many categories and basically created reachable set plots: https://gitlab.com/goranf/ARCH-COMP/-/tree/master/#repeatability-evaluation-instructions-2021-and-2022 ). "As we are preparing our package for the repeatability evaluation, we have a question about the desired output of the tool. In the docs it is written that 'The executable should be accompanied by instructions on how to run the tool, what OS is used, and how to interpret the result of the tool (whether the result is safe or unsafe)'. So would the following output be what you are expecting?"
@SophieGruenbacher you can take a look at the README of our repeatability package from last year here. To summarize: The server will run the Docker script
This corresponds to this table:
TL;DR: We finally got the ARCH submission system back up and running. 🎉 Longer explanation: If you have any questions, don't hesitate to contact me!
I submitted our NLN package. Let's see what happens.
Great! How long do you expect the submission to run?
It has finished, and I just published the results. Note that you have to publish them manually under "My Submissions".
Great. We could also move "My Submissions" into the general header; I guess that would make it clearer. One can now also download the results folder by clicking on the respective name.
I cannot log in with my old account. I thought you said that this account would work. Is that expected?
That was the plan, but unfortunately, we had to clear the database. I think you can just create a new one. |