Create a set of seed data #1318

Open
mmurto opened this issue Oct 29, 2024 · 9 comments
Labels
enhancement New feature or request.

Comments

@mmurto
Contributor

mmurto commented Oct 29, 2024

We should have an easily bootstrappable set of seed data that can be used for development and manual testing, and later for automated testing. This would give both backend and UI development a good starting point: a real (though quite small) database to test the UI, API and migrations against.

The data should include organizations, projects and repositories, plus some runs for those repositories whose results include issues, rule violations and vulnerabilities. It should also include the correct permissions in Keycloak so that a test user can work with the data.

Some possible ways to create and maintain the data:

  1. Create a script that executes API calls against the Docker Compose environment to first get a Keycloak token and then create the hierarchy items and runs for some known repositories. Adjust the rules to produce the needed rule violations against the known repositories.
  2. Create a script that does the above but with ORT Server library functions or SQL.
  3. Add a database dump of a good dataset to the repository that will automatically be included when starting the Docker Compose environment.
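
Approach 1 could be sketched roughly as follows. This is a minimal illustration, not a tested seed script: the Keycloak realm and client ID, the ports, the endpoint paths, the payload fields, and the repository URL are all assumptions that would need to be checked against the actual Docker Compose setup and the ORT Server API. In particular, the middle hierarchy level is modeled here as a `products` resource, which is also an assumption. The HTTP transport is passed in as a function so the creation order can be checked without a running server.

```python
"""Sketch of approach 1: seed a local ORT Server instance through its REST API."""
import json
import urllib.parse
import urllib.request

# Assumed local URLs; adjust to the actual Docker Compose environment.
KEYCLOAK_TOKEN_URL = "http://localhost:8081/realms/master/protocol/openid-connect/token"
API_BASE_URL = "http://localhost:8080/api/v1"

# Hypothetical seed specification: one organization, one product, one known repository.
SEED = {
    "organization": "Seed Organization",
    "product": "Seed Product",
    "repositories": [
        {"type": "GIT", "url": "https://github.com/example/known-repo.git", "revision": "main"},
    ],
}


def fetch_token(username, password, client_id="ort-server"):
    """Get an access token via Keycloak's password grant (client ID is assumed)."""
    form = urllib.parse.urlencode({
        "grant_type": "password",
        "client_id": client_id,
        "username": username,
        "password": password,
    }).encode()
    request = urllib.request.Request(KEYCLOAK_TOKEN_URL, data=form)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


def http_post(token):
    """Return a post(path, body) function that sends authenticated JSON requests."""
    def post(path, body):
        request = urllib.request.Request(
            API_BASE_URL + path,
            data=json.dumps(body).encode(),
            headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return post


def seed(post, spec=SEED):
    """Create the hierarchy top-down, then trigger one run per repository."""
    # Paths and payload fields below are assumptions, not the verified API.
    org = post("/organizations", {"name": spec["organization"]})
    product = post(f"/organizations/{org['id']}/products", {"name": spec["product"]})
    run_ids = []
    for repo in spec["repositories"]:
        created = post(
            f"/products/{product['id']}/repositories",
            {"type": repo["type"], "url": repo["url"]},
        )
        run = post(f"/repositories/{created['id']}/runs", {"revision": repo["revision"]})
        run_ids.append(run["id"])
    return run_ids
```

Against a running environment this would be wired up as `seed(http_post(fetch_token(username, password)))` with the test user's credentials. Because only the small `SEED` specification lives in the repository, the bulk of the data is produced by the runs themselves rather than shipped.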

@sschuberth added the enhancement (New feature or request.) label Oct 29, 2024

@sschuberth
Contributor

Maybe this could later also be made a part of #1319.

@mmurto
Contributor Author

mmurto commented Oct 29, 2024

> Maybe this could later also be made a part of #1319.

What would be the use-case for that?

@sschuberth
Contributor

sschuberth commented Oct 29, 2024

> What would be the use-case for that?

I don't really get the question. What I was thinking aloud about was adding, e.g., a create-test-data sub-command to the planned CLI that pretty much does what your point 2 describes above.

@mmurto
Contributor Author

mmurto commented Oct 29, 2024

> > What would be the use-case for that?
>
> I don't really get the question. What I was thinking aloud about was adding, e.g., a create-test-data sub-command to the planned CLI that pretty much does what your point 2 describes above.

I meant the use-case for having it in the CLI, which I guess basically asks: when would an end-user want to seed an instance? IMO the main (maybe only) users of seed data are developers, and depending on the format, seed data can be relatively large in size, so I'm not sure if it makes sense to ship it to the CI runners.

@sschuberth
Contributor

> I meant the use-case for having it in the CLI

It's not really about a "use-case", but about our convenience: The CLI already has the build infrastructure set up to consume ORT Server artifacts for programmatic use. That is the same infrastructure that a tool (or multiple tools) to create / seed test data would need.

> will an end-user want to seed an instance.

Probably not, but I don't think it matters much to "hide" such capabilities in an end-user CLI. But maybe it does. Like I said, I was just thinking out loud.

> seed data can be relatively large in size

But wouldn't our tool just implicitly create (large parts of) the seed data by creating runs, and not really ship with the data?

@mmurto
Contributor Author

mmurto commented Oct 29, 2024

> > seed data can be relatively large in size
>
> But wouldn't our tool just implicitly create (large parts of) the seed data by creating runs, and not really ship with the data?

Depends on the approach, but agreed: if it's done through API calls rather than stored data as in approach 3, then there isn't much data to ship. I'm not very familiar with Kotlin projects, but I'd guess that even if it's wrapped in the CLI, it would be easy to call with something like git clone && docker compose up && ./gradlew cli seed?

@sschuberth
Contributor

> ./gradlew cli seed or something like that?

Something like that. Instead of involving Gradle, the CLI would be called like ort-server seed.

@mmurto
Contributor Author

mmurto commented Oct 29, 2024

> > ./gradlew cli seed or something like that?
>
> Something like that. Instead of involving Gradle, the CLI would be called like ort-server seed.

IMO it would be great if the seed command worked against whatever revision is currently checked out, without installing anything or adding anything to the path, so I think involving Gradle would be good here? As said, I'm not too familiar with Kotlin projects.

@sschuberth
Contributor

> so I think involving Gradle would be good here?

Yes, implementing this via Gradle tasks is also possible, and probably preferable to putting it into a stand-alone CLI.
