Question about the training data #9

Closed
Trumpet63 opened this issue May 2, 2023 · 7 comments

Comments

@Trumpet63

I'm going to try really hard not to be the fun-police.

When you say "Thank all the Charters / Mappers in the community. It's you who endowed MuG Diffusion with intelligence.", the worst case I can imagine is that you just downloaded several thousand StepMania .sm files and MP3s and used them to train the model.

Could I trouble you to explain more specifically what data you used, where you got it from, and under what permissions you used them?

@Keytoyze
Owner

Keytoyze commented May 3, 2023

I downloaded around 30k charts from osu and malody (I will publish the list this week). As for permission, it's impossible to ask so many mappers. To be honest, this problem is very complicated because there is currently no explicit regulation of AI training datasets. I think the trained model weights and AI-created charts are in the public domain and not owned by me.

@Trumpet63
Author

You're correct. The legal system (regardless of which country) is still trying to figure out the implications of modern AI. I think it's great that you're trying to do a good thing for the community by releasing the weights into the public domain. I have a lot of respect for the work you've put in.

That being said, my personal feeling is this project is... questionable. I suppose it's up to the community at large to decide how they feel about it, and I'm not a stepartist, but here's my two cents:

PC rhythm games are already notorious for using pirated music. Then, in games like Osu, permission is given by music artists exclusively to Osu and only under the condition that Osu doesn't charge money for their game. Presumably stepartists are contributing under those same assumptions.

Given that the weights are in the public domain, it's entirely possible that tomorrow someone will release a game, charge money for it, and use this ML model on the backend - which in a way is circumventing the wishes of the authors of the training data.

Maybe if you wanted to prevent that specific possibility you could modify the license to be non-commercial? Even so I would still say the whole project is iffy. Totally up to you.

@Keytoyze
Owner

Keytoyze commented May 3, 2023

Thank you very much for your understanding. I agree with your opinion about the license problem and I have modified README.MD to declare that model weights are non-commercial (3612b74).

Additionally, my motivation for this project is to explore whether a machine can understand music and to satisfy my curiosity, rather than to earn anything or violate charters' rights. I am glad to reduce the negative effects of this project, but I think that even if I hadn't created it, someone would eventually train a similar AI, since AIGC is the future trend (to the best of my knowledge, at least five people have recently been trying to create a charting AI). Feel free to give me more advice to make things better.

@Trumpet63
Author

I don't have any more comments for now. This was a productive conversation, thank you :)

Keytoyze pinned this issue May 4, 2023
@Keytoyze
Owner

Keytoyze commented Jun 5, 2023

> I downloaded around 30k charts from osu and malody (I will publish the list this week). As for permission, it's impossible to ask so many mappers. To be honest, this problem is very complicated because there is currently no explicit regulation of AI training datasets. I think the trained model weights and AI-created charts are in the public domain and not owned by me.

I published the dataset here, also in the commit c2903d4.

@kevinchang214

Hello, when will 6-key Malody be supported as an option?

@Keytoyze
Owner

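@kevinchang214 I've been very busy recently; I'll get to the AI charting work in another 2-3 weeks. There is no timeline for 6K yet.

Also, it's best to open a new issue to ask questions rather than replying here.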


3 participants