-
-
Notifications
You must be signed in to change notification settings - Fork 163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feat]: Separate the Workspace name from its data directory fields #582
Comments
The Cloud upload code by DXQ could then read the
|
I had a quick look. I see that the project currently uses At the same time, that would be a good time to remove Simultaneously, making sure that when TrainConfig serializes itself back to JSON, it writes the original non-expanded strings. Could be done by remembering those exact values as strings, while storing the expanded (used by OneTrainer at runtime) paths in other TrainConfig fields instead. So the TrainConfig would hold both the str version and the expanded Path version of each dir. Possibly via Having looked at this, I can do it, but don't have the time at the moment. And don't know when I'll have time. Could be days, a week or much more. It will depend on having free time. So if someone wants to implement it, let me know. |
Good idea, but I'd apply it differently. You might want different cache dirs between runs, but usually you want the cache dir shared. What you usually want changed between runs, for example to test multiple parameters, is:
Currently I usually change these 3 things, and often forget one. |
In that case, you'd just set the workspace cache to a static path without
Really good idea to support
Yeah we could make the samples automatically prefix themselves with the I had a bit more thinking about where to place the "variable expander" code. And I think it belongs as a method on the So that class could have a It returns a Path object, because this means we can concatenate the return value with other Path objects to easily create the full paths. The Path concatenation logic also automatically gives us support for path overriding in individual setting fields, because if the concatenated part is an absolute path, it takes over:
Here's the general outline of what needs to be done, whether I do it or someone else beats me to it:
|
I think adding a user configurable placeholder name makes sense. But I don't really understand how this would help cloud training in any way. Can you explain that a bit more? The user would still be able to configure the workspace directory. Adding the expand functionality into the TrainConfig class is the right decision, but I wouldn't use anything more complicated than a simple string replacement:
Also, I'm a bit confused what you mean by expanding One thing I'm not really sure about: How do we design the user interface so it isn't too confusing? I don't want to change the current default directories for the reasons dxqbYD already mentioned. You probably don't want to change the cache directory for every run. And adding a new |
I think any relation to cloud training is a misunderstanding. It doesn't need any placeholder, I understood this as a separate feature. You can find the path adjustments done for cloud training here: https://github.com/dxqbYD/OneTrainer/blob/e42585e8314cf71562764ac3f64222f1b494a4bd/modules/trainer/CloudTrainer.py#L165
|
Yeah, the cloud training can then use the workspace
This is a good example of a cloud issue that can be solved with this. Currently, it hacks the config path to try to convert it to a cloud-usable path. The cloud code could instead use Ensuring clean separation between local user paths and the RunPod server's paths.
Yeah if I have time to do this (which is likely but not until I have less other projects on my plate), I will change it everywhere by looking for all path str usage and Pathlib was added to Python because str paths are such a pain to work with. Pathlib makes every path into first-class objects where we can analyze and modify anything we want via simple fields and calls, like It's a great help when writing code. Full IDE autocompletion, all the calls we need, calls can be chained (like Another benefit of passing them around as Paths is that concatenation is easy, as mentioned: We can just do The third benefit is that it has brilliant methods for walking directory trees, globbing files matching patterns, etc. The fourth benefit is that pathlib has built-in methods for cross-platform path manipulation, which is a huge help when paths are being forwarded from Windows machines to Unix-based servers, for example. On Windows, the Path will be stored as a WindowsPath internally, but we call
Yeah it would, and should not be merged until that's fully verified. I'll be testing meticulously as usual.
If we don't expand it, the files will be written into a relative directory under the current working dir, named That's why expansion of
I think it makes sense and will be easy to understand when applied as follows:
|
Actually, I will keep the string type for now, because it reduces the amount of work. My mind always defaults to Path since it's vastly superior, but that's not necessary for implementing this particular feature. And when it's needed (for other features such as cloud and rsync paths), we can always So then the work will be:
This greatly cuts down the amount of hard work to implement this feature. 👍 |
That would change the behavior between cloud training and local training. Not something a user would or should expect. It should be very clear and intuitive what each setting does.
There is no requirement that the workspace needs to be saved under
No, it can't. Because the workspace could be anywhere.
That's even more confusing. If I want to set the workspace directory to
Properties of the TrainConfig class are directly saved in the json file with their names. Using private (underscore prefixed) names doesn't makes sense for this class. Also, actually storing the expanded path isn't really possible, because you never know when the unexpanded string changes. The structure of the config class is intentionally kept simple to serve as a single contact point between ui/cli code and trainer code. There shouldn't be any assumptions about the use of the properties beyond setting and getting values. The better approach would be a new function that returns the expanded path. Similar things are already done for other values in that class. Honestly, the more I think about this idea, the more I believe it just complicates things without providing any benefit.
|
That is necessary for this scenario:
Yes, and people can change the default to anything they want, of course. A default is just a default.
It can. I described the algorithm in the first post. If the old config uses
Result: Exactly the same path is being used after automatic migration. But as I said, automatic migration of their hardcoded path from existing configs is not necessary and can be skipped.
You wouldn't need to set a name. You would just ignore the name field and let its default "project1" value stay like that. Like when people ignore the 19382 other fields that are visible but not used in OneTrainer unless very specific options are clicked or specific values are entered. This is no different.
Ah okay. Well I wanted to avoid having to call "expand it" in every single consumer of the Normally this would be solved by having a Then SmartPath then implements a setter and getter, and automatic JSON serialization. So when the TrainConfig.workspace_dir field is serialized, it saves the raw (unexpanded) string to the config JSON file. That can be achieved in python via This means there's a single point which handles reading/writing conversion and serialization transparently. This can be implemented even with the existing architecture of TrainConfig. (There's also another technique, with methods marked Now whether all of this is worth doing:
|
Describe your use-case.
What would you like to see as a solution?
{name}
as a placeholder which expands to the Workspace Name value.workspace_dir = config.workspace_dir.replace("{name}", config.workspace_name)
, which could be wrapped in amodule.utils.expand_path(config, config.workspace_dir)
utility function taking the config (to grab the name from and possible future variables) and the string that we need to expand.~/
prefix expanding toPath.home()
(which I think works on Windows too). In fact, there's expanduser which explicitly expands such paths. See example implementation below.{name}
can be used.{name}
placeholders in the backup too would be ideal, to make the backup config portable)./
component, take the last component ofworkspace_dir
asname
, use the rest as directory:<the dir>/{name}
andcache: <the cache dir>/{name}
. This splitting while migrating could be achieved viax = pathlib.Path(oldvalue)
. Then to get the dir, we usestr(x.parent)
and to get the name we usex.name
(name is always a string already).Here is an example cross-platform implementation of path expansion:
Results:
The text was updated successfully, but these errors were encountered: