Which hardware/software specifications should we support? #62
Comments
Personally I don't see the software as implying one or another use case. The use cases depend more on the size of the problem – the number of datapoints and the number of variates – than on the software itself. A problem with 30 datapoints and 10 variates can be solved on a laptop; one with 5000 datapoints and 100 variates needs a workstation. It's a matter of how much RAM the user has and how many and how fast their CPUs are. One counter-argument to what I just wrote is that with many datapoints it might be necessary, just as an example, to use approaches that rely on disk storage rather than RAM. However, since the software relies on the Nimble package, we have to load all data into RAM anyway.
The software can be made to support some or all of the possible use cases. The modifiable parallelization setting is an example of an inferno feature that enables support for powerful, high-budget hardware. If a run produces a huge mcoutput, or the process runs out of memory, we might want to store the output files in something better than the native .rds format. Conversely, for a small output we might not even want to generate all the output files automatically, but instead enable a faster in-memory hand-off between learn() and Pr(). These are the kinds of things I'm trying to gather information about here.
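As a rough sketch of what a user-modifiable parallelization setting could look like, here is a snippet using base R's `parallel` package. This is only an illustration of the general technique, not inferno's actual implementation; the per-item work function is a stand-in for whatever per-chain computation the package does.

```r
library(parallel)

# Scale the worker count to the user's hardware, leaving one core free.
# detectCores() can return NA on some platforms, so fall back to 1 worker.
nc <- detectCores()
if (is.na(nc)) nc <- 2L
cl <- makeCluster(max(1L, nc - 1L))

# Stand-in for per-chain work; a laptop and an HPC node run the same code,
# just with different cluster sizes.
out <- parLapply(cl, 1:8, function(i) i^2)
stopCluster(cl)

stopifnot(sum(unlist(out)) == 204)  # 1^2 + 2^2 + ... + 8^2
```

The same user-facing function can then serve both a small laptop and an HPC node, with only the cluster size changing.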
I think the software already supports all those use cases: I've used it both on a small Windows laptop and on Sigma2's Unix HPC centre, and the functions are used in exactly the same way in both. Things may change in the future, of course.

I looked into the storage question. It's good to save the main object, 'learnt.rds', which all inferences depend on, to disk, because the user may need to close an R session and continue with new inferences in a later one. Regarding the file format, I checked other possibilities such as Parquet, Arrow, and NetCDF. Some of them are not appropriate because they only work well with tabular data (and the

Of course the user can export plots and numerical results in any way they please. Do you mean we should provide some sort of graphical or numerical export functions? Isn't it enough if the user checks the basic R commands for this?
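The session-persistence workflow described above can be sketched as follows. The `learn()`/`Pr()` calls are placeholders based on the function names mentioned in this thread, not the package's documented signatures; the round-trip at the end uses a plain R object to show that `saveRDS`/`readRDS` preserve the data exactly.

```r
## Session 1: run the (possibly long) computation and save the result.
# learnt <- learn(data = mydata)          # hypothetical call
# saveRDS(learnt, file = "learnt.rds")    # native R serialisation

## Session 2 (possibly days later): reload and continue inference.
# learnt <- readRDS("learnt.rds")
# Pr(..., learnt = learnt)                # hypothetical call

## The saveRDS/readRDS round trip, illustrated with a plain object:
obj <- list(samples = matrix(rnorm(20), nrow = 5), n = 5L)
tmp <- tempfile(fileext = ".rds")
saveRDS(obj, tmp)
restored <- readRDS(tmp)
stopifnot(identical(obj, restored))  # byte-for-byte identical content
```

One advantage of .rds over tabular formats here is exactly the point raised above: it serialises arbitrary nested R objects, not just tables.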
Many people are unfamiliar with R. To many of them an export_to_csv function would be very attractive, so they could open their numbers in Excel or Python. The plotting functions should definitely have an export-to-PDF/PNG option. I can't predict all possible cases; that's why we want to find out what software/hardware environments people already work in when we talk to them.
Sure. In many cases we can also simply refer to base R functions (no need to reinvent the wheel); for example there's
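To make the base-R point concrete, here is how the CSV and PDF/PNG export requests from the previous comment can be met with functions already in base R. The `results` data frame is a made-up stand-in for whatever numerical output the user wants to share.

```r
# Hypothetical numerical output to export.
results <- data.frame(quantity = c("P(Y=1|X)", "P(Y=2|X)"),
                      value = c(0.62, 0.38))

# CSV readable by Excel, Python, etc. -- base R, no extra packages.
csvfile <- tempfile(fileext = ".csv")
write.csv(results, csvfile, row.names = FALSE)

# Plot straight to a PDF device; png() works the same way for PNG output.
pdffile <- tempfile(fileext = ".pdf")
pdf(pdffile, width = 5, height = 4)
barplot(results$value, names.arg = results$quantity)
dev.off()

stopifnot(file.exists(csvfile), file.exists(pdffile))
```

So an export_to_csv helper may amount to little more than a documented wrapper around `write.csv`, which keeps the maintenance burden low.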
Right now the code is really only tested on Linux, on a workstation with a decent amount of memory and CPU cores. What kinds of computing setups do potential users have, and which should we support?
- Types of users + environments
- Do we know if inferno can adapt to these cases?
- Potential use cases
- Wishlist for improvements
Check and create an Issue for each case if it is deemed worth spending time on at some point.