
Which hardware/software specifications should we support? #62

Open · 4 of 6 tasks
choisant opened this issue Aug 29, 2024 · 5 comments
Labels: enhancement (New feature or request)

Comments

@choisant (Collaborator) commented Aug 29, 2024

Right now the code has really only been tested on a Linux workstation with a decent amount of memory and CPU cores. What kind of computing setups do potential users have, and what should we support?

Types of users + environments

Do we know if inferno can adapt to these cases?

  • #63
  • High budget, low tech user: do we have users with powerful machines who might not even have Linux installed?
  • Low budget, low tech user: ability to at least test the software on a standard laptop running Windows.
  • Low budget, high tech user: getting the most compute per dollar. This is basically our current development environment.

Potential use cases

  • Few variates, many datapoints
  • Many variates, few datapoints
  • Few variates, few datapoints: a small laptop should suffice
  • BIG DATA: the more CPU cores the better, but will we run out of memory?

Wishlist for improvements

Check each case and create an issue for it if it is deemed worth spending time on at some point.

  • Ability to continue MCMC calculations if the calculations are interrupted; see the checkpointing sketch after this list. Low priority for now.
  • Support Windows (this might already be the case).
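
A minimal sketch of the checkpointing idea, assuming a hypothetical run_mcmc_chunk() that stands in for whatever actually advances the sampler (inferno/Nimble internals may look quite different):

    ## Run the MCMC in chunks and checkpoint the state to disk after each
    ## chunk, so an interrupted run can resume from the last checkpoint.
    run_with_checkpoints <- function(n_chunks, chunk_iters, ckpt = "mcmc_state.rds") {
        state <- if (file.exists(ckpt)) readRDS(ckpt) else NULL  # resume if a checkpoint exists
        for (i in seq_len(n_chunks)) {
            state <- run_mcmc_chunk(state, n_iter = chunk_iters)  # hypothetical sampler step
            saveRDS(state, ckpt)  # checkpoint after every chunk
        }
        state
    }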
choisant added the enhancement (New feature or request) label Aug 29, 2024
@pglpm (Owner) commented Aug 29, 2024

Personally I don't see the software as implying one or another use case. The use cases depend more on the size of the problem – the number of datapoints and variates – than on the software itself. A problem with 30 datapoints and 10 variates can be solved on a laptop; one with 5000 datapoints and 100 variates needs a workstation. It's a matter of how much RAM the user has, and how many and how fast their CPUs are.

One counter-argument to what I just wrote is that with many datapoints it might be necessary, for example, to switch to approaches that work from disk rather than RAM. However, since the software relies on the Nimble package, we have to load all the data into RAM anyway.
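
Since everything has to sit in RAM, a quick base-R check of the data's in-memory footprint can tell a user up front whether their machine is adequate; object.size() is base R, and the headroom advice below is a rough rule of thumb, not a measured figure:

    ## Rough RAM check before learning: 5000 datapoints x 100 variates.
    dat <- data.frame(matrix(rnorm(5000 * 100), nrow = 5000))
    print(object.size(dat), units = "MB")
    ## MCMC output is typically several times larger than the input data
    ## (per chain), so leave generous headroom beyond this figure.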

@choisant (Collaborator, Author)

The software can be made to support some or all of the possible use cases. The modifiable parallelisation is an example of a software feature in inferno that supports powerful/high-budget hardware. If a run produces a huge mcoutput, or the process runs out of memory, we might want to replace the native .rds files with a better storage solution for the output. If the output is small, we might not even want to generate all the output files automatically, but instead enable a faster in-memory hand-off between learn() and Pr(), for instance; see the sketch below. These are the kinds of things I'm trying to gather information about here.

This is all about making the software work smoothly for many different people's projects, which would increase its value to the scientific community. In HEP we have to think about this all the time, as we are at the extreme end of use cases.
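
A sketch of how the in-memory hand-off could look; learn() and Pr() are inferno's functions, but the argument names and the object-passing shown here are assumptions about the proposed behaviour, not a description of the current API:

    ## Proposed in-memory flow: keep the learnt object in the session.
    fit <- learn(data = dat)                       # hypothetical: return the result directly
    probs <- Pr(Y = y, X = x, learnt = fit)        # no .rds round-trip

    ## Current on-disk flow, for comparison (argument names assumed):
    learn(data = dat, outputdir = "results")       # writes learnt.rds etc. to disk
    probs <- Pr(Y = y, X = x, learnt = "results")  # reads it back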

@pglpm (Owner) commented Aug 29, 2024

I think right now the software already supports all those use cases. I know because I've used it both on a small Windows laptop and on Sigma2's unix HPC centre, and the way the functions are used is exactly the same. Things may change in the future, of course.

I looked into the storage question. It's good to save the main object, 'learnt.rds', which the inferences depend on, to disk, because the user may need to close an R session and continue with new inferences in a later one. Regarding the file format, I checked other possibilities such as Parquet, Arrow, NetCDF, and similar. Some of them are not appropriate because they only work well with tabular data (and the learnt object is not tabular), and the rds format turned out to give very good compression. The fact that it's R-specific is not really a problem, because the file is read by R functions in any case.
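
For reference, base R's saveRDS() exposes the compression choice directly, so the compression claim can be verified on a real learnt object; the object below is just a stand-in:

    ## Compare on-disk sizes of the same object under the built-in compressions.
    obj <- replicate(50, rnorm(1e4), simplify = FALSE)  # stand-in for a learnt object
    for (comp in list(FALSE, "gzip", "bzip2", "xz")) {
        f <- tempfile(fileext = ".rds")
        saveRDS(obj, f, compress = comp)
        cat(format(comp), ":", file.size(f), "bytes\n")
    }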

Of course the user can export plots and numerical results in any way they please. Do you mean we should provide some sort of graphical or numerical export functions? Isn't it enough if the user checks the basic R commands for this?

@choisant (Collaborator, Author)

Many people are unfamiliar with R. To many of them, an export_to_csv function would be very attractive, so they could open their numbers in Excel or with Python. The plotting function should definitely have an export-to-pdf/png option. I can't predict all possible cases; that's why we want to know which software/hardware environments people are already working in when we talk to them.

@pglpm (Owner) commented Aug 29, 2024

Sure. In many cases we can also simply refer to base R functions (no need to reinvent the wheel); for example there's write.csv(). We can add a filetype argument or similar to tplot() and related functions, so that the user can directly save the plot as pdf/svg/png etc.
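
A thin convenience layer over base R could cover both requests. The names below (export_to_csv, save_plot) are just this thread's proposal sketched out, not existing inferno functions, and the tplot() call in the usage note assumes its usual plotting role:

    ## Proposed convenience exports, built on base R only.
    export_to_csv <- function(x, file) {
        write.csv(as.data.frame(x), file = file, row.names = FALSE)
    }

    ## Save any plotting call to pdf/png/svg via the matching graphics device.
    save_plot <- function(plotfun, file, width = 7, height = 5) {
        ext <- tolower(tools::file_ext(file))
        switch(ext,
               pdf = pdf(file, width = width, height = height),
               png = png(file, width = width, height = height, units = "in", res = 300),
               svg = svg(file, width = width, height = height),
               stop("unsupported filetype: ", ext))
        on.exit(dev.off())  # close the device even if plotting fails
        plotfun()
    }

    ## e.g. save_plot(function() tplot(...), "result.pdf")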
