Internal and external handling of rounded and censored variates & data #50

pglpm · 2024-08-05T08:13:14Z

Rounded data – sometimes technically called grouped – can lead to artefacts if they're treated as continuous; see for example the studies in https://doi.org/10.1214/ss/1177012601, https://doi.org/10.1214/aos/1176348396, and in the references cited there. This is especially true for Bayesian nonparametric methods: owing to rounding, multiple datapoints can end up having identical values, and the nonparametric inference would conclude that there must be a concentration of probability – delta-distributions – at such values.
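The tie effect described above can be seen in a small simulation. This is only an illustrative sketch: the Gaussian source, the sample size, and the one-decimal rounding are assumptions chosen for the example, not anything taken from the software.

```python
import random
from collections import Counter

# Assumption for illustration: 200 draws from a standard Gaussian,
# then rounded to one decimal place.
random.seed(0)
raw = [random.gauss(0.0, 1.0) for _ in range(200)]
rounded = [round(x, 1) for x in raw]

# Number of datapoints that share their value with an earlier datapoint.
raw_ties = len(raw) - len(set(raw))
rounded_ties = len(rounded) - len(set(rounded))

# The most frequently repeated rounded value and how often it occurs.
most_common_value, most_common_count = Counter(rounded).most_common(1)[0]

print("ties in raw data:    ", raw_ties)
print("ties in rounded data:", rounded_ties)
print("most repeated value: ", most_common_value, "x", most_common_count)
```

A method that treats the rounded values as exact continuous observations would see these repeated values as evidence of point masses, even though the underlying quantity has none.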

The present software can handle rounded data properly, so no such artefacts appear.

There is, however, a difference in the way such variates must be handled in drawing inferences about new points or subjects.

1. Although the data used for learning are rounded, the values of new points will not be rounded.
2. The values of new points will be rounded, just like the data used for learning.

In case 1 we have two options: ("round") round the precise value in the same way as the data used for learning, and use the rounded value for the inference; or ("keep") use the precise value for the inference. Option ("keep") can in some situations lead to improved inferences. In case 2 only the rounded value is available, so there is nothing to choose.
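The difference between the two options can be sketched with a toy model. Everything here is an assumption made for illustration: a single Gaussian stands in for the learned distribution, the rounding grid has a fixed width, and the function names are hypothetical. It is not how the software implements either option.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_pdf(x, mean, sd):
    """Probability density of a Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def round_to_grid(x, width):
    """Round x to the nearest multiple of width (hypothetical helper)."""
    return width * math.floor(x / width + 0.5)

def likelihood_round(x, width, mean, sd):
    """Option ("round"): the new value is rounded first, and the model is
    asked for the probability of the whole rounding interval."""
    centre = round_to_grid(x, width)
    upper = normal_cdf(centre + width / 2.0, mean, sd)
    lower = normal_cdf(centre - width / 2.0, mean, sd)
    return upper - lower

def likelihood_keep(x, mean, sd):
    """Option ("keep"): the precise value is used, so the model is asked
    for the density at that exact point."""
    return normal_pdf(x, mean, sd)

# Two precise values that fall in the same rounding interval (width 0.5).
a, b = 1.01, 1.23
print(likelihood_round(a, 0.5, 0.0, 1.0), likelihood_round(b, 0.5, 0.0, 1.0))
print(likelihood_keep(a, 0.0, 1.0), likelihood_keep(b, 0.0, 1.0))
```

Under ("round") the two values are indistinguishable, because both are mapped to the same interval. Under ("keep") they give different densities, which is the extra information that can sometimes improve an inference.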

Both options could be implemented in the software, but for the moment we only use option ("round"). Future development could offer option ("keep") as well. This requires some thinking on how to implement it efficiently in functions like samplesFdistribution() and mutualinfo().

Censored data are a special case of this, where the grouping only happens at the boundaries of the variate's domain. The same considerations and options apply. The software for the moment uses option ("round") for these too.
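The view of censoring as grouping at the boundaries can be sketched in the same toy setting. The Gaussian model, the boundary values, and the function names are assumptions for illustration only and do not reflect the software's internals.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_pdf(x, mean, sd):
    """Probability density of a Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def censor(x, lower, upper):
    """Report x as the nearest boundary if it falls outside [lower, upper]."""
    return min(max(x, lower), upper)

def censored_likelihood(y, lower, upper, mean, sd):
    """Likelihood of a censored observation y.

    A value at a boundary stands for the whole tail beyond it, so it
    contributes a probability mass. An interior value contributes a density.
    """
    if y <= lower:
        return normal_cdf(lower, mean, sd)
    if y >= upper:
        return 1.0 - normal_cdf(upper, mean, sd)
    return normal_pdf(y, mean, sd)

# Hypothetical variate recorded only within [0, 10].
lower, upper = 0.0, 10.0
for precise in (4.2, 12.3, 50.0):
    y = censor(precise, lower, upper)
    print(precise, "->", y, censored_likelihood(y, lower, upper, 5.0, 3.0))
```

Applying `censor` to a new precise value before the inference corresponds to option ("round"): all values beyond a boundary are treated alike, just as they were in the data used for learning.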

pglpm added the enhancement and invalid labels Aug 5, 2024

pglpm self-assigned this Aug 5, 2024

pglpm removed the invalid label Aug 5, 2024

choisant mentioned this issue Aug 9, 2024

Eliminate L type variate and verbose buildmetadata() #52

Merged
