Rounded data – sometimes technically called grouped – can lead to artefacts if they're treated as continuous; see for example the studies in https://doi.org/10.1214/ss/1177012601, https://doi.org/10.1214/aos/1176348396, and in the references cited there. This is especially true for Bayesian nonparametric methods: owing to rounding, multiple datapoints can end up having identical values, and the nonparametric inference would conclude that there must be a concentration of probability – delta-distributions – at such values.
The present software can handle rounded data properly, so no such artefacts appear.
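As a rough illustration of why rounding produces repeated values, here is a minimal sketch in Python. It does not use the present software: the Gaussian sample, the sample size, and the rounding step of 0.5 are all assumptions chosen for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

n = 1000
step = 0.5  # hypothetical rounding step, chosen only for illustration

# "Precise" continuous values: repeats are practically impossible
precise = [random.gauss(0.0, 1.0) for _ in range(n)]

# The same values rounded to the nearest multiple of `step`
rounded = [round(x / step) * step for x in precise]

n_unique_precise = len(set(precise))
n_unique_rounded = len(set(rounded))

print("distinct precise values:", n_unique_precise)
print("distinct rounded values:", n_unique_rounded)
```

The rounded sample collapses onto a small grid of values, each repeated many times. A method that treats these repeats as exact continuous observations would read them as point masses.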
There is, however, a difference in the way such variates must be handled in drawing inferences about new points or subjects. Two cases can occur:
1. Although the data used for learning are rounded, the values of new points will not be rounded.
2. The values of new points will be rounded, just like the data used for learning.
In case 1. we'd have two options: ("round") round the precise value, in the same way as the data used for learning, and use the rounded value for the inference; ("keep") use the precise value for the inference. Option ("keep") can in some situations lead to improved inferences.
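One possible reading of the two options, sketched in Python under strong simplifying assumptions: a standard normal distribution stands in for whatever distribution the software has actually learned, and the function names `likelihood_round` and `likelihood_keep` are hypothetical, not part of the software. With ("round"), the new value contributes the probability of its whole rounding interval; with ("keep"), it contributes the density at the precise value.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_pdf(x, mean=0.0, sd=1.0):
    """Probability density function of a normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def round_to_step(x, step):
    """Round x to the nearest multiple of step (halves rounded up)."""
    return math.floor(x / step + 0.5) * step

def likelihood_round(x, step, mean=0.0, sd=1.0):
    """Option ("round"): probability of the rounding interval containing x."""
    centre = round_to_step(x, step)
    upper = normal_cdf(centre + step / 2, mean, sd)
    lower = normal_cdf(centre - step / 2, mean, sd)
    return upper - lower

def likelihood_keep(x, mean=0.0, sd=1.0):
    """Option ("keep"): density evaluated at the precise value x."""
    return normal_pdf(x, mean, sd)

# Two precise values that fall in the same rounding interval (step 0.5)
for x in (0.26, 0.49):
    print(x, likelihood_round(x, 0.5), likelihood_keep(x))
```

In this sketch the two precise values are indistinguishable under ("round") but not under ("keep"), which is one way to see why ("keep") can sometimes sharpen an inference.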
Both options could be implemented in the software, but for the moment we only use option ("round"). Future development could offer option ("keep") as well. This requires some thinking on how to implement it efficiently in functions like `samplesFdistribution()` and `mutualinfo()`.
Censored data are a special case of this, where the grouping only happens at the boundaries of the variate's domain. The same considerations and options apply. The software for the moment uses option ("round") for these too.
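The censored case can be sketched in the same spirit. The helper below is hypothetical and not taken from the software; it only shows how a value at or beyond a boundary can be seen as standing for a whole half-line, while values strictly inside the domain are kept as they are. The boundaries 0 and 10 are assumptions for the example.

```python
import math

def censored_interval(x, lower, upper):
    """Return the set of precise values that an observation x may stand for,
    as a (low, high) pair, when the variate is censored at `lower` and `upper`.
    """
    if x <= lower:
        return (-math.inf, lower)   # left-censored: anything at or below the boundary
    if x >= upper:
        return (upper, math.inf)    # right-censored: anything at or above the boundary
    return (x, x)                   # interior value: observed exactly

def censor(x, lower, upper):
    """Analogue of option ("round"): replace a precise value by its censored value."""
    return min(max(x, lower), upper)

for x in (-3.0, 4.2, 12.0):
    c = censor(x, 0.0, 10.0)
    print(x, "->", c, censored_interval(c, 0.0, 10.0))
```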