Making sure credence is properly calibrated and conveyed using hedge words #3464
Replies: 2 comments
-
The big risk of this is creating extremely compelling dis/misinformation. Why would this not converge rapidly on "telling people what they want to hear" as opposed to what is factual? If you can implement it and show some promising results, that would be great, but do you have any theoretical basis for why you think the users will be good at determining factuality without resorting to looking things up on their own?
-
To clarify, this isn't being done by end users; it's being done during training. The contributors would be looking things up and doing research. If they can't figure a claim out, I guess you exclude it from the training set, and anything overly subjective would probably be excluded as well (since the assistant shouldn't be making many subjective claims anyway). Accuracy is already one of the overarching goals of Open Assistant and needs to be solved somehow; this idea just tacks credence onto that accuracy system, so the assistant conveys uncertainty in a natural way. If accuracy is achieved with something other than human feedback, credence would be trained by connecting to that system instead.
-
One of the complaints about ChatGPT is its overconfidence. ChatGPT is probably better than previous assistants in this regard, but I have an idea for how Open Assistant might be able to do better!
The important bit about credence calibration is that it imposes a very large penalty when a high-credence claim turns out to be incorrect. So even though humans typically prefer confident claims, the assistant still learns to hedge its bets to avoid the possibility of a large credence penalty. (The reward for correct claims is slightly higher at high credences, though, so it's still optimal to give high credence to obvious claims.)
(A question is whether we treat the entire response as a single claim or split it into separate claims using NLP (perhaps as part of the model in (1)).) A rough sketch of what a per-claim reward could look like is below.
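For intuition, here is a minimal sketch of a per-claim credence reward using a logarithmic proper scoring rule, which has exactly the shape described above. This is not code from the Open Assistant project; the function name and interface are made up for illustration.

```python
import math

def credence_reward(credence: float, claim_is_correct: bool) -> float:
    """Logarithmic proper scoring rule for a single claim.

    Reward grows only slightly as credence approaches 1 for correct claims,
    but the penalty for a confidently wrong claim grows without bound,
    so the model is pushed to hedge unless a claim is near-certain.
    """
    # Clamp to avoid log(0) when a credence of exactly 0 or 1 is given.
    p = min(max(credence, 1e-6), 1 - 1e-6)
    return math.log(p) if claim_is_correct else math.log(1 - p)

# A correct claim at 99% credence earns only slightly more than one at 90%...
print(credence_reward(0.99, True), credence_reward(0.90, True))    # ~-0.01 vs ~-0.11
# ...but an incorrect claim at 99% credence is punished far more heavily.
print(credence_reward(0.99, False), credence_reward(0.90, False))  # ~-4.6 vs ~-2.3
```

Summing this over the claims in a response (however they end up being segmented) would give the response-level reward.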