-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated description of input features #356
updated description of input features #356
Conversation
|
||
If input to the LLM is just the content of a single dataset column (without any other prefixed or | ||
suffixed text), no `prompt` template should be provided and the `name` of the feature must | ||
correspond to a column in the input dataset. See the following example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanjaydasgupta likely, I would say here "and the name
of the feature must correspond to that specific database column".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took the liberty of changing your database
to dataset
for the sake of consistency throughout the document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanjaydasgupta Thank you for doing this. I think your direction is correct in general (I just left a few clarifying suggestions). Thank you.
Thanks @alexsherstinsky. I will work with your suggestions during the weekend. |
Hi @alexsherstinsky, please let me know if you think the new description is too wordy. |
|
||
There are a couple of things to note here: | ||
- the prompt `template` contains named placeholders (`context` and `question`) for content | ||
from the dataset's columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanjaydasgupta Just to note, these values can also come from a dictionary. For example, we can use this prompt template and format()
it with those dictionary name-value pairs. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure if you wanted any change here. Please clarify.
Hi @alexsherstinsky, can you please review this and this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanjaydasgupta This LGTM! Thank you very much for making these clarifications in the documentation. I would like to please get a glance-over by @arnavgarg1 before merging. Thank you very much!
If it is intended for the input to the LLM to be just the content of a single dataset column (without | ||
any other prefixed or suffixed text), no `prompt` template should be provided and the `name` of the | ||
feature must correspond to that specific dataset column. See the following example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can also mention that it is still possible to wrap the single column with a static prompt template if desired.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my last comment below
type: text | ||
``` | ||
|
||
There are a couple of things to note here: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: note in this example
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my last comment below
Also note that a prompt template can be used even with a single dataset column. | ||
But that is necessary only when static text is required to be prefixed and/or | ||
suffixed to the content of the dataset column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can move this up into the single column dataset section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see my last comment below
- the `name` of the `input_feature` (`prompt` here) is immaterial; it is not the name | ||
of a dataset column as in the previous example. This example uses `prompt` just to | ||
emphasize that the formatted output obtained by applying the prompt template to the | ||
dataset columns is the input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While true, I think this may be confusing to readers/first time users. What if perhaps we actually just use question
as the column name but then say it is a placeholder and will be substituted by the value from the prompt template? This just reduces scope for unintended user errors!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After making the extensive changes referred in my last comment below I felt that prompt
was sounding ok. However, if you still think question
will be better, please let me know and I will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making this part of the docs clearer! I just left a few minor comments for consideration
Hi @arnavgarg1, thank you very much for your very insightful comments. I believe documentation should be widely reviewed, and having another pair of eyes was truly helpful. Since this was a particularly troublesome section of the documentation, I though (after reading all of your comments) that it would be best to add one more yaml example. So I have added one more case (one dataset column + template). That enabled all of your structural comments to be handled in a much better way, and should give readers a much better understanding of the possible variations. I request you and @alexsherstinsky to please review my changes critically once more and let me know if anything has fallen between the cracks. |
Translate into French: | ||
{english_input} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it! Can we update it to add this?
prompt:
template: |
Translate into French
Input: {english_input}
Translation:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the clarity and the extra example YAML helps bring it home. Thanks for taking on one of the hardest parts of our documentation and simplifying it! I left one final comment, but approving since this is ready to go after that minor change.
Thanks for your contribution @sanjaydasgupta
@arnavgarg1 @alexsherstinsky changes completed |
Thanks! Will defer to Alex to do a final pass before merging |
### Single Dataset Column with Additional Text | ||
|
||
If the input to the LLM must be created by prefixing and/or suffixing some static text | ||
to the content of one dataset column, then a `prompt` `template` should be provided to specify how |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanjaydasgupta Should be provided or must be provided? Sorry for seeking this level of precision -- I feel that it is important (we have come a long way to make it this good, might as well make it beyond reproach!). 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, will change and let you know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sanjaydasgupta Other than my minor comment ("should" -> "must"), I have nothing to add -- looks great. Before we merge, let us just make sure that everything, including links render properly (mkdocs
) -- and then please merge, or let me know, and I will in the morning Pacific Time. Thanks a lot for this important documentation change!
I'm not sure how to complete the |
No worries, @sanjaydasgupta -- there is a README file which describes how to launch |
Hi @alexsherstinsky, all complete now. I have checked using |
No description provided.