Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

updated description of input features #356

Merged
merged 7 commits into from
Apr 2, 2024

Conversation

sanjaydasgupta
Copy link
Contributor

No description provided.


If input to the LLM is just the content of a single dataset column (without any other prefixed or
suffixed text), no `prompt` template should be provided and the `name` of the feature must
correspond to a column in the input dataset. See the following example.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanjaydasgupta likely, I would say here "and the name of the feature must correspond to that specific database column".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of changing your database to dataset for the sake of consistency throughout the document

Copy link
Collaborator

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanjaydasgupta Thank you for doing this. I think your direction is correct in general (I just left a few clarifying suggestions). Thank you.

@sanjaydasgupta
Copy link
Contributor Author

Thanks @alexsherstinsky. I will work with your suggestions during the weekend.

@sanjaydasgupta
Copy link
Contributor Author

Hi @alexsherstinsky, please let me know if you think the new description is too wordy.


There are a couple of things to note here:
- the prompt `template` contains named placeholders (`context` and `question`) for content
from the dataset's columns
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanjaydasgupta Just to note, these values can also come from a dictionary. For example, we can use this prompt template and format() it with those dictionary name-value pairs. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if you wanted any change here. Please clarify.

@sanjaydasgupta
Copy link
Contributor Author

Hi @alexsherstinsky, can you please review this and this.

Copy link
Collaborator

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanjaydasgupta This LGTM! Thank you very much for making these clarifications in the documentation. I would like to please get a glance-over by @arnavgarg1 before merging. Thank you very much!

Comment on lines +97 to +99
If it is intended for the input to the LLM to be just the content of a single dataset column (without
any other prefixed or suffixed text), no `prompt` template should be provided and the `name` of the
feature must correspond to that specific dataset column. See the following example.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also mention that it is still possible to wrap the single column with a static prompt template if desired.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my last comment below

type: text
```

There are a couple of things to note here:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: note in this example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my last comment below

Comment on lines 147 to 149
Also note that a prompt template can be used even with a single dataset column.
But that is necessary only when static text is required to be prefixed and/or
suffixed to the content of the dataset column.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can move this up into the single column dataset section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my last comment below

Comment on lines 142 to 145
- the `name` of the `input_feature` (`prompt` here) is immaterial; it is not the name
of a dataset column as in the previous example. This example uses `prompt` just to
emphasize that the formatted output obtained by applying the prompt template to the
dataset columns is the input.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While true, I think this may be confusing to readers/first time users. What if perhaps we actually just use question as the column name but then say it is a placeholder and will be substituted by the value from the prompt template? This just reduces scope for unintended user errors!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making the extensive changes referred in my last comment below I felt that prompt was sounding ok. However, if you still think question will be better, please let me know and I will change it.

Copy link
Contributor

@arnavgarg1 arnavgarg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making this part of the docs clearer! I just left a few minor comments for consideration

@sanjaydasgupta
Copy link
Contributor Author

Hi @arnavgarg1, thank you very much for your very insightful comments. I believe documentation should be widely reviewed, and having another pair of eyes was truly helpful.

Since this was a particularly troublesome section of the documentation, I though (after reading all of your comments) that it would be best to add one more yaml example. So I have added one more case (one dataset column + template). That enabled all of your structural comments to be handled in a much better way, and should give readers a much better understanding of the possible variations.

I request you and @alexsherstinsky to please review my changes critically once more and let me know if anything has fallen between the cracks.

Comment on lines 121 to 123
Translate into French:
{english_input}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it! Can we update it to add this?

prompt:
  template: |
    Translate into French

    Input: {english_input}
    Translation: 

Copy link
Contributor

@arnavgarg1 arnavgarg1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the clarity and the extra example YAML helps bring it home. Thanks for taking on one of the hardest parts of our documentation and simplifying it! I left one final comment, but approving since this is ready to go after that minor change.

Thanks for your contribution @sanjaydasgupta

@sanjaydasgupta
Copy link
Contributor Author

@arnavgarg1 @alexsherstinsky changes completed

@arnavgarg1
Copy link
Contributor

Thanks! Will defer to Alex to do a final pass before merging

### Single Dataset Column with Additional Text

If the input to the LLM must be created by prefixing and/or suffixing some static text
to the content of one dataset column, then a `prompt` `template` should be provided to specify how
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanjaydasgupta Should be provided or must be provided? Sorry for seeking this level of precision -- I feel that it is important (we have come a long way to make it this good, might as well make it beyond reproach!). 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, will change and let you know.

Copy link
Collaborator

@alexsherstinsky alexsherstinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sanjaydasgupta Other than my minor comment ("should" -> "must"), I have nothing to add -- looks great. Before we merge, let us just make sure that everything, including links render properly (mkdocs) -- and then please merge, or let me know, and I will in the morning Pacific Time. Thanks a lot for this important documentation change!

@sanjaydasgupta
Copy link
Contributor Author

I'm not sure how to complete the mkdocs part, but will read up. It will likely be late morning PT when I finish. Thanks for your help!

@alexsherstinsky
Copy link
Collaborator

I'm not sure how to complete the mkdocs part, but will read up. It will likely be late morning PT when I finish. Thanks for your help!

No worries, @sanjaydasgupta -- there is a README file which describes how to launch mkdocs -- you basicaly pip install ludwig and then run mkdocs and connect to localhost on a port it says, then navigate to your pages and make sure that things look good to you. Thank you!

@sanjaydasgupta
Copy link
Contributor Author

I'm not sure how to complete the mkdocs part, but will read up. It will likely be late morning PT when I finish. Thanks for your help!

No worries, @sanjaydasgupta -- there is a README file which describes how to launch mkdocs -- you basicaly pip install ludwig and then run mkdocs and connect to localhost on a port it says, then navigate to your pages and make sure that things look good to you. Thank you!

Hi @alexsherstinsky, all complete now. I have checked using mkdocs too, and the text displayed has all the changes we made. The index in the right margin is also linking correctly to the three subheadings. Thanks for your help.

@alexsherstinsky alexsherstinsky merged commit fd21ef6 into ludwig-ai:master Apr 2, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants