
Refactor the llama.cpp interface #1298

Merged 10 commits into dottxt-ai:v1.0 on Dec 16, 2024

Conversation

@rlouf rlouf (Member) commented Nov 29, 2024

In this PR I refactor the interface to the models provided by llama-cpp-python for the release of Outlines' v1.0. A few notable changes:

  • I removed the lora method as it has been deprecated in llama-cpp-python
  • We follow the init API of the library, with the default requiring a model path, and a from_pretrained classmethod that takes a repo name and filename.
  • We don't try to normalize the arguments, either at initialization or at text generation. Users can pass the same arguments they would pass to the original model.
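To illustrate the two entry points described above, here is a minimal, hypothetical sketch; the class name and attribute handling are assumptions for illustration, not the actual Outlines implementation (the real class would wrap `llama_cpp.Llama` and `llama_cpp.Llama.from_pretrained`):

```python
class LlamaCpp:
    """Hypothetical sketch of the interface described above."""

    def __init__(self, model_path: str, **llama_kwargs):
        # Keyword arguments are forwarded verbatim, mirroring the
        # "no argument normalization" point above.
        self.model_path = model_path
        self.llama_kwargs = llama_kwargs

    @classmethod
    def from_pretrained(cls, repo_id: str, filename: str, **llama_kwargs):
        # The real integration would delegate to
        # llama_cpp.Llama.from_pretrained(repo_id=..., filename=...);
        # here we only record the identifiers to stay self-contained.
        return cls(model_path=f"{repo_id}:{filename}", **llama_kwargs)
```

Users would then write `LlamaCpp("./model.gguf", n_ctx=4096)` or `LlamaCpp.from_pretrained("org/repo", "model.gguf")`, exactly as with the underlying library.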

TODO

  • Update the tests
  • See whether making the class inherit from Llama would turn llama-cpp-python into a hard dependency
  • Update the LlamaCpp documentation
  • Update the OpenAI documentation
  • Update the Gemini documentation
  • Update the Anthropic documentation

@rlouf rlouf added this to the 1.0 milestone Nov 29, 2024
@rlouf rlouf added the enhancement and llama.cpp (Related to the `llama.cpp` integration) labels Nov 29, 2024
@rlouf rlouf force-pushed the refactor-llamacpp branch 5 times, most recently from cd70a83 to c3951ae on November 29, 2024 21:34
@rlouf rlouf requested a review from torymur November 30, 2024 11:31
@rlouf rlouf linked an issue Nov 30, 2024 that may be closed by this pull request
@rlouf rlouf force-pushed the refactor-llamacpp branch 2 times, most recently from dec4bc0 to ea4454c on December 2, 2024 12:58
@rlouf rlouf (Member Author) commented Dec 2, 2024

@torymur this is ready for review

@rlouf rlouf marked this pull request as ready for review December 2, 2024 17:31
        regex_string, model.tokenizer
    )
else:
    self.output_description = None
@torymur torymur (Contributor) commented Dec 6, 2024
You only manipulate self.output_description and don't assign self.model = model, which would fully redefine the class attributes in __init__ and could be okay. But for this kind of partial redefinition in the @dataclass case, I see people suggest __post_init__, which is called after __init__? I'm not sure here, though.
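For context, here is a generic illustration of the __post_init__ hook mentioned above (all names are placeholders, not taken from this PR): dataclasses call __post_init__ automatically after the generated __init__, which is where derived attributes can be computed.

```python
from dataclasses import dataclass, field

@dataclass
class Example:
    model: object
    output_type: object = None
    # Derived attribute, excluded from the generated __init__ signature.
    output_description: object = field(init=False, default=None)

    def __post_init__(self):
        # Runs automatically after the dataclass-generated __init__.
        if self.output_type is not None:
            self.output_description = f"derived from {self.output_type}"
```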

Overall, it took me a second to understand this fork: APIModel <-> output type, Model <-> processor. I can't quite put my finger on the proper design idea, but the fact that we have two well-matched pairs suggests good potential for separation? Or not, if unifying them is what this is for.

There is also a possible simplification of the code, but it doesn't help convey what's happening (your example is much clearer on the why), so I'm leaving it here just in case:

if isinstance(model, get_args(APIModel)) or output_type is None:
    self.output_description = output_type
else:
    regex_string = output_type.to_regex()
    self.output_description = RegexLogitsProcessor(regex_string, model.tokenizer)

rlouf (Member Author) replied:

Yeah, I agree: maintaining a single interface for API-based models and local models is annoying.

One possible way out of this design issue is to have the models build and hold the LogitsProcessor, but that feels unintuitive. The other solution is to add a Generator building function that dispatches to an APIGenerator and a LogitsProcessorGenerator, but I feel the resulting code duplication is not necessarily justified. It might be more a case of documenting the behavior better?

I'll see if I can come up with something better.
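As a sketch of the dispatch idea floated above (the names Generator, APIGenerator, and LogitsProcessorGenerator come from the comment; the bodies and the API_MODELS placeholder are assumptions, not the PR's actual code):

```python
class APIGenerator:
    """For API-based models: pass the output type through unchanged."""
    def __init__(self, model, output_type=None):
        self.model = model
        self.output_description = output_type


class LogitsProcessorGenerator:
    """For local models: compile the output type into a logits processor."""
    def __init__(self, model, output_type=None):
        self.model = model
        if output_type is None:
            self.output_description = None
        else:
            # The real code would build a RegexLogitsProcessor from
            # output_type.to_regex() and model.tokenizer.
            self.output_description = ("processor", output_type)


API_MODELS = (str,)  # placeholder for the APIModel union used in the PR


def Generator(model, output_type=None):
    """Dispatch to the generator matching the model's kind."""
    if isinstance(model, API_MODELS):
        return APIGenerator(model, output_type)
    return LogitsProcessorGenerator(model, output_type)
```

Each branch then owns its own, simpler constructor instead of one __init__ handling both cases.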

torymur (Contributor) replied:

The other solution is to add a Generator building function that dispatches to an APIGenerator and a LogitsProcessorGenerator, but I feel the resulting code duplication is not necessarily justified.

Actually, this might be enough. I'd vote for duplication like this if it makes the design more self-descriptive, rather than pushing the clarification up into the docs.

rlouf (Member Author) replied:

I made the change. It actually doesn't look like much duplication, and it makes the code much easier to follow.

torymur (Contributor) replied:

It was worth it, crystal clear now 🔥

@torymur torymur (Contributor) left a review:

Sorry for the delay!
Nice to see The Generator in the docs & in the code 🔥 Left some minor comments, but overall LGTM! 🚀

@rlouf rlouf force-pushed the refactor-llamacpp branch from 1eca31e to d6d56dc on December 16, 2024 12:38
@rlouf rlouf requested a review from torymur December 16, 2024 12:45
@rlouf rlouf merged commit b352c62 into dottxt-ai:v1.0 Dec 16, 2024
4 of 5 checks passed
@rlouf rlouf deleted the refactor-llamacpp branch December 16, 2024 15:09
Labels
enhancement, llama.cpp (Related to the `llama.cpp` integration)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the llama.cpp integration
2 participants