Support updating template processors #1652
…["$0", "</s>"]` does not !
11533c5 to 4bb595b
Self-approving here as it is your PR @ArthurZucker, waiting for your review before merging
LGTM! Would just add Python tests! 😉
let's check set_item and also that get_item_ is mutable
@@ -14,7 +14,7 @@ serde = { version = "1.0", features = ["rc", "derive"] }
 serde_json = "1.0"
 libc = "0.2"
 env_logger = "0.11"
-pyo3 = { version = "0.23", features = ["abi3", "abi3-py39"] }
+pyo3 = { version = "0.23", features = ["abi3", "abi3-py39", "py-clone"] }
for this we probably need to rebase!
let's revert these as well!
PyNormalizerTypeWrapper::Single(ref inner) => match &*inner
    .as_ref()
    .read()
    .map_err(|_| PyException::new_err("rwlock is poisoned"))?
-.map_err(|_| PyException::new_err("rwlock is poisoned"))?
+.map_err(|_| PyException::new_err("rwlock was poisoned when trying to get subtype of PyNormalizer"))?
@@ -218,7 +223,7 @@ macro_rules! getter {
     ($self: ident, $variant: ident, $name: ident) => {{
         let super_ = $self.as_ref();
         if let PyNormalizerTypeWrapper::Single(ref norm) = super_.normalizer {
-            let wrapper = norm.read().unwrap();
+            let wrapper = norm.read().expect("rwlock is poisoned");
same here let's be a little bit more helpful
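As a rough illustration of what a "more helpful" poison error could look like, here is a minimal std-only sketch. The helper names `read_with_context` and `poisoned_lock` are hypothetical and not part of the PR; the real code would wrap the message in a `PyException` rather than a `String`.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical helper: read a value behind an `RwLock` and, if the lock is
/// poisoned, report which operation was being attempted.
fn read_with_context(lock: &RwLock<String>, context: &str) -> Result<String, String> {
    lock.read()
        // Clone the inner `String` so the read guard is released right away.
        .map(|guard| (*guard).clone())
        .map_err(|_| format!("RwLock was poisoned while {}", context))
}

/// Poison a lock on purpose: a thread panics while holding the write guard.
fn poisoned_lock() -> Arc<RwLock<String>> {
    let lock = Arc::new(RwLock::new(String::from("NFKC")));
    let writer = Arc::clone(&lock);
    // `join` returns `Err` because the thread panicked; that is expected here.
    let _ = thread::spawn(move || {
        let _guard = writer.write().unwrap();
        panic!("simulated failure while holding the write lock");
    })
    .join();
    lock
}
```

Passing the context as an argument keeps one helper usable from every call site, so each error says what was being attempted when the poisoned lock was hit.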
fn __len__(self_: PyRef<'_, Self>) -> usize {
    match &self_.as_ref().normalizer {
        PyNormalizerTypeWrapper::Sequence(inner) => inner.len(),
        PyNormalizerTypeWrapper::Single(_) => 1,
    }
nice, thanks!
I think the issue you mentioned about length comes from here
Because it's a Single? Not sure, it should deserialize to an empty Sequence, but the type is Sequence with length 1 containing an NFKC.
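To make the length question concrete, here is a small std-only sketch of the same rule as the `__len__` in the diff. `TypeWrapper<T>` and `shared` are hypothetical stand-ins for `PyNormalizerTypeWrapper` and its construction, used only to show why the chosen variant matters: anything that ends up in `Single` reports length 1, while an empty `Sequence` reports 0.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for the PR's `PyNormalizerTypeWrapper`; `T` replaces
/// `PyNormalizerWrapper` so the example needs no other crates.
enum TypeWrapper<T> {
    Sequence(Vec<Arc<RwLock<T>>>),
    Single(Arc<RwLock<T>>),
}

impl<T> TypeWrapper<T> {
    /// Same rule as the `__len__` shown in the diff.
    fn len(&self) -> usize {
        match self {
            TypeWrapper::Sequence(inner) => inner.len(),
            TypeWrapper::Single(_) => 1,
        }
    }
}

/// Hypothetical convenience constructor for a shared, lockable item.
fn shared<T>(value: T) -> Arc<RwLock<T>> {
    Arc::new(RwLock::new(value))
}
```

With this shape, `TypeWrapper::Sequence(Vec::new()).len()` is 0 and `TypeWrapper::Single(shared("NFKC")).len()` is 1, so the reported length depends on which variant deserialization picks.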
    .read()
    .map_err(|_| PyException::new_err("rwlock is poisoned"))?
    .clone();
Ah, so we get a clone of the Rust object with write lock, nice.
pub(crate) enum PyNormalizerTypeWrapper {
    Sequence(Vec<Arc<RwLock<PyNormalizerWrapper>>>),
    Single(Arc<RwLock<PyNormalizerWrapper>>),
}

impl<'de> Deserialize<'de> for PyNormalizerTypeWrapper {
can you add a comment as to why we need this?
(We have to implement this to make sure it's a Sequence(Sequence) and not Single(Sequence), right?)
    .map_err(|_| PyException::new_err("rwlock is poisoned"))?
    .normalize(normalized),
PyNormalizerTypeWrapper::Sequence(inner) => inner.iter().try_for_each(|n| {
    n.read()
        .map_err(|_| PyException::new_err("rwlock is poisoned"))?
Same, let's have a more helpful error. I think in tokenizers we create a type for the PyException, can be "IndexingError" or something.
// Setter for `single`
pub fn set_single(&mut self, single: Template) {
    println!("Setting single to: {:?}", single); // Debugging output
to remove!
normalizers[1] = Strip()
assert normalizers[1].__class__ == Strip
let's check set_item and also that get_item_ is mutable
It isn't at the moment; we need to implement getters and setters for each variant of the Normalizer/Pretok/Processor. Will do in a subsequent PR.
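For context on what "mutable get_item" would mean in Rust terms, here is a std-only sketch of the general distinction, not the PR's implementation. `Strip`, `get_shared`, and `get_copy` are hypothetical names: a getter that hands out the `Arc` shares state with the stored item, while a getter that clones the inner value returns a detached copy.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical stand-in for a normalizer with one mutable field.
#[derive(Clone, Debug, PartialEq)]
struct Strip {
    left: bool,
}

/// Return a handle that shares state with the stored item:
/// writes through the handle are visible in the sequence.
fn get_shared(items: &[Arc<RwLock<Strip>>], index: usize) -> Option<Arc<RwLock<Strip>>> {
    items.get(index).map(Arc::clone)
}

/// Return a detached copy: later writes to the stored item do not
/// affect the copy, and mutating the copy does not affect the item.
fn get_copy(items: &[Arc<RwLock<Strip>>], index: usize) -> Option<Strip> {
    let guard = items.get(index)?.read().ok()?;
    Some((*guard).clone())
}
```

A Python test for mutability would amount to checking which of these two behaviors `normalizers[i]` exposes: change an attribute on the returned object, then read the item back from the sequence.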
Goal: