How do I only pull messages matching a label? #236
Not at the moment, but it could be done by changing the query here: https://github.com/gauteh/lieer/blob/master/lieer/remote.py#L85, and probably also the query here: https://github.com/gauteh/lieer/blob/master/lieer/gmailieer.py#L333. However, there's a danger that gmailieer will somewhere conclude that all the messages that don't match the query have been deleted locally. Maybe that won't be an issue.
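For reference, the Gmail API's users.messages.list endpoint takes a `q` parameter that uses the same search syntax as the Gmail web UI (e.g. `label:git`). Below is a minimal sketch of a label-restricted listing. It assumes a `service` object built with google-api-python-client, and `list_message_ids` is a hypothetical helper. It does not reproduce what lieer's remote.py actually contains.

```python
from typing import Any, Iterator


def list_message_ids(service: Any, query: str, user_id: str = "me") -> Iterator[str]:
    """Yield the IDs of all messages matching a Gmail search query.

    Assumes `service` is a Gmail API client, e.g. build("gmail", "v1", ...)
    from google-api-python-client. Hypothetical helper, not lieer code.
    """
    page_token = None
    while True:
        kwargs = {"userId": user_id, "q": query}
        if page_token:
            kwargs["pageToken"] = page_token
        response = service.users().messages().list(**kwargs).execute()
        # A page with no matches has no "messages" key at all.
        for message in response.get("messages", []):
            yield message["id"]
        # Keep requesting pages until the API stops returning a token.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Usage would look like `ids = list(list_message_ids(service, "label:git"))`. Restricting this listing is the easy part. The deletion concern above comes from what the rest of the sync does with a listing that is no longer the whole mailbox.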
Correct: I would only be pulling those emails that match the
I need this badly too. I'm trying to sync a Gmail account that has something like 18 years of email (~1.5m messages), and the initial pull is taking multiple days. This problem is made worse by the fact that I ran into the same symptoms as in #63 (probably due to #229 and #277), so I have 1.5m messages, many of which have the
The difficulty is making the remote query (remote.py) stable. That is probably possible, but it requires testing. Any change to this query will cause an initialized repository to become massively out of sync.
Thanks for the quick reply! Why would it cause a repo to become out of sync? I'd assume that this kind of filtered pull would simply not touch any messages, or the labels of any messages, that weren't retrieved by the sync.
In case someone changes the query. Or if you make a query like "since: last month", whose results are always changing. Maybe it works, but it would be good to test.
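To illustrate why a relative date query is a moving target, here is a simplified local model of a filter like Gmail's `newer_than:30d`. The function name is hypothetical, and Gmail's exact boundary handling may differ. The same message matches on one day and silently stops matching a few weeks later without anything changing on the server.

```python
from datetime import datetime, timedelta


def matches_newer_than(received: datetime, now: datetime, days: int) -> bool:
    """Simplified model of a relative date filter such as newer_than:30d."""
    return received > now - timedelta(days=days)


received = datetime(2024, 9, 1)
print(matches_newer_than(received, datetime(2024, 9, 22), 30))   # True
print(matches_newer_than(received, datetime(2024, 10, 15), 30))  # False
```

Nothing about the message changed between the two calls, so an incremental sync that only reports remote changes would never say that it left the query's scope.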
Probably my understanding of how lieer works is woefully inadequate, so please pardon the stupid questions. Why would it be a problem if the query changed or returned different results? I would expect that to only change which emails get pulled down and have their local labels updated, without touching the ones which are already there.
That would probably result in deleting the ones that are already there. Plus, you'd need to do a full sync to be sure; otherwise you'd need to replicate the behavior in the incremental sync. And if you change the query in more radical ways, this could result in a huge number of emails requiring deletion or syncing.
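A simplified model of the deletion risk follows. It assumes, for illustration only, that a full sync treats the remote listing as the complete mailbox and flags anything present locally but absent remotely as deleted. The function name is hypothetical and is not lieer's actual code.

```python
from __future__ import annotations


def full_sync_deletions(local_ids: set[str], remote_ids: set[str]) -> set[str]:
    """Local messages a full sync would treat as deleted on the remote.

    Assumes remote_ids is the *complete* set of messages in the account,
    so anything present locally but absent remotely looks deleted.
    """
    return local_ids - remote_ids


local_ids = {"a", "b", "c", "d"}
remote_ids = {"a", "b"}  # what a listing restricted to label:git might return
print(sorted(full_sync_deletions(local_ids, remote_ids)))  # ['c', 'd']
```

If the remote listing is quietly narrowed to one label, every local message outside that label ends up in the deletion set, even though nothing was deleted on Gmail.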
@gauteh commented on September 22, 2024 5:21 PM:
That's why I wrote this:
Surely it would be possible to avoid any local deletions when only doing a partial pull?
By full sync you mean a full sync of just the messages returned by the label filter, right? Couldn't this be enforced when pulling with a label filter active?
Why would they require deleting? As per above, I wouldn't want a single deletion to ever happen in this filtered pull mode.
I don't see a problem with that. If I want to pull all messages matching label A, and then all matching label B, it should be up to me to assess how many messages are associated with both and decide whether syncing just that label makes sense. That's precisely the flexibility I need. (Actually not just with labels but ideally general search filters, but labels would be a great start.)
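The "never delete" pull proposed here can be modelled as a pure merge. The sketch below is a minimal version under stated assumptions: local state is a hypothetical map of message ID to label set, and `apply_filtered_pull` is not a lieer function. Messages returned by the filtered query are added or overwritten, and everything else is left alone.

```python
from __future__ import annotations


def apply_filtered_pull(
    local: dict[str, set[str]], fetched: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Merge the result of a label-filtered pull into local label state.

    Messages returned by the filtered query are added or have their labels
    overwritten; every other local message is left exactly as it was, and
    nothing is ever deleted.
    """
    merged = {msg_id: set(labels) for msg_id, labels in local.items()}
    for msg_id, labels in fetched.items():
        merged[msg_id] = set(labels)
    return merged


local = {"a": {"git"}, "old": {"inbox"}}
fetched = {"a": {"git", "unread"}, "b": {"git"}}
merged = apply_filtered_pull(local, fetched)
print(sorted(merged))  # ['a', 'b', 'old']
```

The trade-off is visible in the result: "old" keeps whatever labels it had, even if they have since changed on Gmail, so local and remote state can diverge for anything outside the filter.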
Yes, but then a partial pull no longer matches a full sync. It should always match; otherwise the behavior will be unpredictable to the user. It is also an assumption built into lieer, so there may be other side-effects (what if you add a label to a message which is no longer synced, but exists locally?).
No. If you have a query with a limit on the number of messages, or one for all the messages in the last month, you have to implement that logic in lieer. That means you have to implement in lieer any logic you allow in the query, and if it mismatches, things are going to be out of sync. This makes it more difficult to deal with changes back and forth. Maybe it is possible to do (or avoid), but I'm not sure how.
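A small model of the "limit" case shows why the query logic would have to be replicated locally. The sketch assumes a hypothetical scope rule of "the n most recent messages" and a map of message ID to receive timestamp. When one new message arrives, an older message drops out of scope, but the incremental change that arrives only mentions the new one.

```python
from __future__ import annotations


def newest_n(messages: dict[str, int], n: int) -> set[str]:
    """IDs of the n most recent messages; values are receive timestamps."""
    ranked = sorted(messages, key=lambda msg_id: messages[msg_id], reverse=True)
    return set(ranked[:n])


mailbox = {"a": 1, "b": 2, "c": 3}
before = newest_n(mailbox, 2)
mailbox["d"] = 4  # the only change an incremental sync would report
after = newest_n(mailbox, 2)
print(sorted(before - after))  # ['b']
```

Here "b" leaves the scope without any remote event about "b" itself, so only code that re-evaluates the rule locally would notice.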
Then partial sync and full sync no longer match, and you are able to modify messages in notmuch that are not in your query. Expect weird side-effects or infinite sync cycles. You would need another index of the messages that are currently actively synced, but for that I think you need a full sync.
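One possible shape for such an index is sketched below. `SyncedIndex` is a hypothetical class, not part of lieer, and JSON is an arbitrary choice of persistence format. It records which IDs a filtered pull has brought in, refuses to push local tag changes for anything else, and reports tracked messages missing from the filtered listing.

```python
from __future__ import annotations

import json


class SyncedIndex:
    """Hypothetical record of which message IDs a filtered pull has synced."""

    def __init__(self, ids: set[str] | None = None) -> None:
        self.ids: set[str] = set(ids or ())

    def record_pull(self, fetched_ids: set[str]) -> None:
        # Remember every message the filtered pull has brought in.
        self.ids |= fetched_ids

    def missing_from_scope(self, remote_ids: set[str]) -> set[str]:
        # Tracked messages absent from the filtered remote listing. These may
        # be deleted OR may just have lost the label; the listing alone
        # cannot tell the two apart.
        return self.ids - remote_ids

    def may_push(self, msg_id: str) -> bool:
        # Local tag changes on untracked messages are not pushed to Gmail.
        return msg_id in self.ids

    def to_json(self) -> str:
        return json.dumps(sorted(self.ids))

    @classmethod
    def from_json(cls, text: str) -> "SyncedIndex":
        return cls(set(json.loads(text)))
```

The comment in `missing_from_scope` is where the catch lies: telling "deleted" apart from "no longer matches the filter" needs more information than the filtered listing provides, such as a per-message lookup or a full listing.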
Doing full syncs is a huge chore and a barrier to getting started with lieer (as you have noticed here). I'm just saying that you should expect to have to do a full sync whenever you tweak your query. By the way, I would also like this feature; syncing only the last month or so would be useful. It would also be helpful to sync and index everything and then be able to delete most of the messages on disk (so that they can be searched with notmuch, but opened in Gmail).
@gauteh commented on September 22, 2024 6:34 PM:
Sorry, I don't follow. What wouldn't match, and what would be unpredictable? If the user chooses to pull only mails matching a label, then they know that only mails matching that label will be up to date, and others may be out of date. That doesn't strike me as problematic. Is there some other problem I'm missing?
What do you mean by "no longer synced" here? If you mean adding a label remotely on gmail to a mail which is not pulled because it doesn't match the label filter for the pull, then it would simply remain out of date until the filter is relaxed. As per above I don't see an issue with that.
Why? Isn't the filtering simply in the request to the Gmail API, i.e. "give me all messages matching label X"? Perhaps I should check my understanding: when you refer to "query", I'm assuming you mean one or more of the queries which lieer sends to the Gmail API to retrieve the remote mails, right?
I don't follow the point here :-( Feels like maybe I'm missing something fundamental about how lieer works.
By "partial sync" do you mean the partial sync proposed by this feature request, or the incremental sync performed by default when the Either way, what doesn't match and why does this matter? Sorry if I'm asking a load of stupid questions, but something's just not clicking in my brain so I think I'm missing the crux of your point!
I don't get why that would need to happen. When the query is limited to emails with a given remote label, surely any other emails already stored locally should just not be touched at all? It doesn't matter whether they're modified locally or remotely; either way lieer wouldn't look at them.
I'm hopelessly lost by this but hopefully I've written enough that you can pinpoint where my understanding is failing me 😅
I still don't understand why this would be necessary. BTW, when you say "tweak your query", are you imagining that the query would be configured in the config? For the OP's use case this would make sense, but for my use case of trying to incrementally grab the most important (active, non-historical) areas of my Gmail account, I would more likely want some ephemeral filters, e.g.
to temporarily restrict the
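To make the distinction between a configured query and an ephemeral one concrete, here is a sketch of a per-invocation filter on the command line. `gmi pull` is lieer's pull command, but the `--query` flag and `parse_pull_args` are hypothetical and do not exist in lieer. The sketch only shows the parsing side.

```python
from __future__ import annotations

import argparse


def parse_pull_args(argv: list[str]) -> argparse.Namespace:
    """Hypothetical CLI for an ephemeral, per-invocation pull filter."""
    parser = argparse.ArgumentParser(prog="gmi")
    sub = parser.add_subparsers(dest="command", required=True)
    pull = sub.add_parser("pull")
    # Hypothetical flag: not an existing lieer option.
    pull.add_argument(
        "--query",
        default=None,
        help="Gmail search query restricting this pull only",
    )
    return parser.parse_args(argv)


args = parse_pull_args(["pull", "--query", "label:git newer_than:1y"])
print(args.command, args.query)  # pull label:git newer_than:1y
```

Because the filter lives only on the command line, it would never become part of the stored sync state. Whether the rest of the sync could tolerate that is exactly the open question in this thread.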
You mean a query filter which could also be based on dates rather than just labels? Yes that would be great.
Cool idea! When you say "opened in Gmail", how would that work from a UX perspective?
Maybe it would be more useful to play around with the code to understand it: if you change the referenced line in remote.py above, you should be able to achieve what you want, and possibly start to see some of the interesting side-effects :) It might mess up some of your labels, though.
Having got to know lieer a bit better, I think I'm understanding the above better now:
So the API doesn't support filtering the partial sync via a query; that filtering would therefore have to be done separately, which might not even be possible via the API. E.g. if you only wanted to synchronize messages matching a label
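A sketch of what "filtering separately" might mean for a single label follows. It processes Gmail history records client-side. `classify_history` is hypothetical, and it assumes the records follow the users.history resource shape, where entries under `messagesAdded`, `labelsAdded` and `labelsRemoved` carry a `message` with its current `labelIds`, and `labelsRemoved` entries also list the removed `labelIds`.

```python
from __future__ import annotations

from typing import Any


def classify_history(records: list[dict[str, Any]], label_id: str) -> dict[str, str]:
    """Map message IDs to their status relative to one label.

    Records are assumed to be in chronological order; later records
    overwrite earlier statuses for the same message.
    """
    status: dict[str, str] = {}
    for record in records:
        for key in ("messagesAdded", "labelsAdded", "labelsRemoved"):
            for entry in record.get(key, []):
                message = entry["message"]
                if label_id in message.get("labelIds", []):
                    status[message["id"]] = "in_scope"
                elif key == "labelsRemoved" and label_id in entry.get("labelIds", []):
                    # The target label itself was removed: the message leaves
                    # the scope but has not been deleted.
                    status[message["id"]] = "left_scope"
        # Deleted messages carry no labels here, so they are reported
        # regardless of whether they were ever in scope.
        for entry in record.get("messagesDeleted", []):
            status[entry["message"]["id"]] = "deleted"
    return status
```

Even in this simplified form, "left the label" has to be handled separately from "deleted", and deletions cannot be attributed to the label at all. That makes the filtered partial sync more awkward than the full listing.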
This seems more awkward than the per-mail Furthermore, if you change the query used for syncing, then as noted in previous comments above:
I hope I got that right, but corrections are very welcome of course :)
I want to use lieer to sync only those messages that Gmail has already filtered into a particular label. E.g., I have a "git" label, which is the only group of emails I'm interested in syncing with lieer. Is this feasible?