Zeno v2 #166
Draft: equals215 wants to merge 249 commits into main from dev/v2
…d type and stats: add draft of package
…list of regex to use for exclusion
* postprocess: corrected some smells
* postprocess: renamed some variables and corrected forloop variables
* postprocess: postprocessItem args
* postprocess: never set the state of the parent before adding a child, this is done via AddChild() method
* item: reinforced CheckConsistency method
* global: enforcing stricter state and consistency check for items throughout stages in the pipeline
* item: corrected CheckConsistency() and made more unit tests
* item&finisher: make use of CompleteAndCheck() method on an item to parse the tree before handling further
* item: CompleteAndCheck() overlooked return conditions
* pre/postprocess: trying to fix the flow of childs
* dumper: add a Dump() function to properly dump an Item for further debugging
* preprocessor: correct exclusion logic
* item.Dedupe: corrected an edge case where a completed child has the same URL as the seed and dedupe was trying to remove the seed
* postprocess: correct failed outlink extraction behaviour
* Add more detailed pyroscope information
* postprocess: add more debug logging to troubleshoot an unknown bug
* preprocess: add itemId in panic
* postprocess: always postprocess an item EVEN IF ASSETS CAPTURE IS DISABLED
* archiver: close spooledBuffer if error happened during body processing
* postprocess: close all bodies of an item tree before continuing in the pipeline
* archiver: try to write bodies only on disk
* add: small memory optimization for URLToString & encodeQuery
* chore: upgrade Go version & dependencies
* chore: bump warc lib to v.0.8.62
* fix: usage of spooledtempfile lib
* chore: bump warc lib to v.0.8.63
* postprocess: defer a closeBodies call on every item that goes through
* log: disable log queue full error message when TUI is used
* cmd: add no-stderr-log flag
* hq.consumer: replace previousBatch check with a reactor duplicate check
* pyroscope: bump upload rate from 15s to 5s
* fix: add panic for errors in startPipeline, retry indefinitely on HQ start error
* fix: not returning when hq.Start fails to init HQ client
* fix: typo
* fix: HQ Start failure marking init as already done
* fix: panic when HQ init fails
* add: truthsocial.com preprocessing & post-processing
* chore: bump warc lib to v.0.8.64
* add: more truthsocial.com special handling
* add: more truthsocial.com special handling
* add: more truthsocial.com special handling
* fix: variable scope for truthsocial special handling
* fix: domains crawl
* fix: set assets hops to their seed hop
* fix: extraction of outlinks on assets

---------

Co-authored-by: Jake L
Co-authored-by: Corentin Barreau
No description provided.