The project needs slight improvement in terms of outputs.
It would be better for the program to save only content that was actually extracted, instead of writing empty text files when nothing was found. This has two benefits:
It makes analyzing the extracted data easier and faster.
It saves storage space.
This can be achieved by using a simple if statement for each regex.
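To illustrate the idea, here is a minimal sketch in Python. It is not bulk_extractor's actual code (which is C++); the function name `save_matches`, the `patterns` mapping, and the one-file-per-pattern layout are assumptions made up for this example.

```python
import re
from pathlib import Path


def save_matches(text, patterns, out_dir):
    """Write one output file per pattern, but only if that pattern matched.

    `patterns` maps a hypothetical output name to a regex string.
    Returns the names of the files that were actually written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, pattern in patterns.items():
        matches = [m.group(0) for m in re.finditer(pattern, text)]
        if matches:  # the "simple if statement": skip empty results
            path = out_dir / f"{name}.txt"
            path.write_text("\n".join(matches) + "\n", encoding="utf-8")
            written.append(path.name)
    return written
```

With this check, a pattern that finds nothing leaves no file behind, so the output directory lists only the patterns that produced results.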
Traditionally, we left the 0-length feature files so that users could know that a particular scanner ran and found nothing. There is minimal overhead associated with storing zero-length files.
Previously, we also stored data in an SQLite3 database, which dramatically improved performance and reduced overhead. However, nobody used it.
Your suggestion of adding a regex filter on each feature file to further prune the output is a curious one. This program has been in use for 14 years and no one has ever suggested this before. It is straightforward to run grep on a feature file; it is not straightforward to re-run bulk_extractor if there is a typo in the filter.
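As a rough sketch of that after-the-fact filtering, the grep step could look something like this in Python. The helper name `filter_features` is hypothetical, and the sketch assumes that comment lines in a feature file start with `#`.

```python
import re


def filter_features(lines, pattern):
    """Yield the non-comment lines that match `pattern`, similar to grep."""
    rx = re.compile(pattern)
    for line in lines:
        if line.startswith("#"):  # assumed comment/header marker
            continue
        if rx.search(line):
            yield line
```

Because this runs on the saved feature file, a typo in the pattern only means re-running the filter, not the whole extraction.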
Do you have an actual use case for which the output size is problematic and a filter is required, or is this a request based on what a hypothetical user would like? If you are indeed in need of this feature, you are welcome to submit it as a pull request. I'm happy to design it with you. Adding more command-line switches is problematic at this point, so you might also want to add the ability to read a YAML or JSON configuration file.
If you aren't able to implement this yourself but are willing to pay for this feature to be created, I can hook you up with a consultant.