
File system iterator needs to keep a set of directories and files visited and not process the same directory or file twice. #472

Open
martinmdp opened this issue May 7, 2024 · 12 comments


@martinmdp

Hi

Thanks for this amazing tool.

I managed to compile and install bulk_extractor on Debian 12 Linux, but when I run it with the command:

bulk_extractor -o results -R / -E accts

it fails with:

terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: status: Too many levels of symbolic links [/run/udev/watch/25]
Aborted

Any suggestions?

Thanks in advance

@martinmdp martinmdp changed the title "Make on Debian 12 fail" to "Abort on Debian 12" May 7, 2024
@simsong
Owner

simsong commented May 9, 2024

This likely happened because you have a recursive symlink - a symlink that points to a directory that ultimately points back to the symlink. The filesystem iterator does not keep a set of all directories and symlinks that have been previously visited to make sure that it never processes the same symlink or directory twice. That's a good thing to add. It wouldn't be hard to do. Would you like to add it?
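A minimal C++17 sketch of the kind of "visited" set described here might look like the following. It is not bulk_extractor's code: `walk_once` and `WalkResult` are hypothetical names, and it assumes a POSIX system where the `(st_dev, st_ino)` pair returned by `stat()` uniquely identifies a file or directory. It follows symlinks, skips any object whose identity it has already seen, and records errors such as ELOOP ("Too many levels of symbolic links") instead of throwing, which is what aborted the run reported above.

```cpp
#include <cerrno>
#include <filesystem>
#include <set>
#include <string>
#include <system_error>
#include <utility>
#include <vector>
#include <sys/stat.h>  // POSIX stat(); (st_dev, st_ino) identifies an object

namespace fs = std::filesystem;

// Outcome of a walk. Problems are recorded here instead of being thrown.
struct WalkResult {
    std::vector<fs::path> dirs;       // every directory entered, each once
    std::vector<fs::path> others;     // every non-directory object, each once
    std::vector<std::string> errors;  // e.g. "...: Too many levels of symbolic links"
};

// Hypothetical walker, not bulk_extractor's iterator. It follows symlinks,
// but processes an object only the first time its (device, inode) pair is
// seen, so symlink cycles terminate and two links to one file give one entry.
WalkResult walk_once(const fs::path& root) {
    WalkResult r;
    std::set<std::pair<dev_t, ino_t>> visited;
    std::vector<fs::path> pending;  // explicit stack of directories to enter
    struct stat st {};

    // Record a failed stat() for `p`; `err` is the errno captured at the call site.
    auto record = [&r](const fs::path& p, int err) {
        r.errors.push_back(p.string() + ": " +
                           std::error_code(err, std::generic_category()).message());
    };

    if (::stat(root.c_str(), &st) != 0) { record(root, errno); return r; }
    visited.insert({st.st_dev, st.st_ino});
    if (!S_ISDIR(st.st_mode)) { r.others.push_back(root); return r; }
    pending.push_back(root);

    while (!pending.empty()) {
        const fs::path dir = pending.back();
        pending.pop_back();
        r.dirs.push_back(dir);

        std::error_code ec;
        fs::directory_iterator it(dir, ec), end;
        for (; !ec && it != end; it.increment(ec)) {
            const fs::path p = it->path();
            // stat() follows symlinks; a looping link fails here with ELOOP
            // instead of throwing std::filesystem::filesystem_error.
            if (::stat(p.c_str(), &st) != 0) { record(p, errno); continue; }
            if (!visited.insert({st.st_dev, st.st_ino}).second) continue;  // already seen
            if (S_ISDIR(st.st_mode)) pending.push_back(p);
            else r.others.push_back(p);
        }
        if (ec) r.errors.push_back(dir.string() + ": " + ec.message());
    }
    return r;
}
```

Because the set is keyed on identity rather than on path, two symlinks to the same file count once; the trade-off is that hard-linked files are also reported only once.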

@simsong simsong changed the title "Abort on Debian 12" to "File system iterator needs to keep a set of directories and files visited and not process the same directory or file twice." May 9, 2024
@martinmdp
Author

@simsong thanks for the response. Unfortunately, I'm not a C++ developer. Is there a way to exclude directories from the scan? On Linux, some folders such as /sys, /dev, or /run are links to devices or dynamic files. When bulk_extractor scans them, the run fails with a "segmentation fault" (for example, on /sys/kernel/notes), and if the tool wasn't compiled with the libexpat library, it is not possible to resume the scan.

Thanks in advance

@simsong
Owner

simsong commented May 9, 2024

Well, what is your goal in scanning /sys and /dev and /run?

@martinmdp
Author

I need a full system scan because of a requirement, so I run bulk_extractor -o results -R /

This scans the whole system, but when it gets into the folders mentioned above, there is a segmentation fault or a problem following symbolic links.

@simsong
Owner

simsong commented May 9, 2024 via email

@martinmdp
Author

The scan is run from the same filesystem; I can't scan an image because it is a production server.

@simsong
Owner

simsong commented May 9, 2024

It's also unclear to me why you want to scan /dev/random and /dev/zero. Can you try running strings or wc on these devices and post the results here?

@simsong
Owner

simsong commented May 9, 2024

> The scan is from the same filesystem, i cant scan a image because is a production server

You can't read the file system's raw disk partition on a production system? What OS are you running?

@martinmdp
Author

I don't need to scan those particular directories; I need to scan the entire system for certain evidence. The problem is that when I pass the "-R /" parameter to scan the entire system and generate a single report, the process goes through every folder without exception, and it throws an error in the system folders. If I have to scan each folder separately, I must generate more than 10 different reports (for example, running the command once each for /etc, /home, /var, and so on). I need a single scan of the entire root "/". I am using Debian 11 and 12 Linux with the latest version of bulk_extractor, built with git pull --recursive, bootstrap.sh, ./configure, make, and make install.

@simsong
Owner

simsong commented May 9, 2024

Well, -R / says: start at the root directory and scan every single readable object in every folder. One of the folders you will get from / is /dev, so -R / is asking the computer to scan /dev/chargen and /dev/zero, and I think that you will be very unhappy when you read those. And /dev/stdin will read from your keyboard until you type ^D. So I think that you really do not want to do a -R /. Either that, or you are asking that the iterator scan only regular files and not devices, pipes, FIFOs, or other such objects in the file system. Right?

I'm still curious why you are only interested in scanning the allocated blocks associated with files and, on file systems that support multiple data streams, only in the primary stream. Are you sure that you do not want data from deleted files?

Don't get me wrong — the mods that you are asking for make sense. But I want to be sure that I understand exactly what you need before I put in a change request.
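The regular-files-only behavior discussed in this comment could be expressed as a per-entry check. This is a hedged C++17 sketch with a hypothetical `skip_reason` helper, not part of bulk_extractor: it follows symlinks using the non-throwing `std::filesystem::status` overload and lets only regular files through, so devices such as /dev/zero, FIFOs, sockets, and broken or looping symlinks are skipped with a reason instead of being read.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical policy helper: returns std::nullopt if `p` should be read as
// data, otherwise a short reason for skipping it. Only regular files are read;
// devices, FIFOs and sockets are skipped because reading them can be endless
// (/dev/zero) or block waiting for input (/dev/stdin, a FIFO).
std::optional<std::string> skip_reason(const fs::path& p) {
    std::error_code ec;
    const fs::file_status st = fs::status(p, ec);   // follows symlinks, never throws
    if (ec) return "cannot stat: " + ec.message();  // e.g. a symlink loop (ELOOP)
    switch (st.type()) {
        case fs::file_type::regular:   return std::nullopt;
        case fs::file_type::directory: return std::string("directory");
        case fs::file_type::character: return std::string("character device");
        case fs::file_type::block:     return std::string("block device");
        case fs::file_type::fifo:      return std::string("FIFO");
        case fs::file_type::socket:    return std::string("socket");
        default:                       return std::string("other file type");
    }
}
```

A walker would call this on each entry and open the path only when it returns `std::nullopt`, logging the reason otherwise.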

@martinmdp
Author

Thanks for your response. I'm not really requesting modifications; I've just been tasked with scanning a file system for certain information. Honestly, I'm not interested in system folders or deleted files. I just need a single report from a complete system scan, and if there are findings, they should appear in the report.

I have run bulk_extractor on a Windows system without problems, but on Linux I cannot, because of this problem. That is why I was asking whether there is a way to "exclude" folders or symlinks: that way I could run the command with -R / and exclude or ignore system folders such as /dev, /sys, etc.

Thanks for your time
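An exclusion list of the kind requested here could be implemented by pruning directories during the walk. The sketch below is hypothetical C++17 (`walk_excluding` and `is_under` are made-up names, and this thread does not say whether bulk_extractor offers such an option): it uses `std::filesystem::recursive_directory_iterator`, calls `disable_recursion_pending()` on excluded directories so they are never opened, and leaves directory symlinks unfollowed (the default), so link cycles cannot trap it.

```cpp
#include <algorithm>
#include <filesystem>
#include <functional>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// True if `p` equals `prefix` or lies beneath it. Comparison is per path
// component, so "/devices" is NOT treated as being under "/dev".
bool is_under(const fs::path& p, const fs::path& prefix) {
    // Drop a trailing separator so "/dev/" behaves like "/dev".
    const fs::path q = prefix.has_filename() ? prefix : prefix.parent_path();
    auto pi = p.begin();
    for (auto qi = q.begin(); qi != q.end(); ++qi, ++pi) {
        if (pi == p.end() || *pi != *qi) return false;
    }
    return true;
}

// Hypothetical walker: calls `visit` for every entry under `root` except
// those inside an excluded directory, which is pruned without being opened.
// Unreadable directories are skipped; any other iteration error ends the
// walk early (a real tool would report it).
void walk_excluding(const fs::path& root,
                    const std::vector<fs::path>& excluded,
                    const std::function<void(const fs::directory_entry&)>& visit) {
    std::error_code ec;
    fs::recursive_directory_iterator it(root, fs::directory_options::skip_permission_denied, ec), end;
    for (; !ec && it != end; it.increment(ec)) {
        const fs::path& p = it->path();
        const bool skip = std::any_of(excluded.begin(), excluded.end(),
                                      [&](const fs::path& x) { return is_under(p, x); });
        if (skip) {
            it.disable_recursion_pending();  // do not descend into this entry
            continue;
        }
        visit(*it);
    }
}
```

For the case in this thread, a caller might pass `{"/dev", "/sys", "/run"}` as the excluded list and combine the visitor with a regular-file check before reading anything.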

@simsong
Owner

simsong commented May 9, 2024

Hi. If you are asking for -R / to not include all of the directories and file system entries under /, then you are very much requesting modifications.

You haven't had these problems on Windows because Windows doesn't have a unified file system that places devices under a single root directory, and because on Windows, unlike Linux, not everything is a file.
