Add RTTI thumbnail option #460

Open
Rannek opened this issue Feb 6, 2024 · 15 comments

@Rannek

Rannek commented Feb 6, 2024

bulk_extractor automatically finds and extracts JPG images but the tool does not currently support the extraction of RTTI thumbnail files.

RTTI is a thumbnail file format generated by Raw Therapee. Standard extraction methods cannot recover these files, so bulk_extractor and other software currently overlook them.

I have developed a script that successfully extracts RTTI thumbnail files. I believe integrating this script into bulk_extractor would significantly increase the tool's yield and make it even more versatile.

https://github.com/Rannek/raw-therapee-thumbnail-extractor

@simsong
Owner

simsong commented Feb 7, 2024

Right now, your script is in Python. If you want to rewrite it as a bulk_extractor module, we can take it now. Otherwise, it will need to wait until we can take on Python modules. They will run much slower, because Python runs much slower than C.

How widely is RTTI used?

@Rannek
Author

Rannek commented Feb 7, 2024

Thank you for the answer and the clarification. I will try to rewrite my Python script as a bulk_extractor module in C++.

How widely is it used? That’s a good question. Raw Therapee is an open-source alternative to Adobe Lightroom. It is used mainly to edit RAW images but can handle other formats too. I think it’s pretty well known in the photographic community.

Every Raw Therapee Thumbnail Image (RTTI) begins with either Image8 or Image16, which indicates the number of bits per channel in the thumbnail image in RGB layout.

Screenshot_20240207_080858
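
The signature check described above can be sketched in a few lines of Python. This is only an illustration, not code from the extractor script: the function name `rtti_bits_per_channel` is hypothetical, and the sketch assumes the signature is followed directly by a newline byte (0x0a).

```python
from typing import Optional

# Signatures named in this thread, mapped to bits per channel.
# Assumption: each signature is terminated by a newline byte.
RTTI_SIGNATURES = {b"Image16\n": 16, b"Image8\n": 8}


def rtti_bits_per_channel(buf: bytes) -> Optional[int]:
    """Return 8 or 16 if buf starts with an RTTI signature, else None."""
    for signature, bits in RTTI_SIGNATURES.items():
        if buf.startswith(signature):
            return bits
    return None
```

Including the newline in the signature keeps `Image8` from matching unrelated text that merely starts with the same letters.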

@simsong
Owner

simsong commented Feb 7, 2024

It's not very hard. You will need to have a test file as well. Please check out the src directory.

@simsong
Owner

simsong commented Feb 7, 2024

I'm happy to review your code and otherwise help out!

@Rannek
Author

Rannek commented Feb 8, 2024

I converted my script to C++ and also made it support binary files, but I can't integrate it as a bulk_extractor module because it exceeds my skills. I am pretty much a noob at C++ and Makefiles and still learning.

You can find my script here, with test files: https://github.com/Rannek/rtti_cpp/

I hope it can be implemented into bulk_extractor somehow.

@simsong
Owner

simsong commented Feb 8, 2024

Thanks. Congrats on getting the program to work.
Your program depends on OpenCV. I don't want to build OpenCV into bulk_extractor, so I really can't use your code as-is. But it's a start.

Can you give me an idea of how widely RTTI is used?

Can you put together a corpus of 3-4 RTTI files that I can use for tests?

What are the tools that read and write RTTI images?

@Rannek
Author

Rannek commented Feb 8, 2024

Thank you very much! Not wanting to integrate OpenCV is completely understandable. Maybe there is a more elegant solution to this. I need to further investigate it.

How widely is Raw Therapee used?

  • It was released in 2005, and the forum associated with it, Raw Therapee Forum, has over half a million views and more than 10,000 replies.
  • On GitHub, it has 2.4k stars.
  • There are not many alternatives to Adobe Lightroom, so a common choice is Raw Therapee or Darktable.
  • All of the articles recommend Raw Therapee for people searching for an open-source alternative to Lightroom.

How often this file type would appear in an evidence scan is a good question. It is definitely not as common as Windows thumbcache files, but I think every piece of evidence matters in evidence searching. As far as I know, no other forensic tool can search for this thumbnail type; they only search for common image formats.

Maybe it will appear in one hard drive out of 50 when searching for evidence, but if that one helps, it was worth it.

Of course, Windows thumbcache, unlike Raw Therapee, is not limited to photographers. But this could also be an advantage: if you find this type of thumbnail, you can be sure that there will be a lot of thumbnail files (photographers usually have a lot of pictures).

One additional advantage is that regular file cleaners (CCleaner, Windows built-in cleaner) clear the thumbcache folder, but they miss this folder. This is not true on Linux, though. (BleachBit cleans the .cache folder)

What are the tools that read and write RTTI images?

Only Raw Therapee uses it. It builds a thumbnail folder so the next time the user opens the program, it does not need to generate the thumbnails again (similar to Windows Thumbcache). It does this for every image displayed (even if it is not edited).

I placed 4 .rtti files in the rtti_cpp/rtti_testfiles/ folder in their unmodified format (as the program outputs them) in different aspect ratios for testing.

You can process them one by one, or you can even cat them into one file and the program will still read them.
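
To illustrate why concatenated files can still be read, here is a minimal Python sketch that scans a buffer for every 8-bit thumbnail. It is not the rtti_cpp implementation. It assumes the layout described later in this thread (signature, newline, then width and height as 32-bit little-endian integers, then width × height × 3 bytes of RGB data). The function name `find_rtti_thumbnails` and the 4096-pixel sanity limit are hypothetical.

```python
import struct
from typing import Iterator, Tuple

SIGNATURE = b"Image8\n"  # only the 8-bit variant is handled in this sketch
MAX_DIMENSION = 4096     # arbitrary sanity limit to reject chance matches


def find_rtti_thumbnails(buf: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (offset, width, height) for each plausible Image8 thumbnail."""
    pos = buf.find(SIGNATURE)
    while pos != -1:
        header_end = pos + len(SIGNATURE) + 8  # two 32-bit integers
        if header_end <= len(buf):
            width, height = struct.unpack_from("<II", buf, pos + len(SIGNATURE))
            data_len = width * height * 3  # 3 bytes per RGB pixel
            if (0 < width <= MAX_DIMENSION and 0 < height <= MAX_DIMENSION
                    and header_end + data_len <= len(buf)):
                yield pos, width, height
                # Skip the pixel data so bytes inside it are not rescanned.
                pos = buf.find(SIGNATURE, header_end + data_len)
                continue
        pos = buf.find(SIGNATURE, pos + 1)
```

Because each hit carries its own dimensions, the scanner knows where one thumbnail ends and can resume the search after it, which is what makes a concatenated file or a raw disk buffer workable.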

@simsong
Owner

simsong commented Feb 8, 2024

Got it. Okay, I'll add it in.

@Rannek
Author

Rannek commented Feb 8, 2024

Thank you very much! If I can assist you in any way, please let me know.

@Rannek
Author

Rannek commented Feb 8, 2024

I found something useful. I discovered that the thumbnail images are actually PPM files (Wikipedia).

If I replace the RTTI header with a PPM header, the file becomes a PPM image, so you don't need OpenCV or anything else. The width and height are stored as little-endian integers right after the newline character. I think I overcomplicated this a bit.

For example:

496d61676538 0a 8002 0000 2003 0000 1a Image8..... ....

Becomes:

  • Width: 80 02 → 0x0280 → 640 (decimal)
  • Height: 20 03 → 0x0320 → 800 (decimal)

After conversion, the header:

5036 0a36 3430 2038 3030 0a32 3535 0a1a P6.640 800.255..

After that, you can convert the .ppm file to .jpg with the convert program from ImageMagick or with another tool. Maybe this helps.
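
The header swap above could look roughly like this in Python. This is a sketch under assumptions, not the final program: it handles only Image8, reads the dimensions as 32-bit little-endian integers (as the hex dump suggests), and assumes the pixel data that follows is already in the RGB byte order a binary PPM (P6) expects. The function name `rtti8_to_ppm` is hypothetical.

```python
import struct

RTTI_SIGNATURE = b"Image8\n"


def rtti8_to_ppm(rtti: bytes) -> bytes:
    """Replace the RTTI header with a binary PPM (P6) header."""
    if not rtti.startswith(RTTI_SIGNATURE):
        raise ValueError("not an Image8 RTTI thumbnail")
    offset = len(RTTI_SIGNATURE)
    # Width and height: two 32-bit little-endian integers after the newline.
    width, height = struct.unpack_from("<II", rtti, offset)
    start = offset + 8
    pixels = rtti[start:start + width * height * 3]
    if len(pixels) != width * height * 3:
        raise ValueError("truncated pixel data")
    # "255" is the maximum channel value for 8 bits per channel.
    return b"P6\n%d %d\n255\n" % (width, height) + pixels
```

For the example header, `struct.unpack("<II", bytes.fromhex("8002000020030000"))` gives `(640, 800)`, which matches the `P6.640 800.255.` header shown above.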

@simsong
Owner

simsong commented Feb 8, 2024

Yay! That's great research. Do you want to change your program and see if it still works? Do you wish to add tests in your program?

@Rannek
Author

Rannek commented Feb 8, 2024

Thank you!

Yes, I will try to rewrite my program to be a simple header replacement, so it does not need OpenCV at all.

Also, I will try to find a way to convert the output to JPG without overcomplicating things or adding external libraries. A lot of image editors and viewers support PPM by default, so it's a bonus step.

Yes, I'm planning to add tests to my program. I want to first ensure that I have found the simplest and most effective solution. Do you have any recommendations?

I will update this thread soon with my progress.

@simsong
Owner

simsong commented Feb 8, 2024 via email

@Rannek
Author

Rannek commented Feb 9, 2024

I revised my code. Now, it functions without OpenCV, and the output files are in .bmp format. It operates the same as before. https://github.com/Rannek/rtti_cpp/
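
For readers wondering how BMP output is possible without an imaging library, here is a minimal Python sketch of a 24-bit uncompressed BMP writer. It is not taken from the rtti_cpp repository. The function name `rgb_to_bmp` is hypothetical, and the input is assumed to be top-down 8-bit RGB data such as an Image8 thumbnail's pixel block.

```python
import struct


def rgb_to_bmp(width: int, height: int, rgb: bytes) -> bytes:
    """Wrap top-down 8-bit RGB pixel data in a 24-bit uncompressed BMP."""
    if len(rgb) != width * height * 3:
        raise ValueError("pixel data does not match the dimensions")
    stride = width * 3
    padding = bytes(-stride % 4)  # each BMP row is padded to 4 bytes
    rows = []
    for y in range(height - 1, -1, -1):  # BMP stores the bottom row first
        row = rgb[y * stride:(y + 1) * stride]
        bgr = bytearray(row)
        bgr[0::3], bgr[2::3] = row[2::3], row[0::3]  # RGB -> BGR
        rows.append(bytes(bgr) + padding)
    pixel_data = b"".join(rows)
    # 40-byte BITMAPINFOHEADER: 1 plane, 24 bpp, no compression, ~72 DPI.
    info = struct.pack("<IiiHHIIiiII", 40, width, height, 1, 24, 0,
                       len(pixel_data), 2835, 2835, 0, 0)
    # 14-byte file header: magic, file size, reserved, pixel data offset.
    file_header = struct.pack("<2sIHHI", b"BM", 54 + len(pixel_data), 0, 0, 54)
    return file_header + info + pixel_data
```

The three differences from PPM are the fixed binary headers, the bottom-up row order, and the BGR channel order with row padding. Beyond those, the pixel bytes are carried over unchanged.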

Now, I need to figure out how to convert it into a bulk_extractor module. If I understand correctly, the basic logic involves changing std::ifstream to the bulk_extractor stream. I need to review the bulk_extractor code more thoroughly.

@simsong
Owner

simsong commented Feb 9, 2024 via email
