-
Notifications
You must be signed in to change notification settings - Fork 705
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scan files in the ISO tree as well as the Joliet file tree #938
Scan files in the ISO tree as well as the Joliet file tree #938
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears when testing the poc_eicar.iso and normal_eicar.iso that the first file that is extracted doesn't use utf16 for the filename encoding, so it's printing:
LibClamAV Warning: utf16 length is not multiple of two: 11
And then the filename is like this: 䕉䍁刮�
The next file entry IS utf16 encoded though. Do you know if there is any way to determine if it is actually UTF16 encoded?
To test this, and to test my concern about not aborting the scan if the return code is not success (my other code comment) I tried modifying the script to generate the POC and added 2 extra copies of EICAR to the file. Here's what the JSON looks like when scanning the "normal" one. The --gen-json --debug
output reads:
{
"Magic":"CLAMJSONv0",
"RootFileType":"CL_TYPE_BINARY_DATA",
"FileName":"normal_eicar.3.iso",
"FileType":"CL_TYPE_BINARY_DATA",
"FileSize":67584,
"FileMD5":"0224a60c2c8e3a24f83d34099412eed1",
"EmbeddedObjects":[
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"䕉䍁刮�",
"FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-ba85b2302dd09a88ccfa36b5f8b00e5e.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Win.Test.EICAR_HDB-1"
]
},
{
"FileName":"eicar.com",
"FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-92e42e0d75a5f0374ffc92d4cdd1462a.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"eicar.com2",
"FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-4fe9ba0299e7bb2df20b38642e78fb33.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"eicar.com3",
"FilePath":"/home/micah/workspace/clamav-micah/build/tmp/20230620_210354-normal_eicar.3.iso.1d20366123/normal_eicar.3.iso.6a1bb2008d/clamav-2c67876793e968637519fb6b64eb69cc.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
}
]
}
]
}
One concern I think people may have is that with this approach for "normal" ISO's we may scan each file 2x. I'm not sure what the solution is. I mean,... the clean cache will prevent some amount of re-scanning. But not completely. So I expect scan performance for "normal" ISO's to worsen with this change.
Regarding the warning about filenames, it is just a warning about the length not being a multiple of two. Testing for an odd length and adding 1 to it gets rid of the warning, and gives the same characters that we were getting previously. @micahsnyder what do you think of this? |
Yeah bumping the length sounds good. 👍 |
b8e99e3
to
209b494
Compare
I'm not actually sure what I was thinking. Per my previous comments, when that warning occurs it seems the input is not actually UTF-16, which means converting UTF8 to UTF8 results in gibberish. I was hoping we could determine if it is actually UTF16 before trying to convert it to UTF8. |
I think the correct solution may be to take the filename as is, and let the file system handle the type, so remove the utf16->utf8 conversion all together. @micahsnyder do you have any thoughts on this? |
46343fb
to
fa09905
Compare
That doesn't really solve the issue. Now the filename strings that were UTF16 encoded don't get transcoded and show up as empty strings. E.g. ❯ ./install/bin/clamscan -d ~/clamav.hdb ~/*l_eicar.3.iso --gen-json --debug --leave-temps --force-to-disk --tempdir=(pwd)/tmp -z LibClamAV debug: {
"Magic":"CLAMJSONv0",
"RootFileType":"CL_TYPE_BINARY_DATA",
"FileName":"normal_eicar.3.iso",
"FileType":"CL_TYPE_BINARY_DATA",
"FileSize":67584,
"FileMD5":"0224a60c2c8e3a24f83d34099412eed1",
"EmbeddedObjects":[
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-8eb68b97d63b5a1f61ecadb27e9ca865.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-7b8413cb96a3ec0b21131639f76383ab.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-ffd6c25313cbcf8644b5a5a97dae12a9.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-5e6b883ec08936161fc8b2ab0aac6bc1.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-12beb4289323f286460e1d8767fe1c3b.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
},
{
"FileName":"",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125824-normal_eicar.3.iso.159fee5c6f/normal_eicar.3.iso.6db0e5b1a3/clamav-3e94dc725f8224f4200fa50e57211e94.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f"
}
]
}
]
} I also tried scanning using the full signature database and just noticed some weird behavior where it is re-scanning the embedded ISO9660 object at the same offset (32768) multiple times. My command on my machine is like this:
The metadata from the scan looks kind of like this (i've added Then looking at metadata.json, i'm seeing weirdness. E.g. {
"Magic":"CLAMJSONv0",
"RootFileType":"CL_TYPE_BINARY_DATA",
"FileName":"normal_eicar.3.iso",
"FileType":"CL_TYPE_BINARY_DATA",
"FileSize":67584,
"FileMD5":"0224a60c2c8e3a24f83d34099412eed1",
"EmbeddedObjects":[
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-49615fabc0342bcd076145faf5803d38.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
]
},
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-0158d255d136dab1853da3b5afd140c4.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
]
},
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-ce922cd591b9ce0931ec8734ff47d286.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
]
},
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-315903e475f2e1ca61b191b84a024a2d.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
]
},
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-47db60978699a4db93ad9416466df3e4.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
]
},
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-955ea8beb2a9e1e2e6931dd34cca3b44.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
]
},
{
"FileType":"CL_TYPE_ISO9660",
"Offset":32768,
"ContainedObjects":[
{
"FileName":"EICAR.COM",
"FilePath":"/home/micah/workspace/clamav-micah-2/build/tmp/20230728_125058-normal_eicar.3.iso.7752c48b86/normal_eicar.3.iso.f0320bbd82/clamav-c16ce8ba824d50825d06c915c2c00a3d.tmp",
"FileType":"CL_TYPE_TEXT_ASCII",
"FileSize":68,
"FileMD5":"44d88612fea8a8f36de82e1278abb02f",
"Viruses":[
"Eicar-Test-Signature",
...
]
},
...
}
]
} |
7ff6b94
to
58526c2
Compare
Your latest update resolves the utf16/8 conversion problems. Regarding the weirdness I observed with it extracting ISO's multiple times... I am just a complete idiot. I was testing with daily/main/bytecode/hifistatic/preclass/windows/osx/linux cvds. yeah... all of em. So it had duplicates of all the file type magic sigs. After removing the extra dbs from my db directory, extraction is back to normal. Will give it another once over on the code now that the behavior is fixed with the filenames. |
PR needs to be rebased to pass the build on macOS since the switch to openssl 3.x changes. I can do that. |
fa2c3c5
to
702c9d1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Internal test pipelines look good.
Previously were only scanning the joliet tree when we found it, but files in the ISO tree need to be scanned as well.
Test files created with pycdlib and contain the ClamAV logo.png file.
88f0d17
to
980f893
Compare
No description provided.