Add binary vs. textual file enumeration. #2739
Conversation
@@ -25,108 +24,4 @@ public ArtifactProvider(IEnumerable<IEnumeratedArtifact> artifacts)

    public IFileSystem FileSystem { get; set; }
}

public class SinglethreadedZipArchiveArtifactProvider : ArtifactProvider
public string Contents
{
    get => GetContents();
    get => GetArtifactData().text;
This is the core change. On any attempt to access a property that returns artifact contents, we attempt to fault in the data. This may entail reading the contents from disk or a stream. We now sniff the data to determine whether it is binary or textual and populate Contents or Bytes accordingly.
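The fault-in behavior described above can be illustrated with a small sketch (Python used for illustration; the names `get_artifact_data` and `looks_binary` are hypothetical stand-ins, not the SDK's actual API):

```python
def looks_binary(raw, limit=1024):
    # Simple stand-in classifier: any NUL byte in the header region.
    # The real classifier is more nuanced (see the heuristic discussion below).
    return b"\x00" in raw[:limit]

def get_artifact_data(read_bytes):
    """Sketch of the fault-in logic: read raw data once, sniff it,
    and populate exactly one of (text, bytes)."""
    raw = read_bytes()
    if looks_binary(raw):
        return None, raw                   # binary: Contents stays null, Bytes is set
    return raw.decode("utf-8"), None       # textual: Contents is set, Bytes stays null
```

The key invariant is that exactly one of the two values is populated, which is what lets callers use `Contents == null` as a binary signal.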
/// <param name="rawData">The raw data.</param>
/// <param name="start">The start.</param>
/// <returns></returns>
public static bool CheckForByteOrderMark(byte[] rawData, out Encoding encoding, int start = 0)
We do not have explicit testing/use of BOM detection. I have a stubbed test but haven't nailed down how to generate useful test cases. I'm leaving this (and the stub test) for the future.
In practice, detecting/handling esoteric encodings is not a prominent concern for most analysis scenarios.
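For reference, BOM detection itself is mechanical; a hedged sketch of the kind of check `CheckForByteOrderMark` performs (not the SDK's implementation):

```python
import codecs

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE
# BOM bytes (FF FE), so the longer signatures must be tested first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def check_for_byte_order_mark(raw, start=0):
    """Return the encoding named by a BOM at raw[start:], or None."""
    for bom, name in _BOMS:
        if raw[start:start + len(bom)] == bom:
            return name
    return None
```

Useful test cases are easy to generate from the `codecs` BOM constants, which may help with the stubbed test mentioned above.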
@@ -517,6 +518,7 @@ protected override string ConstructTestOutputFromInputResource(string inputResou

mockFileSystem.Setup(x => x.DirectoryGetDirectories(It.IsAny<string>())).Returns(Array.Empty<string>());
mockFileSystem.Setup(x => x.DirectoryEnumerateFiles(inputLogDirectory, inputLogFileName, SearchOption.TopDirectoryOnly)).Returns(new string[] { inputLogFilePath });
mockFileSystem.Setup(x => x.FileReadAllText(inputLogFilePath)).Returns(v2LogText);
mockFileSystem.Setup(x => x.FileOpenRead(inputLogFilePath)).Returns(new MemoryStream(Encoding.UTF8.GetBytes(v2LogText)));
src/Sarif/EnumeratedArtifact.cs
GetArtifactData();
this.isBinary = this.contents == null;
return this.isBinary.Value;
Combining the last two statements, will the return value be null? #Resolved
Yes! And we can just delete the backing field.
src/Sarif/EnumeratedArtifact.cs
internal bool IsArtifactBinary() |
Suggest: make private to narrow API surface to IsBinary and focus internal use/experience on the more public version? #Resolved
'internal' is still private; I tend to do this to make things testable but not public. As it happens, I can now delete this helper entirely.
It now occurs to me: this new property value is untested! Need to add it to the test validations, so I'm glad you pointed to it. :)
    throw new InvalidOperationException("Stream is not seekable. Provide a seekable stream or set the 'SupportNonSeekableStreams' property.");
}

this.bytes = new byte[Stream.Length];
Why is it the case that we can categorize a Seekable stream in 4096 bytes, but we need to read in the full stream for a Nonseekable one? If 4096 is sound, I suspect we'd both love to take a one-block peek in the Nonseekable case and use that as the starting buffer for a full Stream or byte[] read, depending on what is appropriate. So I probably just don't see a fatal flaw yet.
Where is 4096 coming from in the Seekable case? Looking at the implementation of FileEncoding, I'd optimistically credit that we probably don't get much more accuracy in the classifier after this many bytes, if that's coming from tribal knowledge or guidance stemming from other users of this code. But right now that's just a brittle prior without any experience with so many of these file extensions, so I'm really curious what the author guidance or community discourse looks like. #Pending
Here's maybe a way to put a finger on what I'm getting at:
The OSS code snippet we're taking asserts that if you see no more than 10% of the bytes as being non-textual, then that's a good-enough heuristic. I buy that. Not second-guessing it. But that's the kind of suspiciously-round number someone just yanked out of their brain one day and it just worked well enough in the real world that it stuck and became an institution.
Maybe 4096 is the same way and we don't need to do full file scans to classify in any scenario? This is what I'm wondering/hoping. (Not asserting.)
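The 10% rule under discussion is easy to pin down concretely. A hedged sketch of that style of classifier (an illustration of the heuristic, not the OSS snippet the PR vendored; the whitelist set is an assumption):

```python
# Control bytes that commonly appear in ordinary text files.
_TEXT_WHITELIST = set(b"\t\n\r\f\b\x1b")

def is_probably_binary(raw, peek=1024, threshold=0.10):
    """Classify by the ~10% rule: if more than `threshold` of the peeked
    bytes are control characters outside the usual text controls, call
    the data binary."""
    sample = raw[:peek]
    if not sample:
        return False
    suspicious = sum(1 for b in sample if b < 0x20 and b not in _TEXT_WHITELIST)
    return suspicious / len(sample) > threshold
```

Note that both knobs (`peek` and `threshold`) are exactly the kind of suspiciously round numbers being questioned here; the sketch makes them parameters so they can be tuned empirically.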
OK, so first, 4096 is way too much data and I dropped it to 1024. This is probably still more data than we need.
Let's talk offline further about this (in particular about whether we can address the inefficiency in the non-seekable case of reading the whole thing). I'm just not aware of how we can handle a string decoding from the original non-seekable stream if we've already obtained 1024 bytes from it.
You're right. We can't. I was basically proposing a copy with extra steps. (And apologies if you're getting this twice. I tried the e-mail reply feature but don't see it in review.)
For a large file, it might be better to close and re-open the stream after reading 1k. That would work. But I was wrong that we can do this in about the same speed in both cases, so the API structure is fundamentally sound.
Actually: if the zip archive is always already in memory, maybe closing/reopening is trivially cheap?
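The close-and-reopen idea can be sketched as follows (Python for illustration; `open_stream` is a hypothetical factory that produces a fresh stream on each call, standing in for reopening a zip entry):

```python
def classify_then_read(open_stream, peek=1024):
    """Peek one block from a fresh stream to classify the data, then
    reopen and read in full. Avoids needing Seek on the stream itself;
    the cost of 'reopening' depends entirely on what backs the stream."""
    with open_stream() as s:
        header = s.read(peek)
    is_binary = b"\x00" in header  # stand-in classifier for the sketch
    with open_stream() as s:
        data = s.read()
    return is_binary, data
```

If the archive is already in memory, the second open is cheap; if reopening means going back to the network or re-decompressing, the full-read path may still win.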
this.Stream.Seek(0, SeekOrigin.Begin);

byte[] header = new byte[4096];
int length = this.Stream.Read(header, 0, header.Length);
I am fully cognizant that this is almost certainly out of scope for what you've set out to accomplish but I'm duty-bound to wonder aloud when I see more synchronous code being crafted: do you see a path to the Task Programming Model? #Pending
Sure thing! It is a SMOP. Just a 'small matter of programming'. :)
{
    var artifacts = new List<IEnumeratedArtifact>();

    foreach (ZipArchiveEntry entry in zipArchive.Entries)
Will this indirectly recurse? Do we want it to? (I imagine we do.)
(This might become obvious to me later, but tossing it out there in case it doesn't and I forget.)
So, this should not. This is the zip archive's own enumerator, which handles traversing all zip contents. As far as I'm aware, the zip archive format does not allow features (such as a kind of symbolic link) that would result in any cycles/recursion.
I was unclear and imprecise. The zip archive doesn't know anything about the contents of its individual entries. It's counterproductive to zip up a file that is already zipped, but you can. And it probably happens in some cases where people are automating bulk archiving where other processes and/or user behavior may have caused some of the contents to be zipped up.
To get super concrete: if we're ever working through the entries in "BulkArchive.zip" and find an entry such that entry.Name == "doublezipped.zip", will we crack open doublezipped.zip?
Gotcha, thanks. No, we do not recurse into other zip archives. In theory, we could instantiate a new provider in this case. In theory, we could try to expand other arbitrary archives as well. We've discussed this as an interesting possibility for the tool (you and I and others). But it's not realized today. I tend to think we'd want a different design if we had this capability beyond a simple iteration paradigm. Maybe a queue or more of a subscription model where something could observe names/file types and just grab what was interesting to them.
{
    get
    {
        lock (this.archive)
Maybe strengthen error checking in the constructor to preclude the null condition and elide the lock?
Edit: maybe the lock is still necessary. (That's not clear to me.) The null-checking could potentially be elided if the constructor enforces non-null values. #Resolved
Bonus: if we do need locking, it would be nice to be locking the entry as opposed to the archive. (Assuming you don't need a whole archive locked to safely open an entry and such.)
You have to lock on the archive! Too bad. We tested the other case. There must be (and it makes sense that there is) state maintained at the archive level about decompression/other entry state. I'm not sure why we permit a null entry. Will try to rip that out.
string extension = Path.GetExtension(Uri.ToString());
if (this.binaryExtensions.Contains(extension))
{
    this.bytes = new byte[Stream.Length];
Do all our underlying Streams support .Length?
It is more defensive to promote these .bytes fields to MemoryStreams and use Stream.Copy. If I'm reading the docs right: wherever the underlying Stream supports .CanSeek, the MemoryStream can be precisely presized by safely querying .Length. Otherwise, .Length can throw for some implementations (but not others.)
The tradeoff of doing this is that if we ever needed a precisely-sized byte[] for an artifact later, we would potentially need to take a copy. (But not in cases where we were able to size it precisely.)
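The defensive pattern being suggested (presize only when length is safely queryable, otherwise copy incrementally) looks roughly like this, sketched in Python with `seekable()` standing in for .NET's `CanSeek`:

```python
import io

def read_fully(stream):
    """Presize the read when the stream is seekable (where querying the
    length is safe); otherwise grow chunk-by-chunk, never touching Length."""
    if stream.seekable():
        here = stream.tell()
        size = stream.seek(0, io.SEEK_END)  # returns the end offset
        stream.seek(here)
        return stream.read(size - here)
    chunks = []
    while chunk := stream.read(8192):
        chunks.append(chunk)
    return b"".join(chunks)
```

The tradeoff mirrors the one above: the chunked path may over-allocate internally, so a later consumer wanting an exactly-sized buffer might need one more copy.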
ZipArchive supports providing a decompressed length. What's the performance impact of creating a memory stream as you describe? Will it effectively use the wrapped stream as its buffer? And so the length computation is computable before the memory stream did it? :)
This sounds like a nice next clean-up. It also makes me wonder if I've been over-concerned about handling non-seekable streams. This case may be entirely unremarkable based on what you've pointed out: we should simply read the original stream into a byte array, sniff it and then, if we need to decode, we wrap it in a memory stream (which will use the byte array as its buffer and not create a new copy).
If we know we have .Length always, then it doesn't matter if we store these things as byte[]s or MemoryStreams. We can go back and forth fluidly. (ie: new MemoryStream(this.bytes)). Which means we can get a decoded string easily, as you say. But that's a second copy. (Copy 1: zip stream into memory stream or byte[]; copy 2: into string). You've convinced me this is unavoidable for Nonseekables, at least without closing/reopening the stream.
For Seekables, it's worth avoiding. We can just do one copy. (Copy 1: zip stream into string.)
OK. But for this specific location, ZipArchive, these streams are never seekable. Do you have a suggestion, or do you agree this code can ship as is? :)
if (Stream == null && this.contents == null)
if (Stream == null && this.contents == null && this.bytes == null)
Made this consistent for this file. We should discuss this style point; whether properties have a this. prefix is very inconsistent across the code base.
@@ -5,6 +5,7 @@
using System.Collections.Generic;
using System.IO;
using System.Net.Mime;
using System.Text;

using FluentAssertions;
Esp. since there are some new files here and you're a fellow appreciator: have you measured code coverage? If so: where did it land?
Oops! Thanks for asking, I meant to provide that data already. Those new files are mostly refactored classes. Here's the coverage data for the classes that have functional changes. I took a little time to fill out testing for ZipArchiveArtifact.
94.38% : EnumeratedArtifact
97.22% : MultithreadedZipArtifactProvider
93.59% : ZipArchiveArtifact
{
    if (rawData[i - 1] == 0 && rawData[i] == 0)
    {
        if (++nullSequences > 1)
The problem, I believe, is that some BOM sequences have a '\0' character (but no more than one). So your heuristic might make us believe that a text file with a BOM is binary.
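This concern is easy to verify concretely. A miniature version of the consecutive-NUL heuristic, shown against the standard BOM constants (Python for illustration):

```python
import codecs

def has_null_pair(raw):
    # The consecutive-NUL heuristic under discussion, in miniature:
    # true if any two adjacent bytes are both zero.
    return any(raw[i] == 0 and raw[i + 1] == 0 for i in range(len(raw) - 1))

# The UTF-32 LE BOM is FF FE 00 00: it contains a NUL pair, so a pure-text
# file can trip the heuristic unless the BOM is detected/stripped first.
# The UTF-16 LE BOM (FF FE) does not, and UTF-16-encoded ASCII interleaves
# single NULs without ever producing an adjacent pair.
```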
    throw new InvalidOperationException("Stream is not seekable. Provide a seekable stream or set the 'SupportNonSeekableStreams' property.");
}

this.bytes = new byte[Stream.Length];
Well, our driver framework assumes enumerated artifacts aren't shared across threads. Madness would ensue. A single thread gets a single scan target, and all rules run against that target on that thread.
I do see that some other scenarios would find value in hardening for this. But I wouldn't do it prematurely or without testing the perf impact for our core scenario.
@@ -737,7 +737,7 @@ private async Task ScanTargetsAsync(TContext globalContext, IEnumerable<Skimmer<

DriverEventSource.Log.ReadArtifactStart(filePath);
// Reading the length property faults in the file contents.
long sizeInBytes = perFileContext.CurrentTarget.Contents.Length;
long sizeInBytes = perFileContext.CurrentTarget.SizeInBytes.Value;
Yes, and we should not provoke those conditions. :) There is a unit test for this. What actually happens is that we will raise InvalidOperationException for the weird cases, so the nullness operator doesn't help.
In our code, however, we always provide a URI and contents, so there shouldn't be a problem. You do not want to code defensively in these cases, because if you do you'll mask an issue that you'd assumed, by design, is protected against.
That can lead to very subtle bugs. Better is to have the code as it is: we think our objects have integrity; if not, we fail fast with an exception.
    }
}

private void ValidateBinaryArtifact(EnumeratedArtifact artifact, int sizeInBytes)
Inferring standards from practice: are we okay with sequences of assertions wherever the [Fact] is validating a single input/case, as opposed to a multiplicity of them?
I'm hypothesizing that if this method were ValidateBinaryArtifacts (plural) then it would be using a StringBuilder.
Yes! For a true, more atomic unit test, it's fine to break early on the first condition that fails. There are two interesting things to consider: one is what failure information is most useful to the debugger; the second is to ensure that the literal output of the test is illustrative.
Some teams have stringent standards around emitting textual messages with all failures. Seeing 'Expected '17' but observed '13'' isn't always helpful.
string guid = Guid.NewGuid().ToString();

#pragma warning disable SYSLIB0001 // Type or member is obsolete
🧐
Which type or member is obsolete and why can't we use the shiny new stuff?
Oops! Thanks for catching this. So UTF7 is marked as obsolete in .NET. I previously was playing around with this encoding to test the behavior of our encoding sniffing. This turned into a rabbit hole so I pulled back (but left the #pragma in place).
MemoryStream is backed by a byte[]. Presumably, it grows exponentially, but the constructor takes a size argument so you can presize it correctly when we can read Length. You can get the raw underlying buffer via GetBuffer(), though if it wasn't sized precisely it will tend to be bigger than the actual data. So in the case where we can read .Length, the cost is about the same as a byte[].
You're probably hating the bad abstraction -which... fair.
If you're sure we can always read .Length then it doesn't get us much. Once we have a pre-sized byte[] in memory, we can just wrap that in a MemoryStream if we ever want to treat it as a Stream. (ie: to do an async stream copy, decode it as text --whatever.)
The last lingering question on my mind is if it might be fine to close/reopen the stream when we cannot seek to origin. (Apologies if I'm missing a response on this one.) I think that depends on what's underneath the stream. If it's in memory already, it should be cheap, and that could allow the code to not care about seekability at the API level.
If close/reopen means you go to network a second time, then you'd need to know it's large for that to be worth it.
Since the proposed API design solicits a seekability behavior preference, there's an implication for the API surface here, and I don't know what our commitments are, to whom, or when they start applying in terms of those becoming intractable to change. I think my parting suggestion for consideration is that it might be better to remove us caring about seekability in our API surface, handle the details internally, and later add it if we definitely need it --than to go in the other direction. But as noted: I don't know the tradeoffs there, so I'll leave the question to you.
This is awesome and I'm super stoked to see where your coverage numbers landed.
EnumeratedArtifact now sniffs artifacts to distinguish between textual and binary data:
- The Contents property will be null for binary files (use Bytes instead).
- EnumeratedArtifact raises InvalidOperationException for non-seekable streams (unless the SupportNonSeekableStreams property is set). Non-seekable streams result in performance inefficiencies when expanding.
- MultithreadedZipArchiveArtifactProvider now distinguishes binary vs. textual data using a hard-coded binary file extensions list. This data will be made configurable in a future change.
- EnumeratedArtifact now automatically detects and populates a Bytes property for binary files such as executables and certificates.

Previously, artifact enumeration assumed all data consisted of text. This breaks attempts to load/examine binary data.

In EnumeratedArtifact, we now examine the beginning of all seekable streams to detect whether we are seeing textual or binary data. We do so using an OSS snippet that looks for null/control characters in that data. This code also attempts to retrieve a range of text file encodings, but this capability is not verified. Binary files are retrieved as byte arrays. Text files go to the existing Contents property.

If we have a non-seekable stream, we must retrieve the file as a byte array and then convert it to text if we determine that it is textual. This is a performance degradation from our previous implementation, so we require users to explicitly set a SupportNonSeekableStreams property to ensure they know what they are opting into.

Separately, the MultithreadedZipArchiveArtifactProvider does enumerate non-seekable streams, since they comprise compressed data. This provider uses a specialized ZipArchiveArtifact kind. Rather than accepting the costs of double-transforming a text stream to bytes and then a decoded string, we simply hard-code a set of binary file extensions and only retrieve these as bytes. This property is settable. This isn't an ideal solution, but it works for our immediate scenarios. You could imagine updating this class to optionally allow for the non-performant behavior (expand as bytes, sniff to classify as binary or text, convert text to strings). This would allow for higher-fidelity classification without requiring advance knowledge of file extensions.

Code coverage information:
94.38% : EnumeratedArtifact
97.22% : MultithreadedZipArtifactProvider
93.59% : ZipArchiveArtifact
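The hard-coded extension approach described above amounts to a simple lookup; a hedged sketch (Python for illustration; the extension set here is a hypothetical stand-in for the SDK's list, which is slated to become configurable):

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for the hard-coded binary extensions list.
BINARY_EXTENSIONS = {".dll", ".exe", ".pdb", ".cer", ".der", ".zip"}

def classify_by_extension(uri):
    """Classify an artifact as binary or text purely from its extension,
    trading classification fidelity for avoiding a sniffing read."""
    ext = PurePosixPath(uri).suffix.lower()
    return "binary" if ext in BINARY_EXTENSIONS else "text"
```

The limitation called out in the description is visible here: a binary file with an unlisted extension is misclassified as text, which is why sniff-based classification remains the higher-fidelity option.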