Bug fixes and complete refactoring #3
…tests, for detailed changes see CHANGELOG and pull request GioviQ#3
I forgot to ask some questions about special cases that are defined in the standard but were not, and still are not, implemented correctly, about details whose correctness I am unsure of, and about potential improvements.

Deviations from the standard

Inconsistencies between documents

Proposed improvements
I thought of some features that might be useful to some people, but I am unsure whether implementing them would be worth it.

Please take your time and don't rush with a response. I am not in a hurry 😊
My implementation is based on real cases, and not all specifications are implemented. Keep in mind that countries do not always follow the standard.
My primary need is to establish the age of a person, so if you want, you can return a null date of birth instead of an exception.
Implementation based on https://github.com/ZsBT/mrz-java/blob/master/src/main/java/com/innovatrics/mrz/records/FrenchIdCard.java
See cheminfo/mrz#2
No, because the optical reader often returns wrong readings, so why should I skip checks?
I don't need it. My use case is to establish a person's age and the document's expiration.
Foreword
First of all, I want to apologize for the large scale of this pull request. If you need any assistance or want me to make changes before merging, please don't hesitate to ask. I tried to preserve as much as possible of the idea behind your code while improving its structure and readability. All changes that users of the library will experience are listed in the `CHANGELOG` file. In this pull request, I want to state the reasoning behind certain changes I made.

Project structure
I renamed the test project to adhere to the commonly used naming scheme `ProjectName.Test`. I also added a `LICENSE` file to your repository that contains the Apache 2.0 license to match the `PackageLicenseExpression` you declared in `MRZ.csproj`. Furthermore, I added a `CHANGELOG` file that lists the changes a user of the library will experience when upgrading to a newer version.

I removed the `Parser` class because it offered no benefit: it managed no state other than the unnecessary `sum` variable, which could introduce a race condition if `Parser.DetectFields` was called simultaneously from multiple threads. I have rewritten your helper functions to be (essentially) pure functions, which means that the code is now thread-safe.

Document class
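To make the thread-safety point about the helper functions concrete, here is a minimal sketch of a pure check-digit function. It follows the ICAO 9303 weighting (7, 3, 1, repeating); the function names are hypothetical and not taken from the library, and the running sum lives in a local variable so concurrent callers never share it:

```csharp
using System;

// Sample values as used in ICAO 9303 illustrations.
Console.WriteLine(ComputeCheckDigit("L898902C3")); // 6
Console.WriteLine(ComputeCheckDigit("740812"));    // 2

// Hypothetical helper: depends only on its argument, touches no shared state.
static int ComputeCheckDigit(string field)
{
    int[] weights = { 7, 3, 1 };
    int sum = 0; // local accumulator instead of a shared field
    for (int i = 0; i < field.Length; i++)
    {
        sum += ValueOf(field[i]) * weights[i % 3];
    }
    return sum % 10;
}

// Maps an MRZ character to its numeric value: 0-9 as is, A-Z as 10-35, '<' as 0.
static int ValueOf(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;
    if (c == '<') return 0;
    throw new FormatException($"Invalid MRZ character '{c}'.");
}
```

Because the function only reads its argument, it can be called from several threads at once without locking.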
I renamed the `Format`, `Type` and `Number` properties to `DocumentFormat`, `DocumentType` and `DocumentNumber` to express that their values refer to the document itself and not to the person represented by the document. I added a private property for each check digit so that reading values can be decoupled from validation. I removed the `Gender` enum and changed the data type of the `Gender` property to `char` because a user of the library most likely has their own way of encoding gender information, and in my opinion they should perform the mapping themselves. If a value is not present in the MRZ, the default value is now an empty string instead of `null` because that is the default value for a missing capturing group. I am unsure whether this is an improvement or a deterioration, so I would kindly ask for your feedback on this.

Regex
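For reference, a minimal sketch of how named capturing groups behave in .NET, including the empty-string value of a group that did not take part in the match. The pattern and the group names are illustrative assumptions, not the library's actual MRZ patterns:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, heavily simplified pattern: a document type letter followed by
// an optional subtype letter and a filler character.
const string Pattern = @"^(?<documentType>[ACI])(?<subtype>[A-Z])?<";

Match match = Regex.Match("I<UTO", Pattern);

// Groups are read by name instead of by position.
Console.WriteLine(match.Groups["documentType"].Value);  // I

// The optional group did not participate, so its value is the empty string.
Console.WriteLine(match.Groups["subtype"].Success);     // False
Console.WriteLine(match.Groups["subtype"].Value == ""); // True
```

Reading groups by name is what makes a single initialization block possible: every format can expose the same group names regardless of where the fields sit in the line.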
I added names to all capturing groups, which in my opinion greatly improves the readability of all regex patterns and also enabled me to write one common `Document` initialization block for all MRZ formats.

Tests
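The character-class pitfall covered in this section can be reproduced in a few lines: inside square brackets, `|` is a literal character, not alternation. Both patterns below are simplified stand-ins chosen for illustration, not the library's actual patterns:

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed shape of a faulty class: '|' becomes a literal member of the set.
const string Buggy = "^[A|C|I]$";
// Corrected class: exactly one of A, C or I.
const string Corrected = "^[ACI]$";

Console.WriteLine(Regex.IsMatch("|", Buggy));     // True
Console.WriteLine(Regex.IsMatch("|", Corrected)); // False
Console.WriteLine(Regex.IsMatch("C", Corrected)); // True
```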
I found that the old regex patterns allowed the `|` character for `DocumentType` and `Gender` values and for the seventh position of the first SDL line. In regex, `[ACI]` means "a single character of: A, C or I", and a `|` within square brackets is treated as a literal character and not as `OR`, so a class written as, say, `[A|C|I]` also accepts `|`. I fixed the patterns and added a unit test that checks that an exception is thrown if `|` is present in any position of any MRZ format.

Benchmarks
I ran benchmarks on my machine to make sure that my refactored code does not perform significantly worse than your old code. As it turns out, my refactored code is 5-10x faster and allocates 5-7x less memory per execution. I strongly suspect that these improvements are due to my use of the `Regex.IsMatch` method, which internally performs regex caching. This approach is also considered best practice.

NuGet (MRZ.csproj)
I added the `LICENSE`, `CHANGELOG` and docstring files to the NuGet package. This did not increase the package size; it actually decreased from 39kB to 34kB. The content of the `CHANGELOG` file is also automatically used as the content of the `PackageReleaseNotes` tag for the NuGet package.

To allow for deterministic builds, please build with the `-p:"ContinuousIntegrationBuild=true"` flag. Packing now also produces a symbol package that should also be uploaded to NuGet to enable better debugging support for users in combination with SourceLink.

Here is an example of how I would build and publish the packages:
The `dotnet nuget push` command will publish both the code and the symbol package if they are located in the same folder.

Thank you for this great library and again, if you need any assistance with this pull request, please don't hesitate to ask.