OCR #65
src/main/kotlin/pl/edu/uj/ii/ksi/mordor/configuration/TesseractConfiguration.kt
src/main/kotlin/pl/edu/uj/ii/ksi/mordor/services/PDFTextExtractor.kt
src/main/kotlin/pl/edu/uj/ii/ksi/mordor/services/ImageTextExtractor.kt
# Conflicts:
#	src/main/kotlin/pl/edu/uj/ii/ksi/mordor/services/text/extractor/AutoDetectTextExtractor.kt
#	src/main/kotlin/pl/edu/uj/ii/ksi/mordor/services/text/extractor/FileContentValidator.kt
Looks almost OK. Please manually check that it works on PDF files from mordor.ksi.ii.uj.edu.pl.
src/main/kotlin/pl/edu/uj/ii/ksi/mordor/services/text/extractor/ImageTextExtractor.kt
src/main/kotlin/pl/edu/uj/ii/ksi/mordor/services/text/extractor/FileContentValidator.kt
Looks OK, just some nits.
if (content == null) {
    return null
}
val modified = content.replace('\n', ' ').replace('\t', ' ')
Suggested change:
- val modified = content.replace('\n', ' ').replace('\t', ' ')
+ val modified = content.replace("\\s".toRegex(), " ")
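For reference, a minimal sketch of how the two expressions differ. The function names and the sample string are hypothetical; only the two `replace` expressions come from the diff above. Note that `\s` also matches carriage returns, form feeds and the space character itself, and that each whitespace character is replaced one-for-one, so runs of whitespace are not collapsed.

```kotlin
// Hypothetical wrappers around the two expressions from the review.

// Original: replaces only newline and tab characters.
fun normalizeChars(content: String): String =
    content.replace('\n', ' ').replace('\t', ' ')

// Suggested: replaces every whitespace character matched by \s with a space.
fun normalizeRegex(content: String): String =
    content.replace("\\s".toRegex(), " ")

fun main() {
    // Windows-style line ending plus a tab (assumed sample input).
    val sample = "line one\r\nline\ttwo"

    // The carriage return survives the char-based version...
    println(normalizeChars(sample).contains('\r')) // true
    // ...but not the regex-based one.
    println(normalizeRegex(sample).contains('\r')) // false
}
```

If collapsing consecutive whitespace into a single space is also wanted, the pattern would have to be `\\s+` instead; that goes beyond what the suggestion asks for.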
        return ImageTextExtractor(tessBaseAPI).extract(file, maxLength)
    }
} catch (e: IOException) {
    logger.error("File can not be read", e)
Suggested change:
- logger.error("File can not be read", e)
+ logger.warn("File can not be read: " + file.absolutePath, e)
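A rough, self-contained sketch of the same idea: log at warning level and include the file path so the failing file can be identified. It uses `java.util.logging` as a stand-in so it runs without extra dependencies; the project's actual logger is not shown in this diff, and `readTextOrNull` is a hypothetical helper, not code from the PR.

```kotlin
import java.io.File
import java.io.IOException
import java.util.logging.Level
import java.util.logging.Logger

// Stand-in logger (assumption): the PR's own logger type is not visible here.
private val logger: Logger = Logger.getLogger("TextExtractorSketch")

// Hypothetical helper: returns the file's text, or null if it cannot be read.
fun readTextOrNull(file: File): String? =
    try {
        file.readText()
    } catch (e: IOException) {
        // An unreadable file is treated as recoverable, hence a warning;
        // the absolute path makes the log entry actionable.
        logger.log(Level.WARNING, "File can not be read: ${file.absolutePath}", e)
        null
    }

fun main() {
    // Assumed not to exist; triggers the warning path.
    val missing = File("no-such-dir/no-such-file.pdf")
    println(readTextOrNull(missing)) // null
}
```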
No description provided.