Story/geco 122 #23
base: develop
Conversation
…w the options on the UI in system config
…function in EditPropertiesContoller.java
…non managed Spring class
…stemHandler in the Tesseract Constructor
@Autowired
private IKafkaRequestSender kafkaRequestSender;

@Autowired
private ISystemMessageHandler messageHandler;

@Autowired
private SystemMessageHandler msgHandler;
this is already autowired
    page.setOCRType(Properties.OCR_HOCR);
} else {
    page.setOCRType(Properties.OCR_PLAINTEXT);
    page.setLanguageType(defaultLang);
Why do you not set the default language if the OCR type is HOCR?
try {
    Process proc;
    BufferedReader reader;
    proc = Runtime.getRuntime().exec(command);
merge with 359
cassiopeia/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
    output = output + line + " ";
}
proc.waitFor();
reader.close();
the reader always needs to be closed, even if an exception is thrown.
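One way to guarantee the close is try-with-resources, which behaves like an implicit `finally` block. The sketch below is only an illustration: the class name `ProcessOutputReader` and the method `readLines` are hypothetical and do not appear in the PR, and the UTF-8 charset is an assumption about the process output.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the PR): reads every line from a stream
// and closes the reader on both the normal and the exceptional path.
public class ProcessOutputReader {

    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

In the code under review this could be called as `readLines(proc.getInputStream())` before `proc.waitFor()`, so the existing catch blocks only need to report the error.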
reader = new BufferedReader(new InputStreamReader(proc.getInputStream()));
String line = "";
while ((line = reader.readLine()) != null) {
    output = output + line + " ";
use a StringBuffer here
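A minimal sketch of that suggestion, assuming the goal is only to avoid repeated string concatenation inside the loop. It uses `StringBuilder`, the unsynchronized counterpart of `StringBuffer`, which is usually sufficient for a local variable; the class name `LineJoiner` and its method are hypothetical and not taken from the PR.

```java
// Hypothetical helper (not part of the PR): joins lines with a trailing space,
// mirroring `output = output + line + " "` without creating a new String
// on every iteration.
public class LineJoiner {

    public static String joinWithTrailingSpace(Iterable<String> lines) {
        StringBuilder output = new StringBuilder();
        for (String line : lines) {
            output.append(line).append(' ');
        }
        return output.toString();
    }
}
```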
lang_list = output.split(":");

String[] languages;
languages = lang_list[1].split(" ");
why don't you read the output into a list right away, since it's one language per line, isn't it?
no string manipulations necessary
there don't seem to be any changes
…cassiopeia into story/GECO-122 Merged required for the new changes
for (int i = 1; i < langs.size(); i++) {
    langTypeMap.put(langs.get(i), langs.get(i));
}
tessPars = null;
why do you do this?
    page.setOCRType(Properties.OCR_HOCR);
} else {
    page.setOCRType(Properties.OCR_PLAINTEXT);
}

page.setLanguageType(defaultLang);
let's move that to line 76 to keep it together with the rest of the default language code
 * @param propertyManager
 * @return String[] - the list of available languages
 */
public List<String> getTessLangs(IPropertiesManager propertyManager) {
let's not introduce a new dependency here, just use the TesseractOCRConfig class used in the other methods.
    sysMsgHandler.handleMessage("Error while getting Tesserract languages.", e, MessageType.ERROR);
} catch (InterruptedException e) {
    sysMsgHandler.handleMessage("Error while getting Tesserract languages.", e, MessageType.ERROR);
}
this needs a finally block in which you close the reader
String line = "";
while ((line = reader.readLine()) != null) {
    output.add(line);
}
you still need a comment here explaining why you are doing what you're doing. And wasn't the first line not a language?
I will add the comments.
The first line is "List of available languages:(3)" and therefore it is not required.
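As an illustration of the point above, here is a sketch that drops the header line by its content instead of by index, so the intent is visible in the code itself. The class `TessLangParser` and the method `parseLanguages` are hypothetical names; the header prefix is taken from the line quoted in this thread and may differ between Tesseract versions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the PR): turns the raw output lines into
// a list of language codes, one per line.
public class TessLangParser {

    // Assumed header prefix, based on the line quoted in the review thread.
    private static final String HEADER_PREFIX = "List of available languages";

    public static List<String> parseLanguages(List<String> outputLines) {
        List<String> langs = new ArrayList<>();
        for (String line : outputLines) {
            String trimmed = line.trim();
            // Skip blank lines and the header, which is not a language code.
            if (trimmed.isEmpty() || trimmed.startsWith(HEADER_PREFIX)) {
                continue;
            }
            langs.add(trimmed);
        }
        return langs;
    }
}
```

Filtering on the prefix rather than starting the loop at index 1 keeps the result correct even if the header is missing, at the cost of depending on the header wording.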
…age,up the page by 4-5 lines.
…TessLangs() in TesseractOCRConfig class.
No description provided.