TOLOGIX - ISLG App Rebuild

PDF quality test for document upload

Assigned to
Guillaume Morin-Simard, UX Designer/Researcher at Industrial; Harsh Parikh, Tech Lead at DevIT; Jitesh Dhuravala, DevIT; Melissa Cowell, General Manager at Industrial
Notes
As part of the document upload process within ISLG (and ILG), we need to integrate a PDF quality test to ensure the PDF document meets certain quality thresholds before it is converted into an HTML document. The premise would be that if the PDF quality threshold isn't met, the HTML conversion is cancelled, and the admin user is prompted to send the document for manual HTML conversion.

It is proposed that we use the PDF quality test tool used earlier this year for this purpose (see Message Board - TOLOGIX - PDF to HTML Conversion).
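
The gate described in these notes could be sketched roughly as follows. This is only an illustration: the threshold value, the function name `decide_upload`, and the `UploadDecision` type are assumptions made for the example, and none of them come from the existing PDF quality tool.

```python
from dataclasses import dataclass

# Hypothetical cut-off: the notes do not state the actual threshold.
QUALITY_THRESHOLD = 80.0


@dataclass
class UploadDecision:
    convert_to_html: bool  # True if automated conversion should run
    message: str           # Text shown to the admin user


def decide_upload(quality_score: float,
                  threshold: float = QUALITY_THRESHOLD) -> UploadDecision:
    """Decide whether an uploaded PDF goes to automated HTML conversion."""
    if quality_score >= threshold:
        return UploadDecision(
            True,
            "Quality threshold met: proceeding with automated HTML conversion.",
        )
    # Below threshold: cancel conversion and prompt for manual handling.
    return UploadDecision(
        False,
        "Quality threshold not met: HTML conversion cancelled. "
        "Please send the document for manual HTML conversion.",
    )
```

For example, `decide_upload(55.0)` would return a decision with `convert_to_html` set to `False` and a message prompting manual conversion.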

Comments & Events

Morgan Maguire, CEO
Hi Jitesh and Harsh,

In addition to the question above, would it be possible for you to provide more detail on how the PDF Quality tool works, and what exactly it is testing to determine the quality score? The reason I ask is that while Irit Weinfeld was putting together documents for the OCR file sample, she noticed that the documents below were given a "100" quality score even though their apparent quality seems less than perfect (see PDF Quality spreadsheet below). Could you explain why these documents have a "100" quality score?

BIT/0020 Canada - Venezuela BIT (1996)
Melissa Cowell, General Manager at Industrial
Harsh,

Please see questions above.
Harsh Parikh, Tech Lead at DevIT
Hi Melissa and Morgan,

The PDF quality tool checks the font pixels in the PDF. In the sample above, the overall PDF quality is not good, but the fonts that appear in the PDF are good: the text is bold and clearly readable.
Melissa Cowell, General Manager at Industrial
Harsh,

How long does it take on average to assess a quality score in the PDF quality tool?

Mel
Harsh Parikh, Tech Lead at DevIT
Hi Melissa,

It depends on the number of pages in the PDF file, but you can assume an average of 20 seconds to assess a quality score.
Morgan Maguire, CEO
Thanks, Harsh. Based on your answer above concerning the quality score, does that mean the PDF quality tool is not capable of determining whether a document is of sufficient quality to be converted with the PDF to HTML conversion tool?

If the documents above are getting a 100% quality score, but are in fact of insufficient quality for automated conversion, there is a problem with the PDF quality tool. The whole purpose of the tool is to allow us to identify which documents are of sufficient quality for automated conversion.

Jitesh, how do you suggest we resolve this issue?

Morgan
Jitesh Dhuravala, DevIT
Hi Morgan,

Currently, the PDF quality tool does identify whether or not a document is of sufficient quality to be converted into HTML by our conversion tools. That does not mean that a document with a 100% quality score will be converted at 100% quality.

The quality tool is used to check whether or not a document can be converted by our algorithm. In most cases, if the document quality score is 100%, our algorithm is able to convert it into HTML, but in some cases the conversion quality may not be 100%. The reason is the content of the PDF document and its formatting (indentation, style, bullets, numbering, etc.).

The main purpose of the quality tool was as stated in the notes: if the PDF quality threshold isn't met, the HTML conversion is cancelled, and the admin user is prompted to send the document for manual HTML conversion.

We are also continuously working on the conversion tool to improve document quality.


Thanks,
Jitesh
Morgan Maguire, CEO
OK. Thanks for clarifying, Jitesh. However, this still leaves an outstanding problem. How do we deal with documents similar to the ones pointed out above (where they pass the PDF quality test, but the content/format of the document is not compatible with the conversion tool)? We need the PDF quality test tool to be able to identify these documents so that they can be flagged and sent for manual HTML coding.

Thanks,

Morgan 
Morgan Maguire, CEO
Hi Jitesh,

Would it be possible to get a response to the questions in my previous comment? Note that I'm adding this to the agenda for Thursday's meeting.

Morgan 
Jitesh Dhuravala, DevIT
Hi Morgan,

The current conversion algorithm is able to identify whether or not a document was converted to HTML; if not, it will update the status to "conversion fail". We can then send the document for manual conversion to HTML.
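
The failure handling described above could be sketched roughly as follows. This is a hedged illustration only: `convert_or_flag`, the `Status` enum, and the `convert` callable are hypothetical names, and the real conversion algorithm may detect failures quite differently. Only the "conversion fail" status text comes from this thread.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Status(Enum):
    CONVERTED = "converted"              # hypothetical success label
    CONVERSION_FAIL = "conversion fail"  # status text quoted in the thread


def convert_or_flag(
    pdf_path: str,
    convert: Callable[[str], str],
) -> Tuple[Status, Optional[str]]:
    """Run a converter and record a failure status instead of raising.

    `convert` stands in for the actual PDF-to-HTML converter (assumed to
    return an HTML string, or to raise or return an empty string on failure).
    """
    try:
        html = convert(pdf_path)
    except Exception:
        # Any converter error is treated as a failed conversion.
        return Status.CONVERSION_FAIL, None
    if not html:
        # An empty result is also treated as a failed conversion.
        return Status.CONVERSION_FAIL, None
    return Status.CONVERTED, html
```

Documents that come back with `Status.CONVERSION_FAIL` would then be the ones routed to manual HTML conversion.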

Our team is working on improving the conversion algorithm to meet all of the requirements, and we are confident that we will meet all your needs.

If you are still planning to integrate the quality tool to display the document quality score, please provide a user story and an updated wireframe for the new ISLG and ILG so that we can plan accordingly. Let's discuss this further on Thursday's call.

Thanks,
Jitesh
Morgan Maguire, CEO
Ok. Sounds good, Jitesh. Thanks for clarifying. Let's discuss next steps during the call on Thursday.

Morgan 
Paul Moon
With the new HTML converter from Martin Laporte, CTO at Tologix, this will no longer be relevant.
Paul Moon
Paul Moon completed this to-do.