Posted by Morgan Maguire · Dec 16, 2020 at 7:12 PM

Resolving PDF to HTML Issue

Hello all,

Following up on the call this morning with Irit, Ketan, Harsh and Jitesh about progress on the PDF to HTML conversion projects, it is increasingly unlikely that we will finish converting all the required PDF documents into HTML format before the end of the month. As a result, we need a contingency plan for the approximately 50-100 documents that will not have HTMLs available when the application launches in January.

Based on the discussion this morning (and a follow-up discussion with Irit after the meeting), there are three possible solutions:

(1) Integrate PDF pinpointing system from legacy application into new application

In this scenario, for documents where an HTML is not available, we would modify the application to integrate the PDF pinpointing system from the legacy application. That system uses migrated Citator and Subject Navigator data to pinpoint passages within PDF documents that already have the relevant destination codes inserted into the PDF.

After discussing this with Harsh, it appears this solution would be very difficult to implement, and we would be unlikely to complete the change to the application before launch.

(2) Proceed without HTMLs and create prompts for users

In this scenario, we would perform an audit of the system to determine what features will be affected by the absence of an HTML and create specific requirements that would prevent users from performing those functions on documents where an HTML is unavailable.

Harsh has pointed out that this solution would still require a significant amount of work because it would affect many aspects of the application.

(3) Create placeholder HTMLs that warn users that a full HTML is not available

This scenario was suggested by Irit after our call and would involve creating a placeholder HTML to be used for all the documents where a converted HTML is unavailable. The placeholder HTML would include a message telling the user that the HTML is currently unavailable and to refer to the PDF instead.

This solution would still require application auditing to ensure it works across the different features, but could be a better solution than #2, because it doesn't require modifications to the application requirements.

Please review and provide your comments and suggestions, and we'll discuss them during the development team meeting tomorrow.

Thanks,

Morgan

Comments & Events

Morgan Maguire, CEO

Hello everyone,

Following up on the call this morning with Melissa, Harsh, Ketan and Savannah, we are going to proceed with solution #3 above. To implement it, we need to do the following:

1. Melissa, please write the applicable user stories that create the following requirement across the subscriber side of the application:
   - If Analysis data is unable to pull the relevant HTML text from the HTML (i.e., the HTML paragraph tag does not exist within the HTML document), the application will present the following message: "The document you are attempting to view is currently in process and temporarily unavailable in a web friendly format. The document will be available in a web friendly format soon; however, in the interim, please select Full Case & Analysis and select View Original PDF to view the original source PDF document."
2. Ketan, Piyush and Jitesh, please discuss and finalize the process for creating blank HTML documents that can be uploaded for the documents that will not be converted from PDF to HTML format by December 31st. The following is a sample of what the blank HTML should contain:
   - Blank HTML Placeholder.htm (40.3 KB)
   - Blank HTML Placeholder.docx (51.3 KB)
3. The number of documents requiring blank HTMLs will be finalized after our call scheduled for December 30th.
4. As HTML documents are converted from PDF format and completed through the automated and manual conversion process over the weeks ahead, Harsh and the team will upload the completed HTMLs to replace the blank HTML placeholder documents. The completed HTMLs will then connect with the relevant Analysis data, and the message above will no longer appear.
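For illustration only, here is a minimal Python sketch of the check described above: look for the pinpointed paragraph in a document's HTML and fall back to the "in process" message when it is missing. It assumes (this is not confirmed anywhere in the thread) that the "HTML paragraph tag" is a `<p>` element whose `id` attribute matches the pinpoint reference; `pinpoint_text` and `_ParagraphFinder` are hypothetical names, not part of the application.

```python
from html.parser import HTMLParser

# Message text copied from the requirement above.
UNAVAILABLE_MESSAGE = (
    "The document you are attempting to view is currently in process and "
    "temporarily unavailable in a web friendly format. The document will be "
    "available in a web friendly format soon; however, in the interim, please "
    "select Full Case & Analysis and select View Original PDF to view the "
    "original source PDF document."
)


class _ParagraphFinder(HTMLParser):
    """Collects the text of the first <p> whose id equals target_id (assumed tag format)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False
        self._inside = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        if self._inside:
            # A new <p> implicitly closes the previous paragraph in HTML.
            self._inside = False
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.parts.append(data)


def pinpoint_text(html, paragraph_id):
    """Return the pinpointed paragraph text, or the fallback message if the tag is absent."""
    finder = _ParagraphFinder(paragraph_id)
    finder.feed(html)
    finder.close()
    if not finder.found:
        # Placeholder HTML: the paragraph tag does not exist, so show the message.
        return UNAVAILABLE_MESSAGE
    return "".join(finder.parts).strip()
```

Under these assumptions, a blank placeholder HTML never contains the pinpointed paragraph, so the message appears. Once the completed HTML is uploaded in its place, the same lookup succeeds and the message disappears without any further change to the application.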

Please let us know if you have any questions or concerns.

Thanks,

Morgan

Dec 22, 2020 at 6:54 PM

Morgan Maguire, CEO

Hello all,

Further to a call this morning with Ketan and Harsh, and my comment in the following user story (https://industrialagency.tpondemand.com/restui/board.aspx?#page=userstory/20720), we're going to pursue a solution that implements a version of the criteria above for the Research Tools and Document View but suppresses the action menus that would allow users to add the documents to Research Notepad and Document Comparison.

Harsh and Ketan are going to clarify the exact requirement in the user story above, but my understanding of the solution is the following:

- No HTML will be uploaded (there is no need for blank HTMLs).
- When a document with no HTML is presented in the Research Tools (Subject Navigator, Article Citator, Jurisprudence Citator, Publication Citator and Terms & Phrases) or in Document View, the pinpoint reference tag will be presented, but instead of the HTML, the following message will appear: "The document you are attempting to view is currently in process and temporarily unavailable in a web friendly format. The document will be available in a web friendly format soon; however, in the interim, please view the original source PDF document here."
  - The link will open the PDF version of the document in a separate browser tab.
  - The Action Menu and the Add to Document Comparison and Add to Notepad buttons will be greyed out and disabled.
  
  image.png 156 KB • Download
  
  image.png 133 KB • Download
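As a rough illustration of the behaviour described in the list above, here is a minimal Python sketch of how a Research Tools or Document View panel might decide what to show. Everything in it is hypothetical: the `Document` and `DocumentPanel` records, their field names, and `build_panel` are invented for illustration, "no HTML uploaded" is assumed to be represented as `html is None`, and `"_blank"` is simply the standard HTML link target for opening a new browser tab.

```python
from dataclasses import dataclass
from typing import Optional

# Message text copied from the requirement above.
FALLBACK_MESSAGE = (
    "The document you are attempting to view is currently in process and "
    "temporarily unavailable in a web friendly format. The document will be "
    "available in a web friendly format soon; however, in the interim, please "
    "view the original source PDF document here."
)


@dataclass
class Document:
    # Hypothetical document record; field names are illustrative only.
    doc_id: str
    pinpoint_tag: str
    pdf_url: str
    html: Optional[str] = None  # assumed None until a converted HTML is uploaded


@dataclass
class DocumentPanel:
    pinpoint_tag: str
    body: str
    pdf_link: Optional[str]
    pdf_link_target: Optional[str]
    # Covers the Action Menu, Add to Document Comparison and Add to Notepad.
    actions_enabled: bool


def build_panel(doc: Document) -> DocumentPanel:
    """Decide what the panel shows for a document, with or without an HTML."""
    if doc.html is None:
        # No HTML: keep the pinpoint tag, show the message, link to the PDF
        # in a new tab, and grey out the actions that depend on the HTML.
        return DocumentPanel(
            pinpoint_tag=doc.pinpoint_tag,
            body=FALLBACK_MESSAGE,
            pdf_link=doc.pdf_url,
            pdf_link_target="_blank",
            actions_enabled=False,
        )
    # Converted HTML exists: render it normally with all actions available.
    return DocumentPanel(
        pinpoint_tag=doc.pinpoint_tag,
        body=doc.html,
        pdf_link=None,
        pdf_link_target=None,
        actions_enabled=True,
    )
```

A single `actions_enabled` flag is used here because the thread treats the three controls as one group that is either available or greyed out. If they ever need to differ, the flag would split into one field per control.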

Note that because an HTML document is not required for this solution, there will be no need to create blank HTMLs as described in steps 2 and 3 above.

Please share any questions or concerns.

Thanks,

Morgan

Dec 23, 2020 at 6:14 PM