Ability to extract a Provision List directly from an HTML document
Hi everyone,
Following up on our meeting this morning and item 1078 in ISLG Sprint 7 (https://dev.azure.com/tologix/ISLGRebuild/_sprints/backlog/ISLGRebuild%20Team/ISLGRebuild/Sprint%207?workitem=1078), we will proceed as follows:
- Paul and Irit will provide Harsh with a PDF sample (short treaty document).
- Harsh will convert the document with the automated converter and send Paul and Irit a copy of the HTML.
- Irit will send the sample PDF and HTML to CDS and request that they manually generate the HTML from the PDF, following the HTML tagging schema used for Dispute Documents, but with the additional guidance of the automatically converted HTML document provided by Harsh. Irit, please ensure CDS returns the document as a high priority.
- CDS will create the manually generated HTML document.
- Harsh and team will use the manually generated HTML document to build and test the manual provision extract tool within the new ISLG application.
Please let me know if I'm missing anything.
Thanks,
Morgan
Attached are two short Treaty documents: BIT/1118 (10 pages) and BIT/1121 (11 pages)
Do we have another that is better quality? I'm concerned these will have issues with the automated converter when
Thanks,
Morgan
I have uploaded one sample Treaty PDF document to staging.islg and converted it to HTML through our application, so the provisions were automatically added to the database.
Now we need to provide this same HTML document to the CDS team so they can produce further Treaty and Arbitration HTML files using this sample as a reference.
You can forward the attached HTML file to the CDS team for that purpose.
However, I noticed that the HTML document above does not have all the structure we want it to. For example, Chapter I is created as a standalone <div> tag and does not indicate that Rules 1-12 are all contained within Chapter I:
Thanks,
Morgan
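To make the structural concern above concrete, here is a minimal Python sketch that reports provision divs that are not nested inside a chapter-level div. The `chapter` class name and the `r1` id are hypothetical placeholders (the thread does not say how chapters are tagged); only the `provision` class comes from this discussion.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Report provision divs that have no enclosing container div."""

    def __init__(self, container_class="chapter", item_class="provision"):
        super().__init__()
        self.container_class = container_class  # hypothetical class name
        self.item_class = item_class
        self.open_divs = []  # class lists of the currently open <div> elements
        self.orphans = []    # ids of provision divs outside any container

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.item_class in classes:
            inside = any(self.container_class in c for c in self.open_divs)
            if not inside:
                self.orphans.append(attr_map.get("id"))
        self.open_divs.append(classes)

    def handle_endtag(self, tag):
        if tag == "div" and self.open_divs:
            self.open_divs.pop()


def unnested_provisions(html_text):
    checker = NestingChecker()
    checker.feed(html_text)
    return checker.orphans


# Flat markup: the chapter div closes before the rule starts.
flat = "<div class='chapter'>Chapter I</div><div class='provision' id='r1'>Rule 1</div>"
# Nested markup: the rule sits inside the chapter div.
nested = "<div class='chapter'>Chapter I<div class='provision' id='r1'>Rule 1</div></div>"

print(unnested_provisions(flat))    # ['r1']
print(unnested_provisions(nested))  # []
```

A check like this could be run on each converted file before it is sent to CDS as a reference, assuming the team settles on a class name for chapter containers.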
Rule 1 is a div tag in the HTML, so class="provision" and an id need to be applied to the Rule 1 div tag (e.g. <div data-pro='RULE2' data-key='32' class='para paralvl1 provision' id='BIT1365_paRULE2.1'>).
I will discuss this with Martin on today's call so he can guide the CDS team or discuss it with them.
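As a rough illustration of how a provision list could be read from divs tagged this way, here is a minimal Python sketch using the standard-library `html.parser`. It is not the ISLG application's actual extraction code; the function and class names are hypothetical, and only the `provision` class and the `data-pro`, `data-key`, and `id` attributes are taken from the example tag above.

```python
from html.parser import HTMLParser


class ProvisionCollector(HTMLParser):
    """Collect attributes of every <div> whose class list includes 'provision'."""

    def __init__(self):
        super().__init__()
        self.provisions = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "provision" in classes:
            self.provisions.append(
                {
                    "id": attr_map.get("id"),
                    "data_pro": attr_map.get("data-pro"),
                    "data_key": attr_map.get("data-key"),
                }
            )


def list_provisions(html_text):
    """Return one dict per provision div, in document order."""
    collector = ProvisionCollector()
    collector.feed(html_text)
    return collector.provisions


# The second div reuses the attributes from the example tag in this thread.
sample = (
    "<div class='para'>Preamble</div>"
    "<div data-pro='RULE2' data-key='32' "
    "class='para paralvl1 provision' id='BIT1365_paRULE2.1'>Rule text</div>"
)
print(list_provisions(sample))
```

Divs without the `provision` class are skipped, which is why a rule that CDS tags without that class would not appear in the list.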
Can you ask CDS to start converting the 2 documents you have posted in this thread?
They should use the following items to guide them:
Could CDS turn these documents around within 1 week (or less)?
Also note that after this project is complete, we will probably need to make small adjustments to our HTML coding manual.
Thanks,
--Martin
Yes, I will send an email to CDS today. I'll let them know this is high priority and we need this back within a week or less as you suggested.
Thanks,
Irit
Morgan
CDS has completed the HTMLs for the samples BIT/0916 and BIT/0917:
BIT-0916- Italy-Mongolia (1993) [English].html - TOLOGIX - PDF to HTML Conversion
BIT-0917- Netherlands-Nicaragua (2000) [English].html - TOLOGIX - PDF to HTML Conversion
Thank you,
Irit
Morgan
We checked the HTML and found a spelling mistake in the class name provisionDefinition.
The following screenshot shows the mistake.
In the red box, you can see the spelling provisionDefination.
Please ask the CDS team to correct the spelling in both HTML files and send them back to us.
The correct word is: provisionDefination
The provisionDefination class is used to fetch the title of a provision.
There is a typo in "provisionDefination", but this is the way it was coded in our code, so unfortunately this is what CDS will need to use moving forward.
Thanks,
--Martin
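To show why the spelling matters, here is a minimal Python sketch of a title lookup keyed on the class name. It assumes the title is the text content of the element carrying the `provisionDefination` class; the element type (`span` here), the sample markup, and the collector class are hypothetical, and the real application code may work differently.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Collect the text of every element whose class list has the title class."""

    def __init__(self, title_class="provisionDefination"):
        super().__init__()
        self.title_class = title_class
        self.titles = []
        self._tag = None   # tag name of the title element being read
        self._depth = 0    # nesting depth of same-named tags inside it
        self._chunks = []  # text pieces collected so far

    def handle_starttag(self, tag, attrs):
        if self._tag is None:
            classes = (dict(attrs).get("class") or "").split()
            if self.title_class in classes:
                self._tag, self._depth, self._chunks = tag, 1, []
        elif tag == self._tag:
            self._depth += 1

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace in the collected text.
                self.titles.append(" ".join("".join(self._chunks).split()))
                self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._chunks.append(data)


def provision_titles(html_text):
    collector = TitleCollector()
    collector.feed(html_text)
    return collector.titles


sample = (
    "<div class='provision'><span class='provisionDefination'>Article 1  Definitions</span></div>"
    "<div class='provision'><span class='provisionDefinition'>Article 2</span></div>"
)
print(provision_titles(sample))  # ['Article 1 Definitions']
```

The second title is silently skipped because its class uses the dictionary spelling, which is the kind of mismatch reported for the CDS files.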
Today we deployed this task to staging.islg. Please check it and provide your feedback.
When you select Edit Provision from the dropdown menu, a button for Provisions becomes available.
You can use the following 2 attached HTML files to fetch the provisions. As per Martin's comment above, we replaced the spelling and updated these 2 HTML files.
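The spelling replacement described above could be scripted along these lines. This is a minimal Python sketch, not the procedure actually used on the two files; the function name is hypothetical. It only rewrites tokens inside `class="..."` attributes, so body text that happens to contain the word is left alone.

```python
import re

# Matches class="..." or class='...' and captures the quote and the value.
CLASS_ATTR = re.compile(r"""(class\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL)


def fix_class_spelling(html_text, wrong="provisionDefinition",
                       expected="provisionDefination"):
    """Rewrite the class token 'wrong' to 'expected' inside class attributes."""

    def repl(match):
        tokens = [expected if t == wrong else t for t in match.group(3).split()]
        quote = match.group(2)
        return f"{match.group(1)}{quote}{' '.join(tokens)}{quote}"

    return CLASS_ATTR.sub(repl, html_text)


before = "<span class='title provisionDefinition'>provisionDefinition</span>"
print(fix_class_spelling(before))
# <span class='title provisionDefination'>provisionDefinition</span>
```

Because this is a regular-expression pass rather than a full HTML parse, it would also touch any attribute whose name ends in `class=`; for the CDS sample files that seems an acceptable simplification, but it is an assumption.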
Yes, that's the latest version of the coding manual. Note all the coding manual versions are contained here: https://islg.egnyte.com/fl/9VgmHKWgaz.
Does this mean we're stuck with that typo?
Thanks,
Morgan
We are stuck with this typo for the time being.
One of these parts is the PDF to HTML conversion engine. Since we are in the midst of upgrading our conversion engine, I think the best approach is to make that change at the time when we do the official switch to the new engine.
Thanks,
-Martin
Morgan
Success. I uploaded the HTMLs for BIT/0916 and BIT/0917 on staging.islg. I then selected the Get Provision button. The provision list was pulled from the HTML.
Please let me know if you would like for me to confirm this on DevOps and if I need to change the state to “Tested on Staging”.
Thank you,
Irit
Great! Yes, please change status to "Tested on Staging" if you are satisfied with the functionality.
Thanks,
--Martin
I have updated the user manual for Treaty and Arbitration HTML documents. You can find the update on the second page, highlighted in grey.
You can update the text or content if you want.
Hi
Thanks!
CC:
Will do, thanks.
Irit
Morgan