Hi Martin,
As we discussed today, content reviewers encountered a letter and punctuation spacing issue when they reviewed old migrated Automated and Manual ISLG HTMLs. As long as the HTML was an accurate representation of the text of the original PDF, we asked them to leave it 'as is' in the interest of time. We'd like to address this now and find a way to fix HTMLs with this spacing issue.
Issue: There are HTMLs with a space between a word and the punctuation mark that follows it (e.g. commas, periods, semicolons, brackets, etc.).
For example, “word ,” or “word .”, etc.
Chris Thomas came across an example of this today in IC/0209/18 (Infinito Gold v. Costa Rica Award).
Using IC/0209/18 as an example: when I use the “find” function in the HTML and enter a space character followed by a comma (or period, etc.), I am able to find all the occurrences.
Like this:
“ .” (there are 56 in the HTML)
or
“ ,” (there are 351 in the HTML)
or
“ ;” (there is 1 in the HTML)
or
“ )” (there are 32 in the HTML)
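The manual "find" counts above could also be produced by a short script. This is only a minimal sketch: the set of punctuation marks and the sample sentence are assumptions taken from the examples in this email, not from the actual HTML of IC/0209/18.

```python
import re
from collections import Counter

# Punctuation marks to check. This set is an assumption based on the
# examples listed above (period, comma, semicolon, closing bracket).
PUNCTUATION = ".,;)"

def count_space_before_punctuation(text: str) -> Counter:
    """Count each 'space + punctuation mark' pair found in the text."""
    pattern = re.compile(r" ([" + re.escape(PUNCTUATION) + r"])")
    return Counter(match.group(1) for match in pattern.finditer(text))

# Made-up sample sentence, for illustration only.
sample = "The Tribunal , having deliberated , decides as follows . (See Annex A )"
counts = count_space_before_punctuation(sample)
for mark in PUNCTUATION:
    print(f'" {mark}" occurs {counts[mark]} time(s)')
```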
Is there a way to "find and replace" these issues in IC/0209/18? We want to make sure Analysis is saved before trying to make any changes to the HTML text. And we want to make sure Analysis is not disrupted. This is a migrated document, so shifted highlighted text may not be an issue we need to worry about for this one.
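One possible way to do a "find and replace" without touching the HTML markup is to change only the text between tags. The sketch below assumes simple, well-formed HTML and a hypothetical function name (`fix_spacing`); it does not know how Analysis or highlights are stored, so it would need testing on a copy first.

```python
import re

# Splits the HTML into alternating text / tag pieces. Simplification:
# assumes no ">" character appears inside an attribute value.
TAG_SPLIT = re.compile(r"(<[^>]+>)")
# One or more spaces directly before a period, comma, semicolon or ")".
SPACE_BEFORE_PUNCT = re.compile(r" +([.,;)])")

def fix_spacing(html: str) -> str:
    """Remove spaces before punctuation in text only; tags are left as they are."""
    parts = TAG_SPLIT.split(html)
    for i, part in enumerate(parts):
        # With one capture group, odd positions hold the tags themselves.
        if i % 2 == 0:
            parts[i] = SPACE_BEFORE_PUNCT.sub(r"\1", part)
    return "".join(parts)

# Made-up sample markup, for illustration only.
sample = '<p class="a b">The claim , as filed <span data-tag="t1">on time .</span> (see above )</p>'
fixed = fix_spacing(sample)
print(fixed)
```

Two caveats with this sketch: a space that sits just before a closing tag with the punctuation mark after it (for example `word </span>,`) is not caught, and legitimate text such as " .5" or " ..." would also be changed.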
If we can fix this issue in IC/0209/18, the next step would be to find a way to search app.islg for other documents that have this spacing issue.
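As a rough idea of what such a search could look like: the sketch below takes a set of documents and lists the ones that contain the spacing issue. How documents would actually be pulled from app.islg is not known here, so the input is simply assumed to be a mapping of document ID to HTML, and the IDs shown are made up.

```python
import re

SPACE_BEFORE_PUNCT = re.compile(r" [.,;)]")
TAG = re.compile(r"<[^>]+>")

def find_affected(documents: dict[str, str]) -> list[tuple[str, int]]:
    """Return (document ID, number of hits) for documents with at least one hit."""
    results = []
    for doc_id, html in documents.items():
        text = TAG.sub("", html)  # look at the visible text only
        hits = len(SPACE_BEFORE_PUNCT.findall(text))
        if hits:
            results.append((doc_id, hits))
    # Most affected documents first.
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical IDs and content, for illustration only.
documents = {
    "DOC-A": "<p>Costs , fees , and interest .</p>",
    "DOC-B": "<p>No spacing problems here.</p>",
    "DOC-C": "<p>See above ;</p>",
}
affected = find_affected(documents)
print(affected)
```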
Thank you,
Irit
cc Paul, I will update Chris as we learn about what options and solutions we have.
We believe it will be feasible to remove these extra spaces without affecting the existing tags and analysis.
We will have more details by next week.
Thanks,
-Martin
Would it be easy for you to find 2-3 documents that have the spacing issues?
Additionally, it would be ideal if these documents have at least one tag where the space issue is found.
For example, say the document is tagged as follows:
Thanks,
-Martin
I have found one example so far. Please have a look at IC/0209/18 Infinito Gold Ltd. v. Republic of Costa Rica, ICSID Case No. ARB/14/5, Award, 03 June 2021
Footnote 703
Footnote 528
Footnote 1240
Paragraph 359
Paragraph 484
Thank you,
Irit
Hi
Thanks,
-Martin
I also created a DevOps task to track this: https://dev.azure.com/tologix/ISLGRebuild/_workitems/edit/2041.
Thanks,
-Martin
Would it be possible for you to find another one with a similar setup (" , " issue with at least one tag where the space issue is found)?
Thanks,
-Martin
Here is another example.
Please have a look at IC/0170/06:
Blue Bank International & Trust (Barbados) Ltd. v. Bolivarian Republic of Venezuela, ICSID Case No. ARB/12/20, Decision on Annulment, 22 June 2020 [Spanish]
Footnote 40 (JP Citator tag)
Footnote 44 (JP Citator tag)
Footnote 51 (JP Citator tag)
Thank you,
Irit
I have checked 2 documents in this task and found that,
Cc :
Hi
We could write a more complex script that:
This would not be a straightforward project, and we would have to test extensively before applying it to Production, since we would be changing tag locations.
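To illustrate why tag locations change, here is a minimal sketch of the kind of adjustment involved. It assumes, purely for illustration, that a tag is stored as a (start, end) pair of character positions in the document text; the actual storage format in ISLG may be different, and the function name is hypothetical.

```python
import re

# One or more spaces that are directly followed by . , ; or )
SPACE_BEFORE_PUNCT = re.compile(r" +(?=[.,;)])")

def remove_spaces_and_remap(text, tags):
    """Remove the extra spaces and shift each (start, end) tag to match.

    Assumption: tags are half-open character offsets into `text`.
    """
    removed = [(m.start(), m.end()) for m in SPACE_BEFORE_PUNCT.finditer(text)]
    new_text = SPACE_BEFORE_PUNCT.sub("", text)

    def remap(offset):
        shift = 0
        for start, end in removed:
            if offset >= end:
                shift += end - start      # whole removed run is before the offset
            elif offset > start:
                shift += offset - start   # offset falls inside a removed run
            else:
                break                     # later runs cannot affect this offset
        return offset - shift

    return new_text, [(remap(s), remap(e)) for s, e in tags]

# Made-up sentence; the tags cover "dated 3 June" and "final ."
text = "The award , dated 3 June , is final ."
tags = [(12, 24), (30, 37)]
new_text, new_tags = remove_spaces_and_remap(text, tags)
print(new_text)
print([new_text[s:e] for s, e in new_tags])
```

Every tag after a removed space has to move, which is why a change like this would need extensive testing before it touches Production.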
My recommendation is to create a DevOps item for this, but delay the work until we have a better sense of how we want to manage content moving forward.
Thanks,
-Martin
Please create a DevOps item, and I'll place it as a low-priority item.
Thanks,
Paul
I have created the DevOps item: https://dev.azure.com/tologix/ISLGRebuild/_workitems/edit/2063.
Thanks,
-Martin
I see the DevOps item is included in the current sprint - should it be included in this sprint?
Thanks,
Paul
Thanks for catching this. It was a mistake; it should have been put in the main backlog.
I have now moved the item to the backlog.
Thanks,
-Martin
Is this something that the PDF to HTML comparison tool could largely carry out, or should we still keep this as a backlog item for the next sprint?
Please let me know.
Thanks,
Paul
It seems that correcting the spacing in any tagged document can disrupt its tags. So maybe we just focus on new HTMLs?
We can catch these spacing issues easily during content review. The issue is document-specific; not every document has it.
We can discuss this over Zoom if it's easier.
Irit
As long as we don't have to adjust spacing post-tagging, we can take your approach. Let's discuss this tomorrow.
Thanks,
Paul