We have started migrating the tagging data in the following order. The Article Citator tool is complete; the remaining tools are in progress.
1) Article Citator 2) Jurisprudence Citator 3) Publication Citator 4) Subject Navigator 5) Term and Phrase
We found broken tags for the Article Citator module through an automated process, but there may be more broken-tag records that the automated query missed and that will have to be identified manually.
Thanks for the report above. Let's discuss during our call this morning.
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Morgan,
Today we are looking into the Subject Navigator tool's tagging data, and we found a lot of tagging data that will not match our HTML structure.
There is no automated query to find this kind of data, so we will have to check it manually.
As an example, below are some sample tags that we found through the automated query; we will add the same to our broken tag sheet.
3.2 and 2 OR 1-14
A.46-A.47
A.34-A.35
D.1.6-D.1.7
VII.1.B.4-VII.1.B.5
VIII.5.2-VIII.5.7
VIII.7.1-VIII.7.3
14.iii.h-14.iii.i
14.iii.h-14.iii.i
31.1(a)-31.2
However, there is still a lot of tagging data that we need to check manually. We are currently migrating that data.
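A rough sketch of how tags like the samples above might be flagged for manual review. It assumes (this is not confirmed in the thread) that a tag is suspect when it still contains a range hyphen or an "and"/"or" connective instead of a single paragraph label. The function name and the rule itself are hypothetical; the team's actual automated query is not shown here.

```python
import re

# Assumption: a tag is suspect if it contains a hyphen (a range) or the
# standalone words "and"/"or". This is an illustrative rule only.
SUSPECT_PATTERN = re.compile(r"-|\b(?:and|or)\b", re.IGNORECASE)


def needs_manual_review(tag: str) -> bool:
    """Return True if the tag looks like a range or a compound reference."""
    return bool(SUSPECT_PATTERN.search(tag))


samples = ["A.46-A.47", "3.2 and 2 OR 1-14", "31.1(a)-31.2", "pa1.20"]
flagged = [tag for tag in samples if needs_manual_review(tag)]
```

A heuristic like this can only narrow the list; tags that are broken for other reasons would still need the manual check described above.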
Morgan Maguire, CEO
Hi Harsh,
Yes, this isn't unexpected. I expect these are the same documents in your Article Citator report above, because the same destination codes were used in the Subject Navigator as were used in the Article Citator.
As we've discussed previously, my suggestion is to create a report that includes all the documents you suspect will create broken tags, and then we'll convert all the legacy Analysis data into tags that will be compatible with the new HTML tags.
Thanks,
Morgan
Morgan Maguire, CEO
Further to our call this morning, we will proceed with resolving issues concerning broken tags as follows:
Harsh will generate a report that lists all the dispute documents that have issues with broken tags.
Paul, Irit and Morgan will test these documents within the application and determine whether the documents do in fact have problems in the new application, and will generate a report listing documents that will require manual adjustment to their analysis data.
Piyush and team will generate a table of correspondence similar to the sample below that lists the PDF Destination Codes in the same rows as the applicable HTML tags:
Using the table of correspondence, the analysis data from the legacy application will be converted into data that is compatible with the HTML tagging structure.
Harsh and team will upload and replace analysis data in the new application with compatible tagging data.
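The conversion step above can be sketched roughly as follows. This is a minimal illustration, assuming the table of correspondence can be loaded as a simple mapping from legacy destination code to HTML paragraph ID. The function name is hypothetical, and the sample codes are illustrative only: the first pair is the UN/0015/03 example discussed in this to-do, and "II/B,pa3" is a made-up unmatched code.

```python
def convert_analysis_codes(legacy_codes, correspondence):
    """Translate legacy destination codes into HTML paragraph IDs.

    Returns (converted, unmatched) so that codes missing from the table
    of correspondence can be listed for manual adjustment.
    """
    converted, unmatched = [], []
    for code in legacy_codes:
        key = code.strip()
        if key in correspondence:
            converted.append(correspondence[key])
        else:
            unmatched.append(code)
    return converted, unmatched


# Hypothetical table of correspondence with a single row.
table = {"II/B,pa2": "UN001503_partII.B.pa2"}
converted, unmatched = convert_analysis_codes(["II/B,pa2", "II/B,pa3"], table)
```

Keeping the unmatched codes separate, instead of dropping them, gives the reviewers a ready-made list of entries that still need manual adjustment.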
Please share any questions or concerns.
Thanks,
Morgan
Morgan Maguire, CEO
Hi Harsh,
Following up on the above, could you confirm when you'll generate the report?
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Morgan,
I have attached the Broken Tag Report generated by the automated query. In this sheet you will also find some tags that are accurate; the automated query flagged them, but they were migrated properly.
However, as mentioned earlier, there is still a chance that we missed some broken tags, which we will need to check manually.
Morgan Maguire, CEO
Hi Harsh,
Thanks for the report above. The team has started to review it, and we'll start testing and identifying documents that appear to have fundamental issues in compatibility with the HTMLs. Note that this will depend on the availability of the completed HTML documents in the application, which at this point only applies to 20 of the 67 source documents identified in your report.
We'll report back with our results next week.
Thanks,
Morgan
Morgan Maguire, CEO
Hi Harsh, Ketan, Jitesh and Piyush,
I've reviewed the following report produced by Paul, Nafiseh and Irit:
1. Break up paragraph ranges for decimal paragraphs
We're not having any issues with decimal paragraphs for data entered into the Analysis list in the new application. For example, if the data was entered as "pa1.20", the link connects to the correct paragraph within the HTML. Therefore, there is no need to alter the source paragraph schema for these documents. However, there is a widespread problem where paragraph ranges with decimals (e.g., 1.20-1.25) were not broken up into individual paragraphs (e.g., pa1.20, pa1.21, pa1.22, pa1.23, pa1.24, pa1.25). As a result, we'll need to identify all dispute documents that have a decimal paragraph schema and re-enter the Analysis data into the new application with paragraph ranges broken up. Once we do this, the broken tag issue will be resolved for these documents.
Harsh and Ketan, please consult and suggest how best to implement this process.
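To illustrate the range break-up described in item 1, here is a minimal sketch. It assumes both ends of a range share everything before the final dot and end in plain digits, so that "1.20-1.25" expands to pa1.20 through pa1.25. The function name and the "pa" prefix default are assumptions for illustration; ranges that do not fit the pattern (for example "31.1(a)-31.2") raise an error so they can be handled manually.

```python
def expand_decimal_range(range_text: str, prefix: str = "pa") -> list[str]:
    """Expand a range such as "1.20-1.25" into individual paragraph tags."""
    start, sep, end = range_text.partition("-")
    start, end = start.strip(), end.strip()
    if not sep:
        # Not a range: return the single paragraph as-is.
        return [prefix + start]

    start_head, _, start_tail = start.rpartition(".")
    end_head, _, end_tail = end.rpartition(".")
    if start_head != end_head or not (start_tail.isdigit() and end_tail.isdigit()):
        raise ValueError(f"Cannot expand automatically: {range_text!r}")

    first, last = int(start_tail), int(end_tail)
    if first > last:
        raise ValueError(f"Range runs backwards: {range_text!r}")

    # Preserve zero padding only when the starting number uses it.
    width = len(start_tail) if start_tail.startswith("0") else 0
    label = f"{start_head}." if start_head else ""
    return [f"{prefix}{label}{str(n).zfill(width)}" for n in range(first, last + 1)]


tags = expand_decimal_range("1.20-1.25")
```

Raising an error for irregular ranges is a deliberate choice in this sketch: it keeps the automated step from guessing at entries that need a human decision.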
2. Correct incongruency between destination codes and HTML paragraph IDs for custom paragraph documents.
For the manual coding of these documents, we requested that the HTML paragraph IDs used in the HTMLs match the schema used in the destination codes: https://docs.google.com/spreadsheets/d/17KxVaMhJ1Vj85pI5-P_FyiOetqsw_GcdaurvfUDlLFY/edit#gid=1751772033. Therefore, for Part II - Chapter B - Paragraph 2, the paragraph ID should have been "UN001503_II/B,pa2", which would have matched up with the applicable destination codes:
We could correct the HTML paragraph IDs to match the schema of the destination codes; or
We could change the source Analysis data to match the schema used in the HTML paragraph IDs.
My suggestion is that we apply the second solution, because it will allow us to correct paragraph range issues that are also present in the Analysis data.
Harsh, Jitesh and Piyush, please consult and suggest which solution we should apply (I'm open to other suggestions as well).
Note that the solution we apply will also need to be applied to the other custom paragraph documents (AF/0022/01 and IC/0032/01) assuming the schemas between the destination codes and HTML paragraph IDs do not match in these documents as well.
Thanks,
Morgan
Morgan Maguire, CEO
Hi Ketan,
Please note that I'm adding this to the agenda for discussion tomorrow. Could you please ensure these issues are discussed internally before the meeting?
Thanks,
Morgan
Morgan Maguire, CEO
Hi Harsh,
Following up on our discussion earlier today, please let us know when you have the following reports ready for review:
Report containing analysis data from the legacy application for dispute documents where the source paragraph ID contains a decimal (e.g., 1.20). This data will then be examined and changed to separate paragraph ranges, and then uploaded to replace the existing analysis data in rebuild.investorstatelawguide.com.
Report for dispute documents with custom paragraph number schema, along with a table of correspondence to match analysis data with HTML paragraph IDs.
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Morgan,
For the 2nd point, I have attached a sample spreadsheet for adding broken tags.
As discussed, each tab name is the UIN of a dispute document.
Column (B) - Your team will add the analysis tagged data for all those dispute documents and send it back to us.
After that, our PDF to HTML team, Jitesh and Piyush, will add the new HTML paragraph ID in Column (C).
We will then provide it to you for review, and once you confirm, we will update the new HTML tagged data in the migrated database.
For the 1st point regarding decimal ranges, I have attached a sheet with the old analysis data and the new migrated analysis data.
You can review it and mark with a color the ranges that we need to correct. Some of the data has already been corrected, so it needs no update.
Also, there is no decimal range data for the Publication Citator.
I also added a column (PDF Destination Code) showing which PDF destination code is associated with each analysis data code, to clarify any ambiguities for the team:
Please note as well that for UN/0015/03, I was able to list the data in the order it appears within the document.
Let me know when the team has completed filling in the NewHTMLTagID field, and then we'll make the appropriate updates to the analysis data.
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Jitesh and Piyush,
As we discussed, please use the spreadsheet attached above and start filling in NewHtmlTagId from the HTML file against each destination code.
Jitesh Dhuravala
Hi Morgan and Harsh,
We've updated the spreadsheet with the HTML ID in the NewHTMLTagID field for each relevant destination code, but we also found that some destination codes present in the PDF file are missing from the Excel sheet. Could you please check and give us your feedback on the destination codes that are available in the PDF but not specified in the Excel sheet?
Great. Thank you, Jitesh. This report looks great. Yes, don't worry about the missing destination codes. We only included destination codes that are relevant to analysis data. The list you've generated should be sufficient to convert all the legacy analysis data into compatible data for the new application.
Harsh, now that we have the tables in the report above, we'll need to use these to convert all the legacy analysis data into data compatible with the new application. We'll also need to deal with any paragraph ranges that will have similar issues to what we've discussed here: Re: Paragraph Ranges not being Migrated Properly - TOLOGIX - ISLG App Rebuild. Please confirm the next steps for converting the data.
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Morgan,
We checked this spreadsheet, and we now need to manually update all tagging data in the database.
Ok. Sounds good, Harsh. Let me know when you've updated the data, and we'll take a look to confirm the issues are resolved.
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Morgan,
Tomorrow, we are planning to update the broken tags in the database using the Excel sheet attached above.
We will update the tags on both rebuild.islg and staging.islg.
Hope this is fine.
Morgan Maguire, CEO
OK. Great, Harsh. However, would it be possible for you to produce a spreadsheet showing the updated data for me to review before it is uploaded into the new application?
Thanks,
Morgan
Harsh Parikh, Tech Lead
Hi Morgan,
We have updated the tag data for UINs AF-0022-01 and UN-0015-03 on staging.islg.
Please check and let us know.
For the 3rd UIN, IC-0032-01, there is no tagging data in the legacy application. Hence, no tag data was migrated to staging.islg.
Also, could you please let us know what you need in the spreadsheet, as we have updated the tag data in the database as per the attached sheet?
Once you confirm, we will update app.islg.
Morgan Maguire, CEO
Hi Harsh,
Looks like a switch in the analysis data may have happened with IC/0032/01, where the analysis data was transferred to IC/0032/02.
Paul, could you please investigate what happened here? Currently, IC/0032/01 has been HTML coded, but it looks like the analysis data is entered into IC/0032/02. Do we need to HTML code IC/0032/02? That document currently isn't in the queue.
Thanks,
Morgan
Morgan Maguire, CEO
Hi Harsh,
I've examined the updated data for AF/0022/01 and UN/0015/03, and the updated location data looks good. However, I found the following problems with the Analysis data:
In AF/0022/01, the Subject Navigator data is not accessible in Analysis data table:
IC/0032/01 [French] was originally DCD'd back in 2010 when no English version was available. IC/0032/02 [English] was found back in July 2019. Unfortunately, DCD Status List wasn't updated when all the data was transferred from IC/0032/01 to IC/0032/02 in 2019, so IC/0032/01 was initially included on the HTML queue; however, IC/0032/02 was added on the HTML queue in January 2021 (automatic row 1380). DCD status is now updated with column AH indicating the original dispute document UIN (e.g., IC/0032/01) if DCD data had been subsequently transferred from the UIN in AH to an updated UIN under column A (e.g., IC/0032/02).
Thanks,
Paul
Morgan Maguire, CEO
OK. Thanks, Paul.
Harsh, as Paul has confirmed, we'll need to switch to IC/0032/02. The document is in the automated HTML queue and is currently pending QC content review.
Irit, could you please ensure this document is prioritized so we can get the HTML complete?
When the HTML document is ready, we'll need to again perform the matchup between the PDF destination codes and the HTML paragraph IDs. I've gone ahead and updated the Custom Paragraph spreadsheet under the IC-0032-02 tab to reflect the updated PDF destination codes from the legacy application:
IC/0032/02 has been reviewed and requires corrections. It is saved in the QC Application.
Thanks, Irit
Harsh Parikh, Tech Lead
Hi Morgan,
We checked the data and found that for UN/0015/03, the instrument with UIN ARB/0056 is missing.
In the database it has been set to Inactive, which means UIN ARB/0056 is deleted. That is why the legal instrument references are not visible for those entries.
After discussing the issue about ARB/0056 with Paul, we're going to handle this issue internally as we do our updates in content on app.investorstatelawguide.com. However, Paul will post another to-do with details on deleting all existing Article Citator entries referring to ARB/0056.
Bringing the discussion back to issues at hand with this to-do, the following needs to be addressed:
Jitesh and Piyush, please insert the updated HTML paragraph IDs when IC/0032/02 completes HTML conversion.
We have updated the broken tag data for UIN IC/0032/02 on staging.islg, except for the SN module.
Please check and confirm, and also let us know how we should update the SN module tagged data for this UIN, as we did not find it in the attached sheet.
Morgan Maguire, CEO
Hi Harsh,
The updated entries in the Citators on staging.islg look good. However, I'm not sure why you weren't able to retrieve the data for the Subject Navigator entries. For example, "paIC/0032/02, pa107, paIC/0032/02, pa108" should have used "pa37.1, pa38.1":
I've reviewed the following report produced by Paul, Nafiseh and Irit, and there are two issues we need to resolve.
1. Break up paragraph ranges for decimal paragraphs
We're not having any issues with decimal paragraphs for data entered into the Analysis list in the new application. For example, if the data was entered as "pa1.20", the link connects to the correct paragraph within the HTML. Therefore, there is no need to alter the source paragraph schema for these documents. However, there is a widespread problem where paragraph ranges with decimals (e.g., 1.20-1.25) were not broken up into individual paragraphs (e.g., pa1.20, pa1.21, pa1.22, pa1.23, pa1.24, pa1.25). As a result, we'll need to identify all dispute documents that have a decimal paragraph schema and re-enter the Analysis data into the new application with paragraph ranges broken up. Once we do this, the broken tag issue will be resolved for these documents.
2. Correct incongruency between destination codes and HTML paragraph IDs for custom paragraph documents.
The report above indicates that we are having an issue with UN/0015/03, which is one of the three documents flagged as needing special attention during the manual coding process: Confirm integration of HTML document with Custom paragraph schemas - TOLOGIX - PDF to HTML Conversion.
For the manual coding of these documents, we requested that the HTML paragraph IDs used in the HTMLs match the schema used in the destination codes: https://docs.google.com/spreadsheets/d/17KxVaMhJ1Vj85pI5-P_FyiOetqsw_GcdaurvfUDlLFY/edit#gid=1751772033. Therefore, for Part II - Chapter B - Paragraph 2, the paragraph ID should have been "UN001503_II/B,pa2", which would have matched up with the applicable destination codes:
However, inspecting the HTML generated for UN/0015/03:
the paragraph ID "UN001503_partII.B.pa2" was used, which caused the Analysis data for "II/B,pa2" to not work within the new application:
There are two possible solutions to this problem:
Note that the solution we apply will also need to be applied to the other custom paragraph documents (AF/0022/01 and IC/0032/01) assuming the schemas between the destination codes and HTML paragraph IDs do not match in these documents as well.
Thanks,
Morgan
Here is the updated custom paragraph spreadsheet with the analysis data for AF/0022/01, IC/0032/01 and UN/0015/03:
Please find updated spreadsheet below.
Thanks,
Jitesh
We also need to manually update the data discussed in Re: Paragraph Ranges not being Migrated Properly - TOLOGIX - ISLG App Rebuild.
ARB/0056 isn't missing. However, it doesn't appear to have been migrated to the new application under Arbitration Rules, rather than Treaties as it was supposed to in the legacy application: https://www.investorstatelawguide.com/CoreComponents/AddDocumentStep1?docId=1294. Also, it doesn't look like the provision list was migrated from the legacy application as well: https://www.investorstatelawguide.com/CoreComponents/AddDocumentStep2Case1and2?docId=1294#
Also, further to my screenshots above, why am I unable to view the records from the Subject Navigator in the Analysis List?
Thanks,
Morgan
The viewing issue for SN is resolved across all UINs on staging.islg. Please check and confirm.
We have just updated the tag data on staging.islg for UINs UN/0015/03 and AF/0022/01.
Please let me know whether we can update app.islg.
For IC/0032/02, the tag data will be updated once we receive the updated Excel sheet from the PDF to HTML team.
IC/0032/02 looks great. No further corrections required. The excel sheet has been updated.
Thank you,
Irit
The updated view for the Subject Navigator data within the Analysis List looks great. Please make the necessary changes on app.islg.
Thanks,
Morgan
I have updated the broken tag data on app.islg and also made the necessary changes on app.islg.
Please find the attached spreadsheet; we have inserted the HTML IDs into the NewHTMLTagid column within the CustomParagraph spreadsheet.
Thanks,
Jitesh
The updates on app.islg look great.
Thanks,
Morgan
As per the following screenshot, we did not find the SN tagged data for UIN IC/0032/02 in the spreadsheet you attached above.
Please let us know what we should do with this data.
We have updated the broken tag data for UIN IC/0032/02 on both staging.islg and app.islg.
The updated data looks great. I believe we have resolved everything in this to-do. I've marked it complete.
Morgan