Basecamp Export

Paul Moon

Hi Martin:

This is a client-identified issue, which should be addressed as soon as possible as an unplanned/critical task.

As shown in the video below, a Full Text Search for two w/3 "requests for arbitration" produces the following issues. The default search settings were applied, and the search returns 35 documents.

1. The result lists more paragraphs than actually match the query (it seems to include every paragraph that contains either "two" or "requests for arbitration", when it should show only paragraphs where "two" appears within 3 words of "requests for arbitration"):

Niko: there are only three (paras 7, 93, 105) relevant paragraphs (https://app.investorstatelawguide.com/DocumentView?rt=Wk21nmb-fNY%3D&docid=5gYc5UAuRHc%3D&fts=1SkhCzXk8SI%3D&keywords=qwv8pfjjsowTn5tqCWJceMVyoBtk%20YdPsDid6Mf9Z218yAfpnEdaiA%3D%3D&document=kcBtwoQG9EEUVR9khq6ThZDwiyFc-IQQq7fReNfDQ99ryoiHQx4Vjw%3D%3D&st=MivJjWXtNNQ%3D&Stemming=F-l5KByMCyg%3D&WordNetSynonyms=3CK3xFumiHs%3D&Fuzzy=F-l5KByMCyg%3D&Fuzziness=BT35dTu48nk%3D), but the result shows 85 paragraphs; and
Occidental: there are only two (paras 20, 57) relevant paragraphs (https://app.investorstatelawguide.com/DocumentView?rt=Wk21nmb-fNY%3D&docid=FX-YW6RwyyU%3D&fts=1SkhCzXk8SI%3D&keywords=qwv8pfjjsowTn5tqCWJceMVyoBtk%20YdPsDid6Mf9Z218yAfpnEdaiA%3D%3D&document=tgI1IV4%20Q9UPYW4rwcohsypJtIJbambLi9t5tORNrZpryoiHQx4Vjw%3D%3D&st=MivJjWXtNNQ%3D&Stemming=F-l5KByMCyg%3D&WordNetSynonyms=3CK3xFumiHs%3D&Fuzzy=F-l5KByMCyg%3D&Fuzziness=BT35dTu48nk%3D), but the result shows 10.

2. Footnote references that should be displayed are not listed:

AWG: there seem to be at least two (fn1 and fn5) relevant footnotes (https://app.investorstatelawguide.com/DocumentView?rt=Wk21nmb-fNY%3D&docid=30vCmPeckHo%3D&fts=1SkhCzXk8SI%3D&keywords=qwv8pfjjsowTn5tqCWJceMVyoBtk%20YdPsDid6Mf9Z218yAfpnEdaiA%3D%3D&document=3Mmh-3O%20bYK-PB33WrfMWfSRopihjQ5X1FzuD2wiHG5ryoiHQx4Vjw%3D%3D&st=MivJjWXtNNQ%3D&Stemming=F-l5KByMCyg%3D&WordNetSynonyms=3CK3xFumiHs%3D&Fuzzy=F-l5KByMCyg%3D&Fuzziness=BT35dTu48nk%3D), but none are shown when users click on AWG itself, although 37 paragraphs are listed.

3. When users click on pinpoint references, HTML excerpts display randomly: see Niko and Occidental examples throughout the video.
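The behaviour expected in item 1 can be sketched as a paragraph-level proximity check. This is only an illustration in Python, not the search service's actual logic: the function names are hypothetical, stemming and fuzzy matching are ignored, and the distance is assumed to be counted in words from the nearest edge of the phrase, which may differ from how the real engine counts.

```python
import re

def tokenize(text):
    # Lowercase word tokens; punctuation is dropped (a simplifying assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_positions(tokens, phrase_tokens):
    # Start indexes at which the exact phrase occurs in the token list.
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]

def matches_within(text, term, phrase, distance):
    # True only if `term` occurs within `distance` words of `phrase`.
    tokens = tokenize(text)
    phrase_tokens = tokenize(phrase)
    term_positions = [i for i, tok in enumerate(tokens) if tok == term.lower()]
    for start in phrase_positions(tokens, phrase_tokens):
        end = start + len(phrase_tokens) - 1
        for pos in term_positions:
            if pos < start:
                gap = start - pos   # term precedes the phrase
            elif pos > end:
                gap = pos - end     # term follows the phrase
            else:
                continue            # term lies inside the phrase itself
            if gap <= distance:
                return True
    return False
```

Under this reading, a paragraph that contains "two" and "requests for arbitration" far apart, or only one of them, would not be listed.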

Attachment: Full Text Search results (22-Dec-21).mp4 (22.8 MB)

Thanks,

Paul

Dec 22, 2021 at 8:35 PM Notified 1 person

Martin Laporte, CTO

Hi Harsh,

Can you or Piyush investigate this issue? Do we need to involve Radomir?
Please add this to the current sprint's unplanned issues.

Thanks,
--Martin

Jan 04, 2022 at 3:03 PM Notified 3 people

Harsh Parikh, Tech Lead

Hi Radomir,

Could you please look into this issue as a high priority?

Cc: Martin, Paul, Rob

Jan 05, 2022 at 6:01 AM Notified 5 people

Harsh Parikh, Tech Lead

Hi Radomir,

The above issue was reproduced against production data. The production details are as follows.

Server : 10.68.138.11
Database : ISLGRebuildProduction
PDF & HTML Documents Path : (server : 10.68.138.10 ) E:\ISLGRebuildProduction\wwwroot\Documents

Cc: Martin, Rob, Paul

Jan 06, 2022 at 7:02 AM Notified 5 people

Radomir Mladenovic

Hi Harsh, I made an update to the search service to change how paragraphs are matched when the query is "Boolean".
https://1drv.ms/u/s!AugzRBG6eTFwjs0lrk7PiBMH6xXcog?e=tMUkkb

However, I've found another issue. For the sample document in the video ("Niko Resources (Bangladesh) Ltd. v. Bangladesh ..."), the FTS now returns two paragraphs instead of three. Paragraph 7 (pa7) is missing from the results.
I tracked this down to a duplicate "pa7" in the "IN-0086-03 - Niko v. Bapex and Petrobangla - Decision on Jurisdiction -C.html" file. It appears with two different IDs: IN008603_pa7 and hIN008603_pa7. The second occurrence of pa7 overwrites the first one in the index, so the first cannot be found.
As the paragraph number is evidently not unique, I also changed the indexer to use the complete paragraph ID. You will now get paragraph references such as IN008603_pa7 instead of just pa7, so you need to handle that in your application as well.
Obviously, you will have to create a fresh index for FTS to apply this change.
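The overwrite described above can be illustrated with a small sketch. This is not the indexer's code: the dictionary stands in for the FTS index, the function names are hypothetical, and the paragraph texts are placeholders. Only the two IDs IN008603_pa7 and hIN008603_pa7 come from the document discussed here.

```python
def short_key(element_id):
    # "IN008603_pa7" -> "pa7": the paragraph number without its document prefix.
    return element_id.rsplit("_", 1)[-1]

def build_index(paragraphs, key_fn):
    # A plain dict stands in for the index; a repeated key silently
    # replaces the entry that was stored earlier.
    index = {}
    for element_id, text in paragraphs:
        index[key_fn(element_id)] = text
    return index

# Placeholder texts; the third ID is a hypothetical extra paragraph.
paragraphs = [
    ("IN008603_pa7", "placeholder text of the first pa7"),
    ("hIN008603_pa7", "placeholder text of the second pa7"),
    ("IN008603_pa93", "placeholder text of pa93"),
]

by_number = build_index(paragraphs, short_key)           # old behaviour
by_full_id = build_index(paragraphs, lambda eid: eid)    # keyed by complete ID

print(len(by_number), len(by_full_id))
```

Keyed by paragraph number, the first pa7 is lost; keyed by the complete ID, both entries survive, which is why callers then receive IDs such as IN008603_pa7.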

Let me know if you have any questions.

Jan 08, 2022 at 10:14 PM Notified 6 people

Harsh Parikh, Tech Lead

Hi Radomir,

We have added the ParagraphDatasource.cs file to the DB Indexer project, updated the search controller in the API project, and generated new indexes.

However, we are still unable to find pa7, and when we click on a paragraph, its content is not returned.

Also, we are getting only the paragraph number, not the full ID you mentioned above.

Attachment: image.png (148 KB)

The following is my request payload for the search results:

{"searchRequest":"two w/3 \"requests for arbitration\"","SearchType":"Boolean","Stemming":true,"WordNetSynonyms":false,"Fuzzy":true,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["13"]}]},{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37"]}]},{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["12"]}]}]}]},"PageNum":0,"PageSize":20}
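For readability, the payload above can be rebuilt from a list of content type IDs. This is a sketch only: the helper names are hypothetical, and the field names and nesting are copied from the payload as posted rather than from any API documentation.

```python
import json

def match_clause(field, value):
    # Leaf clause; values are sent as strings, as in the posted payload.
    return {"type": "match", "field": field, "values": [str(value)]}

def boolean_clause(operator, clauses):
    return {"type": "boolean", "Operator": operator, "clauses": clauses}

def content_type_filter(content_type_ids):
    # One "and" wrapper per content type, OR-ed together, inside an outer "and".
    per_type = [
        boolean_clause("and", [match_clause("DocumentContentTypeId", type_id)])
        for type_id in content_type_ids
    ]
    return boolean_clause("and", [boolean_clause("or", per_type)])

def build_search_payload(query, content_type_ids, page_num=0, page_size=20):
    return {
        "searchRequest": query,
        "SearchType": "Boolean",
        "Stemming": True,
        "WordNetSynonyms": False,
        "Fuzzy": True,
        "Fuzziness": "1",
        "FilterStatement": content_type_filter(content_type_ids),
        "PageNum": page_num,
        "PageSize": page_size,
    }

payload = build_search_payload('two w/3 "requests for arbitration"', [13, 37, 12])
print(json.dumps(payload))
```

The filter restricts the search to documents whose DocumentContentTypeId is 13, 37, or 12.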

The following is my request to fetch the content when a paragraph is clicked:

{"searchRequest":"two w/3 \"requests for arbitration\"","SearchType":"3","Stemming":"true","WordNetSynonyms":"false","Fuzzy":"true","Fuzziness":"1","paraId":"2E0045BB3FB92D6F1EB21FFF28724904#cGE5Mw=="}
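The part of paraId after the # is standard Base64, and decoding it gives pa93, that is, a bare paragraph number rather than a complete ID of the form IN008603_pa93. A minimal sketch of that check follows; the function names are hypothetical, the meaning of the 32-character prefix is not stated in this thread (it is treated as an opaque document key here), and testing for an underscore is only an assumed heuristic for "complete ID".

```python
import base64

def split_para_id(para_id):
    # Split "<prefix>#<base64>" and decode the paragraph reference.
    prefix, _, encoded = para_id.partition("#")
    paragraph_ref = base64.b64decode(encoded).decode("utf-8")
    return prefix, paragraph_ref

def has_full_paragraph_id(paragraph_ref):
    # Assumed heuristic: complete IDs such as IN008603_pa7 contain an
    # underscore between the document prefix and the paragraph number.
    return "_" in paragraph_ref

prefix, ref = split_para_id("2E0045BB3FB92D6F1EB21FFF28724904#cGE5Mw==")
print(ref, has_full_paragraph_id(ref))
```

A paraId built from an index generated with the updated indexer would be expected to decode to the complete ID instead.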

Cc: Martin

Jan 11, 2022 at 9:54 AM Notified 6 people

Radomir Mladenovic

Hi Harsh, are you absolutely sure you generated the new indexes using the updated indexer? When I generate and use the new index, I see pa7, and the paragraph reference contains the ID:

Attachment: image.png (36.1 KB)

You can find a copy of my FTS index under E:\FTProot

I did, however, find an issue with paragraph highlighting, which was introduced with the recent fix for extracting long highlighted fields. That should be fixed now. Here it is for pa7:

Attachment: image.png (34.1 KB)

You can get the update from
https://1drv.ms/u/s!AugzRBG6eTFwjtkvzJrYlJPaZxGvTA?e=58As0A

I'm sending you the indexer as well, together with the compiled executable. Functionally, there are no important changes to the indexer: I added an option to limit index generation to a subset of documents (or a single document) in order to create a smaller index for troubleshooting. The indexer update I sent you last time was good as well.

Jan 16, 2022 at 6:35 PM Notified 5 people

Harsh Parikh, Tech Lead

Thanks, Radomir. The issue is resolved now.

Paul, please check on staging.islg and confirm.

Cc: Martin

Jan 18, 2022 at 11:08 AM Notified 5 people

Paul Moon

Hi Harsh:

It looks good on staging.islg. Please deploy it to app.islg.

Thanks,

Paul

Jan 18, 2022 at 6:02 PM Notified 5 people

Paul Moon completed this to-do.

Jan 18, 2022 at 6:04 PM

TOLOGIX - ISLG App Rebuild

✔ Full Text Search result display
