TOLOGIX - ISLG App Rebuild

Full Text Search result display

Assigned to
Martin Laporte, CTO at Tologix

Comments & Events

Paul Moon
Hi Martin:

This is a client-identified issue, which should be addressed as soon as possible as an unplanned/critical task.

As shown in the video below, a Full Text Search for two w/3 "requests for arbitration" causes the following issues. The default search settings were applied, and I get 35 documents.

1. The results show more paragraphs than they should. They seem to include every paragraph that contains either "two" or "requests for arbitration", when only paragraphs where "two" appears within 3 words of "requests for arbitration" should be shown.
2. Where footnote references should be displayed, no footnotes are listed.
3. When users click on pinpoint references, the HTML excerpts displayed appear random; see the Niko and Occidental examples throughout the video.
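To make the expected behavior in item 1 concrete, here is a minimal Python sketch of a w/N proximity check next to the "either term" behavior the results appear to reflect. It assumes simple lowercase word tokenization with no stemming, synonyms, or fuzziness (the real search request enables Stemming and Fuzzy). It also measures distance as the number of word positions between "two" and the nearest edge of the phrase; the search engine's own definition of w/3 may differ. The function names and sample paragraphs are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase alphanumeric word tokens; no stemming, synonyms or fuzziness.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_spans(tokens: list[str], phrase: list[str]) -> list[tuple[int, int]]:
    # (first, last) token positions of every exact occurrence of the phrase.
    n = len(phrase)
    return [(i, i + n - 1) for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def within(text: str, term: str, phrase: str, n: int) -> bool:
    # Proximity match: `term` occurs at most n positions from the nearest edge of `phrase`.
    tokens = tokenize(text)
    positions = [i for i, tok in enumerate(tokens) if tok == term]
    for first, last in phrase_spans(tokens, tokenize(phrase)):
        for p in positions:
            gap = first - p if p < first else p - last
            if 0 < gap <= n:
                return True
    return False


def either(text: str, term: str, phrase: str) -> bool:
    # The over-broad behavior described in item 1: either operand alone is enough.
    tokens = tokenize(text)
    return term in tokens or bool(phrase_spans(tokens, tokenize(phrase)))


paragraphs = [
    "The tribunal received two requests for arbitration in 2006.",
    "Two parties were involved. Later, several unrelated requests for arbitration were filed.",
    "The requests for arbitration were consolidated.",
]
for p in paragraphs:
    print(within(p, "two", "requests for arbitration", 3), either(p, "two", "requests for arbitration"))
```

With these sample paragraphs, only the first one satisfies two w/3 "requests for arbitration", while the "either" check accepts all three.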
Thanks,

Paul
Martin Laporte, CTO at Tologix
Hi Harsh,

Can you or Piyush Kanpariya (DevIT) investigate this issue? Do we need to involve Radomir?
Please add this to the current sprint's unplanned issues.

Thanks,
--Martin
Harsh Parikh, Tech Lead at DevIT
Hi Radomir,

Could you please look into this issue as a high priority?

Cc: Martin Laporte (CTO at Tologix), Paul Moon, Rob Wiesenberg (Contegra)
Harsh Parikh, Tech Lead at DevIT
Hi Radomir,

The issue above was reproduced with production data. Here are the production data details:

Server : 10.68.138.11
Database : ISLGRebuildProduction
PDF & HTML documents path: E:\ISLGRebuildProduction\wwwroot\Documents (on server 10.68.138.10)

Cc: Martin Laporte (CTO at Tologix), Rob Ferguson (Team Lead - Web Development at Industrial), Paul Moon
Radomir Mladenovic, Contegra
Hi Harsh, I updated the search service to change how paragraphs are matched when the search type is "Boolean".
https://1drv.ms/u/s!AugzRBG6eTFwjs0lrk7PiBMH6xXcog?e=tMUkkb

However, I've found another issue. For the sample document in the video ("Niko Resources (Bangladesh) Ltd. v. Bangladesh ..."), the FTS now returns two paragraphs instead of three. Paragraph 7 (pa7) is missing from the results.
I tracked this down to a duplicate "pa7" in the "IN-0086-03 - Niko v. Bapex and Petrobangla - Decision on Jurisdiction -C.html" file. It appears with two different IDs: IN008603_pa7 and hIN008603_pa7. The second occurrence of pa7 overwrites the first in the index, so the first cannot be found.
Because the paragraph number is not unique, I also changed the indexer to use the complete paragraph ID. You will now get paragraph numbers such as IN008603_pa7 instead of just pa7, so you need to handle that in your application as well.
You will need to create a fresh FTS index for this change to take effect.
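The overwrite described above is the usual collision when an index is keyed by a non-unique value. The Python sketch below reproduces it with the two IDs from the Niko file. The paragraph texts are placeholders, and a plain dict stands in for the real FTS index, whose internal structure is not described in this thread. The hypothetical paragraph_number helper also shows one way the application could still derive the display number "pa7" from a full ID such as IN008603_pa7, assuming the number is always the part after the last underscore.

```python
def paragraph_number(paragraph_id: str) -> str:
    # "IN008603_pa7" -> "pa7": the part after the last underscore (assumed ID format).
    return paragraph_id.rsplit("_", 1)[-1]


def build_index(paragraphs, key):
    # A dict stands in for the FTS index; a repeated key silently replaces the earlier entry.
    index = {}
    for paragraph_id, text in paragraphs:
        index[key(paragraph_id)] = text
    return index


# Both IDs appear in the Niko decision file; the texts are placeholders.
niko = [("IN008603_pa7", "first pa7 text"), ("hIN008603_pa7", "second pa7 text")]

by_number = build_index(niko, paragraph_number)   # old behavior: keyed by "pa7"
by_full_id = build_index(niko, lambda pid: pid)   # new behavior: keyed by the full ID

print(by_number)           # {'pa7': 'second pa7 text'}
print(sorted(by_full_id))  # ['IN008603_pa7', 'hIN008603_pa7']
```

Keying by the full ID keeps both paragraphs searchable, at the cost of the application having to map the ID back to a display number.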

Let me know if you have any questions.
Harsh Parikh, Tech Lead at DevIT
Hi Radomir Mladenovic, Contegra Radomir ,

We added the ParagraphDatasource.cs file to the DB Indexer project, updated the search controller in the API project, and generated new indexes.

However, we are still not able to find pa7, and when we click on a paragraph, its content is not returned.

Also, we are getting only the paragraph number, not the full ID you mentioned above.



Here is the payload of my search request for the search results:

{"searchRequest":"two w/3 \"requests for arbitration\"","SearchType":"Boolean","Stemming":true,"WordNetSynonyms":false,"Fuzzy":true,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["13"]}]},{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37"]}]},{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["12"]}]}]}]},"PageNum":0,"PageSize":20}


Here is my request to fetch the content when a paragraph is clicked:

{"searchRequest":"two w/3 \"requests for arbitration\"","SearchType":"3","Stemming":"true","WordNetSynonyms":"false","Fuzzy":"true","Fuzziness":"1","paraId":"2E0045BB3FB92D6F1EB21FFF28724904#cGE5Mw=="}
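The paraId in the request above appears to combine a 32-character hexadecimal key and, after the "#", a Base64 string. Decoding that suffix gives "pa93", a bare paragraph number rather than a full ID in the IN008603_pa7 form. This is consistent with only paragraph numbers being returned. The Python sketch below assumes that "<key>#<Base64 paragraph reference>" layout; the helper names are hypothetical, and what the key identifies is not stated in this thread.

```python
import base64


def decode_para_id(para_id: str) -> tuple[str, str]:
    # Assumed layout: "<hex key>#<Base64-encoded paragraph reference>".
    key, _, encoded = para_id.partition("#")
    return key, base64.b64decode(encoded).decode("utf-8")


def is_full_paragraph_id(paragraph_ref: str) -> bool:
    # Full IDs from the updated indexer contain a prefix and an underscore ("IN008603_pa7");
    # bare numbers ("pa93") do not.
    return "_" in paragraph_ref


key, ref = decode_para_id("2E0045BB3FB92D6F1EB21FFF28724904#cGE5Mw==")
print(key, ref, is_full_paragraph_id(ref))
# 2E0045BB3FB92D6F1EB21FFF28724904 pa93 False
```

If the paraId is built the same way from a new index, decoding it should yield a value with the document prefix. That gives a quick check of which index a request was served from.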


Cc: Martin Laporte (CTO at Tologix)
Radomir Mladenovic, Contegra
Hi Harsh, are you absolutely sure you generated the new indexes using the updated indexer? When I generate and use the new index, I see pa7, and the paragraph reference contains the full ID.


You can find a copy of my FTS index under E:\FTProot

I did, however, find an issue with paragraph highlighting, which was introduced by the recent fix for extracting long highlighted fields. That should be fixed now. Here it is for pa7:


You can get the update from 
https://1drv.ms/u/s!AugzRBG6eTFwjtkvzJrYlJPaZxGvTA?e=58As0A

I'm sending you the indexer as well, together with the compiled executable. Functionally, there are no important changes to the indexer: I added an option to limit index generation to a subset of documents (or a single document) in order to create a smaller index for troubleshooting. The indexer update I sent you last time was good as well.
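The new indexer option restricts which documents go into the index. Its actual name and implementation are not described in this thread. The Python sketch below only illustrates the idea with a hypothetical documents_to_index filter; passing no selection indexes everything, and the second file name is a placeholder.

```python
from typing import Iterable, Iterator, Optional


def documents_to_index(all_documents: Iterable[str], only: Optional[set[str]] = None) -> Iterator[str]:
    # Hypothetical filter: with no selection, every document is indexed;
    # otherwise only the named documents are, which keeps a troubleshooting index small.
    for name in all_documents:
        if only is None or name in only:
            yield name


niko_file = "IN-0086-03 - Niko v. Bapex and Petrobangla - Decision on Jurisdiction -C.html"
library = [niko_file, "other-decision.html"]  # second name is a placeholder

print(list(documents_to_index(library, {niko_file})))
```

A single-document index like this makes it quicker to check a case such as the duplicate pa7 without rebuilding the full production index.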
Harsh Parikh, Tech Lead at DevIT
Thanks, Radomir. The issue is resolved now.

Paul, please check on staging.islg and confirm.

Cc: Martin Laporte (CTO at Tologix)
Paul Moon
Hi Harsh:

It looks good on staging.islg. Please deploy it to app.islg.

Thanks,

Paul
Paul Moon
Paul Moon completed this to-do.