This is a client-identified issue, which should be addressed as soon as possible as an unplanned/critical task.
As shown in the video below, a Full Text Search (FTS) for two w/3 "requests for arbitration" results in the following issues. The default search settings were applied, and the search returned 35 documents.
1. The results show more paragraphs than actually match the query. They seem to include every paragraph that contains either "two" or "requests for arbitration", when only paragraphs in which "two" appears within 3 words of "requests for arbitration" should be shown.
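To make the expected behaviour concrete, here is a minimal sketch of the proximity rule described above. It is not the search engine's implementation: the function names are hypothetical, stemming and fuzzy matching are ignored, and the way the word distance is counted (number of word steps between the term and the nearest end of the phrase) is an assumption.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_positions(tokens, phrase_tokens):
    """Return (start, end) token indexes for every occurrence of the phrase."""
    n = len(phrase_tokens)
    return [(i, i + n - 1)
            for i in range(len(tokens) - n + 1)
            if tokens[i:i + n] == phrase_tokens]

def within(text, term, phrase, distance):
    """True if `term` occurs within `distance` words of `phrase`, on either side."""
    tokens = tokenize(text)
    term_hits = [i for i, tok in enumerate(tokens) if tok == term]
    for start, end in phrase_positions(tokens, tokenize(phrase)):
        for pos in term_hits:
            # Distance to the nearest end of the phrase (assumed counting rule).
            gap = start - pos if pos < start else pos - end
            if 0 < gap <= distance:
                return True
    return False

near = "The two requests for arbitration were filed together."
far = "Two tribunals were constituted long before the requests for arbitration."
print(within(near, "two", "requests for arbitration", 3))
print(within(far, "two", "requests for arbitration", 3))
```

Under this rule, a paragraph that contains both "two" and the phrase but too far apart, or only one of them, should not appear in the results.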
However, I've found another issue. For the sample document in the video ("Niko Resources (Bangladesh) Ltd. v. Bangladesh ..."), the FTS now returns two paragraphs instead of three. Paragraph 7 (pa7) is missing from the results.
I tracked this down to a duplicate "pa7" in the "IN-0086-03 - Niko v. Bapex and Petrobangla - Decision on Jurisdiction -C.html" file. It appears with two different IDs: IN008603_pa7 and hIN008603_pa7. The second occurrence of pa7 overwrites the first pa7 in the index, so the first one cannot be found.
As the paragraph number is evidently not unique, I changed the indexer to use the complete paragraph ID. You will now get paragraph numbers as IN008603_pa7 instead of just pa7, so you need to handle that in your application as well.
You will have to create a fresh FTS index to apply this change.
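The overwrite described above can be illustrated with a small sketch. This is not the indexer's code: the index is modelled as a plain dictionary, the function names are hypothetical, the paragraph texts are placeholders, and the assumption that the old key was the part after the last underscore is inferred from the two IDs quoted above.

```python
def build_index(paragraphs, key_fn):
    """Store each paragraph under key_fn(id); a repeated key overwrites the earlier entry."""
    index = {}
    for para_id, text in paragraphs:
        index[key_fn(para_id)] = text
    return index

def short_number(para_id):
    """Assumed old key: only the part after the last underscore, e.g. 'pa7'."""
    return para_id.rsplit("_", 1)[-1]

def full_id(para_id):
    """New key: the complete paragraph ID, e.g. 'IN008603_pa7'."""
    return para_id

def split_paragraph_id(para_id):
    """Split 'IN008603_pa7' into ('IN008603', 'pa7') for display in the application."""
    prefix, _, number = para_id.rpartition("_")
    return prefix, number

# Placeholder texts; only the two IDs come from the thread.
paragraphs = [
    ("IN008603_pa7", "first pa7 (placeholder text)"),
    ("hIN008603_pa7", "second pa7 (placeholder text)"),
]

old_index = build_index(paragraphs, short_number)
new_index = build_index(paragraphs, full_id)
print(len(old_index), len(new_index))
print(split_paragraph_id("IN008603_pa7"))
```

With the short key, only the second pa7 survives; with the full ID, both entries are kept. An application that still wants to show "pa7" can split the full ID along the lines of split_paragraph_id.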
Let me know if you have any questions.
Harsh Parikh, Tech Lead
Hi Radomir,
We have added the ParagraphDatasource.cs file to the DB Indexer project, updated the search controller in the API project, and generated new indexes.
However, we are still unable to find pa7, and when we click on a paragraph, its content is not loaded.
Also, we are getting only the paragraph number, not the full ID you mentioned above.
The following is my request payload for the search results:
{"searchRequest":"two w/3 \"requests for arbitration\"","SearchType":"Boolean","Stemming":true,"WordNetSynonyms":false,"Fuzzy":true,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["13"]}]},{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37"]}]},{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["12"]}]}]}]},"PageNum":0,"PageSize":20}
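For readability, the nested FilterStatement in the payload above can be rebuilt programmatically. The sketch below is only an illustration: the helper names are hypothetical, and the field names and values are copied from the payload rather than taken from any API documentation.

```python
import json

def content_type_filter(type_ids):
    """Build AND( OR( AND(match id1), AND(match id2), ... ) ) on DocumentContentTypeId."""
    or_clauses = [
        {
            "type": "boolean",
            "Operator": "and",
            "clauses": [
                {"type": "match", "field": "DocumentContentTypeId", "values": [str(type_id)]}
            ],
        }
        for type_id in type_ids
    ]
    return {
        "type": "boolean",
        "Operator": "and",
        "clauses": [{"type": "boolean", "Operator": "or", "clauses": or_clauses}],
    }

def build_search_payload(query, type_ids, page_num=0, page_size=20):
    """Assemble a search request with the same settings as the payload in the thread."""
    return {
        "searchRequest": query,
        "SearchType": "Boolean",
        "Stemming": True,
        "WordNetSynonyms": False,
        "Fuzzy": True,
        "Fuzziness": "1",
        "FilterStatement": content_type_filter(type_ids),
        "PageNum": page_num,
        "PageSize": page_size,
    }

payload = build_search_payload('two w/3 "requests for arbitration"', [13, 37, 12])
print(json.dumps(payload, indent=2))
```

Read this way, the filter restricts the search to documents whose DocumentContentTypeId is 13, 37, or 12.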
The following is my request for fetching the content when a paragraph is clicked:
{"searchRequest":"two w/3 \"requests for arbitration\"","SearchType":"3","Stemming":"true","WordNetSynonyms":"false","Fuzzy":"true","Fuzziness":"1","paraId":"2E0045BB3FB92D6F1EB21FFF28724904#cGE5Mw=="}
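As a troubleshooting aid, the paraId in the request above can be taken apart. The part after the "#" is valid Base64; treating it as the paragraph reference, and the part before the "#" as a document key, is an assumption based only on the value's shape. The function name is hypothetical.

```python
import base64

def decode_para_id(para_id):
    """Split a paraId at '#' and Base64-decode the second part (assumed layout)."""
    doc_key, _, encoded = para_id.partition("#")
    paragraph_ref = base64.b64decode(encoded).decode("utf-8")
    return doc_key, paragraph_ref

doc_key, paragraph_ref = decode_para_id("2E0045BB3FB92D6F1EB21FFF28724904#cGE5Mw==")
print(doc_key)        # 2E0045BB3FB92D6F1EB21FFF28724904
print(paragraph_ref)  # pa93
```

The decoded value is a short paragraph number (pa93) with no document prefix, which is consistent with the observation that only the paragraph number, not the full ID, is being returned.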
Cc: Martin
Radomir Mladenovic
Hi Harsh, are you absolutely sure you generated the new indexes using the updated indexer? When I generate and use the new index, I see pa7, and the paragraph reference contains the ID:
You can find a copy of my FTS index under E:\FTProot
I did, however, find an issue with paragraph highlighting, which was introduced with the recent fix for extracting long highlighted fields. That should be fixed now. Here it is for pa7:
I'm sending you the indexer as well, together with a compiled executable. Functionally, there are no important changes to the indexer. I added an option to limit index generation to a subset of documents (or a single document) in order to create a smaller index for troubleshooting. The indexer update I sent you last time was good as well.
Harsh Parikh, Tech Lead
Thanks, Radomir. The issue is resolved now.
Paul, please check on staging.islg and confirm to us.
Cc: Martin
Paul Moon
Hi Harsh:
It looks good on staging.islg. Please deploy it to app.islg.
Paul
Can you or
Please add this to the current sprint's unplanned issues.
Thanks,
--Martin
Could you please look into this issue as a high priority?
The above issue occurs with production data. The details of the production data are as follows:
Server : 10.68.138.11
Database : ISLGRebuildProduction
PDF & HTML Documents Path : (server : 10.68.138.10 ) E:\ISLGRebuildProduction\wwwroot\Documents
https://1drv.ms/u/s!AugzRBG6eTFwjs0lrk7PiBMH6xXcog?e=tMUkkb
You can get the update from
https://1drv.ms/u/s!AugzRBG6eTFwjtkvzJrYlJPaZxGvTA?e=58As0A
It looks good on staging.islg. Please deploy it to app.islg.
Thanks,
Paul