As shown below, the SN search field's suggested search does not take users to the selected branch and says "No records found" when the branch does in fact exist. I cannot isolate this issue to a class of branches. This behaviour seems similar to "Re: Disputes & Dispute Documents search field and filters not working - TOLOGIX - ISLG App Rebuild", where a dtSearch indexing failure was the culprit according to Harsh.
We have found two parts to the above issue.
1) A search containing a word in double quotation marks ("") does not work through dtSearch indexing.
2) A search with a long keyword does not return a result from the dtSearch API. For example, if we search with (Waste Management v. Mexico II Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or conduct tantamount to expropriation), no result is found.
But if we search with only (Waste Management v. Mexico II), the result is found.
I am adding Radomir and Rob to this thread to help us with this issue.
Radomir, could you please look at the above video and my comment, and suggest how we can resolve this?
Cc: Piyush
Radomir Mladenovic
Hi Harsh, I suspect the issue is searching for phrases that contain quotes and other special characters but I would have to reproduce this in order to confirm. I'm on a trip the whole next week and not sure if I'll have enough bandwidth to troubleshoot this. It would be useful if you could send me document IDs of sample documents for which search didn't work, as well as search payloads you used. Thanks.
Harsh Parikh,Tech Lead
Hi Radomir,
The following is the search payload request for the Subject Navigator module that does not return a result.
Search word : Waste Management v. Mexico II Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or conduct tantamount to expropriation
{"ErrorMessage":null,"WasError":false,"SearchRequest":"Waste Management v. Mexico II Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or conduct tantamount to expropriation","PageNum":0,"PageSize":0,"Fuzzy":false,"Fuzziness":1,"Stemming":true,"WordNetSynonyms":false,"Synonyms":false,"PhonicSearching":false,"SearchType":3,"SortField":null,"SortOrder":null,"SearchFlags":0,"Custom":null,"NoFrames":false,"EnableDateSearch":false,"StartDate":null,"EndDate":null,"FileConditions":null,"BooleanConditions":null,"QueryStatement":null,"FilterStatement":null,"Facets":null,"IxId":null,"IndexIds":null,"IncludeSynopsis":true,"Near":14,"ExcludeEnabled":false,"ExcludeTerm":null,"TreePath":null,"paraId":null,"FieldFilterName":null,"FieldFilterValues":null,"docId":null,"docUrl":null,"SearchTypeId":3}
Database : ISLGRebuildProduction
Server : 10.68.138.14
Radomir, please note that for the second search payload (Waste Management), if we use only the following search word, we do get a search result.
Search word : Waste Management v. Mexico II
Cc: Martin, Paul, Piyush
Martin Laporte,CTO
Hi Radomir,
Can you give us an update on this issue? Would it be easier to set up a Zoom call?
Thanks, -Martin
CC: Rob
Radomir Mladenovic
Hi Harsh,
From what I see, there are no results found because you have HTML tags (<EM>) in the "branchname" field where the expected text appears:
"branchname": "<EM>Waste Management v. Mexico II</EM> Final Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or to conduct tantamount to expropriation"
dtSearch does not support HTML tags in meta fields so any tags are indexed as text and affect searching:
"SearchRequest": "\"Waste Management v. Mexico II\"" This works as the phrase is within <EM>
"SearchRequest": "\"<EM>Waste Management v. Mexico II</EM> Final Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or to conduct tantamount to expropriation\"" This works as the complete content with <EM> is included in the search phrase.
"SearchRequest": "\"EM Waste Management v. Mexico II EM Final Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or to conduct tantamount to expropriation\"" This works as well - you see EM is still present without tag brackets
"SearchRequest": "\"Waste Management v. Mexico II Final Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or to conduct tantamount to expropriation\"" This doesn't work.
I hope this explains it.
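To illustrate the point about tags being indexed as text, here is a minimal sketch of how a plain-text version of a branch name could be derived before indexing. It is not the ISLG or dtSearch implementation; the function name `to_plain_text` is hypothetical, and only Python's standard library is used.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(branch_name: str) -> str:
    """Return branch_name with HTML tags removed and whitespace collapsed.

    Hypothetical helper: the thread does not show how the plain-text
    value is actually produced.
    """
    extractor = _TextExtractor()
    extractor.feed(branch_name)
    extractor.close()  # flush any buffered text
    return " ".join("".join(extractor.parts).split())


sample = "<EM>Waste Management v. Mexico II</EM> Final Award analyzes cases"
print(to_plain_text(sample))
```

With a value like this in the indexed field, the full phrase no longer has "EM" tokens between "II" and "Final", which is what made the last request above fail.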
Harsh Parikh,Tech Lead
Got your point, Radomir. But our second query is about words in double quotation marks ("").
For example, if we pass the search word "Alter ego" objections, then it doesn't work.
Please suggest.
Radomir Mladenovic
Harsh, quotes are not indexed.
If your search type is phrase, do not send quotes in the search request.
If the search type is boolean, for example, send the request with quotes around everything (e.g. "Alter ego objections").
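The two rules above could be applied before the request is sent. This is only a sketch under assumptions: the labels "phrase" and "boolean" and the function `prepare_search_request` are hypothetical, and the thread does not say how they map to the numeric `SearchType` values in the payload.

```python
def prepare_search_request(text: str, search_type: str) -> str:
    """Adjust quotes in the user's text for the given search type.

    search_type is a hypothetical label ("phrase" or "boolean"); any
    other value returns the text unchanged.
    """
    # Quotes are not indexed, so drop them and collapse the whitespace.
    unquoted = " ".join(text.replace('"', " ").split())
    if search_type == "phrase":
        # Phrase search: send the words without any quotes.
        return unquoted
    if search_type == "boolean":
        # Boolean search: wrap the whole request in one pair of quotes.
        return f'"{unquoted}"'
    return text


print(prepare_search_request('"Alter ego" objections', "phrase"))
print(prepare_search_request('"Alter ego" objections', "boolean"))
```

The first call prints the request without quotes and the second prints it wrapped in a single pair of quotes, matching the "Alter ego objections" example.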
Harsh Parikh,Tech Lead
Hi Radomir,
Here is one thing we can do: we will add one column to the stored procedure results. The column name will be BranchText.
The BranchText column will not include HTML elements or double quotation marks.
Is it possible for you to change your logic in the Web API to search the BranchText field instead of BranchName?
We will keep the BranchName column as it is for display purposes.
Server : 10.68.138.13 (Web Server), 10.68.138.14 (Database Server)
SP Name : FE_GetMasterTreeSearchForGenerateIndexing
Database Name : ISLGRebuildStaging
Module : Subject Navigator
Let me know your thoughts.
Cc: Martin, Piyush
Martin Laporte,CTO
Hi Radomir,
Before you proceed with the request above from Harsh, can you help me understand why we must pass the entire search string to dtSearch within double-quotes?
If I, as a user, type:
Waste Management v. Mexico II Final Award
I would expect to have a match, as I did not encapsulate my search string with double-quotes.
On the other hand, if I type:
"Waste Management v. Mexico II Final Award"
Then I understand that in our current setup, I would not get a match since this exact string cannot be found (since we're dealing with an <em> tag in the middle).
The above examples would mimic how Google and most other search engines behave. Do you agree?
Thanks, -Martin
Radomir Mladenovic
Harsh, you can add the new column. As far as I remember, no changes to the indexer are needed - the new column will be picked up and indexed automatically.
Martin, quotes are not necessary. It was just to demonstrate finding exactly the same document you know you have in the database. Without the quotes it should still find it but you will get many other documents as well (e.g. if you're using "any word" type of query).
Martin Laporte,CTO
Thanks, Radomir. I think that our search is behaving as expected then.
This issue has been resolved on staging.islg. Please check and confirm.
Cc: Martin
Paul Moon
Hi Harsh:
It looks good on staging.islg. Please let me know when it is deployed to app.islg, as I'll have to let the client know. I'll leave this item open until then.
Thanks,
Paul
Martin Laporte,CTO
Hi Paul,
I tested on Production today and it looks like it's fixed.
Thanks, -Martin
Paul Moon
Hi Martin:
The main functionality is fixed, but how the highlights appear is an issue, as shown below for quotation marks and italicized text. Please add it to unplanned/critical. Let me know if you would rather have the task as a separate item.
We have resolved the issue with double-quoted words and HTML text in the Subject Navigator module. But, as per the above video by Paul, those words are not highlighted.
Please note that we have created one new column in the result query that contains no double quotation marks or HTML.
Please check from your side and let us know.
Cc: Rob, Martin, Piyush, Paul
Martin Laporte,CTO
Hi Radomir,
Can you provide your input on the latest issue reported by Paul above?
Thanks, -Martin
CC: Rob, Harsh
Radomir Mladenovic
Hi,
Search for "Waste Management v. Mexico II Final Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or to conduct tantamount to expropriation" (without quotes) returns highlights in two fields:
The "branchnametext" field has all the terms highlighted, whereas "branchname" has highlights only from "serious breach of contract".
It's similar with "\"alter ego\" objections" - I see words properly highlighted in the branchnametext field.
I think you should use branchnametext to show highlighted fields, not branchname. It's the same issue as discussed earlier - dtSearch does not support fields with HTML, so searching and highlighting do not work properly. Simply use the plain-text version of the field (branchnametext) instead of the original (branchname) whenever it appears in the highlighted fields section.
Hope this helps.
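A minimal sketch of that selection, assuming the highlighted fields arrive as a mapping from field name to highlighted text. The thread does not show the actual response shape, so `highlighted_fields` and `pick_branch_highlight` are hypothetical, and falling back to branchname when the plain-text field is absent is an illustrative choice rather than something stated above.

```python
from typing import Optional


def pick_branch_highlight(highlighted_fields: dict) -> Optional[str]:
    """Prefer the plain-text field's highlight over the HTML field's.

    highlighted_fields is an assumed shape: {field name: highlighted text}.
    """
    for name in ("branchnametext", "branchname"):
        value = highlighted_fields.get(name)
        if value:
            return value
    return None


fields = {
    "branchname": "partial highlight",
    "branchnametext": "full highlight",
}
print(pick_branch_highlight(fields))
```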
Harsh Parikh,Tech Lead
Thanks, Radomir.
Paul, could you please check this task on staging.islg?
Cc: Martin
Paul Moon
Hi Harsh:
As shown below, "and" and "or" are not highlighted - is there a reason why?
As per the above comment by Paul, noise words such as "and" and "or" are not highlighted in the Subject Navigator module. As you know, we had already removed all noise words.
Can you explain what the correct behavior is?
Cc: Martin, Paul, Rob
Rob Wiesenberg
Hi Harsh,
You should be able to highlight AND and OR once they are out of the noise.dat file and you have reindexed. You might try doing a quick test searching with dtSearch Desktop to see if they are still getting highlighted. Also be sure that your indexer is using the noise.dat file that you have edited. It is easy to mistakenly use another noise.dat file that may be on the system in a different location. Let us know.
Thanks, Rob
Harsh Parikh,Tech Lead
Hi Rob,
We have removed the noise words. I don't remember why we removed them, but when Morgan was still available he insisted that we remove those words.
Could you please explain what the benefit of removing noise words is?
Removing the noise word (aka stopwords) list (noise.dat) allows those words to be indexed, and therefore searched and highlighted. There were use cases where this was deemed useful. Usually it is when the noise words are part of phrases and add meaning. You can review the noise.dat file to get an idea of the default terms (a, about, after, all, also, an, and, any, are, as, at, be, been, but, by, can, come, could...).
Removing the noise words will increase the index size and indexing time, as these words are very frequently found in all documents. Hope this helps.
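The effect of a noise list can be pictured with a toy example. This is not how dtSearch builds its index; it is a rough sketch in which `indexed_terms` is a hypothetical function and the noise set holds only the default terms quoted above.

```python
# Partial list, copied from the default terms quoted in the message above.
DEFAULT_NOISE = {
    "a", "about", "after", "all", "also", "an", "and", "any", "are",
    "as", "at", "be", "been", "but", "by", "can", "come", "could",
}


def indexed_terms(text: str, noise_words: set) -> list:
    """Return the words a toy indexer would keep, skipping noise words."""
    return [w for w in text.lower().split() if w not in noise_words]


text = "a serious breach of a contract by a State organ"

# With the noise list in place, "a" and "by" never reach the index,
# so they can be neither searched nor highlighted.
print(indexed_terms(text, DEFAULT_NOISE))

# With an empty noise list every word is kept, at the cost of a larger index.
print(indexed_terms(text, set()))
```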
Harsh Parikh,Tech Lead
Hi Martin and Paul,
Please go through the following 2 Basecamp threads and let us know whether we need to include noise words.
I would like to recap the conversation to make sure we are all on the same page:
* Paul is reporting that the noise words do not get highlighted
* Rob and Radomir explained that removing the noise words from noise.dat will cause these words to be highlighted
* Morgan asked that we remove the noise words over a year ago
Assuming the above is accurate, then it seems to me that even though we were supposed to exclude the noise words, we are currently including them. Please confirm.
-Martin
Harsh Parikh,Tech Lead
Hi Rob and Radomir,
I have included all the noise words and re-indexed, but the noise words are still not highlighted in the Subject Navigator module.
Could you please check and give your feedback.
Search text :
Waste Management v. Mexico II Award analyzes cases where a persistent and serious breach of a contract by a State organ can constitute expropriation, or conduct tantamount to expropriation
Web Server : 10.68.138.13
DB Indexer Folder : E Drive -> ISLGRebuildStagingDBIndexer
Database Server : 10.68.138.14
Database Name : ISLGRebuildStaging
Please let me know if you need any more details.
Cc: Paul, Martin
Martin Laporte,CTO
Hi Rob and Radomir,
I had a chat with Harsh and he now understands that we need to REMOVE the words from the noise.dat file if we want them to be highlighted in ISLG.
He is working on re-indexing with an empty noise.dat and will report back in this thread.
No further action is needed from you at this time.
Thanks, -Martin
CC: Paul
Harsh Parikh,Tech Lead
Hi Radomir and Rob,
I have checked again after removing the noise words and re-indexing. The noise words are still not highlighted, as per the above screenshot.
Please check and provide your feedback.
Cc: Martin, Paul
Rob Wiesenberg
Hi Harsh, Martin, and Radomir,
Looking at the screenshot it appears that all of the noise words except the word "AND" and the word "OR" are searchable and are getting highlighted. These two words are dtSearch search operator commands so they are not treated as searchable terms by default. Let me double check to see if there is a workaround. I'll let you know.
Rob Wiesenberg
Harsh, Martin and Radomir,
I am guessing that you are sending the queries to dtSearch as Boolean requests rather than Any Words or All Words. In Boolean mode the terms AND and OR are search operators and are therefore not treated as search terms, and thus do not get highlighted. If you switch to All Words then these terms will be highlighted. There may be other implications, so I am not sure whether it will interfere with other search logic. Please see: https://support.dtsearch.com/webhelp/dtsearchcppapi/AllWords_and_AnyWords.html
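To see which words of a given request would be affected, a small check along these lines could be run before deciding whether to switch search modes. It is a sketch only: it does not call dtSearch, `operator_words` is a hypothetical helper, and the operator set is limited to the two words named above.

```python
# Only the two operators discussed in this thread; Boolean mode may
# reserve other words as well.
BOOLEAN_OPERATORS = {"and", "or"}


def operator_words(query: str) -> list:
    """Return the words in query that Boolean mode would read as operators."""
    found = []
    for word in query.split():
        # Ignore surrounding punctuation such as commas or quotes.
        if word.strip('.,;:"()').lower() in BOOLEAN_OPERATORS:
            found.append(word)
    return found


query = ("a persistent and serious breach of a contract can constitute "
         "expropriation, or conduct tantamount to expropriation")
print(operator_words(query))
```

For the Waste Management request this lists "and" and "or", the same two words missing from the highlights in the screenshot.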
Paul Moon
Thanks, everyone.
Upon discussion with Martin, we'll leave app.islg as is and not make any further changes.
Please add it to unplanned/critical and address it asap for testing, as it affects the live site users.
Thanks,
Paul
Is this a dtSearch indexing issue?
I have created DevOps task https://dev.azure.com/tologix/ISLGRebuild/_workitems/edit/1991 for this issue.
Thanks,
-Martin
The following is the search payload request for the Subject Navigator module that does not return a result.
Search Word : "Alter ego" objections
{"ErrorMessage":null,"WasError":false,"SearchRequest":"\"Alter ego\" objections","PageNum":0,"PageSize":0,"Fuzzy":false,"Fuzziness":1,"Stemming":true,"WordNetSynonyms":false,"Synonyms":false,"PhonicSearching":false,"SearchType":3,"SortField":null,"SortOrder":null,"SearchFlags":0,"Custom":null,"NoFrames":false,"EnableDateSearch":false,"StartDate":null,"EndDate":null,"FileConditions":null,"BooleanConditions":null,"QueryStatement":null,"FilterStatement":null,"Facets":null,"IxId":null,"IndexIds":null,"IncludeSynopsis":true,"Near":14,"ExcludeEnabled":false,"ExcludeTerm":null,"TreePath":null,"paraId":null,"FieldFilterName":null,"FieldFilterValues":null,"docId":null,"docUrl":null,"SearchTypeId":3}
Thanks for creating DevOps task https://dev.azure.com/tologix/ISLGRebuild/_workitems/edit/2007 to track this.
I have created unplanned task https://dev.azure.com/tologix/ISLGRebuild/_workitems/edit/2026 in DevOps to track this.
Thanks,
-Martin
Here are references to earlier discussions explaining where to find the current noise words used by the indexer and the web app:
* Re: Search Implementations - TOLOGIX - Contegra Search Audit
* Re: dtSearch Implementation in ILG - TOLOGIX - Infrastructure LawGuide (ILG)