Search Implementations
Hello all,
I had a call with Rob this afternoon about implementing the recommendations outlined in their report: Tologix ISLG - App Audit Review - 2019-08-31.pdf - TOLOGIX - Contegra Search Audit. I've decided I would like Rob and his team to get directly involved in the implementation process by building the custom indexers required for all the subscriber-side searches for the new ISLG and ILG applications. The Contegra team will get involved once the SQL databases are finalized (i.e., after the admin sites are complete) and the UI for the searches is finalized (i.e., after the subscriber site designs are complete), which is scheduled to be complete by the end of October.
In the meantime, to ensure everyone has a clear understanding of how roles are going to be delegated, Rob is going to produce an implementation plan that we'll circulate and review during the next team meeting, scheduled for Thursday, September 26th.
Please let me know if you have any questions or concerns.
Thanks,
Morgan
Further to my message above,
At this point, we anticipate Contegra performing the following:
Custom Indexer Service: Contegra will create dtSearch indexes containing both the full text and the relevant SQL fielded information, which will include:
In the interim,
Thanks,
Morgan
Following up on my note above, could you please confirm whether you foresee any problems integrating these plans into your existing development timelines?
Thanks,
Morgan
We are planning to start the dtSearch implementation in ILG, as we need dtSearch indexes for the subscriber-side search functionality. Before we start, it would be good to have a call to make sure we achieve the things we discussed in our last call.
Thanks,
Jitesh
I had a call with
Thanks,
Morgan
Sounds good. I can provide this by the end of the week.
Mel
Thanks,
Morgan
Tuesday, 10th March is a holiday for us. Please schedule the call for any other day after the 10th.
Thanks,
Jitesh
Thanks,
Morgan
Unfortunately, we are not available next Wednesday (3/11) at 7:30 AM (Vancouver time). We are available for a call on Tuesday, Wednesday or Friday next week, after 10 AM your time. Thursday next week is wide open, so we can have a call earlier in the day. Please let me know if one of those times works for you and Jitesh.
Thanks,
Rob
I'll send a calendar invite with details.
Thanks,
Morgan
Great, please let me know if you would like to use a call-in number or Skype.
Thanks,
Rob
We'll be connecting through Zoom. The details are in the calendar invite I sent out earlier.
Thanks,
Morgan
In preparation for the call on Thursday,
The document is broken down into four categories of searches:
For the purposes of dtSearch, the only search that searches the document texts is the Full Text Search (and perhaps the Global Search via the Full Text Search), so I believe it is the only search that will require dtSearch. dtSearch is also used for the Subject Navigator in the old application, because we integrated the Boolean and linguistic options available through dtSearch. However, it appears we discarded these requirements in the new application (
Let me know if you have any questions or concerns. I would be happy to hop on a call in advance of the call on Thursday to explain anything in the document, so that we can focus on how to optimize all the relevant searches.
Thanks,
Morgan
We know how the existing dtSearch is implemented and working in the current application, but we have not yet reached the stage of the new ISLG subscriber side which
Thanks,
Jitesh
Thanks,
Morgan
We went back and forth on this a couple of times. The intention was to simplify/standardize the search across research tools. That being said, we did end up modifying the behaviour for the Subject Navigator, and integrating Boolean and linguistic tools is certainly possible.
We could incorporate the following criteria:
I can enter a keyword search that is powered by the dtSearch search engine
Let me know your preference and this can be added to the user story and wireframes.
Mel
Morgan
Following up on the above, given that we're offering these advanced features in the current application in Subject Navigator, I think we should have them available in the new application as well. Could you please update the applicable requirements in the user stories, and then we'll plan to integrate dtSearch in the searches for the Subject Navigator and the Full Text Search.
Thanks,
Morgan
Will do.
Mel
Following up on the call this morning with
Lastly, for ILG, we are likely to adopt a similar approach of creating a customized index for the dtSearch keyword search, but I would like us to finalize things in ISLG before we start that work. Therefore,
Please let me know if anyone has any questions or concerns.
Thanks,
Morgan
Here is a summary of the database view needed:
Database views from which we can pull data and index. A database view is a searchable object in a database that is defined by a stored query. In this context, each database view will represent the specific data fields that need to be indexed to support the Full-Text Search, Document Library and Subject Navigator searches, respectively. Although a view does not store data, it can be thought of as a virtual table that can be queried like a table. A view may combine data from more than one table using joins, or contain just the subset of data needed to search a specific dataset. The views should be created by the team that is already familiar with the data model; otherwise, Contegra would need to spend considerable time understanding the full data model.
Thanks,
Morgan
We will provide the database table structure, with fields and the related screen views, to the Contegra team by the end of this week.
We have started working on the Subject Navigator module, so the Contegra team can start with the Subject Navigator.
As we move along, we will keep them updated on the other modules as well.
Also, for database access, does Contegra need access to the live database server (Carbon60) or to the local development server (at DEVIT)?
Thanks,
Ketan Sondarva
As discussed this morning, please provide
For the live server access, please request additional credentials from Carbon60.
Thanks,
Morgan
I have attached the Subject Navigator Search document, which contains the screen view and the database table structure.
Thanks,
Morgan
The attached document is very basic, and I expect you will want to go into more depth.
Please go through the document first; if you need further details, we can schedule a call to discuss what you need, and then we will provide it.
Thanks for the document. Yes, it would be good to have a call and go through the details. For example, you list "DocumentValue" but it's not clear if that's a table or a column as I don't see it in the diagram.
Basically, what we need comes down to a couple of questions:
- Which columns are full-text searchable?
- Which additional columns do you need in search results? (e.g. ID field(s) that you use to create content URLs.)
We can schedule a call to go through it together. As it will be quite technical, I guess we can do it without bothering everyone else. Just drop me a note when you're available for a call. We could also use chat instead (e.g. Skype, Slack).
Thanks,
Radomir
Sounds good to me. Can we have a call next Tuesday at 6:30 PM IST?
Here are my Skype details:
Skype ID: harsh.parikh05
Jitesh and Ketan will also join the call.
Great, I just sent you a contact invite by Skype. Talk to you on Tuesday.
Today, we had a call with the Contegra team to discuss the Subject Navigator search. We have one question for you.
Up to how many levels do you want the search to go? Do you want to search the Dispute Document Full Citation and the cited paragraph number?
Questions like this are answered in the document produced by
In the acceptance criteria for Subject Navigator, the keyword search will be performed for the following fields:
Thanks,
Morgan
cc:
Thanks for the clarification.
We are taking the "Special Search Terms" field from a meta field; the other fields are required and hard-coded.
Can we treat the Special Search Terms field as hard-coded? We ask because we can only identify hard-coded fields for the data model view.
Thanks,
Morgan
No, hard coded fields are not required. This would be fine.
Mel
Also, I assume this means that any field we integrate into a search will need to be hard coded?
Morgan
Following up on our conversation this morning, it will not be possible for us to hard-code all the fields that will be used to populate searches, particularly for a number of the filters used in the Full Text Search and Document Library searches. Therefore, please review the search requirements in the search document: https://docs.google.com/document/d/10TP4xS4YUgmnznIUI2pzzMA2HOZu1FPm8dE7zmgudtA/edit, and let us know how you plan to deal with the situation. Note that it's very important that we limit the number of hard-coded fields, so that we can adjust fields as required in the future.
Thanks,
Morgan
As discussed in the last call, we have attached the spreadsheet for the Subject Navigator search, which contains all the view columns (it currently contains dummy data). We have also attached the query that returns this result set of columns.
Please check and let us know if you have any concern.
In which database can I try this query? I was looking at the SQL Server at 10.68.138.11 but couldn't find any database that contains the referenced tables.
I don't really understand the spreadsheet you sent but if the SQL you made returns data that should be indexed, that should be sufficient.
Thanks,
Radomir
You can try the attached query on the ISLGRebuild database on the 10.68.138.11 server.
When you run the query against ISLGRebuild, you will get the columns included in the attached spreadsheet.
Please note that all currently available data is dummy data.
I successfully executed query on ISLGRebuild. Looks fine to me. Could you please just create a view for it so we can do simple "select * from <view_name>" to get data?
We have created a view for the Subject Navigator search.
You can use SELECT * FROM vw_SubjectNavigatorSearch on the ISLGRebuild database.
Sorry, I don't understand what you want.
My assumption is that you want a column in the view that refers to the document name.
Please clarify, or we can have a short call to discuss.
I am available on Skype on all working days (10:00 AM to 8:00 PM IST).
The problem I have now is that I used the branchId (the first column) for dtSearch document id but, as this is not a unique value in the view, we overwrite records and at the end don't have all rows in the index.
So, I suggested creating an "artificial" ID made up of all the relevant IDs the row is composed of and, as in the example I gave, joining these IDs together with a delimiter character.
Just put this column to be first in the view, that's how we'll know it's ID column.
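A minimal sketch of such an artificial ID, written in C# like the rest of the thread's code. The `CompositeId` class and the `_` delimiter are hypothetical choices, not what was agreed; the same concatenation could equally be done in the view's SQL.

```csharp
using System;
using System.Linq;

// Hypothetical helper: builds an "artificial" document ID by joining the IDs
// of every source row that makes up one view row.
public static class CompositeId
{
    // Assumed delimiter; any character that can never appear inside an ID works.
    public const char Delimiter = '_';

    public static string Build(params int[] rowIds)
    {
        if (rowIds == null || rowIds.Length == 0)
            throw new ArgumentException("At least one row ID is required.", nameof(rowIds));
        // Resolves to string.Join<int>(string, IEnumerable<int>).
        return string.Join(Delimiter.ToString(), rowIds);
    }

    // Splits a composite ID back into its component row IDs.
    public static int[] Parse(string compositeId) =>
        compositeId.Split(Delimiter).Select(int.Parse).ToArray();
}
```

For example, `CompositeId.Build(35, 15, 29)` gives `"35_15_29"`, and `Parse` recovers the three IDs. With integer IDs, any non-digit delimiter keeps the result unambiguous.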
If you need additional clarification, feel free to contact me on Skype.
Can I set Row_num() as the unique ID in the first column for you?
We have added Row_num() as the unique ID in the view, aliased it as Id, and made it the first column.
Please check and confirm.
Yes, the updated view looks good. I indexed it and the index is in "C:\Temp\test-index\subject-nav" on the Web server.
Now, do you want us to create a helper library to consume indexes or prefer to do it on your own? Would you prefer a Controller to access it as a web service, or a library (DLL)? Which C# framework version do you use?
Yes, we would like you to provide the code to consume the indexes. We are using .NET Core 2.1 and C# 7.0.
We prefer a Controller that we can call from JavaScript, because if we need any customization, we can do it easily.
That is my opinion, but please suggest the approach you think is best.
I prepared the first version of the search web service. It's on the web server in "D:\TologixWebSearch". It's a .NET Core 3.1 application.
Could you install this app on the web server so I can finalize the setup?
To check if it's running, you can send POST /api/search/subject-nav with the parameter searchRequest=branch, for example.
BTW, I just realized that you said you're using Core 2.1, not 3.1. If that's a problem, let me know and I'll downgrade the code.
Thanks.
I haven't gone through the search application yet, as I am busy completing other work. I will go through it within one or two days and get back to you.
I went through the following path on the server at 10.68.138.10:
E:\TologixWebSearch
I found that you put the published code in this folder. Am I right?
Can we have a call tomorrow at 6:00 PM IST (Ahmedabad time)?
Yes, we can talk on Wednesday at 6:00 PM IST.
We had a call with Radomir about the Subject Navigator search, and everything is going well.
It would help us to get the source code that Radomir developed for SN as a demo, so that we can test it and learn how to do the same thing in our local environment.
Thanks,
Morgan
Thanks,
Morgan
Is there a reason you need the source code now? If so, please specify why, and we can work something out.
Thanks,
Morgan
We need the source code only to check how the custom dtSearch indexing works, how we can pass the input parameter model, and how the data is returned in JSON format.
We need the source code only once, for example the web app Radomir developed for the Subject Navigator. Once we understand the approach, we won't need it for the rest of the modules.
This is the first module we are going to implement, so we need to be sure on our side that everything is being done properly.
Example: POST /api/search/subject-nav
{
"searchRequest": "pink link",
"searchType": "Phrase"
}
or
{
"searchRequest": "pink link fooooo",
"searchType": "AnyWords"
}
Here's the object model, where you can see all the available options:
using System;

// Request model accepted by the search service.
public class SearchModel
{
    public string SearchRequest { get; set; }
    public int PageNum { get; set; }
    public int PageSize { get; set; }
    public bool Fuzzy { get; set; }
    public int Fuzziness { get; set; }
    public bool Stemming { get; set; }
    public bool WordNetSynonyms { get; set; }
    public bool Synonyms { get; set; }
    public bool PhonicSearching { get; set; }
    public SearchType SearchType { get; set; }
    public string SortField { get; set; }
    public string SortOrder { get; set; }   // one of the sort order options listed below
    public int SearchFlags { get; set; }
    public bool EnableDateSearch { get; set; }
    public DateTime? StartDate { get; set; }
    public DateTime? EndDate { get; set; }
    public string FileConditions { get; set; }
    public string BooleanConditions { get; set; }
    public bool IncludeSynopsis { get; set; }
    public int Near { get; set; }
    public bool ExcludeEnabled { get; set; }
    public string ExcludeTerm { get; set; }

    public SearchModel()
    {
        // Defaults used when a request does not set these values.
        IncludeSynopsis = true;
        Stemming = true;
        Fuzziness = 4;
        SearchType = SearchType.AllWords;
        Near = 14;
        SearchRequest = null;
    }
}

// Declared outside SearchModel: a nested type named SearchType would clash
// with the SearchType property (compiler error CS0102).
public enum SearchType
{
    NoValue,
    AllWords,
    AnyWords,
    Boolean,
    Phrase,
    NearTerm
}
Sort order options:
scoredesc
scoreasc
hitsdesc
hitsasc
locationasc
locationdesc
documentasc
documentdesc
Let me know if you have any questions.
I've got confirmation from
Thanks,
Morgan
https://www.dropbox.com/s/wffgqgnv1qx6rde/contegra-tologix-master.zip?dl=0
We will check and try it, and let you know if we face any issues.
I got a note from
Also, what are next steps in starting work on the next indexers (Full Text Search and Document Library searches: https://docs.google.com/document/d/10TP4xS4YUgmnznIUI2pzzMA2HOZu1FPm8dE7zmgudtA/edit)? Phase 2 of Subscriber side development includes the Disputes & Dispute Documents, which is scheduled to start development on May 21 and will include the Document Library searches. But I'm wondering if
Thanks,
Morgan
We are planning to integrate the Subject Navigator indexes after 20th May (after completing the core development of Phase 1).
We will take up the Dispute & Dispute Document Library in Phase 2 development and give it priority.
The Full Text Search is a big module, so we will prioritize it in Phase 3.
After speaking with
Thanks,
Morgan
Further to discussions and emails with
To start things off,
Thanks,
Morgan
We will start integrating the Subject Navigator index you provided in the first week of August. In the meantime, we will share the SQL view for the Dispute & Dispute Document module by August 5th, so your team can start creating that index. For FTS (Full Text Search), we are planning to send you the SQL view by mid-August, by which time we expect to have finished the Subject Navigator integration and to be working on Dispute & Dispute Document.
Also, I would like to know how much time it generally takes to create an index for such a module, assuming we provide complete details of the SQL view on time, so that we can plan our development and integration accordingly.
Thanks,
Ketan Sondarva
It would be great if you could provide the SQL views as soon as possible so we can review them and start working. It's hard to tell how much time indexing will need; it depends on the amount of data, the number of files to be indexed, system and database performance, etc. Even if you don't have all the documents ready but have a decent amount, we can test with those and get some numbers.
Note that my availability in August is limited. I have ongoing projects, but I can still dedicate some hours to your project in the first half of August. However, from August 15, I have scheduled vacation and on-site work, so my availability will be very limited until the second week of September.
Thanks,
Radomir
Thanks for the update; noted.
We have some questions about the Dispute & Dispute Document SQL view. Could we connect on July 29th or 30th, 2020, at 6:00 PM IST to discuss them? That will give you a better idea of the SQL view, and you can also raise any questions from your side.
Thanks,
Ketan Sondarva
We have some queries about the Dispute & Document SQL view. Are you able to take a call on Skype tomorrow (30th July) at 6:00 PM IST?
Otherwise, I'm available on Friday (31st July) at 6:00 PM IST.
We have a few queries regarding the search results in the Dispute & Document Library module, which we will discuss with the Industrial team on Monday; then we will provide the SQL view for the Dispute & Document Library.
We are working on the Subject Navigator Contegra search integration in our application, and we are getting search results through the API.
For the next module, Dispute & Document Library, I have added the following SQL view:
vw_DisputeDocumentLibrarySearch on the ISLGRebuild database
(server: 10.68.138.11)
However, I have a concern regarding the search parameters.
As the screenshot above shows, there are many search parameters we need to pass when the user clicks on a filter, and the results must match the selected parameters.
For example, if we enter the keyword "ICSID" in the text box and select the language "English" from the search parameters, we need to get results from the index that match both the ICSID keyword and the English language.
So, my question is: how will the index return results filtered by the different parameters I selected?
If you want to discuss it, we are available on 13th August, as tomorrow is a national holiday for us.
Please suggest.
You need to pass additional filters via Custom object, for example:
{
"searchRequest": "pink link",
"searchType": "AnyWords",
"custom": {
"language": "English"
}
}
You can see the implementation of this filtering in the SearchController.cs, lines 201-217.
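To illustrate the client side of this, here is a sketch that builds the same kind of request body in C# with System.Text.Json (available from .NET Core 3.0, so it matches the 3.1 search service rather than the 2.1 application). The `SearchRequestBody` and `SearchRequestJson` names are hypothetical, and modelling `custom` as a string-to-string dictionary is an assumption; the filtering itself is done by the service.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

public enum SearchType { NoValue, AllWords, AnyWords, Boolean, Phrase, NearTerm }

// Hypothetical request body: query text, search type and extra field filters.
public class SearchRequestBody
{
    public string SearchRequest { get; set; }
    public SearchType SearchType { get; set; }
    // Field name -> required value (assumed shape of the "custom" object).
    public Dictionary<string, string> Custom { get; set; } = new Dictionary<string, string>();
}

public static class SearchRequestJson
{
    private static readonly JsonSerializerOptions Options = CreateOptions();

    private static JsonSerializerOptions CreateOptions()
    {
        // camelCase property names, enums written as names ("AnyWords") instead of numbers.
        var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
        options.Converters.Add(new JsonStringEnumConverter());
        return options;
    }

    public static string Serialize(SearchRequestBody body) => JsonSerializer.Serialize(body, Options);
}
```

Serializing `SearchRequest = "ICSID"`, `SearchType.AnyWords` and `Custom["language"] = "English"` yields `{"searchRequest":"ICSID","searchType":"AnyWords","custom":{"language":"English"}}`. Dictionary keys are written as given, because the camel-case policy only renames properties.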
Hope this helps. Let me know if you have any questions.
Regards,
Radomir
As discussed on Friday's call (14/08), we have added a HierarchicalParentIds column to the Subject Navigator SQL view.
We have kept the ParentId column as it is and added the new HierarchicalParentIds column to hold multiple parent IDs, separated by commas.
The HierarchicalParentIds column contains multiple parent IDs separated by commas, as shown in the following screenshot.
Please let us know if you have any concern.
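On the consuming side, the comma-separated HierarchicalParentIds value has to be split back into IDs. A minimal sketch, assuming integer IDs; `HierarchyIds` is a hypothetical helper. In the sample response later in the thread the list appears to include the node's own ID (node 27 has "1,27"), so callers should not assume it holds ancestors only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HierarchyIds
{
    // Parses one HierarchicalParentIds value, e.g. "1,27" -> [1, 27].
    // Null or blank values yield an empty list.
    public static List<int> Parse(string hierarchicalParentIds)
    {
        if (string.IsNullOrWhiteSpace(hierarchicalParentIds))
            return new List<int>();
        return hierarchicalParentIds
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(s => int.Parse(s.Trim()))
            .ToList();
    }

    // Collects the distinct IDs referenced by several rows, in ascending order,
    // so the corresponding nodes can be looked up once.
    public static SortedSet<int> DistinctIds(IEnumerable<string> values) =>
        new SortedSet<int>(values.SelectMany(v => Parse(v)));
}
```

For example, `DistinctIds(new[] { "1,27", "1", "1,20" })` yields 1, 20 and 27.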
You can get it from https://www.dropbox.com/s/fxd5acvnpxw7h6e/DBIndexer-with-cmd-line.zip?dl=0
If you run TlogixDBIndexer and pass a filename as an argument (which is a config file), it will run in the command line mode only. You can find sample config file "index-config-subject-nav.json" in the archive.
I'll look at the parents field soon.
https://www.dropbox.com/s/riuyasnjn9rv7ll/DBIndexer3.zip?dl=0
It includes 3 dtsearch data files that define some indexing behavior (e.g. noise words). The code was updated to load them from the application folder. (This change was needed to fix behavior where single letter terms were not indexed so, for example, document ID with value "1" could not be found.)
Next, I implemented support for finding and returning the parent nodes listed in the HierarchicalParentIds column. The search response payload now includes a parents field, which is a list of documents in the same format as the results. A sample response looks like this:
{
  "totalResults": 7,
  "results": [
    {
      "fields": {
        "id": "35",
        "branchid": "35",
        "branchtypeid": "3",
        "branchname": "Test new subject",
        "Filename": "db://vw_SubjectNavigatorSearch#Id=35",
        "parentid": "20",
        "documentid": "15",
        "contenttypedatamasterid": "29",
        "disputedocumentshorttitle": "OT/0001/03 - NJ Test Case AF-0005-01 Destination Code - 31/08/2020 - English",
        "shorttitle": "NJ Test Case AF-0005-01 Destination Code",
        "fullcitation": "NJ Test Case, AF-0005-01 Destination Code, 31 August 2020",
        "casename": "NJ Test Case"
      }
    },
    ...
  ],
  "parents": [
    {
      "fields": {
        "id": "27",
        "branchid": "27",
        "hierarchicalparentids": "1,27",
        "branchtypeid": "3",
        "branchname": "Abuse of Process",
        "Filename": "db://vw_SubjectNavigatorSearch#Id=27",
        "parentid": "1"
      }
    },
    ...
    {
      "fields": {
        "id": "1",
        "branchid": "1",
        "hierarchicalparentids": "1",
        "branchtypeid": "3",
        "branchname": "A",
        "Filename": "db://vw_SubjectNavigatorSearch#Id=1"
      }
    }
  ]
}
You can get the search service update from:
https://www.dropbox.com/s/1y4e4tke94sqngn/TologixWebSearch.zip?dl=0
We need all the results data, including both parents and branches, in one array. We don't need a separate array for the parents data.
Could you please modify this and let us know? Also, the Dropbox link above is not working.
I separated results and parents for a reason. How would you otherwise know which node has a hit and which node is there only as support? It's one line of code on your end to join both arrays if you really need them that way. But, if you're sure you don't need a separation between hits and non-hits, I can join them in the service. Please confirm.
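The "one line" would be `results.Concat(parents)`. If the arrays are merged, a flag can keep the hit/non-hit distinction described above. The sketch below is a hypothetical client-side version: `SearchItem` stands in for one result object, and keying nodes on `branchid` (skipping a parent that is already a hit) is an assumption.

```csharp
using System.Collections.Generic;

// Minimal stand-in for one item in "results" or "parents" (hypothetical type).
public class SearchItem
{
    public Dictionary<string, string> Fields { get; set; } = new Dictionary<string, string>();
    // True when the node itself matched the query; false when it is only a supporting parent.
    public bool IsHit { get; set; }
}

public static class ResultMerger
{
    // Hits first, then supporting parents; a parent whose branchid already appeared is skipped.
    public static List<SearchItem> Merge(IEnumerable<SearchItem> results, IEnumerable<SearchItem> parents)
    {
        var merged = new List<SearchItem>();
        var seen = new HashSet<string>();
        foreach (var hit in results)
        {
            hit.IsHit = true;
            merged.Add(hit);
            string id = BranchId(hit);
            if (id != null) seen.Add(id);
        }
        foreach (var parent in parents)
        {
            string id = BranchId(parent);
            // Items without a branchid are always kept, since they cannot be matched.
            if (id == null || seen.Add(id))
            {
                parent.IsHit = false;
                merged.Add(parent);
            }
        }
        return merged;
    }

    private static string BranchId(SearchItem item) =>
        item.Fields.TryGetValue("branchid", out var id) ? id : null;
}
```

The presentation layer can then ignore `IsHit` when only one list is needed, without losing the information.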
BTW, which Dropbox link is not working? I tried both from the previous message and they both open.
We don't understand the purpose of separating hits and non-hits.
Our understanding is that we need to pull all the branch and parent results into one array, convert them to our model, and pass them to the presentation layer.
Could you please clarify the meaning and use of separating hits and non-hits?
Also, the Dropbox link above does not open from our end. I have attached a screenshot for your reference.
As for Dropbox, please check your network; the link works fine for me.
We don't need to display the hit count, so please just merge the arrays and let us know once you have updated the service.
We will use the Dropbox link again to pull the service into our local environment.
One more question regarding the Examplecustommodel.zip file above: do we have to create a different result model for each tool?
For example, the model above works for SN, but do we need to create another model for DisputeDocumentLibrary?
I made a change to the search service to add parents to the results list.
You can get it from https://www.dropbox.com/s/8va4x1q2jkzptls/TologixWebSearch2.zip?dl=0
As for the custom model, if DisputeDocumentLibrary or another index has different fields, you will need a different model. I mentioned this as a downside of a custom model during our last call.
What you could do is make one model with all possible fields across all the different indexes you're going to create. It's messy, but it's a single model.
We are not getting all the parent ID results from the indexes. We looked into your TologixWebSearch application and found that you are using Id in place of BranchId.
Can we have a call on Monday (31st August) at 2:00 PM IST about the Subject Navigator index results?
Please send me details of exactly what the issue is, and you'll have a fix by Monday.
You are using the Id column instead of BranchId when searching for parent IDs and returning the parent results.
You use the Id column on lines 112 and 84. We changed these to use branchid instead, and now we get all the results.
Please confirm that this is OK.
So, if I understood correctly, you modified the view to use branchid in the id field, and now it works fine? Sure, if you see no side effects, it's fine.
I did not modify the view. I used branchid in place of id on lines 112 and 84 of your TologixWebSearch application, then published the code again and checked it.
It works fine.
Is the change we made in SearchController.cs on lines 112 and 84 OK?
Are you available for a call on Tuesday?
In general, we can add highlighting by extending the search service. Highlighted text could be added as a field to the JSON result object.
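A rough sketch of what such a highlighted field could contain, assuming whole-word, case-insensitive matching and `<mark>` tags (both assumptions; `Highlighter` is a hypothetical name). It does not use dtSearch's own hit information, so stemmed, fuzzy or synonym matches that dtSearch returns would not be highlighted.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class Highlighter
{
    // Returns the HTML-encoded text with whole-word, case-insensitive matches of any term
    // wrapped in <mark> tags. Encoding first keeps field values from injecting markup.
    public static string Highlight(string text, IEnumerable<string> terms)
    {
        if (string.IsNullOrEmpty(text)) return text;
        string encoded = WebUtility.HtmlEncode(text);
        List<string> alternatives = terms
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .Select(t => Regex.Escape(WebUtility.HtmlEncode(t.Trim())))
            .ToList();
        if (alternatives.Count == 0) return encoded;
        string pattern = @"\b(" + string.Join("|", alternatives) + @")\b";
        return Regex.Replace(encoded, pattern, "<mark>$1</mark>", RegexOptions.IgnoreCase);
    }
}
```

For example, `Highlight("NJ Test Case", new[] { "test" })` returns `NJ <mark>Test</mark> Case`, while "Tests" is left unmarked because only whole words match.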
BTW, I saw your message on skype:
> we need to know how to generate indexes on a daily basis without a manual process
I already sent you the indexer update that runs from the command line. See my message and the Dropbox download URL that I sent on Aug 17 above. You need to set up Windows Task Scheduler to run the indexer, and that's it.
We will discuss on tuesday.
I have attached the SearchController.cs file with the changes we made for the Subject Navigator index.
Please replace the existing file in your project with it.
As discussed, we need to highlight the matched words in the presentation view, as we currently do with the jQuery highlighter.
Words should be highlighted in the following columns:
Respondent State, Case Name, Case Number, Special Search Terms, Short Title and Full Citation.
Also, please let us know when we can schedule a call about generating indexes from the command line.
As discussed in today's call, we need to provide two SQL views. The first SQL view contains all the dynamic columns with their column names (e.g., Field_100, Field_101), and the second SQL view contains all the columns we need to bind to our model.
You will get a DisputeId column from the first SQL view; you need to pass the DisputeId value to the second SQL view and provide us the result in JSON format so we can bind the data to our model.
However, as we mentioned in an earlier email about the first SQL view, it is not feasible to create it because of the dynamic column structure. Instead, we are currently providing a stored procedure, in place of the first SQL view, that returns all the dynamic columns.
We were able to generate the second SQL view because it contains fixed columns.
Here are the stored procedure and the second SQL view:
Stored procedure (in place of the first SQL view): FE_MetafieldwithValueDynamic
Second SQL view (pass it the DisputeId values returned by the stored procedure above):
VW_DisputeContegraSearch
You can find the stored procedure and the SQL view in the ISLGRebuild database on the 10.68.138.11 server.
Please let us know whether the stored procedure works on your side; we are also happy to discuss it over a call.
Which field identifies a "document" for this index? I think we mentioned ContentTypeDataMasterId in the past but now you reference the DisputeId. Should DisputeId be used as a document identifier?
I checked the stored proc result and the view. A possible problem I see is that for DisputeId=23 view VW_DisputeContegraSearch returns multiple rows. I can index data as received but in the results you will not be able to tell which field goes with which ContentTypeDataMasterId. (For example, you have Field_299=15532 for ContentTypeDataMasterId=58, but ContentTypeDataMasterId=59 has the same DisputeId so will be part of the same document.)
Is that how you wanted things to be indexed? Please confirm.
I'll be available on Skype for the next hour or two if you want to discuss it quickly.
Yes, use the DisputeId column as the document identifier.
For the second point, I will ping you on Skype right now.
Thanks for the quick call today. As discussed, we need to pass the DisputeId column to the second SQL view and get results from indexing all of the second SQL view's columns.
We also need the search result count, with the matching words highlighted in all the columns.
After closer examination of the data returned by the stored proc and the view, I still see some issues.
The problem is that both return rows with duplicate DisputeId values. For example, multiple rows exist for DisputeId 7, 30, etc. As we use DisputeId as the document identifier, any duplicate appearing in the first table will overwrite the previously indexed document with the same DisputeId.
1) It's crucial that the first indexed table/view (here we use the result of the stored proc) returns rows that have unique document identifier. For the Disputes Search it makes sense that the DisputeId is used, but values should be distinct.
2) The second view indexed may return multiple rows for the same DisputeId and all columns will be indexed as a part of the same indexed document.
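The overwrite effect can be reproduced with a dictionary standing in for the index, since both keep one entry per document ID. This is a simulation, not dtSearch's API; `DocumentIdCheck` is a hypothetical helper, and running `DuplicateIds` over the first view's ID column before indexing is one way to catch the problem early.

```csharp
using System.Collections.Generic;

public static class DocumentIdCheck
{
    // Simulates indexing the first view: one entry per document ID, so a later row
    // with the same ID replaces the earlier one instead of adding a document.
    public static Dictionary<string, string> IndexRows(IEnumerable<(string DocId, string Text)> rows)
    {
        var index = new Dictionary<string, string>();
        foreach (var row in rows)
            index[row.DocId] = row.Text; // same DocId => previous document is overwritten
        return index;
    }

    // Returns each ID that occurs more than once (reported once, in first-duplicate order).
    public static List<string> DuplicateIds(IEnumerable<string> ids)
    {
        var seen = new HashSet<string>();
        var reported = new HashSet<string>();
        var duplicates = new List<string>();
        foreach (var id in ids)
        {
            if (!seen.Add(id) && reported.Add(id))
                duplicates.Add(id);
        }
        return duplicates;
    }
}
```

Indexing three rows with IDs 7, 7 and 30 leaves two documents, and the first row for 7 is lost.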
Let me know if you have any question.
We don't understand what you are suggesting. As discussed, we need to pass the DisputeId value from the stored procedure data to the second SQL view to get all the Dispute and Document data that we need to display on the subscriber side.
If you need a unique column, you can use the ContentTypeDataMasterId column.
If you want to have a call, we are available now until 6:00 PM IST, and on Monday from 10:30 AM to 5:00 PM IST.
Please let us know.
The problem is not indexing of the second view. The problem is with the first view (stored proc). Each row in the table returned by the stored procedure should correspond to one indexed document (and that will be returned in results as one matching document).
My understanding is that your result item for this search is a Dispute. That means that you should have only one row per dispute in the first view indexed. As you have the same DisputeId in multiple rows, I suspect this is not the case.
BTW, I can index what you have provided so far and continue my work. However, my concern is that the data is not prepared as it should be, which will result in an incomplete index and wrong or incomplete search output.
I hope this makes it more clear. Please let me know.
Yes, you will get rows with the same DisputeId. As discussed in the last call, we are grouping the second SQL view's data on the C# side: when we get multiple rows with the same DisputeId, we group them in C# and then present the data to the user.
Please let us know if you want to discuss it over a call.
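Grouping on the C# side could look like this LINQ sketch, assuming each returned row carries DisputeId and ContentTypeDataMasterId; `DisputeRow`, `DisputeResult`, `DisputeGrouping` and `ShortTitle` are hypothetical names. `Enumerable.GroupBy` yields groups in the order their first row appears, so if rows arrive in ranked order, that order is preserved.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row returned for a dispute.
public class DisputeRow
{
    public int DisputeId { get; set; }
    public int ContentTypeDataMasterId { get; set; }
    public string ShortTitle { get; set; }
}

// One result per dispute, holding all of its rows.
public class DisputeResult
{
    public int DisputeId { get; set; }
    public List<DisputeRow> Rows { get; set; }
}

public static class DisputeGrouping
{
    // Collapses rows sharing a DisputeId into one result, keeping first-seen order.
    public static List<DisputeResult> GroupByDispute(IEnumerable<DisputeRow> rows) =>
        rows.GroupBy(r => r.DisputeId)
            .Select(g => new DisputeResult { DisputeId = g.Key, Rows = g.ToList() })
            .ToList();
}
```

This grouping only helps if every row reaches the client, which is why the indexed document ID itself must be unique per row.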
If we use DisputeId for index document identifier, you cannot get multiple rows - you would get only match for the last row as it will overwrite previous rows with the same id.
So, is using ContentTypeDataMasterId for the document ID fine?
I think we need a short call to sort this out. Please let us know when you are available.
As just discussed, you can use the ContentTypeDataMasterId column as the unique column; from it you will get the DisputeId value, which you pass to the second SQL view.
Hope this is fine.
I'm still working on the search service changes - you should have it tomorrow. In the meantime, here's the JSON structure that you can use to send your query:
1) Boolean query nesting:
{
  "type": "boolean",
  "operator": "and",
  "clauses": [
    {
      "type": "boolean",
      "operator": "or",
      "clauses": [ ... another boolean or match type ]
    }
  ]
}
2) Field value matching:
{ "type": "match", "exclude": false, "field": "Field_100", "value": "3" }
{ "type": "match", "exclude": false, "field": "Field_100", "values": ["3", "7", "14"], "operator": "and" }
I hope this covers all your cases. Let me know if anything is missing.
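A sketch of how a client might build these structures in C#, using plain dictionaries so boolean and match nodes can nest to any depth. `QueryBuilder` and its method names are hypothetical; the keys follow the JSON structures in the message above, and System.Text.Json requires .NET Core 3.0 or later.

```csharp
using System.Collections.Generic;
using System.Text.Json;

public static class QueryBuilder
{
    // {"type":"boolean","operator":...,"clauses":[...]}; clauses may be boolean or match nodes.
    public static Dictionary<string, object> BooleanClause(string op, params Dictionary<string, object>[] clauses) =>
        new Dictionary<string, object>
        {
            ["type"] = "boolean",
            ["operator"] = op,
            ["clauses"] = clauses
        };

    // Single-value match: {"type":"match","exclude":...,"field":...,"value":...}
    public static Dictionary<string, object> MatchClause(string field, string value, bool exclude = false) =>
        new Dictionary<string, object>
        {
            ["type"] = "match",
            ["exclude"] = exclude,
            ["field"] = field,
            ["value"] = value
        };

    // Multi-value match: {"type":"match","exclude":false,"field":...,"values":[...],"operator":...}
    public static Dictionary<string, object> MatchValues(string field, string op, params string[] values) =>
        new Dictionary<string, object>
        {
            ["type"] = "match",
            ["exclude"] = false,
            ["field"] = field,
            ["values"] = values,
            ["operator"] = op
        };

    // Values declared as object are serialized by their runtime type, so nesting works.
    public static string ToJson(Dictionary<string, object> query) => JsonSerializer.Serialize(query);
}
```

For example, `BooleanClause("and", MatchClause("Field_100", "3"), BooleanClause("or", MatchValues("Field_100", "and", "7", "14")))` serializes to the nested boolean/match shape shown above.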
Please confirm a convenient time.
Here's an update for you: https://www.dropbox.com/sh/iufr961llu2wrht/AADzxZAmXEcDn-hotEyY3CkNa?dl=0
1) DBIndexer is updated to index the second view. Check the indexer-config-disputes.json config file in the zip file; you probably just need to change the index path where you want to save the index.
2) TologixWebSearch: copy the .cs files over the existing project files. In appsettings.json you can see that a new parameter ("DisputesIndex") was added; it should point to the Disputes index.
The search service endpoint for Dispute search is: /api/search/disputes
I'm attaching an example request body and response.
I've noticed that, because of multiple rows in the 2nd view, some fields in results have multiple values. I made changes to the indexer (attached modified file) to prevent that. I'm also sending you a sample output for the same query as in the previous post.
Talk to you tomorrow.
We will look into this today and will update you.
Let us know if you need anything.
As discussed in today's call, we need only the 2nd view's indexing results to bind the data to our model.
We use the stored procedure only to get the DisputeId from the filtered data and pass that DisputeId to the 2nd view, so we need the 2nd view data, which contains all the columns we bind to our model.
Hope this is fine; let us know once you have changed the service.
I made changes in accordance with today's talk. You can find the modified files at
https://www.dropbox.com/sh/htv0bhk4sea6n0p/AAB7HzodCDz1N6aKf0W-1NGga?dl=0
Note that the dispute index config has changed as well.
A new parameter, DisputeDocsIndex, was added to appsettings.json.
You can see sample results output JSON in disputes-response-sep17.txt
Hope that's what you need. Let me know if you have any questions.
We have the following operators, which the service needs to support when we pass filter data in the JSON.
All these operators do not exist in dtSearch as is. You need to transform them to a combination of "equals" and boolean statements.
For example, for "is set" and "is not set", you could add an index field (column to your view) as "field_59_set" with value "Y" (when the value is set) and "N" (when the value is not set). Then, if you need "field_59 is not set", send search for "field_59_set = N".
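The "is set" / "is not set" workaround can be sketched as follows. It assumes the derived column is computed when the view is built; in SQL it would likely be a CASE expression. The helper names are hypothetical, and only the `<field>_set` naming and the Y/N values come from the suggestion above.

```python
def set_flag(value):
    """Value for the derived '<field>_set' column: 'N' when the source
    column is NULL or blank, 'Y' otherwise."""
    return "N" if value is None or str(value).strip() == "" else "Y"

def is_set_clause(field, is_set=True):
    """Translate 'is set' / 'is not set' into a plain equality match
    on the derived '<field>_set' column."""
    return {
        "type": "match",
        "exclude": False,
        "field": f"{field}_set",
        "value": "Y" if is_set else "N",
    }
```

For example, `is_set_clause("field_59", is_set=False)` produces the "field_59_set = N" search described above.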
As for the after/before/between operators, I guess you need them only for dates, correct? First, you need to change the formatting of dates in your view to the "YYYYMMDD" format. A dtSearch date range filter looks like this: xfilter(word "datefield::20020101~~20020131")
https://support.dtsearch.com/webhelp/dtsearchCppApi/File_Conditions.html
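YYYYMMDD strings sort in the same order as the dates they represent, which is what makes range comparisons on them work. Below is a small sketch of producing that format and the range expression. The expression is copied from the example above. The linked dtSearch page remains the authority on the exact syntax.

```python
from datetime import date

def yyyymmdd(d):
    """Format a date as YYYYMMDD, the format the index expects."""
    return d.strftime("%Y%m%d")

def date_range_filter(field, start, end):
    """Build a range expression in the form quoted above:
    xfilter(word "field::FROM~~TO")."""
    return f'xfilter(word "{field}::{yyyymmdd(start)}~~{yyyymmdd(end)}")'
```

`date_range_filter("datefield", date(2002, 1, 1), date(2002, 1, 31))` reproduces the example expression exactly.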
Filters containing starts/ends/lasts probably refer to a specific field. You will need to specify that field in the query, combining with supported operators.
Operators "greater than" and "less than" do not exist as such. What are you comparing? What's the type of the field this applies to?
I think we need to discuss the above points on a call. Also, we have the following queries in the JSON we pass to filter Dispute and Document data.
We have the following 2 questions about the JSON request:
1) If we do not pass any operator in the JSON, the API returns an invalid-operator error.
2) We need you to confirm the JSON for "Add Another Rule".
We are available to discuss all of the above points from 10:30 AM to 6:00 PM IST.
Please let us know a convenient time.
As you need all the details, we have attached a ZIP file containing 4 screenshots with the different kinds of filters we set.
Based on these filters, we built the JSON file, which you can also find in the ZIP.
Please review that JSON file and let us know whether it is correct for those filters.
As you mentioned, we will talk more on the Thursday call at 11:00 AM IST.
1)
{
"type":"match",
"field":"Field_62",
"Operator":"and", <<<<< this means both 234 AND 233 should be present, if that's what you wanted, it's fine
"values":["234","233"]
},
2)
{
"type":"match",
"exclude":false,
"field":"Field_347",
"values":[true], <<<< as you have a single value here, you can use the "value" field, and then you don't need "Operator"; it would be ignored anyway. (Check the JSON examples I sent you earlier.)
"Operator":"and"
},
3) The query part with Field_91 and Field_94 doesn't look right. From other items in your JSON, I think the logic of the structure wasn't fully clear.
For type:boolean, the operator is used to combine clauses. For example:
{
"type": "boolean",
"operator": "and",
"clauses": [ c1, c2, c3 ]
}
will be translated to c1 and c2 and c3 where c1/c2/c3 are boolean or match elements.
According to my understanding of your UI and what you told me, the JSON for this part would look like:
{
"type":"boolean",
"Operator":"or", <<< you combine with OR the same queries for Field_91 and Field_94
"clauses":[
{ <<< a clause for Field_91
"type":"boolean",
"Operator":"or", <<< this is OR selected in your UI
"clauses":[
{
"type":"match",
"exclude":false,
"field":"Field_91",
"values":["23447","23573"],
"Operator":"and"
},
{
"type":"match",
"exclude":true,
"field":"Field_91",
"value":"23411"
}
]
},
{ <<< a clause for Field_94
"type":"boolean",
"Operator":"or", <<< this is OR selected in your UI
"clauses":[
{
"type":"match",
"exclude":false,
"field":"Field_94",
"values":["23447","23573"],
"Operator":"and"
},
{
"type":"match",
"exclude":true,
"field":"Field_94",
"value":"23411"
}
]
}
]
}
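One way to check a payload before sending it is to render it back into a readable boolean expression and compare that with what was selected in the UI. This is a hypothetical debugging helper, not part of the search service. It accepts both "operator" and "Operator" because both spellings appear in the examples above. It assumes OR when the operator is missing, which matches the default later adopted by the service.

```python
def describe(clause):
    """Render a query clause as a readable boolean expression."""
    if clause["type"] == "boolean":
        op = clause.get("operator", clause.get("Operator", "or")).upper()
        return "(" + f" {op} ".join(describe(c) for c in clause["clauses"]) + ")"
    if clause["type"] == "match":
        if "values" in clause:
            op = clause.get("operator", clause.get("Operator", "or")).upper()
            terms = (f'{clause["field"]}={v}' for v in clause["values"])
            body = "(" + f" {op} ".join(terms) + ")"
        else:
            body = f'{clause["field"]}={clause["value"]}'
        # "exclude": true negates the match
        return f"NOT {body}" if clause.get("exclude") else body
    raise ValueError(f"unknown clause type: {clause['type']}")
```

Applied to the Field_91 clause above, it yields `((Field_91=23447 AND Field_91=23573) OR NOT Field_91=23411)`. This makes it easy to see whether "and" or "or" was the intended combination of the two values.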
Hope this helps.
I made changes in accordance with today's call.
You can find updated files at: https://www.dropbox.com/sh/wvdyic79bqey4i4/AADZen55gjB-nhPwqDRG30MVa?dl=0
1) The indexer was updated to index date fields in YYYYMMDD format. Any other date fields in the view that you send as text need to be fixed on your end.
2) Search service changes:
a) Instead of using QueryStatement, send JSON via field FilterStatement. The JSON structure remains the same, with additions for date search below.
b) To search for a date range, use an object of the "range" type:
{
"type": "range",
"field": "datecreated",
"from": "20190919",
"to": "20210919"
}
For the BEFORE operation, also use type "range" but specify only "to" (omit the "from" field). For the AFTER operation, use "range" and "from".
c) The operator field is now optional and the default is OR as you requested.
d) To sort results, use SortField with the name of the field to sort on, and SortOrder, which can have the value "asc" or "desc".
Note that sorting is applied to the "disputes" index search; you then get results from "dispute-docs" for the already ordered disputes. I hope that makes sense.
Below is an example of a json structure with the new features:
{
"searchRequest": "test",
"FilterStatement": {
"type": "boolean",
"operator": "or",
"clauses": [
{
"type": "match",
"field": "datecreated",
"value": "20201001"
},
{
"type": "range",
"field": "datecreated",
"from": "20190919",
"to": "20210919"
},
{
"type": "range",
"field": "datemodified",
"to": "20190110"
}
]
},
"SortField": "language",
"SortOrder": "desc"
}
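The mapping from UI date operators to the new "range" objects can be sketched as follows. BETWEEN uses "from" and "to", BEFORE uses only "to", and AFTER uses only "from", as described above. Treating an exact date ("on") as a plain match is an assumption based on the first clause in the example. The function names are hypothetical.

```python
def date_clause(field, op, from_=None, to=None):
    """Map a UI date operator onto the service's clause types."""
    if op == "between":
        return {"type": "range", "field": field, "from": from_, "to": to}
    if op == "before":
        return {"type": "range", "field": field, "to": to}        # omit "from"
    if op == "after":
        return {"type": "range", "field": field, "from": from_}   # omit "to"
    if op == "on":
        return {"type": "match", "field": field, "value": from_}  # assumption
    raise ValueError(f"unsupported date operator: {op}")

def search_payload(text, filter_statement, sort_field=None, sort_order="asc"):
    """Assemble the request body with FilterStatement and optional sorting."""
    payload = {"searchRequest": text, "FilterStatement": filter_statement}
    if sort_field:
        if sort_order not in ("asc", "desc"):
            raise ValueError("SortOrder must be 'asc' or 'desc'")
        payload["SortField"] = sort_field
        payload["SortOrder"] = sort_order
    return payload
```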
Can we have a call to discuss the full text search module requirements and how we get data from Contegra search?
Can we have the call on 6th October at 11:00 AM IST?
Please confirm.
Yes, we can talk on 6th October at 11:00 AM IST.
We are working on the Full Text Search module. Due to a technical challenge on one task, this module has been delayed.
We now have the SQL view and requirements ready for the FTS module.
Could we have a call next Tuesday (20th October) at 11:00 AM IST to discuss and finalize the FTS module?
We will use our Skype group for the call.
Please confirm.
As discussed today for the FTS module, here are the details:
1) We are providing you a query and a view. The query is used to find the ContentTypeDataMasterId based on the filter we pass. We then pass the ContentTypeDataMasterId to our view to display the card details. The view is our final result.
2) The SQL view contains the HtmlFileName, PDFFileName and ISPDFOnly columns.
ISPDFOnly = 0 -> search for the keyword in the HTML file
ISPDFOnly = 1 -> search for the keyword in the PDF file
Server path for PDF and HTML files: E:\ISLGRebuildDemo\wwwroot\Documents (server IP: 10.68.138.10)
3) We need to highlight the keyword in the paragraph or page text. We also need the hit count.
4) The display cards support sorting and pagination. By default we need to display 10 cards.
I have attached a sample PDF file, HTML file, query and SQL view.
Database Name : ISLGRebuild (Server : 10.68.138.11)
SQL Query Name : FE_MetafieldwithValueDynamicFTS
SQL View Name : VW_FTSDocumentSearch
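Requirements 2 and 4 amount to two small rules. Pick the file to search from ISPDFOnly, and show cards 10 at a time. This sketch assumes the file names in the view are relative to the Documents folder given above. It also assumes page numbers are zero-based, as the later request examples with "PageNum": 0 suggest. The function names are hypothetical.

```python
from pathlib import PureWindowsPath

# Root folder given above; assumes view file names are relative to it.
DOCUMENTS_ROOT = PureWindowsPath(r"E:\ISLGRebuildDemo\wwwroot\Documents")
DEFAULT_PAGE_SIZE = 10  # cards displayed by default

def file_to_search(row):
    """ISPDFOnly = 1 -> search the PDF file; ISPDFOnly = 0 -> the HTML file."""
    name = row["PDFFileName"] if int(row["ISPDFOnly"]) == 1 else row["HtmlFileName"]
    return DOCUMENTS_ROOT / name

def page(cards, page_num, page_size=DEFAULT_PAGE_SIZE):
    """Return one page of cards; page_num is assumed to be zero-based."""
    start = page_num * page_size
    return cards[start:start + page_size]
```

If the UI numbers pages from 1 while the service counts from 0, the page number needs converting before each request. Mixing the two conventions is an easy way to end up showing the wrong page.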
Also, as discussed, please add the following SearchControl.cs file to your project and use it for further work.
As discussed today, please let us know once you have resolved the index regeneration issue.
Also, could you give us an update on the Full Text Search module? When can we expect it?
What should we do with the files you provided at the following link for the re-indexing issue?
https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=gfh2JQ
We don't understand what to do with those files. How can we resolve this?
The indexer now checks whether the index already exists and, if so, turns off the "Create" flag. That was the problem preventing indexing while the index was in use. I reproduced the problem locally, and this change fixed it.
Hope it works for you.
What do we need to do with the 3 files in the folder? Should we replace them in the TologixDBIndexer project?
If so, replacing those files gives an error that Dthelper does not exist.
In any case, you can use the already compiled exe that I provided.
As per your comment, we have also copied the SampleDataSource.cs file into the TologixDBIndexer project. The error is gone and the index is generated.
But the same problem still occurs. We need to stop the published application in IIS and then regenerate the indexes; only after that are results returned. If we regenerate the index directly, no results are found.
Please look into this. We are available on Skype until 6:00 PM IST to discuss.
We are available to discuss the FTS module and the re-indexing issue. Can we connect?
1. I need you to add a unique ID to the indexed query/view. It can be a row number as you suggested, but in that case you will not be able to do incremental indexing (as there's no reliable unique identifier for a document). As you said the full index is generated daily, this might not be a problem, so a row number is fine.
2. If we generate two separate indexes for the query and the view, we might hit a dtSearch limit on request size (I recall it being 64 KB) when passing result IDs from the first index to the next one. That's why I highly recommend either:
a) Adding all data to one stored procedure as you suggested, or
b) Adding a ContentTypeDataMasterId parameter to the FE_MetafieldwithValueDynamicFTS procedure, so that we can index the view first and get only related data from the proc.
That will allow us to create a single index with all data.
3. dtSearch doesn't support text extraction (and highlighting) at page level. I'll think about how to meet your requirements for getting result pages and discuss this with
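Point 2 above explains the case for a single index. With two indexes, the IDs returned by the first search would have to be sent to the second, and a large result set would need splitting across several requests. The sketch below shows that batching. The 64 KB figure is the recalled limit, not a verified one. The `field=value OR ...` filter text is a hypothetical format, not actual dtSearch request syntax.

```python
MAX_REQUEST_BYTES = 64 * 1024  # limit recalled above; an assumption, not a verified figure

def id_filter(ids, field="ContentTypeDataMasterId"):
    """Hypothetical filter text that ORs result IDs together."""
    return " OR ".join(f"{field}={i}" for i in ids)

def chunk_ids(ids, field="ContentTypeDataMasterId", limit=MAX_REQUEST_BYTES):
    """Split IDs into batches whose id_filter text stays within limit bytes,
    i.e. the extra round trips two separate indexes would force on us."""
    sep = len(" OR ".encode("utf-8"))
    batches, current, size = [], [], 0
    for i in ids:
        term = len(f"{field}={i}".encode("utf-8"))
        extra = term if not current else sep + term
        if current and size + extra > limit:
            batches.append(current)      # close the full batch
            current, size = [i], term    # start a new one with this ID
        else:
            current.append(i)
            size += extra
    if current:
        batches.append(current)
    return batches
```

With one combined index, none of this is needed. The filter and the full-text search run in one request.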
Below are the points discussed in today's call.
1) Re-indexing issue is resolved
2) As you recommended, we have merged the query and the view into one single query for all the data, so there is no longer any need for the second view. The query name is given below.
3) We have added a row number to the query as the unique identifier (column name: RowId).
Stored procedure name: FE_MetafieldwithValueDynamicFTS. You can find it in the ISLGRebuild database on our server.
We get all the data from the query above.
For sorting, there are 4 fields in the query you can use.
As we discussed, when the ISPDFOnly column is 1, we need to display page numbers from the PDF file; when the user clicks one, we need to render that page beneath it and highlight the search word.
For HTML, we display the paragraph from the HTML file and highlight the search keyword in that paragraph.
We also have paging in this module.
Hope this is fine and you have everything you need from our side.
In the OneDrive folder (https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=gfh2JQ)
you can find a new folder with the FTS indexer and the web service update.
The indexer config file indexer-config-fts.json specifies paths to two different indexes:
- IndexDir (as before) is the main documents index
- ParagraphIndexDir is for the index containing paragraph-level data.
Both indexes are created in one run.
Both indexes should be added to the web service appsettings.json as follows:
"Tologix": {
"FullTextDocsIndex": "y:\\contegra\\tologix\\test-index\\tologix-index-fts\\",
"FullTextParasIndex": "y:\\contegra\\tologix\\test-index\\tologix-index-fts-paras\\",
"SubjectNavigatorIndex": "y:\\contegra\\tologix\\test-index\\subject-nav\\",
"DisputesIndex": "y:\\contegra\\tologix\\test-index\\tologix-index-disputes\\",
"DisputeDocsIndex": "y:\\contegra\\tologix\\test-index\\tologix-index-dispute-docs\\"
},
The web service endpoint /api/search/fts was added for full text search requests. It's similar to what you had before, with an additional paragraphs field containing a list of matching paragraphs.
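A missing index path in the "Tologix" section is easy to overlook when files are copied between environments. Below is a minimal pre-deployment check. It assumes appsettings.json is plain JSON with no comments, since Python's `json` module rejects the comments ASP.NET Core tolerates. The key list is taken from the settings shown above. The function name is hypothetical.

```python
import json

REQUIRED_INDEX_KEYS = ("FullTextDocsIndex", "FullTextParasIndex",
                       "SubjectNavigatorIndex", "DisputesIndex", "DisputeDocsIndex")

def missing_index_settings(appsettings_text):
    """Return the index keys that are absent or empty in the 'Tologix' section."""
    section = json.loads(appsettings_text).get("Tologix", {})
    return [key for key in REQUIRED_INDEX_KEYS if not section.get(key)]
```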
See file full-text-search-example.txt for request and response payload example.
When paragraphs is null, it means the matches are not in paragraph content but in some other field.
Currently, for documents with ispdfonly=True, the paragraph number is null. This is because we're not yet indexing separate pages in PDF documents, so we cannot provide this information. We're still looking for a solution.
The web service endpoint /api/search/highlight-para handles paragraph highlighting requests. You need to pass an object with paraId (the paragraph identifier from the search results), plus searchRequest and any applicable search control options. See the BasicSearchParams class in the sources for all available options.
See file full-text-search-paragraph-highlighting-example.txt for request and response payload example.
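The client flow for these two endpoints can be sketched as three helpers. The first builds the /api/search/fts body. The second builds the /api/search/highlight-para body from a result's paraId. The third treats a null paragraphs list as "no paragraph hits". paraId, searchRequest and paragraphs come from the description above. PageNum, PageSize and option names such as Fuzzy are assumptions; the example payload files and BasicSearchParams are the reference.

```python
def fts_request(text, page_num=0, page_size=10, **options):
    """Body for /api/search/fts; extra search options are passed through."""
    body = {"searchRequest": text, "PageNum": page_num, "PageSize": page_size}
    body.update(options)
    return body

def highlight_request(para_id, text, **options):
    """Body for /api/search/highlight-para: the paraId from a search result
    plus the same searchRequest and options used for the search."""
    body = {"paraId": para_id, "searchRequest": text}
    body.update(options)
    return body

def matching_paragraphs(result):
    """A null 'paragraphs' value means the hits were in other fields,
    not in paragraph text, so there is nothing to list."""
    return result.get("paragraphs") or []
```

Passing the same options to both calls keeps the highlighting consistent with the original search, e.g. fuzzy matching.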
Let me know if you have any questions.
We are integrating Contegra FTS search into our application and are having trouble getting data from the index. It throws an error, so we need to discuss this over a call.
We also need to understand the whole procedure for getting the data, as well as the paragraph list, from the index.
We are blocked on further development.
We are available until 6:30 PM IST on all weekdays. Please ping our Skype group to discuss.
Thanks for the quick call.
As discussed, we are currently getting an error when fetching result data from the API. Details below:
at System.ThrowHelper.ThrowAddingDuplicateWithKeyArgumentException[T](T key)
at System.Collections.Generic.Dictionary`2.TryInsert(TKey key, TValue value, InsertionBehavior behavior)
at System.Collections.Generic.Dictionary`2.Add(TKey key, TValue value)
at TologixWebSearch.Controllers.SearchController.FullTextSearch(SearchModel sm) in D:\shrinivas\SVN_Project\TologicWebSearch\Controllers\SearchController.cs:line 65
at Microsoft.AspNetCore.Mvc.Internal.ActionMethodExecutor.SyncActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
at Microsoft.AspNetCore.Mvc.Internal.ControllerActionInvoker.<InvokeActionMethodAsync>d__12.MoveNext()
Also, we need to highlight any search word in any field in the Subject Navigator module.
Please let us know once this is done.
I'll let you know when I have a solution for fields highlighting.
Sadly, my team member Shrinivas's father passed away last week, so he will start working on this again this week.
We are getting an error while fetching paragraph text from the highlight-para method.
Can we have a call to look into this issue next Monday between 10:00 AM and 6:00 PM IST?
Sorry to hear that. Yes, we can talk on Monday. I'll contact you on Skype as soon as I'm available in the morning (around 13:00 IST, that should be my 08:30).
Thanks for today's call.
Does this API search for the word in the PDF file and return the PDF page number, and when we click the page number, will it display the whole page text with the search keyword highlighted?
Could you please confirm?
My understanding is that Tologix has a PDF Highlighter license, but for an older version. I'm waiting on feedback from
If the upgrade doesn't happen, we'll need to research further and find some other solution for it.
I spoke to
Thanks,
Morgan
I have one technical question, and I'm not sure who can answer it: can I rely on the Highlighter server having access to the network share containing the documents being indexed? I can upload each PDF to Highlighter whenever a file is indexed, but if Highlighter has access to the file share we can save time and bandwidth.
Thanks,
Radomir
I have no problem setting things up that way, as long as access to the network share is secure and does not compromise the security of any data on our servers.
Also,
Thanks,
Morgan
As Morgan said, if the network share is accessed securely, I also don't see any problem setting it up that way.
Morgan
Yes, we'll be maintaining the legacy application for a beta period (3-4 months). During that period, both applications will operate in parallel.
Thanks,
Morgan
I can install trial version on the dev server for use with the current test content folders, and we can upgrade the production instance later.
Thanks,
Morgan
I don't have access to any other server (except the SQL Server) on which I could install Highlighter for development/testing. Do you want me to upgrade the production instance instead? There will be some downtime, though (15-130 minutes until I migrate the config). Please let me know.
Note: if the install requires downtime that will affect the production environment, we should perform it between 2:00am and 3:00am Eastern Time (North America), overnight on a Friday or Saturday, to minimize disruption to users.
Thanks,
Morgan
All ISLG applications are running on server 10.68.138.10. The development environment application is https://dev.investorstatelawguide.com/ and the production environment is https://www.investorstatelawguide.com/.
You can perform the Highlighter upgrade between 2 and 3 AM Eastern Time (North America) on Friday (i.e., tomorrow, 27th November).
Please make sure it does not affect the production application (https://www.investorstatelawguide.com/).
Also, let us know once you have completed the upgrade and the server has been restarted, so we can check all the applications running on server 10.68.138.10.
I discussed the issue above with
Please confirm when the updates will occur, and
Thanks,
Morgan
I can do the upgrade during my Saturday morning (which should be around 2am your time). I'll send a message to
FYI, I'll be on the road from Saturday afternoon to Sunday evening, so I will not be available during that time. If you think it's too risky to upgrade before my trip, we can leave it for my Monday morning (around 2am in the US), which still gives Harsh and the team time to verify the installation and leaves enough time for any corrections before US work hours.
Please let me know what you prefer.
Thanks,
Radomir
Thanks,
Morgan
Please update us once you complete the process on the server so we can check all applications.
I have checked all applications, as well as the FTS module, on both investorstatelawguide.com and dev.investorstatelawguide.com, and everything is working fine. The Highlighter is also working in the FTS module.
Yes,
Thanks,
Morgan
We have uploaded test documents for ISPDFOnly=True to our ISLGRebuild database on server 10.68.138.11.
The PDF files are stored on the E: drive of server 10.68.138.10 at the following path:
E:\ISLGRebuildDemo\wwwroot\Documents\PDFFiles
Currently, 2 things are pending from your side in the FTS module:
1) When we click a paragraph from an HTML file, we display the HTML paragraph text below it, but we also need to highlight the search keyword inside that paragraph, which is currently not working.
2) We need to display the page number when ISPDFOnly = True, and when the user clicks the page number, we need to display the whole page text with the search keyword highlighted.
Can we have a call on Skype tomorrow between 10:00 AM and 6:00 PM IST?
Please confirm.
Paragraph extraction was implemented for the HTML file format you sent me, and highlighting for it worked fine on my end. This HTML had:
The indexer was breaking because it could not parse HTML files in an unsupported format. I believe that's why paragraph highlighting didn't work for you. I changed the indexer so that an invalid file is skipped and the error is logged. Now, at least valid HTML files will be handled.
I also managed to test PDF page extraction and highlighting so that's working as well.
In order to extract paragraphs from HTML, we need consistent HTML formatting. Please let me know which format you're going to use. If there are multiple different formats, we need to support them all. It would be great if you could provide us with all the format details. If you don't have this information, I'm afraid we'll have to go one by one, analyzing error logs and the actual content.
In the OneDrive folder you can find the updated indexer and FTS indexing configuration. Note the new config parameter PdfHighlighterUrl, which is required for PDF page extraction.
We are looking into the above. Meanwhile, could we have a call tomorrow between 1 PM and 6 PM IST so we are both on the same page?
We also need to discuss the HTML format.
Please confirm tomorrow's call.
After today's call, we started reviewing all the functionality we developed for FTS:
1) Data Binding
2) Paragraph Listing
3) Sorting
4) Paging of Document Cards
However, paging of the document cards is not working: when we click the 2nd page, the same data as the 1st page is rendered.
Have we missed something for paging? Can we have another call tomorrow at 1:30 PM IST to resolve this issue?
Please confirm.
How can we set the Rebuild project document path in the Highlighter folder? Currently, the existing live path is set in application.conf.
How can we also set the Rebuild project document path on server 10.68.138.10, and which Highlighter URL should we use?
Please answer as soon as possible, as we have a release for UAT tomorrow.
First, do you want to use the same Highlighter instance or not?
One option is to install another Highlighter instance on a different path and port.
If you use the same instance, you could use a different path prefix for the Rebuild project and, in the folder mapping settings, point Highlighter to a different folder for that prefix.
If you're using the same instance, you need to use the same Highlighter URL. Otherwise, you need to set up another proxy in IIS pointing to the other instance's port.
Can we have a short call tomorrow or Monday between 10 AM and 6 PM IST? We are not very familiar with the Highlighter setup on the server, and we don't want to take any risk with the current live ISLG application.
Please confirm.
Hope you are doing well.
We have started looking into the FTS module bugs, and for 2 of them we ran the following scenarios.
Scenario 1 :
Scenario 2 :
We are available from 10:00 AM to 6:00 PM IST starting next week.
Please confirm.
If you cannot see the error, please send me the log, index and the search payload you're using, and I'll try to reproduce it.
I suggest a short call so we can show you the errors raised by the Industrial team; after that, we can continue communicating via this channel.
Let me know your availability tomorrow, as Harsh has already emailed you asking to meet so we can understand the issues and get a solution ASAP.
Thanks,
Ketan Sondarva
We have another call at 6:00 PM IST. Can we have the call at 4:00 PM IST today, i.e., in 15 minutes?
Otherwise, we can have the call tomorrow between 11:00 AM and 4:00 PM.
Please confirm.
As discussed in today's call, below are the database name and documents path on the server so you can test.
Server IP : 10.68.138.11
Database Name : ISLGRebuildFTS
Server IP : 10.68.138.10
Documents Path : E:\ISLGRebuildDemo\wwwroot\DocumentsFTS
You will find all the HTML and PDF documents at the above path.
I will also send you all the bugs raised by the Industrial team.
Below are the bug descriptions and issues raised by the Industrial team.
1 ) issue no : 20579 - Search results with no pinpoint references
Steps to reproduce:
Result: I am taken to a search results screen with 5 document results. Only 1 document result contains a pinpoint reference.
Expected: For each document result there must be at least one pinpoint reference for the keyword appearance in the text.
2) issue no : 20572 - FTS > Cannot perform search with filters
Steps to reproduce:
Result: I am taken to the search results page. No results or message is presented.
Expected: I expect to see the documents with my selected language as results for my search. The documents are displayed if I do not apply a filter.
3) issue no : 20586 - Search with basic filter does not work
Steps to reproduce:
Result: I am taken to the search results page. Some of the search results are not documents in 'English'.
Expected: based on my applied filters, I must see only documents containing the word 'test' with language = English.
4) issue no. 20611 - Search > Not able to search without inputting a keyword
Steps to reproduce:
Result: Leads to an empty page with no results
Expected: Will see any results that correspond to the filtering parameters
5) issue no. 20612 : Search > Boolean search not working
Steps to reproduce:
Result: No results found
Expected: Will see results that correspond to the "blue" keyword
6) issue no. 20613 : Search > Fuzzy Typo not working
Steps to reproduce:
Result: No results found
Expected: Should see the same results as if I had entered "blue"
Please let us know if you need anything else.
The update is in the OneDrive folder "2021-01-06 FTS update"
2) issue no : 20572 - FTS > Cannot perform search with filters
Should be fixed with the update. For the index you sent me, the following test search returns 1 result:
{
"searchRequest": "",
"FilterStatement": {
"type": "boolean",
"operator": "or",
"clauses": [
{
"type": "match",
"exclude": false,
"field": "language",
"value": "French"
}
]
},
"SortField": "sortingdate",
"SortOrder": "desc",
"PageSize": 50,
"PageNum":0
}
3) issue no : 20586 - Search with basic filter does not work
I don't see any issue here - I'm getting only results with English.
My search payload:
{
"searchRequest": "test",
"FilterStatement": {
"type": "boolean",
"operator": "or",
"clauses": [
{
"type": "match",
"exclude": false,
"field": "language",
"value": "English"
}
]
},
"SortField": "sortingdate",
"SortOrder": "desc",
"PageSize": 50,
"PageNum":0
}
or you can send it as:
{
"searchRequest": "test",
"FilterStatement": {
"type": "match",
"field": "language",
"value": "English"
},
"SortField": "sortingdate",
"SortOrder": "desc",
"PageSize": 50,
"PageNum":0
}
In both cases I get 4 results, all English.
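The two payloads above select the same documents because a boolean clause with a single sub-clause is equivalent to that sub-clause. When comparing your request with these examples, it can help to normalize it first. This is a hypothetical client-side helper, not something the service requires.

```python
def simplify(clause):
    """Recursively unwrap boolean clauses that contain a single sub-clause;
    such a wrapper selects the same documents as the sub-clause alone."""
    if clause.get("type") == "boolean":
        subs = [simplify(c) for c in clause["clauses"]]
        if len(subs) == 1:
            return subs[0]
        return {**clause, "clauses": subs}
    return clause
```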
I suspect there's some issue with the structure of your request. Please send me the payload you're submitting to the service.
4) issue no. 20611 - Search > Not able to search without inputting a keyword
This looks like a duplicate of #2, should be working now.
5) issue no. 20612 : Search > Boolean search not working
Looks good to me. The following search returns 5 results for me, highlighting both "blue" and "yellow":
{
"searchRequest": "yellow OR blue",
"searchType": "Boolean",
"SortField": "sortingdate",
"SortOrder": "desc",
"PageSize": 50,
"PageNum":0
}
Please send your request payload.
6) issue no. 20613 : Search > Fuzzy Typo not working
Looks good to me. The following search returns 3 results, highlighting "Blue":
{
"searchRequest": "blun",
"Fuzzy": true,
"SortField": "sortingdate",
"SortOrder": "desc",
"PageSize": 50,
"PageNum":0
}
Please send me your payload.
I ran all tests with the index you sent me.
I'm still investigating #1 and will get back to you later on it.
As discussed before, we need a better description of the HTML format from you. This issue is related to that.
The test file you sent us initially was simple:
To successfully extract paragraph content, we really need a description of the format(s) you're using. Please send not the simplest but the most complex examples of content to be handled, and explain how to get data from them: what makes a paragraph.
Thanks,
Radomir
I have attached our HTML coding manual, which defines the HTML structure.
This document covers all the scenarios, so please refer to this manual for the HTML description.
Thanks for the document. From what I can tell, it focuses mainly on the visual aspects, not much info about extracting paragraphs data.
1. Check this screenshot from the above mentioned sample document. Two paragraphs are marked with (3), on the same level. My assumption was that paragraph numbers are unique. How should this be represented in index?
2. There's text from page footers, marked with class "pdffootnote", that will be indexed as part of the HTML. However, as far as I can tell, it doesn't belong to any particular paragraph, so there will be no paragraph in search results for matches in this content. Makes sense?
3. If you have information which text belongs to which paragraph, can you add additional data attributes to content (e.g. similar to "data-key" attributes that exist in the HTML) to make text extraction simple? For example, adding "data-para" attribute to a text div, where value is the paragraph number, would make the extraction way simpler.
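This sketch shows why a "data-para" attribute would make extraction simple. It uses Python's standard `html.parser` and collects text per data-para value. Nested inline elements inherit their paragraph, and text outside any data-para element, such as "pdffootnote" footers, is skipped. The attribute is only the proposal above, so this is not how the current indexer works. The sketch also assumes well-formed, properly nested HTML. The IDs in the usage example are of the pa1 / pa1.1 kind used in the HTML under discussion.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

class ParaExtractor(HTMLParser):
    """Collect text per 'data-para' value; other text is ignored."""

    def __init__(self):
        super().__init__()
        self.stack = []                 # data-para value (or None) per open element
        self.paras = defaultdict(list)  # data-para value -> text fragments

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:            # <br>, <img>, ... never get an end tag
            return
        para = dict(attrs).get("data-para")
        if para is None and self.stack:
            para = self.stack[-1]       # inherit the enclosing paragraph
        self.stack.append(para)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack and self.stack[-1] is not None and data.strip():
            self.paras[self.stack[-1]].append(data.strip())

def extract_paragraphs(html):
    parser = ParaExtractor()
    parser.feed(html)
    parser.close()
    return {para: " ".join(parts) for para, parts in parser.paras.items()}
```

For example, `extract_paragraphs('<div data-para="pa1">First <b>bold</b></div><div class="pdffootnote">footer</div>')` keeps only paragraph pa1.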
Thanks,
Radomir
Could you please provide answer to
For the 2nd question: you are right, the second paragraph no. 3 in the screenshot is a footnote, and we need only paragraphs for FTS search results, not footnotes. But I need confirmation from Morgan.
For the 3rd point:
Actually, the HTML you are looking at was generated by the old algorithm. I have attached the same HTML with uniquely generated IDs so you can get an idea from this file.
Could you please send the OneDrive link, as we are unable to open it? Please also give us the numbers of the bugs you resolved.
And please tell us what we need to do. Can we replace the search controller with the one from OneDrive?
We are using the following link, but it does not open:
https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=gfh2JQ
Is it possible to send only the search controller file so we can replace it?
Please provide the OneDrive link.
As for issue numbers, the only issue I see is with HTML parsing. Please check my previous comments, where I listed your bug numbers and the search payloads I used to test.
Responding to
Thanks,
Morgan
I'm sending you the paragraphs extracted from this document. The first column is the paragraph number, the second column is the paragraph text. Please review and correct as needed.
In this document I didn't include footnotes. The plan is to add each footnote to the paragraph that references it. However, we need to make a proper paragraph extraction first.
My assumption was that paragraphs would have unique numbers within the document. However, that doesn't appear to be the case. Check, for example, the paragraph numbered (1): it appears in multiple parts of the document. That means in search results we could have multiple result paragraphs (1) for the same document, correct?
Morgan
We have one question: when indexing runs every day, will the ParaId parameter change for the same document?
If so, it will be a problem for us, because the application has a bookmark feature where users can bookmark any pinpoint paragraph from a search.
We are using ParaId to get the HTML or PDF text.
For example, a user searches for a keyword and bookmarks paragraph 8 in the application.
When the bookmark for paragraph 8 is saved, the HTML text is displayed through the API. But if we check the next day, after the index has been regenerated, the ParaId we saved has changed.
We need the ParaId to stay the same over time so we can fetch the HTML or PDF text.
Please provide your feedback. We are available to discuss on Skype between 11:00 AM and 6:00 PM IST.
Depending on feedback I get on paragraph extraction questions, we'll probably need to change indexing and how paragraphs are referenced in the index. I'd wait on these answers before making further changes.
I talked with the HTML team, and in the HTML you provided above, the paragraphs numbered 1 have different IDs:
1st paragraph ID: pa1
2nd paragraph ID: pa1.1
So if a user's keyword matches both paragraphs, we should display both paragraph numbers in the application.
The ParaId should not change when the index is regenerated, because we use it in the bookmark feature to retrieve the HTML text.
Following up on the above, yes, it will be common in certain documents for the same paragraph number to be used more than once within a document, particularly in Treaties and Arbitration Rules. This is why paragraph IDs need to be used as the unique identifiers for paragraphs and footnotes across a document.
For example, within ARB/0029 - ICSID Arbitration Rules (2006), you will see that each rule contains subparagraphs numbered with conventional bulleting, where the numbering restarts under each rule. As a result, to ensure each subparagraph can be uniquely identified, we have inserted a reference to the relevant rule or section into each paragraph ID:
Therefore, as
Thanks,
Morgan
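To illustrate why the paragraph number alone cannot serve as a key, here is a minimal Python sketch that flags paragraph numbers repeated within one document and builds a rule-qualified ID instead. The `r{rule}-pa{number}` format is hypothetical and only stands in for the real ISLG paragraph IDs (such as pa1 and pa1.1); the point is that the bare number repeats while the qualified ID does not.

```python
from collections import Counter

def repeated_numbers(paragraphs):
    """Return paragraph numbers that occur more than once in one document."""
    counts = Counter(number for _rule, number in paragraphs)
    return sorted(number for number, count in counts.items() if count > 1)

def qualified_id(rule, number):
    """Hypothetical ID scheme: qualify the paragraph number with its rule."""
    return f"r{rule}-pa{number}"

# (rule, paragraph number) pairs; numbering restarts under each rule
paragraphs = [("1", "1"), ("1", "2"), ("2", "1"), ("2", "2"), ("3", "1")]
ids = [qualified_id(rule, number) for rule, number in paragraphs]
```

Here "1" and "2" are reported as repeated numbers, while every entry in `ids` is distinct.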
The ParaId that I said is not suitable for bookmarking is not your paragraph ID. It's the dtSearch document number, which we use for fast retrieval of found paragraphs. For bookmarking, you should use your paragraph ID, and I'll look into changing the web service to support data retrieval using this ID.
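The difference between the two identifiers can be shown with a small simulation. This Python sketch assumes, for illustration only, that document numbers are handed out in the order rows arrive during an indexing run (the real dtSearch numbering rules are not described in this thread); the file name ARB0029.html is hypothetical.

```python
def build_index(rows):
    """Simulate one indexing run over (file_path, para_id, text) rows.

    Assumption for illustration: document numbers are assigned in arrival
    order, so they depend on how the rows happen to be ordered in each run.
    """
    by_doc_number = {}
    by_para_key = {}
    for doc_number, (file_path, para_id, text) in enumerate(rows, start=1):
        by_doc_number[doc_number] = text
        by_para_key[(file_path, para_id)] = text
    return by_doc_number, by_para_key

day1 = [("ARB0029.html", "pa1", "First"), ("ARB0029.html", "pa1.1", "Second")]
day2 = list(reversed(day1))  # same rows, different order after re-indexing

numbers_day1, keys_day1 = build_index(day1)
numbers_day2, keys_day2 = build_index(day2)

saved_number = 2                        # bookmark stored as a document number
saved_key = ("ARB0029.html", "pa1.1")   # bookmark stored as file + paragraph ID
```

After the second run, the saved document number points at "First", while the (file, paragraph ID) key still resolves to "Second".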
How should footnotes be indexed and referenced? Should they be indexed with a paragraph referencing them, or separately?
The footnotes should be indexed separately from the paragraphs that reference them. In other words, we want the user to be directed to the text of the footnote itself if it produces a hit for the searched keyword. Does that make sense?
Morgan
In the bookmark feature, we use the same ParaId to retrieve text from HTML and PDF that we use in the FTS module.
So our assumption is that you will change this in your web service and it will work for both FTS and bookmarks, with no changes needed in the application.
Please note that for bookmarks we save the URL, including the ParaId parameter, in the database.
There are 2 bugs reported by the Industrial team. Please look into them.
To check both issues, you can use the ISLGRebuild database on the server. All PDF and HTML documents are in the following path: E:\ISLGRebuildDemo\wwwroot\Documents
21136 : Search > All words > no pinpoint references
Steps to reproduce:
Result: Search results cards do not include pinpoint references
Expected: A paragraph link must exist for each paragraph in the document where at least one of the keywords from the search was found
21138 : Search > Keyword not highlighted
Steps to reproduce:
Result: No keyword is highlighted in the result. The keyword does not seem to appear in the paragraph.
Expected: Only paragraphs with keyword appearances will be available as pinpoint references to preview excerpts. The keyword must be highlighted in the excerpt preview.
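The expected behaviour in bugs 21136 and 21138 (a pinpoint link for every paragraph containing at least one keyword, and the keyword highlighted in the excerpt) can be sketched in Python as below. This is a minimal illustration assuming plain-text paragraphs and a hypothetical `<mark>` highlight tag; it does exact, case-insensitive whole-word matching with no stemming, so "tribunals" does not match "tribunal". In the real service, dtSearch's own hit positions would be authoritative.

```python
import re

def keyword_pattern(keywords):
    """Case-insensitive, whole-word pattern matching any of the keywords."""
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def pinpoints(paragraphs, keywords):
    """IDs of paragraphs containing at least one keyword (none if no keywords)."""
    if not keywords:
        return []
    pattern = keyword_pattern(keywords)
    return [para_id for para_id, text in paragraphs if pattern.search(text)]

def highlight(text, keywords, tag="mark"):
    """Wrap each keyword occurrence in <tag>...</tag>, keeping its original case."""
    if not keywords:
        return text
    return keyword_pattern(keywords).sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", text)
```

The empty-keyword guard also matters: an empty alternation would match at every word boundary.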
I made changes to the HTML parser. To see how extraction worked for your sample file, check:
On the OneDrive (https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=2MNKXq) under the folder "2021-01-23" you can find updates for the indexer and web service.
Before indexing with the new indexer, delete the FTS index folders completely. A new index is needed because there have been some changes in its structure.
In the results, paraId is now a string, not a number, and it looks like this:
In the paragraph highlighting request you also need to send this id:
Let me know if you have any questions.
As for the bugs you sent:
21136 "A paragraph link must exist for each paragraph in the document where at least one of the keywords from the search was found"
This is a new requirement for me. We'll need to parse complex queries that contain multiple terms and Boolean expressions to extract only the keywords, and then find the paragraphs. I'll let you know when this is ready.
21138 I'm not getting any results for "like", so I cannot reproduce this.
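The keyword-extraction step described for 21136 can be sketched as a small tokenizer. This Python version is a simplified assumption, not dtSearch's query grammar: it treats AND, OR and NOT as operators in any letter case, keeps quoted phrases whole, drops parentheses, and skips the single term right after NOT (a NOT applied to a whole parenthesized group is not handled). Proximity, wildcard and field syntax are ignored.

```python
import re

OPERATORS = {"AND", "OR", "NOT"}
# A quoted phrase, or a run of characters that are not whitespace, parentheses or quotes
TOKEN = re.compile(r'"([^"]+)"|([^\s()"]+)')

def extract_keywords(query):
    """Return the terms of a Boolean query that should be used for highlighting."""
    keywords = []
    skip_next = False
    for phrase, word in TOKEN.findall(query):
        if word and word.upper() in OPERATORS:
            skip_next = word.upper() == "NOT"  # the next term must not match
            continue
        if skip_next:
            skip_next = False
            continue
        keywords.append(phrase or word)
    return keywords
```

For example, `tribunal AND absence` yields both terms, while `award` is left out of `... NOT award`.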
We need to discuss the following bugs with you on a Skype call. Please confirm. We are available between 11:00 AM and 6:00 PM IST.
21138 Search > Keyword not highlighted
Steps to reproduce:
Result: No keyword is highlighted in the result. The keyword does not seem to appear in the paragraph.
Expected: Only paragraphs with keyword appearances will be available as pinpoint references to preview excerpts. The keyword must be highlighted in the excerpt preview.
21217 : Preview excerpt > must not be displayed if no keyword was included in query
Steps to reproduce:
Result: Search results include pinpoint references to preview excerpts
Expected: No preview excerpts should be available because I have not included a keyword in my search query
21218 : empty results appearing in search
19515 : Any words search > no highlight (In Subject Navigator Index)
Steps to reproduce:
Result: The search is performed and the tree is filtered to display only matching branches (and parent branches of). No matching terms are highlighted in the results.
Expected:
20613 : Search > Fuzzy Typo not working (In Subject Navigator Index)
Steps to reproduce:
Result: No results found
Expected: Should see the same results as if I had entered "blue"
20867 : SN > Fuzzy typo not working (In Subject Navigator Index)
Steps to reproduce:
Result: No results found
Expected: Should see results for "award"
We can talk tomorrow at about 13:30 IST, but please provide me with the examples to test today. There's not much sense in having a call if I have no access to test data.
Thanks,
Radomir
You can check those bugs in the ISLGRebuild database (server IP: 10.68.138.11).
The Document Path : E:\ISLGRebuildDemo\wwwroot\Documents
We will discuss further tomorrow at 1:30 PM IST.
21217 : Preview excerpt > must not be displayed if no keyword was included in query
This was intentional, as you didn't specify the desired behaviour when there is no search query. On OneDrive (https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=2MNKXq) under the folder "2021-01-28" you can find an updated web service that fixes this.
Other issues that you mention I could not reproduce as I couldn't find the keywords you mention in the index. I guess we're still indexing different data?
For tomorrow's call, please be prepared to send me the FTS and Subject Navigator indices, as well as the indexing logs created while indexing each collection.
As discussed in today's call, I have attached 2 zip folders:
1) Full Text Search (includes the index folders and the index log file)
2) Subject Navigator (includes the index folder and the index log file)
I have also attached the JSON we passed to the web service.
Fuzzy typo bug:
{"searchRequest":"blun","SearchType":"3","Stemming":false,"Synonyms":false,"Fuzzy":true,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37","13","12"]}]},"PageNum":0,"PageSize":20}
Highlight issue with the keyword "like":
{"searchRequest":"like","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37","13","12"]}]},"SearchType":"3","Stemming":false,"Synonyms":false,"Fuzzy":false,"Fuzziness":"1","SortField":"hits","SortOrder":"desc","PageNum":0,"PageSize":"20"}
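The two payloads above are not typed consistently: PageSize is a number in the first and a string in the second, while SearchType and Fuzziness are strings in both. Whether the web service coerces these values is not stated in the thread, so a quick comparison can help spot such differences before requests are sent. The Python sketch below is a debugging aid only (not part of either codebase); it compares the top-level value types of the two payloads quoted verbatim above.

```python
import json

FUZZY_REQUEST = json.loads('{"searchRequest":"blun","SearchType":"3","Stemming":false,"Synonyms":false,"Fuzzy":true,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37","13","12"]}]},"PageNum":0,"PageSize":20}')
HIGHLIGHT_REQUEST = json.loads('{"searchRequest":"like","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37","13","12"]}]},"SearchType":"3","Stemming":false,"Synonyms":false,"Fuzzy":false,"Fuzziness":"1","SortField":"hits","SortOrder":"desc","PageNum":0,"PageSize":"20"}')

def type_differences(a, b):
    """Map each top-level key whose value type differs between two payloads
    to (type in a, type in b); a missing key is reported as 'missing'."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        type_a = type(a[key]).__name__ if key in a else "missing"
        type_b = type(b[key]).__name__ if key in b else "missing"
        if type_a != type_b:
            diffs[key] = (type_a, type_b)
    return diffs
```

Running it on the two requests reports PageSize (int vs str) plus the SortField and SortOrder keys present only in the second request.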
1) Your FTS index was not created with the latest update that I sent you on Jan 23. I can tell by "#para_" appearing in the paraId string. I also sent you the source code update so you can see for yourself. The update also trims extra whitespace that appears in some HTML element IDs. As your index was not created with the latest update, there's not much sense in testing it.
I've noticed in your index that some paragraph IDs even contain symbols (I saw some kind of dot). I'm not sure dtSearch will find those properly. To prevent issues with them, I changed the indexer to encode both the file path and the paragraph ID.
Please take the indexer update from OneDrive folder 2021-01-30, delete the old FTS indexes and re-index!
https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=2MNKXq
2) You cannot search for the keyword "like" specifically because it's a stop word listed in the file noise.dat (the dtSearch default list). Review that file and remove all keywords you want to be able to search for, or delete the file.
After that do full re-indexing!
3) Why do you think that "blun" should find "blue" with Fuzziness=1? Did you find this example in the dtSearch documentation? I see that it finds it when Fuzziness=2 so I'd say that fuzzy search works.
4) Subject Navigation highlighting for more than one term works for me:
5) Why do you send "SearchType": "3" in your search request? Are you sure this binds to the desired search type?
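The thread does not say how the 2021-01-30 indexer update encodes the file path and paragraph ID. As a minimal sketch, assuming hex encoding (chosen here only because its output contains nothing but digits and the letters a-f), the round trip looks like this in Python; the separator letter z is likewise an arbitrary choice that cannot occur in hex output.

```python
def encode_key(file_path, para_id):
    """Hex-encode both parts so the key holds only 0-9, a-f and the separator 'z'."""
    return file_path.encode("utf-8").hex() + "z" + para_id.encode("utf-8").hex()

def decode_key(key):
    """Reverse encode_key, returning (file_path, para_id)."""
    path_hex, para_hex = key.split("z")
    return bytes.fromhex(path_hex).decode("utf-8"), bytes.fromhex(para_hex).decode("utf-8")
```

Dots, backslashes and other symbols in IDs such as pa1.1 never reach the index in raw form, and decoding restores them exactly.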
Please see following answer of
Why do you think that "blun" should find "blue" with Fuzziness=1? Did you find this example in the dtSearch documentation? I see that it finds it when Fuzziness=2 so I'd say that fuzzy search works.
Also,
You cannot search for keyword "like" specifically because it's a stop word, listed in the file noise.dat (dtsearch default list). Review that file and remove all keywords you want to use in search - or delete the file.
After that do full re-indexing! We will re-index this week and let you know once it is complete.
Re bug no. 21138, please note that in the legacy application we have removed all keywords from the noise.dat file. Please do the same for the new application to ensure keywords similar to "like" generate hits.
Re bug no. 20613,
Thanks,
Morgan
I understand your question about the fuzzy search, but we don't have insight into dtSearch's internal implementation. The web service passes the parameters to the dtSearch API, and apparently that works, considering there are results for fuzziness=2.
Unless
Thanks,
Radomir
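By plain Levenshtein edit distance, "blun" is a single substitution away from "blue", which is probably why a match at Fuzziness=1 was expected. The behaviour reported above (a match only at Fuzziness=2) suggests that dtSearch's fuzziness setting is not a direct edit-distance threshold. The Python sketch below only computes the textbook distance for comparison and says nothing about dtSearch internals.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute if different
        prev = cur
    return prev[-1]
```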
At the same time, please note my instructions on bug no. 21138 and removing all keywords from the noise.dat file.
When can we expect the FTS and Subject Navigator searches to be fully implemented within staging.investorstatelawguide.com?
Thanks,
Morgan
Today we are creating the indexes on the server, but for some reason no module index is being created.
Server details:
Server IP : 10.68.138.10
TologixDBIndexer : E:\DevContegraISLGRebuildStagingDBIndexer
Document Path : E:\ISLGRebuildStaging\wwwroot\Documents\
Indexing Path : E:\DevContegraISLGRebuildStagingIndexes\
Database Name : ISLGRebuildStaging (Server 10.68.138.11)
Please note that we are now indexing migrated data, so the amount of data is large.
Please treat this as high priority, as we need to deploy the project to the staging server tomorrow.
I have also attached the indexing log from our attempt to generate the indexes.
> Today, We are going to create Indexing on server. But, Don't know it is not created any module indexing.
You can put the indexer and index files wherever you like, the same as you have done so far. I really don't understand what you are asking me to do here.
Today we are indexing the migrated data, using the ISLGRebuildStaging database on server 10.68.138.11.
Please note that we are now using migrated data, so the amount of data is very large.
I have attached the FTS module index and the indexing log file.
However, somehow the indexes were not generated properly for all modules.
2021-02-04 05:24:12,541 [1] ERROR - Failure in retrieving data from the database
System.Data.OleDb.OleDbException (0x80040E31): Query timeout expired
at System.Data.OleDb.OleDbDataReader.ProcessResults(OleDbHResult hr)
at System.Data.OleDb.OleDbDataReader.NextResult()
at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
at System.Data.OleDb.OleDbCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, String srcTable)
at DBIndexer.SampleDataSource.RetrieveDataFromDB() in p:\contegra\contegra-tologix\DBIndexer\SampleDataSource.cs:line 582
Judging by the timestamps of the log messages, it looks like the query timeout is 30 seconds.
The data size has now increased, so you need to check all the queries and views we provided and increase the timeout for generating the indexes, because with this much data the views and queries take a long time to return all the rows. This applies to the following modules:
1) Subject Navigator
2) Dispute Document
3) Full Text Search.
You can get the indexer update from the 2021-02-04 folder on OneDrive:
https://1drv.ms/u/s!AugzRBG6eTFwg4sJwygyvkWxOQnMWg?e=2MNKXq
I enabled an unlimited timeout for the SQL command.
BTW, I recommend editing TologixDBIndexer.exe.conf and setting the default logging level to INFO. It's currently DEBUG, so your indexing log would be huge.
How long does it take for FE_MetafieldwithValueDynamicFTS to complete and start returning data? I started the process almost half an hour ago and am still waiting...
However, looking at the attached example, I think that the 3rd column ("idFiltered") makes more sense to return to user as the paragraph number. It better indicates paragraph, footnote, etc.
I think this is really a question for
Thanks,
Morgan
The Full Text Search index is now created after replacing TologixDBIndexer.exe, and we also changed the logging level to INFO in the config file.
However, the Subject Navigator index is not being generated.
The indexing log says no tables/views were found, but vw_SubjectNavigatorSearch exists in the ISLGRebuildStaging database.
The indexing log:
2021-02-05 02:11:45,989 [1] INFO - Execute IndexJob
2021-02-05 02:11:46,320 [1] INFO - Retrieve data from database
2021-02-05 02:12:17,008 [1] WARN - No tables/views found
2021-02-05 02:12:17,141 [1] INFO - Done
Could you please check and confirm?
In this case, you may want to enable DEBUG logging to see whether any other messages are logged.
I have checked, and everything looks OK. Can we have a quick call to check?
Indexes for the following 2 modules are not being created:
1) Dispute Document
2) Subject Navigator
Can we have a call on Monday (8 February) between 11:00 AM and 6:00 PM IST? Please confirm.
The Subject Navigator indexing was also suffering from the timeout issue. I set it to unlimited. You can download the indexer update from the folder "2021-02-06".
I have no idea why you got the message that no tables/views were found. I'm also sending you my indexer config file in the above-mentioned folder.
Maybe you should look into speeding up the view, for example with indexes. I ran the indexer, and it took about 40 minutes to start receiving the data and about 20 minutes to index.
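Estimating the timeout from log timestamps, as done above, is easy to script. In this Python sketch the two lines are taken from the Subject Navigator log earlier in the thread: the gap between "Retrieve data from database" and "No tables/views found" is about 30.7 seconds, which is consistent with the ADO.NET default CommandTimeout of 30 seconds (a value of 0 means no limit). How the indexer update sets its timeout internally is not shown in the thread.

```python
from datetime import datetime, timedelta

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

def log_time(line):
    """Parse the leading '2021-02-05 02:11:46,320' timestamp of a log line."""
    return datetime.strptime(line[:23], LOG_TIME_FORMAT)

def elapsed(first_line, second_line):
    """Time between two log lines."""
    return log_time(second_line) - log_time(first_line)

start = "2021-02-05 02:11:46,320 [1] INFO - Retrieve data from database"
warn = "2021-02-05 02:12:17,008 [1] WARN - No tables/views found"
```

`elapsed(start, warn)` returns 30.688 seconds, just over the 30-second default.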
Can we have a call? We are available to discuss the SN and Dispute Document Library indexing.
The Subject Navigator index is generated now, but the Dispute index is not.
The following is the indexing log from our attempt to generate the Dispute index:
2021-02-08 03:41:06,132 [1] INFO - Execute IndexJob
2021-02-08 03:41:06,432 [1] INFO - Retrieve data from database
2021-02-08 04:24:44,463 [1] ERROR - Failure in retrieving data from the database
System.Data.OleDb.OleDbException (0x80040E14): Could not allocate space for object 'dbo.WORKFILE GROUP large record overflow storage: 149430086533120' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
at System.Data.OleDb.OleDbDataReader.ProcessResults(OleDbHResult hr)
at System.Data.OleDb.OleDbDataReader.NextResult()
at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
at System.Data.OleDb.OleDbCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, String srcTable)
at DBIndexer.SampleDataSource.RetrieveDataFromDB() in p:\contegra\contegra-tologix\DBIndexer\SampleDataSource.cs:line 585
2021-02-08 04:24:44,582 [1] INFO - Done
The Subject Navigator index is created, but when we search for any word, the web service returns the following error:
You can find the Subject Navigator index at the following path:
Server : (10.68.138.10)
Indexing Path : E:\DevContegraISLGRebuildStagingIndexes\SubjectNavigatorIndex
Error :
Unable to access index D:\Contegra Indexes\tologix-index-subject\ D:\Contegra Indexes\tologix-index-subject\index_r_1.ix file is truncated. Committed size=79508586 Actual size=58556416 (file: index_r.ix); No files retrieved in search.
Please look into this and provide a solution.
We copied the index from the server to a local system. Could you please check whether it was generated properly?
The index was generated at the following path on the server:
Indexing Path : E:\DevContegraISLGRebuildStagingIndexes\SubjectNavigatorIndex
and the current size is as it says later in the log, I guess after you indexed more data:
I guess your copying process failed.
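A truncated copy like this can be caught by comparing file sizes between the index folder and its copy before pointing the web service at it. The Python sketch below is a minimal check, assuming the copy keeps the same relative file names; the sizes used in the example are simply the committed and actual sizes from the error message above, used as illustrative values.

```python
from pathlib import Path

def folder_sizes(folder):
    """Map each file's path (relative to folder) to its size in bytes."""
    root = Path(folder)
    return {str(p.relative_to(root)): p.stat().st_size for p in root.rglob("*") if p.is_file()}

def size_mismatches(src_sizes, dst_sizes):
    """Return (name, source size, copy size) for files that differ; None means missing."""
    return [(name, size, dst_sizes.get(name))
            for name, size in sorted(src_sizes.items())
            if dst_sizes.get(name) != size]
```

Usage: `size_mismatches(folder_sizes(source_dir), folder_sizes(copy_dir))` should return an empty list for a complete copy.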
The Dispute Document view takes a very long time to execute and return the data, so we have optimized it using a temp table.
To use a temp table, we need to convert the view into a stored procedure.
If we specify the stored procedure in place of the view name in the following file in the TologixDBIndexer project, will it work?
File name: indexer-config-dispute-docs.json
We would replace the following:
Current : "IndexTablesViews": [ "VW_DisputeContegraSearch" ],
Replace : "IndexTablesViews": [ "SelectDisputeContegraSearch" ],
We will use the stored procedure in place of the view. The columns returned by the stored procedure are the same as the view's.
You can check this in the ISLGRebuildStaging database.
Please let us know.
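The proposed config change can be scripted so the same edit is applied consistently to each indexer config. This Python sketch touches only the two keys discussed in this thread ("IndexTablesViews" and "DocIdColumnName") and keeps everything else; whether the indexer accepts a stored procedure name in "IndexTablesViews" is not confirmed at this point in the thread (the team later reports it working once RowId was added).

```python
import json

def point_config_at(config, source_name, id_column="RowId"):
    """Return a copy of an indexer config that reads from another view or stored
    procedure and uses id_column as the document identifier."""
    updated = dict(config)
    updated["IndexTablesViews"] = [source_name]
    updated["DocIdColumnName"] = id_column
    return updated

def update_config_file(path, source_name, id_column="RowId"):
    """Rewrite a JSON config file such as indexer-config-dispute-docs.json in place."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(point_config_at(config, source_name, id_column), f, indent=2)
```

For example, `update_config_file("indexer-config-dispute-docs.json", "SelectDisputeContegraSearch")` would switch the dispute-docs index to the stored procedure keyed on RowId.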
Could you please try the following search in the Dispute Document Library:
Database : ISLGRebuildStaging on server (10.68.138.11)
Search Keyword: 9REN
Language: Spanish
We are not getting any results for the above search.
Please provide your feedback
We are using the following 2 files for Dispute Document indexing. We have attached them from our local system so you can get an idea.
Per your config, document identifier column for indexer is "ContentTypeDataMasterID". However, stored procedure "FE_MetafieldwithValueDynamic" provides IDs which are not unique in the results. For example, ContentTypeDataMasterID 12400 appears 25 times.
Whenever a row with the same ID appears, it overrides the previously indexed row with that ID. In your case, because of all the duplicates, after 89434 rows were indexed there are only 9556 documents in the generated index. Almost 90% of the data was overwritten.
Looks like you have the same problem with the VW_DisputeContegraSearch view.
We had the same discussion about this around April 17 last year. Please refer to the comments above from around that date. Back then it was said that you would add "RowId" as an identifier column. I see that FE_MetafieldwithValueDynamic has it, but VW_DisputeContegraSearch does not. After adding the column, update your config files ("DocIdColumnName": "RowId") and re-create the Dispute indexes.
Hope this helps.
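The figures above work out to (89434 - 9556) / 89434 ≈ 0.893, i.e. roughly 89% of rows overwritten. The same check can be run on the rows a view or stored procedure returns before indexing. In this Python sketch the ID list in the example is made up apart from the count of 25 for ContentTypeDataMasterID 12400; it simply predicts how many documents survive when rows sharing an ID overwrite one another.

```python
from collections import Counter

def overwrite_report(ids):
    """Predict what an index keyed on these IDs keeps.

    Rows sharing an ID overwrite one another, so one row per distinct ID survives.
    """
    counts = Counter(ids)
    rows, kept = len(ids), len(counts)
    worst_id, worst_count = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "rows": rows,
        "kept": kept,
        "lost_fraction": (rows - kept) / rows if rows else 0.0,
        "worst": (worst_id, worst_count),
    }
```

A unique identifier column such as RowId should give `kept == rows` and a lost fraction of 0.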
Because the previous view had problems indexing the large data set, we are using the stored procedure (SelectDisputeContegraSearch) in place of VW_DisputeContegraSearch.
Can we have a call today between 1:00 PM and 6:00 PM to resolve this issue?
We updated the Dispute indexer config file (indexer-config-disputes.json) to set "DocIdColumnName": "RowId", but it still does not work.
We need to have a call to resolve this issue. We are available between 1:00 PM and 6:00 PM IST.
I'll be available for a call within the next two hours but I still need your config files.
As discussed on the call, I have attached both indexer config files after updating RowId.
Database Name : ISLGRebuildStaging
For the disputes JSON, we are using this SP: FE_MetafieldwithValueDynamic
For the dispute-docs JSON, we are using this SP: SelectDisputeContegraSearch
Both SPs have the unique identifier RowId.
We checked after re-indexing, but it still does not work.
Please let us know your inputs.
We have created the new query SelectDisputeContegraSearch to replace the view VW_DisputeContegraSearch.
We are using this query for dispute-doc indexing.
If you index both queries, the indexing should be done within 20-25 minutes.
ERROR - Failure in retrieving data from the database
System.Data.OleDb.OleDbException (0x80040E14): Could not allocate space for object 'dbo.WORKFILE GROUP large record overflow storage: 141537287733248' in database 'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by deleting unneeded files, dropping objects in the filegroup, adding additional files to the filegroup, or setting autogrowth on for existing files in the filegroup.
at System.Data.OleDb.OleDbDataReader.ProcessResults(OleDbHResult hr)
at System.Data.OleDb.OleDbDataReader.NextResult()
at System.Data.OleDb.OleDbCommand.ExecuteReaderInternal(CommandBehavior behavior, String method)
at System.Data.OleDb.OleDbCommand.ExecuteReader(CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.FillInternal(DataSet dataset, DataTable[] datatables, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, Int32 startRecord, Int32 maxRecords, String srcTable, IDbCommand command, CommandBehavior behavior)
at System.Data.Common.DbDataAdapter.Fill(DataSet dataSet, String srcTable)
at DBIndexer.SampleDataSource.RetrieveDataFromDB() in p:\contegra\contegra-tologix\DBIndexer\SampleDataSource.cs:line 585
INFO - Done
DEBUG - Getting properties for db row 0 in table SelectDisputeContegraSearch
ERROR - Object reference not set to an instance of an object.
System.NullReferenceException: Object reference not set to an instance of an object.
at DBIndexer.SampleDataSource.GetNextDoc() in p:\contegra\contegra-tologix\DBIndexer\SampleDataSource.cs:line 156
INFO - Done
I'll try indexing staging DB.
We were able to generate the indexes. What should we do now?
We created the indexes in our local environment, but using the same ISLGRebuildStaging database.
Please try to create the indexes using the ISLGRebuildStaging database.
Search for "9REN" returns 11 results. However, when you add a language filter, it doesn't find anything. The problem appears to be "SelectDisputeContegraSearch", which doesn't provide the language: it's always empty, so it cannot be found. (The Language you see in the dispute results comes from the dispute-docs index, but the field has to be in the disputes index as well, since that's where we apply the filter.)
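One way to satisfy this is to copy the languages of each dispute's documents onto the dispute row itself, so the filter on the disputes index has something to match. In practice this would most likely be done in the SQL query; the Python sketch below only illustrates the data shape. DisputeId and Language come from the thread, while the Title column and the "; " separator for multiple languages are assumptions.

```python
def add_languages(dispute_rows, document_rows, sep="; "):
    """Copy the languages of each dispute's documents onto the dispute row.

    Returns new dicts; empty languages are ignored and the separator is an assumption.
    """
    languages = {}
    for doc in document_rows:
        if doc.get("Language"):
            languages.setdefault(doc["DisputeId"], set()).add(doc["Language"])
    enriched = []
    for dispute in dispute_rows:
        row = dict(dispute)
        row["Language"] = sep.join(sorted(languages.get(dispute["DisputeId"], set())))
        enriched.append(row)
    return enriched
```

How dtSearch matches a filter against a multi-valued field like "English; Spanish" would still need to be checked against the indexer's field handling.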
The above indexing result covers Dispute data only; we also need Dispute Document data.
We get the language from the Dispute Document data. One dispute entry is associated with multiple dispute documents.
We have changed the query logic of FE_MetafieldwithValueDynamic so that each value now has its own row, because using PIVOT in the query takes too long with this much data.
If you look at RowIds 211742, 211743 and 211744, the language data is there. So when a user searches for 9REN with the language Spanish, we need all of this data.
For Dispute indexing, the RowIds of the documents are 51167, 51168, 51169, 51170, 51171, 51172, 51173, 51174, 51175, 51176, 51177, 51178, 51179, 51180, 51181, 51182, 51183, 51184, 51185, 51186, 51187, 51188 and 51189,
and for Dispute Document indexing, the RowIds of the documents are 211742, 211743 and 211744.
We need all the Dispute and Dispute Document data when the user runs a combined search.
Please suggest.
We are available for a call to discuss.
On September 9 you wrote: "As discussed, You will get one column DisputeId from 1st SQL View and you need to pass DisputeId column value to 2nd SQL View and provide us JSON format result to bind data as per our model."
That's how it works:
Thanks for the update. We have changed the FE_MetafieldwithValueDynamic query back to its previous form, and we fetch data by ContenttypedataMasterId, so no change to the config file is needed now.
However, for the 2nd query, SelectDisputeContegraSearch, we need all results for each ContenttypedataMasterId.
For example, if the Claimant column data in the second query spans multiple rows, we need all of those rows in the model.
Currently, your API returns only one row.
Please let us know when we can have a call to discuss.
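The requirement above (all Claimant rows for one ID in a single model record) amounts to grouping rows by ID and collecting a list of values per column. Whether this belongs in the web service or in the SQL query is still open in the thread; this Python sketch only shows the grouping, using ContenttypedataMasterId and Claimant from the message above with made-up values.

```python
def group_rows(rows, id_column, value_columns):
    """Merge rows sharing an ID into one record holding a list per value column.

    Repeated values are kept once, in first-seen order; missing values are skipped.
    """
    grouped = {}
    for row in rows:
        record = grouped.setdefault(row[id_column], {col: [] for col in value_columns})
        for col in value_columns:
            value = row.get(col)
            if value is not None and value not in record[col]:
                record[col].append(value)
    return grouped
```

Each ID then maps to one record such as `{"Claimant": ["A", "B"]}` instead of only the first row.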
If #1 needs to be changed:
We resolved the Dispute Document issue. We added RowId to SelectDisputeContegra (the 2nd query) and changed the config file for disp-doc to set "DocIdColumnName": "RowId", and it seems to work now.
The other issue we found is that when we search for the keyword "Tribunal" in the Full Text Search module, your API takes 8 to 10 seconds just to respond. The total number of records found is about 6000.
Could you please improve that and provide an updated Search Controller?
You can use the ISLGRebuildStaging database.
(To test it on my own, I'd have to create the full index first.)
We display about 6000 records in total on the FTS page after converting the JSON into the model. The following is the JSON for this search.
Considering the huge results response, the search time is not that surprising, especially because we don't limit results in order to have proper sorting by field. Do you use the option to sort search results? By which field do you sort? If you can, please send me the complete search request json.
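The reason results cannot simply be limited is that a page sorted by a field (for example "SortField":"hits","SortOrder":"desc" in the earlier payload) is only known after every match has been examined. Examining everything does not require a full sort, though: a bounded selection keeps only the top (PageNum + 1) * PageSize items. This Python sketch illustrates that general technique with `heapq` (a swapped-in example, not how the SearchController or dtSearch actually sorts).

```python
import heapq

def page_of(results, key, page_num, page_size, descending=False):
    """Return one page of results ordered by key.

    Every result is still scanned, but heapq keeps only the first
    (page_num + 1) * page_size items instead of sorting the whole list.
    """
    needed = (page_num + 1) * page_size
    pick = heapq.nlargest if descending else heapq.nsmallest
    top = pick(needed, results, key=key)
    return top[page_num * page_size:needed]
```

For early pages of a 6000-result search this avoids sorting all results, but the cost of retrieving and scanning every match remains.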
You can use the following database to generate the indexes:
Database Name : ISLGRebuildStaging (10.68.138.11)
The document path on server 10.68.138.10 is:
Document Path : E:\ISLGRebuildStaging\wwwroot\Documents
By default, we sort by relevance. The criteria are as follows.
The following is the JSON file we pass when searching for the word "Tribunal".
I am the QC guy on the DEVIT side. I have raised one issue for FTS search:
Steps to reproduce:
1) Go to FTS module
2) Search the text "Tribunal AND Absence" with selecting "Boolean" search type
3) Click on Search button
Actual Result: The system displays files that contain either the word Tribunal or the word Absence.
Expected Result: The system should display only files that contain both Tribunal and Absence.
As per your comment on the "21138 - Search > Keyword not highlighted" issue, we deleted the noise words and re-indexed.
After re-indexing, when the user searches for the noise keyword "Like", the search works properly, but the word "Like" is not highlighted in the results.
@Darsh
1) Search for "tribunal" gives 6059 results, search for "tribunal AND absence" gives 2137 results. By the numbers, it looks fine. Can you give me more details about the context where you see this issue?
2) That still sounds like the index was generated with the stopwords list. How many results do you get searching for "like"? If it returns all documents and all paragraphs, it's still there as a stopword.
OK. Please provide the update on Monday.
Regarding Darsh's issue:
He said that when we search for "tribunal AND absence", 2137 results are found, but when the user clicks on any paragraph, only the word "tribunal" is highlighted. The expectation is that only paragraphs containing both "tribunal" and "absence" are found.
Isn't that in conflict with this issue where you want only those paragraphs where both are found?
I am not 100% sure regarding dtSearch rules.
If you prefer only paragraphs containing all keywords, then we have to modify how search works: the main search should then be in the paragraphs index, followed by getting the documents where matching paragraphs are found, not vice versa.
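The two orders of search described above give different answers when the keywords appear in different paragraphs of the same document. This Python sketch contrasts them using simple lower-cased whole-word matching (no stemming, no dtSearch); the documents D1 to D3 are made up. Document-first matching accepts D3, whose keywords are split across two paragraphs, while paragraph-first matching only keeps documents with at least one paragraph containing every keyword.

```python
import re

def words(text):
    """Lower-cased set of the words in a paragraph (no stemming)."""
    return set(re.findall(r"\w+", text.lower()))

def documents_with_all(paragraphs, keywords):
    """Document-first: a document matches if its paragraphs jointly contain every keyword."""
    wanted = {k.lower() for k in keywords}
    doc_words = {}
    for doc, _para_id, text in paragraphs:
        doc_words.setdefault(doc, set()).update(words(text))
    return [doc for doc, found in doc_words.items() if wanted <= found]

def paragraphs_with_all(paragraphs, keywords):
    """Paragraph-first: keep only paragraphs that contain every keyword."""
    wanted = {k.lower() for k in keywords}
    return [(doc, para_id) for doc, para_id, text in paragraphs if wanted <= words(text)]

def documents_from(matches):
    """Documents that own at least one matching paragraph, in first-seen order."""
    return list(dict.fromkeys(doc for doc, _para_id in matches))
```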
Here is a video explaining how the highlights should work depending on whether the user is performing a search with "All Words" or "Boolean" when "and" is included as a search term. If there is any further confusion on these requirements, please refer to how the search results are produced in the legacy application:
Thanks,
Morgan
We have removed the noise words from noise.dat in the TologixDBIndexer project and re-indexed.
Is there any other noise word list we need to clear?
After changing the stopwords list you have to delete the old index and re-index because the noise words are copied to the index folder when index is created. In an existing index you can see used stopwords in index_n.ix file.
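Editing the noise word list can be scripted as below. This Python sketch assumes noise.dat is a plain list of whitespace-separated words (the exact file format is not described in the thread) and matches case-insensitively. As noted above, the edited list only takes effect in a newly created index, so the old index folder still has to be deleted and the collection fully re-indexed.

```python
def remove_noise_words(noise_text, searchable):
    """Return noise-word file text without the words that must stay searchable.

    Assumes a plain list of whitespace-separated words; the remaining words
    are written one per line.
    """
    drop = {word.lower() for word in searchable}
    kept = [word for word in noise_text.split() if word.lower() not in drop]
    return "\n".join(kept) + "\n" if kept else ""
```

Usage: read noise.dat, pass its text and the words to keep searchable (for example ["like"]), and write the result back before re-indexing.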
We did the same thing. The latest indexes for all modules are at the following path on the server. Could you please check whether they are OK?
Server : 10.68.138.10
Indexing Path : E:\DevContegraISLGRebuildStagingIndexes
To work around the performance issue, I had to create an additional index as part of FTS indexing. On my system, a search for "tribunal" is now about 10x faster.
To specify the path of this new index, use the "FieldsIndexDir" property of the FTS indexer config file. I sent you a sample config file as well.
The SearchController has been updated as well. You need to add "FullTextFieldsIndex", set to the path of the new index, to application.conf.
Let me know how this worked for you.
We are making the changes as per your instructions above. However, in the Web Search project, after copying the SearchController file, we get the following error:
Please suggest.
The indexer is also not creating any indexes and gives the following error after copying the TologixDBIndexer.exe file:
System.InvalidOperationException: Nullable object must have a value.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Nullable`1.get_Value()
at DBIndexer.CmdLineIndexer.runMainIndexer(DocFieldsDataSource fds, ParagraphDataSource pds) in p:\contegra\contegra-tologix\DBIndexer\CmdLineIndexer.cs:line 199
at DBIndexer.CmdLineIndexer.run() in p:\contegra\contegra-tologix\DBIndexer\CmdLineIndexer.cs:line 123
at DBIndexer.MainForm.Main(String[] args) in p:\contegra\contegra-tologix\DBIndexer\MainForm.cs:line 181
I'm checking indexer and will be back to you in a minute.
We are getting an error when we try to generate the FTS index.
The Subject Navigator and Dispute Document indexes were created successfully.
Below are our local FTS indexer config file and the FTS indexing log file.
FTS indexing is stuck after RowId 2. It has been stuck for the last 15 minutes. I have attached the log file.
To troubleshoot, try enabling DEBUG logging level, re-start indexing and see what's logged.
I have started the indexing again, but it is still stuck after RowId 2. How do I enable DEBUG mode?
We are currently trying to index in our local environment, but with the same database (ISLGRebuildStaging).
After updating the PDFHighlighter URL, FTS indexing is still stuck after RowId 2. Please suggest how we can resolve this issue.
<root>
  <level value="DEBUG" />
  <appender-ref ref="ConsoleAppender" />
  <appender-ref ref="FileAppender" />
</root>
BTW, is there a problem with the VPN currently? I wanted to try re-indexing on the 10.68.138.10 server, but I cannot connect even though I'm on the VPN. Any idea?
Also, I am not able to find TologixDBIndexer.exereplace in the TologixDBIndexer project. Could you please guide us on how to enable debug mode?
Can we connect quickly on Skype to resolve this FTS indexing issue in our local environment?
I'm trying to run indexing again; I'm currently waiting for your stored procedure to return results and start indexing. I'll get back to you soon.
Please send me your config file. I'd like to check it and try it.
I don't think a Skype call now would be productive, considering we have to wait almost 30 minutes for the stored procedure to return results. Can you reproduce the issue by indexing the dev database? Or maybe you can isolate the issue by checking what row #2 is and creating a proc that returns a data set with only that row?
I have attached my FTS config file.
I am going to pull the ISLGRebuildStaging database from the server (10.68.138.11) to my local system and try re-indexing again. If we still get the same issue, I will let you know.
Let me know how indexing the staging database works for you.
The indexing issue is still there. We downloaded the whole ISLGRebuildStaging database from the server and hosted it in our local environment, but indexing still gets stuck after row 2.
Did we miss anything in the TologixDBIndexer project?
Can we have a call and sort out this issue? It is critical for us to test the FTS module with indexing.
I copied your DBIndexer3 project from the server and replaced my indexer config, but now we are getting the following error.
The old indexer works fine for FTS, so there might be something wrong with the new indexer exe you provided on 1st March.
Please look into this and let us know how we can resolve this issue and generate indexes for the FTS module.
I'll add more logging to the indexer so that we can figure out exactly where it stops for you.
We have updated the TologixDBIndexer.exe file and tried to generate the FTS indexes, and the following log has started.
Please confirm whether it is OK.
I believe this is why it doesn't work for you. Check the path for the fields index in your config file; I guess you didn't use double backslashes in the folder path.
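A quick illustration of why single backslashes break the path. It assumes the FTS indexer config is JSON (the "CacheText": false option mentioned later in this thread suggests so); the paths are made up, and only the "FieldsIndexDir" key comes from this thread. In JSON a backslash starts an escape sequence, and some sequences such as \f (form feed) and \n (newline) are valid, so the file parses without error but the path is silently corrupted.

```python
import json


def read_fields_index_dir(config_text: str) -> str:
    """Return the FieldsIndexDir value from a JSON config string."""
    return json.loads(config_text)["FieldsIndexDir"]


def has_control_chars(path: str) -> bool:
    """True if the path contains control characters such as form feed or newline."""
    return any(ord(ch) < 32 for ch in path)


# Hypothetical configs (raw strings, so the backslashes reach the JSON parser as-is).
# Single backslashes: "\f" and "\n" are valid JSON escapes, so this parses
# but yields a path containing a form feed and a newline.
BAD_CONFIG = r'{"FieldsIndexDir": "E:\fts\new"}'
# Doubled backslashes decode to literal backslashes.
GOOD_CONFIG = r'{"FieldsIndexDir": "E:\\fts\\new"}'

if __name__ == "__main__":
    for name, text in (("bad", BAD_CONFIG), ("good", GOOD_CONFIG)):
        path = read_fields_index_dir(text)
        print(name, repr(path), "corrupted" if has_control_chars(path) else "ok")
```

Other single-backslash sequences, such as \I, are not valid JSON escapes and make the parser fail outright, so the symptom can be either a parse error or a quietly wrong directory.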
We will let you know if we still face an issue.
The FTS index is created now, but we haven't seen any performance improvement. When we search for the word "Tribunal", it still takes a long time to get a response from the API.
Here are the values:
time 1 : 5308
time 2 : 1895
time 3 : 6157
time 4 : 10232
time 5 : 3566
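To put these five timings on one scale, here is a small summary. The thread doesn't state units. The reply below speaks of 2s and 1.5s, so these are presumably milliseconds, but that is an assumption.

```python
from statistics import mean, median

# Response times reported above for the "Tribunal" search (units assumed to be ms).
times_ms = [5308, 1895, 6157, 10232, 3566]

summary = {
    "mean": mean(times_ms),
    "median": median(times_ms),
    "min": min(times_ms),
    "max": max(times_ms),
}

if __name__ == "__main__":
    for key, value in summary.items():
        print(f"{key}: {value} ms")
```

That works out to a mean of about 5.4s and a worst case of about 10.2s, against the roughly 2s reported in the reply below.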
I'm getting much lower values: everything completes in 2s on the first run and 1.5s on the next.
I hosted the published code in a different IIS instance and it seems fine. Let us know once you have completed the FTS changes as discussed on Monday (1st March).
Further to our discussions on the searches in Full Text Search and Subject Navigator during my meeting with DevIT and Industrial earlier today and to give
Morgan
3. I can investigate this.
4. I know why. After it was decided to return paragraphs containing any keyword in the query, I changed the search type of the paragraph query to "any keyword". (The main search for the documents still runs as a boolean search.) This can be fixed, but if we need to change the search to match all keywords in the paragraph, the complete FTS search needs to be reworked, so there is no point in fixing it.
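The difference described in point 4 can be shown on plain strings. This is not dtSearch's matcher, which also handles stemming, phrases and noise words. It only shows why an "any keyword" paragraph query returns paragraphs that an "all keywords" match would exclude. The sample paragraphs are invented.

```python
import re
from typing import Iterable, List


def words(text: str) -> set:
    """Lower-cased word tokens of a text."""
    return set(re.findall(r"\w+", text.lower()))


def match_any(paragraphs: Iterable[str], keywords: List[str]) -> List[str]:
    """Paragraphs containing at least one keyword (the current paragraph query)."""
    kw = {k.lower() for k in keywords}
    return [p for p in paragraphs if words(p) & kw]


def match_all(paragraphs: Iterable[str], keywords: List[str]) -> List[str]:
    """Paragraphs containing every keyword (the stricter alternative)."""
    kw = {k.lower() for k in keywords}
    return [p for p in paragraphs if kw <= words(p)]


if __name__ == "__main__":
    paras = [
        "The tribunal considered the nationality of the claimant.",
        "The tribunal declined jurisdiction.",
        "Nationality was not disputed.",
    ]
    query = ["tribunal", "nationality"]
    print(match_any(paras, query))  # all three paragraphs
    print(match_all(paras, query))  # only the first paragraph
```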
5. I will investigate this.
6. It's a tricky one. Indexes are huge because in the created index we cache the complete document text and the original file. This is needed by dtSearch to do highlighting faster, although I think it's really needed only for PDF highlighting.
It's possible to make this PDF highlighting work without storing the original file content in index, but at the cost of the highlighting performance.
Another issue with removing the original files from the index is that we'd have to create a separate index just for PDFs, using the full document path. This is because the current organization of the data allows the same file to appear in multiple indexed documents. I hit this issue at the beginning of the FTS implementation and had to make changes to accommodate the data in the database.
In short, we can bring the index size down by creating one more index.
I just had a call with
In the interim,
Thanks,
Morgan
Now, to troubleshoot this issue, I need from you:
Radomir
{ "searchRequest": "tribunal", "FilterStatement": { "type": "boolean", "Operator": "and", "clauses": [ { "type": "match", "field": "ispdfonly", "values": [ "true" ] } ] }, "SearchType": "3", "Stemming": false, "Synonyms": false, "Fuzzy": false, "Fuzziness": "1", "SortField": "hits", "SortOrder": "desc", "PageNum": 0, "PageSize": "20" }
Did you have PDF documents on your system when you generated the index? Were there any errors in your FTS indexing log?
Have you updated the FTS indexer?
We checked with the words "tribunal" and "nationality" and it works fine. We only get documents that have pinpoint references.
We are looking into the Subject Navigator SQL view and will check again (point 2).
Let us know once you have completed the rest of the points.
If you see other issues, please provide details as I asked you above so that I can cross-reference data and index created.
Can we have a call about the Subject Navigator results?
What do you need from my side? If I search for "good faith", the results are not generated properly.
Please let us know what I need to provide to you.
So, as asked above, I need from you:
Please see the following screenshot. If we search for "good faith", we need the following row.
You can find this data in the ISLGRebuildStaging database. We have just updated the SQL view.
Id : 24338
BranchID : 17559
ParentId : 5579
Branch Name : See "Good faith"
HierarchicleParentIds : 1,5579,17559,4202,4661,5207,5579,5580,5583,5585,5597,5606,5657,5661,5686,5688,5693,5695,5708,5724,7523,7826,8002,8142,8258,8594,9588,9684,10431,11241,11258,11685,12649,12959,13084,13181,13666,13693,14749,15532,15562,16154,17381,17559,19044,19092,20332
SelectedNodes : 1,5579,17559
I also noticed that there is no "good faith" branch under B, so why does your JSON response include the B branch?
There are no details associated with the words "good faith" under the B branch.
I just indexed SubjectNav after you updated the proc and will check what you sent me earlier.
The Subject Navigator has a parent-child branch structure. The words "good faith" appear under branch A > Abuse of process > "see good faith", but your response does not include the A branch nodes.
It is very complicated to provide you the results for all branches, but as a first step we need these nodes in the response.
Can we have a quick call so I can provide you the details?
However, I see an issue with this search because nodes in the SelectedNodes are not unique and values repeat a lot. I'll change that.
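Deduplicating the node lists is straightforward. Here is a minimal sketch, assuming the fields arrive as comma-separated strings like the SelectedNodes and HierarchicleParentIds values quoted above. The function name is hypothetical. It keeps the order of first appearance, so a tree can still be built top-down.

```python
from typing import List


def unique_node_ids(*fields: str) -> List[int]:
    """Merge comma-separated node ID fields, dropping repeated IDs.

    Keeps the order of first appearance; blank entries are ignored.
    """
    seen = set()
    result = []
    for field in fields:
        for part in field.split(","):
            part = part.strip()
            if not part:
                continue
            node_id = int(part)
            if node_id not in seen:
                seen.add(node_id)
                result.append(node_id)
    return result


if __name__ == "__main__":
    selected = "1,5579,17559"
    parents = "1,5579,17559,4202,4661,5207,5579,5580"
    print(unique_node_ids(selected, parents))
```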
Another issue I see with this search is that, when you pull all results, the search is slow because all results are being highlighted. I think we should also use progressive highlighting for this. I'll add some options for this.
We are not using paging in the SN module; we need to get all the data.
1) My understanding is that for the Subjects Navigator you need all results immediately. I believe you were not getting some nodes because you were taking the first page only and the nodes are not in order.
The "subject-nav" search method has been modified to return all results, but without highlighted fields. That should allow you to build the complete page with all the content.
One more new thing is that in the response you can find "timeLog" array with a log of time spent in different stages. For example, executing Subjects Navigator search for "good faith" on my system takes 620ms:
I would suggest logging "timeLog" to the browser console. Then you can easily review this when testing.
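To check these timings outside the browser as well, for example in a test script, the entries can be parsed. This sketch assumes every entry has the "label: milliseconds" shape produced by the disputes-details controller code later in this thread (e.g. "search: ...", "total: ..."). Entries in any other shape are skipped. The sample values are invented.

```python
from typing import Dict, List


def parse_time_log(entries: List[str]) -> Dict[str, int]:
    """Map each timeLog label to its elapsed milliseconds.

    Splits on the last ": " so labels may themselves contain colons.
    """
    timings = {}
    for entry in entries:
        label, sep, value = entry.rpartition(": ")
        if sep and value.strip().isdigit():
            timings[label] = int(value)
    return timings


if __name__ == "__main__":
    sample = ["search: 120", "collect nodes for disputeId (42): 300", "total: 480"]
    print(parse_time_log(sample))
```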
2) Next, to get highlighted fields for the Subjects Navigator search, you need to use "highlight-subject-nav" service method. The request payload is the same as for the "subject-nav", with addition of "FieldFilterName" and "FieldFilterValues" fields that limit results.
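The page could build those requests in batches as sketched below. The sketch assumes the base request is the same JSON object already sent to "subject-nav" and only adds the two fields named above. The batch size of 50 is an arbitrary illustration, and the HTTP call itself is left out.

```python
import json
from typing import Dict, Iterator, List


def highlight_payloads(base_request: Dict, visible_ids: List[int],
                       batch_size: int = 50) -> Iterator[Dict]:
    """Yield one "highlight-subject-nav" payload per batch of result ids."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(visible_ids), batch_size):
        payload = dict(base_request)  # shallow copy; the base request is not modified
        payload["FieldFilterName"] = "id"
        payload["FieldFilterValues"] = visible_ids[start:start + batch_size]
        yield payload


if __name__ == "__main__":
    base = {"searchRequest": "good faith", "SearchType": "AllWords",
            "Stemming": False, "Synonyms": False, "Fuzzy": False, "Fuzziness": "2"}
    for p in highlight_payloads(base, [18379, 5602, 3082], batch_size=2):
        print(json.dumps(p))
```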
For example, after adding the results (from "subject-nav") to the page, you could collect the "id" field values of all results visible in the viewport, and possibly one more page, and send a request to "highlight-subject-nav":
{ "searchRequest": "good faith", "SearchType": "AllWords", "Stemming": false, "Synonyms": false, "Fuzzy": false, "Fuzziness": "2", "FieldFilterName": "id", "FieldFilterValues": [18379, 5602, 3082] }
The following bugs were found by the Industrial team in the Subject Navigator module. Could you please check and provide an update?
Bug No. 22141: [Staging] Any words > Search doesn't disregard words such as "and" or "or"
Steps to reproduce:
Result: The results highlight the word "and" but not "abuse" or "ego"
Expected: Words such as "and" or "or" will be ignored in the any words search type, and "abuse" or "ego" would be found as matches
Bug No. 22160: [Staging] Stemming > Not working
Steps to reproduce:
Result: The results for "likelihood" don't appear when searching for "like"
Expected: If stemming is selected, I should see results for "likelihood" when searching for "like"
Bug No. 22162: [Staging] Synonyms > Not working
Steps to reproduce:
Result: No synonyms appear for that word; only "bias" is shown
Expected: Highlighted results for synonyms will be shown
However, related to the change we made in the code during our Skype call, I'd suggest you review on your side whether you need nodes from both "SelectedNodes" and "hierarchicalparentids". It looks like the latter pulls in many more nodes.
Bug No. 22141 - That's how dtSearch works. You put a keyword into the query and asked for any/all words. If you don't want it, don't put it in the query box. Or maybe you can add it as a stopword.
Bug No. 22160 - To my knowledge, "like" is not a stem of "likelihood", at least not as a default stem in search engines. Where did you get this example? Does it work like that in the legacy application? If it does, you probably have a custom stemming rules file (stemming.dat), so please send it to me for review and inclusion in the project.
Bug No. 22162 - Similar to the previous one: do you use thesaur.xml in the legacy application? Or maybe WordNet synonyms? If you do, you should be able to find these on the legacy server. Check https://support.dtsearch.com/dts0190.htm for info on which folders these might be in.
Please see above comment of
In testing the Subject Navigator search, I am using the description of the search types from the legacy app and comparing results to the legacy app.
"An "any words" search request consists of an unstructured natural language or "plain English" query. In a natural language search request, words such as AND and OR are disregarded. Use quotation marks to indicate a phrase, + (plus) to indicate a word that must be present, and - (minus) to indicate a word that must not be present."
and when I search "abuse and ego" on the legacy application, I get results for "abuse" and "ego" and not the word "and". This is not occurring on the current application, where I am only seeing results for the word "and" and no results for either "abuse" or "ego".
Please let me know if any of the tests I've performed above aren't accurate or should be modified, however, going by the results presented in both applications there do seem to be some issues related to the search options.
Thanks,
Naomi
But, from your description of the legacy behavior, looks like stopwords are used there.
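For comparison, the legacy "any words" rules quoted above can be written as a small parser: quotation marks for phrases, + for required words, - for excluded words, and AND/OR disregarded. This only illustrates those rules. It is not how dtSearch parses queries. In dtSearch, the equivalent effect for AND/OR would come from the stopword (noise word) list suggested above. The names below are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Words the legacy help text says are disregarded in "any words" searches.
STOPWORDS = {"and", "or"}

# Optional +/- prefix, then either a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    optional: List[str] = field(default_factory=list)
    required: List[str] = field(default_factory=list)
    excluded: List[str] = field(default_factory=list)


def parse_any_words(query: str) -> ParsedQuery:
    """Split an "any words" query into optional, required and excluded terms."""
    parsed = ParsedQuery()
    for prefix, phrase, word in TOKEN.findall(query):
        term = (phrase if phrase else word).strip().lower()
        if not term:
            continue  # empty quotes
        if not phrase and not prefix and term in STOPWORDS:
            continue  # a bare AND / OR is disregarded
        if prefix == "+":
            parsed.required.append(term)
        elif prefix == "-":
            parsed.excluded.append(term)
        else:
            parsed.optional.append(term)
    return parsed


if __name__ == "__main__":
    print(parse_any_words("abuse and ego"))
    print(parse_any_words('+"good faith" -bias OR estoppel'))
```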
The indexer should fix the FTS issue where keywords were matched in the document filename when searching for paragraphs. In that case the results showed all paragraphs of the document, although there were no text matches. You need to rebuild the FTS index.
Update the FTS indexer config with option "CacheText": false. That will prevent caching of the original documents in the index and significantly lower the index size. During the next week I'll prepare an update that approaches PDF highlighting in a more lightweight way. As you don't have PDF highlighting integrated for now, it doesn't affect you.
Next, the search service has been updated to return all results and nodes for the Subjects Navigator search. However, I'd recommend checking again whether you need all nodes from the "SelectedNodes" and "hierarchicalparentids" fields. For example, a search for "good faith" has 734 matches and returns more than 8500 nodes. I also don't understand why node "A", which is supposed to be at the top level, references other nodes:
On my system this search takes about 3.5 seconds to execute and I cannot make it faster as we're fetching thousands of dtSearch documents. Running the search and getting these documents is about 0.2s, but getting the referenced nodes adds 3s more:
The response payload from the search service is more than 17 MB in size. That's a huge response which will take extra time to process on your side as well.
The update also addresses highlighting for boolean expressions for FTS, but I'd recommend also building the new FTS index using the updated indexer before testing this.
P.S. Tomorrow (Monday) I'll be off.
As per the above comment for the FTS module,
As per my suggestion, we need to change the logic to render only the parent and child branches matching the keywords, rather than all branches.
If you're sure you want to do this, in the SearchController line 249 replace with:
Thanks,
Morgan
Please confirm that we will render only the parent and child branches matching the keywords in the SN module.
Please confirm.
Once Morgan confirms that we only need the matching branches, we need to change the logic so that only the SelectedNodes results are returned, with the words highlighted.
As we discussed, my team is fully occupied with other pending work on the application that we need to complete before 25th March, so we cannot change anything in the current build now. Please make it so that we get only the SelectedNodes results with the words highlighted.
This means that, using each SelectedNodes value from the results, we need all of its BranchId results in the response, with the words highlighted.
For example, if you search for "abuse of process", take the SelectedNodes IDs of all matching rows and use them to return the results for those BranchIds.
Say the SelectedNodes IDs are (1,5,8,11,15,16); then we need the BranchId result for each of those IDs. This means we render only the parent and child branches matching the keywords, with highlighting enabled.
But first of all we need confirmation from Morgan, so please wait for Morgan's reply.
I'm not following what you're requesting me to approve. Please provide me with a concrete example of how this affects the display of the results.
Also, it sounds like we're pushing UI work onto
Thanks,
Morgan
We would prefer to take call with
Let's schedule a call tomorrow, 10th March, at 8:15 AM Vancouver time, so please coordinate with
Thanks,
Morgan
I'll have a new disputes search method for you later tonight or tomorrow morning.
You need to send it the same request (with the query and filters) as to the "disputes", extended with the additional filtering fields:
"FieldFilterName": "disputeid", "FieldFilterValues": [12877]
The Subject Navigator is OK now. However, for the Dispute Document module, if we filter by English language only, it takes a long time to respond. Also, as discussed, we don't want the words highlighted in the Dispute Document module either.
The following is my search request when I filter by English language only.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["234"]}]},"SortField":"FullCitationText","SortOrder":"asc"}
Please look into this and let us know.
Also, when we send the following request, we don't get any results.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["235"]},{"type":"match","field":"Field_109","values":["377"]}]},"SortField":"FullCitationText","SortOrder":"asc"}
We have implemented the new method "disputes-details". However, I see that you returned results from the first stored proc.
We need results from the second stored proc, SelectDisputeContegraSearch, by passing the Dispute Id.
As for the dispute search performance, I guess it's because you're searching without a keyword, so you're getting a long list of results. (I believe highlighting is already off.) Can you please send me the timeLog from the response? It should show the actual time and number of records collected.
Please don't change anything. You don't need to create a separate index; using the following code in your SearchController for the disputes-details method is fine. We are passing the same request (with the query and filters) as to "disputes", extended with the additional filtering fields:
[HttpPost("disputes-details")]
public IActionResult SearchDisputesDetails([FromBody] SearchModel sm)
{
    // Overall timer for the whole request.
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    // 1) Run the query and filters against the disputes index only.
    List<string> indexes = new List<string>() { Settings.Tologix.DisputesIndex };
    ApiError err = Search(sm, indexes, false);
    if (err != null)
    {
        return new ObjectResult(err);
    }
    var searchTime = stopwatch.ElapsedMilliseconds;

    // Highlighting is switched off for the Dispute Document module.
    bool highlight = false; // sm.SearchRequest != "xlastword" && !string.IsNullOrWhiteSpace(sm.SearchRequest);
    var r = CreateResponse(sm, false, indexes, highlight, false);
    r.TimeLog.Insert(0, "search: " + searchTime);

    Stopwatch stopwatch2 = new Stopwatch();
    stopwatch2.Start();

    // 2) Get the distinct DisputeId values in the results (at most 10000 are requested).
    WordListBuilder wordListBuilder = new WordListBuilder();
    wordListBuilder.OpenIndex(Settings.Tologix.DisputesIndex, indexCache);
    wordListBuilder.SetFilter(sm.ResultsAsFilter);
    int values = wordListBuilder.ListFieldValues("DisputeId", "*", 10000);
    log.LogDebug("Found " + values + " disputes (wordListBuilder.Count = " + wordListBuilder.Count + ")");
    List<string> disputeIds = new List<string>();
    for (int i = 0; i < wordListBuilder.Count; ++i)
    {
        string word = wordListBuilder.GetNthWord(i);
        disputeIds.Add(word);
        // log.LogDebug("- " + word + " " + wordListBuilder.GetNthWordDocCount(i));
    }

    // A DisputeId value may hold several comma-separated IDs; split them so each
    // one can be matched against the ContentTypeDataMasterId field below.
    List<string> disputewithcontentypedatamaster = new List<string>();
    for (int j = 0; j < disputeIds.Count; j++)
    {
        disputewithcontentypedatamaster.AddRange(disputeIds[j].Split(','));
    }

    // 3) Get all documents for the found disputes from the dispute documents index.
    if (indexes != null && disputewithcontentypedatamaster.Count > 0)
    {
        var disputesResults = new List<ResultDocument>();
        FindByField("ContentTypeDataMasterId", disputewithcontentypedatamaster, new List<string>() { Settings.Tologix.DisputeDocsIndex },
            disputeRes =>
            {
                for (int i = 0; i < disputeRes.Count; ++i)
                {
                    disputeRes.GetNthDoc(i);
                    disputesResults.Add(createResultDocument(disputeRes.CurrentItem));
                }
            });
        r.Results = disputesResults;
    }
    r.TimeLog.Add("collect nodes for disputeId (" + r.Results.Count + "): " + stopwatch2.ElapsedMilliseconds);
    stopwatch.Stop();
    r.TimeLog.Add("total: " + stopwatch.ElapsedMilliseconds);
    return Ok(r);
}
And I think it works OK. But when we search by language only, it takes a long time and highlighting is still on.
Can we have a short call so we are both on the same page?
If we search by English language only, we pass the following JSON.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["234"]}]}}
Postman reaches its maximum timeout when we pass the above search request.
Sorting also doesn't work after getting results in the Dispute Document module. We pass the sort field and sort order in the JSON request, but the results are not sorted.
If I want to sort by FullCitation in descending order, we pass the following JSON request.
Sort Field : FullCitation
Sort Order : desc
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_109","values":["403"]}]},"SortField":"FullCitationText","SortOrder":"desc"}
Have you gone through the Dispute & Document module queries above? Also, could you let us know whether the FTS module is done from your side?
We are planning to deploy the WebAPI and indexer on the server early next week.
As for the FTS, I'm not aware of any open issues, so I consider it complete from my side.
I have attached the updated SearchController and both JSON files for the Dispute Document module.
I'll try to address the issues in order you reported them:
1) Response taking too long for filter only search:
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["234"]}]},"SortField":"FullCitationText","SortOrder":"asc"}
You could increase the timeout in Postman; on my system the request takes about 78 seconds. However, the problem with this query is that it gives far too many results: 231,956 records are returned for the 5,783 matches initially found! The response is over 200MB!
I think it doesn't make sense to return all results. With a few different searches, someone could scrape your whole database. I think we should put a hard limit of, say, 1000 (or 10K) items and never return more than that. It's better to give the user an error asking them to refine the query.
We need only the first DisputeId value, which we pass to the second query to get the results. Suppose the DisputeIds are (12877,1500,1501,1502); you can always take the first value of the DisputeId column (12877) and pass it to the second query.
This is because we first need to display only the dispute node, and then on click we call our second method (disputes-details) to get all the data.
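Taking only the first value is a small change per DisputeId value. The controller code above currently adds every comma-separated value, and this sketch shows the first-value-only variant. It assumes the field holds comma-separated numeric IDs such as "12877,1500,1501,1502". The helper names are hypothetical, and blank or non-numeric values return None.

```python
from typing import Iterable, List, Optional


def first_dispute_id(value: str) -> Optional[int]:
    """Return the first ID in a comma-separated DisputeId value, or None."""
    first = value.split(",", 1)[0].strip()
    return int(first) if first.isdigit() else None


def first_ids(values: Iterable[str]) -> List[int]:
    """First ID of each DisputeId value, without repeats, in order of appearance."""
    seen = set()
    result = []
    for value in values:
        dispute_id = first_dispute_id(value)
        if dispute_id is not None and dispute_id not in seen:
            seen.add(dispute_id)
            result.append(dispute_id)
    return result


if __name__ == "__main__":
    print(first_dispute_id("12877,1500,1501,1502"))  # 12877
```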
I am available for a call to discuss.
I'm available for a call atm. Will call you in a few minutes.
As mentioned in the call, sorting is now broken because the sort is applied to the first dtSearch search call, when we collect the "disputeId" values. In the rest of the method we collect the dtSearch documents having those disputeIds. However, because we're hitting the dtSearch query size limit (70K, since we need to enumerate all the different disputeIds), fetching these results runs in batches, so the sorting cannot be applied in dtSearch.
I think I saw that you have only a couple of different fields for which you use sorting, both referenced with SortField. Is that correct? In that case I can implement sorting in the controller, after fetching all the results. I cannot finish it today, but hopefully you will have it tomorrow afternoon.
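The approach described above is to split the ID list into queries that fit under the size limit, merge the batches, and then sort in the controller. A minimal sketch follows. It assumes IDs are joined with " or " inside each query. The 70K figure comes from this thread, but the exact query text the controller builds is not shown, so callers should leave headroom for the field name and parentheses. The names are hypothetical.

```python
from typing import Dict, List


def batch_ids(ids: List[str], max_chars: int, sep: str = " or ") -> List[List[str]]:
    """Split ids into batches whose joined text (sep.join(batch)) fits in max_chars.

    An id longer than max_chars on its own still gets a single-id batch.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    length = 0
    for id_ in ids:
        added = len(id_) if not current else len(sep) + len(id_)
        if current and length + added > max_chars:
            batches.append(current)
            current, length = [], 0
            added = len(id_)
        current.append(id_)
        length += added
    if current:
        batches.append(current)
    return batches


def sort_results(results: List[Dict[str, str]], field: str,
                 order: str = "asc") -> List[Dict[str, str]]:
    """Sort merged results by one field, case-insensitively.

    Missing or empty values sort as the empty string (first in "asc").
    """
    return sorted(results, key=lambda doc: (doc.get(field) or "").lower(),
                  reverse=order.lower() == "desc")


if __name__ == "__main__":
    print(batch_ids(["1", "22", "333"], max_chars=8))  # [['1', '22'], ['333']]
    docs = [{"FullCitation": "b"}, {"FullCitation": "A"}, {}]
    print(sort_results(docs, "FullCitation", "desc"))
```

Sorting after the merge is what makes the order independent of how the IDs were split into batches.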
Also, let us know once you have completed sorting in the controller, and send us an update.
I have replaced the SearchController and checked by passing only the English language filter, but it is still taking a long time to get a response.
What is your suggestion? Will it work?
1) Added sorting by field after collecting all results in the Disputes search. (BTW, in your sorting example, the field should have been "FullCitation" instead of "FullCitationText".)
2) Added hard-stop limit for the number of results returned for the Disputes and Subjects Navigator search. It's optional but, if you want to enable it, add "ResultListStopCount" to the Tologix section:
{ "SearchSettings": { "Tologix": { "ResultListStopCount": 10000, ...
But I am still confused about the hard-stop limit for the number of results. Could you explain it in more detail?
What does ResultListStopCount mean?
When this limit is reached, the search service will stop collecting results, it will return what was collected until that point, and will set "HardStop" flag in the response. You could use this flag in your app to tell user that the search is too broad and to refine it.
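This sketch shows the hard-stop behaviour as described above: collect until the limit, return what was collected, and raise a flag so the app can ask the user to refine the search. It models the described behaviour and is not the service's code. One design choice is an assumption: here the flag is set only when more results existed beyond the limit, and the service may instead set it as soon as the count reaches the limit. The names are hypothetical, and "HardStop" is the response flag named in this thread.

```python
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def collect_results(results: Iterable[T], stop_count: Optional[int]) -> Tuple[List[T], bool]:
    """Collect results until stop_count is reached.

    Returns (collected, hard_stop). stop_count=None means no limit. The
    iterable is consumed lazily, so an oversized result set is never fully read.
    """
    collected: List[T] = []
    for item in results:
        if stop_count is not None and len(collected) >= stop_count:
            return collected, True
        collected.append(item)
    return collected, False


def user_message(response: dict) -> Optional[str]:
    """Client side: turn the HardStop flag into a prompt to refine the search."""
    if response.get("HardStop"):
        return "Your search returned too many results. Please refine it."
    return None
```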
For the FTS module, when we search for the word "nationality", we get paragraph number 28 of the following PDF file in the results.
But when we click on number 28, we are not able to find the text in the PDF file.
We are facing this issue with other files as well.
I have attached one of the PDFs and a video file. Please check and confirm.
Note that the page you sent me is not page 28. This is a PDF document so the results are pages, not paragraphs. The page 28 contains "nationality":
In many places we found that clicking the page number doesn't populate the result. Has this issue been resolved in general, or only for this specific PDF file?
In the FTS module, when we search for the word "Like", some documents in the results display the same paragraph multiple times.
Please check the attached video for more information.
Also, please find the attached PDF and HTML files for your reference.
Today we deployed the latest version on the server (10.68.138.10), but we are not able to find the paragraph numbers in the FTS module.
Could you please look into this urgently?
The following are things you can check on the server (10.68.138.10):
DBIndexerProjectPath : E:\DevContegraISLGRebuildStagingDBIndexer
Indexes : E:\DevContegraISLGRebuildStagingIndexes
DocumentPath : E:\ISLGRebuildStaging\wwwroot\Documents
Further to the most recent results for the searches in Dispute & Dispute Documents: Re: Dispute Documents search field does not produce any results - TOLOGIX - ISLG App Rebuild, the Subject Navigator: Re: Problem with Subject Navigator search field, live site - TOLOGIX - ISLG App Rebuild and the issues with the FTS above, I am very concerned about the lack of progress in finding a resolution on this project.
Could you please ensure that you connect with
Thanks,
Morgan
Today we deployed TologixDBIndexer and the WebAPI on the server, and we found that the FTS indexes created there are much smaller than in our local environment.
For example, on the server (10.68.138.10) the index_r_1 file in the FTS Para folder is 61,000 KB, whereas in our local environment it was 463,030 KB.
Is there an issue? We deployed the same build that we used in our local environment.
The Subject Navigator and Dispute Document indexes were created successfully and work as per our criteria, but in the FTS module we are not able to get paragraph or page numbers.
Please look into this as soon as possible, as we need to release this module for UAT by tomorrow.
Following up on some of the bugs I logged earlier, I'm still not sure whether the search is working as expected.
#22162 - Synonyms > Not Working
"PdfHighlighterUrl": "http://10.68.138.10/highlighter/",
But I get 404 error page when I open http://10.68.138.10/highlighter/
As you're running this directly on the server, there's no need to go through IIS. Try using:
"PdfHighlighterUrl": "http://10.68.138.10:8998",
Hope this helps. Let me know.
#22162 - I already commented on this one on Mar 6 and asked you to check the thesaurus config on the legacy server so that we can figure out which thesaurus you're using. I didn't get any feedback on this.
Or, if you have the source code of the legacy search application, that could also help in figuring out which thesaurus options are currently used.
#22160 - As before, looks good on my end. Testing this in the Subjects Navigator index, "like" gives 273 results with stemming off, and 392 results with stemming enabled. (My screenshot sent on Mar 6 also shows this working.)
Maybe the web application is not sending the stemming checkbox value to the search service properly.
#22141 - How many results are you pulling in the result page in the legacy application? The problem is the new application is trying to get all results at once. When I test it, I get more than 32000 results and the response is 70MB. And that's without stemming and fuzzy that you have enabled in your search.
#22155 - I'm not sure I understand this one. From the comment in your screenshot, I'd say you expect only those results containing all keywords in the branch name only. Is that correct? If true, that's a new requirement for me. We index all meta fields in order to support filtering but I don't think that dtSearch supports search requests limited to a single field (e.g. branch name). I'll check this.
Could you please let us know which PDF highlighter URL we should use on the server, from the following two URLs?
http://10.68.138.10:8998/highlighter
or
http://10.68.138.10:8998
Could you please provide a sample search request for a stemmed word in the Subject Navigator module, so we can check our service request by tomorrow?
I tried the one you're using (http://10.68.138.10:8998/highlighter); it doesn't work properly and creates a much smaller index.
As for stemming, the syntax has been the same since it was introduced, and you already have many examples posted:
{ "searchRequest": "education award", "SearchType": "AllWords", "Stemming": true, "Synonyms": false, ...
dtSearch does not support search requests limited to a single field using a search query and the boolean/anywords/allwords options. We could work around this by transforming the query into a fielded search (https://support.dtsearch.com/webhelp/dtsearch/field_searching.htm), but I need confirmation from your end, as it's extra development.
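The proposed workaround would rewrite the plain query as a fielded search. This is a minimal sketch of the idea, assuming branch names are indexed as a field called BranchName (a hypothetical name) and using the "fieldname contains (...)" form from the dtSearch field searching page linked above. The exact syntax should be checked against that page before relying on it. Punctuation, quoted phrases and boolean operators typed by the user are not handled.

```python
import re


def to_fielded_query(query: str, field_name: str, search_type: str = "AllWords") -> str:
    """Rewrite a plain keyword query as a dtSearch-style fielded search.

    AllWords joins the words with "and", AnyWords with "or". Only word
    characters are kept, so phrases and user-typed operators are not handled.
    """
    words = re.findall(r"\w+", query)
    if not words:
        raise ValueError("query has no searchable words")
    joiner = {"allwords": " and ", "anywords": " or "}.get(search_type.lower())
    if joiner is None:
        raise ValueError(f"unsupported search type: {search_type}")
    return f"{field_name} contains ({joiner.join(words)})"


if __name__ == "__main__":
    print(to_fielded_query("education award", "BranchName"))
    # BranchName contains (education and award)
```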
and extracted paragraphs don't contain the other "copy" of the paragraph. Here are the all paragraphs extracted:
Please notice that in your video the paragraphs are not exactly the same. Some have an extra dot, some have parentheses, etc. I suspect that you didn't delete the old index after I provided the updated indexer, so paragraphs extracted by the new indexer were added on top of what was already in the index (and the old paragraphs were not overwritten because the paragraph ID format was modified).
Delete all three FTS indexes, index it and I think it will be fine.
Re #22155, I don't think there is a problem running the search request across different fields; however, as
Why isn't this branch included within the All Words search for "education award" within the new ISLG when it fits the parameters of the search? http://staging.investorstatelawguide.com/SubjectNavigator/Index?branchid=6UQPkRs5-Qc%3D
Thanks,
Morgan
When I run this search in my environment, using staging data, I think I see the same document:
Morgan
In the screenshots you provided, both show the same result. What is the difference? Both display the same result for "education award".
The results on staging.islg currently don't include the result highlighted by
Thanks,
Morgan
As per Morgan's comment above, we are passing the following search request for the Subject Navigator, but we are not getting the branch that Morgan showed us.
Could you please look into this search request and advise? We are available for a call to resolve this issue.
{
"ErrorMessage":null,
"WasError":false,
"SearchRequest":"education award",
"PageNum":0,
"PageSize":0,
"Fuzzy":true,
"Fuzziness":1,
"Stemming":true,
"WordNetSynonyms":false,
"Synonyms":false,
"PhonicSearching":false,
"SearchType":1,
"SortField":null,
"SortOrder":null,
"SearchFlags":0,
"Custom":null,
"NoFrames":false,
"EnableDateSearch":false,
"StartDate":null,
"EndDate":null,
"FileConditions":null,
"BooleanConditions":null,
"QueryStatement":null,
"FilterStatement":null,
"Facets":null,
"IxId":null,
"IndexIds":null,
"IncludeSynopsis":true,
"Near":14,
"ExcludeEnabled":false,
"ExcludeTerm":null,
"TreePath":null,
"paraId":null,
"FieldFilterName":null,
"FieldFilterValues":null
}
Maybe there's something wrong with your index? My index is 260MB in size.
I'll give you a call in a few minutes.
Thanks for taking the call.
#22162 - I already commented this one on Mar 6 and asked you to check the thesaurus config on the legacy server so that we figure out which thesaurus you're using. I didn't get any feedback on this.
Or, if you have source code of the legacy search application that can be helpful as well in figuring out thesaurus options currently used. - It is resolved on staging.islg with help of
#22160 - As before, looks good on my end. Testing this in the Subjects Navigator index, "like" gives 273 results with stemming off, and 392 results with stemming enabled. (My screenshot sent on Mar 6 also shows this working.)
Maybe the web application is not sending the stemming checkbox value to the search service properly. - It is resolved on staging.islg with help of
#22141 - How many results are you pulling in the result page in the legacy application? The problem is the new application is trying to get all results at once. When I test it, I get more than 32000 results and the response is 70MB. And that's without stemming and fuzzy that you have enabled in your search. -
#22155 - I'm not sure I understand this one. From the comment in your screenshot, I'd say you expect only those results that contain all keywords in the branch name. Is that correct? If so, that's a new requirement for me. We index all meta fields in order to support filtering, but I don't think dtSearch supports search requests limited to a single field (e.g. branch name). I'll check this. - It is resolved on staging.islg with the help of
When we get the response from the Subject Navigator, the branch name is cut off. We are not getting the whole branch name. We checked our SQL view, and we are providing the full branch name to you.
For example, when you search for "abuse of process", you get BranchId 19867 in the results, with the following branch name:
Philip Morris v. Australia Award on Jurisdiction and Admissibility considers that the initiation of a treaty-based investor-State arbitration constitutes an abuse of rights (or an abuse of process, the rights abused being procedural in nature) when an investor has changed its corporate structure to gain the protection of an investment treaty at a point in time when a specific dispute was foreseeable; a dispute is foreseeable when there is a reasonable prospect that a measure which may give rise to
But the actual branch text is:
Philip Morris v. Australia Award on Jurisdiction and Admissibility considers that the initiation of a treaty-based investor-State arbitration constitutes an abuse of rights (or an abuse of process, the rights abused being procedural in nature) when an investor has changed its corporate structure to gain the protection of an investment treaty at a point in time when a specific dispute was foreseeable; a dispute is foreseeable when there is a reasonable prospect that a measure which may give rise to a treaty claim will materialize
The last part, "a treaty claim will materialize", is cut off in your response.
Please check and confirm.
WordNet is already supported by the search controller. However, instead of using "Synonyms" in the search model, you need to use "WordNetSynonyms", as in:
{ "SearchRequest": "like", "SearchType": "AllWords", "WordNetSynonyms": true, "PageSize": 200, "Fuzzy": false, "Fuzziness": 1, ...
In which folder did you copy the "wordnet" folder on the server? We can't see it in the indexing folder.
Also, do we need to change the Synonyms name in your WebAPI project's SearchModel?
WordNet folder location:
Again, you don't need to change anything here as we already setup the config folder during our call.
Bug no. 22155 (wrong results for "education award") is resolved on staging.islg with the help of
I already created the Subjects Nav index with this and copied it for you to E:\DevContegraISLGRebuildStagingIndexes\subject-nav-stage
Could you please explain to
... the problem with this query is that it gives way too many results - there are 231956 records returned for initially found 5783 matches! The response is over 200MB!
I think it doesn't make sense to return all results. With a few different searches, someone could scrape your whole database. I think we should put a hard limit of, say, 1000 (or 10K) items and never return more than that. Better to give the user an error asking them to refine the query.
Thanks for your help and responses to the above. Since these bugs have moved back to UAT, I'll follow up on the outstanding issue I'm seeing.
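The hard limit proposed above could be enforced with a check like the following. This is only a sketch: `ResultLimitExceeded` and `check_result_limit` are hypothetical names, the limit of 1000 is taken from the proposal, and nothing here reflects how the actual search service counts or returns hits.

```python
class ResultLimitExceeded(Exception):
    """Raised when a search matches more items than may be returned (hypothetical)."""


MAX_RESULTS = 1000  # proposed above; 10_000 was the alternative


def check_result_limit(total_hits: int, limit: int = MAX_RESULTS) -> None:
    """Refuse oversized result sets instead of sending a huge response."""
    if total_hits > limit:
        raise ResultLimitExceeded(
            f"Search matched {total_hits} items (maximum {limit}); please refine the query."
        )


if __name__ == "__main__":
    check_result_limit(670)  # within the limit, no error
    try:
        check_result_limit(231956)
    except ResultLimitExceeded as err:
        print(err)  # Search matched 231956 items (maximum 1000); please refine the query.
```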
#22162 - Synonyms > Not Working
Thanks,
Morgan
... the problem with this query is that it gives way too many results - there are 231956 records returned for initially found 5783 matches! The response is over 200MB!
I think it doesn't make sense to return all results. With a few different searches, someone could scrape your whole database. I think we should put a hard limit of, say, 1000 (or 10K) items and never return more than that. Better to give the user an error asking them to refine the query.
I don't want to impose limits on the results, because this will create problems for users who need to cast a broad net with their searches. Also, this still doesn't explain why the search is so slow for the following example:
This type of search should not be taking this long.
Thanks,
Morgan
Following up on
This would include,
Could you post the time it takes to generate each result? My expectation is that none of these searches should take more than 4-5 seconds to generate a result.
Thanks,
Morgan
The following is the payload request from the Dispute Document module when the user filters by date range only.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"range","field":"Field_61","from":"20191231","to":"20210330"},{"type":"range","field":"Field_110","from":"20191231","to":"20210330"}]}]},"SortField":"FullCitation","SortOrder":"asc"}
The search itself reports 670 results and executes in about 150ms! However, as requested by the dev team, we don't return these results but do one more step: we collect all the different disputeIds that appear in these results and return all results where this matches ContentTypeDataMasterId. This blows up to 96,859 results to be collected and returned. I really don't know the data model of the application, but something looks very fishy to me here: I don't think almost 100K nodes should be needed to represent 670 results on a page. In the end, this takes tens of seconds.
I think too much is being expected of the search service here, for a job it's not really suited to. The real search is the one that found the data in less than 200ms. I think the second step of the search is a more appropriate job for the database.
Can we have a call to discuss the above? We are not clear on what you want us to do to make the search faster.
Let's have a call to discuss and finalize the solution.
We have a solution: if you provide the first search result from the Dispute index only, that is enough for us, because we only need the DisputeId (FirstId) and the node name.
After getting the result, when the user clicks on a node name, our second call (the Dispute-doc method) will remain as it is, because that call doesn't take much time.
Let me know so we can have a quick call and conclude this.
Thanks for the call.
As discussed, we need the first search result from the Dispute indexes, limited to 10,000 rows.
As agreed, we have added 2 columns to the FE_MetafieldwithValueDynamic query.
The DisputeCitation column is the node name of the dispute. The IsDisputeCitation column is set to 1 where DisputeCitation is available, and to 0 for other rows, where DisputeCitation is null.
You need to return only the DisputeId and DisputeCitation columns for rows where IsDisputeCitation is set to 1.
Our second search call will remain as it is: we pass the whole DisputeId collection and get the result.
We have updated the FE_MetafieldwithValueDynamic query on the server for the database ISLGRebuildStaging.
Let me know if you have any queries.
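The first-step filtering described above (only rows where IsDisputeCitation is 1, only DisputeId and DisputeCitation, at most 10,000 rows) can be sketched in Python as follows. It assumes, hypothetically, that rows from FE_MetafieldwithValueDynamic arrive as dictionaries keyed by column name; in the real system this filtering may well happen in the index or in SQL instead.

```python
from typing import Any, Dict, Iterable, List

MAX_FIRST_STEP_ROWS = 10_000  # row limit agreed in the call above


def first_step_results(rows: Iterable[Dict[str, Any]],
                       limit: int = MAX_FIRST_STEP_ROWS) -> List[Dict[str, Any]]:
    """Keep dispute rows only (IsDisputeCitation == 1) and project DisputeId/DisputeCitation."""
    results: List[Dict[str, Any]] = []
    for row in rows:
        if row.get("IsDisputeCitation") != 1:
            continue  # other rows have IsDisputeCitation = 0 and a null DisputeCitation
        results.append({"DisputeId": row["DisputeId"],
                        "DisputeCitation": row["DisputeCitation"]})
        if len(results) >= limit:
            break  # stop once the agreed maximum is reached
    return results


# Hypothetical sample rows: one dispute row and one document row for the same dispute.
sample = [
    {"DisputeId": 13585, "DisputeCitation": "Example v. Example", "IsDisputeCitation": 1},
    {"DisputeId": 13585, "DisputeCitation": None, "IsDisputeCitation": 0},
]
print(first_step_results(sample))  # [{'DisputeId': 13585, 'DisputeCitation': 'Example v. Example'}]
```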
We have updated the FE_MetafieldwithValueDynamic query on the server. Now you can use RowId as the identifier.
You can get the update from the 2021-03-18 folder.
If you pass the following date filter, do you get a result of about 208 disputes?
Yesterday I tested the FTS feature, and some bugs came up that I'd like to raise here:
#22515 - Stemming > Doesn't Work
Expected: Will be highlighted if stemming is enabled
#22517 - Synonym > Doesn't Work
Expected: Will be highlighted
#22515 - Stemming > Doesn't Work
I think you might be mixing up the "physical" PDF page number with the number printed on the page. Search results show the physical page numbers. On physical page 493 there's "likeliness", I think, but the text on that page shows page number 495. That means the page numbered 493 that you were looking at is physical page 491. I opened it and it shows "likely" highlighted:
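The page arithmetic in the reply above can be written as a small helper. It assumes a constant offset between physical and printed page numbers across the whole PDF, which is a simplification: front matter or inserted pages can change the offset partway through a document. The function name is hypothetical.

```python
def printed_to_physical(printed: int, known_physical: int, known_printed: int) -> int:
    """Convert a printed page number to a physical PDF page, assuming a constant offset."""
    offset = known_printed - known_physical  # physical 493 shows printed 495 -> offset 2
    return printed - offset


print(printed_to_physical(493, known_physical=493, known_printed=495))  # 491
```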
I've found and fixed one issue related to this. It should be fine in the search service update I provided.
Re #22515 - Stemming > Doesn't Work
I understand from your response that I should be looking at the page excerpt for physical page 491 instead of 493, where the word "likely" appears, which makes sense. When I look at the results card in FTS, I'm not seeing the page excerpt for 491. I only see a result on 490 for "likeness". Based on your screenshot, shouldn't I have an excerpt for page 491?
Please let us know once you have resolved bugs no. 22515 and 22517 above for the FTS module, and provide us with an update.
I don't see any issue with these:
#22517 - the example below shows both "bias" and "prejudice" highlighted when searching for "prejudice"
The following are the payload requests for the above 2 bugs.
Bug no. 22515
{"searchRequest":"like","SearchType":"Boolean","Stemming":true,"WordNetSynonyms":false,"Fuzzy":false,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37","13","12"]}]},"PageNum":0,"PageSize":20}
Bug No : 22517
{"searchRequest":"bias","SearchType":"Boolean","Stemming":false,"WordNetSynonyms":true,"Fuzzy":false,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"DocumentContentTypeId","values":["37","13","12"]}]},"PageNum":0,"PageSize":20}
Could you please check and let us know.
The following are my search requests for highlighting the paragraph.
Bug no. : 22515
{"searchRequest":"like","SearchType":"3","Stemming":"true","Synonyms":"false","Fuzzy":"false","Fuzziness":"1","paraId":"ECB6D91BB3177C32B5E0B70F4E5AC7C1#MTQ="}
Bug no. : 22517
{"searchRequest":"bias","SearchType":"3","Stemming":"false","Synonyms":"true","Fuzzy":"false","Fuzziness":"1","paraId":"80D3583E561ED87838529EE310CB553A#MTM="}
#22515 - In the sample you gave me, only "like" appears. Please provide another example showing that stemming is not working.
#22517 - you're sending Synonyms instead of WordNetSynonyms. It works when you put WordNetSynonyms=true.
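Since the search model reads `WordNetSynonyms` rather than `Synonyms`, a client could normalize older payloads before sending them. The helper below is a hypothetical sketch; it only renames the key and deliberately leaves string values such as `"true"` untouched, since this thread doesn't say whether the service accepts strings or needs JSON booleans.

```python
import json
from typing import Any, Dict


def normalize_search_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the payload with the legacy 'Synonyms' key renamed to 'WordNetSynonyms'."""
    fixed = dict(payload)  # shallow copy; the caller's dict is not modified
    if "Synonyms" in fixed:
        legacy_value = fixed.pop("Synonyms")
        # Keep an explicitly set WordNetSynonyms; otherwise carry the old value over.
        fixed.setdefault("WordNetSynonyms", legacy_value)
    return fixed


request = {"searchRequest": "bias", "SearchType": "3", "Stemming": "false",
           "Synonyms": "true", "Fuzzy": "false", "Fuzziness": "1"}
print(json.dumps(normalize_search_payload(request)))
```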
Could you please look at Morgan's last video in the following Basecamp thread:
Dispute Documents search field does not produce any results - TOLOGIX - ISLG App Rebuild
And please provide your feedback on the Dispute Document search.
The following is my payload request based on the data in Morgan's video.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["234","235"]},{"type":"match","field":"Field_DocumentTypeId","values":["1064","1067"]},{"type":"match","field":"Field_109","values":["487"]},{"type":"boolean","Operator":"or","clauses":[{"type":"range","field":"Field_61","from":"20200229","to":"20210320"},{"type":"range","field":"Field_110","from":"20200229","to":"20210320"}]}]},"SortField":"FullCitation","SortOrder":"asc"}
The Contegra team raised an issue in the FTS module.
Bug No. : 22502
Steps to reproduce:
Result: No results are shown, even though there are results for this arbitration rule type
Expected: Will see results
The following is my search request for the above scenario.
{"searchRequest":"tribunal","SearchType":"Boolean","Stemming":false,"WordNetSynonyms":false,"Fuzzy":false,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_27","values":["368"]},{"type":"match","field":"Field_34","values":["44"]},{"type":"match","field":"DocumentContentTypeId","values":["13","12"]}]},"PageNum":0,"PageSize":20}
I assume we need to set an OR condition between the metafields.
Please suggest.
Does filtering work if you use only filters on one of these fields?
Are you sure these fields are included in the FE_MetafieldwithValueDynamic?
I'm not sure if you need to use OR, from the UI it doesn't look like OR would be the expected behavior.
Make sure you're using the right field name.
As per Morgan's video, if we apply the following filters together with other filters, the search does not work.
Respondent State : [Field_109] : value 487
Applicable Instrument(s) : [Field_69] : value 11333
Applicable Arbitration Rules : [Field_70] : value 11086
All of the above data is available in the FE_MetafieldwithValueDynamic result. But Morgan said that when the above fields are applied in combination with other filters, the search doesn't work, whereas applying only these fields' filters works.
Let me know if you need a call. I am available.
For the FTS bug, if we apply each filter individually with the word "tribunal", we get results. For example:
Searching "tribunal" with only [Field_27] produces 22 results.
Searching "tribunal" with only [Field_34] produces 16 results.
But Naomi's expectation is that if we apply both filters, we should get 38 results.
So what do we need to change in the JSON search query?
Are you asking about the FTS bug?
If you are available, can we have a call to resolve both the Dispute Document and FTS bugs?
For the FTS bug, the following is just an example of RowIds:
[Field_27] available in RowId (1,2,3)
[Field_34] available in RowId (535,558)
We need all of these RowIds in the result.
I can also confirm this differently... When I run a search with Field_34 only, I get 16 results, as you said. I'm attaching the result JSON, and you can see there's no Field_27 in the results at all. So, it was not in the data that was indexed.
Yes, I know that [Field_34] and [Field_27] data are not both available when combined with AND.
That's why I'm asking how to set an OR condition in the payload request so that we get results.
The following is my current JSON request:
{"searchRequest":"tribunal","SearchType":"Boolean","Stemming":false,"WordNetSynonyms":false,"Fuzzy":false,"Fuzziness":"1","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_27","values":["368"]},{"type":"match","field":"Field_34","values":["44"]},{"type":"match","field":"DocumentContentTypeId","values":["13","12"]}]},"PageNum":0,"PageSize":20}
So, is there any way to set an OR condition in the above payload request so we can get all 38 results?
Just nest the OR clause within the AND:
{ "searchRequest": "tribunal", "SearchType": "Boolean", "Stemming": false, "WordNetSynonyms": false, "Fuzzy": false, "Fuzziness": "1", "FilterStatement": { "type": "boolean", "Operator": "and", "clauses": [ { "type": "boolean", "Operator": "or", "clauses": [ { "type": "match", "field": "Field_27", "values": [ "368" ] }, { "type": "match", "field": "Field_34", "values": [ "44" ] } ] }, { "type": "match", "field": "DocumentContentTypeId", "values": [ "13", "12" ] } ] }, "PageNum": 0, "PageSize": 20 }
We will try this and update you.
Could you provide feedback on the Dispute Document result?
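The nested OR-within-AND filter from the reply above can also be built programmatically, which avoids hand-editing deeply nested JSON. The helpers `match` and `boolean` are hypothetical; the clause shapes are copied from the payloads in this thread.

```python
import json
from typing import Any, Dict, List


def match(field: str, *values: str) -> Dict[str, Any]:
    """A 'match' clause, shaped like the ones in the payloads above."""
    return {"type": "match", "field": field, "values": list(values)}


def boolean(operator: str, clauses: List[Dict[str, Any]]) -> Dict[str, Any]:
    """A nested 'boolean' clause; operator is 'and' or 'or'."""
    return {"type": "boolean", "Operator": operator, "clauses": clauses}


filter_statement = boolean("and", [
    # Either metafield may match (Field_27 OR Field_34) ...
    boolean("or", [match("Field_27", "368"), match("Field_34", "44")]),
    # ... while the content type restriction still applies to every result.
    match("DocumentContentTypeId", "13", "12"),
])

payload = {
    "searchRequest": "tribunal", "SearchType": "Boolean", "Stemming": False,
    "WordNetSynonyms": False, "Fuzzy": False, "Fuzziness": "1",
    "FilterStatement": filter_statement, "PageNum": 0, "PageSize": 20,
}
print(json.dumps(payload, indent=2))
```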
-------
As per Morgan's video, if we apply the following filters together with other filters, the search does not work.
Respondent State : [Field_109] : value 487
Applicable Instrument(s) : [Field_69] : value 11333
Applicable Arbitration Rules : [Field_70] : value 11086
All of the above data is available in the FE_MetafieldwithValueDynamic result. But Morgan said that when the above fields are applied in combination with other filters, the search doesn't work, whereas applying only these fields' filters works.
Let me know if you need a call. I am available.
For the Dispute Document module, suppose I search with Language = Russian and Respondent State = Lithuania; then I should get results.
We don't need to apply the OR operator; the operator should be AND.
The following is the payload search request for Language = Russian and Respondent State = Lithuania:
Field_109 : Respondent State (value : 487)
Field_62 : Language (value : 248)
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["248"]},{"type":"match","field":"Field_109","values":["487"]}]},"SortField":"FullCitation","SortOrder":"asc"}
In the FE_MetafieldwithValueDynamic query, you get the Field_109 value in RowId 1181 and the Field_60 value in RowId 8829.
In both rows, the first value (DisputeId) is 13585, so we expect the first search to return the DisputeId from RowId 1181.
And for the second search, you should pass the DisputeId collection [13585,21375] and provide the result.
Please let us know.
Exactly, that's why it doesn't work with AND. The fields do not belong to the same indexed row. What do you want me to do here? Any workaround that I could apply will result in a significant performance degradation. The fix should be in the database view.
As the first search is pulling data for DisputeIds, I'd say you need to think about changing FE_MetafieldwithValueDynamic in a way that the DisputeId is unique in the results (so that it can be used as an identifier instead of RowId) and you include all fields related to it. That way both Field_109 and Field_60 should appear in the same row.
If that's too complicated on your end, maybe we could make it in two steps:
I hope this makes sense.
P.S. I will be out of the office the whole morning tomorrow and can get back to you only during my afternoon.
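To illustrate why AND fails across rows and why one record per DisputeId would fix it, here is a sketch that merges metafield rows by DisputeId and keeps every value as a set, so that several documents under one dispute can still be represented. The function names and the sample rows are hypothetical, and as noted above, the real fix would live in the database view rather than in client code.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Set


def merge_rows_by_dispute(rows: Iterable[Dict[str, Any]]) -> Dict[Any, Dict[str, Set[str]]]:
    """Collapse several metafield rows into one record per DisputeId.

    Every field keeps the set of all values seen for that dispute, so an AND filter
    across fields can be checked against a single record.
    """
    merged: Dict[Any, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for row in rows:
        dispute_id = row["DisputeId"]
        for field, value in row.items():
            if field != "DisputeId" and value is not None:
                merged[dispute_id][field].add(str(value))
    return merged


def matches_all(record: Dict[str, Set[str]], criteria: Dict[str, str]) -> bool:
    """AND semantics: every requested field must contain the requested value."""
    return all(value in record.get(field, set()) for field, value in criteria.items())


# Two hypothetical rows for the same dispute, each carrying only one of the filtered fields.
rows = [
    {"DisputeId": 13585, "Field_109": "487"},
    {"DisputeId": 13585, "Field_62": "248"},
]
criteria = {"Field_109": "487", "Field_62": "248"}

print(any(matches_all(merge_rows_by_dispute([r])[13585], criteria) for r in rows))  # False
print(matches_all(merge_rows_by_dispute(rows)[13585], criteria))  # True
```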
We cannot make DisputeId a unique identifier, because multiple documents are associated with a single dispute, so you may get the same DisputeId multiple times.
Let's discuss the second option (the two-step process) today, as we have only this week to complete this task.
Let us know when you are available for a call to finalize it.
We are available between 10:00 AM and 6:00 PM IST.
From everything discussed these months, I hope you understand how indexing and search work. Please provide a view (or views) that will allow us to index all metadata that you need associated with a single row in the first results table, as that will be search result item.
It is not possible to put all metafields in a single row, as the search is looking for different rows.
We should go with the two-step process as you suggested, but we need to discuss and confirm which data we will provide to you in the view/stored procedure.
Let us know when you are available for a call (IST).
As per today's call about the Dispute Document filter:
The initial search query, FE_MetafieldwithValueDynamic, will remain as it is; we haven't changed anything. This means you provide the initial search result from this index.
As discussed, we created a new query, FE_SelectDocumentViewForContegraSearch, on the server (database: ISLGRebuildStaging), which contains all the dispute document metafields and the DisputeId.
You can use FE_SelectDocumentViewForContegraSearch as the second-step query for the initial search.
Please let me know if you have any queries.
Also, I need a separate procedure that returns only disputes! FE_MetafieldwithValueDynamic can remain as is for the "dispute details" call. However, for indexing, I need one procedure that returns only disputes with their metadata. Using the disputeId from each row, I'd pass it to FE_SelectDocumentViewForContegraSearch to get the dispute documents for each dispute.
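The two-step indexing described above (one procedure for disputes, then documents fetched per disputeId) can be sketched as a loop over disputes. The two callables stand in, hypothetically, for FE_SelectDisputeViewForContegraSearch and FE_SelectDocumentViewForContegraSearch; this only illustrates the data flow, not the actual indexer.

```python
from typing import Any, Callable, Dict, Iterable, List


def build_dispute_records(
    fetch_disputes: Callable[[], Iterable[Dict[str, Any]]],
    fetch_documents: Callable[[Any], Iterable[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Two-step indexing: one record per dispute, with that dispute's documents attached."""
    records: List[Dict[str, Any]] = []
    for dispute in fetch_disputes():
        documents = list(fetch_documents(dispute["DisputeId"]))
        # A dispute without documents still yields a record (with an empty list),
        # so it is not dropped from the index.
        records.append({**dispute, "Documents": documents})
    return records


# In-memory stand-ins for the two stored procedures (hypothetical data).
disputes = [{"DisputeId": 1}, {"DisputeId": 2}]
documents_by_dispute = {1: [{"DocumentId": 10}, {"DocumentId": 11}]}

records = build_dispute_records(lambda: disputes,
                                lambda dispute_id: documents_by_dispute.get(dispute_id, []))
print(records)
```

Keeping disputes that have no documents is a deliberate choice in this sketch; whether the real index should include them depends on how the results page is meant to behave.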
You can find the DisputeId column (3rd column) in the stored procedure FE_SelectDocumentViewForContegraSearch.
The new, separate procedure for dispute metafields only is FE_SelectDisputeViewForContegraSearch.
But please make sure you provide the result from the FE_MetafieldwithValueDynamic query, as all the sorting fields and the DisputeId field are available in that query.
Another option is that we read data from FE_SelectDisputeViewForContegraSearch once, cache it, and then only filter data when needed. This will be much faster but will take more RAM. I'm not sure how many rows are in this. If you think that the complete data set can be kept in memory, we can proceed this way.
Next, the new FE_SelectDisputeViewForContegraSearch doesn't even have a disputeId column. Please fix it or let me know how to use it.
We have added the DisputeId column to FE_SelectDisputeViewForContegraSearch.
Both queries return the full dataset.
The indexer config for disputes should be updated. IndexStoredProcTables was modified, and DisputeDocsMetadataProc is the new property:
Let me know how it worked for you.
We made the above changes and created new indexes, but it doesn't work.
I don't understand why you removed FE_MetafieldwithValueDynamic from the indexer, because we need the initial search result from this index.
We provided the following 2 queries just to filter the data and pass the DisputeIds into the FE_MetafieldwithValueDynamic index, so that we get all the desired results:
FE_SelectDisputeViewForContegraSearch
FE_SelectDocumentViewForContegraSearch
This is because all our model columns and sorting field columns for the initial search are available in the FE_MetafieldwithValueDynamic query.
Please let me know when you are available so we can have a call and discuss.
If you prefer the old result format, we can keep the index from the old procedure and use it for results after collecting disputes from this search.
If I search with Canada, it gives 0 results. Previously, it gave 34 dispute records.
Now, any search gives me a null result.
The following is my search request:
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_109","values":["403"]}]},"SortField":"FullCitation","SortOrder":"asc"}
But I still don't understand how you will provide the result, because all the columns I need in my model are available in the FE_MetafieldwithValueDynamic query, and now you are not indexing this query's result.
Let's have a call and finalize this.
Please check the things on your end. For the query you sent me, I'm getting 34 results. Response attached.
This is my modified Dispute index config file. Please check whether it is OK.
Can we have a quick call? I think there is some confusion about the indexing.json file.
Could you provide your indexing JSON files for both Dispute and Dispute Document so we can check again?
The first (initial) search is working now. But when we call the second method, disputes-details, it doesn't return the Dispute Document data.
Here is my search request for the disputes-details service method:
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["234"]}]},"SortField":"FullCitation","SortOrder":"asc","FieldFilterName":"disputeid","FieldFilterValues":[12877]}
As you'll remember, we pass the DisputeId collection to our second query, SelectDisputeContegraSearch, to get the dispute detail and document results.
Please let us know.
Could we connect about the above query, or are you looking into it?
Regarding your previous search example, are you sure you provided a good disputeId? For disputeId 12877 I don't see data in the database.
When I copied the SearchController and ran the API project, it gave me the following error:
Severity: Error
Code: CS1061
Description: 'TologixSettings' does not contain a definition for 'DisputeComboIndex' and no accessible extension method 'DisputeComboIndex' accepting a first argument of type 'TologixSettings' could be found (are you missing a using directive or an assembly reference?)
Project: TologicWebSearch
File: D:\Harsh\Harsh\Contegra Projects\TologixWebSearch\Controllers\SearchController.cs
Line: 503
Suppression State: Active
We have also updated the app.config file and set DisputeComboIndex under TologixSettings.
Is there anything missing from your side?
The following line in the SearchController is underlined in red.
We still don't get the Dispute Document detail data from SelectDisputeContegraSearch. As we discussed, we need to pass all the dispute and dispute document IDs that we provided in the FE_MetafieldwithValueDynamic query (DisputeId column).
Currently, we only get the dispute detail when we call our second method, disputes-details.
As discussed, I have updated the DisputeId column in the following stored procedure on our server for the database ISLGRebuildStaging:
FE_SelectDocumentViewForContegraSearch
As an example, for the first search, you pass the following JSON request:
First request:
{"searchRequest":"A.M.F. Aircraftleasing Meier & Fischer GmbH & Co. KG v. Czech Republic, PCA Case No. 2017-15, Respondent Press Release, 5 December 2016 [Czech]","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["15541"]}]},"SortField":"FullCitation","SortOrder":"asc"}
When the user clicks, the second method is called to fetch all the dispute details and document details with the second request:
Second request:
{"searchRequest":"A.M.F. Aircraftleasing Meier & Fischer GmbH & Co. KG v. Czech Republic, PCA Case No. 2017-15, Respondent Press Release, 5 December 2016 [Czech]","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"match","field":"Field_62","values":["15541"]}]},"SortField":"FullCitation","SortOrder":"asc","FieldFilterName":"disputeid","FieldFilterValues":[13406]}
I need results for both ContentTypeDataMasterId rows (13406 and 20758) from SelectDisputeContegraSearch:
13406 is our dispute detail.
20758 is the dispute document detail.
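The second request above is the first request plus `FieldFilterName` and `FieldFilterValues`. A small helper can derive it from the first request so the two never drift apart. `build_second_request` is a hypothetical name, the `searchRequest` text is shortened in the example, and de-duplicating and sorting the IDs is a design choice, not something the service is known to require.

```python
import copy
import json
from typing import Any, Dict, Iterable


def build_second_request(first_request: Dict[str, Any], dispute_ids: Iterable[int]) -> Dict[str, Any]:
    """Copy the first request and restrict it to the given disputes via FieldFilterName/Values."""
    request = copy.deepcopy(first_request)  # leave the first request untouched
    request["FieldFilterName"] = "disputeid"
    request["FieldFilterValues"] = sorted(set(dispute_ids))  # de-duplicated, stable order
    return request


first = {
    "searchRequest": "",
    "FilterStatement": {"type": "boolean", "Operator": "and",
                        "clauses": [{"type": "match", "field": "Field_62", "values": ["15541"]}]},
    "SortField": "FullCitation",
    "SortOrder": "asc",
}
print(json.dumps(build_second_request(first, [13406])))
```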
Let me know if this returns expected results.
The search is working, but we found one issue: if a dispute doesn't have any documents, the second result doesn't return the dispute data.
Can we have a quick call?
The first request is working now. The second request is also working, but for some disputes it gives me a Bad Request error and doesn't return a result.
This is my first request; it gives me 186 results, which is fine.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"range","field":"Field_61","from":"20200229","to":"20210324"},{"type":"range","field":"Field_110","from":"20200229","to":"20210324"}]}]},"SortField":"FullCitation","SortOrder":"asc","FieldFilterName":"disputeid","FieldFilterValues":[22997]}
Second request: when I click on a dispute, the second request is sent, but it doesn't return a result; it throws a Bad Request error.
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"range","field":"Field_61","from":"20200229","to":"20210324"},{"type":"range","field":"Field_110","from":"20200229","to":"20210324"}]}]},"SortField":"FullCitation","SortOrder":"asc","FieldFilterName":"disputeid","FieldFilterValues":[22997]}
The following second request is working:
{"searchRequest":"","FilterStatement":{"type":"boolean","Operator":"and","clauses":[{"type":"boolean","Operator":"or","clauses":[{"type":"range","field":"Field_61","from":"20200229","to":"20210320"},{"type":"range","field":"Field_110","from":"20200229","to":"20210320"}]}]},"SortField":"FullCitation","SortOrder":"asc","FieldFilterName":"disputeid","FieldFilterValues":[12400]}
My assumption was that disputes with no documents create the issue.
But I will check tomorrow.
Could you please provide the name of the folder where you put the updated indexer?
With this new indexer, the Dispute Document index is not created; it generates an error.
I have attached the indexing log file and the indexer config file. The first index (Dispute) is generated, but the second (DisputeDocument) is not. Please check.
Hope you're doing well.
Bug no. 22939
Steps to reproduce:
Result: No results found
Expected: Will see the two published dispute documents where the dispute has this instrument
The following is the payload request for the above criteria. We found that we are getting 2 rows from the FE_MetafieldwithValueDynamicFTS query.
{
"searchRequest":"t",
"SearchType":"Boolean",
"Stemming":true,
"WordNetSynonyms":false,
"Fuzzy":true,
"Fuzziness":"1",
"FilterStatement":{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"boolean",
"Operator":"or",
"clauses":[
{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"match",
"field":"DocumentContentTypeId",
"values":[
"13"
]
}
]
},
{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"match",
"field":"DocumentContentTypeId",
"values":[
"37"
]
}
]
},
{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"match",
"field":"DocumentContentTypeId",
"values":[
"12"
]
}
]
}
]
},
{
"type":"boolean",
"Operator":"or",
"clauses":[
{
"type":"match",
"field":"Field_69",
"values":[
"12401"
]
}
]
}
]
},
"PageNum":0,
"PageSize":20
}
Could you please check and confirm.
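The filter above wraps each DocumentContentTypeId value in its own single-clause AND group inside an OR. Assuming a multi-value `match` clause means "any of these values" (as the earlier payloads with `"values":["37","13","12"]` suggest), the same filter can be flattened for readability. This hypothetical helper does that and keeps the top-level boolean wrapper, as in every payload in this thread; it does not by itself explain or fix the missing results.

```python
from typing import Any, Dict, List


def _simplify(clause: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively remove redundant boolean wrappers from one clause."""
    if clause.get("type") != "boolean":
        return clause
    children = [_simplify(c) for c in clause["clauses"]]
    if len(children) == 1:
        return children[0]  # a group with a single clause is just that clause
    fields = {c.get("field") for c in children}
    if (clause["Operator"].lower() == "or"
            and all(c.get("type") == "match" for c in children)
            and len(fields) == 1):
        # Assumes a multi-value "match" means "any of these values" (OR).
        values: List[str] = []
        for c in children:
            for v in c["values"]:
                if v not in values:
                    values.append(v)
        return {"type": "match", "field": children[0]["field"], "values": values}
    return {"type": "boolean", "Operator": clause["Operator"], "clauses": children}


def simplify_filter_statement(statement: Dict[str, Any]) -> Dict[str, Any]:
    """Simplify nested clauses but keep the top-level boolean wrapper."""
    return {"type": "boolean", "Operator": statement["Operator"],
            "clauses": [_simplify(c) for c in statement["clauses"]]}


def m(field: str, *values: str) -> Dict[str, Any]:
    return {"type": "match", "field": field, "values": list(values)}


def b(op: str, *clauses: Dict[str, Any]) -> Dict[str, Any]:
    return {"type": "boolean", "Operator": op, "clauses": list(clauses)}


# Same structure as the bug 22939 payload above.
original = b("and",
             b("or",
               b("and", m("DocumentContentTypeId", "13")),
               b("and", m("DocumentContentTypeId", "37")),
               b("and", m("DocumentContentTypeId", "12"))),
             b("or", m("Field_69", "12401")))

print(simplify_filter_statement(original))
```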
Also, one more bug was raised by
Bug No. 22938
Steps to reproduce:
Result: No documents are found
Expected: Will see documents that are underneath this instrument in the AC
The following is the payload request for the above criteria. We found that we are getting 2 rows from the FE_MetafieldwithValueDynamicFTS query.
{
"searchRequest":"t",
"SearchType":"Boolean",
"Stemming":true,
"WordNetSynonyms":false,
"Fuzzy":true,
"Fuzziness":"1",
"FilterStatement":{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"boolean",
"Operator":"or",
"clauses":[
{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"match",
"field":"DocumentContentTypeId",
"values":[
"13"
]
}
]
},
{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"match",
"field":"DocumentContentTypeId",
"values":[
"37"
]
}
]
},
{
"type":"boolean",
"Operator":"and",
"clauses":[
{
"type":"match",
"field":"DocumentContentTypeId",
"values":[
"12"
]
}
]
}
]
},
{
"type":"boolean",
"Operator":"or",
"clauses":[
{
"type":"match",
"field":"Field_ACReference",
"values":[
"22968"
]
},
{
"type":"match",
"field":"Field_ACProvision",
"values":[
"22968_Generally"
]
}
]
}
]
},
"PageNum":0,
"PageSize":20
}
Maybe this is a mistake on my end, but I would expect to see roughly the same type of matches as in the current version, where any documents containing the letter "t" would appear if they match the filter criteria. Shouldn't the search match what is shown in the document text?
However, further to the video below, the results for the specific bug referenced above probably have to do with the fact that the 2 documents that are generated by the filter do not have HTML documents available in staging.islg.
At the same time, as described in the video, I noticed that we're displaying paragraph references both vertically and horizontally. Currently, this is very inconsistent, and I generally dislike the vertical alignment. Please ensure this is resolved consistently.
Thanks,
Morgan
Regarding the vertical alignment UI issue: we found that if the page or paragraph count is less than 12, it is displayed vertically.
We have resolved the Page/Paragraph vertical alignment UI issue and uploaded the fix to staging.islg.
I think both of the above issues are due to missing data and PDF/HTML files on staging.islg. Today we generated the indexes on app.islg, and they produce results for the letter "t".
Also, let's come up with a better UI solution for the alignment of the paragraphs.
Thanks,
Morgan
We have set up indexing for the go-live on April 6th, but when we try to generate the indexes for the FTS module, the following error is logged and the indexes are not created.
I have attached the indexing log and the indexer JSON file for the live database. Also, please note that we are using the following live database on the server to generate the indexes:
Database name: ISLGRebuildProduction
Please check and provide feedback.
Please ignore the above comment. There was an issue in the query on our side, and we are looking into it.
I noticed the DD search is currently not working on app.islg. Is this related to the indexing issues above?
Thanks,
Morgan
There was a minor issue generating the index on app.islg. We have resolved it, and search is working fine for all 3 modules on app.islg.