New Search Engine
Hello all,
I would like to raise an issue to discuss at the next status meeting (scheduled for Thursday, June 6th). Currently, we rely on dtSearch as our search engine for performing keyword searches across PDF documents and SQL databases. However, with the rebuilt ISLG and ILG, we will be eliminating the need to search through PDF documents (which I understand is the primary reason we rely on dtSearch), and I want to explore the opportunity of using other search engines that provide more sophisticated ways of searching through SQL databases. In particular, I would like us to develop a search tool that allows us to manipulate the search algorithm and customise it to influence search result rankings.
For example, with the current Full Text Search on ISLG, we are limited to ranking results by the number of hits for a searched keyword. However, I would like to introduce other factors that determine result rankings (e.g., not only keyword frequency, but also density, prominence and proximity within the text of a document). In addition, I would like to give us the ability to influence relevancy based on other related data (e.g., how frequently a document is referred to by other documents through the Citators).
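To make the proposed ranking factors concrete, here is a minimal sketch of a scoring function that combines keyword frequency, density, prominence (approximated here as query terms appearing in the title), proximity (the shortest span of body text containing every query term) and a citation-based boost. The `Document` fields, the `WEIGHTS` values and the function names are all hypothetical, and the weights would need tuning against results that SMEs judge relevant.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical weights; real values would need tuning against judged results.
WEIGHTS = {
    "frequency": 1.0,
    "density": 20.0,
    "prominence": 2.0,
    "proximity": 3.0,
    "citations": 0.5,
}


@dataclass
class Document:
    title: str
    body: str
    citation_count: int  # hypothetical: how often other documents cite this one


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_window(tokens: list[str], terms: set[str]) -> Optional[int]:
    """Length in tokens of the shortest span containing every term, or None."""
    hits = [(i, t) for i, t in enumerate(tokens) if t in terms]
    counts: dict[str, int] = {}
    best: Optional[int] = None
    left = 0
    for pos_right, term in hits:
        counts[term] = counts.get(term, 0) + 1
        # Shrink from the left while the window still covers all terms
        while len(counts) == len(terms):
            pos_left, left_term = hits[left]
            width = pos_right - pos_left + 1
            best = width if best is None else min(best, width)
            counts[left_term] -= 1
            if counts[left_term] == 0:
                del counts[left_term]
            left += 1
    return best


def score(doc: Document, query: str) -> float:
    """Weighted sum of frequency, density, prominence, proximity and citations."""
    terms = set(tokenize(query))
    body = tokenize(doc.body)
    if not terms or not body:
        return 0.0
    hits = sum(1 for t in body if t in terms)
    density = hits / len(body)
    prominence = len(terms & set(tokenize(doc.title))) / len(terms)
    window = min_window(body, terms)
    # 1.0 when all distinct terms are adjacent, smaller as they spread apart
    proximity = len(terms) / window if window else 0.0
    return (
        WEIGHTS["frequency"] * math.log1p(hits)
        + WEIGHTS["density"] * density
        + WEIGHTS["prominence"] * prominence
        + WEIGHTS["proximity"] * proximity
        + WEIGHTS["citations"] * math.log1p(doc.citation_count)
    )
```

Sorting documents by `score(doc, query)` in descending order gives the ranking. Because each factor is a separate weighted term, any one of them can be adjusted or dropped without touching the others.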
Devaang and Jitesh, further to our previous discussions with the Machine Learning R&D team, the R&D team is starting to produce tangible results that may be used for these types of searches. Could you please add a comment below updating everyone on the progress and on how soon we'll be able to integrate these searches into the rebuilt ISLG and ILG?
In addition, in case the ML team is not able to deliver a satisfactory search tool, I would like to start exploring other search engines (e.g., Elastic: https://www.elastic.co/) and flesh out the pros and cons of each option.
Mitch, Kevin and Ryan, it would be great to get Industrial's input on these issues.
Thanks,
Morgan
Unfortunately, I'm out of the office on Thursday, so I will miss the meeting. Elastic is the system we typically deploy in Drupal projects when more complex search is required, although we have also looked into Apache Solr in the past.
We'd need to do more research in the context of ISLG and its requirements, but I think both would be good options to consider.
Thanks!
Ryan
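As a rough illustration of how Elastic could support the ranking factors raised in the original post, the sketch below builds an Elasticsearch query body as a Python dict. It assumes a hypothetical index whose documents have `title`, `body` and `citation_count` fields; `build_query` and the boost and slop values are illustrative only. Elasticsearch's default BM25 scoring already accounts for term frequency and field length, a `match_phrase` query with `slop` rewards words that appear close together, and a `function_score` query with `field_value_factor` can add a citation-based boost.

```python
import json


def build_query(text: str, slop: int = 3) -> dict:
    """Build an Elasticsearch _search body combining several ranking signals.

    Field names ("title", "body", "citation_count") are hypothetical.
    """
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        # Required: the text must match the body (BM25 scoring
                        # already accounts for term frequency and field length)
                        "must": [{"match": {"body": text}}],
                        "should": [
                            # Proximity: extra credit when the words appear close together
                            {"match_phrase": {"body": {"query": text, "slop": slop, "boost": 2.0}}},
                            # Prominence: extra credit for matches in the title
                            {"match": {"title": {"query": text, "boost": 3.0}}},
                        ],
                    }
                },
                # Citation signal: log10(1 + citation_count), added to the text score
                "field_value_factor": {
                    "field": "citation_count",
                    "modifier": "log1p",
                    "missing": 0,
                },
                "boost_mode": "sum",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_query("fair and equitable treatment"), indent=2))
```

The printed JSON would be sent as the body of a `_search` request against the index; the relative weight of each signal is then tuned through the boosts rather than by changing the engine.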
I know you are eagerly waiting for our update on the ML project. Dhrumil and Hiran are closer to the details, but I have asked them about their progress. They have finished the ML work and have manually loaded its output into Elasticsearch, and they tell me they are now automating that process. If you would like to know more, please contact them on our ML Project card or add them here so that they can give you more detailed information. In Thursday's call they will explain what they have done and what remains to finish this first POC.
Thanks,
Jitesh
Sounds good. I've added
Thanks,
Morgan
Apologies for stepping in before Dhrumil or Hiran could respond. Dhrumil has prepared a presentation, which we will all review tomorrow over a GTM screen share. The PoC is complete, with only some clean-up items remaining. After the presentation, we will also see the demo prototype in action.
A summary may not make much sense right now, as the context could be lost or misinterpreted. I am sure we will all have greater clarity after the presentation and the demo tomorrow.
Looking forward to our discussion tomorrow.
Best regards,
Devaang Bhatt | AVP, International Business
Microsoft Specialist, MCP
I'd like to ensure everyone is up to speed in advance of the presentation, so that we can ask informed questions.
Thanks,
Morgan
Here's the PPTX, as requested. Please review it so that we can have a comprehensive conversation together tomorrow.
Morgan
As discussed on our last call, I have set up a test environment on our server for our AzureML-based Advanced Search, using 30 sample PDFs. At the moment the search ranks results by keyword frequency; we can take it to the next level, as you require, in the upcoming phase.
Here is the link to the website. You can share it with your SMEs so that they can test it:
http://tologix.demo.wwhnetwork.net/
For reference, I have attached the 30 PDFs we ingested to produce these results. Please let me know if you have any concerns, and send us your feedback so that our team can move forward.
Thanks,
Morgan
These keywords will help inform their assessment of whether the demo is returning relevant and accurate results. They may also help us understand why connector words such as "and" and "or" are currently omitted from the results.
Thanks,
Morgan
Hi
I have attached the CSV file containing the keyword and phrase list we ingested for the auto-suggest. Regarding your concern about the "and" and "or" logic: the auto-suggester currently works like Google search, showing the best-matching values for the words you enter first. If you have typed more than one word, it first looks for phrases that cover all of the words entered. If none are found, it tries different combinations of the words, dropping one or more of them, and returns the best match.
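The fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the team's actual implementation: a plain Python list stands in for the attached CSV, matching is on whole words, "best match" is approximated as "shortest phrase first", and `suggest` and its parameters are hypothetical names. The sample phrases are illustrative only.

```python
from itertools import combinations


def suggest(query: str, phrases: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` phrases, preferring those that contain every query word."""
    words = query.lower().split()
    # Try all of the words first, then progressively smaller combinations
    for size in range(len(words), 0, -1):
        matches: list[str] = []
        for combo in combinations(words, size):
            for phrase in phrases:
                phrase_words = phrase.lower().split()
                if all(w in phrase_words for w in combo) and phrase not in matches:
                    matches.append(phrase)
        if matches:
            # Shorter phrases first; sorted() is stable, so ties keep list order
            return sorted(matches, key=len)[:limit]
    return []


if __name__ == "__main__":
    sample = ["fair and equitable treatment", "abuse of process", "fair market value"]
    print(suggest("fair treatment", sample))
```

Stopping at the largest combination size that produces any match is what gives phrases covering every typed word priority over partial matches.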
And yes, if offering options such as "and" and "or" operators would appeal more to your clients, we can also provide them, as Google does in its Advanced Search.
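If explicit operators were offered, one simple approach (a hypothetical sketch, not something the team has described) is an inverted index in which AND intersects and OR unions the sets of matching documents. In this sketch only upper-case AND and OR act as operators, so lower-case connector words such as "and" in "fair and equitable treatment" remain ordinary search terms. Operators are evaluated strictly left to right with no precedence, and `build_index` and `boolean_search` are illustrative names.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def boolean_search(query: str, index: dict[str, set[str]]) -> set[str]:
    """Evaluate a query such as 'fair AND process' left to right."""
    tokens = query.split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0].lower(), set()))
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in ("AND", "OR"):
            if i + 1 >= len(tokens):
                break  # ignore a dangling operator at the end
            operand = index.get(tokens[i + 1].lower(), set())
            result = result & operand if token == "AND" else result | operand
            i += 2
        else:
            # Adjacent plain terms are combined with an implicit AND
            result &= index.get(token.lower(), set())
            i += 1
    return result
```

Reserving the upper-case forms for operators is one way to avoid the conflict between "and" as a connector inside legal phrases and "and" as a search operator.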
Thank you for the keyword and phrase list. We'll follow up with a more detailed report in the days ahead, but it is curious that none of the keywords or phrases in the list include connector words, as in "fair and equitable treatment", "tantamount to expropriation" or "abuse of process". This is important because these phrases are crucial concepts that appear frequently throughout the sample documents, and clients would expect to see them. To give you some context, here are Google's suggested results for the following searches:
My expectation is that the auto-suggest search above would produce similar suggested phrases that include the connector words.
Thanks,
Morgan
Yes, you are right: connector words are currently missing, but we can add them. Could you please note down all of the issues and send me the list? I will then investigate them and get back to you on each one.
Morgan
Below is a report summarizing the findings from our testing of the auto-suggest prototype. Please let me know if you have any questions. Also, we're having a team call with
Thanks,
Morgan
Thank you for your valuable feedback. We will study the report and respond to each point you have raised.
I have attached our progress report. We are currently facing a problem with words such as 'and', 'or' and 'of': when we include these terms, the generated phrases are not accurate. I have included an example in the attachment for your reference.
Our team is now working on generating accurate phrases that include 'and', 'or' and 'of'.
In parallel, the team is also working to match the results that Contegra currently provides.
Please review the progress report and let me know if you have any concerns.
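One common heuristic for keeping connector words without producing inaccurate fragments is to let stop words appear inside a phrase but never at its start or end, to avoid phrases that span sentence boundaries, and to keep only phrases that recur. The sketch below illustrates that heuristic; it is an assumption about how the problem could be approached, not the team's implementation, and the stop-word list, `extract_phrases` and its thresholds are hypothetical.

```python
import re
from collections import Counter

# Illustrative stop-word list; a real list would be tuned to the corpus.
STOP_WORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


def extract_phrases(text: str, max_len: int = 4, min_count: int = 2) -> dict[str, int]:
    """Count phrases of 2..max_len words that occur at least min_count times.

    Stop words may appear inside a phrase ("abuse of process") but never at
    its start or end, which filters out fragments such as "of process".
    """
    counts: Counter = Counter()
    # Split into rough sentences so phrases never cross a sentence boundary
    for sentence in re.split(r"[.;:!?]", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    return {phrase: c for phrase, c in counts.items() if c >= min_count}
```

Shorter phrases nested inside longer ones (for example "fair and equitable" inside "fair and equitable treatment") are still reported; suppressing them when they never occur on their own would be a natural next refinement.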
Morgan
I will give you an update on this once we are ready for the next demo.
Morgan
We have now implemented logic that retains stop words such as 'and', 'or' and 'of' when generating phrases. We still need to measure how accurate it is; once we have that data, we will be able to give you an update.
Morgan