TOLOGIX - Machine Learning

Feedback on ML Auto-Suggest Search Prototype

Hi Devaang Bhatt and Dhrumil Shah,

After reviewing the latest version of the ML auto-suggest search prototype (http://tologix.devitsandbox.com/), I found three major issues:

  1. Expected phrases are still missing from the suggested results. In the example shown in the video below, a search in ISLG's Full Text Search for the phrase "tantamount to expropriation" returned more than 416 results, with more than 27 hits per document across the first 20 results. However, entering the term "tantamount" in the ML auto-suggest search does not produce "tantamount to expropriation" as a suggested phrase. Why does the prototype not suggest "tantamount to expropriation"?
  2. Continuing with the example above, one of the phrases suggested by the ML auto-suggest search when "tantamount" was entered into the search field was "measures tantamount nationalization". However, a search for "measures tantamount nationalization" in ISLG's Full Text Search produces no results. Why would the prototype suggest a phrase that has no results anywhere in the document collection?
  3. The ML auto-suggest search only suggests a term or phrase when a single term is entered into the search field. As soon as a space is added after the term, no suggested term or phrase is produced. Why can't the prototype provide suggested terms or phrases when multiple terms are entered into the search field?
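A minimal sketch of how issue #3 could be handled, assuming suggestions are drawn from a precomputed list of phrases: the whole query, spaces included, is treated as a prefix of the candidate phrase instead of stopping at the first space. The phrase list and the functions `normalize` and `suggest` are hypothetical illustrations, not the prototype's actual data or code.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace, keeping a single trailing space."""
    collapsed = " ".join(text.lower().split())
    # Keep a trailing space so "tantamount " only matches longer phrases.
    if text.endswith(" ") and collapsed:
        collapsed += " "
    return collapsed


def suggest(query: str, phrases: list[str], limit: int = 5) -> list[str]:
    """Return up to `limit` phrases that start with the full (multi-word) query."""
    prefix = normalize(query)
    if not prefix.strip():
        # Empty or whitespace-only input: nothing to suggest.
        return []
    matches = [p for p in phrases if normalize(p).startswith(prefix)]
    return matches[:limit]
```

For example, with the hypothetical list `["tantamount to expropriation", "tantamount to nationalization"]`, typing "tantamount to e" would still return "tantamount to expropriation". In practice the matches would presumably be ranked (for example by document frequency) before truncating to `limit`.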
Thanks,

Morgan

Comments & Events

Dhrumil Shah
Hi Morgan,

Thanks for your prompt feedback. Our team has started investigating these three issues and will get back to you with a detailed analysis of the cause of each.

One possible reason is that we have not yet processed the full document collection: so far we have used only around 2,300 to 2,400 PDFs. I will still verify the cause and let you know.

 
Morgan Maguire, CEO
Thanks, Dhrumil. I look forward to hearing from you. Note that I wouldn't expect the issues raised above to be significantly affected by a smaller document set, because these terms and issues are pervasive throughout the document collection.

Morgan
Dhrumil Shah
Hi Morgan, we have updated our ML model, so please review the site again. I have resolved the issues, but please let me know if you find anything else. Our team is waiting for your feedback.
Morgan Maguire, CEO
Hi Dhrumil,

Thanks for the update. It looks like issue #3 above is resolved, which is great. However, I think the engine needs a lot more refinement before we're ready to introduce it to subscribers.

Having said that, I'm still waiting for your and Devaang's comments on the approach I've outlined here: Path forward with Machine Learning project - TOLOGIX - Machine Learning. Please provide your comments, and then we can table this for discussion during the next team call on Thursday, December 5th.

Morgan
Dhrumil Shah
Hi Morgan,

I have also resolved the second issue. Previously, we were removing the stop word "to" from phrases, which is why the suggested phrase "measures tantamount nationalization" could not be found as an exact phrase. Now you will be able to find "measure tantamount to nationalization" in dtSearch.

Our team has also extracted almost twice as many phrases as before, so you will see more phrases in the results.

Regarding extending the capabilities of the search, I have seen your request about tracking user input and searches. Devaang and I will discuss this today or on Monday, decide on the possible ways to implement it, and let you know our thoughts.

Thanks for your feedback... :)
Morgan Maguire, CEO
Hi Dhrumil,

Yes, the fix to stop eliminating the "to" stop word is great. We want the search to handle full phrases, similar to a Google search.

At the same time, issue #2 above relates more to the fact that the current prototype is suggesting terms or phrases that will not produce hits in the Full Text Search. As explained in the video above, the phrase "measures tantamount nationalization" produces no hits in the Full Text Search because that exact combination of words does not exist in the text of the documents searched. This is a problem because the prototype is suggesting searches for phrases that don't actually exist in the document collection.
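One way to guarantee that every suggestion produces hits is to build candidate phrases only from contiguous word sequences that actually occur in the documents, keeping stop words such as "to". The sketch below assumes simple lowercase word tokenization; `extract_phrases`, `occurs_verbatim`, `STOP_WORDS`, and the sample sentences are hypothetical illustrations, not the prototype's pipeline. It also shows how dropping "to" before building n-grams yields "measures tantamount nationalization", a sequence that appears in no document.

```python
import re
from collections import Counter

# Hypothetical stop-word list, used only to reproduce the earlier behaviour.
STOP_WORDS = {"to", "of", "the", "a", "an", "and"}


def tokenize(text: str) -> list[str]:
    # Lowercase words; punctuation and whitespace act as separators.
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[str]:
    # All contiguous n-word sequences, joined with single spaces.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_phrases(docs, n_values=(2, 3, 4), min_docs=1, drop_stop_words=False):
    """Map each candidate phrase to the number of documents that contain it."""
    doc_freq = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        if drop_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        seen = set()
        for n in n_values:
            seen.update(ngrams(tokens, n))
        doc_freq.update(seen)  # count each phrase at most once per document
    return {p: c for p, c in doc_freq.items() if c >= min_docs}


def occurs_verbatim(phrase: str, docs) -> bool:
    """True if the phrase appears as a contiguous word sequence in some document."""
    target = tokenize(phrase)
    if not target:
        return False
    joined = " ".join(target)
    return any(joined in ngrams(tokenize(doc), len(target)) for doc in docs)
```

With stop words kept, every extracted phrase passes `occurs_verbatim` by construction; with `drop_stop_words=True`, a filter such as `[p for p in phrases if occurs_verbatim(p, docs)]` would be needed to discard phrases like "measures tantamount nationalization" before they are suggested.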

Thanks,

Morgan