Trey is SVP of Engineering @ Lucidworks, co-author of Solr in Action, founder of Celiaccess.com, and a researcher and public speaker on search, analytics, recommendation systems, and natural language processing.

If you’ve been to my talks in recent years, you’ve probably heard me present on many of the different components needed to build a Semantic Search system. This year, we got open source versions of several of those components committed to the Apache Solr project (namely the Solr Text Tagger, the Semantic Knowledge Graph, and the Statistical Phrase Identifier). Our engineering team at Lucidworks has also been busy introducing many of the semantic query pipelines and AI jobs (misspelling detection, phrase detection, synonym discovery, head/tail query rewriting, etc.) needed to tie these components together into an end-to-end query understanding system. This talk covers some of the linguistic and natural language processing theory behind how a semantic search system can infer user intent, and it walks through an example query pipeline that fits all of these pieces together into a unified semantic search system.
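Misspelling detection is one of the AI jobs listed above. This post does not describe how the Lucidworks job actually works, so what follows is only a minimal sketch of one common heuristic. It assumes you have a table of query counts mined from logs. A rare query is flagged as a likely misspelling when a much more frequent query lies within a small edit distance of it. The function names, the sample log, and the `max_distance` and `min_ratio` thresholds are all hypothetical.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]


def detect_misspellings(query_counts, max_distance=2, min_ratio=10.0):
    """Map rare queries to a much more frequent query that is a few edits away.

    Hypothetical heuristic: `rare` is treated as a misspelling of `common` when
    they are within `max_distance` edits and `common` occurs at least
    `min_ratio` times as often in the query log.
    """
    corrections = {}
    for rare, rare_count in query_counts.items():
        best = None
        for common, common_count in query_counts.items():
            if common == rare or common_count < min_ratio * rare_count:
                continue
            if edit_distance(rare, common) <= max_distance:
                # Prefer the most popular correction if several qualify.
                if best is None or common_count > query_counts[best]:
                    best = common
        if best is not None:
            corrections[rare] = best
    return corrections


# Hypothetical query-log counts.
log = Counter({"iphone": 950, "ipohne": 12, "iphon": 30, "laptop": 400, "laptp": 5})
print(detect_misspellings(log))
```

The nested loop is quadratic in the number of distinct queries. That is fine for a toy log, but a real job would need blocking or a candidate index to scale.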

Talk Abstract:
Building a semantic search system – one that can correctly parse and interpret end-user intent and return the ideal results for users’ queries – is not an easy task. It requires semantically parsing the terms, phrases, and structure within queries, disambiguating polysemous terms, correcting misspellings, expanding queries to synonymous or conceptually related concepts, and rewriting queries in a way that maps the correct interpretation of each end user’s query into the ideal representation of features and weights that will return the best results for that user. Not only that, but all of this must often be done within the confines of a very specific domain, rife with its own jargon and its own linguistic and conceptual nuances.
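To make several of these stages concrete, here is a toy sketch of a query rewriter. It relies on hand-written dictionaries (`SPELLING`, `PHRASES`, `SYNONYMS`, `STOPWORDS`, all hypothetical) that stand in for knowledge a real system would learn from its domain. The rewriter corrects misspellings, tags known multi-word phrases with a greedy longest-match pass (loosely analogous to what a text tagger does), drops stopwords, expands synonyms, and emits a Lucene-style boolean query string. It deliberately skips disambiguation of polysemous terms and feature weighting.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical knowledge sources; a real system would learn these from
# the domain's content and query logs instead of hard-coding them.
SPELLING: Dict[str, str] = {"machne": "machine", "lerning": "learning"}
PHRASES: Set[Tuple[str, ...]] = {("machine", "learning"), ("new", "york")}
SYNONYMS: Dict[str, List[str]] = {"jobs": ["careers", "positions"]}
STOPWORDS: Set[str] = {"in", "the", "of"}
MAX_PHRASE_LEN = 3


def correct(tokens: List[str]) -> List[str]:
    """Replace known misspellings token by token."""
    return [SPELLING.get(t, t) for t in tokens]


def tag_phrases(tokens: List[str]) -> List[str]:
    """Greedy longest-match tagging: at each position, take the longest known phrase."""
    concepts, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if n == 1 or candidate in PHRASES:
                concepts.append(" ".join(candidate))
                i += n
                break
    return concepts


def expand(concepts: List[str]) -> str:
    """Quote multi-word concepts, OR in synonyms, and AND the clauses together."""
    clauses = []
    for c in concepts:
        term = f'"{c}"' if " " in c else c
        alternatives = SYNONYMS.get(c, [])
        if alternatives:
            clauses.append("(" + " OR ".join([term, *alternatives]) + ")")
        else:
            clauses.append(term)
    return " AND ".join(clauses)


def rewrite(query: str) -> str:
    tokens = correct(query.lower().split())
    # Stopwords are removed only after tagging, so phrases may still contain them.
    concepts = [c for c in tag_phrases(tokens) if c not in STOPWORDS]
    return expand(concepts)


print(rewrite("Machne lerning jobs in New York"))
```

Because tagging runs after spelling correction, the corrected tokens can form a known phrase ("machine learning") that the raw query would have missed. The order of the stages matters.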

This talk will walk through the anatomy of a semantic search system and how each of the pieces described above fits together to deliver a final solution. We’ll leverage several recently released capabilities in Apache Solr (the Semantic Knowledge Graph, Solr Text Tagger, and Statistical Phrase Identifier) and Lucidworks Fusion (query log mining, misspelling job, word2vec job, query pipelines, and relevancy experiment backtesting) to show you an end-to-end working Semantic Search system that can automatically learn the nuances of any domain and deliver a substantially more relevant search experience.
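The Semantic Knowledge Graph judges how related two terms are by comparing how often a term appears in a foreground document set (for example, the documents matching another term) with how often it appears in the corpus as a whole. The sketch below illustrates that idea with a plain binomial z-score over a hypothetical in-memory corpus. Solr's own `relatedness()` aggregation in the JSON Facet API uses its own scoring and normalization, which this sketch does not reproduce. The function name and the sample documents are assumptions made for illustration.

```python
import math
from typing import List, Set


def relatedness_z(docs: List[Set[str]], query_term: str, candidate: str) -> float:
    """Binomial z-score of how over-represented `candidate` is among docs containing `query_term`.

    Background: the whole corpus, where `candidate` occurs with probability p.
    Foreground: the n docs containing `query_term`, of which k contain `candidate`.
    A positive score means `candidate` co-occurs more often than chance predicts.
    """
    p = sum(1 for d in docs if candidate in d) / len(docs)
    foreground = [d for d in docs if query_term in d]
    n = len(foreground)
    if n == 0 or p in (0.0, 1.0):
        return 0.0  # no signal: empty foreground or a term that is never or always present
    k = sum(1 for d in foreground if candidate in d)
    expected = n * p
    return (k - expected) / math.sqrt(n * p * (1 - p))


# Hypothetical toy corpus: each document is reduced to its set of terms.
docs = [
    {"java", "jvm", "spring"},
    {"java", "jvm", "kotlin"},
    {"java", "coffee", "espresso"},
    {"coffee", "espresso", "latte"},
    {"coffee", "latte"},
    {"python", "django"},
]

for term in ["jvm", "spring", "coffee"]:
    print(term, round(relatedness_z(docs, "java", term), 3))
```

In this toy corpus, "jvm" scores as strongly related to "java" and "coffee" scores negatively, even though one document mentions both. Contrasting foreground with background in this way lets a knowledge graph separate a domain's dominant sense of a term from incidental co-occurrences.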