About Trey

Founder @ Searchkernel, author of AI-Powered Search and Solr in Action, startup advisor, and researcher/public speaker on search, relevance & ranking, recommendation systems, and natural language processing.

The Future of Search and AI

October 18th, 2018

Below is my closing keynote from Activate 2018 (The Search and AI conference), held in Montreal, Canada, October 15-18, 2018.

Slides:

https://www.slideshare.net/treygrainger/the-future-of-search-and-ai

Video:

If you’ve been to my talks in recent years, you’ve probably heard me present on many of the different components needed to build a Semantic Search system. This year, we got open source versions of several of those components committed to the Apache Solr project (i.e. the Solr Text Tagger, Semantic Knowledge Graph, and Statistical Phrase Identifier). Our engineering team at Lucidworks has also been busy working to introduce many of the semantic query pipelines and AI jobs (misspelling detection, phrase detection, synonym discovery, head/tail query rewriting, etc.) needed to tie these components together into an end-to-end query understanding system. This talk goes into some of the linguistic and natural language processing theory behind how a semantic search system can infer user intent, and it walks through an example query pipeline which fits all of these pieces together into a unified semantic search system.

Slides:

https://www.slideshare.net/treygrainger/how-to-build-a-semantic-search-system

Video:

Talk Abstract:
Building a semantic search system – one that can correctly parse and interpret end-user intent and return the ideal results for users’ queries – is not an easy task. It requires semantically parsing the terms, phrases, and structure within queries, disambiguating polysemous terms, correcting misspellings, expanding to conceptually synonymous or related concepts, and rewriting queries in a way that maps the correct interpretation of each end user’s query into the ideal representation of features and weights that will return the best results for that user. Not only that, but the above must often be done within the confines of a very specific domain – rife with its own jargon and linguistic and conceptual nuances.

This talk will walk through the anatomy of a semantic search system and how each of the pieces described above fit together to deliver a final solution. We’ll leverage several recently-released capabilities in Apache Solr (the Semantic Knowledge Graph, Solr Text Tagger, Statistical Phrase Identifier) and Lucidworks Fusion (query log mining, misspelling job, word2vec job, query pipelines, relevancy experiment backtesting) to show you an end-to-end working Semantic Search system that can automatically learn the nuances of any domain and deliver a substantially more relevant search experience.
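
To make those stages concrete, below is a minimal illustrative sketch (in Python) of how such a query pipeline might chain together. Every function here is a hypothetical toy stand-in for a real component – e.g. a learned misspelling-correction model, the Solr Text Tagger and Statistical Phrase Identifier for phrase detection, and the Semantic Knowledge Graph for related-concept expansion – not the actual implementation demonstrated in the talk:

```python
# Illustrative sketch only: the stages of a semantic query pipeline.
# Each function is a toy stand-in for a real, trained/indexed component.

def correct_spelling(query: str) -> str:
    """Stand-in for a learned misspelling-correction model."""
    return query.replace("machne", "machine")  # toy correction

def identify_phrases(query: str) -> list[str]:
    """Stand-in for statistical phrase identification / text tagging."""
    return ["machine learning", "engineer"]  # toy segmentation

def expand_concepts(phrase: str) -> list[str]:
    """Stand-in for synonym and related-concept discovery."""
    return {"machine learning": ["ML", "data science"]}.get(phrase, [])

def rewrite(phrases: list[str]) -> str:
    """Rewrite the interpreted query into a weighted engine query."""
    clauses = []
    for p in phrases:
        expansions = " OR ".join(f'"{e}"^0.8' for e in expand_concepts(p))
        clauses.append(f'("{p}" OR {expansions})' if expansions else f'"{p}"')
    return " AND ".join(clauses)

query = correct_spelling("machne learning engineer")
print(rewrite(identify_phrases(query)))
# => ("machine learning" OR "ML"^0.8 OR "data science"^0.8) AND "engineer"
```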

Today I gave a really fun presentation at the Southern Data Science Conference entitled “Search for Meaning: The Hidden Structure in Unstructured Data”. While I always try to leave people who attend my talks with some new knowledge or insight, this presentation is probably the most philosophical I’ve ever given. It’s a bit of a dive into linguistic theory, some search theory, some graph theory, and some fun examples demonstrating fundamental mistakes and deep insights into what it takes to build software and algorithms that can truly learn from and understand the immensely rich relationships found in what has historically been referred to as just “unstructured” data.

Slides:

https://www.slideshare.net/treygrainger/searching-for-meaning

Search Relevance is my favorite topic to focus upon within the search and information retrieval domain. As such, I was elated when I found out that the team at Open Source Connections was putting together a conference this year called “Haystack” focused entirely on Relevance. I was honored to be invited to talk on the topic of “The Relevance of the Apache Solr Semantic Knowledge Graph.” If you’ve heard me present before on the Semantic Knowledge Graph, you’ll know that it is a really powerful tool for driving semantic understanding within search results. It was great getting to talk with so many like-minded relevance practitioners at Haystack, and to be able to share the below presentation with them, as well.

Slides:

https://www.slideshare.net/treygrainger/relevance-of-the-apache-solr-semantic-knowledge-graph

I was invited to give a guest lecture today to the Furman University Big Data: Mining and Analytics class (CSC272). Special thanks to Dr. Kevin Treu, department chair of Furman’s Computer Science program, for the invite. We had a great time diving into the many different kinds of data mining and analytics used by search and recommendation engines to learn from users and turn that learning into actionable intelligence to drive more and more relevant search experiences. Slides from the presentation are available below.

Slides:

https://www.slideshare.net/treygrainger/intent-algorithms-of-search-and-recommendation-engines

This year’s Lucene/Solr Revolution was held in Las Vegas, and was a blast as always. I had the fortune to present on the Apache Solr Semantic Knowledge Graph. The Semantic Knowledge Graph is a project that I was able to work on with my team while I was at CareerBuilder, and which CareerBuilder subsequently agreed to let us open source as both a standalone project and a patch back to the Apache Solr project.

Slides:

https://www.slideshare.net/treygrainger/the-apache-solr-semantic-knowledge-graph

Video:

Talk Abstract:
What if instead of a query returning documents, you could alternatively return other keywords most related to the query: i.e. given a search for “data science”, return results like “machine learning”, “predictive modeling”, “artificial neural networks”, etc.? Solr’s Semantic Knowledge Graph does just that. It leverages the inverted index to automatically model the significance of relationships between every term in the inverted index (even across multiple fields), allowing real-time traversal and ranking of any relationship within your documents. Use cases for the Semantic Knowledge Graph include disambiguation of multiple meanings of terms (does “driver” mean truck driver, printer driver, a type of golf club, etc.), searching on vectors of related keywords to form a conceptual search (versus just a text match), powering recommendation algorithms, ranking lists of keywords based upon conceptual cohesion to reduce noise, summarizing documents by extracting their most significant terms, and numerous other applications involving anomaly detection, significance/relationship discovery, and semantic search. In this talk, we’ll do a deep dive into the internals of how the Semantic Knowledge Graph works and will walk you through how to get up and running with an example dataset to explore the meaningful relationships hidden within your data.
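
For a concrete (and hedged) taste of how this is exposed: the Semantic Knowledge Graph’s ranking capability was contributed to Apache Solr as the relatedness() aggregate in the JSON Facet API, which ranks facet buckets by their semantic relatedness to a foreground query rather than by raw counts. The collection (“jobs”) and field (“skills”) below are hypothetical placeholders, not from the talk:

```python
# Hedged sketch: a Semantic Knowledge Graph traversal via Solr's JSON Facet
# API "relatedness()" aggregate. Collection and field names are hypothetical.
import requests

SOLR = "http://localhost:8983/solr/jobs/select"  # hypothetical collection

skg_request = {
    "query": 'skills:"data science"',  # foreground set: docs matching the query
    "limit": 0,                        # we only want the facet buckets back
    "params": {
        "fore": 'skills:"data science"',  # foreground query for relatedness()
        "back": "*:*",                    # background set: the entire index
    },
    "facet": {
        "related_skills": {
            "type": "terms",
            "field": "skills",
            "limit": 10,
            # rank terms by semantic relatedness, not raw co-occurrence count
            "sort": {"relatedness": "desc"},
            "facet": {"relatedness": "relatedness($fore,$back)"},
        }
    },
}

resp = requests.post(SOLR, json=skg_request).json()
for bucket in resp["facets"]["related_skills"]["buckets"]:
    print(bucket["val"], round(bucket["relatedness"]["relatedness"], 4))
```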

Last night I had the opportunity to speak at the Greenville Data Science & Analytics Meetup on “Building Search & Recommendation Engines”. It was a great opportunity to present a general introduction to Apache Solr, search engines, relevancy, recommendations, and generally building intelligent information retrieval systems. I appreciated the level of interest and insightful questions from everyone who attended, and I look forward to more great events from this group in the future!

Slides:

http://www.slideshare.net/treygrainger/building-search-and-recommendation-engines

Talk Abstract:
In this talk, you’ll learn how to build your own search and recommendation engine based on the open source Apache Lucene/Solr project. We’ll dive into some of the data science behind how search engines work, covering multi-lingual text analysis, natural language processing, relevancy ranking algorithms, knowledge graphs, reflected intelligence, collaborative filtering, and other machine learning techniques used to drive relevant results for free-text queries. We’ll also demonstrate how to build a recommendation engine leveraging the same platform and techniques that power search for most of the world’s top companies. You’ll walk away from this presentation with the toolbox you need to go and implement your very own search-based product using your own data.
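
As a small, hedged taste of the recommendations side of that toolbox: Solr’s MoreLikeThis query parser can power simple content-based recommendations by finding documents statistically similar to a seed document. The collection, fields, and document id below are hypothetical placeholders:

```python
# Hedged sketch: content-based recommendations via Solr's MoreLikeThis
# query parser. Collection, field, and document id are hypothetical.
import requests

SOLR = "http://localhost:8983/solr/products/select"  # hypothetical collection

params = {
    # find docs with text statistically similar to seed doc "PRODUCT-123"
    "q": "{!mlt qf=title,description mintf=1 mindf=2}PRODUCT-123",
    "fl": "id,title,score",
    "rows": 5,
}
docs = requests.get(SOLR, params=params).json()["response"]["docs"]
for doc in docs:
    print(doc["id"], doc.get("title"), doc["score"])
```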

I had a blast at the Southern Data Science Conference yesterday in Atlanta, GA, where I presented a talk titled “Intent Algorithms: The Data Science of Smart Information Retrieval Systems”. This was the first year the conference was held, and it’s already clear that it is going to hold the title of preeminent Data Science conference in the Southeast United States. Top speakers, authors, and industry and academic practitioners were represented from the likes of Google, Lucidworks, NASA, Microsoft, Allen Institute for AI, Skymind, CareerBuilder, Glassdoor, Distil Networks, Takt, Elephant Scale, AT&T, Macy’s Technology, Los Alamos National Laboratory, Georgia Tech, The University of Georgia, and the South Big Data Hub. I had a lot to cover on the topic of “intent algorithms”, so the talk went at quite a rapid pace (due to the 30-minute time limit) to be sure everyone walked away with a solid understanding of the topic. There’s a lot of good material and demos in the presentation, though, so it’s definitely worth checking out the video or slides below!

Slides:

https://www.slideshare.net/treygrainger/intent-algorithms

Video:

Talk Abstract:
Search engines, recommendation systems, advertising networks, and even data analytics tools all share the same end goal – to deliver the most relevant information possible to meet a given information need (usually in real-time). Perfecting these systems requires algorithms which can build a deep understanding of the domains represented by the underlying data, understand the nuanced ways in which words and phrases should be parsed and interpreted within different contexts, score the relationships between arbitrary phrases and concepts, continually learn from users’ context and interactions to make the system smarter, and generate custom models of personalized tastes for each user of the system.

In this talk, we’ll dive into both the philosophical questions associated with such systems (“How do you accurately represent and interpret the meaning of words?”, “How do you prevent filter bubbles?”, etc.) and practical examples of how these systems have been successfully implemented in production, combining a variety of available commercial and open source components (inverted indexes, entity extraction, similarity scoring and machine-learned ranking, auto-generated knowledge graphs, phrase interpretation and concept expansion, etc.).

I had a blast last night at the DFW Data Science Meetup presenting on “The Apache Solr Smart Data Ecosystem.” There’s so much going on in the Apache Lucene/Solr world around data intelligence and relevancy, and we had so many questions and great discussion along the way, that the presentation and discussion lasted nearly 3.5 hours! It was great to have such a welcoming and actively engaged audience the whole way through and to be able to dive deep into topics with everyone – thanks @dfwdatascience for your hospitality and for hosting such a great event!

Slides:

http://www.slideshare.net/treygrainger/apache-solr-smart-data-ecosystem

Video:

Talk Abstract:
Search engines, and Apache Solr in particular, are quickly shifting the focus away from “big data” systems storing massive amounts of raw (but largely unharnessed) content, to “smart data” systems where the most relevant and actionable content is quickly surfaced instead. Apache Solr is the blazing-fast and fault-tolerant distributed search engine leveraged by 90% of Fortune 500 companies. As a community-driven open source project, Solr brings in diverse contributions from many of the top companies in the world, particularly those for whom returning the most relevant results is mission critical.

Out of the box, Solr includes advanced capabilities like learning to rank (machine-learned ranking), graph queries and distributed graph traversals, job scheduling for processing batch and streaming data workloads, the ability to build and deploy machine learning models, and a wide variety of query parsers and functions allowing you to very easily build highly relevant and domain-specific semantic search, recommendations, or personalized search experiences. These days, Solr even enables you to run SQL queries directly against it, mixing and matching the full power of Solr’s free-text, geospatial, and other search capabilities with a prominent query language already known by most developers (and which many external systems can use to query Solr directly).
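
For example, here’s a hedged sketch of what querying Solr’s Parallel SQL interface (the /sql request handler) might look like from Python – the “jobs” collection and its “city” field are hypothetical placeholders:

```python
# Hedged sketch: running a SQL query against Solr's /sql request handler.
# The "jobs" collection and "city" field are hypothetical.
import requests

SOLR_SQL = "http://localhost:8983/solr/jobs/sql"  # hypothetical collection

stmt = """
    SELECT city, COUNT(*) AS openings
    FROM jobs
    GROUP BY city
    ORDER BY COUNT(*) DESC
    LIMIT 10
"""
resp = requests.post(SOLR_SQL, data={"stmt": stmt}).json()
for row in resp["result-set"]["docs"]:
    if "EOF" not in row:  # the tuple stream terminates with an EOF marker
        print(row["city"], row["openings"])
```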

Due to the community-oriented nature of Solr, the ecosystem of capabilities also spans well beyond just the core project. In this talk, we’ll also cover several other projects within the larger Apache Lucene/Solr ecosystem that further enhance Solr’s smart data capabilities: bi-directional integration of Apache Spark and Solr’s capabilities, large-scale entity extraction, semantic knowledge graphs for discovering, traversing, and scoring meaningful relationships within your data, auto-generation of domain-specific ontologies, running SPARQL queries against Solr on RDF triples, probabilistic identification of key phrases within a query or document, conceptual search leveraging Word2Vec, and even Lucidworks’ own Fusion project which extends Solr to provide an enterprise-ready smart data platform out of the box.

We’ll dive into how all of these capabilities can fit within your data science toolbox, and you’ll come away with a really good feel for how to build highly relevant “smart data” applications leveraging these key technologies.

I was a panelist today for the South Big Data Hub’s open panel on Text Data Analysis. The event was geared toward researchers and companies working on text mining in any sector. Topics discussed included web scraping, the semantic web, analysis tools in R and Python, and the benefits of open source search engines such as Solr and Elasticsearch, as well as current industry search options.

In my presentation, I provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse at where the industry is heading with regard to implementing more intelligent and relevant semantic search. Slides are attached here for future reference.

Slides: