How to Perform Semantic Search in SQL

SQL databases such as Postgres have been around for nearly three decades. Years of extensive use by developers have proven their reliability and extensibility, and as a result they form the backbone of a wide range of applications and data warehouses.
In standard SQL, the SIMILAR TO keyword performs regular-expression-style pattern matching. We reused this familiar syntax to make it possible to perform semantic search (also known as vector search) over semi-structured data:
SELECT * FROM reviews WHERE review_body SIMILAR TO 'product does not match images' LIMIT 100
And that’s it. NNextDB (pronounced “nextdb”) lets you perform semantic search in SQL without installing special plug-ins or doing extra data engineering.

Semantic Search vs. Lexical Search

Today, SQL databases like Postgres and search engines like Elasticsearch support full-text (lexical) search with advanced features such as:
  • Stemming, which matches words by their root form, so that related words such as “friend”, “friendly”, and “friendship” can be matched to a common root.
  • Ignoring stop words, common words such as “the”, “a”, and “with” that add little to the meaning of a query.
  • Weighting selected columns to boost their relevance in search ranking. For instance, on a product review site you might want to rank matches in the review title more heavily than matches in the review text.
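To make these features concrete, here is a toy Python sketch of stop-word removal, suffix stemming, and column weighting. It is a deliberately simplified illustration, not how Postgres or Elasticsearch implement full-text search; the stop-word list, the suffix list, and the `title_weight` parameter are assumptions chosen for the example.

```python
# Toy illustration of three lexical (full-text) search features:
# stop-word removal, suffix stemming, and per-column weighting.
# Simplified on purpose: real engines use language-specific stemmers
# and richer ranking functions. The word lists below are assumptions.
import re

STOP_WORDS = {"the", "a", "an", "with", "and", "of", "to", "is"}
SUFFIXES = ("ship", "ly", "s")  # hypothetical, deliberately tiny suffix list


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text: str) -> set[str]:
    """Stemmed search terms with stop words removed."""
    return {stem(t) for t in tokenize(text) if t not in STOP_WORDS}


def score(query: str, title: str, body: str, title_weight: float = 2.0) -> float:
    """Count matched query terms, weighting title matches more heavily."""
    q = terms(query)
    return title_weight * len(q & terms(title)) + len(q & terms(body))


print(terms("A friendly friendship with friends"))  # {'friend'}
print(score("friendly service", title="Great friend", body="the service was friendly"))  # 4.0
```

The crude suffix list also mangles some words (it turns “does” into “doe”), which is why real systems rely on tested stemmers such as Snowball.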
Semantic search goes further: it captures the meaning and context of a query rather than only the words in it. Suppose you need to search text-based product reviews to understand what your customers are saying about your business. The following queries all mean roughly the same thing:
  • “product does not match images”
  • “images misrepresent the product”
  • “the product looked completely different when it arrived”
  • “the color of the item did not match the color in the picture”
With semantic search, you can surface these related reviews much more easily, even though they share few keywords. What’s more, similarity search isn’t limited to text: it can also be applied to images, videos, and other complex data.
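One quick way to see why keyword matching struggles with these queries is to measure how many words they actually share. The Python sketch below computes Jaccard similarity (shared words divided by all distinct words) for every pair of the four example queries. It uses plain word sets with no stemming or stop-word removal, a simplifying assumption.

```python
# Word-overlap (Jaccard) similarity between the example queries.
# Plain word sets, no stemming or stop-word removal (a simplifying assumption).
import re
from itertools import combinations

QUERIES = [
    "product does not match images",
    "images misrepresent the product",
    "the product looked completely different when it arrived",
    "the color of the item did not match the color in the picture",
]


def words(text: str) -> set[str]:
    """Distinct lowercase words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Shared words divided by all distinct words across both texts."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)


# Print every pairwise score.
for a, b in combinations(QUERIES, 2):
    print(f"{jaccard(a, b):.2f}  {a!r} vs {b!r}")
```

Even the closest pair, “product does not match images” and “images misrepresent the product”, shares only 2 of 7 distinct words (about 0.29), although a reader would take both as the same complaint.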

Advanced Filtering and SQL Queries

One main advantage of NNextDB is that you keep the familiar filtering syntax of vanilla SQL. For example, using the Amazon reviews dataset, you can restrict matches to reviews longer than 50 characters with a star rating between 2 and 4:
SELECT review_body, * FROM "amazon_reviews_us_Outdoors_v1_00" WHERE review_body SIMILAR TO 'product does not match images' AND LENGTH (review_body) > 50 AND (star_rating >= 2 AND star_rating <= 4) LIMIT 100
This is a key advantage because you don’t need to learn a new querying mechanism.
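Conceptually, the filtered query above combines ordinary predicates with a similarity ranking. The Python sketch below does the same by brute force: it applies the WHERE conditions, ranks the surviving rows by cosine similarity to a query embedding, and keeps the first LIMIT rows. It is not NNextDB’s implementation; the `Review` class, the `search` function, and the 3-dimensional embeddings are hypothetical stand-ins (real embeddings come from a model and have far more dimensions).

```python
# Brute-force sketch of a filtered semantic query: WHERE predicates first,
# then rank by cosine similarity, then LIMIT. All data below is invented.
import math
from dataclasses import dataclass


@dataclass
class Review:
    review_body: str
    star_rating: int
    embedding: list[float]  # hypothetical precomputed embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(rows: list[Review], query_vec: list[float], limit: int = 100) -> list[Review]:
    # WHERE LENGTH(review_body) > 50 AND (star_rating >= 2 AND star_rating <= 4)
    candidates = [
        r for r in rows
        if len(r.review_body) > 50 and 2 <= r.star_rating <= 4
    ]
    # Most similar first, then apply LIMIT.
    candidates.sort(key=lambda r: cosine(r.embedding, query_vec), reverse=True)
    return candidates[:limit]


rows = [
    Review("The color of the item did not match the color in the picture at all.", 2, [0.9, 0.1, 0.0]),
    Review("Sturdy tent, survived a week of heavy rain without a single leak.", 5, [0.0, 0.2, 0.9]),
    Review("Looked completely different when it arrived compared with the photos.", 3, [0.8, 0.3, 0.1]),
    Review("Too small.", 2, [0.9, 0.1, 0.0]),
]
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedding of 'product does not match images'

for r in search(rows, query_vec):
    print(r.star_rating, r.review_body)
```

Filtering before ranking, as here, means the LIMIT is filled only with rows that already satisfy the WHERE clause; an index that ranks first and filters afterwards can return fewer rows than requested. The source does not say which approach NNextDB takes.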

What is Vector or Semantic Search?

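In brief, semantic search uses machine-learning models to encode the meaning and context of unstructured data, including text, images, and videos, into a numeric representation called a vector embedding, typically a high-dimensional vector of floating-point numbers. Items with similar meanings map to vectors that lie close together, so finding related content becomes a matter of finding nearby vectors.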
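“Close together” requires a similarity measure. The source does not say which one NNextDB uses; cosine similarity, which compares the angle between two embeddings and ignores their length, is a common choice for text embeddings. For two d-dimensional vectors a and b, with a small worked example in two dimensions:

```latex
\begin{aligned}
\cos(\mathbf{a}, \mathbf{b})
  &= \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
   = \frac{\sum_{i=1}^{d} a_i b_i}{\sqrt{\sum_{i=1}^{d} a_i^2} \, \sqrt{\sum_{i=1}^{d} b_i^2}} \\[6pt]
\mathbf{a} = (1, 1),\ \mathbf{b} = (1, 0):\quad
\cos(\mathbf{a}, \mathbf{b})
  &= \frac{1 \cdot 1 + 1 \cdot 0}{\sqrt{2} \cdot 1} = \frac{1}{\sqrt{2}} \approx 0.707
\end{aligned}
```

A value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions.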

How Does Semantic Search Work in the Background?

NNextDB generates vector embeddings from the query text using a predefined model, in this case OpenAI’s text-embedding-ada-002. It then finds the stored vectors closest to the query vector using an approximate nearest neighbor (ANN) algorithm such as Hierarchical Navigable Small World (HNSW) graphs, which trade a small amount of accuracy for much faster search than comparing the query against every stored vector. Integrations are underway that will let you choose from a wider variety of models, such as CLIP, Cohere’s models, and models from Hugging Face.
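HNSW approximates exact nearest neighbor search. For contrast, the Python sketch below shows the exact brute-force k-nearest-neighbor search that ANN algorithms speed up. It is neither HNSW nor NNextDB’s code: the vectors are invented 3-dimensional stand-ins (text-embedding-ada-002 produces 1,536-dimensional vectors), the `knn` function and the ids are hypothetical, and cosine similarity as the metric is an assumption.

```python
# Exact (brute-force) k-nearest-neighbor search: the baseline that ANN
# algorithms such as HNSW approximate. Every stored vector is scored, so the
# cost grows linearly with the number of vectors. Vectors are normalized so
# the dot product equals cosine similarity (an assumed metric).
import heapq
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def knn(vectors: dict[str, list[float]], query: list[float], k: int) -> list[tuple[str, float]]:
    """Return the k (id, similarity) pairs most similar to the query, best first."""
    q = normalize(query)
    scored = (
        (key, sum(a * b for a, b in zip(normalize(vec), q)))
        for key, vec in vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])


stored = {
    "mismatch-photos": [0.9, 0.1, 0.0],
    "arrived-different": [0.8, 0.3, 0.1],
    "waterproof-tent": [0.0, 0.2, 0.9],
}
for key, sim in knn(stored, [1.0, 0.2, 0.0], k=2):
    print(f"{sim:.3f}  {key}")
```

Because every query scans all n stored vectors of dimension d, the cost is proportional to n × d per query. HNSW instead builds a layered proximity graph and walks it greedily toward the query, visiting only a small fraction of the vectors, at the cost of occasionally missing a true nearest neighbor.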

How can I get access to NNextDB?

NNextDB is still in private beta; however, you can get limited access now.
To get early access to NNextDB, sign up here