Covering Disruptive Technology Powering Business in The Digital Age

Supercharging Search with Vector Technology

Written By: Martin Dale Bolima, Tech Journalist, AOPG

 

Unless you have been living under a rock for the past several years, you know how to do a traditional search, right?

You open Google or Bing or Yahoo! and you type a keyword or phrase—say, “Cassandra.”

The algorithms will turn up thousands of results containing the keyword “Cassandra.” However, they can be about Cassandra the ballerina or Cassandra the comic book character. They can also be about Apache Cassandra, an open-source NoSQL distributed database used by companies like DataStax.

You can refine your search, that’s for sure, but not as precisely as you would like. That is the nature of traditional search: it is wholly dependent on keywords and phrases and will search only on those terms.

Traditional search, however, is giving way to vector search. It is a type of search technique that leverages modern digital technologies like Artificial Intelligence (AI), Machine Learning (ML), generative AI, and more.

The Vector Search Defined

The vector search is defined by real-time data firm DataStax as “a method in artificial intelligence and data retrieval that uses mathematical vectors to represent and efficiently search through complex, unstructured data.”

Also known as nearest neighbour search, vector search finds, within given sources and repositories, the sets of data that most closely match a query. This, in turn, leads to more precise search results. It is possible because in vector search both data and queries are represented as vectors, so the search can use the distances between those vectors to find similarities and semantic relationships.
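To make the idea concrete, here is a minimal sketch of a nearest neighbour search in plain Python. It assumes cosine similarity as the distance measure, which is one common choice and not necessarily what any particular product uses. The three-dimensional vectors and their labels are made up for illustration; real embeddings come from a model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, items, k=2):
    """Return the names of the k items whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical vectors: (database-ness, dance-ness, comics-ness).
DOCUMENTS = {
    "Apache Cassandra docs": [0.9, 0.0, 0.1],
    "Cassandra the ballerina": [0.0, 0.95, 0.05],
    "Cassandra comic character": [0.1, 0.1, 0.9],
}

# Hypothetical vector for a query such as "Cassandra NoSQL database".
QUERY = [0.8, 0.05, 0.1]

print(nearest_neighbours(QUERY, DOCUMENTS, k=1))
```

With these toy numbers, the database document ranks first because its vector points in nearly the same direction as the query, even though all three documents share the keyword "Cassandra."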

And so the question is: What is a vector?

“Think of vectors as floating point mathematical representations that represent a feature or an attribute of a particular element. So, the vector itself can have a large number of dimensions, which could range from the tens to the thousands,” explained Deb Dutta, vector search expert and former General Manager for Asia Pacific and Japan at DataStax, in an exclusive virtual interview with Disruptive Tech News (DTN).

These vectors, in turn, make up the vector database, which Dutta says helps make data retrieval easier and faster.

“The vector database is not just about structured data. Vector database is optimised to store any kind of unstructured data. It could be text, it could be images, it could be audio, it could be video,” Dutta pointed out. “The nature of these elements is converted as embeddings and stored in a repository in a way that can be easily searched and retrieved. And the way they are stored is as floating point representations so they can be easily retrieved.”
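As a rough illustration of what Dutta describes, the sketch below turns text into a list of floating point numbers and stores it for later retrieval. Everything here is an assumption made for the example: `toy_embed` is a hashed bag-of-words stand-in, not a real embedding model, and `ToyVectorStore` is a hypothetical in-memory class, not the API of any vector database. It handles text only; images, audio, and video would need their own embedding models.

```python
import hashlib
import math

DIMENSIONS = 16  # real embeddings typically have hundreds or thousands of dimensions


def toy_embed(text):
    """Map text to a unit-length list of floats (toy stand-in for an embedding model)."""
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vector[digest[0] % DIMENSIONS] += 1.0  # bucket each word by its hash
    length = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / length for v in vector]


class ToyVectorStore:
    """Hypothetical in-memory store: keeps (embedding, payload) pairs."""

    def __init__(self):
        self._rows = []

    def add(self, payload):
        self._rows.append((toy_embed(payload), payload))

    def search(self, query, k=1):
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, vec)), payload)
            for vec, payload in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [payload for _, payload in scored[:k]]


store = ToyVectorStore()
store.add("how to reset a forgotten card PIN")
store.add("opening hours for the downtown branch")
print(store.search("reset my card PIN", k=1))
```

A production system would swap `toy_embed` for a trained model and the list scan for an index built for fast approximate search, but the shape of the flow (embed, store, embed the query, compare) stays the same.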

The result, again, is search in overdrive—fast, efficient, on-point, and accurate.

But it does not necessarily mean vector search is taking over traditional search. At least not yet, according to Dutta.

Levelling Up Vector Search with Generative AI

Dutta is firm in his stance that vector search is not replacing traditional search. Instead, he believes the two will be complementary—even as artificial intelligence and generative AI are revolutionising the way people are now interacting with data.

“Given the advancements in Large Language Models (LLMs), [vector search will] absolutely complement [traditional search] because traditional search is not going to go away,” Dutta said. “Relational databases are not going away… [Vector search] is not replacing traditional databases and traditional queries. It is extending them.”

And, to prove a point, Dutta gave DTN a short demonstration of the underlying concept of vector search, where content and context intersect seamlessly. The demo, fittingly, involved ChatGPT 4, which Dutta describes as the most powerful LLM out there—at least for now.

The demo showed a typical content-based exchange between a chatbot and an “imaginary” customer—yours truly in this case. The premise is that the customer is inquiring about an error in his card application at a bank, only to get a generic answer that describes the error and several steps that might help in redoing the application.

While ChatGPT 4 “understood” the initial query and provided an answer, the response does not necessarily address the underlying reasons the error occurred. That’s because it does not take into account the context of the issue at hand. Neither does it consider the factors that made it a major concern in the first place.

This is the kind of ambiguity vector search can prevent by combining content with context.

Vector Search: Content-Rich, Context-Aware

With its capability to find similarities and semantic relationships and to connect the dots, so to speak, vector search adds context to the equation to provide hyper-precise and useful answers to queries.

Dutta aptly describes this context-powered capability as the “correlation between this capability of having a superbly intelligent agent working with any kind of structured and unstructured data in a database in giving recommendations and responses to customers.”

And he is right.

Vector search is so powerful, so cutting-edge that it can take something as simple as an account number and make sense of it, from the possible concerns associated with that account to the potential queries the owner might ask. It can even use images as visual prompts in much the same way today’s most powerful generative AI platforms do.

The only thing is, setting up vector search is not as easy as 1-2-3 given the many moving parts involved.

The Vector Search Challenge

Dutta knows full well the difficulties around building a vector search platform. After all, he has been involved in building it for different companies across all conceivable verticals for DataStax.

“So, basically, the biggest challenge is putting the pieces together because what I just showed you or what I was just talking about is a combination of multiple things,” Dutta pointed out.

And each of those things, it turns out, is a handful already.

“There is a large language model that is the agent in front of that. There’s the vector database, and there are things called ‘frameworks,’ which are the building blocks for applications like these [generative AI],” Dutta explained further. “So, you’ve got these pieces, the agents, the large language model, and the vector database. All of this needs to be put together and then you have your regular database.”
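The pieces Dutta lists can be pictured with a short sketch. It assumes a flow often called retrieval-augmented generation: look up the customer in the regular database, fetch related passages from the vector database, and hand both to the large language model as context. Every name below (`build_prompt`, `answer`, `fake_retrieve`, `fake_llm`, the account record) is hypothetical, and the retrieval and LLM calls are stubs so the example runs on its own.

```python
from typing import Callable, Dict, List


def build_prompt(question: str, account: Dict[str, str], passages: List[str]) -> str:
    """Combine the customer's question with retrieved context for the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        f"Customer account status: {account['status']}\n"
        f"Relevant knowledge:\n{context}\n"
        f"Question: {question}"
    )


def answer(
    question: str,
    account_id: str,
    accounts: Dict[str, Dict[str, str]],
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    """Glue code, the role a 'framework' plays in the description above."""
    account = accounts[account_id]   # regular database lookup
    passages = retrieve(question)    # vector database lookup
    return llm(build_prompt(question, account, passages))  # LLM call


# Stand-in for the regular database (made-up record).
ACCOUNTS = {"ACC-001": {"status": "card application rejected: address mismatch"}}


def fake_retrieve(question: str) -> List[str]:
    """Stub for a vector search; returns a canned passage."""
    return ["Address mismatches are resolved by uploading proof of address."]


def fake_llm(prompt: str) -> str:
    """Stub for an LLM call; echoes the prompt it was given."""
    return "PROMPT RECEIVED:\n" + prompt


reply = answer(
    "Why did my card application fail?", "ACC-001", ACCOUNTS, fake_retrieve, fake_llm
)
print(reply)
```

Passing `retrieve` and `llm` in as functions keeps the glue code independent of any particular vendor, which is one way to manage the integration work the quote describes.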

This is where tech providers like DataStax come in: to develop this powerful platform and enable companies to utilise it for whatever purpose they want.

“Putting the tools together and integrating them and making sure that they work in a commercial real-time environment is where the challenge is because you don’t want to put up a solution and then the solution does not perform or the solution breaks down, doing more damage than good,” Dutta noted. “This is where companies like DataStax are coming up and saying that, okay, we will substantiate the whole solution along….”

Supercharging Search with DataStax

DataStax is not stopping there.

DataStax works with a couple of implementation partners, trained by the company itself, who come in to do the tooling and the full integration of all the components that make up the solution. More than that, DataStax has created an ecosystem of the different technologies that need to work together with the vector database to deliver the outcomes expected of vector technology.

In other words, DataStax has created a platform that will supercharge search with vector technology, and it is making it widely available to organisations that wish to take advantage of it, all at very competitive price points.

This could not have come at a better time, with Dutta seeing considerable upticks in vector search adoption, first in the US and even in the Asia Pacific region, where a number of customers are getting into production usage already.

He expects adoption to pick up even more moving forward, which should come as no surprise because vector search can help companies in all sorts of ways.

It is what technology is for, after all.
