Most of the data generated today is unstructured: things like video, images, and audio. Machine learning, which relies on vast quantities of data, will increasingly need to perform well on these unstructured formats. But how do we enable machines to “see” a video and draw meaning from it? Enter vector embeddings.
Vector embeddings are a powerful tool for applying machine learning to unstructured data. In this blog post, we’ll talk about what vector embeddings are, the history behind their development and their current use cases.
At a high level, vector embeddings (sometimes called neural embeddings) are succinct representations of data in the form of a vector (a list of numbers), computed by a neural network so that similar items end up close to one another in a high-dimensional space.
Let’s use an analogy to help illustrate. Consider a number stored in an Excel spreadsheet. Its location is described by two dimensions: a row reference and a column reference. Now imagine the spreadsheet were a cube, locating the number in three-dimensional space. Vector embeddings are like this, but instead of two or three dimensions, they have hundreds. (Formally, an embedding is a vector, which is itself a special case of a tensor, the more general mathematical object.) All those dimensions are needed to represent something as complex as an image or video.
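To make “similar items end up close together” concrete, here is a toy sketch in Python. The vectors are only 4-dimensional and their values are invented for illustration; real embeddings have hundreds of dimensions and come from a trained model:

```python
import numpy as np

# Toy embeddings; the values are made up for illustration.
cat = np.array([0.9, 0.1, 0.8, 0.2])
dog = np.array([0.8, 0.2, 0.7, 0.3])
car = np.array([0.1, 0.9, 0.2, 0.8])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(cat, dog))  # ~0.99: related concepts sit close together
print(cosine_similarity(cat, car))  # ~0.33: unrelated concepts sit far apart
```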
The history of vector embeddings goes back to the 1950s, when researchers in linguistics and information retrieval began representing words and documents as vectors of numbers, a format that was far easier to compute with. By the 1990s, this idea had evolved into what we now know as vector embeddings: a technique that allows computers to represent unstructured data in numerical form and to perform operations on those representations. Their importance in AI and ML applications is only increasing as more and more emphasis is placed on unstructured data.
How are these vectors generated for my data?
Vector embeddings are generated through a process sometimes called vector representation or vector encoding. This process involves taking an “entity” (essentially an input data object) and mapping it to a “vector” of numbers: the high-dimensional data point we described in the spreadsheet analogy earlier. An entity here could be a transaction, a user profile, an image, a sound, a long piece of text (a sentence or paragraph), a time series, or a graph. This enables engineers to represent the original input (e.g., an image or video) as a vector.
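As a concrete sketch of this entity-to-vector mapping for text, the snippet below uses the open-source sentence-transformers library with its all-MiniLM-L6-v2 model. This is just one of many possible model choices; any embedding model works the same way:

```python
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The quick brown fox jumps over the lazy dog.",
    "A fast auburn fox leaps above a sleepy hound.",
]

# Each sentence is mapped to one 384-dimensional vector.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 384)
```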
To generate an embedding from an image, the first step is to extract the image’s features. Classically, this was done with handcrafted computer vision techniques such as edge detection, color histograms, and texture analysis, which identify the edges, shapes, textures, and other characteristics of the image. Modern systems more often learn these features with a neural network, for example by taking the activations of a pretrained convolutional network. Either way, the extracted features are then used to construct the vector representation.
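For instance, one common way to get learned image features is to run the image through a pretrained convolutional network and keep the activations just before the classification layer. A minimal sketch, assuming PyTorch and torchvision are installed and "flower.jpg" is a placeholder path:

```python
import torch
from torchvision import models
from PIL import Image

# Load a pretrained ResNet-18 and replace its classifier with the identity,
# so the model outputs a 512-dimensional feature vector instead of class scores.
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.fc = torch.nn.Identity()
model.eval()

preprocess = weights.transforms()  # the resizing/normalization the model expects
image = Image.open("flower.jpg").convert("RGB")

with torch.no_grad():
    embedding = model(preprocess(image).unsqueeze(0)).squeeze(0)

print(embedding.shape)  # torch.Size([512])
```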
In a handcrafted pipeline, by contrast, the vector representation is constructed by assigning each feature a numerical value. For example, an image of a flower might be described by features such as petal count, dominant color, edge density, and texture, each measured as a number (e.g., petal count = 5, dominant hue = 0.83). Concatenating these numerical values produces the vector representation of the image.
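In code, that construction step is just stacking the measured values into an array. The numbers below are hypothetical, invented purely for illustration:

```python
import numpy as np

# Hypothetical measurements extracted from a flower image.
petal_count   = 5     # from shape analysis
dominant_hue  = 0.83  # from a color histogram, normalized to [0, 1]
edge_density  = 0.42  # fraction of pixels flagged by an edge detector
texture_score = 0.17  # summary statistic of a texture filter response

feature_vector = np.array([petal_count, dominant_hue, edge_density, texture_score])
print(feature_vector)  # [5.   0.83 0.42 0.17]
```

Learned embeddings work the same way conceptually; the difference is that the network discovers which features are worth measuring.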
What do these embeddings mean?
A vector embedding is a compressed, numerical representation of a data object that captures the most essential features of the entity in a form computers and databases can easily compare to other embeddings.
This vector embedding can then be used for various applications, such as similarity search using approximate nearest neighbors (ANN). ANN is a powerful tool for extracting meaning from unstructured data. For example, because embeddings of similar landscape images cluster together, a nearest-neighbor query can tell you whether a given photo sits closest to the beaches, the mountains, or the forests in your collection.
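Here is a minimal ANN sketch using the open-source FAISS library (assuming faiss-cpu is installed); the random vectors stand in for real image embeddings:

```python
import numpy as np
import faiss  # pip install faiss-cpu

dim = 128
rng = np.random.default_rng(0)
corpus = rng.standard_normal((10_000, dim)).astype("float32")  # 10k stored embeddings

# HNSW is a popular graph-based ANN index; 32 is the number of graph neighbors per node.
index = faiss.IndexHNSWFlat(dim, 32)
index.add(corpus)

query = rng.standard_normal((1, dim)).astype("float32")  # embedding of a new image
distances, ids = index.search(query, 5)
print(ids[0])  # indices of the 5 most similar stored embeddings
```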
Overall, vector embeddings are an incredibly powerful tool for representing unstructured data, whether text, images, or audio, in a way that computers can understand and manipulate. This technology has been instrumental in the development of AI and ML-driven applications, and it is likely to become even more important in the future as the volume of unstructured data continues to grow.