Oracle’s 23ai release introduces a groundbreaking feature: Vector Search. This innovation empowers Oracle Database to seamlessly integrate diverse data types, enabling you to perform powerful, semantic searches alongside traditional business data queries within a single system.
Unlike keyword-based searches, Oracle AI Vector Search leverages AI to understand the meaning behind your queries. This eliminates the need for separate vector databases, streamlining your data infrastructure and reducing complexity.
Furthermore, Vector Search seamlessly integrates with other Oracle Database features, ensuring robust security, optimal performance, and high availability. You can directly access and manipulate data using standard SQL, without requiring any data format conversions. The ability to effortlessly combine disparate data types through simple joins further enhances the power and flexibility of Vector Search.
Introduction
Vector search is an advanced technology that enables searches by analyzing the similarity or distance between data points represented as multidimensional vectors. Unlike traditional keyword-based search, vector search dives deeper by focusing on the underlying meaning and context of words rather than relying solely on exact word matches. This approach makes vector search a highly effective solution for delivering more accurate and relevant results, enhancing search experiences by aligning with the intent behind the query.
Fig 1: Overview of Oracle AI Vector Search
The main process of the vector search
These three are the main processes of vector search:
1. Generating Vector Embeddings:
Use a vector embedding model to create a vector representation model and load it into the database.
2. Storing Vector Embeddings:
Create a column in the table with datatype as a vector where we going to do the vector search and update the added column with the loaded model.
3. Similarity Search:
Performing similarity searches using SQL.
Defining and Utilizing Vectors for Enhanced Data Representation in Oracle Database
VECTOR Data Type:
- The VECTOR data type represents a vector as a series of numbers.
- We need to define the number of dimensions for the vector.
Vector Embeddings:
- Vector embeddings capture the meaning of content like words, documents, audio, or images.
- They represent how similar objects are by placing them close together in a mathematical space.
- You can use different pre trained, open-source models to create these embeddings.
Using Open Neural Network Exchange (ONNX):
- You can import compatible models into Oracle Database using ONNX.
- This allows you to generate vector embeddings directly in the database using SQL.
Choose the Model:
We can’t use vectors effectively unless we can create them from our data. We can either generate them outside and load them into the database or use a model in the database to create vectors from our data. I don’t know how to make my own model, so I’ll use one from pre-loaded model developed by Hugging Face called all-MiniLM-L12-v2. We will utilize it to create vectors from regular text.
You can use the predefined ONNX model from here.
What is all-MiniLM-L12-v2?
Hugging Face’s all-MiniLM-L12-v2 model is a compact yet highly effective tool for natural language processing (NLP) tasks. It is designed for efficiency and works well for tasks like checking sentence similarity and classifying text. This model has 12 layers, making it faster and less demanding on resources than larger models, while still maintaining high accuracy. It’s a great option for developers working on NLP projects.
We have a normal table with data and now here we are going to load the vector data model(all-MiniLM-L12-v2) into the database and going to create a new column in the present table with vector datatype. After this, we need to update the new column with a loaded model from the old column of the table which we are going to use to do vector search.
Current table (IPL_CRI_MATCHES) with the columns:
Fig 2
we have loaded the IPL matches details with Summery on the IPL_CRI_MATCHES Table.
Fig 3
To Import the Model:
We create a folder to store the model, download it, and extract it into that folder.
Fig 4
We can now load the model into the database using the DBMS_VECTOR package.
Fig 5
We check the model information in the USER_MINING_MODELS view.
Fig 6
Updating the table IPL_CRI_MATCHES with Vector column:
We are going to add a new column as IPL_MATCH_VECTOR with vector datatype in the current IPL_CRI_MATCHES table
Fig 7
Fig 8
Now we are going to update the added column with the loaded model using the present column MATCH_SUMMARY where we are going to do vector search.
Fig 9
Now the IPL_MATCH_VECTOR column has loaded with the vector values generated.
Fig 10
Vector search:
Now we are going to search the IPL_MATCH with BAT in them.
The output will show a similarity search related to the matche_summary column containing the word “bat” displaying the first 5 rows.
Fig 11
Next the IPL_MATCH with BOWLER in them,
Fig 12
Difference Between Query Search and Vector Search:
Feature |
Query Search |
Vector Search |
Data |
Structured data |
Unstructured data |
Data Type |
Uses regular data types (e.g., integers, strings) |
Uses vector data types (e.g., float arrays) for embeddings |
Query Language |
SQL |
SQL with extensions for vectors |
Search Type |
Exact matches, range queries |
Similarity searches (e.g., cosine) |
Use Cases |
Transaction databases, CRM |
Recommendation systems, NLP |
Performance |
Fast for structured data |
Efficient for large unstructured datasets |
Complexity |
Simpler query structure |
May require ML models for embeddings |
Integration |
Standard database features |
Machine learning integration |
Result Interpretation |
Clear, precise results |
Results based on similarity |
SQL QUERY for example |
SELECT teams,
stadium,
city,
year,
MATCH_SUMMARY
FROM HR.IPL_CRI_MATCHES
WHERE MATCH_SUMMARY LIKE ‘%bowling%’
ORDER BY MATCH_SUMMARY
FETCH FIRST 5 ROWS ONLY; |
Variable search_text varchar2(100);
exec :search_text := ‘bowling’;
set linesize 200
column teams format a50
column match_summary format a100
SELECT vector_distance(IPL_MATCH_VECTOR, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
teams,
stadium,
city,
year,
MATCH_SUMMARY
FROM HR.IPL_CRI_MATCHES
order by 1
fetch approximately first 5 rows only; |
Query search output:
Fig 13
Vector search output:
Fig 14
Conclusion:
In summary, the query search is good for finding exact words, while the vector search is better in finding related ideas. But if you’re looking for specific words, the query search works well. If you need to understand the context and find similar items, vector search is usually more helpful.