
Building LLM applications with vector search in Azure Cognitive Services

Tools like Semantic Kernel, TypeChat, and LangChain make it possible to build applications around generative AI technologies like Azure OpenAI by constraining the underlying large language model (LLM). LLMs traverse semantic spaces, predicting the next token in a chain of tokens. Open-ended questions can produce plausible-sounding but meaningless output. Grounding LLM results in reliable data sources like Wikipedia, Stack Overflow, and Reddit helps prevent misleading and illogical output and gives users accurate, clear replies.

Using semantic memory to constrain large language models

Microsoft’s new LLM-based development stack includes tools for constraining LLMs so that they generate text from a reduced collection of data. This can be accomplished with tools such as TypeChat or Semantic Kernel, which ground the model in a well-defined semantic space. The basis of this technique is semantic memory, which uses vector search to build prompts that yield factual output. Microsoft’s Bing Chat takes this approach, drawing on Bing’s native vector search features. Semantic memory is what makes usable, grounded LLM-based applications possible, and it can be implemented with open-source vector databases or by adding vector indexes to SQL and NoSQL databases.
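The grounding loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not the Semantic Kernel or Azure API: the snippet texts, the 3-dimensional hand-made vectors, and the `ground_prompt` helper are all invented for the example, standing in for real embeddings from a model such as Azure OpenAI’s embedding endpoint.

```python
import math

# Toy "semantic memory": each entry pairs a text snippet with a
# pre-computed embedding vector. In a real system the vectors would
# come from an embedding model; here they are hand-made 3-d stand-ins.
MEMORY = [
    ("Azure Cognitive Search supports vector indexes.", [0.9, 0.1, 0.0]),
    ("Semantic Kernel orchestrates LLM prompts.",       [0.1, 0.9, 0.1]),
    ("HNSW graphs speed up nearest-neighbor search.",   [0.8, 0.0, 0.2]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def ground_prompt(question, query_vector, k=2):
    """Retrieve the k memory entries most similar to the query vector
    and prepend them to the question, constraining the model to
    answer from known facts rather than open-ended generation."""
    ranked = sorted(MEMORY, key=lambda e: cosine(e[1], query_vector), reverse=True)
    context = "\n".join(text for text, _ in ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# The query vector would normally come from embedding the question itself.
prompt = ground_prompt("How does vector search work?", [0.85, 0.05, 0.1])
```

The resulting prompt contains only the top-k retrieved snippets plus the question, which is the essence of constraining an LLM with semantic memory.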

Azure Cognitive Search now includes vector indexing

Azure Cognitive Search is Microsoft’s search engine platform, combining Lucene-based search with natural language queries. It runs over your own private data and can draw on Cognitive Services APIs to enrich it. The service now provides vector indexes, letting AI-based applications perform similarity searches. Azure Cognitive Search integrates with other Azure services to deliver high availability and low latency, and enterprise applications can use Microsoft Entra ID to manage access to confidential data.


Create and store embedding vectors for your content

Azure Cognitive Search is a “bring your own embedding vector” service: customers must generate embeddings for their content using Azure OpenAI or the OpenAI embedding APIs. The service then runs a nearest-neighbor search against the vector index to retrieve documents that are similar to the original query. Microsoft uses this as part of the Retrieval Augmented Generation (RAG) design pattern in Azure Machine Learning, which offers a low-code way to build and use vector indexes.
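A “bring your own embeddings” workflow means your code, not the search service, produces each document’s vector before upload. The sketch below shows that shape with a deterministic hash-based stand-in for the embedding call; the field names (`id`, `content`, `contentVector`) and the `embed` function are illustrative assumptions, since real field names must match your index schema and a real app would call an embeddings API here.

```python
import hashlib

def embed(text, dims=8):
    """Deterministic stand-in for an embedding model: hash the text
    into a fixed-length vector of floats in [0, 1]. A real app would
    call an embeddings API (e.g. Azure OpenAI) here instead."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dims]]

def to_search_document(doc_id, content):
    """Shape a document for upload to a vector-enabled search index.
    The field names here are hypothetical; they must match whatever
    fields your own index schema defines."""
    return {"id": doc_id, "content": content, "contentVector": embed(content)}

# Prepare a batch of documents, each carrying its own embedding.
batch = [to_search_document(str(i), text)
         for i, text in enumerate(["First article body", "Second article body"])]
```

Because the embedding travels with the document, re-indexing content later requires regenerating vectors with the same model, or query vectors and document vectors will no longer live in the same space.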

How to get started with vector search in Azure Cognitive Search

Azure Cognitive Search’s vector query tooling uses embeddings to load a search index quickly. To use it, create Azure OpenAI and Cognitive Search resources in the same region, generate embeddings for your content, and load the index using asynchronous calls. Vectors are stored in a search index as vector fields and mapped with a Hierarchical Navigable Small World (HNSW) proximity graph. The data set must be stored as JSON documents, though the index does not have to include the source documents. To run a query, first pass the query text to the embedding model, then send the resulting vector along with the target vector index, the number of matches to return, and the associated text fields.
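The HNSW structure mentioned above is, at its core, a proximity graph searched greedily: hop to whichever neighbor is closest to the query until no neighbor improves on the current node. A minimal single-layer sketch of that idea follows; the points, edges, and entry node are all made up, and real HNSW adds multiple layers and neighbor-selection heuristics.

```python
import math

# A tiny proximity graph: 2-d points plus neighbor lists.
POINTS = {
    "a": (0.0, 0.0), "b": (1.0, 0.0), "c": (2.0, 0.5),
    "d": (3.0, 1.0), "e": (1.0, 2.0),
}
EDGES = {
    "a": ["b", "e"], "b": ["a", "c", "e"], "c": ["b", "d", "e"],
    "d": ["c"], "e": ["a", "b", "c"],
}

def greedy_search(query, entry="a"):
    """Walk the graph from the entry node, always moving to the
    neighbor closest to the query, stopping when no neighbor is an
    improvement. This is the core search step of HNSW-style indexes."""
    current = entry
    while True:
        best = min(EDGES[current], key=lambda n: math.dist(POINTS[n], query))
        if math.dist(POINTS[best], query) >= math.dist(POINTS[current], query):
            return current
        current = best
```

Because each step only inspects a node’s neighbors, query cost grows with path length rather than with the total number of indexed vectors, which is why graph indexes scale to large collections.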

Moving on from basic text vectors

The vector capabilities of Azure Cognitive Search go far beyond simple text matching. To support searches across documents in several languages, Cognitive Search can work with multilingual embeddings. You can also use more sophisticated queries: for example, a hybrid search that combines Bing semantic search technologies with vector search can deliver more accurate results, boosting the quality of your LLM-powered application’s output.
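A hybrid search produces two ranked result lists, one from keyword scoring and one from vector similarity, which must be merged into a single ranking. Reciprocal Rank Fusion (RRF) is a standard way to do this merging, and Microsoft documents it as the basis of hybrid ranking in Azure Cognitive Search; the toy document ids below are invented for illustration.

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: each document's fused score is the sum
    of 1 / (k + rank) over every ranked list that contains it. The
    constant k (60 is the conventional default) damps the influence
    of top ranks so one list cannot dominate."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc2", "doc1", "doc4"]   # from the text query
vector_hits  = ["doc1", "doc3", "doc2"]   # from the vector query
fused = rrf([keyword_hits, vector_hits])
```

Here doc1 wins the fused ranking because it places highly in both lists, even though neither individual search ranked it first on its own, which is exactly the behavior that makes hybrid results more accurate than either method alone.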


Microsoft is combining technologies and techniques from its GPT-4-powered Bing search engine and its Copilots, such as orchestration engines like Semantic Kernel and Azure AI Studio’s Prompt Flow, to improve large language models, vector search, and index delivery while lowering costs and learning curves.