Friday Links 0.0.13 - Azure Search

This is based on an email I send my .NET team at work.

Happy Friday,

Today is all about Azure Search.

Background

Azure Search is a hosted full-text document search index. You upload documents to the search index, and then query it using natural-feeling search terms.

It seems really similar to Lucene, which is what’s used in most of the CMSs we work with.

Create the index

https://azure.microsoft.com/en-us/documentation/articles/search-create-index-dotnet/

In this step you define which fields are available in the index, their types, and some parameters that control how they can be searched. Long text fields like titles and descriptions can be marked as IsSearchable, which means Azure can search them using full-text indexes. Other fields can be marked as IsSortable or IsFilterable, depending on how you want to use them when processing your results.

Each document needs a unique key field marked with IsKey. This is how Azure knows to update the document on subsequent re-indexing.
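To make the field flags concrete, here is a rough sketch of an index definition as the JSON the REST API accepts. It is in Python with only the standard library so it runs without an Azure account. The "commits" index and its field names are made up for illustration, and the lowercase "key", "searchable", "filterable" and "sortable" attributes are the REST counterparts of the SDK's IsKey, IsSearchable, IsFilterable and IsSortable as I understand them.

```python
import json


def field(name, edm_type, key=False, searchable=False, filterable=False, sortable=False):
    """Build one field definition in the shape the REST API expects."""
    return {
        "name": name,
        "type": edm_type,
        "key": key,
        "searchable": searchable,
        "filterable": filterable,
        "sortable": sortable,
    }


# Hypothetical index of git commits; the names are illustrative only.
index_definition = {
    "name": "commits",
    "fields": [
        field("id", "Edm.String", key=True),
        field("message", "Edm.String", searchable=True),
        field("author", "Edm.String", filterable=True),
        field("committed", "Edm.DateTimeOffset", filterable=True, sortable=True),
    ],
}


def key_field(index):
    """Return the name of the single key field, or raise if the rule is broken."""
    keys = [f for f in index["fields"] if f["key"]]
    if len(keys) != 1:
        raise ValueError("an index needs exactly one key field")
    if keys[0]["type"] != "Edm.String":
        raise ValueError("the key field must be a string")
    return keys[0]["name"]


if __name__ == "__main__":
    print("key field:", key_field(index_definition))
    print(json.dumps(index_definition, indent=2))
```

The key_field check is only a local sanity check; the service does its own validation when you create the index.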

Add documents to the index

There are a couple of ways to add documents to the index.

https://azure.microsoft.com/en-us/documentation/articles/search-import-data-dotnet/

One is to manually push batches of documents. For each index operation you can set how Azure should treat any existing document with the same key. This is a good choice when you’re indexing .NET objects you have in memory, or data that is not well represented in a single SQL table.
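As a sketch of what a pushed batch looks like on the wire, the snippet below builds the JSON body for the documents endpoint. The per-document "@search.action" value is the option that says what to do with an existing document that has the same key. The helper names (build_batch, chunks) and the sample documents are my own, and the 1000-document batch size is an assumption based on the documented service limit, so check it against the current docs.

```python
import json

# Actions the documents endpoint accepts for each document in a batch.
ALLOWED_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def build_batch(documents, action="mergeOrUpload"):
    """Wrap plain dicts in the batch envelope, tagging each with an action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unknown index action: " + action)
    return {"value": [{"@search.action": action, **doc} for doc in documents]}


def chunks(items, size=1000):
    """Split a list into consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    # Hypothetical documents; "id" plays the role of the key field.
    commits = [
        {"id": "a1b2", "message": "Fix failing tests", "author": "alice"},
        {"id": "c3d4", "message": "Add search page", "author": "bob"},
    ]
    for part in chunks(commits):
        print(json.dumps(build_batch(part), indent=2))
```

With "mergeOrUpload", re-running the same batch updates existing documents instead of failing or duplicating them, which is usually what you want for re-indexing.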

https://azure.microsoft.com/en-us/documentation/articles/search-indexer-overview/

The second option is to set up Azure Search to pull data directly from a data source, like a SQL table, DocumentDB or blob storage. With the right configuration it can notice updates and keep its own index in sync. This looks like a great option when the data is easily represented in persistent storage and you don’t want to write any code to keep the search index up to date.
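For a feel of the "right configuration", here is a hedged sketch of the two JSON definitions involved for an Azure SQL table: a data source with a change detection policy, and an indexer that runs on a schedule. Every name (the table, the RowVersion column, the connection string placeholder, the make_sql_indexer helper) is hypothetical, and the property names reflect my reading of the REST API, so treat this as a starting point rather than a reference.

```python
import json


def make_sql_indexer(name, connection_string, table, index_name, change_column, interval="PT1H"):
    """Return (data_source, indexer) definitions that pull one SQL table into an index."""
    data_source = {
        "name": name + "-source",
        "type": "azuresql",
        "credentials": {"connectionString": connection_string},
        "container": {"name": table},
        # A column that increases on every change lets the indexer pick up only new rows.
        "dataChangeDetectionPolicy": {
            "@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
            "highWaterMarkColumnName": change_column,
        },
    }
    indexer = {
        "name": name + "-indexer",
        "dataSourceName": data_source["name"],
        "targetIndexName": index_name,
        "schedule": {"interval": interval},  # ISO 8601 duration, here one hour
    }
    return data_source, indexer


if __name__ == "__main__":
    source, indexer = make_sql_indexer(
        name="products",
        connection_string="<your connection string>",  # placeholder, not a real value
        table="Products",
        index_name="products",
        change_column="RowVersion",
    )
    print(json.dumps(source, indent=2))
    print(json.dumps(indexer, indent=2))
```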

Search the index

https://azure.microsoft.com/en-us/documentation/articles/search-query-dotnet/

Finally, you can query the index, providing options for sorting, filtering, and full-text searching. If you don’t specify a sort order, results are sorted by relevance.
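Here is a small sketch of a search request body that combines those options. The filter uses OData syntax and the field names are hypothetical. build_search_request is my own helper, not part of any SDK. When no sort order is given the "orderby" key is simply left out, so the service falls back to relevance.

```python
import json


def build_search_request(text, filter_expr=None, order_by=None, top=10):
    """Build the JSON body for a search POST, omitting options that were not set."""
    body = {"search": text, "top": top}
    if filter_expr is not None:
        body["filter"] = filter_expr  # OData expression, e.g. "author eq 'alice'"
    if order_by is not None:
        body["orderby"] = order_by  # e.g. "committed desc"; omit for relevance order
    return body


if __name__ == "__main__":
    by_relevance = build_search_request("test")
    newest_by_alice = build_search_request(
        "test", filter_expr="author eq 'alice'", order_by="committed desc"
    )
    print(json.dumps(by_relevance))
    print(json.dumps(newest_by_alice))
```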

It also provides options for hit highlighting, so Azure can mark the matched terms in your search results for you. This matters because the search can use stemming: even though you searched for “test”, it might return results for “testing” or “tests” as well. A manual string replacement might highlight only half the word in that case.
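The half-highlighted word problem is easy to reproduce. In the sketch below, the hit is a hand-written stand-in for one search result (not real service output), shaped the way I understand results come back when you ask for highlighting on a field: the marked-up fragments sit under "@search.highlights", wrapped in em tags by default.

```python
# Hypothetical search hit for the query "test" with highlighting on "message".
hit = {
    "message": "Fix failing tests in the parser",
    "@search.highlights": {"message": ["Fix failing <em>tests</em> in the parser"]},
}


def naive_highlight(text, term):
    """Wrap literal occurrences of the search term, ignoring stemming."""
    return text.replace(term, "<em>" + term + "</em>")


def server_highlight(result, field_name):
    """Prefer the fragment the service marked up; fall back to the raw field."""
    fragments = result.get("@search.highlights", {}).get(field_name)
    return fragments[0] if fragments else result[field_name]


if __name__ == "__main__":
    print(naive_highlight(hit["message"], "test"))  # marks only part of "tests"
    print(server_highlight(hit, "message"))         # marks the whole matched word
```

The naive version wraps only the "test" inside "tests", while the server-provided fragment covers the whole word that actually matched.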

Improving Results

Search results can generally be improved by boosting the relevance of matches in certain fields. For example, when searching products, a term appearing in the title is a better indication of relevance than the same term appearing in the description.

https://msdn.microsoft.com/library/azure/dn798928.aspx

One way to tell Azure how to boost relevance is with scoring profiles. You can use the API or the portal to add a profile to the index that tells Azure that certain fields are more relevant for search results than others. A profile can also say that more recent documents are likely to be more relevant than older ones.
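As a sketch, a scoring profile for the products example could look like the JSON below, which goes in the index definition's "scoringProfiles" list. The profile name, field names, weights and 30-day window are all made up, and the structure follows my understanding of the REST API. The "text" weights cover the "title beats description" idea and the "freshness" function covers the "newer is better" idea.

```python
import json

# Hypothetical profile for a products index with title, description and lastUpdated fields.
scoring_profile = {
    "name": "boostTitleAndRecent",
    "text": {
        # A match in the title counts five times as much as one in the description.
        "weights": {"title": 5, "description": 1},
    },
    "functions": [
        {
            "type": "freshness",
            "fieldName": "lastUpdated",  # assumed to be a filterable date field
            "boost": 2,
            "interpolation": "linear",
            "freshness": {"boostingDuration": "P30D"},  # ISO 8601: the last 30 days
        }
    ],
}

# The profile is stored on the index and selected per query by name.
index_patch = {"scoringProfiles": [scoring_profile]}
search_request = {"search": "bike", "scoringProfile": scoring_profile["name"]}

if __name__ == "__main__":
    print(json.dumps(index_patch, indent=2))
    print(json.dumps(search_request))
```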

https://msdn.microsoft.com/library/mt589323.aspx

You can also use Lucene’s query syntax to do ad-hoc boosting. This is a good fit when you’re already familiar with Lucene and want to play with the relevance factors on a query-by-query basis.

You can also boost on a term-by-term basis by appending ^ and a boost factor to a term.
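A quick sketch of both styles, assuming the request opts in to the full Lucene syntax with "queryType": "full". The helper names are mine, the field names are hypothetical, and the helpers assume plain single-word terms, so they do no escaping of Lucene special characters.

```python
import json


def boost_fields(term, field_boosts):
    """Field-scoped boosting: the same term searched in several fields with different weights."""
    return " OR ".join(
        "{0}:{1}^{2}".format(field_name, term, boost)
        for field_name, boost in field_boosts.items()
    )


def boost_terms(term_boosts):
    """Term-by-term boosting: each term carries its own weight; a boost of 1 is left implicit."""
    return " ".join(
        term if boost == 1 else "{0}^{1}".format(term, boost)
        for term, boost in term_boosts.items()
    )


if __name__ == "__main__":
    by_field = {"search": boost_fields("bike", {"title": 3, "description": 1}), "queryType": "full"}
    by_term = {"search": boost_terms({"mountain": 2, "bike": 1}), "queryType": "full"}
    print(json.dumps(by_field))
    print(json.dumps(by_term))
```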

Example

I’ve got an example app that indexes git commits in Azure Search and lets you search for them. See the AzureSearch class for examples of using the Azure Search SDK.

Next time a client asks you for “Google quality” search, know that while you might not be able to do that, you can at least get Bing quality with Azure Search.