[Elasticsearch] Building a simple E-commerce search with Elasticsearch — Part 1

Mallikarjuna J S
Feb 15, 2020

In this post, let us build the basic search functionality that an e-commerce site needs. I am using Elasticsearch as the search engine. To learn the basics of search engines, you can go through my other post.

Let's get started.

Step 1 — Preparing your data

The first step in building a search solution is to prepare the data to be searched. I am using a minimal set of fields, chosen from those commonly used for search on e-commerce sites. Here is the books data I will be using in this post.

Books json data.

Step 2 — Building settings and mappings

The settings and mappings determine how your data is stored in the index and what kinds of queries you can run against it. They should be configured based on the fields present in your data, so that you can query the index efficiently and get relevant results.

A basic configuration usually covers tokenising, stemming, stop-word removal and synonym expansion.

In this example I am using a custom analyzer called “books_custom_analyzer” that uses the standard tokenizer and then lowercases and stems each token. I am using the English stemmer to stem every word once the text is tokenized. This lets a search match singular and plural forms, and other inflections, of the same word. Example: book/books, run/running.

Index settings and mappings
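
Since the embedded settings are not reproduced in this text, here is a minimal sketch of what an index body matching the description above might look like. The analyzer name, the standard tokenizer, the lowercase filter and the English stemmer come from the post; the filter name english_stemmer and the field types for id and price are assumptions made for illustration.

```python
import json

# Sketch of an index body for the "books" index.
# "english_stemmer" is a hypothetical filter name; the id/price types are assumed.
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Built-in stemmer token filter configured for English
                "english_stemmer": {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                "books_custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Lowercase first, then stem each token
                    "filter": ["lowercase", "english_stemmer"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "id": {"type": "integer"},
            "name": {"type": "text", "analyzer": "books_custom_analyzer"},
            "author": {"type": "text", "analyzer": "books_custom_analyzer"},
            "description": {"type": "text", "analyzer": "books_custom_analyzer"},
            "price": {"type": "float"},
        }
    },
}

# Print the JSON that would be sent as the request body when creating the index
print(json.dumps(index_body, indent=2))
```

The printed JSON could be used as the body of a PUT request to localhost:9200/books to create the index before any documents are inserted.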

Step 3 — Indexing the data

Now that we have created the index with our settings and mappings, it is time to index the data. I am inserting 6 documents in this example, but it can be any number of documents (even millions, as Elasticsearch supports clusters and scales well to large amounts of data; more about clusters in another post).

Indexing is the process of analyzing and storing data so that the search engine can look it up efficiently. When you insert data into Elasticsearch, it applies the settings and mappings and builds an inverted index, which it then uses at query time. The inverted index is the core data structure of a search engine: it maps each term to the documents that contain it, which is why looking up a word is typically much faster than scanning text columns in a relational database.

I am using the Elasticsearch Bulk API with the index created above to insert all the data in one go. Just run this in a terminal and your data should be indexed and ready for search.
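
To make the idea of an inverted index concrete, here is a toy sketch in Python. It is only an illustration of the data structure, not how Elasticsearch or Lucene implement it; the function name is hypothetical and the two text snippets are taken from the book data used in this post.

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercased token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index

# Document 1 uses the book name, document 6 a fragment of its description
docs = {
    1: "Fooled By Randomness",
    6: "From the author of Fooled by Randomness",
}

index = build_inverted_index(docs)

# Looking up a term is a single dictionary access, not a scan over every document
print(sorted(index["randomness"]))
print(sorted(index["author"]))
```

Both documents are listed under "randomness", while "author" points only to document 6.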

Index data to Elasticsearch using Bulk API.
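
For reference, the Bulk API expects newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below builds such a payload in Python. The helper name is hypothetical, and the two documents are shortened versions of the books shown later in this post (descriptions omitted for brevity).

```python
import json

def build_bulk_payload(index_name, docs):
    """Build a newline-delimited JSON body for the Bulk API (one action line per document)."""
    lines = []
    for doc in docs:
        # Action line: index this document under its own id
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc["id"])}}))
        # Source line: the document itself
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline
    return "\n".join(lines) + "\n"

books = [
    {"id": 1, "name": "Fooled By Randomness", "author": "Nassim Taleb", "price": 12},
    {"id": 6, "name": "Antifragile", "author": "Nassim Taleb"},
]

payload = build_bulk_payload("books", books)
print(payload)
```

The resulting text would be sent as the body of a POST to localhost:9200/_bulk with the Content-Type header set to application/x-ndjson.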

Step 4 — Prepare for search

Now that we have inserted the data, it is ready for search. Elasticsearch offers many query types that can be used to query your index. The one you pick depends on the experience you want to provide to your users. You can also build the query at runtime in a programming language, driven by different configurations.

For now I will be using the multi_match query to search against the name, author and description fields.
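
As a small illustration of building the query at runtime, the sketch below assembles a multi_match body from a list of fields and an optional per-field boost configuration. The function and parameter names are hypothetical; the field^boost notation is the one Elasticsearch uses for boosting a field in multi_match.

```python
import json

def build_multi_match(query, fields, boosts=None):
    """Return a multi_match query body; boosts maps a field name to its boost factor."""
    boosts = boosts or {}
    # A boosted field is written as "field^boost", an unboosted one as plain "field"
    field_specs = [f"{f}^{boosts[f]}" if f in boosts else f for f in fields]
    return {"query": {"multi_match": {"query": query, "fields": field_specs}}}

plain = build_multi_match("randomness", ["name", "description", "author"])
boosted = build_multi_match("randomness", ["name", "description", "author"], {"name": 2})

print(json.dumps(plain, indent=2))
print(json.dumps(boosted, indent=2))
```

The first body searches the three fields with equal weight; the second gives the name field twice the weight of the others.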

Query 1:

curl -X GET "localhost:9200/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "randomness",
      "fields": ["name", "description", "author"]
    }
  }
}
'

Results: Both hits come from matches in the name and description fields.

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4655045,
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4655045,
        "_source" : {
          "id" : 1,
          "name" : "Fooled By Randomness",
          "author" : "Nassim Taleb",
          "price" : 12,
          "description" : "Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets is a book by Nassim Nicholas Taleb that deals with the fallibility of human knowledge."
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.2255893,
        "_source" : {
          "id" : 6,
          "name" : "Antifragile",
          "author" : "Nassim Taleb",
          "description" : "From the author of Fooled by Randomness. Antifragile : Things That Gain From Disorder is a book by Nassim Nicholas Taleb"
        }
      }
    ]
  }
}

Query 2: Stemmed search term (fool instead of fooled).

curl -X GET "localhost:9200/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "Fool",
      "fields": ["name", "description", "author"]
    }
  }
}
'

Results: The search returned 2 hits. Both documents contain fooled even though the search term was fool, because the analyzer stems both words to the same token. This makes search more forgiving and gives the end user more ways to find what they are looking for. Elasticsearch has many more settings for tuning the search experience; they are all covered in the Elasticsearch documentation.

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.4655045,
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.4655045,
        "_source" : {
          "id" : 1,
          "name" : "Fooled By Randomness",
          "author" : "Nassim Taleb",
          "price" : 12,
          "description" : "Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets is a book by Nassim Nicholas Taleb that deals with the fallibility of human knowledge."
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.2255893,
        "_source" : {
          "id" : 6,
          "name" : "Antifragile",
          "author" : "Nassim Taleb",
          "description" : "From the author of Fooled by Randomness. Antifragile : Things That Gain From Disorder is a book by Nassim Nicholas Taleb"
        }
      }
    ]
  }
}
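
The reason "Fool" finds "Fooled" is that the same analysis runs on both the indexed text and the search term, so both end up as the same token. Here is a toy sketch of that pipeline in Python. The suffix-stripping rule is a deliberately crude stand-in for the real English stemmer, which is far more careful, and both function names are hypothetical.

```python
import re

def toy_stem(token):
    """Very rough suffix stripping; NOT the algorithm Elasticsearch uses."""
    for suffix in ("ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    # Strip a plural "s", but leave words ending in "ss" alone
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token

def analyze(text):
    """Tokenize, lowercase, then stem: the same three steps the custom analyzer performs."""
    return [toy_stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())]

query_tokens = analyze("Fool")
name_tokens = analyze("Fooled By Randomness")

print(query_tokens)
print(name_tokens)
# The query token appears among the indexed tokens, so the document matches
print(query_tokens[0] in name_tokens)
```

In Elasticsearch itself, the _analyze API can be used to see the tokens an analyzer really produces for a given text.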

Step 5 — Relevancy Tuning

In the above step we were able to do a basic search. Now let us go one step further and tune the search so that the most relevant products appear on top. Relevancy tuning is the most important step in providing a good experience for your users and helping them discover the products they are really looking for. Elasticsearch uses scoring algorithms to rank your search results. The best known of these is tf-idf; recent versions of Elasticsearch default to BM25, which builds on the same idea.

A commonly used relevancy tuning technique is giving more weight to the most important field in the document. In our case the name field is the most important one, so let’s boost it so that a document matching the search term in the name field scores higher than one matching only in another field.
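
To get a feel for how a term-based score behaves, here is a minimal sketch of textbook tf-idf in Python. It is a simplified classroom formula, not the exact scoring Elasticsearch performs (Lucene adds length normalisation and other factors), and the function name is hypothetical. The two token lists come from the book names used in this post.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, all_docs):
    """Textbook tf-idf: term frequency in the document times inverse document frequency."""
    docs_with_term = sum(1 for d in all_docs if term in d)
    if docs_with_term == 0:
        return 0.0
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Rarer terms across the collection get a larger idf
    idf = math.log(len(all_docs) / docs_with_term) + 1.0
    return tf * idf

docs = [
    ["fooled", "by", "randomness"],  # name of book 1
    ["antifragile"],                 # name of book 6
]

score_book_1 = tf_idf("randomness", docs[0], docs)
score_book_6 = tf_idf("randomness", docs[1], docs)

print(score_book_1, score_book_6)
```

The document that contains the term gets a positive score and the one that does not gets zero, which is the basic behaviour every ranking formula in this family shares.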

Query: Search term — “randomness”

curl -X GET "localhost:9200/books/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match": {
      "query": "randomness",
      "fields": ["name^2", "description", "author"]
    }
  }
}
'

Output: The score of the first document is now higher than in the earlier query without boosting (2.931009 versus 1.4655045), because the search term matched in the boosted name field. The second document matched only in the description field, so its score is unchanged. Boosting important fields in this way improves the relevancy of your results.

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.931009,
    "hits" : [
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 2.931009,
        "_source" : {
          "id" : 1,
          "name" : "Fooled By Randomness",
          "author" : "Nassim Taleb",
          "price" : 12,
          "description" : "Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets is a book by Nassim Nicholas Taleb that deals with the fallibility of human knowledge."
        }
      },
      {
        "_index" : "books",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.2255893,
        "_source" : {
          "id" : 6,
          "name" : "Antifragile",
          "author" : "Nassim Taleb",
          "description" : "From the author of Fooled by Randomness. Antifragile : Things That Gain From Disorder is a book by Nassim Nicholas Taleb"
        }
      }
    ]
  }
}
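
The change in scores can be checked by hand. By default, multi_match uses the best_fields type, which takes the highest score among the matching fields after each field's boost is applied. The sketch below reproduces that arithmetic in Python. The helper name is hypothetical, and the per-field scores are inferred from the outputs above: 1.4655045 for the name field of document 1 and 1.2255893 for the description field of document 6. Document 1 also matches in its description, but that score is not shown in the output, so it is left out here.

```python
def best_fields_score(field_scores, boosts):
    """Highest boosted per-field score, as in multi_match best_fields with no tie breaker."""
    return max(score * boosts.get(field, 1.0) for field, score in field_scores.items())

# Per-field scores inferred from the unboosted query output (an assumption, see above)
doc1_fields = {"name": 1.4655045}
doc6_fields = {"description": 1.2255893}

no_boost = {}
name_boost = {"name": 2.0}  # equivalent to "name^2" in the query

print(best_fields_score(doc1_fields, no_boost), best_fields_score(doc1_fields, name_boost))
print(best_fields_score(doc6_fields, no_boost), best_fields_score(doc6_fields, name_boost))
```

Document 1 doubles from 1.4655045 to 2.931009 because its best match is in the boosted field, while document 6 stays at 1.2255893, which matches the scores in the output above.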

Additional Enhancements

We have now built a basic search. From here you can look at adding synonyms, custom stopwords, facets, autocomplete, spell check and many other features to your search. Click follow if you would like to learn about them. I will be adding the posts soon.

Also, here is the link to building a simple spell corrector (the “Did you mean” feature of Google) with Elasticsearch. Hope you enjoyed the posts.
