Before creating a document in Elasticsearch and writing a query to fetch it, let’s understand how Elasticsearch stores (indexes) documents and how it finds them for a given query.
Elasticsearch stores documents in a structure called an inverted index. This design is what makes full-text search very fast.
An inverted index contains a list of the unique words that appear in any document and, for each word, a list of the documents in which it appears.
For example, let’s consider two documents with the following content:
- Tom knows how to repair a computer.
- Tom knows that his computer is useless.
Now let’s create the inverted index by splitting the content of both documents into words and listing the documents in which each word appears:
Words    | Doc-1 | Doc-2
-------------------------
Tom      |   X   |   X
knows    |   X   |   X
how      |   X   |
to       |   X   |
repair   |   X   |
a        |   X   |
computer |   X   |   X
that     |       |   X
his      |       |   X
is       |       |   X
useless  |       |   X
-------------------------
This also shows why the structure is called an inverted index: instead of each document mapping to its content, each word of the content maps to the documents that contain it.
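The construction described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Elasticsearch implements it internally: the helper names `tokenize` and `build_inverted_index` are made up for this example, and the tokenizer is a simple assumption that splits on non-alphanumeric characters and keeps the original case so that the result matches the table.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical, simplified tokenizer: keep runs of letters and digits.
    # Case is preserved to match the table; real analyzers usually lowercase.
    return re.findall(r"[A-Za-z0-9]+", text)


def build_inverted_index(docs):
    # Map each unique word to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


docs = {
    "Doc-1": "Tom knows how to repair a computer.",
    "Doc-2": "Tom knows that his computer is useless.",
}

index = build_inverted_index(docs)

# Print one row per word, followed by the documents it appears in.
for word, doc_ids in index.items():
    print(f"{word:<9} {sorted(doc_ids)}")
```

Each key of `index` corresponds to a row of the table, and the set stored under it corresponds to the columns marked with an X.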
Now, if we want to search for useless computer, we just need to find the documents in which each word appears: useless appears only in Doc-2, while computer appears in both Doc-1 and Doc-2.
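As we can see, both documents contain at least one of the words the user searched for. If we apply a naive similarity algorithm that just counts the number of matching words in each document, the second document, which matches both words, is more relevant to our search than the first, which matches only one. Elasticsearch's actual relevance scoring (BM25 by default in recent versions) is more sophisticated than a plain count, but the lookup in the inverted index works the same way.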
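The naive counting approach can be sketched as follows. Again, this is a minimal illustration under simplifying assumptions, not Elasticsearch's real scoring: `tokenize`, `build_inverted_index`, and `search` are hypothetical helper names, and the score is simply the number of query words found in each document.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, simplified tokenizer: keep runs of letters and digits.
    return re.findall(r"[A-Za-z0-9]+", text)


def build_inverted_index(docs):
    # Map each unique word to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index, query):
    # Naive similarity: one point per query word that a document contains.
    scores = Counter()
    for word in tokenize(query):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    # Highest score first.
    return scores.most_common()


docs = {
    "Doc-1": "Tom knows how to repair a computer.",
    "Doc-2": "Tom knows that his computer is useless.",
}

index = build_inverted_index(docs)
results = search(index, "useless computer")
print(results)  # [('Doc-2', 2), ('Doc-1', 1)]
```

Note that the search never scans the document text itself. It only looks up each query word in the index, which is why this structure keeps full-text search fast even as the number of documents grows.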