In this article, I’m documenting some of my findings when implementing Full-Text Search (FTS) in PostgreSQL (using Django ORM) and ElasticSearch.

As a Django developer, I started looking into available options to perform a Full-Text search over a standard size of around a million rows. Two options made sense to try out: PostgreSQL and ElasticSearch.

Before diving down into my findings, let’s clarify the distinction between Full-Text Search (FTS) (or “Searching”) and database filters or queries. “Searching” involves starting with nothing and adding results to it. Database Filtering begins with a collection and then eliminates entries from it based on criteria. Filtering won’t work with fuzzy inputs, but “Searching” can be done with Fuzzy inputs.

I use Django Web Framework with PostgreSQL for most of my projects. PostgreSQL has supported Full-Text Search (FTS) since 2008, and Django has supported FTS since 1.10 (2016) via django.contrib.postgres. So, it was the fastest & easiest option for me to integrate. Here are some of my findings:

  • This is a cheaper and faster option as it does not require any extra setup and maintenance.

  • On my local (Razer Blade 2.4 GHz 6 Core i7) testing, up to 500,000 records using GIN Index got results around ~30ms consistently. On checking online about benchmarks done by other people, I found it returns results in ~30-50ms for 1.5M records.

  • using Trigram can slow it down up to 5-times.

  • Current Django integration doesn’t support Stemming or Fuzziness directly

ElasticSearch

ElasticSearch is a very established name, and lots of libraries are available for its integration with Django and other frameworks. Here are the findings:

  • This tech is optimized just for searching, but setting it up and maintaining the infrastructure can be time-consuming.

  • Setting it up yourself requires dedicated servers or services, which are expensive than the PostgreSQL option.

  • Scaling as data grows is more manageable, and it supports all search options like Trigram, EdgeGram, Stemming, Fuzziness

  • On my local (Razer Blade 2.4 GHz 6 Core i7) testing up to 500,000 records, it consistently returns results in around ~25ms. On checking online about benchmarks done by other people, I found it returns results in ~5-30ms for 1.5M records.

Comparison Graph

Postgresql vs ElasticSearch performance graph

Postgresql vs ElasticSearch performance graph

Conclusion

With each new version of PostgreSQL, the search response time is improving, and it is proceeding toward an apple to apple comparison when compared with ElasticSearch. So, if the project is not going to have millions of records or large-scale data, Postgresql Full-Text Search would be the best option to opt for.

Terminologies

  • Stemming: It is the process of reducing a word to its root form to ensure variants of that word match the result during the search. For example, Referencing, Reference, References can be stemmed to a word Refer and on search of word refer will return results having any variant of that word.

  • NGram: It is like a sliding window that moves across the word - a continuous sequence of characters up to a specified length. For example, the term Refer will be turned into [R, RE, REF, E, EF, EFE, F, FE, FER]. NGram can be used to search a word in parts or even from in between. The most commonly used types of NGram are Trigram and EdgeGram.

  • Fuzziness: Fuzzy matching allows you to get results that are not an exact match. For example, search for the word box will also return results having fox. Common application includes Spell Check and Spam filtering.

Discuss on Hacker News

Related articles: