As a Django developer, I started looking into available options to perform a Full-Text search over a standard size of around a million rows. Two options made sense to try out: PostgreSQL and ElasticSearch.
Before diving down into my findings, let’s clarify the distinction between Full-Text Search (FTS) (or “Searching”) and database filters or queries. “Searching” involves starting with nothing and adding results to it. Database Filtering begins with a collection and then eliminates entries from it based on criteria. Filtering won’t work with fuzzy inputs, but “Searching” can be done with Fuzzy inputs.
PostgreSQL Full-Text Search
I use Django Web Framework with PostgreSQL for most of my projects. PostgreSQL has supported Full-Text Search (FTS) since 2008, and Django has supported FTS since 1.10 (2016) via
django.contrib.postgres. So, it was the fastest & easiest option for me to integrate. Here are some of my findings:
This is a cheaper and faster option as it does not require any extra setup and maintenance.
On my local (Razer Blade 2.4 GHz 6 Core i7) testing, up to
GIN Indexgot results around
~30msconsistently. On checking online about benchmarks done by other people, I found it returns results in
Trigramcan slow it down up to 5-times.
Current Django integration doesn’t support
ElasticSearch is a very established name, and lots of libraries are available for its integration with Django and other frameworks. Here are the findings:
This tech is optimized just for searching, but setting it up and maintaining the infrastructure can be time-consuming.
Setting it up yourself requires dedicated servers or services, which are expensive than the PostgreSQL option.
Scaling as data grows is more manageable, and it supports all search options like Trigram, EdgeGram, Stemming, Fuzziness
On my local (Razer Blade 2.4 GHz 6 Core i7) testing up to
500,000records, it consistently returns results in around
~25ms. On checking online about benchmarks done by other people, I found it returns results in
With each new version of PostgreSQL, the search response time is improving, and it is proceeding toward an apple to apple comparison when compared with ElasticSearch. So, if the project is not going to have millions of records or large-scale data, Postgresql Full-Text Search would be the best option to opt for.
Stemming: It is the process of reducing a word to its root form to ensure variants of that word match the result during the search. For example,
Referencescan be stemmed to a word
Referand on search of word
referwill return results having any variant of that word.
NGram: It is like a sliding window that moves across the word - a continuous sequence of characters up to a specified length. For example, the term
Referwill be turned into
[R, RE, REF, E, EF, EFE, F, FE, FER]. NGram can be used to search a word in parts or even from in between. The most commonly used types of
Fuzziness: Fuzzy matching allows you to get results that are not an exact match. For example, search for the word
boxwill also return results having
fox. Common application includes Spell Check and Spam filtering.