In this article, I’m documenting some of my findings when implementing Full-Text Search (FTS) in PostgreSQL (using Django ORM) and ElasticSearch.
As a Django developer, I started looking into available options to perform a Full-Text search over a standard size of around a million rows. Two options made sense to try out: PostgreSQL and ElasticSearch.
Before diving down into my findings, let’s clarify the distinction between Full-Text Search (FTS) (or “Searching”) and database filters or queries. “Searching” involves starting with nothing and adding results to it. Database Filtering begins with a collection and then eliminates entries from it based on criteria. Filtering won’t work with fuzzy inputs, but “Searching” can be done with Fuzzy inputs.
PostgreSQL Full-Text Search
I use Django Web Framework with PostgreSQL for most of my projects. PostgreSQL has supported Full-Text Search (FTS) since 2008, and Django has supported FTS since 1.10 (2016) via django.contrib.postgres
. So, it was the fastest & easiest option for me to integrate. Here are some of my findings:
-
This is a cheaper and faster option as it does not require any extra setup and maintenance.
-
On my local (Razer Blade 2.4 GHz 6 Core i7) testing, up to
500,000
records usingGIN Index
got results around~30ms
consistently. On checking online about benchmarks done by other people, I found it returns results in~30-50ms
for1.5M
records. -
using
Trigram
can slow it down up to 5-times. -
Current Django integration doesn’t support
Stemming
orFuzziness
directly
ElasticSearch
ElasticSearch is a very established name, and lots of libraries are available for its integration with Django and other frameworks. Here are the findings:
-
This tech is optimized just for searching, but setting it up and maintaining the infrastructure can be time-consuming.
-
Setting it up yourself requires dedicated servers or services, which are expensive than the PostgreSQL option.
-
Scaling as data grows is more manageable, and it supports all search options like Trigram, EdgeGram, Stemming, Fuzziness
-
On my local (Razer Blade 2.4 GHz 6 Core i7) testing up to
500,000
records, it consistently returns results in around~25ms
. On checking online about benchmarks done by other people, I found it returns results in~5-30ms
for1.5M
records.
Comparison Graph
Conclusion
With each new version of PostgreSQL, the search response time is improving, and it is proceeding toward an apple to apple comparison when compared with ElasticSearch. So, if the project is not going to have millions of records or large-scale data, Postgresql Full-Text Search would be the best option to opt for.
Terminologies
-
Stemming: It is the process of reducing a word to its root form to ensure variants of that word match the result during the search. For example,
Referencing
,Reference
,References
can be stemmed to a wordRefer
and on search of wordrefer
will return results having any variant of that word. -
NGram: It is like a sliding window that moves across the word - a continuous sequence of characters up to a specified length. For example, the term
Refer
will be turned into[R, RE, REF, E, EF, EFE, F, FE, FER]
. NGram can be used to search a word in parts or even from in between. The most commonly used types ofNGram
areTrigram
andEdgeGram
. -
Fuzziness: Fuzzy matching allows you to get results that are not an exact match. For example, search for the word
box
will also return results havingfox
. Common application includes Spell Check and Spam filtering.