Revisiting site search + SQLite as a search engine

Revisiting site search + SQLite as a search engine

Back in 2017, I researched adding search functionality to our websites (blog post) and ended up going with Zoom Search Engine. This has worked OK, but not without issues...

So, as I was re-structuring our web-sites, I decided to have another look at the options for this.

Full "site search engine as a service"

(= crawler + index hosting + search UI / widget + management / statistics)

I had a quick second look at the services from my 2017 round (prices for 400 documents):

Also found a couple of new ones:

A few aspects (including price) of SiteSearch360 seemed more interesting than the others, so I decided to give this a try.

I got this fully implemented and running on our websites.

One thing I really liked was the way you tell SiteSearch360 which parts of web-pages to index using CSS query selectors (or XPath). I eventually ended up using this concept in my solution (see below).

The crawling and management/statistics were also nice, but I wasn't real happy with the search UI integration.

Full text search as a service

(= index hosting + management / statistics - but no crawler)

I looked at:

These are much cheaper than the "full site search engine as a service" services above, but also require more programming (crawler + search UI) to get up and running.

In recent years I've noticed Algolia search on many programming related web-sites (possibly because it is free for open source projects).

I was curious enough that I programmed my own simple crawler (including the CSS query selector trick I learned from SiteSearch360) and tried uploading the data to Algolia - just to get a feel for how all this works.

Once I had extracted the text data (with my crawler) and converted it into JSON, the upload process was simple enough and searching worked very well within the Algolia management interface.

I didn't take this any further, because at this point I realized that the output from my crawler could also easily be fed into other full text search solutions...

Dedicated full text search server software

I had a cursory look at

These are open source / free, advanced dedicated full text search server software programs (running on your own server) designed for enterprise searching with thousands of documents.

But I really don't want to run and maintain yet another application on our server (one of my issues with Zoom Search Engine), and they seem like overkill as my largest website only has around 400 documents.

All the major database servers (Microsoft SQL Server, MySQL, PostgreSQL, MongoDB) include some type of full text search functionality which could potentially be used for site search.

Of course as with the "Full text search as a service" providers, I would need to program my own crawler and search UI.

I already run Microsoft SQL server, so I had a look at the full text search features in this. Unfortunately I couldn't find any easy way to get out "snippets" of the indexed text with highlighting of query terms - which is pretty essential for displaying search results.

As for the other database servers listed above - again - I really don't want to run and maintain yet another application on our server, so I did not consider these further.

SQLite

In recent years, I have been reaching for SQLite in more and more situations where I would have used Microsoft SQL server in the past. SQLite is often both faster and easier to work with.

Turns out that SQLite also does full text search.

And unlike Microsoft SQL Server, it has built in "highlight" and "snippet" functions for generating text for the search result listing.

So combining my own crawler (incorporating the "CSS query selector" trick from SiteSearch360), SQLite full text search, and some existing search UI template/code from my Zoom Search Engine integration, I was quickly able to put together a full site search system - without introducing any new paid services, or any new applications running on our server (SQLite is just a library / .dll file already in use on our web-sites).

Indeed compared to the previous Zoom Search Engine setup, the complexity has been reduced, and I have much better control over the crawling process.

And performance is amazing - searches execute in 1-2 milliseconds.

Notes for using SQLite full text search with .Net

I use the official SQLite NuGet package System.Data.SQLite.Core which includes the full text search functionality (FTS5). I have read that full text search is NOT included in Microsoft's fork "Microsoft.Data.Sqlite".

Also the full text search functionality is an "extension", which needs to be enabled and loaded (after opening the database connection) by calling:

connection.EnableExtensions(True)
connection.LoadExtension("SQLite.Interop.dll", "sqlite3_fts5_init")