[ANN] RediSearch 1.0.2

Hey,

I’ve released RediSearch 1.0.2 today.

It includes a few minor bug fixes over 1.0.1 (which in turn included some bug fixes over 1.0.0).

But most importantly, it fixes a big performance regression in unions that had been present since the 0.20 days or so.

This matters even more because, since 1.0.1, stemming is done implicitly with unions. I’ve seen queries speed up by an order of magnitude because of this fix.

For example, on an index of 500k reddit comments, these are the results before the fix:

Verbatim, no stemming:

$ redis-benchmark -n 1000000 ft.search rd "donald trump" verbatim nocontent

14542.37 requests/sec.

And the default mode, with stemming. You’d expect some performance loss, but not like this:

$ redis-benchmark -n 1000000 ft.search rd "donald trump" nocontent

345.89 requests/sec.

After the fix, the verbatim throughput remains the same; the stemmed version is now:

$ redis-benchmark -n 1000000 ft.search rd "donald trump" nocontent

11115.27 requests/sec

So that’s over 30x faster with unions.

The difference is less dramatic with other queries, but it’s still an important upgrade, especially if you’re using 1.0.1.

For enterprise users: this is included in RediSearch Enterprise 1.0.3.

Oops, the links of course! :)

Source code can be downloaded from https://github.com/RedisLabsModules/RediSearch/releases/tag/v1.0.2

Docker image: https://hub.docker.com/r/redislabs/redisearch/

or just docker pull redislabs/redisearch:1.0.2

Impressive!

Congrats on the release, Dvir. I wonder if it’s going to be possible,
or already is, to serialize the result into a Redis key and set a
TTL, so that it’s possible to scale to Redis-ish numbers of IOPS when
queries don’t need super-fresh results.

Hi Salvatore.
I was thinking of adding something like FT.SEARCHSTORE. It could also be used to prefetch results for paging (i.e. run SEARCHSTORE once for 50 results, then fetch 0-10, 10-20, …).
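To make the prefetch-and-page idea concrete, here is a minimal client-side sketch of the pattern. FT.SEARCHSTORE does not exist yet, so a plain dict stands in for Redis, and the `search`, `search_store`, `get_page`, and `cache` names are all illustrative, not part of the RediSearch API:

```python
import time

cache = {}  # query -> (expiry_timestamp, list of doc ids)

def search(query, limit):
    """Stand-in for FT.SEARCH: returns up to `limit` matching doc ids."""
    return [f"doc:{i}" for i in range(limit)]

def search_store(query, limit=50, ttl=30):
    """Run the query once and cache the full result list with a TTL."""
    cache[query] = (time.time() + ttl, search(query, limit))

def get_page(query, start, stop):
    """Serve a page from the cached result set, re-running on miss or expiry."""
    entry = cache.get(query)
    if entry is None or entry[0] < time.time():
        search_store(query)
        entry = cache[query]
    return entry[1][start:stop]
```

The point is that the expensive query runs once per TTL window, and subsequent pages are cheap slices of the stored result.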

It will be a bit trickier in the cluster, but certainly doable. We could even be smart about it and do auto-invalidation when documents are inserted (i.e. if a doc matching “foo” is added, and we keep a map of all cached queries containing “foo”, those results might change). Not sure it’s worth it though.
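The auto-invalidation idea can be sketched as a reverse index from terms to cached queries: when a document is inserted, drop every cached query that shares a term with it. This is a hypothetical client-side illustration, not module code; all names are made up:

```python
from collections import defaultdict

query_cache = {}                    # query string -> cached result
term_to_queries = defaultdict(set)  # term -> queries whose cache may change

def cache_query(query, result):
    """Cache a result and register the query under each of its terms."""
    query_cache[query] = result
    for term in query.split():
        term_to_queries[term].add(query)

def on_document_insert(doc_text):
    """Invalidate every cached query that shares a term with the new doc."""
    for term in set(doc_text.split()):
        for query in term_to_queries.pop(term, ()):
            query_cache.pop(query, None)
```

Stale entries for already-invalidated queries may linger in `term_to_queries`, which is why the lookups tolerate missing keys; a real implementation would also have to handle stemmed forms of each term.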

Since the response format is not a straightforward list that can be stored in a sorted set, we’ll need to either create a new data type for cached search results, or serialize them into a sorted set and have the module abstract the deserialization.
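A rough sketch of the sorted-set route, keeping only doc ids and scores (the part of the response that maps naturally onto a sorted set). A Python dict stands in for the Redis sorted set; in real Redis this would be ZADD on store and ZREVRANGE on read, with the module hiding the translation:

```python
def serialize_result(result):
    """result: list of (doc_id, score) pairs -> member/score mapping,
    as a stand-in for ZADDing each pair into a sorted set."""
    return {doc_id: score for doc_id, score in result}

def deserialize_result(zset, start, stop):
    """Emulate ZREVRANGE: members ordered by descending score, then sliced."""
    ordered = sorted(zset, key=lambda m: zset[m], reverse=True)
    return ordered[start:stop]
```

This is exactly where the format stops being straightforward: fields, payloads, and highlights don’t fit the member/score model, which is the argument for a dedicated cached-result data type instead.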

The cluster thing is definitely a semantic issue, even if there are
solutions. So maybe it would be a better idea to make this a bit more
transparent and implement a query cache inside the module itself, with
a TTL and, if required, auto-invalidation on inserts (though I would
make this optional, for workloads with continuous inserts). However,
with this way of doing things, pagination must be fit into the API
somehow.