Suggest / autocomplete mid-word queries

Hello,

I read a cool thread on mid-word querying via the RediSearch autocomplete engine: Complex querying

I am trying to achieve this but I think my current solution is inefficient and was wondering how you guys would make it work.

Let’s stick to the original post and have a phrase: star wars trilogy

I create every permutation of its words and index each one as a suggestion, with the payload set to “star wars trilogy”.

In my index I will have:

star wars trilogy
wars star trilogy
trilogy star wars
star trilogy wars
wars trilogy star
trilogy wars star

This works fine, as it will suggest "star wars trilogy" for whichever word we enter. But as you can see, with a longer phrase the permutation count quickly adds up, and we need to index hundreds of thousands of suggestions.
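To illustrate how fast this grows, here is a minimal Python sketch of the permutation approach described above. The helper name `word_permutations` is made up for this example and is not part of RediSearch; it only shows that an n-word phrase produces n! entries.

```python
from itertools import permutations
from math import factorial


def word_permutations(phrase: str) -> list[str]:
    """Return every ordering of the words in a phrase (hypothetical helper)."""
    words = phrase.split()
    return [" ".join(p) for p in permutations(words)]


entries = word_permutations("star wars trilogy")
print(len(entries))  # 3 words -> 3! = 6 entries

# The number of entries per phrase is n! for n distinct words.
for n in (3, 5, 8):
    print(n, factorial(n))
```

An eight-word phrase already needs 40320 entries on its own, which is why this approach does not scale to longer phrases.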

Looking forward to your ideas, thanks in advance!

PS: I am trying to imitate one of Algolia’s solutions, e.g. Algolia Places, where you can see that the order of the entered words doesn’t matter and you still get autocomplete results.

Can you explain the use case in more detail? I do not see why you would not just use a normal index with a TEXT field, which will tokenize your text into words and let you search each word by prefix:

127.0.0.1:6379> FT.CREATE idx SCHEMA s TEXT
OK
127.0.0.1:6379> ft.add idx doc1 1.0 FIELDS s "star wars trilogy"
OK
127.0.0.1:6379> FT.SEARCH idx sta*
1) (integer) 1
2) "doc1"
3) 1) "s"
   2) "star wars trilogy"
127.0.0.1:6379> FT.SEARCH idx tril*
1) (integer) 1
2) "doc1"
3) 1) "s"
   2) "star wars trilogy"

Thanks for the quick reply.

The use case is basically the same as in the link above: an autocomplete input box where searching with a typo, e.g. triol, would return the suggestion star wars trilogy.

The prefix solution you mentioned works perfectly as long as there is no typo, so I started using the autocompleter, as it had the fuzzy prefix search functionality.

Previously I tried using the full-text search engine with queries like these:

  1. %triol*%
  2. triol*
  3. %triol%

but of course the 1st is an invalid query and querying the 2nd and 3rd would return nothing because of the typo.

Thanks again!

@drs so what you want is fuzzy matching. The %% syntax should work perfectly; you just need to decide on the Levenshtein distance (LD) you want (https://en.wikipedia.org/wiki/Levenshtein_distance). You can see that with LD 3 you get results:

127.0.0.1:6379> FT.CREATE idx SCHEMA s TEXT
OK
127.0.0.1:6379> ft.add idx doc1 1.0 FIELDS s "star wars trilogy"
OK
127.0.0.1:6379> FT.SEARCH idx triol
1) (integer) 0
127.0.0.1:6379> FT.SEARCH idx %triol%
1) (integer) 0
127.0.0.1:6379> FT.SEARCH idx %%triol%%
1) (integer) 0
127.0.0.1:6379> FT.SEARCH idx %%%triol%%%
1) (integer) 1
2) "doc1"
3) 1) "s"
   2) "star wars trilogy"

Please note that 3 is the maximum LD we allow.
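For reference, the distances in this thread can be checked with a short Python sketch of the classic Levenshtein algorithm. This is a plain illustration of the metric, not RediSearch's internal implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep a matching char
            ))
        previous = current
    return previous[-1]


print(levenshtein("triol", "trilogy"))  # 3
```

This matches the transcript above: triol is 3 edits away from trilogy, so only the %%%triol%%% query finds the document.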

@meirsh Thanks for the suggestion, I see, I think I overcomplicated this a little bit.

Anyway, if the word is a long one, this would still return 0. For example, given an indexed value: International Man of Mystery

Querying intr would most likely fail because of the high LD between the short input and the long word. So I came up with an idea: if the input is shorter than 4 characters I use the * prefix search, and otherwise I use the fuzzy (LD) search you mentioned.

Not sure if it's a good solution, but it seems to work for now :D
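The length-based heuristic above could be sketched in Python roughly like this. The function name `build_query` and the constants are hypothetical, and the sketch assumes the input is a single plain word with no characters that need escaping in the query syntax.

```python
MIN_FUZZY_LENGTH = 4  # threshold taken from the post above (an assumption, tune as needed)
MAX_LD = 3            # maximum Levenshtein distance allowed by the %...% syntax


def build_query(term: str, distance: int = MAX_LD) -> str:
    """Build a query string: prefix search for short input, fuzzy search otherwise."""
    if not 1 <= distance <= MAX_LD:
        raise ValueError(f"distance must be between 1 and {MAX_LD}")
    if len(term) < MIN_FUZZY_LENGTH:
        return f"{term}*"  # prefix search, e.g. int*
    marks = "%" * distance  # one % per allowed edit on each side
    return f"{marks}{term}{marks}"


print(build_query("int"))    # int*
print(build_query("triol"))  # %%%triol%%%
```

The resulting string would then be passed as the query argument of FT.SEARCH.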

Thanks again for your replies!

You can also do one query with both fuzzy and prefix. Terms separated by a space are intersected, so use | to match either one:
FT.SEARCH idx "%%%triol%%%|triol*"
