FT.SEARCH on a field using an exact match

Hello,

I tried so many things, but I couldn’t figure out how to store the string Y-mAbs Therapeutics, Inc. and query for its exact match using RediSearch. What am I doing wrong?

127.0.0.1:6379> ft.info us_stocks_idx
 1) index_name
 2) us_stocks_idx
 3) index_options
 4) (empty array)
 5) fields
 6) 1) 1) type
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
    2) 1) symbol
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
    3) 1) cusips
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
    4) 1) exchange
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
    5) 1) common_stock
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
    6) 1) company_name
       2) type
       3) TEXT
       4) WEIGHT
       5) "1"
       6) NOSTEM
 7) num_docs
 8) "8924"
 9) max_doc_id
10) "108175"
11) num_terms
12) "21324"
13) num_records
14) "62752"
15) inverted_sz_mb
16) "17592186044400.393"
17) total_inverted_index_blocks
18) "36344"
19) offset_vectors_sz_mb
20) "0.51718902587890625"
21) doc_table_size_mb
22) "8.8462095260620117"
23) sortable_values_size_mb
24) "0"
25) key_table_size_mb
26) "0.22104549407958984"
27) records_per_doc_avg
28) "7.0318242940385476"
29) bytes_per_record_avg
30) "293962647783489.81"
31) offsets_per_term_avg
32) "8.6421468638449763"
33) offset_bits_per_record_avg
34) "8"
35) gc_stats
36)  1) bytes_collected
     2) "19747393"
     3) total_ms_run
     4) "484879"
     5) total_cycles
     6) "18730"
     7) avarage_cycle_time_ms
     8) "25.887827015483182"
     9) last_run_time_ms
    10) "49"
    11) gc_numeric_trees_missed
    12) "0"
    13) gc_blocks_denied
    14) "101"
37) cursor_stats
38) 1) global_idle
    2) (integer) 0
    3) global_total
    4) (integer) 0
    5) index_capacity
    6) (integer) 128
    7) index_total
    8) (integer) 0

Tried both of these:

127.0.0.1:6379> ft.add us_stocks_idx test 1 PARTIAL REPLACE FIELDS symbol test company_name "Y-mAbs Therapeutics, Inc."
127.0.0.1:6379> ft.add us_stocks_idx test 1 PARTIAL REPLACE FIELDS symbol test company_name "Y\-mAbs\ Therapeutics\,\ Inc\."

FT.GET:

127.0.0.1:6379> ft.get us_stocks_idx test
1) "symbol"
2) "test"
3) "company_name"
4) "Y-mAbs Therapeutics, Inc."

The query:

127.0.0.1:6379> ft.search us_stocks_idx @company_name:"Y\-mAbs\ Therapeutics\,\ Inc\."
1) (integer) 4
2) "SQM"
3)  1) "symbol"
    2) "SQM"
    3) "exchange"
    4) "NYS"
    5) "figi"
    6) "BBG000BKK4S1"
    7) "type"
    8) "ad"
    9) "previous_close"
   10) "30.26"
   11) "latest_ex_dividend"
   12) "2020-06-04"
   13) "latest_dividend"
   14) "0.17092"
   15) "company_name"
   16) "Sociedad Quimica y Minera de Chile SA"
   17) "gics_sector"
   18) "Process Industries"
   19) "gics_industry"
   20) "Chemicals: Agricultural"
   21) "avg_volume_10_days"
   22) "823375.5"
4) "IRS"
5)  1) "symbol"
    2) "IRS"
    3) "exchange"
    4) "NYS"
    5) "figi"
    6) "BBG000DRSX48"
    7) "type"
    8) "ad"
    9) "previous_close"
   10) "4.12"
   11) "latest_ex_dividend"
   12) "2017-11-10"
   13) "latest_dividend"
   14) "1.382465000"
   15) "company_name"
   16) "IRSA Inversiones y Representaciones SA"
   17) "gics_sector"
   18) "Finance"
   19) "gics_industry"
   20) "Real Estate Development"
   21) "avg_volume_10_days"
   22) "156435.1"
6) "GRAM"
7)  1) "symbol"
    2) "GRAM"
    3) "exchange"
    4) "NYS"
    5) "figi"
    6) "BBG004ND4949"
    7) "type"
    8) "ad"
    9) "previous_close"
   10) "2.19"
   11) "latest_ex_dividend"
   12) "2016-04-26"
   13) "latest_dividend"
   14) "0.070845000"
   15) "company_name"
   16) "Gra\xc3\xb1a y Montero SAA"
   17) "gics_sector"
   18) "Industrial Services"
   19) "gics_industry"
   20) "Engineering & Construction"
   21) "avg_volume_10_days"
   22) "70723.7"
8) "AVAL"
9)  1) "symbol"
    2) "AVAL"
    3) "exchange"
    4) "NYS"
    5) "figi"
    6) "BBG004T0ZMF6"
    7) "type"
    8) "ad"
    9) "previous_close"
   10) "4.73"
   11) "latest_ex_dividend"
   12) "2020-06-29"
   13) "latest_dividend"
   14) "0.0273"
   15) "company_name"
   16) "Grupo Aval Acciones y Valores SA"
   17) "gics_sector"
   18) "Finance"
   19) "gics_industry"
   20) "Major Banks"
   21) "avg_volume_10_days"
   22) "119108"

I played with all variants of the query too (escaping, no escaping, etc.), yet it still does partial tokenization as far as I can tell. The desired result is, of course, to return just the test doc as the result of an exact match on the company_name I provide.

In redis-cli you need double escaping, to escape the escape :)

127.0.0.1:6379> ft.add us_stocks_idx test 1 PARTIAL REPLACE FIELDS symbol test company_name "Y\\-mAbs\\ Therapeutics\\,\\ Inc\\."

Thanks, could you please show me how to do it in Python (using the base Redis lib or the RediSearch lib, either one would do)? Also, I noticed that when I added the value to the field, it looks like it stored the escaping of the string:

127.0.0.1:6379> ft.search us_stocks_idx @company_name:"Y\\-mAbs\\ Therapeutics\\,\\ Inc\\."
1) (integer) 1
2) "test"
3) 1) "symbol"
   2) "test"
   3) "company_name"
   4) "Y\\-mAbs\\ Therapeutics\\,\\ Inc\\."

4) "Y\\-mAbs\\ Therapeutics\\,\\ Inc\\." Doesn't this return Y\-mAbs\ Therapeutics\,\ Inc\. as raw text in the query results?

Python:

import re
company_name = r'Y-mAbs Therapeutics, Inc.'

r.execute_command('FT.SEARCH', 'us_stocks_idx', '@comapny_name:"Y\\-mAbs\\ Therapeutics\\,\\ Inc\\."')

[0]

r.execute_command('FT.SEARCH', 'us_stocks_idx', '@comapny_name:"Y\-mAbs\ Therapeutics\,\ Inc\."')

[0]

r.execute_command('FT.SEARCH', 'us_stocks_idx', r'@comapny_name:"Y\-mAbs\ Therapeutics\,\ Inc\."')

[0]

r.execute_command('FT.SEARCH', 'us_stocks_idx', f'@comapny_name:"{re.escape(company_name)}"')

[0]

With the Python client you need single escaping:

In [1]: import redis                                                                                                               

In [2]: r = redis.Redis()                                                                                                          

In [3]: r.execute_command('ft.create idx SCHEMA t TEXT')                                                                           
Out[3]: b'OK'

In [4]: r.execute_command('FT.ADD', 'idx', 'doc1', '1.0', 'FIELDS', 't', r'Y\-mAbs\ Therapeutics\,\ Inc\.')                        
Out[4]: b'OK'

In [5]: r.execute_command('hgetall doc1')                                                                                          
Out[5]: [b't', b'Y\\-mAbs\\ Therapeutics\\,\\ Inc\\.']

In [6]: print(r.execute_command('hgetall doc1'))                                                                                   
[b't', b'Y\\-mAbs\\ Therapeutics\\,\\ Inc\\.']

In [8]: r.execute_command('FT.SEARCH', 'idx', r'Y\-mAbs\ Therapeutics\,\ Inc\.')                                                   
Out[8]: [1, b'doc1', [b't', b'Y\\-mAbs\\ Therapeutics\\,\\ Inc\\.']]

And yes, RediSearch stored the escaping in the document (it shows up as double escaping), in case a partial reindexing of the document is needed later.
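As a side note on reading the output above: Python displays every backslash inside a bytes value doubled, so part of the doubling is only display. A minimal sketch that needs no Redis connection (the variable names are made up for illustration):

```python
# A raw string keeps each backslash as one character, so this is the
# single-escaped value that was passed to FT.ADD above.
stored = r'Y\-mAbs\ Therapeutics\,\ Inc\.'.encode()

# repr() is what an IPython Out[] line (and print() of a list) shows;
# it doubles every backslash for display only.
shown = repr(stored)

backslashes_in_value = stored.count(b'\\')
backslashes_in_repr = shown.count('\\')

print(stored.decode())
print(shown)
print(backslashes_in_value, backslashes_in_repr)
```

The value itself holds one backslash per escaped character; only its printed representation holds two.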

If you only need exact (or prefix) search, I suggest using tag fields, which are tokenized only by ‘,’ or any other separator character you choose (this way you only have to escape the query): https://oss.redislabs.com/redisearch/Tags/

In [12]: r.execute_command('ft.create idx SCHEMA t TAG SEPARATOR ;')                                                               
Out[12]: b'OK'

In [13]: r.execute_command('FT.ADD', 'idx', 'doc1', '1.0', 'FIELDS', 't', 'Y-mAbs Therapeutics, Inc.')                             
Out[13]: b'OK'

In [14]: print(r.execute_command('hgetall doc1'))                                                                                  
[b't', b'Y-mAbs Therapeutics, Inc.']

In [17]: r.execute_command('FT.SEARCH', 'idx', r'@t:{Y\-mAbs\ Therapeutics\,\ Inc\.}')
Out[17]: [1, b'doc1', [b't', b'Y-mAbs Therapeutics, Inc.']]

In [18]: r.execute_command('FT.SEARCH', 'idx', r'@t:{Y\-mAbs\ Therapeutics\,*}')
Out[18]: [1, b'doc1', [b't', b'Y-mAbs Therapeutics, Inc.']]
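
The manual escaping in the queries above could be wrapped in a small helper. This is only a sketch: `escape_query_value` and `SPECIAL_CHARS` are names made up here, not part of redis-py or RediSearch, and the character set is an assumption based on the punctuation RediSearch documents as token separators.

```python
# Hypothetical helper, not part of redis-py or RediSearch.
# Assumed set of characters to escape: the punctuation RediSearch treats as
# token separators, plus the backslash itself. Whitespace is handled below.
SPECIAL_CHARS = set(',.<>{}[]"\':;!@#$%^&*()-+=~\\')


def escape_query_value(value):
    """Prefix every separator or whitespace character with a backslash."""
    return ''.join(
        '\\' + ch if ch in SPECIAL_CHARS or ch.isspace() else ch
        for ch in value
    )


company_name = 'Y-mAbs Therapeutics, Inc.'
escaped = escape_query_value(company_name)

# Exact-match query against the TAG field t from the session above.
tag_query = '@t:{' + escaped + '}'
print(tag_query)

# With a connection r = redis.Redis(), the search would then be:
# r.execute_command('FT.SEARCH', 'idx', tag_query)
```

Note that `re.escape` is not a drop-in replacement for this: on Python 3.7 and later it leaves the comma unescaped, so the query would still be split at that point.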