How can I make whitespace remain as is after tokenization (like the underscore)?

Pham_Quang_Nha · January 15, 2019, 6:55am

Hello,

According to the documentation at https://oss.redislabs.com/redisearch/Escaping/, the tokenizer does not treat the underscore as a separator. I would like whitespace to be handled the same way.

I tried removing the whitespace entry ([' ']) from ToksepMap_g in https://github.com/RedisLabsModules/RediSearch/blob/master/src/toksep.h and then recompiled the source code, but it didn't work.

Any suggestions would be truly appreciated!

Nha.

meirsh · January 15, 2019, 8:02am

Why not use tags? Tags are tokenized only on commas, and you can change that separator at index creation (note that you will have to escape spaces in queries).

Pham_Quang_Nha · January 15, 2019, 8:09am

Hi, thank you for your quick response.

We need to use the stemming of TextField to do prefix search. For example, we would like to search “Harry Po*” and get “Harry Potter” as a result, but we also want “Harry Potter” to be treated as a single word, so that when the user searches for “Potter” alone, they get no results. I believe TagField cannot fulfill this requirement.

Best,

Nha.

meirsh · January 15, 2019, 8:56am

You will have to index the data both as TAG and as TEXT, then issue a union query on the TAG field and on the TEXT field.

Pham_Quang_Nha · January 15, 2019, 9:30am

Could you please explain your idea in more detail? How can a union query on the TAG and TEXT fields return 0 results when the user searches for “Potter” only?

thank you.

best,

meirsh · January 15, 2019, 9:39am

I think I misunderstood you. You do not want any results when the user searches for “Potter” only? If so, then the TAG field is exactly what you need:

127.0.0.1:6379> FT.CREATE idx SCHEMA name TAG SORTABLE
OK

127.0.0.1:6379> FT.ADD idx doc1 1.0 FIELDS name "Harry Potter"
OK

127.0.0.1:6379> FT.SEARCH idx "@name:{Harry\ Potter}"
1) (integer) 1
2) "doc1"
3) 1) "name"
   2) "Harry Potter"

127.0.0.1:6379> FT.SEARCH idx "@name:{Harry\ Po*}"
1) (integer) 1
2) "doc1"
3) 1) "name"
   2) "Harry Potter"

127.0.0.1:6379> FT.SEARCH idx "@name:{Potter}"
1) (integer) 0

Note that, as I mentioned before, you need to escape spaces in the query.
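When the query is built in application code rather than typed into redis-cli, the escaping can be done by a small helper. The following is a minimal Python sketch, not part of RediSearch or any client library: the function names `escape_tag` and `tag_prefix_query` are hypothetical, and escaping every non-word character is an assumption that may be broader than strictly necessary.

```python
import re

# Assumption of this sketch: every character that is not a letter, digit or
# underscore (including the space) gets a backslash in front of it.
_TAG_SPECIAL = re.compile(r"(\W)")


def escape_tag(value: str) -> str:
    # Backslash-escape spaces and punctuation in a tag value.
    return _TAG_SPECIAL.sub(r"\\\1", value)


def tag_prefix_query(field: str, prefix: str) -> str:
    # Build a prefix query on a TAG field: @<field>:{<escaped prefix>*}
    return "@%s:{%s*}" % (field, escape_tag(prefix))


if __name__ == "__main__":
    print(tag_prefix_query("name", "Harry Po"))
```

The resulting string is what would be passed as the query argument of FT.SEARCH. A shell or redis-cli may require an additional level of escaping on top of this.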

Pham_Quang_Nha · January 15, 2019, 9:51am

This is perfect. Thank you so so so much!

Is there any way to solve the same problem while also assigning a “weight” to each field? As far as I know, the TAG field does not have a weight.

Thank you again.

best,

Pham_Quang_Nha · January 15, 2019, 10:55am

Hi, I found my answer. Hopefully this helps someone else:

127.0.0.1:6379> FT.CREATE idx SCHEMA name TAG SORTABLE city TAG
OK

127.0.0.1:6379> FT.ADD idx doc1 1.0 FIELDS name "Harry Potter" city "California"
OK

127.0.0.1:6379> FT.ADD idx doc2 1.0 FIELDS name "Someone else" city "Harry Potter state"
OK

127.0.0.1:6379> FT.SEARCH idx "@name:{harry\ p*} => { $weight: 1.0; $slop: 1; $inorder: true; } | @city:{harry\ p*} => { $weight: 2.0; $slop: 1; $inorder: true; }"
1) (integer) 2
2) "doc2"
3) 1) "name"
   2) "Someone else"
   3) "city"
   4) "Harry Potter state"
4) "doc1"
5) 1) "name"
   2) "Harry Potter"
   3) "city"
   4) "California"

Again, thank you so much for your wonderful guidance.

Best,

Nha.

meirsh · January 15, 2019, 10:58am

Sure, happy I could help.

Pham_Quang_Nha · January 22, 2019, 2:32am

Sorry, I did not test it carefully. It looks like the order is determined by the order in which the documents were inserted (in my case, last in, first out) and is not affected by the $weight in my query. Is this a bug?

Thank you.

meirsh · January 23, 2019, 9:41am

I am sorry, but we currently do not support $weight with tags. We are working on supporting it.

Pham_Quang_Nha · January 23, 2019, 9:48am

Hi, thank you for your response.

I realized that the default scorer (tf*idf) gives an infinite score when the target tag is found. That is why the ranking didn't work: infinite scores cannot be compared with each other. So I changed the scorer to BM25 and got very nice results.
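For readers who want to try the same change, the scorer is selected per query with the SCORER argument of FT.SEARCH, and WITHSCORES returns the computed score next to each document. Below is a minimal Python sketch that only assembles the command arguments. The helper name `search_args` is hypothetical, and sending the command through a client is left out on purpose.

```python
def search_args(index, query, scorer="BM25", with_scores=True):
    # Assemble the argument list for an FT.SEARCH call with an explicit scorer.
    args = ["FT.SEARCH", index, query]
    if with_scores:
        # Ask the server to return each document's score in the reply.
        args.append("WITHSCORES")
    args += ["SCORER", scorer]
    return args


if __name__ == "__main__":
    query = "@name:{harry\\ p*} | @city:{harry\\ p*}"
    print(" ".join(search_args("idx", query)))
```

The list can be handed to whatever raw-command function your client exposes. With WITHSCORES enabled it is easy to check whether the scores are still infinite.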

best.

meirsh · January 23, 2019, 9:53am

By the way, you can write your own scorer if you want to. That way you might be able to do what you need without modifying the source:
https://oss.redislabs.com/redisearch/Extensions.html

Pham_Quang_Nha · February 11, 2019, 10:09am

Hi, this is not really related to my original issue, but since you mentioned the scorer function, I think posting it here is appropriate.

I would like to write a custom scorer, but after reading your documentation and examples, I still can't figure out how to include custom data in the score calculation. Your scorer function signature is: double MyScorer(RSScoringFunctionCtx *ctx, RSIndexResult *h, RSDocumentMetadata *dmd, double minScore)

I would like to include two types of data in the score calculation:

Dynamic data: for example, we have many users with different weights that affect the final score. How can I inject a user's weight into MyScorer?
Static data: some data belongs to the document and will be included in the payload. How can I read it from the payload? Please give me a concrete example.

Thank you so much.

meirsh · February 12, 2019, 8:33am

I agree that our extension API is not well documented, and we should fix that. Regarding your questions, you should include the redisearch.h header (all the structs you need should be there).

You can pass the dynamic data in the FT.SEARCH request as a payload, and then access it through the given RSScoringFunctionCtx (ctx->payload).
You can put the static data in the document payload, and then access it through the given RSDocumentMetadata (dmd->payload).

I hope this answers your question. Let me know if you need any further clarification.
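On the client side, both payloads are opaque byte strings passed with the PAYLOAD argument of FT.ADD and FT.SEARCH. The sketch below shows one possible way to encode them in Python. The 8-byte little-endian double layout, the helper names, and the scorer name MY_SCORER are all assumptions made for illustration; the C scorer would have to decode ctx->payload and dmd->payload using the same layout.

```python
import struct


def pack_weight(weight: float) -> bytes:
    # Assumed layout: one little-endian IEEE 754 double (8 bytes).
    return struct.pack("<d", weight)


def unpack_weight(payload: bytes) -> float:
    # Inverse of pack_weight, mirroring what the scorer would do in C.
    return struct.unpack("<d", payload)[0]


def add_args(index, doc_id, fields, doc_payload):
    # FT.ADD with a document payload (the "static" data).
    args = ["FT.ADD", index, doc_id, "1.0", "PAYLOAD", doc_payload, "FIELDS"]
    for name, value in fields.items():
        args += [name, value]
    return args


def search_args_with_payload(index, query, scorer, query_payload):
    # FT.SEARCH with a per-request payload (the "dynamic" data).
    return ["FT.SEARCH", index, query, "SCORER", scorer, "PAYLOAD", query_payload]


if __name__ == "__main__":
    print(add_args("idx", "doc1", {"name": "Harry Potter"}, pack_weight(0.8)))
    # MY_SCORER is a hypothetical name for a custom scorer extension.
    print(search_args_with_payload("idx", "@name:{Potter}", "MY_SCORER", pack_weight(2.5)))
```

A fixed binary layout keeps decoding in C to a single memcpy into a double, but a text encoding would work just as well if the scorer parses it.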
