Memory issue with Redis Search (2.0.1)

Hi,

We use Redis Search (version 2.0.1) with 3 indices (Redis version is 6.0.5).
Data in 2 of these 3 indices are heavily overridden, but our keys number is almost constant (about 530K keys).
Memory utilization is getting slowly higher on each day (for the same amount of keys).
It seems that GC details may hint something, when I tried FT.INFO these 2 indices, I observed the following:
“gc_stats”,
[
“bytes_collected”,
“7916851898”,
“total_ms_run”,
“52334274”,
“total_cycles”,
“14845”,
“average_cycle_time_ms”,
“3525.3805321657123”,
“last_run_time_ms”,
“34327”,
“gc_numeric_trees_missed”,
“10923”,
“gc_blocks_denied”,
“6350486”
],

these last 2 figures keep growing.
Right now, our we restart these nodes on a weekly basis in order to keep Redis from exhausting the servers memory.

Any idea?

Thanks,
Itai

Hello Itai,

Thank you for your message.

Though close to 8GB were collected, you report the size increase. We suspect you replace data faster than we can clean it. Is it possible for you to not use the server for maintenance and try calling ft.debug GC_FORCEINVOKE index_name several times. It would be great if you can collect GC stats between those calls.

In addition, can you please share FT.INFO from different times in the cycle you describe. It would help us assess the issue.

Thanks,

Ariel

Hi Ariel,

Thanks for replying.
Let me share the following FT.INFO output for our major index (DataCacheItemIndex):

Out for JAN-17:

[ “index_name”, “DataCacheItemIndex”, “index_options”, , “index_definition”, [ “key_type”, “HASH”, “prefixes”, [ “” ], “language_field”, “__language”, “default_score”, “1”, “score_field”, “__score”, “payload_field”, “__payload” ], “fields”, [ [ “DataCacheItem_SourcedKey”, “type”, “TEXT”, “WEIGHT”, “1” ], [ “DataCacheItem_DataVersion”, “type”, “NUMERIC” ], [ “DataCacheItem_ExpiryTimeInTicks”, “type”, “NUMERIC” ] ], “num_docs”, “63493”, “max_doc_id”, “83608”, “num_terms”, “11735”, “num_records”, “42730”, “inverted_sz_mb”, “0.37751007080078125”, “total_inverted_index_blocks”, “64546”, “offset_vectors_sz_mb”, “0.020109176635742188”, “doc_table_size_mb”, “427.868408203125”, “sortable_values_size_mb”, “0”, “key_table_size_mb”, “2.4975166320800781”, “records_per_doc_avg”, “0.67298758029937744”, “bytes_per_record_avg”, “9.2639360427856445”, “offsets_per_term_avg”, “0.49347063899040222”, “offset_bits_per_record_avg”, “8”, “hash_indexing_failures”, “0”, “indexing”, “0”, “percent_indexed”, “1”, “gc_stats”, [ “bytes_collected”, “156598”, “total_ms_run”, “1274”, “total_cycles”, “9”, “average_cycle_time_ms”, “141.55555555555554”, “last_run_time_ms”, “178”, “gc_numeric_trees_missed”, “0”, “gc_blocks_denied”, “36” ], “cursor_stats”, [ “global_idle”, 0, “global_total”, 0, “index_capacity”, 128, “index_total”, 0 ] ]

Output for JAN-18:

[ “index_name”, “DataCacheItemIndex”, “index_options”, , “index_definition”, [ “key_type”, “HASH”, “prefixes”, [ “” ], “language_field”, “__language”, “default_score”, “1”, “score_field”, “__score”, “payload_field”, “__payload” ], “fields”, [ [ “DataCacheItem_SourcedKey”, “type”, “TEXT”, “WEIGHT”, “1” ], [ “DataCacheItem_DataVersion”, “type”, “NUMERIC” ], [ “DataCacheItem_ExpiryTimeInTicks”, “type”, “NUMERIC” ] ], “num_docs”, “471328”, “max_doc_id”, “61385104”, “num_terms”, “110288”, “num_records”, “203689”, “inverted_sz_mb”, “15.912251472473145”, “total_inverted_index_blocks”, “1094435”, “offset_vectors_sz_mb”, “25.307598114013672”, “doc_table_size_mb”, “251475.265625”, “sortable_values_size_mb”, “0”, “key_table_size_mb”, “17.214992523193359”, “records_per_doc_avg”, “0.43215978145599365”, “bytes_per_record_avg”, “81.91510009765625”, “offsets_per_term_avg”, “130.28166198730469”, “offset_bits_per_record_avg”, “8”, “hash_indexing_failures”, “0”, “indexing”, “0”, “percent_indexed”, “1”, “gc_stats”, [ “bytes_collected”, “678119471”, “total_ms_run”, “796671”, “total_cycles”, “1854”, “average_cycle_time_ms”, “429.70388349514565”, “last_run_time_ms”, “479”, “gc_numeric_trees_missed”, “107”, “gc_blocks_denied”, “147980” ], “cursor_stats”, [ “global_idle”, 0, “global_total”, 0, “index_capacity”, 128, “index_total”, 0 ] ]

Output for JAN-19:

[ “index_name”, “DataCacheItemIndex”, “index_options”, , “index_definition”, [ “key_type”, “HASH”, “prefixes”, [ “” ], “language_field”, “__language”, “default_score”, “1”, “score_field”, “__score”, “payload_field”, “__payload” ], “fields”, [ [ “DataCacheItem_SourcedKey”, “type”, “TEXT”, “WEIGHT”, “1” ], [ “DataCacheItem_DataVersion”, “type”, “NUMERIC” ], [ “DataCacheItem_ExpiryTimeInTicks”, “type”, “NUMERIC” ] ], “num_docs”, “503564”, “max_doc_id”, “239324892”, “num_terms”, “202415”, “num_records”, “251137”, “inverted_sz_mb”, “96.928802490234375”, “total_inverted_index_blocks”, “2931850”, “offset_vectors_sz_mb”, “106.74593353271484”, “doc_table_size_mb”, “848671”, “sortable_values_size_mb”, “0”, “key_table_size_mb”, “18.325801849365234”, “records_per_doc_avg”, “0.49871912598609924”, “bytes_per_record_avg”, “404.708251953125”, “offsets_per_term_avg”, “445.6978759765625”, “offset_bits_per_record_avg”, “8”, “hash_indexing_failures”, “0”, “indexing”, “0”, “percent_indexed”, “1”, “gc_stats”, [ “bytes_collected”, “2629072389”, “total_ms_run”, “2700790”, “total_cycles”, “4555”, “average_cycle_time_ms”, “592.92864983534582”, “last_run_time_ms”, “573”, “gc_numeric_trees_missed”, “543”, “gc_blocks_denied”, “699433” ], “cursor_stats”, [ “global_idle”, 0, “global_total”, 0, “index_capacity”, 128, “index_total”, 0 ] ]

Can you please explain the meaning of the different GC stats (gc_blocks_denied, gc_numeric_trees_missed etc.)?

Thanks,
Itai

Hello Itai,

This is very helpful but I would like to ask you to share more stats in the coming days so I have more data to work with.

You have 2 NUMERIC fields are the numbers there “stable”? or are changing a lot? I expect DataVersion to have a small number of values and ExpiryTimeInTicks to have many different values. If there is anything you can share about How you use the fields, it would be helpful.

As for your question about the stats -
gc_numeric_trees_missed counts the number of nodes in our data structure that actually hold values (or ranges). It is an internal detail that we can look into.
gc_blocks_denied - counts times when the last block on the inverted index was updated while the GC was running and therefore we don’t clean during the current iteration.

Regards,
Ariel

Hi Ariel,

As to your question:
Some of the documents in this index can change in a very high frequency (multiple time every second) while other may change once a day.
When a data update occur - we replace the document (insert a new one instead - no values update).
DataVersion is a growing integer (many possible values), and as you assumed rightly, ExpiryTimeInTicks has many different values.

I’m adding the following FT.INFO of this index from today:

FT.INFO DataCacheItemIndex
[
“index_name”,
“DataCacheItemIndex”,
“index_options”,
,
“index_definition”,
[
“key_type”,
“HASH”,
“prefixes”,
[
“”
],
“language_field”,
“__language”,
“default_score”,
“1”,
“score_field”,
“__score”,
“payload_field”,
“__payload”
],
“fields”,
[
[
“DataCacheItem_SourcedKey”,
“type”,
“TEXT”,
“WEIGHT”,
“1”
],
[
“DataCacheItem_DataVersion”,
“type”,
“NUMERIC”
],
[
“DataCacheItem_ExpiryTimeInTicks”,
“type”,
“NUMERIC”
]
],
“num_docs”,
“505546”,
“max_doc_id”,
“410592033”,
“num_terms”,
“276755”,
“num_records”,
“217571”,
“inverted_sz_mb”,
“224.60232543945312”,
“total_inverted_index_blocks”,
“4823455”,
“offset_vectors_sz_mb”,
“195.05136108398438”,
“doc_table_size_mb”,
“1369144.875”,
“sortable_values_size_mb”,
“0”,
“key_table_size_mb”,
“18.392066955566406”,
“records_per_doc_avg”,
“0.43036836385726929”,
“bytes_per_record_avg”,
“1082.4632568359375”,
“offsets_per_term_avg”,
“940.04339599609375”,
“offset_bits_per_record_avg”,
“8”,
“hash_indexing_failures”,
“0”,
“indexing”,
“0”,
“percent_indexed”,
“1”,
“gc_stats”,
[
“bytes_collected”,
“4546889112”,
“total_ms_run”,
“4586730”,
“total_cycles”,
“6662”,
“average_cycle_time_ms”,
“688.49144401080753”,
“last_run_time_ms”,
“892”,
“gc_numeric_trees_missed”,
“1095”,
“gc_blocks_denied”,
“1292661”
],
“cursor_stats”,
[
“global_idle”,
0,
“global_total”,
0,
“index_capacity”,
128,
“index_total”,
0
]
]

Thanks,
Itai

Hello Itai,

I believe we can start see a pattern here how all the stats are growing though the number of documents stays stable.

Can you please share the configuration of RediSearch? You can get it using FT.CONFIG GET *.

Regards,
Ariel,

Sure:

FT.CONFIG GET *
[
[
“EXTLOAD”,
null
],
[
“SAFEMODE”,
“true”
],
[
“CONCURRENT_WRITE_MODE”,
“false”
],
[
“NOGC”,
“false”
],
[
“MINPREFIX”,
“2”
],
[
“FORKGC_SLEEP_BEFORE_EXIT”,
“0”
],
[
“MAXDOCTABLESIZE”,
“1000000”
],
[
“MAXSEARCHRESULTS”,
“1000000”
],
[
“MAXEXPANSIONS”,
“200”
],
[
“MAXPREFIXEXPANSIONS”,
“200”
],
[
“TIMEOUT”,
“500”
],
[
“INDEX_THREADS”,
“8”
],
[
“SEARCH_THREADS”,
“20”
],
[
“FRISOINI”,
null
],
[
“ON_TIMEOUT”,
“return”
],
[
“GCSCANSIZE”,
“100”
],
[
“MIN_PHONETIC_TERM_LEN”,
“3”
],
[
“GC_POLICY”,
“fork”
],
[
“FORK_GC_RUN_INTERVAL”,
“30”
],
[
“FORK_GC_CLEAN_THRESHOLD”,
“100”
],
[
“FORK_GC_RETRY_INTERVAL”,
“5”
],
[
“_MAX_RESULTS_TO_UNSORTED_MODE”,
“1000”
],
[
“UNION_ITERATOR_HEAP”,
“20”
],
[
“CURSOR_MAX_IDLE”,
“300000”
],
[
“NO_MEM_POOLS”,
“false”
],
[
“PARTIAL_INDEXED_DOCS”,
“false”
],
[
“UPGRADE_INDEX”,
“Upgrade config for upgrading”
]
]

Good evening Itai,

It looks like you are using forkGC with the latest parameters (which is good).

Can you change FORK_GC_RUN_INTERVAL to 1. It was meant to reduce GC cycles, but in this case, we probably want as much garbage collection as we can get.

Can you also try and set MAXDOCTABLESIZE to 500,000 or even 1,000,000. This might reduce access time to the DocTable and will speed up the GC. Please note, you will need to recreate the index for this change to take place.

I hope this helps. Please keep sending the stats so we can track them.

Cheers,

Ariel

Hi Ariel,

This morning I’ve set FORK_GC_RUN_INTERVAL to 5 seconds and I’ll monitor it this week.
In case memory still rise I’ll set it to 1s and continue monitoring.
As for MAXDOCTABLESIZE - the current configuration is already 1,000,000.
I didn’t change this setting for now, but let me know if you think we should increase/decrease it.

Thanks,
Itai

Hi Ariel,

Regarding your answer about GC stats:

"gc_blocks_denied - counts times when the last block on the inverted index was updated while the GC was running and therefore we don’t clean during the current iteration."

Does it mean that when GC is blocked - this data won’t be clean at all, or it should be cleaned on the following iteration?

Thanks,
Itai

@itai it will be cleaned in the next iteration, it was just missed in the current iteration. A hight value here as well as the numeric_ranges_missed is an indication of a high insertions/updates rate.

Hi,

After setting the FORK_GC_RUN_INTERVAL to 5 seconds at our staging environment, the GC stats grow faster compared to our production (where this flag is set to 30 seconds).

Memory on the other hand looks more stable at the moment on our staging environment:
Production is using 4.3 GB while stating use 3.5 GB

Staging last night:

FT.INFO DataCacheItemIndex
[
“index_name”,
“DataCacheItemIndex”,
“index_options”,
,
“index_definition”,
[
“key_type”,
“HASH”,
“prefixes”,
[
“”
],
“language_field”,
“__language”,
“default_score”,
“1”,
“score_field”,
“__score”,
“payload_field”,
“__payload”
],
“fields”,
[
[
“DataCacheItem_SourcedKey”,
“type”,
“TEXT”,
“WEIGHT”,
“1”
],
[
“DataCacheItem_DataVersion”,
“type”,
“NUMERIC”
],
[
“DataCacheItem_ExpiryTimeInTicks”,
“type”,
“NUMERIC”
]
],
“num_docs”,
“558115”,
“max_doc_id”,
“3681083”,
“num_terms”,
“174292”,
“num_records”,
“396682”,
“inverted_sz_mb”,
“3.5131597518920898”,
“total_inverted_index_blocks”,
“613870”,
“offset_vectors_sz_mb”,
“0.45287799835205078”,
“doc_table_size_mb”,
“15060.4609375”,
“sortable_values_size_mb”,
“0”,
“key_table_size_mb”,
“20.220046997070312”,
“records_per_doc_avg”,
“0.71075314283370972”,
“bytes_per_record_avg”,
“9.2865695953369141”,
“offsets_per_term_avg”,
“1.1971225738525391”,
“offset_bits_per_record_avg”,
“8”,
“hash_indexing_failures”,
“0”,
“indexing”,
“0”,
“percent_indexed”,
“1”,
“gc_stats”,
[
“bytes_collected”,
“4757564”,
“total_ms_run”,
“231163”,
“total_cycles”,
“222”,
“average_cycle_time_ms”,
“1041.2747747747749”,
“last_run_time_ms”,
“852”,
“gc_numeric_trees_missed”,
“3”,
“gc_blocks_denied”,
“1321”
],
“cursor_stats”,
[
“global_idle”,
0,
“global_total”,
0,
“index_capacity”,
128,
“index_total”,
0
]
]

Staging this morning:

FT.INFO DataCacheItemIndex
[
“index_name”,
“DataCacheItemIndex”,
“index_options”,
,
“index_definition”,
[
“key_type”,
“HASH”,
“prefixes”,
[
“”
],
“language_field”,
“__language”,
“default_score”,
“1”,
“score_field”,
“__score”,
“payload_field”,
“__payload”
],
“fields”,
[
[
“DataCacheItem_SourcedKey”,
“type”,
“TEXT”,
“WEIGHT”,
“1”
],
[
“DataCacheItem_DataVersion”,
“type”,
“NUMERIC”
],
[
“DataCacheItem_ExpiryTimeInTicks”,
“type”,
“NUMERIC”
]
],
“num_docs”,
“556686”,
“max_doc_id”,
“24476567”,
“num_terms”,
“227051”,
“num_records”,
“424578”,
“inverted_sz_mb”,
“14.594446182250977”,
“total_inverted_index_blocks”,
“1032454”,
“offset_vectors_sz_mb”,
“9.8219995498657227”,
“doc_table_size_mb”,
“135681.46875”,
“sortable_values_size_mb”,
“0”,
“key_table_size_mb”,
“20.173484802246094”,
“records_per_doc_avg”,
“0.76268845796585083”,
“bytes_per_record_avg”,
“36.043754577636719”,
“offsets_per_term_avg”,
“24.257293701171875”,
“offset_bits_per_record_avg”,
“8”,
“hash_indexing_failures”,
“0”,
“indexing”,
“0”,
“percent_indexed”,
“1”,
“gc_stats”,
[
“bytes_collected”,
“263942648”,
“total_ms_run”,
“2207817”,
“total_cycles”,
“1804”,
“average_cycle_time_ms”,
“1223.8453436807094”,
“last_run_time_ms”,
“1004”,
“gc_numeric_trees_missed”,
“234”,
“gc_blocks_denied”,
“247763”
],
“cursor_stats”,
[
“global_idle”,
0,
“global_total”,
0,
“index_capacity”,
128,
“index_total”,
0
]
]

Production last night:

FT.INFO DataCacheItemIndex
[
“index_name”,
“DataCacheItemIndex”,
“index_options”,
,
“index_definition”,
[
“key_type”,
“HASH”,
“prefixes”,
[
“”
],
“language_field”,
“__language”,
“default_score”,
“1”,
“score_field”,
“__score”,
“payload_field”,
“__payload”
],
“fields”,
[
[
“DataCacheItem_SourcedKey”,
“type”,
“TEXT”,
“WEIGHT”,
“1”
],
[
“DataCacheItem_DataVersion”,
“type”,
“NUMERIC”
],
[
“DataCacheItem_ExpiryTimeInTicks”,
“type”,
“NUMERIC”
]
],
“num_docs”,
“518435”,
“max_doc_id”,
“8161649”,
“num_terms”,
“141457”,
“num_records”,
“165898”,
“inverted_sz_mb”,
“1.6361274719238281”,
“total_inverted_index_blocks”,
“635950”,
“offset_vectors_sz_mb”,
“1.9572124481201172”,
“doc_table_size_mb”,
“67579.5390625”,
“sortable_values_size_mb”,
“0”,
“key_table_size_mb”,
“18.742328643798828”,
“records_per_doc_avg”,
“0.31999769806861877”,
“bytes_per_record_avg”,
“10.341318130493164”,
“offsets_per_term_avg”,
“12.370770454406738”,
“offset_bits_per_record_avg”,
“8”,
“hash_indexing_failures”,
“0”,
“indexing”,
“0”,
“percent_indexed”,
“1”,
“gc_stats”,
[
“bytes_collected”,
“39886816”,
“total_ms_run”,
“1010300”,
“total_cycles”,
“1535”,
“average_cycle_time_ms”,
“658.17589576547232”,
“last_run_time_ms”,
“636”,
“gc_numeric_trees_missed”,
“15”,
“gc_blocks_denied”,
“20892”
],
“cursor_stats”,
[
“global_idle”,
0,
“global_total”,
0,
“index_capacity”,
128,
“index_total”,
0
]
]

Production this morning:

FT.INFO DataCacheItemIndex
[
“index_name”,
“DataCacheItemIndex”,
“index_options”,
,
“index_definition”,
[
“key_type”,
“HASH”,
“prefixes”,
[
“”
],
“language_field”,
“__language”,
“default_score”,
“1”,
“score_field”,
“__score”,
“payload_field”,
“__payload”
],
“fields”,
[
[
“DataCacheItem_SourcedKey”,
“type”,
“TEXT”,
“WEIGHT”,
“1”
],
[
“DataCacheItem_DataVersion”,
“type”,
“NUMERIC”
],
[
“DataCacheItem_ExpiryTimeInTicks”,
“type”,
“NUMERIC”
]
],
“num_docs”,
“519546”,
“max_doc_id”,
“59516368”,
“num_terms”,
“181467”,
“num_records”,
“289683”,
“inverted_sz_mb”,
“16.049144744873047”,
“total_inverted_index_blocks”,
“1140229”,
“offset_vectors_sz_mb”,
“21.415153503417969”,
“doc_table_size_mb”,
“257534.609375”,
“sortable_values_size_mb”,
“0”,
“key_table_size_mb”,
“18.788341522216797”,
“records_per_doc_avg”,
“0.55756950378417969”,
“bytes_per_record_avg”,
“58.093666076660156”,
“offsets_per_term_avg”,
“77.517204284667969”,
“offset_bits_per_record_avg”,
“8”,
“hash_indexing_failures”,
“0”,
“indexing”,
“0”,
“percent_indexed”,
“1”,
“gc_stats”,
[
“bytes_collected”,
“573642128”,
“total_ms_run”,
“1949032”,
“total_cycles”,
“2701”,
“average_cycle_time_ms”,
“721.59644576082928”,
“last_run_time_ms”,
“943”,
“gc_numeric_trees_missed”,
“145”,
“gc_blocks_denied”,
“211028”
],
“cursor_stats”,
[
“global_idle”,
0,
“global_total”,
0,
“index_capacity”,
128,
“index_total”,
0
]
]

Any idea?

@itai seems like you have a lot more traffic on production than on staging, you can tell that by the max_doc_id differences (which indicate the number of documents updated). It not related to the FORK_GC_RUN_INTERVAL value.

Thanks for the feedback.
Part of this difference between production to staging can be also explained by the fact that Staging was restarted about 10 hours after production.
This morning Staging memory usage reached 4.3 GB.
Here are staging stats from this morning:

FT.INFO DataCacheItemIndex
[
“index_name”,
“DataCacheItemIndex”,
“index_options”,
,
“index_definition”,
[
“key_type”,
“HASH”,
“prefixes”,
[
“”
],
“language_field”,
“__language”,
“default_score”,
“1”,
“score_field”,
“__score”,
“payload_field”,
“__payload”
],
“fields”,
[
[
“DataCacheItem_SourcedKey”,
“type”,
“TEXT”,
“WEIGHT”,
“1”
],
[
“DataCacheItem_DataVersion”,
“type”,
“NUMERIC”
],
[
“DataCacheItem_ExpiryTimeInTicks”,
“type”,
“NUMERIC”
]
],
“num_docs”,
“549522”,
“max_doc_id”,
“143807676”,
“num_terms”,
“323772”,
“num_records”,
“399783”,
“inverted_sz_mb”,
“220.51103210449219”,
“total_inverted_index_blocks”,
“3182577”,
“offset_vectors_sz_mb”,
“91.591178894042969”,
“doc_table_size_mb”,
“588547.625”,
“sortable_values_size_mb”,
“0”,
“key_table_size_mb”,
“19.922290802001953”,
“records_per_doc_avg”,
“0.72751045227050781”,
“bytes_per_record_avg”,
“578.37017822265625”,
“offsets_per_term_avg”,
“240.23110961914062”,
“offset_bits_per_record_avg”,
“8”,
“hash_indexing_failures”,
“0”,
“indexing”,
“0”,
“percent_indexed”,
“1”,
“gc_stats”,
[
“bytes_collected”,
“1844751981”,
“total_ms_run”,
“11296983”,
“total_cycles”,
“8495”,
“average_cycle_time_ms”,
“1329.8390818128312”,
“last_run_time_ms”,
“1048”,
“gc_numeric_trees_missed”,
“3121”,
“gc_blocks_denied”,
“2244941”
],
“cursor_stats”,
[
“global_idle”,
0,
“global_total”,
0,
“index_capacity”,
128,
“index_total”,
0
]
]

Seems that the RUN interval change didn’t stop the memory usage growth