Using REPLACE: sortable_values_size_mb and doc_table_size_mb growing until maxmemory is hit.

Thank you! If I remove the geo tag from the schema, it doesn’t seem to have much effect on creeping memory usage while upserting, and the memory usage after flushing is still high. I also tried using the fork GC on 1.4.3, but that similarly does not seem to resolve the issue.

So I’ve run your script (I modified the random range from (1, 500k) to (1, 10k), since my laptop’s memory is quite limited ;)). Anyway, with a total of 10k docs (and a max docid of 20k), I couldn’t find any leaks at all; memory was exactly the same whether it was freshly inserted, after a DEBUG RELOAD, or after restarting the server.
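For reference, the comparison I mean is essentially this (a rough sketch using redis-py; the idx setup and helper name are just placeholders, not the exact script):

import redis

r = redis.Redis()

def used_memory():
    # used_memory (bytes) from INFO MEMORY
    return r.info("memory")["used_memory"]

fresh = used_memory()
r.execute_command("DEBUG", "RELOAD")  # persist and reload the dataset in place
after_reload = used_memory()

print("freshly inserted:", fresh)
print("after DEBUG RELOAD:", after_reload)
# A full server restart (reload from RDB) can be compared the same way after reconnecting.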

The test was performed on the master branch, but I am going to assume that you are using 1.4.3 (or lower), as the master codebase has not yet been released.

Mark Nunberg | Senior Software Engineer
Redis Labs - home of Redis

Email: mark@redislabs.com

And… I could not reproduce this using 1.4.2 either.


Bah, hit reply instead of reply all. Anyway for posterity:

Ok, so the runtime leak that occurs after FLUSHALL in 1.4.2 appears to be with the GC fork only, not when I run the image defaults:

docker run -d -p 6379:6379 redislabs/redisearch:1.4.2 --loadmodule /usr/lib/redis/modules/redisearch.so GC_POLICY FORK

1.4.3 also exhibits this, though.

Regardless, whether using fork or default GC, with the script that I sent over, INFO MEMORY shows me that used memory keeps climbing even after the 10k documents have been inserted. Are you not seeing that as well?
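To be clear about how I’m watching it, something along these lines (a sketch with redis-py; the one-minute interval is arbitrary):

import time
import redis

r = redis.Redis()

# Poll INFO MEMORY once a minute, even after the insert script has finished.
while True:
    mem = r.info("memory")
    print(time.strftime("%H:%M:%S"),
          "used_memory:", mem["used_memory_human"],
          "rss:", mem["used_memory_rss_human"])
    time.sleep(60)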

After maybe 10 minutes running (and at 10k docs):

# Memory
used_memory:326841592
used_memory_human:311.70M
used_memory_rss:509292544
used_memory_rss_human:485.70M
used_memory_peak:326841592
used_memory_peak_human:311.70M
used_memory_peak_perc:100.02%
used_memory_overhead:76585484
used_memory_startup:790976
used_memory_dataset:250256108
used_memory_dataset_perc:76.75%

A couple of minutes later:

# Memory
used_memory:384145560
used_memory_human:366.35M
used_memory_rss:607907840
used_memory_rss_human:579.75M
used_memory_peak:384145560
used_memory_peak_human:366.35M
used_memory_peak_perc:100.02%
used_memory_overhead:90984484
used_memory_startup:790976
used_memory_dataset:293161076
used_memory_dataset_perc:76.47%

I do not understand how you ended up with 10K docs if your random docid is between 0 and 500K.

In addition, you are using random(), which generates a lot of different terms (in a search engine the assumption is that the number of terms is smaller than the amount of data). For each term we create a key, which takes memory, and I assume this is part of the memory growth.

So after modifying the script (changing random() to randint(0, 10000), decreasing the docid range to 10000, and removing the geo field), I checked it on the current master (with FORKGC) and memory seems stable. On 1.4.3, though, the memory kept growing, so I agree there is a memory leak there which we need to investigate.
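For clarity, the upsert loop I tested is roughly this (a sketch assuming redis-py and an index called idx with one numeric and one text field; the field names are placeholders, not the original script):

import random
import redis

r = redis.Redis()

DOC_RANGE = 10000  # bounded docid space, roughly 10K distinct docs

while True:
    doc_id = "doc:%d" % random.randint(1, DOC_RANGE)
    r.execute_command(
        "FT.ADD", "idx", doc_id, "1.0", "REPLACE", "FIELDS",
        "num", random.randint(0, 10000),             # randint(0, 10000) instead of random()
        "txt", "term%d" % random.randint(0, 10000),  # bounded term vocabulary
    )
    # geo field removed entirely for this test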

One thing you should note: I ran RediSearch with MAXDOCTABLESIZE 10000 because I knew there were going to be around 10K docs. The default doc table size is 1M, so you might see memory increasing until max_doc_id reaches 1M, after which the memory should stop increasing.
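If you want to check how full the doc table actually is, FT.INFO exposes max_doc_id alongside the doc_table_size_mb / sortable_values_size_mb numbers from the subject line. Something like this will print them (a sketch with redis-py; the index name is assumed):

import redis

r = redis.Redis()

# FT.INFO returns a flat list of name/value pairs; turn it into a dict.
raw = r.execute_command("FT.INFO", "idx")
info = {raw[i].decode() if isinstance(raw[i], bytes) else raw[i]: raw[i + 1]
        for i in range(0, len(raw), 2)}

for key in ("num_docs", "max_doc_id", "doc_table_size_mb", "sortable_values_size_mb"):
    print(key, info.get(key))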

I should have mentioned: before posting that message I changed the script the same way Mark did, just to double check, changing the random range from 0-500k to 0-10k.

I am indeed updating values often, but it’s a good point about random(): I’m not updating values to something new every single time, with the exception of the geo field, which is an error in the script. As I mentioned, when I reboot the instance and the RDB reloads, memory usage is considerably reduced, and then starts climbing again, again using 1.4.2 with GC FORK. This makes sense if there is a leak in GC FORK. It would also make sense for this to happen with GC DEFAULT if there were a leak with geo field updating, which does get updated pretty much every time.

I have not tested this with the current master; I will make time to give that a go today. What I can also do is change back to GC DEFAULT, remove the geo field, and see whether memory usage is still reduced after a reboot using 1.4.2. If that fixes it, then it was very possibly the geo field leak that made me turn to GC FORK, which also had a leak.

I will report back with what I find.

I can confirm that memory utilization is stable in both 1.4.2 and 1.4.3 with no geo field, DEFAULT GC, and limited terms. Of course, if the terms keep growing, one can expect the DB to grow. I can also confirm stable utilization when running master with FORKGC: after a DEBUG RELOAD, a few megabytes are recovered, but it seemed stable before the reload, and it was rock stable afterward. Setting MAXDOCTABLESIZE seems to make a very marginal difference in memory usage, but that’s just a couple of MB, and otherwise both ways seem stable.

At this time, my inclination is to think that the geo index issue, potentially along with an inability of DEFAULT GC to keep up, was the initial source of the problem, which was then exacerbated by using FORKGC. I think it likely that if the geo index gets cleaned up on update, my issue will be resolved. Thank you guys for helping out with this.