Redis Graph Memory Issues

Hello!

I’ve been experiencing memory problems when running RedisGraph for long periods of time.

What tools can I use to diagnose this and track down the source?

The amount of data in the graph is not changing much (although I am deleting and creating a lot every few minutes). However, I see the memory used by redis-server increase steadily from 200 MB to 8 GB over two days.
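
For reference, I have been watching the growth with something along these lines (illustrative commands, not my exact setup):

# Poll memory usage once a minute; both fields come from INFO memory.
while true; do
    redis-cli INFO memory | grep -E '^(used_memory_human|used_memory_rss_human)'
    sleep 60
done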

Thanks,
Dan

I am also using redisgraph-bulk-loader: I am rapidly deleting and then recreating graphs with it. When we call GRAPH.DELETE, does that delete the graph entirely, or does it queue an operation to do so?
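
For concreteness, the shape of it is something like this (the graph name, file names, and loader flags are placeholders, and the EXISTS check is only there to illustrate what I’m asking about):

# Bulk-load a graph, delete it, and check whether the key is still visible.
redisgraph-bulk-insert A --nodes nodes.csv --relations edges.csv
redis-cli GRAPH.DELETE A
redis-cli EXISTS A   # returns 0 once the key itself is gone from the keyspace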

Hi @dsolnik,

There is no RedisGraph-specific tooling for introspecting on memory usage; I personally use Valgrind.
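
For example, a plain invocation might look like the following (the module path is a placeholder; adjust it to wherever your RedisGraph build puts redisgraph.so):

# --save '' disables RDB snapshots so background saves don't muddy the run.
valgrind --leak-check=full --show-leak-kinds=all --log-file=valgrind.log \
    redis-server --loadmodule /path/to/redisgraph.so --save ''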

In some areas, deleting entities from a graph earmarks the space for reuse instead of freeing it outright, so if your deletions far outpace your creations, you might see increased memory consumption. I doubt that would account for a 7 GB increase, however.

By default, GRAPH.DELETE asynchronously marks a graph for deletion, so it’s possible that there are leaks in a workflow that calls GRAPH.DELETE and then overwrites the graph key with the bulk loader. This behavior can be overridden by building RedisGraph with the command make clean && make memcheck. If successful, the server log should include a line like:

156483:M 01 Feb 2021 13:13:51.022 * <graph> Graph deletion will be done synchronously.

But this approach is only recommended for debugging.
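
Concretely, that looks roughly like this (assuming RedisGraph is checked out locally and the module builds to src/redisgraph.so — adjust the paths for your environment):

cd RedisGraph
make clean && make memcheck
redis-server --loadmodule ./src/redisgraph.so --logfile ./redis.log --daemonize yes
grep "deletion will be done synchronously" ./redis.log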

That is exactly my workflow.

I’ll try not reusing keys and see if that fixes things.

Thanks.

Thank you so much for your help!

After no longer reusing the key, things are looking good! I’ll let you know in a day or two if I’m still leaking.

Is there a recommended way to delete a graph synchronously in production?

I’m glad to hear that it worked!

If you open an issue on GitHub, we should be able to quickly expose synchronous graph deletion as a launch-time configuration.

Great!

I made a new issue for this:

Thank you for all your help!

I used synchronous graph deletions and then ran Valgrind, and I think there might be a leak in the GraphBLAS matrix resize. Or maybe this is how it’s supposed to work and I’m misunderstanding.

Although the blocks are still reachable, every bulk insert results in higher used memory (I saw this by running INFO in redis-cli).

I ran Redis under Valgrind (with synchronous graph deletions) and ran my program. I stopped it after a few minutes and looked at the log; there were a lot of entries that referred to GrB_Matrix_resize:

==1061== 1,824 bytes in 6 blocks are possibly lost in loss record 226 of 233
==1061==    at 0x483DD99: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1061==    by 0x40149CA: allocate_dtv (dl-tls.c:286)
==1061==    by 0x40149CA: _dl_allocate_tls (dl-tls.c:532)
==1061==    by 0x49AF322: allocate_stack (allocatestack.c:622)
==1061==    by 0x49AF322: pthread_create@@GLIBC_2.2.5 (pthread_create.c:660)
==1061==    by 0x702EDDA: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==1061==    by 0x70268E0: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==1061==    by 0x5CF5399: GB_convert_hyper_to_sparse (GB_convert_hyper_to_sparse.c:93)
==1061==    by 0x5CF2962: GB_sparse_or_bitmap (GB_conform.c:72)
==1061==    by 0x5CF2962: GB_sparse_or_bitmap (GB_conform.c:53)
==1061==    by 0x5CF2962: GB_conform (GB_conform.c:357)
==1061==    by 0x5C4B6B6: GB_Matrix_wait (GB_Matrix_wait.c:239)
==1061==    by 0x5D0E788: GB_resize (GB_resize.c:77)
==1061==    by 0x5C38BC6: GrB_Matrix_resize (GrB_Matrix_resize.c:32)
==1061==    by 0x5C38BC6: GrB_Matrix_resize (GrB_Matrix_resize.c:12)
==1061==    by 0x5B5E4C7: _MatrixSynchronize (graph.c:252)
==1061==    by 0x5B61EDC: Graph_GetLabelMatrix (graph.c:1310)

Is this a bug or is this expected?

I can give you the complete valgrind log if that would help further.

To my knowledge, there are no leaks pertaining to GrB_Matrix_resize. GrB_Matrix_resize is called by RedisGraph every time a matrix is retrieved and its dimensions don’t match the expected size (both dimensions equal to the graph’s node count). This consistency is required to perform traversals via matrix multiplication.

This memory does not get freed until the matrices themselves are freed, which typically means when the entire graph is deleted.

Let me know if I misunderstood your question!

When I stopped Redis and captured this log, every graph had already been deleted synchronously.

RedisGraph still held references (possibly lost, according to Valgrind) to memory allocated by GrB_Matrix_resize, even though all the graphs had been deleted.

From this summary, many of the possibly lost bytes are allocated by GrB_Matrix_resize.
To be precise, 7904 bytes on Thread 1, which I assume is the Redis main thread where redisgraph-bulk-loader runs.

==1061== LEAK SUMMARY:
==1061==    definitely lost: 0 bytes in 0 blocks
==1061==    indirectly lost: 0 bytes in 0 blocks
==1061==      possibly lost: 20,976 bytes in 69 blocks
==1061==    still reachable: 57,610 bytes in 515 blocks
==1061==         suppressed: 0 bytes in 0 blocks
==1061== Reachable blocks (those to which a pointer was found) are not shown.
==1061== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1061== 
==1061== For lists of detected and suppressed errors, rerun with: -s
==1061== ERROR SUMMARY: 58 errors from 18 contexts (suppressed: 0 from 0)

@dsolnik Can you provide sample data and/or scripts to model your workflow? I’m having trouble reproducing these issues locally.

I’ll create some data similar to what I use later this week: I’ll provide the CSVs and a redisgraph-bulk-loader command.

My workflow is a loop (rough sketch below):

  1. Get updated data (this usually doesn’t change much between iterations).
  2. Upload all the data to a graph A using redisgraph-bulk-loader.
  3. Delete graph A using GRAPH.DELETE.
  4. Go back to 1.
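
A rough sketch of that loop (loader flags, file names, and timings are placeholders rather than my exact script):

# Step 1 (regenerating the CSVs from the upstream data) is omitted here.
while true; do
    redisgraph-bulk-insert A --nodes nodes.csv --relations edges.csv   # step 2
    sleep 300                                                          # the data stays live for a few minutes
    redis-cli GRAPH.DELETE A                                           # step 3
done
# The variant that stopped the growth for me used a fresh key name each
# iteration (e.g. A_$(date +%s)) instead of reusing A.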

For now, I’m just going to restart Redis every time I delete the graph.