What is the best way to minimise memory overhead (RediSearch) when inserting a lot of documents?
Our RediSearch DB is a single index with around 5 million documents.
Various tests suggest that to insert all 5 million docs we have to spin up a larger (more memory) node than we need to run Redis for normal read/search operations.
The DB is populated only once and then serves read-only traffic, but the data needs to be refreshed every month. Ideally we would run the smallest node possible to keep costs down.
We update/insert documents using the Redis protocol in pipe mode, e.g.:

```
cat data.txt | redis-cli --pipe
```
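For context, data.txt holds one write command per line. Our real schema is not shown here; `idx`, the doc IDs, and the field names below are placeholders (RediSearch 1.x ingests documents via FT.ADD):

```
FT.ADD idx doc:1 1.0 REPLACE FIELDS title "first doc" body "lorem ipsum"
FT.ADD idx doc:2 1.0 REPLACE FIELDS title "second doc" body "dolor sit amet"
```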
RediSearch works fine with 5 GB of RAM for read/search operations across the entire 5 million docs. However, to successfully populate those 5 million docs we need a node with at least 8 GB of RAM.
We are already planning to grow the DB to around 15 GB, so I assume the overhead will be even larger?
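(For reference, the per-index share of those figures can be read from FT.INFO; a minimal check, with `idx` standing in for our actual index name:)

```
redis-cli FT.INFO idx
# among the returned fields, the memory-related ones are reported in MB,
# e.g. inverted_sz_mb and doc_table_size_mb
```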
We are running with mostly default settings:
- RediSearch version 1.6.13 (Git=v1.6.13)
- concurrent writes: OFF
- gc: ON
- prefix min length: 2
- prefix max expansions: 200
- query timeout (ms): 500
- timeout policy: return
- cursor read size: 1000
- cursor max idle (ms): 300000
- max doctable size: 20000000
- search pool size: 20
- index pool size: 8
Module arguments:

```
--loadmodule /usr/lib/redis/modules/redisearch.so MAXDOCTABLESIZE 20000000 GC_POLICY FORK FORK_GC_CLEAN_THRESHOLD 10000
```
redis.conf:

```
redis.conf: |-
  save 900 1
  save 300 10
  save 60 1000
  dir /data
  dbfilename master.rdb
  rdbchecksum yes
```
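One factor we suspect but have not isolated is RDB snapshotting during the load: with the save points above, a background save can fork mid-import, and copy-on-write under heavy writes temporarily grows memory. This is what we would try next (a sketch; assumes CONFIG SET is permitted on the node):

```
# turn off automatic snapshots for the duration of the bulk load
redis-cli CONFIG SET save ""
cat data.txt | redis-cli --pipe
# take one snapshot after the load, then restore the original save points
redis-cli BGSAVE
redis-cli CONFIG SET save "900 1 300 10 60 1000"
```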
We are running RediSearch on k8s on GCP.