I’ve created a script which loads 3000 objects into Redis via the RediSearch FT.ADD command. It’s a loop, so it adds them one by one.
Importing the 3000 objects takes approx. 6 seconds.
Adding them into Redis first and then adding the existing hashkey into RediSearch via the FT.ADDHASH command seems to be a lot slower.
I am wondering if there are more efficient ways of doing this? Is there a batch import available?
Which programming language are you using, and which library?
In redis-py you can use a pipeline to send the FT.ADD commands without waiting for each reply, which should speed up the insertion time (see the pipeline section here: https://github.com/andymccurdy/redis-py)
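To illustrate the pipelining suggestion above, here is a minimal sketch. It assumes `client` is a redis-py style client (for example `redis.Redis()`) that exposes `pipeline()`; the function name, the index name and the shape of `docs` are made up for the example and are not from this thread.

```python
def add_documents_pipelined(client, index, docs, score=1.0):
    """Queue one FT.ADD per document on a pipeline, then send them together.

    Assumptions: `client` exposes a redis-py style `pipeline()` method, and
    `docs` maps document ids to {field: value} dicts (hypothetical shape).
    """
    # transaction=False asks for plain pipelining instead of MULTI/EXEC.
    pipe = client.pipeline(transaction=False)
    for doc_id, fields in docs.items():
        # Flatten {"title": "a"} into ["title", "a"] for the FIELDS clause.
        flat = [item for pair in fields.items() for item in pair]
        pipe.execute_command("FT.ADD", index, doc_id, score, "FIELDS", *flat)
    # Nothing is sent until execute(); replies come back as one list.
    return pipe.execute()
```

With a real server this would be called as something like `add_documents_pipelined(redis.Redis(), "myIndex", docs)`. Whether it beats the one-by-one loop in your setup is worth measuring rather than assuming.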
I found with node_redis and RediSearch (which uses a similar architecture, but not as strictly) that the very long FT.ADD commands can be over-pipelined (I saw 100k unresolved pipeline commands at times, as an example) and, in some cases, can saturate the event loop in Redis. In my solution, I ended up opening a large number of connections (100) and pipelining liberally on each of them, which was very quick: about 12x faster than using a single connection, and with a much smaller number of unresolved pipeline commands (usually in the 100s).
A special-case of pipelining is when we expressly don’t care about the response from a particular operation, which allows our code to continue immediately while the enqueued operation proceeds in the background. Often, this means that we can put concurrent work on the connection from a single caller. This is achieved using the flags parameter:
// sliding expiration
db.KeyExpire(key, TimeSpan.FromMinutes(5), flags: CommandFlags.FireAndForget);
var value = (string)db.StringGet(key);
The FireAndForget flag causes the client library to queue the work as normal, but immediately return a default value (since KeyExpire returns a bool, this will return false, because default(bool) is false; however, the return value is meaningless and should be ignored). This works for *Async methods too: an already-completed Task<T> is returned with the default value (or an already-completed Task is returned for void methods).
You can set the pipeline size programmatically in your application (not in the Redis library), although doing so will not necessarily speed up execution. For example, if you have 3000 commands, you could send them in batches of 100 pipelined commands, which results in 30 batches.
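The batching described above could look roughly like this in Python. This is a sketch under assumptions: `client` is taken to be a redis-py style client exposing `pipeline()`, and `chunked` and `run_in_batches` are hypothetical helper names, not part of any library.

```python
from itertools import islice


def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_in_batches(client, commands, batch_size=100):
    """Send `commands` (tuples of command arguments) one pipeline per batch.

    Assumes `client.pipeline()` behaves like redis-py's. Returns the number
    of batches that were sent.
    """
    batches = 0
    for batch in chunked(commands, batch_size):
        pipe = client.pipeline(transaction=False)
        for args in batch:
            pipe.execute_command(*args)
        pipe.execute()  # wait for this batch before queuing the next one
        batches += 1
    return batches
```

Because each batch is flushed before the next one is queued, the number of unresolved commands never exceeds `batch_size`. The best batch size depends on command length and the server, so it is worth trying a few values.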