Most efficient way of importing data

I’ve created a script that loads 3000 objects into Redis via the RediSearch FT.ADD command. It’s a loop, so it adds them one by one.
Importing the 3000 objects takes approximately 6 seconds.

Adding them to Redis first and then indexing the existing hash keys in RediSearch via the FT.ADDHASH command seems to be a lot slower.

I am wondering if there is a more efficient way of doing this. Is there a batch import available?

Which programming language are you using, and which library?

In redis-py you can use a pipeline to send the FT.ADD commands without waiting for each reply; this should speed up the insertion time (see the pipeline section here: https://github.com/andymccurdy/redis-py).

Let me know if it helps.

I’m using the C# NRediSearch library, which is a one-to-one port of the JRediSearch library. I hope I’m wrong, but I don’t see a pipeline implementation there.

I am not familiar with this library (although I plan to check it out in the near future).
From a short exploration of the code, I do see that there is an AddDocumentAsync function (https://github.com/StackExchange/StackExchange.Redis/blob/master/src/NRediSearch/Client.cs).
This might be exactly what you need.
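
If so, a minimal bulk-load sketch might look like the following. Note this is just a sketch: I’m assuming the Document type, its Set method, and an AddDocumentAsync(Document) overload based on the linked Client.cs, and the connection string, index name, field names, and items collection are placeholders.

using System.Collections.Generic;
using System.Threading.Tasks;
using NRediSearch;
using StackExchange.Redis;

var muxer = ConnectionMultiplexer.Connect("localhost:6379");
var client = new Client("myIndex", muxer.GetDatabase());

var pending = new List<Task>();
foreach (var item in items)
{
    // Build the document to index (Set maps a field name to a value - assumed API).
    var doc = new Document(item.Id);
    doc.Set("title", item.Title);
    doc.Set("body", item.Body);

    // Don't await each FT.ADD individually; the multiplexer pipelines the queued commands.
    pending.Add(client.AddDocumentAsync(doc));
}

// One wait for all replies instead of one round trip per document.
await Task.WhenAll(pending);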

What Meir said is on the right track. The one thing you might run into with .NET and NRediSearch (which uses StackExchange.Redis, IIRC) is the multiplexing architecture of the library (https://stackexchange.github.io/StackExchange.Redis/PipelinesMultiplexers#multiplexing).

I found with node_redis and RediSearch (node_redis uses a similar architecture, though not as strictly) that the very long FT.ADD commands can be over-pipelined (I saw 100k unresolved pipelined commands at times, as an example) and, in some cases, can saturate the Redis event loop. For my solution, I ended up opening a large number of connections (100) and using pipelining liberally across them; that was very fast - about 12x faster than a single connection - with a much smaller number of unresolved pipelined commands (usually in the hundreds).
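
Translated to the C# stack you’re on, the same idea might look roughly like this (illustrative only: the connection count, index name, and BuildDocument helper are made up, and the NRediSearch calls are the same assumed API as in the sketch above):

const int connectionCount = 10;   // tune for your workload and hardware
var muxers = Enumerable.Range(0, connectionCount)
    .Select(_ => ConnectionMultiplexer.Connect("localhost:6379"))
    .ToArray();
var clients = muxers.Select(m => new Client("myIndex", m.GetDatabase())).ToArray();

var pending = new List<Task>();
for (int i = 0; i < items.Count; i++)
{
    // Round-robin the documents across the connections so no single connection
    // accumulates a huge backlog of unresolved pipelined commands.
    pending.Add(clients[i % clients.Length].AddDocumentAsync(BuildDocument(items[i])));
}
await Task.WhenAll(pending);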

Hope this helps!

Kyle, isn’t there any way to set the pipeline size? If not, it would be a very good feature to add to the library. I will try to get to it and add it.

A special-case of pipelining is when we expressly don’t care about the response from a particular operation, which allows our code to continue immediately while the enqueued operation proceeds in the background. Often, this means that we can put concurrent work on the connection from a single caller. This is achieved using the flags parameter:

// sliding expiration
db.KeyExpire(key, TimeSpan.FromMinutes(5), flags: CommandFlags.FireAndForget);
var value = (string)db.StringGet(key);

The FireAndForget flag causes the client library to queue the work as normal, but immediately return a default value (since KeyExpire returns a bool, this will return false, because default(bool) is false - however the return value is meaningless and should be ignored). This works for *Async methods too: an already-completed Task<T> is returned with the default value (or an already-completed Task is returned for void methods).
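
For instance, the *Async form looks like this (a small illustration, not from the quoted docs):

// The write is queued and pipelined as usual, but the returned task is already
// completed and its value (default(bool), i.e. false) carries no meaning.
Task<bool> ignored = db.StringSetAsync(key, "value", flags: CommandFlags.FireAndForget);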

Pipeline size can be set programmatically from your application (rather than from the Redis library), although it won’t necessarily speed up execution. For example, if you have 3000 commands, you can run them as batches of 100 pipelined commands each, which results in 30 batches.
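
One way to do that from application code is to issue the commands in fixed-size batches and await each batch before starting the next (a sketch reusing the hypothetical client, items, and BuildDocument from the earlier examples, plus System.Linq):

const int batchSize = 100;
for (int offset = 0; offset < items.Count; offset += batchSize)
{
    // At most 100 FT.ADDs are in flight at a time; waiting for their replies
    // before the next batch caps the number of unresolved pipelined commands.
    var batch = items.Skip(offset).Take(batchSize)
                     .Select(item => client.AddDocumentAsync(BuildDocument(item)))
                     .ToArray();
    await Task.WhenAll(batch);
}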