Most efficient way of importing data

I’ve created a script which loads 3000 objects into Redis via the RediSearch FT.ADD command. It’s a loop, so it adds them one by one.
The 3000 objects take approx. 6 seconds to import.

Adding them to Redis first and then indexing the existing hash keys in RediSearch via the FT.ADDHASH command seems to be a lot slower.

I am wondering if there is a more efficient way of doing this. Is a batch import available?

What programming language are you using, and which library?

In redis-py you can use a pipeline to send the FT.ADD commands without waiting for each reply; this should speed up the insertion time (see the pipeline section here: https://github.com/andymccurdy/redis-py).
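As a minimal sketch of that approach, assuming an existing index named 'idx' with a single 'title' text field (both placeholders, not values from this thread) - FT.ADD is a module command, so it can be queued through execute_command:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

docs = [{"id": f"doc:{i}", "title": f"title {i}"} for i in range(3000)]

# transaction=False gives plain pipelining without MULTI/EXEC overhead
pipe = r.pipeline(transaction=False)
for doc in docs:
    # FT.ADD <index> <docId> <score> FIELDS <field> <value> ...
    pipe.execute_command("FT.ADD", "idx", doc["id"], 1.0,
                         "FIELDS", "title", doc["title"])

pipe.execute()  # all 3000 commands go out without waiting on individual replies
```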

Let me know if it helps.

I’m using the C# NRediSearch library, which is a one-to-one port of the JRediSearch library. I hope I’m wrong, but I don’t see a pipeline implementation there.

I am not familiar with this library (although I plan to check it out in the near future).
From a short exploration of the code I do see that there is an ‘AddDocumentAsync’ function (https://github.com/StackExchange/StackExchange.Redis/blob/master/src/NRediSearch/Client.cs).
This might be exactly what you need.

What Meir said is on the right track. The one thing you might run into with .NET and NRediSearch (which uses StackExchange.Redis, IIRC) is the multiplexing architecture of the library (https://stackexchange.github.io/StackExchange.Redis/PipelinesMultiplexers#multiplexing).

I found with node_redis and RediSearch (which uses a similar architecture, though not as strictly) that the very long FT.ADD commands can be over-pipelined (I saw 100k unresolved pipelined commands at times, as an example) and, in some cases, can saturate the event loop in Redis. For my solution, I ended up opening a high number of connections (100) while still using pipelining liberally; this was very quick - 12x faster than a single connection - and it kept the number of unresolved pipelined commands much smaller (usually in the hundreds).
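To make the bounded-pipelining idea concrete, here is a minimal sketch of capping in-flight commands by flushing the pipeline every N documents. It uses redis-py rather than node_redis, and the index name ('idx'), field names, and batch size are illustrative assumptions, not values from this thread:

```python
import redis

r = redis.Redis(host="localhost", port=6379)

BATCH_SIZE = 500  # assumed cap; keeps unresolved pipelined commands in the hundreds

pipe = r.pipeline(transaction=False)
queued = 0
for i in range(3000):
    # FT.ADD <index> <docId> <score> FIELDS <field> <value> ...
    pipe.execute_command("FT.ADD", "idx", f"doc:{i}", 1.0,
                         "FIELDS", "title", f"title {i}")
    queued += 1
    if queued >= BATCH_SIZE:
        pipe.execute()  # flush and wait for replies before queuing more
        queued = 0

if queued:
    pipe.execute()  # flush the remainder
```

Spreading these batches across several connections (as described above) can then add parallelism on top of the bounded pipelining.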

Hope this helps!

Kyle, isn't there any way to set the pipeline size? If not, it would be a very good feature to add to the library. I will try to get to it and add it.