Greetings…
I’m not an expert, but I’ve accomplished what I’ve wanted with RediSearch… Maybe I can point you in the right direction…
-
The first thing to do (if you haven’t already) is to get a database GUI working so that you can see what’s going on in the database as you learn Redis and RediSearch… I use Redis Desktop, but Redis has a free one, which I haven’t tried but which sounds fine: https://redislabs.com/redisinsight/
-
I would also recommend working with a small data set until you get everything tweaked…
-
A client library adds an extra layer of complexity to the whole process because many of them don’t support all of the nuances of the commands… Thus your control over things is not so granular… I used PHPRedis, which does NOT support the RediSearch commands, but has a “rawCommand” method that lets you send any command to the server as a raw list of arguments… So, basically, I would suggest getting everything running with just a few records from the plain command line (redis-cli) and THEN figure out how to replicate that in Python, and then in Python in bulk for your million records…
-
Your index looks too simple… My data set had several fields that I addressed one by one, and for every field I tried to specify whether it should be indexed, the type of data, and its weight. My raw command was a mile long. I don’t have a copy of the one I wrote for the command line, but via my client library (PHPRedis), it looked like this:
$result = $redis->rawCommand(
    "FT.CREATE", "socialPostsIDX", "NOHL", "SCHEMA",
    "modified", "NUMERIC", "SORTABLE", "NOINDEX",
    "title", "TEXT", "WEIGHT", "5.0",
    "content", "TEXT", "WEIGHT", "3.0",
    "location", "TEXT", "WEIGHT", "2.0",
    "author", "TEXT", "NOSTEM", "WEIGHT", "1.0",
    "authid", "NUMERIC", "NOINDEX", "authimg", "TEXT", "NOINDEX", "authreg", "NUMERIC", "NOINDEX",
    "authurl", "TEXT", "NOINDEX", "imgbase", "TEXT", "NOINDEX", "created", "NUMERIC", "NOINDEX",
    "permalink", "TEXT", "NOINDEX", "size", "TEXT", "NOINDEX", "vidurl", "TEXT", "NOINDEX",
    "comments", "NUMERIC", "NOINDEX", "views", "NUMERIC", "NOINDEX", "revive", "NUMERIC", "NOINDEX",
    "dataextra", "TEXT", "NOINDEX"
);
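Since you’re heading for Python, the same raw-command idea might look roughly like this. It’s only a sketch, not something I’ve run against your data: it assumes the redis-py package (its `execute_command` sends an arbitrary command to the server, much like `rawCommand` does in PHPRedis), a Redis server with the RediSearch module on localhost:6379, and a shortened version of my schema. The helper names `build_create_args` and `create_index` are made up for illustration.

```python
# Sketch only: a shortened version of the schema above, as (field, options) pairs.
SCHEMA = [
    ("modified", ["NUMERIC", "SORTABLE", "NOINDEX"]),
    ("title", ["TEXT", "WEIGHT", "5.0"]),
    ("content", ["TEXT", "WEIGHT", "3.0"]),
    ("author", ["TEXT", "NOSTEM", "WEIGHT", "1.0"]),
    ("permalink", ["TEXT", "NOINDEX"]),
]


def build_create_args(index_name, schema):
    """Flatten (field, options) pairs into the FT.CREATE argument list."""
    args = ["FT.CREATE", index_name, "NOHL", "SCHEMA"]
    for field, options in schema:
        args.append(field)
        args.extend(options)
    return args


def create_index(index_name="socialPostsIDX", host="localhost", port=6379):
    """Send FT.CREATE as a raw command (assumes redis-py and a running server)."""
    import redis  # imported here so the builder above works without redis-py

    client = redis.Redis(host=host, port=port)
    return client.execute_command(*build_create_args(index_name, SCHEMA))
```

Keeping the schema as data and building the argument list separately means you can print the list and compare it to what you typed at the command line before you ever send it.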
If you just want to lump your JSON together into one field, maybe specifying the data type “TEXT” would help when you create the index? Or maybe splitting your JSON up into fields is the way to go, especially if not all of them will need to be searched. I had a lot of “NOINDEX” fields in my data set, because I decided to store the entire doc in RediSearch. That’s sort of a strategy decision… You can index a few searchable fields including a doc uuid and pull the doc from elsewhere based on a search return, OR you can just put the whole doc in RediSearch if you’ve got the resources to do it.
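If you go the split-into-fields route, turning a JSON doc into field/value pairs could be sketched like this. Assumptions on my part: the `FT.ADD` command from the older RediSearch releases (newer versions index plain hashes instead, so you would write the fields with HSET there), a made-up doc layout with an `id` key, and a hypothetical helper name `build_add_args`. Nested values get re-serialized to JSON strings because fields have to be flat.

```python
import json


def build_add_args(index_name, doc, id_key="id", score="1.0"):
    """Turn one JSON-style dict into an FT.ADD argument list (sketch)."""
    args = ["FT.ADD", index_name, str(doc[id_key]), score, "FIELDS"]
    for field, value in doc.items():
        if field == id_key or value is None:
            continue  # the id is the doc key; skip empty values
        if isinstance(value, (dict, list)):
            value = json.dumps(value)  # keep nested data as one string field
        args.extend([field, str(value)])
    return args


# Made-up sample document, only for illustration.
raw = '{"id": 42, "title": "Hello", "views": 7, "dataextra": {"lang": "en"}}'
add_args = build_add_args("socialPostsIDX", json.loads(raw))
```

The resulting list can be passed to a raw-command call the same way as the index creation.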
-
Once you get your index set up correctly and you can add a doc against that index and search for it, then you can try to figure out how to replicate things in your client library and in bulk… Make sure you read up on pipelining… It’s the only way to go when adding a million records… https://redis.io/topics/pipelining
I was able to do this via PHP and it saved the day…
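For the bulk part in Python, pipelining might look something like the following. Again just a sketch under assumptions: redis-py’s `pipeline(transaction=False)` to queue commands and send them in one round trip, a server on localhost:6379, a batch size of 1000 that I picked arbitrarily, and made-up helper names (`chunked`, `bulk_send`). Each item in `commands` is expected to be a full argument list for one raw command.

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_send(commands, batch_size=1000, host="localhost", port=6379):
    """Send raw commands in pipelined batches (assumes redis-py and a running server)."""
    import redis  # imported here so chunked() works without redis-py

    client = redis.Redis(host=host, port=port)
    sent = 0
    for batch in chunked(commands, batch_size):
        pipe = client.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
        for args in batch:
            pipe.execute_command(*args)  # queued locally, not sent yet
        pipe.execute()  # one round trip for the whole batch
        sent += len(batch)
    return sent
```

Sending in batches rather than one giant pipeline keeps memory use bounded on both ends; the right batch size is something to tune on a small data set first.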
Good luck,
Alec