I have a question about a use case in my project.
We have to read lists of ids from two CSVs, opted-in_ids.csv and available_ids.csv, which another team pushes to S3.
On our site, at request time, we need to check whether a particular id is in the opted-in list and/or the available list, i.e. whether the id appears in either of those two CSVs. For this we used two keys, OPTED_IN and AVAILABLE, and pushed the array of entries from the CSVs as stringified JSON in our lambda code.
So, in client code, we read and parse that value and search for the id in the array, and the array holds millions of records.
Is there a better way to do this? Would these be better stored as sets, i.e. SADD the entries and use SISMEMBER on the client side?
We started seeing performance issues once the list got very big.
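For what it's worth, here is roughly what the SISMEMBER-based lookup could look like. This is only a sketch, assuming redis-py and reusing the same OPTED_IN / AVAILABLE key names from the lambda; the client is passed in so it can come from a shared connection pool. SISMEMBER is O(1) per call regardless of set size, and pipelining both checks keeps it to one round trip:

```python
def check_id(r, member_id):
    """Return membership flags for one id via two O(1) SISMEMBER calls.

    Both commands are pipelined into a single round trip, instead of
    parsing a multi-million-entry JSON array on every request.
    """
    pipe = r.pipeline()
    pipe.sismember("OPTED_IN", member_id)
    pipe.sismember("AVAILABLE", member_id)
    opted_in, available = pipe.execute()
    return {"opted_in": bool(opted_in), "available": bool(available)}

# Typical usage (connection settings are illustrative):
# import redis
# r = redis.Redis(host="localhost", port=6379)
# check_id(r, "12345")
```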
If we were to move this to sets, what would be the best way to load the data into them from the CSVs, given that they contain millions of records? We would probably need some kind of batching so we don't bring down the db.
The lists are updated about once every 2 weeks, so the set load would have to be rerun each time.
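One way the batched reload could be sketched, assuming redis-py: SADD in fixed-size batches into a temporary key, then RENAME it over the live key so readers never see a half-loaded set during the biweekly refresh. The batch size and the ":staging" key suffix here are illustrative choices, not requirements:

```python
import csv

def load_ids(r, key, csv_path, batch_size=10_000):
    """Rebuild a Redis set from a CSV of ids without one huge command.

    Ids are SADD-ed in batches to a staging key, which is then atomically
    renamed over the live key, so lookups keep working mid-load.
    """
    staging = f"{key}:staging"
    r.delete(staging)  # drop any leftovers from a failed previous run
    batch = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            batch.append(row[0])  # assumes the id is the first column
            if len(batch) >= batch_size:
                r.sadd(staging, *batch)  # one SADD per batch, not per id
                batch = []
    if batch:
        r.sadd(staging, *batch)
    r.rename(staging, key)  # atomic cut-over to the fresh set
```

Spacing the batches out (or using SCAN-friendly sizes) would further limit pressure on the db, and the same function can be run for both OPTED_IN and AVAILABLE on each refresh.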
Any inputs would be appreciated.