I want to make sure I am not re-processing sentences in my NLP pipeline, because I calculate rank in the next step.
Example code:
def process_item(record):
    shard_id = hashtag()
    log(f"Matcher received {record['key']} on shard {shard_id}")
    for each_key in record['value']:
        sentence_key = record['key'] + f':{each_key}'
        tokens = set(record['value'][each_key].split(' '))
        # The {hashtag} pins the bookkeeping set to the local shard,
        # so SISMEMBER/SADD can be executed from this shard
        processed = execute('SISMEMBER', 'processed_docs_stage3{%s}' % hashtag(), sentence_key)
        log(f"Matcher thinks {processed}")
        log("Matcher: length of tokens " + str(len(tokens)))
        if not processed:
            execute('SADD', 'processed_docs_stage3{%s}' % hashtag(), sentence_key)
        else:
            log(f"Already processed {sentence_key}")
bg = GearsBuilder('KeysReader')
bg.foreach(process_item)
bg.count()
bg.run('sentence:*', mode="async_local")
If I run it twice, it produces the desired behaviour in the logs:
Matcher Already processed sentence:PMC5539802.xml:{06S}:53
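As a side note on the dedup step itself: Redis SADD returns 1 when the member is newly added and 0 when it already exists, so the SISMEMBER-then-SADD pair can be collapsed into a single call, which also removes the window between the check and the write. A minimal stand-alone sketch of that logic, using a hypothetical in-memory stand-in for the Redis set (no server needed, names are illustrative only):

```python
class FakeRedisSet:
    """Hypothetical in-memory stand-in mimicking Redis SADD semantics."""
    def __init__(self):
        self._members = set()

    def sadd(self, member):
        # Like Redis SADD: 1 if the member was newly added, 0 if it already existed
        if member in self._members:
            return 0
        self._members.add(member)
        return 1

processed_docs = FakeRedisSet()

def mark_once(sentence_key):
    # SADD doubles as the membership test, so no separate SISMEMBER is needed
    if processed_docs.sadd(sentence_key):
        return "processing"
    return "already processed"
```

In the Gears function this would mean replacing the SISMEMBER/SADD pair with a single `execute('SADD', ...)` and branching on its return value.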
Is this behaviour consistent? In other words, will the same hash-tagged keys always be allocated to the same shards, so that the Set operations stay valid even in a cluster configuration? Obviously, if I remove the hashtag the checks fail with a permission error, since the bookkeeping key may not live on the local shard.
I want to make sure I am not re-processing sentences regardless of how Gears allocates executions to shards.
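The colocation question above can be checked locally: Redis Cluster assigns a key to slot CRC16(key) mod 16384, and when the key contains a non-empty `{...}` section, only the substring inside the first such pair of braces is hashed. So any two keys sharing a hashtag land on the same slot, and therefore the same shard, deterministically. A self-contained sketch of the slot function (CRC16-XMODEM, as the cluster spec uses; written here for illustration, not taken from the Redis source):

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16-CCITT/XMODEM: poly 0x1021, init 0x0000, no reflection
    crc = 0
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Only the substring inside the first non-empty {...} is hashed, if present
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

With this, `hash_slot('sentence:PMC5539802.xml:{06S}:53')` and `hash_slot('processed_docs_stage3{06S}')` both hash only `06S` and so always resolve to the same slot, independent of which shard runs the Gears execution.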