RedisGears on Redis Cluster and SETS (sismember and SADD)

I want to make sure I am not re-processing sentences in my NLP pipeline, because I calculate rank in the next step.
Example code:

def process_item(record):
    shard_id = hashtag()
    # Hash-tagged key, so the set lives on the shard that runs this function
    processed_key = 'processed_docs_stage3{%s}' % shard_id
    log(f"Matcher received {record['key']} and my {shard_id}")
    for each_key in record['value']:
        sentence_key = record['key'] + f':{each_key}'
        tokens = set(record['value'][each_key].split(' '))
        # SISMEMBER replies 1 if the sentence was already recorded, 0 otherwise
        processed = execute('SISMEMBER', processed_key, sentence_key)
        log(f"Matcher thinks {processed}")
        log("Matcher: length of tokens " + str(len(tokens)))
        if not processed:
            execute('SADD', processed_key, sentence_key)
        else:
            log(f"Already processed {sentence_key}")


bg = GearsBuilder('KeysReader')
bg.foreach(process_item)
bg.count()
bg.run('sentence:*', mode="async_local")

If I run it twice, it produces the desired behaviour in the logs:
Matcher Already processed sentence:PMC5539802.xml:{06S}:53
Is this behaviour consistent? In other words, will the same hash (HSET) keys always be allocated to the same shards, so that the set-based check stays valid even in a cluster configuration? Obviously, if I remove the hashtag, the checks fail with a permission error.
I want to make sure I am not re-processing sentences regardless of how gears are allocated to shards.
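A side note on the check itself: SADD replies with the number of members it actually added, so the separate SISMEMBER lookup can be folded into a single call. The sketch below illustrates that idea only. `mark_if_new` and `FakeRedis` are hypothetical names, and `FakeRedis` is an in-memory stand-in for the Gears `execute` builtin so the snippet can run outside Redis.

```python
def mark_if_new(execute, set_key, member):
    """Return True only the first time `member` is added to `set_key`."""
    # SADD replies 1 when the member was added and 0 when it was already present.
    return execute('SADD', set_key, member) == 1


class FakeRedis:
    """Hypothetical in-memory stand-in for execute(); supports SADD only."""

    def __init__(self):
        self.sets = {}

    def execute(self, command, key, member):
        if command.upper() != 'SADD':
            raise ValueError(f'unsupported command: {command}')
        members = self.sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1


fake = FakeRedis()
set_key = 'processed_docs_stage3{06S}'
sentence_key = 'sentence:PMC5539802.xml:{06S}:53'

first = mark_if_new(fake.execute, set_key, sentence_key)   # sentence is new
second = mark_if_new(fake.execute, set_key, sentence_key)  # sentence already recorded
print(first, second)
```

Inside a gear, the same helper would be called with the real `execute` builtin in place of `fake.execute`.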

@AlexMikhalev this will work as long as you do not reshard. After a reshard you will have to process the sentences again and recreate the sets (or tolerate reprocessing the same sentence at least one more time after the reshard).
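For background on why the hashtag keeps things together: Redis Cluster assigns a key to one of 16384 slots by taking a CRC16 of the key, and when the key contains a non-empty {...} section only that section is hashed. The following is a minimal sketch of that rule as described in the Redis Cluster specification, not the RedisGears implementation. The function names `hash_tag_or_key` and `key_slot` are illustrative, and the tag `06S` is taken from the log line in the question.

```python
import binascii


def hash_tag_or_key(key: str) -> str:
    """Return the part of the key that gets hashed for slot assignment."""
    start = key.find('{')
    if start != -1:
        end = key.find('}', start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end > start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant) of the hashed part, modulo 16384 slots."""
    hashed = hash_tag_or_key(key).encode('utf-8')
    return binascii.crc_hqx(hashed, 0) % 16384


tag = '06S'
keys = [
    'sentence:PMC5539802.xml:{%s}:53' % tag,
    'processed_docs_stage3{%s}' % tag,
    'debug{%s}' % tag,
]
slots = [key_slot(k) for k in keys]
same_slot = len(set(slots)) == 1  # all keys share the tag, so they share a slot
print(slots, same_slot)
```

Keys that share a tag always land in the same slot, and a reshard moves whole slots between shards. Note also that a plain key such as debug and a tagged key such as debug{06S} are simply two different keys.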

Thank you @meirsh, can you help me understand the visibility of cluster records in shards?
Let’s say I want to control a debug variable via the Redis cluster, so in each function I add:

    debug = execute('GET', 'debug{%s}' % hashtag())
    if debug:
        log(f"Message")

If I then run redis-cli -p 9001 -h 127.0.0.1 set debug 1, the debug variable is still None.
I remember seeing a clever hack on GitHub, proposed by @itamarhaber I think, that remapped a global variable to the shards. Or is there another way of doing it?

@AlexMikhalev specifically for debugging I would use the Redis log level and add level='debug' to the log function (see the Runtime page of the RedisGears documentation):

log(f"message", level='debug')

But generally, you can use the command reader to change the debug{hashtag} variable on all the shards:

GB('CommandReader').foreach(lambda x: execute('set', 'debug{%s}' % hashtag(), x[1])).register(trigger='DebugMode')

Then do the following to enable debug mode:
> rg.trigger DebugMode 1


Thank you @meirsh. My reason for handcrafting a debug mode was that I don’t want to restart the cluster when I want to change the log level. Is that achievable with log and level='debug'? My understanding is that it would require a change in redis.conf and a restart of all nodes.
I am using something similar to the second option, except that I use 'ShardsIDReader' instead of the command reader.

You can run config set loglevel debug on each shard, or again use gears with ShardsIDReader to perform this command on all the shards with a single line.
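As a rough sketch of that single line, the helper below builds the gear script text for a given log level so it can be inspected before it is sent to the cluster. `build_loglevel_gear` is a hypothetical helper name, and the list of accepted levels is an assumption based on the standard Redis loglevel values. The generated script relies only on GB, ShardsIDReader, foreach, execute and run.

```python
def build_loglevel_gear(level: str) -> str:
    """Build a one-line gear script that sets loglevel on every shard."""
    # Assumed set of valid Redis log levels.
    allowed = {'debug', 'verbose', 'notice', 'warning'}
    if level not in allowed:
        raise ValueError(f'unknown log level: {level}')
    return (
        "GB('ShardsIDReader')"
        f".foreach(lambda x: execute('CONFIG', 'SET', 'loglevel', '{level}'))"
        ".run()"
    )


script = build_loglevel_gear('debug')
print(script)
```

The printed script could then be submitted to any shard with RG.PYEXECUTE, which avoids editing redis.conf or restarting nodes.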