I now have three steps in the pipeline working, and I want to add a fourth: tokenisation with a BERT model.
Unfortunately, the tokeniser depends on PyTorch, which is an ~800 MB download.
It seems the RedisGears cluster becomes unstable after the installation:
161808:M 22 May 2020 15:01:13.516 * <module> GEARS: Successfully spellchecked sentence sentences:bafab6b3dd88dcdefe111698d02f81998c9accdb:236:{1x3}
161783:S 22 May 2020 15:03:42.420 * <module> Processing ./torch-1.4.0-cp37-cp37m-manylinux1_x86_64.whl
161783:S 22 May 2020 15:03:51.325 * <module> Installing collected packages: torch
161783:S 22 May 2020 15:04:02.674 * <module> Successfully installed torch-1.4.0
161783:S 22 May 2020 15:04:09.381 # <module> disconnected : 10.144.17.211:30006, status : -1, will try to reconnect.
161783:S 22 May 2020 15:04:09.402 # <module> disconnected : 10.144.17.211:30005, status : -1, will try to reconnect.
161783:S 22 May 2020 15:04:09.422 # <module> disconnected : 10.144.17.211:30003, status : -1, will try to reconnect.
161783:S 22 May 2020 15:04:09.443 # <module> disconnected : 10.144.17.211:30002, status : -1, will try to reconnect.
161783:S 22 May 2020 15:04:09.464 # <module> disconnected : 10.144.17.211:30004, status : -1, will try to reconnect.
The command I am trying to run is:

gears-cli --host 10.144.17.211 --port 30001 tokenizer_bert_run.py --requirements requirements_tokenizer.txt

where requirements_tokenizer.txt contains:

torch==1.4
transformers==2.9.1
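For what it's worth, if the download size itself turns out to be the trigger, a CPU-only wheel should be much smaller than the full ~800 MB build. If I understand pip's requirements syntax correctly, that would look something like this (the `+cpu` wheel and the stable-wheel index are assumptions based on PyTorch's published wheels):

```
-f https://download.pytorch.org/whl/torch_stable.html
torch==1.4.0+cpu
transformers==2.9.1
```

I have not confirmed whether RedisGears' requirements handling passes the `-f`/`--find-links` line through to pip.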
and the code (note: an earlier version used an undefined `x` instead of `record`, fixed below):

tokenizer = None

def loadTokeniser():
    global tokenizer
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
    return tokenizer

def tokenise_sentence(record):
    global tokenizer
    if not tokenizer:
        tokenizer = loadTokeniser()
    sentence_key = record['key']
    shard_id = hashtag()
    log(f"Tokeniser received {sentence_key} on shard {shard_id}")
    tokens = tokenizer.tokenize(record['value']['content'])
    key = "tokenized:bert:%s:{%s}" % (sentence_key, shard_id)
    for token in tokens:
        execute('LPUSH', key, token)
    execute('SADD', 'processed_docs_stage3_tokenized', sentence_key)

bg = GearsBuilder()
bg.foreach(tokenise_sentence)
bg.count()
bg.run('sentences:*')
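For reference, the record handling and key construction that the step performs can be exercised outside Gears with a stubbed tokenizer, which is how I convinced myself the logic itself is fine (the record shape — a 'key' plus a hash 'value' with a 'content' field — is my assumption from the earlier pipeline stages, and a fixed shard id stands in for hashtag()):

```python
def tokenise_record(record, tokenize, shard_id):
    # Pure version of the Gears step: compute the token-list key and the
    # tokens without touching Redis. `tokenize` stands in for
    # tokenizer.tokenize and `shard_id` for hashtag().
    sentence_key = record['key']
    tokens = tokenize(record['value']['content'])
    list_key = "tokenized:bert:%s:{%s}" % (sentence_key, shard_id)
    return list_key, tokens

# Stub tokenizer: whitespace split (the real one is Bio_ClinicalBERT's WordPiece).
rec = {'key': 'sentences:abc:236', 'value': {'content': 'chest pain noted'}}
print(tokenise_record(rec, str.split, '06S'))
# → ('tokenized:bert:sentences:abc:236:{06S}', ['chest', 'pain', 'noted'])
```

So the failure looks environmental rather than a bug in the step itself.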
I don't think it even reaches the point where the code runs; gears-cli times out with:
Results
-------
Errors
------
%d) %s (1, 'Execution max idle reached')