Loading PyTorch models in RedisAI: Redis crashes

Dear all,

I am facing an issue when loading a PyTorch model into RedisAI using the redislabs/redismod:edge Docker image: Redis crashes. To illustrate the issue, I will use the imagenet example provided in https://github.com/RedisAI/redisai-examples.

I can successfully load the pre-serialised resnet50 model provided there into RedisAI. However, Redis crashes whenever I try to load a model that I serialised myself by following the model_saver.py script. I am using PyTorch 1.6 on Python 3.7.
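For context, the command that brings the server down is the model upload itself. The report below shows it as `AI.MODELSET imagenet_model torch cpu BLOB <bytes>`, and the blob starts with `PK` because TorchScript files are ZIP archives. Here is a minimal sketch of that upload. It assumes a redis-py client passed in as `conn`. The helper names are hypothetical and do not come from redisai-examples. The ZIP check only catches a wrong file being sent; it cannot detect a PyTorch version mismatch.

```python
from pathlib import Path
from typing import Any, List

# ZIP local-file-header signature: the first bytes of a typical ZIP archive
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(blob: bytes) -> bool:
    """TorchScript files written by torch.jit.save are ZIP archives."""
    return blob.startswith(ZIP_MAGIC)


def modelset_args(key: str, blob: bytes,
                  backend: str = "TORCH", device: str = "CPU") -> List[Any]:
    """Arguments for: AI.MODELSET <key> <backend> <device> BLOB <model bytes>."""
    return ["AI.MODELSET", key, backend, device, "BLOB", blob]


def load_torch_model(conn: Any, key: str, path: str) -> Any:
    """Read a serialised TorchScript file and upload it to RedisAI.

    `conn` is assumed to be a redis-py client; any object with an
    execute_command(*args) method works.
    """
    blob = Path(path).read_bytes()
    if not looks_like_zip(blob):
        raise ValueError(f"{path} does not look like a TorchScript archive")
    return conn.execute_command(*modelset_args(key, blob))
```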

Thanks a lot!

Best regards,

manl

The bug report is as follows:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
1:M 27 Oct 2020 08:18:45.084 # Redis 6.0.1 crashed by signal: 11
1:M 27 Oct 2020 08:18:45.084 # Crashed running the instruction at: 0x7f53c9dca975
1:M 27 Oct 2020 08:18:45.084 # Accessing address: 0x18
1:M 27 Oct 2020 08:18:45.084 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2f975)[0x7f53c9dca975]

Backtrace:
redis-server *:6379(logStackTrace+0x32)[0x562639f61872]
redis-server *:6379(sigsegvHandler+0x9e)[0x562639f61f4e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f53fabac730]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2f975)[0x7f53c9dca975]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f2fbb8)[0x7f53c9dcabb8]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit16ScriptTypeParser18parseClassConstantERKNS0_6AssignE+0x8d)[0x7f53ca06154d]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f374af)[0x7f53c9dd24af]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f39e43)[0x7f53c9dd4e43]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a225)[0x7f53c9dd5225]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a30a)[0x7f53c9dd530a]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZNK5torch3jit16ScriptTypeParser17parseTypeFromExprERKNS0_4ExprE+0x1c5)[0x7f53ca063ba5]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f36844)[0x7f53c9dd1844]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f39e43)[0x7f53c9dd4e43]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3a225)[0x7f53c9dd5225]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZNK5torch3jit14SourceImporter13loadNamedTypeERKN3c1013QualifiedNameE+0x2e)[0x7f53c9dc84fe]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3bf54)[0x7f53c9dd6f54]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f10e93)[0x7f53c9dabe93]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f17352)[0x7f53c9db2352]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f19d60)[0x7f53c9db4d60]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f1a311)[0x7f53c9db5311]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit21readArchiveAndTensorsERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEN3c108optionalISt8functionIFNS9_13StrongTypePtrERKNS9_13QualifiedNameEEEEENSA_ISB_IFNS9_13intrusive_ptrINS9_6ivalue6ObjectENS9_6detail34intrusive_target_default_null_typeISL_EEEESC_NS9_6IValueEEEEENSA_INS9_6DeviceEEERN6caffe29serialize19PyTorchStreamReaderE+0x6b2)[0x7f53c9dd6982]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3bc9d)[0x7f53c9dd6c9d]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(+0x2f3e3c4)[0x7f53c9dd93c4]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit4loadESt10unique_ptrIN6caffe29serialize20ReadAdapterInterfaceESt14default_deleteIS4_EEN3c108optionalINS8_6DeviceEEERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESI_St4hashISI_ESt8equal_toISI_ESaISt4pairIKSI_SI_EEE+0x179)[0x7f53c9dd9bf9]
/usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so(_ZN5torch3jit4loadERSiN3c108optionalINS2_6DeviceEEERSt13unordered_mapINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESC_St4hashISC_ESt8equal_toISC_ESaISt4pairIKSC_SC_EEE+0x75)[0x7f53c9dda3f5]
/usr/lib/redis/modules/backends/redisai_torch/redisai_torch.so(torchLoadModel+0x215)[0x7f53fa86b475]
/usr/lib/redis/modules/backends/redisai_torch/redisai_torch.so(RAI_ModelCreateTorch+0x8a)[0x7f53fa8641ea]
/usr/lib/redis/modules/redisai.so(RAI_ModelCreate+0x16d)[0x7f53fa9bc80d]
/usr/lib/redis/modules/redisai.so(RedisAI_ModelSet_RedisCommand+0x91b)[0x7f53fa9b422b]
redis-server *:6379(RedisModuleCommandDispatcher+0x54)[0x562639f91ca4]
redis-server *:6379(call+0x9d)[0x562639f1df0d]
redis-server *:6379(processCommand+0x327)[0x562639f1e687]
redis-server *:6379(processCommandAndResetClient+0x10)[0x562639f2c280]
redis-server *:6379(processInputBuffer+0x18f)[0x562639f307cf]
redis-server *:6379(+0xd4b4c)[0x562639fadb4c]
redis-server *:6379(aeProcessEvents+0x111)[0x562639f17a21]
redis-server *:6379(aeMain+0x2b)[0x562639f17eab]
redis-server *:6379(main+0x4db)[0x562639f147eb]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f53fa9fb09b]
redis-server *:6379(_start+0x2a)[0x562639f14a7a]

------ INFO OUTPUT ------
# Server
redis_version:6.0.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:e02d1d807e41d65
redis_mode:standalone
os:Linux 4.19.76-linuxkit x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.3.0
process_id:1
run_id:e6a2f85ed2c4e92ed31dcff4906f4328e9323d73
tcp_port:6379
uptime_in_seconds:12
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:9951204
executable:/data/redis-server
config_file:

# Clients
connected_clients:1
client_recent_max_input_buffer:98074634
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:242778632
used_memory_human:231.53M
used_memory_rss:124690432
used_memory_rss_human:118.91M
used_memory_peak:242778632
used_memory_peak_human:231.53M
used_memory_peak_perc:193.70%
used_memory_overhead:105965986
used_memory_startup:7874368
used_memory_dataset:136812646
used_memory_dataset_perc:58.24%
allocator_allocated:109019704
allocator_active:109441024
allocator_resident:129314816
total_system_memory:8353112064
total_system_memory_human:7.78G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:421320
allocator_rss_ratio:1.18
allocator_rss_bytes:19873792
rss_overhead_ratio:0.96
rss_overhead_bytes:-4624384
mem_fragmentation_ratio:1.15
mem_fragmentation_bytes:16131624
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:98091618
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1603786712
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:1
total_commands_processed:5
instantaneous_ops_per_sec:0
total_net_input_bytes:102773871
total_net_output_bytes:0
instantaneous_input_kbps:51584.85
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:0
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
unexpected_error_replies:0

# Replication
role:master
connected_slaves:0
master_replid:4e792775378397fcff3cb2682c63199109f4eeeb
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
master_repl_meaningful_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.183659
used_cpu_user:0.277650
used_cpu_sys_children:0.000000
used_cpu_user_children:0.000000

# Modules
module:name=search,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=graph,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ReJSON,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=rg,ver=999999,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=bf,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ai,ver=999999,api=1,filters=0,usedby=[rg],using=[],options=[]
module:name=timeseries,ver=999999,api=1,filters=0,usedby=[],using=[],options=[]

# Commandstats
cmdstat_config:calls=1,usec=43,usec_per_call=43.00
cmdstat_info:calls=4,usec=53,usec_per_call=13.25

# Cluster
cluster_enabled:0

# Keyspace

------ CLIENT LIST OUTPUT ------
id=21 addr=172.17.0.1:40400 fd=16 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=102773786 obl=0 oll=0 omem=0 events=r cmd=ai.modelset user=default

------ CURRENT CLIENT INFO ------
id=21 addr=172.17.0.1:40400 fd=16 name= age=2 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=102773786 obl=0 oll=0 omem=0 events=r cmd=ai.modelset user=default
argv[0]: 'AI.MODELSET'
argv[1]: 'imagenet_model'
argv[2]: 'torch'
argv[3]: 'cpu'
argv[4]: 'BLOB'
argv[5]: 'PK'

------ REGISTERS ------
1:M 27 Oct 2020 08:18:45.095 #
RAX:0000000000000000 RBX:000056263c008668
RCX:0000000000000000 RDX:0000000000000000
RDI:00007ffeb5711610 RSI:000056263c008668
RBP:00007ffeb5711890 RSP:00007ffeb5711610
R8 :0000000000000000 R9 :0000000000000001
R10:0000000000000001 R11:0000000000000020
R12:00007ffeb57126d0 R13:0000000000000138
R14:00007ffeb57126f0 R15:00007ffeb57126d0
RIP:00007f53c9dca975 EFL:0000000000010246
CSGSFS:002b000000000033
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161f) -> 000056263c005e50
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161e) -> 000056263c005e60
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161d) -> 00000000000000a0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161c) -> 000056263c005e50
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161b) -> 000056263c005e60
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb571161a) -> 0000000000000000
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711619) -> 00007f53c79a7e35
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711618) -> 00007ffeb5712540
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711617) -> 00007ffeb57118d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711616) -> 0000000000000000
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711615) -> 00007f53c9dcaeee
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711614) -> 00007ffeb57126d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711613) -> 00007ffeb57126f0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711612) -> 0000000000000138
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711611) -> 00007ffeb57126d0
1:M 27 Oct 2020 08:18:45.095 # (00007ffeb5711610) -> 0000000000000000

------ MODULES INFO OUTPUT ------
# graph_executing commands

# ai_git
ai_git_sha:7a30eb39f3b3ce74bf4427b9c53f0fe6163e0ca2

# ai_load_time_configs
ai_threads_per_queue:1
ai_inter_op_parallelism:0
ai_intra_op_parallelism:0

# ai_cpu
ai_self_used_cpu_sys:0.183659
ai_self_used_cpu_user:0.277918
ai_children_used_cpu_sys:0.000000
ai_children_used_cpu_user:0.000000
ai_queue_CPU_bthread_#1_used_cpu_total:0.000000

------ FAST MEMORY TEST ------
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #0 terminated
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #1 terminated
1:M 27 Oct 2020 08:18:45.096 # Bio thread for job type #2 terminated
*** Preparing to test memory region 56263a0ac000 (2277376 bytes)
*** Preparing to test memory region 56263b2a7000 (14139392 bytes)
*** Preparing to test memory region 7f53ba79d000 (205553664 bytes)
*** Preparing to test memory region 7f53c6ba5000 (524288 bytes)
*** Preparing to test memory region 7f53c6c25000 (331776 bytes)
*** Preparing to test memory region 7f53d5a28000 (282624 bytes)
*** Preparing to test memory region 7f53d5ff7000 (8192 bytes)
*** Preparing to test memory region 7f53d6000000 (302125056 bytes)
*** Preparing to test memory region 7f53ec023000 (331776 bytes)
*** Preparing to test memory region 7f53ec1f4000 (16384 bytes)
*** Preparing to test memory region 7f53ec3fb000 (8388608 bytes)
*** Preparing to test memory region 7f53ecbfc000 (8388608 bytes)
*** Preparing to test memory region 7f53ed3fd000 (8388608 bytes)
*** Preparing to test memory region 7f53edbfe000 (8388608 bytes)
*** Preparing to test memory region 7f53ee3ff000 (8388608 bytes)
*** Preparing to test memory region 7f53eec00000 (8388608 bytes)
*** Preparing to test memory region 7f53ef400000 (4194304 bytes)
*** Preparing to test memory region 7f53ef815000 (524288 bytes)
*** Preparing to test memory region 7f53ef896000 (8388608 bytes)
*** Preparing to test memory region 7f53f02b4000 (9437184 bytes)
*** Preparing to test memory region 7f53f0bb5000 (8388608 bytes)
*** Preparing to test memory region 7f53f13b6000 (8388608 bytes)
*** Preparing to test memory region 7f53f1bb7000 (8388608 bytes)
*** Preparing to test memory region 7f53f23b8000 (8388608 bytes)
*** Preparing to test memory region 7f53f2bb9000 (8388608 bytes)
*** Preparing to test memory region 7f53f37ef000 (139264 bytes)
*** Preparing to test memory region 7f53f3a25000 (8388608 bytes)
*** Preparing to test memory region 7f53f44c4000 (12288 bytes)
*** Preparing to test memory region 7f53f44c8000 (8388608 bytes)
*** Preparing to test memory region 7f53f4cc9000 (8388608 bytes)
*** Preparing to test memory region 7f53f54ca000 (8388608 bytes)
*** Preparing to test memory region 7f53f5ccb000 (8388608 bytes)
*** Preparing to test memory region 7f53f64cc000 (8388608 bytes)
*** Preparing to test memory region 7f53f6ccd000 (8388608 bytes)
*** Preparing to test memory region 7f53f74ce000 (8388608 bytes)
*** Preparing to test memory region 7f53f7ccf000 (8388608 bytes)
*** Preparing to test memory region 7f53f8d9f000 (16384 bytes)
*** Preparing to test memory region 7f53f8da4000 (8388608 bytes)
*** Preparing to test memory region 7f53f97fc000 (12288 bytes)
*** Preparing to test memory region 7f53f9800000 (8388608 bytes)
*** Preparing to test memory region 7f53fa000000 (8388608 bytes)
*** Preparing to test memory region 7f53fa826000 (180224 bytes)
*** Preparing to test memory region 7f53fa883000 (4096 bytes)
*** Preparing to test memory region 7f53fa8c2000 (4096 bytes)
*** Preparing to test memory region 7f53fa923000 (4096 bytes)
*** Preparing to test memory region 7f53fa9d2000 (20480 bytes)
*** Preparing to test memory region 7f53fab94000 (24576 bytes)
*** Preparing to test memory region 7f53fabb7000 (16384 bytes)
*** Preparing to test memory region 7f53faea0000 (16384 bytes)
*** Preparing to test memory region 7f53fb0c8000 (8192 bytes)
*** Preparing to test memory region 7f53fb0f6000 (4096 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

------ DUMPING CODE AROUND EIP ------
Symbol: (null) (base: (nil))
Module: /usr/lib/redis/modules/backends/redisai_torch/lib/libtorch_cpu.so (base 0x7f53c6e9b000)
$ xxd -r -p /tmp/dump.hex /tmp/dump.bin
$ objdump --adjust-vma=(nil) -D -b binary -m i386:x86-64 /tmp/dump.bin
------

=== REDIS BUG REPORT END. Make sure to include from START to END. ===

Hi @manl, RedisAI is currently at PyTorch 1.5 for the TORCH backend.

We are working on upgrading all backends for the upcoming RedisAI 1.2 release. For the time being, you’ll need to export your model with PyTorch 1.5; there’s no way around it.
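A TorchScript file is a ZIP archive. One way to compare a self-exported model with the one shipped in redisai-examples is to read the format version that each archive declares. The sketch below assumes the `torch.jit.save` layout described in its docstring. This thread does not say which version numbers a given libtorch accepts. Compare against a file you know loads, such as the provided resnet50 model, rather than against a hard-coded value.

```python
import io
import zipfile
from typing import Optional


def torchscript_format_version(data: bytes) -> Optional[int]:
    """Return the integer stored in the archive's top-level `version` record.

    Assumes the torch.jit.save layout: a ZIP archive whose entries sit under
    one root folder, e.g. "<root>/version", "<root>/data.pkl", "<root>/code/...".
    Returns None if no such record exists.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            parts = name.split("/")
            # Only look at the record directly under the root folder
            if len(parts) == 2 and parts[1] == "version":
                return int(archive.read(name).decode().strip())
    return None
```

Call it on both files, for example `torchscript_format_version(open("my_model.pt", "rb").read())` with a hypothetical file name. This shows whether the two archives declare different format versions.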

Thanks @lantiga for your fast response and all of your efforts! I guess the same is true for TensorFlow as well? I run into a similar problem when trying to run inference with a model created in TF 2.3.

Thanks a lot and stay healthy, manl

Hey @manl, thanks!
I haven’t had reports about TensorFlow, but it’s very likely the case. We currently run on TF 1.15. TF 2.3 already works, and it will be included in 1.2.0.
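Each backend is built against a specific framework version: PyTorch 1.5 and TensorFlow 1.15 at the time of this thread. A cheap guard is to check the exporting framework's version before serialising. The sketch below assumes that matching major.minor is what matters. That rule is a conservative heuristic, not something RedisAI documents. `BACKEND_VERSIONS` is a hypothetical table that must be updated for newer builds such as 1.2.0.

```python
import re
from typing import Tuple

# Backend framework versions quoted in this thread for the RedisAI build in
# use at the time; update them for other builds (e.g. RedisAI 1.2).
BACKEND_VERSIONS = {"torch": "1.5", "tensorflow": "1.15"}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as '1.6.0+cpu' or '2.3.0rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def matches_backend(framework: str, local_version: str) -> bool:
    """Heuristic: the exporting framework should share major.minor with the backend."""
    return major_minor(local_version) == major_minor(BACKEND_VERSIONS[framework])
```

In an export script this would be called as `matches_backend("torch", torch.__version__)` or `matches_backend("tensorflow", tf.__version__)`.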
Actually, if you feel like building RedisAI from source, you can run get_deps.sh. Then replace the downloaded dependency directories with the more up-to-date backends and rebuild.
I know for a fact that TF 2.3 works out of the box, PyTorch 1.6 should work too.

Luca
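The two crash reports in this thread come from different builds. The first lists `ver=999999` for the ai module (the edge image), and the second lists `ver=10002`. After a rebuild, the running version can be read from the text printed by `redis-cli INFO MODULES`. The sketch below assumes that versions are encoded as `major*10000 + minor*100 + patch`, so 10002 reads as 1.0.2. It also assumes that 999999 is only a development-build placeholder. Neither point is stated in this thread. The helper names are hypothetical.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "module:name=ai,ver=10002,api=1,..."
MODULE_LINE = re.compile(r"^module:name=([^,]+),ver=(\d+)", re.MULTILINE)


def module_versions(info_text: str) -> Dict[str, int]:
    """Map module name -> numeric version from `INFO MODULES` text output."""
    return {name: int(ver) for name, ver in MODULE_LINE.findall(info_text)}


def decode_version(ver: int) -> Tuple[int, int, int]:
    """Assumed encoding major*10000 + minor*100 + patch, so 10002 -> (1, 0, 2)."""
    return ver // 10000, (ver // 100) % 100, ver % 100
```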

Maybe it was my fault. Regarding TensorFlow, I created a toy example in TensorFlow 2.3 using Keras. I exported the model to ONNX and also saved it as a frozen graph.

import numpy as np
import onnxmltools
import tensorflow as tf
from tensorflow import keras
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2


def train_and_export():
	""" Train and export simple CNN

	"""
	# Define data related parameters
	num_classes = 10
	input_shape = (28, 28, 1)

	# Load the data
	(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()

	# Preprocess data
	x_train = X_train.astype("float32") / 255
	x_train = np.expand_dims(x_train, -1)

	# Convert class vectors to binary class matrices
	y_train = keras.utils.to_categorical(y_train, num_classes)

	# Define model
	model = keras.Sequential(
		[
			keras.Input(shape=input_shape),
			keras.layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
			keras.layers.MaxPooling2D(pool_size=(2, 2)),
			keras.layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
			keras.layers.MaxPooling2D(pool_size=(2, 2)),
			keras.layers.Flatten(),
			keras.layers.Dropout(0.5),
			keras.layers.Dense(num_classes, activation="softmax"),
		]
	)

	# Define parameters for model fitting and fit model
	batch_size = 128
	epochs = 15

	model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])

	model.fit(x_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=0.1)

	# Save model
	tf.saved_model.save(model, './model')

	# Convert to ONNX (create the output directory first so open() does not fail)
	tf.io.gfile.makedirs('./onnx')
	onnx_model = onnxmltools.convert_keras(model)
	with open('./onnx/model_onnxmltools.onnx', 'wb') as f:
		f.write(onnx_model.SerializeToString())

	# Convert Keras model to ConcreteFunction
	full_model = tf.function(lambda x: model(x))
	full_model = full_model.get_concrete_function(
		x=tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))

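	# Get frozen ConcreteFunction
	frozen_func = convert_variables_to_constants_v2(full_model)

	# Print the input/output tensor names; these are the node names that
	# AI.MODELSET needs for the TF backend (INPUTS ... OUTPUTS ...)
	print("Frozen model inputs:", frozen_func.inputs)
	print("Frozen model outputs:", frozen_func.outputs)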

	# Save frozen graph from frozen ConcreteFunction to hard drive
	tf.io.write_graph(graph_or_graph_def=frozen_func.graph,
					  logdir='./model_frozen/',
					  name="frozen_graph.pb",
					  as_text=False)


if __name__ == "__main__":
	train_and_export()

The graph-freezing procedure follows the approach described here.
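A frozen TF graph has to be registered together with the names of its input and output nodes. Per the RedisAI 1.x command reference, the ONNX and TORCH backends do not need these names. Below is a sketch of the commands for the MNIST model above, using RedisAI 1.x syntax (`AI.MODELSET ... INPUTS ... OUTPUTS ... BLOB`, `AI.TENSORSET`, `AI.MODELRUN`). The helper names are hypothetical. Node names such as `x` and `Identity` are only what `convert_variables_to_constants_v2` typically produces for `tf.function(lambda x: model(x))`, so confirm them against the frozen graph before use.

```python
from array import array
from typing import Any, List, Sequence


def tf_modelset_args(key: str, blob: bytes, inputs: Sequence[str],
                     outputs: Sequence[str], device: str = "CPU") -> List[Any]:
    """AI.MODELSET for a frozen TF graph: node names must be listed explicitly."""
    return ["AI.MODELSET", key, "TF", device,
            "INPUTS", *inputs, "OUTPUTS", *outputs, "BLOB", blob]


def tensorset_args(key: str, shape: Sequence[int],
                   values: Sequence[float]) -> List[Any]:
    """AI.TENSORSET <key> FLOAT <shape...> BLOB <raw float32 bytes>."""
    expected = 1
    for dim in shape:
        expected *= dim
    if len(values) != expected:
        raise ValueError(f"expected {expected} values for shape {list(shape)}")
    # array('f') packs C floats (float32) in native byte order,
    # which is little-endian on x86-64.
    blob = array("f", values).tobytes()
    return ["AI.TENSORSET", key, "FLOAT", *shape, "BLOB", blob]


def modelrun_args(model: str, inputs: Sequence[str],
                  outputs: Sequence[str]) -> List[Any]:
    """AI.MODELRUN <model> INPUTS <tensor keys...> OUTPUTS <tensor keys...>."""
    return ["AI.MODELRUN", model, "INPUTS", *inputs, "OUTPUTS", *outputs]
```

With a redis-py client, each list can be sent as `conn.execute_command(*args)`. For one MNIST image the tensor shape would be `[1, 28, 28, 1]`, matching `input_shape` in the script. The softmax scores can then be read with `AI.TENSORGET <output key> VALUES`.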

Running inference with the ONNX model works fine, while using the TF backend crashes Redis:

=== REDIS BUG REPORT START: Cut & paste starting from here ===
7:M 28 Oct 2020 09:04:49.927 # Redis 6.0.1 crashed by signal: 11
7:M 28 Oct 2020 09:04:49.927 # Crashed running the instruction at: 0x7f6540cb1383
7:M 28 Oct 2020 09:04:49.927 # Accessing address: 0x40
7:M 28 Oct 2020 09:04:49.927 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/usr/lib/redis/modules/redisai.so(RAI_TensorGetShallowCopy+0x3)[0x7f6540cb1383]

Backtrace:
redis-server *:6379(logStackTrace+0x32)[0x5580909e5872]
redis-server *:6379(sigsegvHandler+0x9e)[0x5580909e5f4e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f6540e9d730]
/usr/lib/redis/modules/redisai.so(RAI_TensorGetShallowCopy+0x3)[0x7f6540cb1383]
/var/opt/redislabs/lib/modules/redisgears.so(+0x108cc0)[0x7f653a36dcc0]
/var/opt/redislabs/lib/modules/redisgears.so(_PyMethodDef_RawFastCallKeywords+0x2d1)[0x7f653a3b6a51]
/var/opt/redislabs/lib/modules/redisgears.so(_PyCFunction_FastCallKeywords+0x25)[0x7f653a3b6b15]
/var/opt/redislabs/lib/modules/redisgears.so(_PyEval_EvalFrameDefault+0x8aa0)[0x7f653a34d990]
/var/opt/redislabs/lib/modules/redisgears.so(+0xded5b)[0x7f653a343d5b]
/var/opt/redislabs/lib/modules/redisgears.so(_PyEval_EvalFrameDefault+0x6a23)[0x7f653a34b913]
/var/opt/redislabs/lib/modules/redisgears.so(+0xded5b)[0x7f653a343d5b]
/var/opt/redislabs/lib/modules/redisgears.so(_PyEval_EvalFrameDefault+0x6a23)[0x7f653a34b913]
/var/opt/redislabs/lib/modules/redisgears.so(+0xded5b)[0x7f653a343d5b]
/var/opt/redislabs/lib/modules/redisgears.so(_PyFunction_FastCallDict+0x2d2)[0x7f653a3b6482]
/var/opt/redislabs/lib/modules/redisgears.so(+0x1104e7)[0x7f653a3754e7]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf2304)[0x7f653a357304]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf0368)[0x7f653a355368]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf228d)[0x7f653a35728d]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf0368)[0x7f653a355368]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf082c)[0x7f653a35582c]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf1fbb)[0x7f653a356fbb]
/var/opt/redislabs/lib/modules/redisgears.so(EPStatus_RunningAction+0x3b)[0x7f653a35713b]
/var/opt/redislabs/lib/modules/redisgears.so(+0xefd45)[0x7f653a354d45]
/var/opt/redislabs/lib/modules/redisgears.so(+0xf263b)[0x7f653a35763b]
/var/opt/redislabs/lib/modules/redisgears.so(+0xfa15e)[0x7f653a35f15e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3)[0x7f6540e92fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f6540dc14cf]

------ INFO OUTPUT ------
# Server
redis_version:6.0.1
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:e02d1d807e41d65
redis_mode:standalone
os:Linux 4.19.76-linuxkit x86_64
arch_bits:64
multiplexing_api:epoll
atomicvar_api:atomic-builtin
gcc_version:8.3.0
process_id:7
run_id:3ddb25a71ea1c9d19bbecb98a8e9c4af0430dab2
tcp_port:6379
uptime_in_seconds:32
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:10040369
executable:/data/redis-server
config_file:

# Clients
connected_clients:1
client_recent_max_input_buffer:5547218
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0

# Memory
used_memory:15248024
used_memory_human:14.54M
used_memory_rss:106254336
used_memory_rss_human:101.33M
used_memory_peak:74587592
used_memory_peak_human:71.13M
used_memory_peak_perc:20.44%
used_memory_overhead:5290096
used_memory_startup:5289984
used_memory_dataset:9957928
used_memory_dataset_perc:100.00%
allocator_allocated:15810824
allocator_active:16445440
allocator_resident:23568384
total_system_memory:8353112064
total_system_memory_human:7.78G
used_memory_lua:37888
used_memory_lua_human:37.00K
used_memory_scripts:0
used_memory_scripts_human:0B
number_of_cached_scripts:0
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.04
allocator_frag_bytes:634616
allocator_rss_ratio:1.43
allocator_rss_bytes:7122944
rss_overhead_ratio:4.51
rss_overhead_bytes:82685952
mem_fragmentation_ratio:7.08
mem_fragmentation_bytes:91248104
mem_not_counted_for_evict:0
mem_replication_backlog:0
mem_clients_slaves:0
mem_clients_normal:0
mem_aof_buffer:0
mem_allocator:jemalloc-5.1.0
active_defrag_running:0
lazyfree_pending_objects:0

# Persistence
loading:0
rdb_changes_since_last_save:8
rdb_bgsave_in_progress:0
rdb_last_save_time:1603875857
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
rdb_last_cow_size:0
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
module_fork_in_progress:0
module_fork_last_cow_size:0

# Stats
total_connections_received:4
total_commands_processed:20
instantaneous_ops_per_sec:0
total_net_input_bytes:62125229
total_net_output_bytes:643
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:0
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:0
evicted_keys:0
keyspace_hits:6
keyspace_misses:0
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
tracking_total_keys:0
tracking_total_items:0
unexpected_error_replies:0

# Replication
role:master
connected_slaves:0
master_replid:904c545495d92f2fbfce8812b792958f8d2142e7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
master_repl_meaningful_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:0.601358
used_cpu_user:0.486182
used_cpu_sys_children:0.969818
used_cpu_user_children:6.960224

# Modules
module:name=graph,ver=20206,api=1,filters=0,usedby=[],using=[],options=[]
module:name=rg,ver=10002,api=1,filters=0,usedby=[],using=[ai],options=[]
module:name=timeseries,ver=10405,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ai,ver=10002,api=1,filters=0,usedby=[rg],using=[],options=[]
module:name=search,ver=20001,api=1,filters=0,usedby=[],using=[],options=[]
module:name=bf,ver=20204,api=1,filters=0,usedby=[],using=[],options=[]
module:name=ReJSON,ver=10006,api=1,filters=0,usedby=[],using=[],options=[]

# Commandstats
cmdstat_xinfo:calls=1,usec=24,usec_per_call=24.00
cmdstat_scan:calls=1,usec=9,usec_per_call=9.00
cmdstat_rg.dumpregistrations:calls=1,usec=12,usec_per_call=12.00
cmdstat_xreadgroup:calls=3,usec=51,usec_per_call=17.00
cmdstat_info:calls=4,usec=62,usec_per_call=15.50
cmdstat_xgroup:calls=1,usec=15,usec_per_call=15.00
cmdstat_ai.modelset:calls=1,usec=71026,usec_per_call=71026.00
cmdstat_ping:calls=2,usec=1,usec_per_call=0.50
cmdstat_config:calls=2,usec=35,usec_per_call=17.50
cmdstat_rg.pyimportreq:calls=2,usec=99333,usec_per_call=49666.50
cmdstat_xadd:calls=1,usec=297,usec_per_call=297.00
cmdstat_rg.pyexecute:calls=1,usec=138889,usec_per_call=138889.00

# Cluster
cluster_enabled:0

# Keyspace
db0:keys=2,expires=0,avg_ttl=0

------ CLIENT LIST OUTPUT ------
id=37 addr=172.17.0.1:40884 fd=13 name= age=0 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=xadd user=default

------ REGISTERS ------
7:M 28 Oct 2020 09:04:49.929 #
RAX:0000000000000000 RBX:00007f6484cba268
RCX:00007f6480000000 RDX:00007f653a692e10
RDI:0000000000000000 RSI:00007f6484c3618c
RBP:0000000000000000 RSP:00007f653925fa78
R8 :0000000000000059 R9 :000000000000000c
R10:00007f6539260ee0 R11:0000000000000006
R12:00007f6484cba248 R13:00007f6532f00d60
R14:00007f653a6b4d98 R15:00007f6540cb1380
RIP:00007f6540cb1383 EFL:0000000000010202
CSGSFS:002b000000000033
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa87) -> 00007f6484c6b450
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa86) -> 00007f653788bfc0
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa85) -> 0000000000000000
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa84) -> 00007f6500000001
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa83) -> 00007f6484c6b460
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa82) -> 00007f653a3b6a51
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa81) -> 0000000000000001
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa80) -> 0000000000000000
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa7f) -> 00007f653a36dbc0
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa7e) -> 00007f653a629100
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa7d) -> 00007f653a6b05e0
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa7c) -> 00007f6537888260
7:M 28 Oct 2020 09:04:49.929 # (00007f653925fa7b) -> 00007f653a6b05e0
7:M 28 Oct 2020 09:04:49.930 # (00007f653925fa7a) -> 00007f6484c1f108
7:M 28 Oct 2020 09:04:49.930 # (00007f653925fa79) -> 00007f653788bfc0
7:M 28 Oct 2020 09:04:49.930 # (00007f653925fa78) -> 00007f653a36dcc0

------ MODULES INFO OUTPUT ------
# graph_executing commands

# ai_git
ai_git_sha:97bedfa60238fa0fa6b318b7d9e5df995fe8b5e2

# ai_load_time_configs
ai_threads_per_queue:1
ai_inter_op_parallelism:0
ai_intra_op_parallelism:0

------ FAST MEMORY TEST ------
7:M 28 Oct 2020 09:04:49.930 # Bio thread for job type #0 terminated
7:M 28 Oct 2020 09:04:49.930 # Bio thread for job type #1 terminated
7:M 28 Oct 2020 09:04:49.930 # Bio thread for job type #2 terminated
*** Preparing to test memory region 558090b30000 (2277376 bytes)
*** Preparing to test memory region 558091dbf000 (14102528 bytes)
*** Preparing to test memory region 7f6480000000 (135168 bytes)
*** Preparing to test memory region 7f6484bff000 (2097152 bytes)
*** Preparing to test memory region 7f6484e00000 (8388608 bytes)
*** Preparing to test memory region 7f6485600000 (2097152 bytes)
*** Preparing to test memory region 7f648597d000 (8388608 bytes)
*** Preparing to test memory region 7f6486439000 (8192 bytes)
*** Preparing to test memory region 7f6486ed7000 (4096 bytes)
*** Preparing to test memory region 7f6487111000 (4096 bytes)
*** Preparing to test memory region 7f64875db000 (8192 bytes)
*** Preparing to test memory region 7f64877fb000 (100663296 bytes)
*** Preparing to test memory region 7f648d7fc000 (75497472 bytes)
*** Preparing to test memory region 7f6491ffd000 (8388608 bytes)
*** Preparing to test memory region 7f64927fe000 (8388608 bytes)
*** Preparing to test memory region 7f6492fff000 (8388608 bytes)
*** Preparing to test memory region 7f6493800000 (8388608 bytes)
*** Preparing to test memory region 7f6494000000 (135168 bytes)
*** Preparing to test memory region 7f6498000000 (135168 bytes)
*** Preparing to test memory region 7f649c000000 (135168 bytes)
*** Preparing to test memory region 7f64a0000000 (135168 bytes)
*** Preparing to test memory region 7f64a4000000 (135168 bytes)
*** Preparing to test memory region 7f64a8000000 (135168 bytes)
*** Preparing to test memory region 7f64adf70000 (49152 bytes)
*** Preparing to test memory region 7f64adffd000 (8388608 bytes)
*** Preparing to test memory region 7f64ae7fe000 (8388608 bytes)
*** Preparing to test memory region 7f64aefff000 (8388608 bytes)
*** Preparing to test memory region 7f64af800000 (8388608 bytes)
*** Preparing to test memory region 7f64b0000000 (135168 bytes)
*** Preparing to test memory region 7f64b4000000 (135168 bytes)
*** Preparing to test memory region 7f64b8000000 (135168 bytes)
*** Preparing to test memory region 7f64bc000000 (135168 bytes)
*** Preparing to test memory region 7f64c0000000 (135168 bytes)
*** Preparing to test memory region 7f64c4000000 (135168 bytes)
*** Preparing to test memory region 7f64c8000000 (135168 bytes)
*** Preparing to test memory region 7f64cc000000 (135168 bytes)
*** Preparing to test memory region 7f64d0000000 (135168 bytes)
*** Preparing to test memory region 7f64d43cb000 (4096 bytes)
*** Preparing to test memory region 7f64d47fa000 (8388608 bytes)
*** Preparing to test memory region 7f64d4ffb000 (8388608 bytes)
*** Preparing to test memory region 7f64d57fc000 (8388608 bytes)
*** Preparing to test memory region 7f64d5ffd000 (8388608 bytes)
*** Preparing to test memory region 7f64d67fe000 (8388608 bytes)
*** Preparing to test memory region 7f64d6fff000 (8388608 bytes)
*** Preparing to test memory region 7f64d7800000 (8388608 bytes)
*** Preparing to test memory region 7f64d8000000 (135168 bytes)
*** Preparing to test memory region 7f64dc000000 (135168 bytes)
*** Preparing to test memory region 7f64e0000000 (135168 bytes)
*** Preparing to test memory region 7f64e4000000 (135168 bytes)
*** Preparing to test memory region 7f64e8000000 (135168 bytes)
*** Preparing to test memory region 7f64ec000000 (135168 bytes)
*** Preparing to test memory region 7f64f07fa000 (8388608 bytes)
*** Preparing to test memory region 7f64f0ffb000 (8388608 bytes)
*** Preparing to test memory region 7f64f17fc000 (8388608 bytes)
*** Preparing to test memory region 7f64f1ffd000 (8388608 bytes)
*** Preparing to test memory region 7f64f27fe000 (8388608 bytes)
*** Preparing to test memory region 7f64f2fff000 (8388608 bytes)
*** Preparing to test memory region 7f64f3800000 (8388608 bytes)
*** Preparing to test memory region 7f64f4000000 (135168 bytes)
*** Preparing to test memory region 7f64f8611000 (135168 bytes)
*** Preparing to test memory region 7f64f863a000 (8388608 bytes)
*** Preparing to test memory region 7f64f8e3b000 (8388608 bytes)
*** Preparing to test memory region 7f64f963c000 (8388608 bytes)
*** Preparing to test memory region 7f64fb82b000 (69632 bytes)
*** Preparing to test memory region 7f6504b86000 (499712 bytes)
*** Preparing to test memory region 7f6504c00000 (188878848 bytes)
*** Preparing to test memory region 7f65141ff000 (8388608 bytes)
*** Preparing to test memory region 7f65149ff000 (35651584 bytes)
*** Preparing to test memory region 7f6516c00000 (8388608 bytes)
*** Preparing to test memory region 7f6517400000 (214044672 bytes)
*** Preparing to test memory region 7f6528400000 (63049728 bytes)
*** Preparing to test memory region 7f6530000000 (4194304 bytes)
*** Preparing to test memory region 7f6530600000 (8388608 bytes)
*** Preparing to test memory region 7f6530e00000 (4194304 bytes)
*** Preparing to test memory region 7f65312fb000 (52428800 bytes)
*** Preparing to test memory region 7f65344fc000 (8388608 bytes)
*** Preparing to test memory region 7f6534cfd000 (8388608 bytes)
*** Preparing to test memory region 7f65354fe000 (8388608 bytes)
*** Preparing to test memory region 7f6535cff000 (8388608 bytes)
*** Preparing to test memory region 7f6536500000 (8388608 bytes)
*** Preparing to test memory region 7f6536d00000 (9437184 bytes)
*** Preparing to test memory region 7f6537604000 (65536 bytes)
*** Preparing to test memory region 7f65377ae000 (16384 bytes)
*** Preparing to test memory region 7f65377e0000 (2621440 bytes)
*** Preparing to test memory region 7f6537a61000 (8388608 bytes)
*** Preparing to test memory region 7f6538262000 (8388608 bytes)
*** Preparing to test memory region 7f6538a63000 (8388608 bytes)
*** Preparing to test memory region 7f6539264000 (8388608 bytes)
*** Preparing to test memory region 7f6539a65000 (8388608 bytes)
*** Preparing to test memory region 7f653a693000 (143360 bytes)
*** Preparing to test memory region 7f653a8ca000 (8388608 bytes)
*** Preparing to test memory region 7f653b0cb000 (8388608 bytes)
*** Preparing to test memory region 7f653b8cc000 (8388608 bytes)
*** Preparing to test memory region 7f653c0cd000 (8388608 bytes)
*** Preparing to test memory region 7f653c8ce000 (8388608 bytes)
*** Preparing to test memory region 7f653d0cf000 (8388608 bytes)
*** Preparing to test memory region 7f653d8d0000 (8388608 bytes)
*** Preparing to test memory region 7f653e0d1000 (8388608 bytes)
*** Preparing to test memory region 7f653f19f000 (16384 bytes)
*** Preparing to test memory region 7f653f1a4000 (8388608 bytes)
*** Preparing to test memory region 7f653fbfc000 (12288 bytes)
*** Preparing to test memory region 7f653fc00000 (8388608 bytes)
*** Preparing to test memory region 7f6540400000 (8388608 bytes)
*** Preparing to test memory region 7f6540c13000 (4096 bytes)
*** Preparing to test memory region 7f6540cc3000 (20480 bytes)
*** Preparing to test memory region 7f6540e85000 (24576 bytes)
*** Preparing to test memory region 7f6540ea8000 (16384 bytes)
*** Preparing to test memory region 7f6541191000 (16384 bytes)
*** Preparing to test memory region 7f65413b9000 (8192 bytes)
*** Preparing to test memory region 7f65413bd000 (4096 bytes)
*** Preparing to test memory region 7f65413e7000 (4096 bytes)
/gears_tmp/custom_run.sh: line 18:     7 Segmentation fault      redis-server --loadmodule /usr/lib/redis/modules/redisai.so --loadmodule /usr/lib/redis/modules/redisearch.so --loadmodule /usr/lib/redis/modules/redisgraph.so --loadmodule /usr/lib/redis/modules/redistimeseries.so --loadmodule /usr/lib/redis/modules/rejson.so --loadmodule /usr/lib/redis/modules/redisbloom.so --loadmodule /var/opt/redislabs/lib/modules/redisgears.so PythonHomeDir /opt/redislabs/lib/modules/python3

According to https://github.com/RedisAI/RedisAI this should work:

Keras and TensorFlow 2.x are supported through graph freezing.

Am I missing something?

Once again, thanks a lot!

Best regards,

manl

Hi @manl, sorry for the wait; somehow I missed your reply.

Typically, freezing TensorFlow 2.3 graphs and running them on the TF 1.15 backend should work (I haven't stumbled on a graph that made the backend crash). However, we'll make sure to release a genuine TF 2.3 backend with RedisAI 1.2.0.

In your case I'm actually not sure what is going on, because I see a segfault happening during a call to RAI_TensorGetShallowCopy from within Gears.

Can you try running the same graph from pure RedisAI? Not through Gears; I mean just

AI.MODELSET ...
AI.TENSORSET ... (even without VALUES or BLOB)
AI.MODELRUN ...

from the CLI, and see what happens, reporting the error if there is one?
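For anyone who prefers to script this check rather than type it into redis-cli, here is a minimal sketch of the same three-step test. It assumes RedisAI 1.0-style command syntax and a client exposing `execute_command(*args)` (for example `redis.Redis()` from redis-py). The key names, the TF graph's input/output node names and the tensor shape are all hypothetical placeholders; substitute the ones from your own frozen graph.

```python
from typing import Any, List, Sequence


def modelset_args(key: str, backend: str, device: str, blob: bytes,
                  inputs: Sequence[str] = (), outputs: Sequence[str] = ()) -> List[Any]:
    """Build an AI.MODELSET command (RedisAI 1.0-style syntax, assumed)."""
    args: List[Any] = ["AI.MODELSET", key, backend, device]
    # INPUTS/OUTPUTS are required for the TF backend; this sketch omits them when empty.
    if inputs:
        args += ["INPUTS", *inputs]
    if outputs:
        args += ["OUTPUTS", *outputs]
    return args + ["BLOB", blob]


def tensorset_args(key: str, dtype: str, shape: Sequence[int]) -> List[Any]:
    """Build an AI.TENSORSET with no VALUES or BLOB: only type and shape are sent."""
    return ["AI.TENSORSET", key, dtype, *shape]


def modelrun_args(key: str, inputs: Sequence[str], outputs: Sequence[str]) -> List[Any]:
    """Build an AI.MODELRUN command reading input keys and writing output keys."""
    return ["AI.MODELRUN", key, "INPUTS", *inputs, "OUTPUTS", *outputs]


def smoke_test(client: Any, graph_blob: bytes) -> List[Any]:
    """Load the graph, create an input tensor and run the model, bypassing Gears.

    `client` is any object with an execute_command(*args) method.
    All key names, node names and the shape below are hypothetical.
    """
    commands = [
        modelset_args("mymodel", "TF", "CPU", graph_blob,
                      inputs=["input_node"], outputs=["output_node"]),
        tensorset_args("in_tensor", "FLOAT", [1, 224, 224, 3]),
        modelrun_args("mymodel", ["in_tensor"], ["out_tensor"]),
    ]
    # Commands are sent in order; an error reply raises and pinpoints the failing step.
    return [client.execute_command(*cmd) for cmd in commands]
```

With redis-py installed, something like `smoke_test(redis.Redis(), open("graph.pb", "rb").read())` would send the three commands in sequence. Keeping the command builders separate from the client makes it easy to print the exact arguments if a step fails.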

Thanks

Dear @lantiga,

thank you very much for your feedback. You are right: using pure RedisAI, it works like a charm. So it seems to be related to Gears.

Best regards,

manl