I am interested in the AIX portability of RediSearch, and am trying to get it working on 64-bit AIX 7.1 on POWER.
I got Redis 4.0.6 up and running pretty smoothly.
I ran into a few issues with RediSearch 1.0.1 (1.0.2 was released a few minutes later); see here for my efforts so far: https://github.com/Panaedra/RediSearch .
So, it does something, and even indexes the documents, but FT.SEARCH finds nothing.
Any ideas on what’s wrong? I can’t put my finger on it. Any help appreciated.
This looks like a problem with the encoder/decoder of the inverted index. It uses various integer encoding schemes, and we’re probably relying on endianness somewhere, which causes the decoded records to come out as junk.
Perhaps I could do an extra test on big-endian Linux, e.g. Linux on IBM zSeries (s390x) or big-endian RHEL on IBM Power. It will take some time to set up, though, and has to be worth the effort.
It copies the first byte of the uint32 while the integer’s value is not 0. This works in little endian because the first byte is the LSB; in big endian, however, the first byte is actually the MSB. For example, when encoding the number 0xFAFBFCFD, the following happens in little endian:
1. Because of byte order, the in-memory byte layout is 0xFDFCFBFA
2. The first byte (0xFD) is written
3. The number is right-shifted; the byte layout is now 0xFCFBFA00
4. The first byte (0xFC) is written
5. Repeat steps 3-4 until the number is 0
With big endian:
1. The number in memory is 0xFAFBFCFD
2. The first byte (0xFA) is written
3. The number is right-shifted; the byte layout is now 0x00FAFBFC
4. The first byte (0x00) is written
… as you can see, this doesn’t encode the correct number.
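Roughly, the loop walked through above looks like the following (a minimal sketch of the scheme, not the actual __qint_encode code; encode_byte_copy is a made-up name, and the real encoder's bookkeeping of byte counts is ignored here):

    #include <stdint.h>
    #include <stdio.h>

    /* Write the first in-memory byte of the value, right-shift by 8,
     * repeat until the value is 0. */
    static size_t encode_byte_copy(uint32_t value, uint8_t *out) {
      size_t n = 0;
      while (value) {
        /* The first byte of the integer as laid out in memory: the LSB on
         * little endian, but the MSB on big endian. */
        out[n++] = ((const uint8_t *)&value)[0];
        value >>= 8;
      }
      return n;
    }

    int main(void) {
      uint8_t buf[4];
      size_t n = encode_byte_copy(0xFAFBFCFDu, buf);
      for (size_t i = 0; i < n; i++) printf("%02X ", (unsigned)buf[i]);
      printf("\n"); /* little endian: FD FC FB FA; big endian: FA 00 00 00 */
      return 0;
    }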
It should be simple enough to modify this algorithm to work on big-endian machines, but behind an ifdef: this algorithm is specifically optimized for little endian (because we can just read byte by byte sequentially).
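If the ifdef route is taken, one possible shape is below. This is just a sketch, assuming GCC-style __BYTE_ORDER__ macros are available (the AIX system compiler may need a different endianness check), and encode_byte_copy_portable is a made-up name, not a patch against the real __qint_encode:

    #include <stddef.h>
    #include <stdint.h>

    static size_t encode_byte_copy_portable(uint32_t value, uint8_t *out) {
      size_t n = 0;
      while (value) {
    #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        /* Big endian: the least significant byte sits at the highest
         * address, so copy the last in-memory byte of the uint32. */
        out[n++] = ((const uint8_t *)&value)[3];
    #else
        /* Little endian: the least significant byte is the first byte. */
        out[n++] = ((const uint8_t *)&value)[0];
    #endif
        value >>= 8;
      }
      return n;
    }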
Mark Nunberg | Senior Software Engineer Redis Labs - home of Redis
As for the best strategy for fixing this: https://github.com/RedisLabsModules/RediSearch/blob/0dd7d73652f971a5c36df8c06a266c6b1d3346df/src/qint.c#L112 is where the number is actually decoded. As you can see, it assumes that the encoded integer is actually a uint32_t (in memory). This part would also need to change, because reading the value “0x01” would mean reading all four bytes, which would cause garbled data at best and invalid memory access at worst. (FWIW, I’m not 100% comfortable with this kind of type casting anyway, because technically it might cause a buffer overrun.)
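To make the hazard concrete, here is a toy example (not the actual QINT_DECODE_VALUE body; the mask is only my way of modelling "keep just the encoded bytes") showing why a four-byte load of a one-byte encoding only happens to work on little endian:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
      /* The value 0x01 encoded in a single little-endian byte, followed by
       * whatever happens to come next in the index buffer. */
      uint8_t buf[8] = {0x01, 0xAA, 0xBB, 0xCC, 0, 0, 0, 0};
      size_t nbytes = 1;

      /* Cast-style read: always loads four bytes, even though only nbytes
       * of them belong to this value (memcpy stands in for the
       * *(uint32_t *)buf cast so the sketch itself stays well-defined). */
      uint32_t raw;
      memcpy(&raw, buf, sizeof raw);
      uint32_t masked = raw & ((1u << (8 * nbytes)) - 1);

      /* Little endian: raw = 0xCCBBAA01, masked = 0x01 (correct).
       * Big endian:    raw = 0x01AABBCC, masked = 0xCC (garbage). */
      printf("0x%02X\n", (unsigned)masked);
      return 0;
    }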
What would work best here would be to “normalize” the number into little endian before handling encoding/decoding.
In summary, the following changes would need to be made:
In __qint_encode, if big-endian, write the last rather than first byte of the uint32 (converting it to little endian in-situ)
In QINT_DECODE_VALUE, rather than using magic casting, you’d need to combine the bytes manually (see the sketch below). Because the encoding is now always LE, you don’t need an ifdef here; simply combine the bytes with this understanding. Hopefully the compiler will still optimize this for LE.
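For the decode side, the manual combination could look something like this (read_le is a hypothetical helper, not anything in the RediSearch source; the point is that the bytes are combined as little endian regardless of the host):

    #include <stddef.h>
    #include <stdint.h>

    /* Combine nbytes little-endian bytes into a uint32_t, touching only
     * the bytes that actually belong to the encoded value. */
    static uint32_t read_le(const uint8_t *p, size_t nbytes) {
      uint32_t value = 0;
      for (size_t i = 0; i < nbytes; i++) {
        value |= (uint32_t)p[i] << (8 * i);
      }
      return value;
    }

On little-endian targets, compilers commonly recognize this pattern and emit a single load, so the existing fast path shouldn't be lost.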