I am interested in the AIX portability of RediSearch, and am trying to get it working on 64-bit AIX 7.1 on POWER.
I got Redis 4.0.6 up and running pretty smoothly.
I ran into a few issues with RediSearch 1.0.1 (1.0.2 was released a few minutes later); see here for my efforts so far: https://github.com/Panaedra/RediSearch .
So, it does something, and even indexes the documents, but FT.SEARCH finds nothing.
Any ideas on what’s wrong? I can’t put my finger on it. Any help appreciated.
This looks like a problem with the encoder/decoder of the inverted index. It uses various integer encoding schemes, and we’re probably relying on endianness somewhere, which causes the decoded records to come out as junk.
Perhaps I could do an extra test on big-endian Linux, e.g. Linux on IBM zSeries (s390x) or big-endian RHEL on IBM Power. It will take some time to set up, though, and has to be worth the effort.
It copies the first byte of the uint32 while the integer’s value is not 0. This works in little endian because the first byte is the LSB; in big endian, however, the first byte is actually the MSB. For example, when encoding the number 0xFAFBFCFD, the following happens in little endian:
1. Because of byte order, the in-memory byte layout is 0xFDFCFBFA
2. The first byte (0xFD) is written
3. The number is right-shifted; the byte layout is now 0xFCFBFA00
4. The first byte (0xFC) is written
5. Repeat steps 3-4 until the number is 0
With big endian:
1. The number in memory is 0xFAFBFCFD
2. The first byte (0xFA) is written
3. The number is right-shifted; the byte layout is now 0x00FAFBFC
4. The first byte (0x00) is written
… as you can see, this doesn’t encode the correct number.
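Roughly, the loop walked through above looks like the following (a minimal sketch of the scheme, not the actual __qint_encode code; encode_byte_copy is a made-up name, and the real encoder's bookkeeping of byte counts is ignored here):

    #include <stdint.h>
    #include <stdio.h>

    /* Write the first in-memory byte of the value, right-shift by 8,
     * repeat until the value is 0. */
    static size_t encode_byte_copy(uint32_t value, uint8_t *out) {
      size_t n = 0;
      while (value) {
        /* The first byte of the integer as laid out in memory: the LSB on
         * little endian, but the MSB on big endian. */
        out[n++] = ((const uint8_t *)&value)[0];
        value >>= 8;
      }
      return n;
    }

    int main(void) {
      uint8_t buf[4];
      size_t n = encode_byte_copy(0xFAFBFCFDu, buf);
      for (size_t i = 0; i < n; i++) printf("%02X ", (unsigned)buf[i]);
      printf("\n"); /* little endian: FD FC FB FA; big endian: FA 00 00 00 */
      return 0;
    }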
It should be simple enough to modify this algorithm to work on big-endian machines, but behind an ifdef: this algorithm is specifically optimized for little endian (because we can just read byte by byte sequentially).
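If the ifdef route is taken, one possible shape is below. This is just a sketch, assuming GCC-style __BYTE_ORDER__ macros are available (the AIX system compiler may need a different endianness check), and encode_byte_copy_portable is a made-up name, not a patch against the real __qint_encode:

    #include <stddef.h>
    #include <stdint.h>

    static size_t encode_byte_copy_portable(uint32_t value, uint8_t *out) {
      size_t n = 0;
      while (value) {
    #if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
        /* Big endian: the least significant byte sits at the highest
         * address, so copy the last in-memory byte of the uint32. */
        out[n++] = ((const uint8_t *)&value)[3];
    #else
        /* Little endian: the least significant byte is the first byte. */
        out[n++] = ((const uint8_t *)&value)[0];
    #endif
        value >>= 8;
      }
      return n;
    }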
Mark Nunberg | Senior Software Engineer Redis Labs - home of Redis
As for the best strategy for fixing this: https://github.com/RedisLabsModules/RediSearch/blob/0dd7d73652f971a5c36df8c06a266c6b1d3346df/src/qint.c#L112 is where the number is actually decoded. As you can see, it assumes that the encoded integer is actually a uint32_t (in memory). This part would also need to change, because reading the value “0x01” would mean reading all four bytes, which would cause garbled data at best and invalid memory access at worst. (FWIW, I’m not 100% comfortable with this kind of type casting anyway, because technically it might cause a buffer overrun.)
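To make the hazard concrete, here is a toy example (not the actual QINT_DECODE_VALUE body; the mask is only my way of modelling "keep just the encoded bytes") showing why a four-byte load of a one-byte encoding only happens to work on little endian:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
      /* The value 0x01 encoded in a single little-endian byte, followed by
       * whatever happens to come next in the index buffer. */
      uint8_t buf[8] = {0x01, 0xAA, 0xBB, 0xCC, 0, 0, 0, 0};
      size_t nbytes = 1;

      /* Cast-style read: always loads four bytes, even though only nbytes
       * of them belong to this value (memcpy stands in for the
       * *(uint32_t *)buf cast so the sketch itself stays well-defined). */
      uint32_t raw;
      memcpy(&raw, buf, sizeof raw);
      uint32_t masked = raw & ((1u << (8 * nbytes)) - 1);

      /* Little endian: raw = 0xCCBBAA01, masked = 0x01 (correct).
       * Big endian:    raw = 0x01AABBCC, masked = 0xCC (garbage). */
      printf("0x%02X\n", (unsigned)masked);
      return 0;
    }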
What would work best here would be to “normalize” the number into little endian before handling encoding/decoding.
In summary, the following changes would need to be made:
In __qint_encode, if big-endian, write the last rather than first byte of the uint32 (converting it to little endian in-situ)
In QINT_DECODE_VALUE, rather than using magic casting, you’d need to combine the bytes manually (see the sketch below). Because the encoding is now always LE, you don’t need an ifdef here; simply combine the bytes with this understanding. Hopefully the compiler will still optimize this for LE.
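For the decode side, the manual combination could look something like this (read_le is a hypothetical helper, not anything in the RediSearch source; the point is that the bytes are combined as little endian regardless of the host):

    #include <stddef.h>
    #include <stdint.h>

    /* Combine nbytes little-endian bytes into a uint32_t, touching only
     * the bytes that actually belong to the encoded value. */
    static uint32_t read_le(const uint8_t *p, size_t nbytes) {
      uint32_t value = 0;
      for (size_t i = 0; i < nbytes; i++) {
        value |= (uint32_t)p[i] << (8 * i);
      }
      return value;
    }

On little-endian targets, compilers commonly recognize this pattern and emit a single load, so the existing fast path shouldn't be lost.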