What is a "Document" ?

David_Lee · November 24, 2018, 3:18am

This is probably so obvious it needn’t be asked (or documented).

Reading the docs, I cannot figure out what is meant by ‘Document’ in the context of what is stored and searched.
The examples are simplistic. There is no reference (that I can find) about document formats, types, encodings etc.

Without other commentary I am guessing “Document” means “Text Only”.

Is this correct? I.e., would RediSearch be able to index and store other types of ‘Documents’, such as MS/Word, XML, PDF, compressed files, images (metadata), structured documents (JSON, YAML …), or different encodings (UTF-8, UTF-16, ISO-wtf-windows-usually-uses)?

Indexes appear to be word indexes into plain text, is that correct ?

If so, then if one had, say, a PDF or Word doc, one would first extract all the ‘text’ and then index that, as opposed to indexing the ‘Document’ itself?

Thanks for any ideas or references to documentation.

meirsh · November 25, 2018, 8:30am

Hey David

A Document is a set of field names and values. A value's type can be one of the following: TEXT, TAG, NUMERIC, or GEO.
So it is not possible to index MS/Word, XML, or PDF files directly. You first have to extract the text from the file and then index it using the FT.ADD command.
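
To make that concrete, here is a minimal Python sketch that builds the FT.CREATE and FT.ADD argument lists for text that has already been extracted from a file. The index name `docs_idx`, the document ID `doc:1`, the field names, and the two helper functions are hypothetical and chosen only for illustration. The extraction step itself is out of scope and would be done by a separate tool.

```python
# Sketch only: build the argument lists for FT.CREATE and FT.ADD.
# The index name, document ID, field names, and helpers are hypothetical.

def build_create_args(index, schema):
    """Return the FT.CREATE arguments for a {field_name: field_type} schema."""
    args = ["FT.CREATE", index, "SCHEMA"]
    for field, field_type in schema.items():
        args += [field, field_type]
    return args


def build_add_args(index, doc_id, fields, score=1.0):
    """Return the FT.ADD arguments for one document (a set of field/value pairs)."""
    args = ["FT.ADD", index, doc_id, str(score), "FIELDS"]
    for name, value in fields.items():
        args += [name, str(value)]
    return args


# Text already pulled out of a PDF or Word file by some other tool.
extracted_text = "RediSearch indexes field values, not files."

create_args = build_create_args(
    "docs_idx", {"title": "TEXT", "body": "TEXT", "pages": "NUMERIC"}
)
add_args = build_add_args(
    "docs_idx",
    "doc:1",
    {"title": "Example report", "body": extracted_text, "pages": 12},
)

print(" ".join(create_args))
print(add_args)

# Assuming a running Redis server with the RediSearch module loaded and the
# redis-py client installed, the lists could be sent like this:
#   import redis
#   r = redis.Redis()
#   r.execute_command(*create_args)
#   r.execute_command(*add_args)
```

The point of the sketch is that what gets indexed is the set of field/value pairs, not the original file; the PDF or Word file never reaches the server.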

Does that answer your question?
