[lucene.xml#query]: [Full Text Index]: Improvements in query and options parameters description #1063

Open
daliboris opened this issue Jan 27, 2025 · 4 comments

Comments

@daliboris

The description of the query and options parameters of the ft:query() function in the Querying the Index section can be improved.

To drill down by a given facet dimension and value, pass a key "facets" in the options map given in the third parameter of ft:query

  • It is not clear that the user can combine a query in XML with options given as a map.
  • In the documentation, the options parameter as a map contains only "facets" as a key, but it can contain other keys, such as "default-operator" or "leading-wildcard", which correspond to child elements of the <options> element, for example
let $options := map { 
    "default-operator" : "or"
    }
  • The documentation should mention that the user can restrict an XML query to a field associated with the element by adding a @field attribute to <term> and the other query child elements, except <near>. For example, the following query searches the entire (dictionary) entry:
$collection//tei:entry[ft:query(., <query><term>dog</term></query>)]

In contrast, the following query searches only within the lemma field of the (dictionary) entry:

$collection//tei:entry[ft:query(., <query><term field="lemma">dog</term></query>)]
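
To illustrate the first bullet, here is a minimal sketch that passes an XML query together with an options map. The collection path is hypothetical, and whether a given option actually influences an XML query (as opposed to a Lucene query string) is exactly what this issue questions, so treat this as a syntax illustration only:

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical collection path; replace with your own data collection. :)
let $collection := collection("/db/apps/dictionary/data")
(: Options as a map; the key mirrors the <default-operator> child of <options>. :)
let $options := map { "default-operator": "or" }
return
    (: XML query restricted to the "lemma" field, with the map as third parameter. :)
    $collection//tei:entry[ft:query(., <query><term field="lemma">dog</term></query>, $options)]
```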

When set to yes, * or ? are allowed as the first character of a PrefixQuery and WildcardQuery. Note that this can produce very slow queries on big indexes.

  • The terms PrefixQuery and WildcardQuery are not mentioned anywhere else on this page; they come from the source code. The definition should be simpler, for example:

When set to yes, * or ? are allowed as the first character of a query. Note that this can produce very slow queries on big indexes.

  • In my experience, <leading-wildcard>yes</leading-wildcard> or map { "leading-wildcard": "yes" } has an effect only if the query is written in Lucene query syntax, not in XML format.

For example, the following queries return the same results:

ft:query(., <query><wildcard field="lemma">*epes</wildcard></query>, <options><leading-wildcard>no</leading-wildcard></options>)
ft:query(., <query><wildcard field="lemma">*epes</wildcard></query>, <options><leading-wildcard>yes</leading-wildcard></options>)
ft:query(.,  "lemma:*epes", <options><leading-wildcard>yes</leading-wildcard></options>)

While the following query throws an error (Syntax error in Lucene query string: Cannot parse 'lemma:*epes': '*' or '?' not allowed as first character in WildcardQuery):

ft:query(.,  "lemma:*epes", <options><leading-wildcard>no</leading-wildcard></options>)
  • In the list of elements that may occur in a query description, the <fuzzy> element is missing. Proposed definition:

Will match terms with an edit distance of at most @max-edits to the term. The value of the @max-edits attribute is an integer between 0 and 2; the default is 2. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm.
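
A short example could accompany the proposed definition. This is only a sketch based on the definition above: the collection path is hypothetical, and the comment about matching terms assumes the index contains such terms:

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical collection path; replace with your own data collection. :)
let $collection := collection("/db/apps/dictionary/data")
return
    (: With max-edits="1", terms at most one edit away from "dog" :)
    (: (for example "dig" or "dogs") should match in addition to "dog" itself. :)
    $collection//tei:entry[ft:query(., <query><fuzzy max-edits="1">dog</fuzzy></query>)]
```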

<regex> A regular expression which will be matched against the terms of a document. Can be used instead of a <term> element. For example:

  • The documentation should mention that not all regular expression constructs are allowed, for example ^ for the beginning of a string or $ for the end of a string. A link to the Lucene documentation would help here.

Please provide the following

  • exist-db version: 6.2.0
  • documentation version: 6.2.0, 3Q21

welcome bot commented Jan 27, 2025

Thanks for opening your first issue here!

@duncdrum
Contributor

Thank you Boris. Would you be able to contribute a PR with these edits? The problem with leading wildcard not being applied in XML syntax smells like a bug to me. If you have any questions feel free to ask, I know that editing the docs can be intimidating the first time around.

@daliboris
Author

daliboris commented Jan 28, 2025

You're welcome.

I accept your challenge and I'll do my best, but my PR will need a proofreader to correct my English.

If I understand the repository structure correctly, I should edit the resources in the src/main/xar-resources/data/lucene collection, right?

I'll also prepare unit tests for all newly added code snippets in the eXist-db source code repository.

@duncdrum
Contributor

@daliboris Yes, this article is the right place. We'll certainly review and help with language edits where necessary.
