Fix parse error of system default /usr/share/nano/*.nanorc #1157

Open: wants to merge 5 commits into master

snazy
Contributor

@snazy snazy commented Jan 20, 2025

(Recent) nano packages in Ubuntu come with some .nanorc files preinstalled.

jline's NanorcParser sadly fails parsing a couple of the regular expressions.

This change translates the regular expressions to Java regular expressions.

The differences are described in org.jline.builtins.SyntaxHighlighter#posixToJavaRegex:

  • The first ] in a bracket expression does not need to be escaped in Posix; translate it to \].
  • Same as above for a negating bracket expression like [^][]; translate to [^\]\[].
  • Any [ in a bracket expression does not need to be escaped in Posix; translate it to \[.
  • Any ] outside a bracket expression is valid in both Posix and Java; no translation.
  • A backslash before the closing bracket, as in [.f\], does not escape the closing bracket; the backslash itself must be escaped for Java, so translate to [.f\\].
  • Do not perform the above translations within a \ escape, except for translating \< and \> to \b.
  • Replace Posix classes like [:word:] or [:digit:] with Java classes, both inside and outside a bracket expression.
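To make the rules above concrete, here is a minimal, simplified sketch of such a translator. The names `PosixRegexSketch` and its `posixToJavaRegex` are hypothetical and this is not the code in the PR. It assumes nano-style (POSIX extended) input, implements only the bracket, escape and `[:name:]`-inside-brackets rules listed here, and deliberately does not handle a bare single-bracket `[:digit:]` or collating elements like `[.x.]`.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified sketch of the POSIX-to-Java bracket translation rules.
 * Not jline's SyntaxHighlighter#posixToJavaRegex.
 */
public final class PosixRegexSketch {

    static String posixToJavaRegex(String posix) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < posix.length()) {
            char c = posix.charAt(i);
            if (c == '\\' && i + 1 < posix.length()) {
                char next = posix.charAt(i + 1);
                // \< and \> (word start/end) become \b; any other escape is copied unchanged.
                out.append(next == '<' || next == '>' ? "\\b" : posix.substring(i, i + 2));
                i += 2;
            } else if (c == '[') {
                i = translateBracket(posix, i, out);
            } else {
                // Ordinary characters, including a ']' outside a bracket expression.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Translates the bracket expression opening at start; returns the index just past it. */
    private static int translateBracket(String s, int start, StringBuilder out) {
        int i = start + 1;
        out.append('[');
        if (i < s.length() && s.charAt(i) == '^') {
            out.append('^');
            i++;
        }
        if (i < s.length() && s.charAt(i) == ']') {
            out.append("\\]"); // a leading ']' is a literal in POSIX
            i++;
        }
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == ']') {
                out.append(']'); // closes the bracket expression
                return i + 1;
            }
            if (c == '[' && i + 1 < s.length() && s.charAt(i + 1) == ':') {
                int end = s.indexOf(":]", i + 2);
                String javaClass = end < 0 ? null : javaClassFor(s.substring(i + 2, end));
                if (javaClass != null) {
                    out.append(javaClass); // e.g. [:digit:] -> \p{Digit}
                    i = end + 2;
                    continue;
                }
            }
            if (c == '[') {
                out.append("\\["); // literal in POSIX, would open a nested class in Java
            } else if (c == '\\') {
                out.append("\\\\"); // backslash is literal inside a POSIX bracket expression
            } else {
                out.append(c);
            }
            i++;
        }
        return i; // unterminated: leave the error to Pattern.compile
    }

    /** Maps a POSIX class name to a Java equivalent, or null if unknown. */
    private static String javaClassFor(String name) {
        switch (name) {
            case "alpha":  return "\\p{Alpha}";
            case "digit":  return "\\p{Digit}";
            case "alnum":  return "\\p{Alnum}";
            case "upper":  return "\\p{Upper}";
            case "lower":  return "\\p{Lower}";
            case "space":  return "\\p{Space}";
            case "blank":  return "\\p{Blank}";
            case "punct":  return "\\p{Punct}";
            case "xdigit": return "\\p{XDigit}";
            case "word":   return "\\w";
            default:       return null;
        }
    }

    public static void main(String[] args) {
        for (String posix : new String[] {"[]a]", "[^][]", "[.f\\]", "\\<if\\>", "[[:digit:]-.]"}) {
            String java = posixToJavaRegex(posix);
            Pattern.compile(java); // throws PatternSyntaxException if the translation is invalid
            System.out.println(posix + "  ->  " + java);
        }
    }
}
```

Running `main` prints each POSIX input next to its translation, e.g. `[^][]` becomes `[^\]\[]` as described above, and fails loudly if a translation does not compile.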

Test cases have been added.

There are, however, two regexes that still fail to parse, and they look invalid. So that jnano does not trip over them, any PatternSyntaxException now makes jnano ignore just the affected rule, and a warning is logged.
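The skip-and-warn behaviour can be sketched as below. `RuleCompilerSketch` and `compileRules` are hypothetical names, and `java.util.logging` stands in for whatever logger jnano actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch: compile each rule's regex and skip the ones Java rejects. */
public final class RuleCompilerSketch {
    private static final Logger LOG = Logger.getLogger(RuleCompilerSketch.class.getName());

    static List<Pattern> compileRules(List<String> javaRegexes) {
        List<Pattern> compiled = new ArrayList<>();
        for (String regex : javaRegexes) {
            try {
                compiled.add(Pattern.compile(regex));
            } catch (PatternSyntaxException e) {
                // Drop only this rule; the remaining rules of the .nanorc file still apply.
                LOG.warning("Ignoring rule with invalid regex '" + regex + "': " + e.getDescription());
            }
        }
        return compiled;
    }

    public static void main(String[] args) {
        List<Pattern> rules = compileRules(List.of("\\bif\\b", "a(", "[0-9]+"));
        System.out.println(rules.size() + " of 3 rules compiled"); // 2 of 3 rules compiled
    }
}
```

Catching per rule rather than per file keeps one bad regex from disabling highlighting for the whole syntax.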

Fixes #1156

@mattirn
Collaborator

mattirn commented Jan 22, 2025

@snazy, I think we should do a more general fix rather than start to resolve patterns one by one.

In Java regexes, the characters [ and ] must always be escaped inside a bracket expression. In POSIX regexes, a [ can appear in a bracket expression without escaping, and so can a ] if it is the first character of the bracket expression.

We can add a static method fixRegex(String regex) to the Parse class that escapes the [ and ] characters inside a bracket expression.

@mattirn mattirn modified the milestone: 3.28.1 Jan 22, 2025
@snazy
Contributor Author

snazy commented Jan 22, 2025

I wonder whether this gets into parsing the regex itself. However, do you think it's good enough to handle the cases that the "Character classes" section of Pattern's javadoc describes? I.e., looking for [ (unless it's escaped as \[) and then handling the escaping of [ and ]?

@mattirn
Collaborator

mattirn commented Jan 22, 2025

> I wonder whether this gets into parsing the regex itself. However, do you think it's good enough to handle the cases that the "Character classes" section of Pattern's javadoc describes?

Regexes in nanorc files follow the POSIX standard. Intersections ([a-z&&[def]]), unions ([a-d[m-p]]), etc., described in Pattern's javadoc are not valid POSIX regexes.
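A small demonstration of the point, using only `java.util.regex.Pattern` (the class name `BracketSemanticsDemo` is hypothetical): Java reads `[a-d[m-p]]` as a union and `&&` as an intersection, whereas a POSIX engine reads the same `[a-d[m-p]]` as one character from `a-d`, `[` or `m-p` followed by a literal `]`. Escaping the inner `[` gives Java the POSIX meaning.

```java
import java.util.regex.Pattern;

/** Hypothetical demo of Java-only bracket syntax versus its POSIX reading. */
public final class BracketSemanticsDemo {
    public static void main(String[] args) {
        // Java: a nested class is a union, so this matches one character from a-d or m-p.
        System.out.println(Pattern.matches("[a-d[m-p]]", "n"));    // true
        // Java: && intersects a-z with {d, e, f}.
        System.out.println(Pattern.matches("[a-z&&[def]]", "e"));  // true
        System.out.println(Pattern.matches("[a-z&&[def]]", "a"));  // false
        // POSIX reading of [a-d[m-p]]: one of a-d, '[' or m-p, then a literal ']'.
        // Escaping the inner '[' gives Java that same meaning.
        System.out.println(Pattern.matches("[a-d\\[m-p]]", "n]")); // true
        System.out.println(Pattern.matches("[a-d\\[m-p]]", "n"));  // false, the trailing ']' is required
    }
}
```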

> I.e., looking for [ (unless it's escaped as \[) and then handling the escaping of [ and ]?

In a regex, the first unescaped '[' starts a bracket expression. In POSIX, a bracket expression is closed by an unescaped ']' unless that ']' is the character immediately after the '[' that started the bracket expression. Every square bracket between the opening and closing brackets must be escaped in order to obtain a valid Java pattern.

@snazy snazy force-pushed the nano-json-pattern branch from 4e472b9 to 46280b7 January 23, 2025 12:39
@snazy snazy changed the title Fix parse error of system default /usr/share/nano/json.nanorc Fix parse error of system default /usr/share/nano/*.nanorc Jan 23, 2025
@snazy
Contributor Author

snazy commented Jan 23, 2025

Okay, I think I have it now. There were quite a few more cases that needed to be handled. I've added real-world test cases for those; all escapings produced by posixToJavaRegex look correct to me. I didn't verify it in a "live" jnano, though.

@snazy snazy force-pushed the nano-json-pattern branch 2 times, most recently from 2c9df3b to 4d62902 January 23, 2025 12:47
snazy added 2 commits January 27, 2025 10:08
(Recent) `nano` packages in Ubuntu come with some `.nanorc` files preinstalled.

jline's `NanorcParser` sadly fails parsing a couple of the regular expressions.

This change translates the regular expressions to Java regular expressions.

The differences are described in `org.jline.builtins.SyntaxHighlighter#posixToJavaRegex`:
* The first `]` in a bracket expression does not need to be escaped in Posix; translate it to `\]`.
* Same as above for a negating bracket expression like `[^][]`; translate to `[^\]\[]`.
* Any `[` in a bracket expression does not need to be escaped in Posix; translate it to `\[`.
* Any `]` outside a bracket expression is valid in both Posix and Java; no translation.
* A backslash before the closing bracket, as in `[.f\]`, does not escape the closing bracket; the backslash itself must be escaped for Java, so translate to `[.f\\]`.
* Do not perform the above translations within an escape via `\`.
* Do not perform the above translations for Posix "classes" like `[[:word:]]` or `[[:digit:]]` and their negation `[^[:word:]]`.
* Do not perform the above translations for single-bracket Posix classes like `[:digit:]`, and handle the case of single-bracket Posix classes inside bracket expressions, like `[[:digit:]-.]`.

Test cases have been added.

There are, however, two regexes that still fail to parse, and they look invalid. So that jnano does not trip over them, any `PatternSyntaxException` now makes jnano ignore just the affected rule, and a warning is logged.

Fixes jline#1156

Pulled the functionality (`\<`/`\>` and Posix class translation) of the previous `fixRegexes` into the new one.

Added a "test" (it just parses and prints) to process all locally installed `.nanorc` files.

Adapted the existing tests that used the previous `fixRegexes` function.

Fixed the incorrect handling of Posix classes in bracket expressions introduced in the previous commit.
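A rough, hypothetical sketch of what a scan over locally installed `.nanorc` files could look like; it is not the PR's `processLocalNanorcFiles`. It assumes the files live in `/usr/share/nano`, looks only at `color`/`icolor` lines, and assumes nano's quoting rule that a closing `"` must be followed by whitespace or the end of the line. It compiles the raw POSIX regexes without any translation, so it reports which ones `java.util.regex` rejects as-is.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical scan of *.nanorc files that reports regexes java.util.regex rejects. */
public final class LocalNanorcScanSketch {

    /** Returns the double-quoted arguments of a line; a quote only closes before a blank or end of line. */
    static List<String> quotedArgs(String line) {
        List<String> args = new ArrayList<>();
        int open = line.indexOf('"');
        while (open >= 0) {
            int close = open + 1;
            while (close < line.length()
                    && !(line.charAt(close) == '"'
                         && (close + 1 == line.length() || Character.isWhitespace(line.charAt(close + 1))))) {
                close++;
            }
            if (close >= line.length()) {
                break; // unterminated quote: ignore the rest of the line
            }
            args.add(line.substring(open + 1, close));
            open = line.indexOf('"', close + 1);
        }
        return args;
    }

    /** Compiles every regex of the color/icolor rules in dir/*.nanorc and collects the failures. */
    static List<String> scan(Path dir) throws IOException {
        List<String> failures = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.nanorc")) {
            for (Path file : files) {
                List<String> lines = Files.readAllLines(file);
                for (int n = 0; n < lines.size(); n++) {
                    String line = lines.get(n).trim();
                    String keyword = line.split("\\s+", 2)[0];
                    if (!keyword.equals("color") && !keyword.equals("icolor")) {
                        continue;
                    }
                    for (String regex : quotedArgs(line)) {
                        try {
                            Pattern.compile(regex);
                        } catch (PatternSyntaxException e) {
                            failures.add(file.getFileName() + ":" + (n + 1) + ": "
                                    + e.getDescription() + " in " + regex);
                        }
                    }
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/usr/share/nano");
        scan(dir).forEach(System.out::println);
    }
}
```

Feeding each extracted regex through a POSIX-to-Java translation before `Pattern.compile` would turn this into a check of the translation itself rather than of the raw regexes.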
@snazy snazy force-pushed the nano-json-pattern branch from 4d62902 to 6c26cf7 January 27, 2025 10:44
@snazy
Contributor Author

snazy commented Jan 27, 2025

Updated the PR quite a bit in the 2nd commit.

Looking at the regexes and source rules printed by the processLocalNanorcFiles "test", the translations look reasonable.

@snazy snazy force-pushed the nano-json-pattern branch from c7bf80c to 1ba7439 January 27, 2025 10:49
@mattirn mattirn added this to the 3.28.1 milestone Jan 29, 2025
@gnodet gnodet modified the milestones: 3.28.1, 3.28.2 Jan 30, 2025
Development: successfully merging this pull request may close the issue "Nano syntax highlighter fails to parse pre-installed json.nanorc".
3 participants