Clean up syntax error reporting #3278

Open
wants to merge 5 commits into main from feature/clean-up-syntax-errors

Conversation

@Swiddis (Collaborator) commented Jan 29, 2025

Description

While working on fixing PPL AST bugs (#3273), it stood out to me that our error reporting for syntax errors really isn't that good. This PR cleans up the handling of our errors.

Example query that's currently returning syntax errors:

POST _plugins/_ppl
{
  "query": "SOURCE = test_19a673e2 | WHERE x OR y"
}

Before:

{
  "error": {
    "reason": "Invalid Query",
    "details": "Failed to parse query due to offending symbol [OR] at: 'SOURCE = test_19a673e2 | WHERE x OR' <--- HERE... More details: Expecting tokens in {'SEARCH', 'DESCRIBE', 'SHOW', 'FROM', 'WHERE', 'FIELDS', 'RENAME', 'STATS', 'DEDUP', 'SORT', 'EVAL', 'HEAD', 'TOP', 'RARE', 'PARSE', 'METHOD', 'REGEX', 'PUNCT', 'GROK', 'PATTERN', 'PATTERNS', 'NEW_FIELD', 'KMEANS', 'AD', 'ML', 'FILLNULL', 'TRENDLINE', 'SOURCE', 'INDEX', 'D', 'DESC', 'DATASOURCES', 'SORTBY', 'AUTO', 'STR', 'NUM', 'KEEPEMPTY', 'CONSECUTIVE', 'DEDUP_SPLITVALUES', 'PARTITIONS', 'ALLNUM', 'DELIM', 'CENTROIDS', 'ITERATIONS', 'DISTANCE_TYPE', 'NUMBER_OF_TREES', 'SHINGLE_SIZE', 'SAMPLE_SIZE', 'OUTPUT_AFTER', 'TIME_DECAY', 'ANOMALY_RATE', 'CATEGORY_FIELD', 'TIME_FIELD', 'TIME_ZONE', 'TRAINING_DATA_SIZE', 'ANOMALY_SCORE_THRESHOLD', 'TRUE', 'FALSE', 'CONVERT_TZ', 'DATETIME', 'DAY', 'DAY_HOUR', 'DAY_MICROSECOND', 'DAY_MINUTE', 'DAY_OF_YEAR', 'DAY_SECOND', 'HOUR', 'HOUR_MICROSECOND', 'HOUR_MINUTE', 'HOUR_OF_DAY', 'HOUR_SECOND', 'INTERVAL', 'MICROSECOND', 'MILLISECOND', 'MINUTE', 'MINUTE_MICROSECOND', 'MINUTE_OF_DAY', 'MINUTE_OF_HOUR', 'MINUTE_SECOND', 'MONTH', 'MONTH_OF_YEAR', 'QUARTER', 'SECOND', 'SECOND_MICROSECOND', 'SECOND_OF_MINUTE', 'WEEK', 'WEEK_OF_YEAR', 'YEAR', 'YEAR_MONTH', 'IP', '.', '+', '-', '(', '`', 'AVG', 'COUNT', 'DISTINCT_COUNT', 'ESTDC', 'ESTDC_ERROR', 'MAX', 'MEAN', 'MEDIAN', 'MIN', 'MODE', 'RANGE', 'STDEV', 'STDEVP', 'SUM', 'SUMSQ', 'VAR_SAMP', 'VAR_POP', 'STDDEV_SAMP', 'STDDEV_POP', 'PERCENTILE', 'TAKE', 'FIRST', 'LAST', 'LIST', 'VALUES', 'EARLIEST', 'EARLIEST_TIME', 'LATEST', 'LATEST_TIME', 'PER_DAY', 'PER_HOUR', 'PER_MINUTE', 'PER_SECOND', 'RATE', 'SPARKLINE', 'C', 'DC', 'ABS', 'CBRT', 'CEIL', 'CEILING', 'CONV', 'CRC32', 'E', 'EXP', 'FLOOR', 'LN', 'LOG', 'LOG10', 'LOG2', 'MOD', 'PI', 'POSITION', 'POW', 'POWER', 'RAND', 'ROUND', 'SIGN', 'SQRT', 'TRUNCATE', 'ACOS', 'ASIN', 'ATAN', 'ATAN2', 'COS', 'COT', 'DEGREES', 'RADIANS', 'SIN', 'TAN', 'ADDDATE', 'ADDTIME', 'CURDATE', 'CURRENT_DATE', 'CURRENT_TIME', 'CURRENT_TIMESTAMP', 'CURTIME', 'DATE', 'DATEDIFF', 'DATE_ADD', 'DATE_FORMAT', 'DATE_SUB', 'DAYNAME', 'DAYOFMONTH', 'DAYOFWEEK', 'DAYOFYEAR', 'DAY_OF_MONTH', 'DAY_OF_WEEK', 'EXTRACT', 'FROM_DAYS', 'FROM_UNIXTIME', 'GET_FORMAT', 'LAST_DAY', 'LOCALTIME', 'LOCALTIMESTAMP', 'MAKEDATE', 'MAKETIME', 'MONTHNAME', 'NOW', 'PERIOD_ADD', 'PERIOD_DIFF', 'SEC_TO_TIME', 'STR_TO_DATE', 'SUBDATE', 'SUBTIME', 'SYSDATE', 'TIME', 'TIMEDIFF', 'TIMESTAMP', 'TIMESTAMPADD', 'TIMESTAMPDIFF', 'TIME_FORMAT', 'TIME_TO_SEC', 'TO_DAYS', 'TO_SECONDS', 'UNIX_TIMESTAMP', 'UTC_DATE', 'UTC_TIME', 'UTC_TIMESTAMP', 'WEEKDAY', 'YEARWEEK', 'SUBSTR', 'SUBSTRING', 'LTRIM', 'RTRIM', 'TRIM', 'LOWER', 'UPPER', 'CONCAT', 'CONCAT_WS', 'LENGTH', 'STRCMP', 'RIGHT', 'LEFT', 'ASCII', 'LOCATE', 'REPLACE', 'REVERSE', 'CAST', 'LIKE', 'ISNULL', 'ISNOTNULL', 'CIDRMATCH', 'IFNULL', 'NULLIF', 'IF', 'TYPEOF', 'ALLOW_LEADING_WILDCARD', 'ANALYZE_WILDCARD', 'ANALYZER', 'AUTO_GENERATE_SYNONYMS_PHRASE_QUERY', 'BOOST', 'CUTOFF_FREQUENCY', 'DEFAULT_FIELD', 'DEFAULT_OPERATOR', 'ENABLE_POSITION_INCREMENTS', 'ESCAPE', 'FLAGS', 'FUZZY_MAX_EXPANSIONS', 'FUZZY_PREFIX_LENGTH', 'FUZZY_TRANSPOSITIONS', 'FUZZY_REWRITE', 'FUZZINESS', 'LENIENT', 'LOW_FREQ_OPERATOR', 'MAX_DETERMINIZED_STATES', 'MAX_EXPANSIONS', 'MINIMUM_SHOULD_MATCH', 'OPERATOR', 'PHRASE_SLOP', 'PREFIX_LENGTH', 'QUOTE_ANALYZER', 'QUOTE_FIELD_SUFFIX', 'REWRITE', 'SLOP', 'TIE_BREAKER', 'TYPE', 'ZERO_TERMS_QUERY', 'SPAN', 'MS', 'S', 'M', 'H', 'W', 'Q', 'Y', ID, INTEGER_LITERAL, DECIMAL_LITERAL, DQUOTA_STRING, SQUOTA_STRING, 
BQUOTA_STRING}",
    "type": "SyntaxCheckException"
  },
  "status": 400
}

After:

{
  "error": {
    "reason": "Invalid Query",
    "details": "[OR] is not a valid term at this part of the query: '..._19a673e2 | WHERE x OR' <-- HERE. Expecting one of 284 possible tokens. Some examples: 'SEARCH', 'DESCRIBE', 'SHOW', 'FROM', 'WHERE', ...",
    "type": "SyntaxCheckException"
  },
  "status": 400
}

Even though this particular query should be valid, I think it's much clearer from this what the parser is mad about, and easier to put in a bug report too.
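
To make the intended format concrete, here is a minimal, self-contained sketch of how a message like the one above can be assembled. This is an illustration only, not the PR's actual implementation; the class, method, and constant names, as well as the width and limit values, are made up.

import java.util.List;

public class SyntaxErrorMessageSketch {
  // Illustrative limits only; the real values live in the PR's parser code.
  private static final int CONTEXT_WIDTH = 22;
  private static final int SUGGESTION_LIMIT = 5;

  static String format(String query, String offendingToken, int stopIndex,
                       List<String> expectedTokens, int totalExpected) {
    // Keep only a short tail of the query ending at the offending token,
    // prefixed with "..." when characters were actually dropped.
    int start = Math.max(0, stopIndex + 1 - CONTEXT_WIDTH);
    String context = (start == 0 ? "" : "...") + query.substring(start, stopIndex + 1);

    // Show only the first few expected tokens instead of the full follow set.
    String examples = String.join(", ",
        expectedTokens.subList(0, Math.min(expectedTokens.size(), SUGGESTION_LIMIT)));

    return String.format(
        "[%s] is not a valid term at this part of the query: '%s' <-- HERE."
            + " Expecting one of %d possible tokens. Some examples: %s, ...",
        offendingToken, context, totalExpected, examples);
  }

  public static void main(String[] args) {
    // Prints the same details string as the "After" example above.
    String query = "SOURCE = test_19a673e2 | WHERE x OR y";
    System.out.println(format(query, "OR", 34, // "OR" ends at 0-based index 34
        List.of("'SEARCH'", "'DESCRIBE'", "'SHOW'", "'FROM'", "'WHERE'"), 284));
  }
}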

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Simeon Widdis <[email protected]>
@Swiddis force-pushed the feature/clean-up-syntax-errors branch from 31649f1 to a47be17 on January 29, 2025 22:47
  {
-   "query" : "SELECT * FROM sample:data"
+   "query" : "SOURCE = test_index | where a > 0)"
@Swiddis (Collaborator, Author) commented:

This SQL error no longer outputs the same error message (new parsing engine?). I couldn't hit the ANTLR exception with a new SQL query, so I updated it to a PPL one.

@dai-chen (Collaborator) left a comment:

So the improvement of this PR is to truncate the long error message? If so, is it possible to simplify the changes, especially lines 60-80?

if (contextStartIndex < 3) { // The ellipses won't save us anything below the first 4 characters
  return query.substring(0, offendingToken.getStopIndex() + 1);
}
return "..." + query.substring(contextStartIndex, offendingToken.getStopIndex() + 1);
A Contributor commented:

[Optional] Perhaps we should have a unit or IT test for this "..." code path, to make sure the lengthy log is indeed truncated as expected.
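
For illustration, such a test might look roughly like the sketch below. The getQueryContext method is a hypothetical stand-in that mirrors the truncation rule quoted above, not the PR's actual helper or class name.

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class QueryContextTruncationTest {
  // Hypothetical stand-in mirroring the truncation rule quoted above.
  private static String getQueryContext(String query, int contextStartIndex, int stopIndex) {
    if (contextStartIndex < 3) { // "..." only helps once enough characters are dropped
      return query.substring(0, stopIndex + 1);
    }
    return "..." + query.substring(contextStartIndex, stopIndex + 1);
  }

  @Test
  void shortPrefixesAreNotEllipsized() {
    assertEquals("fields", getQueryContext("fields firstname", 0, 5));
  }

  @Test
  void longPrefixesAreEllipsized() {
    assertEquals(
        "..._19a673e2 | WHERE x OR",
        getQueryContext("SOURCE = test_19a673e2 | WHERE x OR y", 13, 34));
  }
}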

@Swiddis (Collaborator, Author) replied:

Maybe, though I'm not sure what the value is -- the method has no dependencies and seems unlikely to change later on, and if it were changed, the associated tests would almost certainly need to change with it, so they would only get in the way. If there's a part of the method that needs clarification to make it easier for others to parse/modify, let me know.

The Contributor replied:

Right, I share your point; let's leave it as it is then.

Signed-off-by: Simeon Widdis <[email protected]>
@Swiddis (Collaborator, Author) commented Feb 4, 2025:

So the improvement of this PR is to truncate the long error message? If so, is it possible to simplify the changes, especially lines 60-80?

Should be better now; I refactored it to only do the mapping for the first few tokens, so we can just use String.join everywhere.

Comment on lines +65 to +73
IntervalSet followSet = e.getExpectedTokens();
Vocabulary vocab = recognizer.getVocabulary();
List<String> tokenNames = new ArrayList<>(SUGGESTION_TRUNCATION_THRESHOLD);
for (int tokenType :
    followSet
        .toList()
        .subList(0, Math.min(followSet.size(), SUGGESTION_TRUNCATION_THRESHOLD))) {
  tokenNames.add(vocab.getDisplayName(tokenType));
}
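
The truncated tokenNames list can then be joined into the suggestion text, matching the String.join comment above. A minimal sketch of that step follows; the message wording here is illustrative, not necessarily the PR's literal format string.

// Continuing the snippet above (illustrative only):
String suggestions = String.join(", ", tokenNames);
String expectation =
    "Expecting one of " + followSet.size() + " possible tokens. Some examples: " + suggestions + ", ...";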
A Collaborator commented:

Can we extract this for readability?

import org.antlr.v4.runtime.RecognitionException;
import org.antlr.v4.runtime.Recognizer;
import org.antlr.v4.runtime.Token;
import org.antlr.v4.runtime.*;
A Member commented:

Can you replace the * import with explicit classes?

@@ -82,25 +82,25 @@ public void queryShouldBeCaseInsensitiveInKeywords() {
   @Test
   public void queryNotStartingWithSearchCommandShouldFailSyntaxCheck() {
     String query = "fields firstname";
-    queryShouldThrowSyntaxException(query, "Failed to parse query due to offending symbol");
+    queryShouldThrowSyntaxException(query, "is not a valid term at this part of the query");
A Member commented:

Extract this string into a const.

Labels: enhancement (New feature or request)
Projects: None yet
5 participants