Hana - A language in which all syntax and commands are expressed in Korean and translated to Python-style code

jeyangg/COMS4115-PLT

COMS4115-PLT

See the README in the 1_LexicalAnalyzer directory for the lexer.

See the README in the 2_SyntaticAnalyzer directory for the parser.

See the README in the 3_CodeGen directory for code generation.

Token Types

1. Keyword Tokens

These tokens correspond to the following keywords in Python:

  • 함수 (function)
  • 만약에 (if)
  • 아니면 (else)
  • 동안에 (while)
  • 반환 (return)
  • 출력 (print)
  • 진실 (true)
  • 거짓 (false)
  • (null)

Data Structure

  • 배열 (Array): 길이 (len()), 추가 (append), 뽑기 (pop), 확장 (extend), 정렬 (sort)
  • 딕셔너리 (Dictionary): (keys()), 아이템 (items())

Math Function

  • 랜덤 (random)
  • 절댓값 (abs)
  • 최소값 (min)
  • 최대값 (max)
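
As a rough illustration of how the keyword and math-function lists above could drive translation, here is a minimal sketch of a lookup table. The names `KEYWORDS`, `MATH_FUNCTIONS`, and `translate_word` are hypothetical and not taken from the repository, and the Python spellings `def`, `True`, and `False` are assumptions based on the English glosses. The null keyword is left out because its Korean spelling is not given here.

```python
# Hypothetical lookup tables built from the lists above.
# Assumption: "function" maps to Python's `def`, "true"/"false" to `True`/`False`.
KEYWORDS = {
    "함수": "def",
    "만약에": "if",
    "아니면": "else",
    "동안에": "while",
    "반환": "return",
    "출력": "print",
    "진실": "True",
    "거짓": "False",
}

MATH_FUNCTIONS = {
    "랜덤": "random",
    "절댓값": "abs",
    "최소값": "min",
    "최대값": "max",
}


def translate_word(word: str) -> str:
    """Return the Python spelling of a Hana keyword or math function.

    Words that are in neither table (for example identifiers) are returned unchanged.
    """
    if word in KEYWORDS:
        return KEYWORDS[word]
    return MATH_FUNCTIONS.get(word, word)
```

A lexer would typically consult such a table only after it has matched a whole identifier-like word, so that a keyword appearing inside a longer name is not translated by mistake.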

2. Identifier Tokens

Identifiers consist of valid Hangul characters, digits, and underscores. Digits must be preceded by an underscore. Hangul characters may be Hangul Jamo, Compatibility Jamo, or pre-composed Hangul syllables. The Unicode ranges for these character types are:

  • \u1100-\u11FF: Hangul Jamo (used for composing Hangul syllables).
  • \u3130-\u318F: Hangul Compatibility Jamo (used for compatibility with older encodings).
  • \uAC00-\uD7AF: Pre-composed Hangul syllables (the most commonly used Korean characters in modern texts).

We also allow:

  • Digits (0-9) only after an underscore.
  • Underscore (_) only between two valid Hangul characters or digits.

Regular Expression: [\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]+(?:_(?:[0-9]+[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]*|[\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF]+))*

Valid examples: 한국어_123, 한국어_1_한, 한국어_한_1
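
The identifier rules can be checked with Python's `re` module. The sketch below is one reading of the rules stated in this section (one or more Hangul characters, then optional underscore-led groups of digits or Hangul), not necessarily the pattern the lexer itself uses. `HANGUL`, `IDENTIFIER`, and `is_identifier` are hypothetical names.

```python
import re

# Unicode ranges listed above: Hangul Jamo, Compatibility Jamo, pre-composed syllables.
HANGUL = r"\u1100-\u11FF\u3130-\u318F\uAC00-\uD7AF"

# Assumed reading of the rules: start with Hangul; every underscore must be
# followed by at least one digit or Hangul character; digits appear only
# directly after an underscore.
IDENTIFIER = re.compile(
    rf"[{HANGUL}]+(?:_(?:[0-9]+[{HANGUL}]*|[{HANGUL}]+))*"
)


def is_identifier(text: str) -> bool:
    """Return True if the whole string is a valid identifier under the rules above."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under this reading the three valid examples are accepted, while strings such as `한국어1` (digit without an underscore) and `한국어_` (trailing underscore) are rejected.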

3. Operator Tokens

  • '+', '-', '*', '**', '/', '%', '=', '==', '!=', '<', '>', '<=', '>='
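
Several of these operators share a prefix (`*` and `**`, `=` and `==`, `<` and `<=`), so a lexer has to try the longer spelling first. The following is a minimal sketch of that idea, assuming a regex-based scanner. `OPERATORS`, `OPERATOR_RE`, and `find_operators` are hypothetical names.

```python
import re

# Operator spellings from the list above.
OPERATORS = ["+", "-", "*", "**", "/", "%", "=", "==", "!=", "<", ">", "<=", ">="]

# Sort longest first so that "**" is tried before "*" and "==" before "=".
OPERATOR_RE = re.compile(
    "|".join(re.escape(op) for op in sorted(OPERATORS, key=len, reverse=True))
)


def find_operators(text: str) -> list[str]:
    """Return every operator found in the text, in order of appearance."""
    return OPERATOR_RE.findall(text)
```

For example, `find_operators("가 ** 나 == 다")` yields `["**", "=="]` rather than four single-character operators.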

Arithmetic Operators

  • Order of Evaluation: Arithmetic operators follow the PEMDAS rule. Parentheses () have the highest precedence, followed by exponentiation **, then multiplication *, division /, and modulus %, which take precedence over addition + and subtraction -.
  • Addition and Subtraction: The addition and subtraction operators may be used between two variables of type int or float, or between two expressions that evaluate to int or float. The operands on either side may be of different types (e.g. int + float). The shorthand operators ++ and -- for incrementing or decrementing by 1 are not supported.
  • Multiplication and Division: Operands must be int or float (ordinary arithmetic multiplication and division). Division keeps the quotient and discards the remainder.
  • Modulo: % returns the remainder of a division. Operands must be int.
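
The operand rules above can be summarized as a small type check. This is a sketch under stated assumptions rather than the project's actual semantic check: `operands_allowed` is a hypothetical helper, Python's `bool` is excluded from int on purpose, and `**` is not covered because no operand rule is given for it here.

```python
def _is_int(value) -> bool:
    # Assumption: booleans are not treated as integers, even though
    # Python's bool is a subclass of int.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value) -> bool:
    return _is_int(value) or isinstance(value, float)


def operands_allowed(op: str, left, right) -> bool:
    """Return True if both operands satisfy the rule for the given operator."""
    if op == "%":
        # Modulo accepts int operands only.
        return _is_int(left) and _is_int(right)
    if op in ("+", "-", "*", "/"):
        # Mixed int/float operands are allowed.
        return _is_number(left) and _is_number(right)
    raise ValueError(f"no operand rule sketched for operator {op!r}")
```

With this helper, `operands_allowed("+", 1, 2.5)` is accepted, while `operands_allowed("%", 7, 2.0)` is rejected because modulo requires int on both sides.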

Boolean/Equivalence Operators: <, >, <=, >=, !=, ==

Logical Operators: 그리고 (and), 이거나 (or)

4. Delimiter Tokens

  • (: Open Parenthesis
  • ): Close Parenthesis
  • {: Open Brace
  • }: Close Brace
  • ,: Comma
  • :: Colon (used after control flow statements like 만약에 or after function definitions like 함수)
  • ;: Semicolon (optional terminator, for clarity or separation)

Regex Rule: [\(\)\{\},:;]
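
To show how the delimiter rule might be applied, the sketch below scans a line with the regex above and labels each match. The token names in `DELIMITER_NAMES` and the function `find_delimiters` are hypothetical. They are chosen for illustration and are not taken from the lexer.

```python
import re

# The delimiter pattern given above.
DELIMITER_RE = re.compile(r"[\(\)\{\},:;]")

# Hypothetical token names for each delimiter character.
DELIMITER_NAMES = {
    "(": "OPEN_PAREN",
    ")": "CLOSE_PAREN",
    "{": "OPEN_BRACE",
    "}": "CLOSE_BRACE",
    ",": "COMMA",
    ":": "COLON",
    ";": "SEMICOLON",
}


def find_delimiters(text: str) -> list[tuple[str, str]]:
    """Return (character, token name) pairs for every delimiter in the text."""
    return [(m.group(), DELIMITER_NAMES[m.group()]) for m in DELIMITER_RE.finditer(text)]
```

Calling `find_delimiters("함수 더하기(가, 나):")` returns the open parenthesis, comma, close parenthesis, and colon, in that order.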

5. Comments/String Tokens

  • Comment Tokens: Single-line Comments: In Hana, comments can start with a hash # (similar to Python). Everything following # on that line is ignored by the interpreter. Pattern: #.* (matches everything after # to the end of the line).
  • String Tokens: String Literals: Strings are enclosed in double quotes ("). All characters within the quotes are considered part of the string until the closing quote. Pattern: "(?:[^"\\]|\\.)*" (matches any sequence of characters between double quotes, allowing escaped characters like \").
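
One subtlety is that a `#` inside a string literal must not start a comment. A minimal sketch of how a scanner could handle this is to try the string pattern before the comment pattern at each position. `STRING_OR_COMMENT` and `scan_strings_and_comments` are hypothetical names, and the string pattern assumes a backslash escapes the character that follows it.

```python
import re

# String pattern first, so a '#' inside quotes is consumed as part of the string.
STRING_OR_COMMENT = re.compile(
    r'(?P<STRING>"(?:[^"\\]|\\.)*")'  # double-quoted string with backslash escapes
    r"|(?P<COMMENT>#.*)"               # '#' up to the end of the line
)


def scan_strings_and_comments(line: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for each string literal or comment on one line."""
    return [(m.lastgroup, m.group()) for m in STRING_OR_COMMENT.finditer(line)]
```

For the line `출력("안녕 # 하나") # 인사`, this yields one STRING token containing the first `#` and one COMMENT token starting at the second.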

6. Number Tokens

  • Integer: Sequence of digits: '[0-9]+'
  • Float: Sequence of digits, decimal point, sequence of digits: '[0-9]+\.[0-9]+'
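
Because every float begins with something that also looks like an integer, the float pattern has to be tried first. The sketch below assumes the decimal point is matched literally (escaped as `\.`). `NUMBER_RE` and `scan_numbers` are hypothetical names.

```python
import re

# FLOAT comes first so that "3.14" is not split into INT "3" and INT "14".
NUMBER_RE = re.compile(r"(?P<FLOAT>[0-9]+\.[0-9]+)|(?P<INT>[0-9]+)")


def scan_numbers(text: str) -> list[tuple[str, str]]:
    """Return (kind, text) pairs for each number literal found in the text."""
    return [(m.lastgroup, m.group()) for m in NUMBER_RE.finditer(text)]
```

For example, `scan_numbers("3.14 + 42")` produces a FLOAT token for `3.14` followed by an INT token for `42`.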

Je Yang (jy3342) and Ella Kim (yk3040)
