From 7671e44e331b4a3ec6947c3865e78e6016f9af56 Mon Sep 17 00:00:00 2001 From: mwu Date: Thu, 2 Apr 2020 04:00:34 +0200 Subject: [PATCH 1/8] first draft, wip --- docs/connections.md | 249 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 249 insertions(+) create mode 100644 docs/connections.md diff --git a/docs/connections.md b/docs/connections.md new file mode 100644 index 0000000000..fadebf8cfd --- /dev/null +++ b/docs/connections.md @@ -0,0 +1,249 @@ +# General Rules +This section attempts to describe how the identifiers are introduced and +resolved in the language. + + +## Definition +Definition is any binding introduced using assignment (`=` operator) syntax. +Definition may be optionally preceeded by a signature in a form `name:type`. + +For example, the following is definition: +``` +add : Int -> Int -> Int +add a b = a + b +``` + +or this is a definition with no signature: + +``` +five = 5 +``` + +Definitions can be also introduced within a nested code block: +``` +main = + a = 2 + b = 2 + add a b = a + b + add a b +``` + +Here `main` is a definition and has nested definitions of `a`, `b` and `add`. + +### Term discrepancy with IDE codebase +Note: this document uses term "definition" in a wider term than usually IDE +codebase does. Here any assignment-binding is a definition, no matter if it has +any arguments or not. (i.e. this definition is either gui-definition or a gui-node) + + +## Scope +Scope is the span in the code which shares the bound identifiers set. Scopes can +be seen as span-stree, creating a hierarchical, nested structure. + +Nested scope is allowed to: +* access identifiers from the outer scope; +* shadow identifiers from nested scope by introducing new bindings. + +Otherwise, it is illegal to bind twice the same identifier within a scope. + +// TODO weren't we supposed to support overloading? + +// TODO what is duplicate definition and what is unification? Is there a +difference between `=` and `:` after all? 
+ +The identifier is bound scope-wide. It is visible and usable in the lines before +the actual binding occurs. Some monads (like IO) can introduce order-dependent +behavior, however. This is not something that GUI can tell by looking at AST. + +Symbols introduced into a scope is not visible outside the scope's subtree. + +Scopes are introduced by: +* module/file (the root scope); +* code blocks following operators; +* module-level definitions: for both signature (if present) and assignment-binding; +* `->` operator for its right-hand side. + +Example: +``` +main : a +main = + succ = b -> b+1 + succ 0 +``` + +The sample code is a module with a single top-level definition. + +Here we have four scopes, each one nested in the previous one: +* root scope (consisting of unindented lines), where `main` is visible; +* definition `main` scope, where `main` and `a` are visible; +* code block scope (all lines equally indented after + `main =`), where both `main`, `a` and `succ` are visible +* lambda body scope (right-hand side after `a ->`), where all `main`, + `succ`, `a` and `b` are visible. + +Example: +``` +test = a -> b -> + sum = a + b +``` + +Here we have root scope, `test`'s scope, then scopes of lambdas (after `a ->` +and after `b ->`) and finally scope for code block. + +// TODO what difference this make if code block introduces its own scope? + + +## Contexts +There are two kinds of context: pattern context and non-pattern context. + +Each position in code is either in a pattern context or not. By default code is +in non-pattern context. Pattern context is introduced locally by certain +language constructs: `=`, `:` and `->` operators (i.e. definition bindings, signature type ascription and lambdas). + +Each usage of a variable name (identifier starting with a lower case letter) +actually binds this identifier with whatever it is being matched to. The bound +identifier is visible and usable within the target scope. + +// TODO are operator identifiers bindable? 
Likely they require special rules. + +Pattern context always has the single target scope, where the identifiers are +introduced into. What is the target scope depends on the operator that +introduced pattern context. + +Pattern context is introduced within: +* left-hand side of assignment operator, e.g. `main` in `main = println + "Hello"`; +* right-hand side of a colon operator, e.g. `a` in `foo:a`; +* left-hand side of an arrow operator, e.g. `a` in `a -> a + 1`. + +Both `=` and `:` introduce identifiers into the scope where they occur, as they +do not introduce any new scope of their own. + +The `->` operator introduces identifiers only into the scope of their right-hand +side, if the lambda is not introduced in what is already a pattern context. + +Example: +``` +succ = a -> a + 1 +foo = a +``` + +Here lambda introduces `a` only into its right-hand side. The `a` that is being +assigned to `foo` is not the same `a` as in lambda — it must be defined +elsewhere in the module or the code will fail. + +However, if `->` appears in a pattern context, its left-hand side identifiers +are introduced into the scope targeted by the outer pattern context. + +Example: +``` +(a -> b) -> a.default +``` + +If not for this second rule, the `a` would be visible only in place of +expression `b`. However, now it is visible in the outer lambda body and can be +accessed. + + +# IDE Connection Discovery +IDE presents a definition body as a graph. Code lines of the body of the +definition are displayed as nodes (unless they're definitions). + +We want to display connection between nodes, if an identifier introduced by one +node into the graph's scope is used in another node's expression. + +Some simplifications are currently assumed: +* Connections care only about usage of symbols introduced by assignment + definition. For example, variables introduced by `:` operator's right side do + not form connections. Same for lambda arguments. 
+* we care only about identifiers introduced into graph's scope: anything that + appears in subscopes can be disregarded. However, IDE must be aware of + shadowing to properly tell if an identifier usage actually refers to an + identifier from graph's scope. +* There is no graph for the module's root scope, so any special rules for the + root scope might be irrelevant. +* IDE is concerned about producing correct results for correct programs. It does + not care about diagnosing ill-formed programs, quite the opposite. We want to + keep output as similar to the correct one as possible. (we will often + visualize programs that are in progress of editing) +* For the first release IDE can disregard the type ascription operator (`:`). + + +// TODO: what is graph's scope? Is this a definition's scope (if there's such +thing) or code block's scope? + +Basically, the problem can be reduced to being able to describe for any line in +code block the list of identifiers it introduces into the graph's scope and the +list of identifiers from graph's scope that it uses. + + +## Connection + +Connection is an ordered pair of endpoints: source and destination. Endpoint is +pair of node ID and crumbs. Source endpoint identifiers the node which +introduces the identifier (source of data), and crumb describes the identifier +position in the node's assignment's left-hand side. Destination endpoint +similarly describes position in node's expression where the identifier is used. + + + + + + + +--- + +# TO BE REWRITTEN + + +``` +a -> a -> b +``` + +Here the first `a` and second `a` are separate identifiers, the latter shadowing +the first one. If one wanted to express that both arguments are of the same +type, `a -> A -> b` should have been used. + +--- + + + +``` +a -> b = c +``` +Does this introduce the `a` into the module's scope? 
+ +(rules say "only if inline `=` does not introduce a new scope) + +--- + +``` +a = Int +foo = 5:a +``` + +Does `a` in `5:a` refers to the previous line's `a` or is separate? + +Marcin: they certainly do not shadow; they can either unify or collide as a +redefinition + + + +What if +``` +a = Int +foo = Int : a +``` + + + +TODO +At the top level, what exactly is the extent of a definition's scope? +Does it include the signature? + +Can a definition have multiple signatures? +How to give a signature to something that has no name or has multiple names? + +What is the difference between pattern matching and typing? +What is the difference between `a = 5` and `5:a`? Which is the value and which is the type? +Can there be a signature without a definition? \ No newline at end of file From e5de56a9a9270712a051da2e4e311d35cbb0bbd9 Mon Sep 17 00:00:00 2001 From: mwu Date: Thu, 2 Apr 2020 13:35:23 +0200 Subject: [PATCH 2/8] wip --- docs/connections.md | 97 +++++++++++++++++++++++++-------------------- 1 file changed, 53 insertions(+), 44 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index fadebf8cfd..2180ceec27 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -59,7 +59,7 @@ Symbols introduced into a scope is not visible outside the scope's subtree. Scopes are introduced by: * module/file (the root scope); -* code blocks following operators; +* code blocks (i.e. the block that follows the line with an trailing operator); * module-level definitions: for both signature (if present) and assignment-binding; * `->` operator for its right-hand side. @@ -100,9 +100,9 @@ Each position in code is either in a pattern context or not. By default code is in non-pattern context. Pattern context is introduced locally by certain language constructs: `=`, `:` and `->` operators (i.e. definition bindings, signature type ascription and lambdas). -Each usage of a variable name (identifier starting with a lower case letter) -actually binds this identifier with whatever it is being matched to. 
The bound -identifier is visible and usable within the target scope. +Indide pattern context each usage of a variable name (identifier starting with a +lower case letter) actually binds this identifier with whatever it is being +matched to. The bound identifier is visible and usable within the target scope. // TODO are operator identifiers bindable? Likely they require special rules. @@ -144,6 +144,55 @@ If not for this second rule, the `a` would be visible only in place of expression `b`. However, now it is visible in the outer lambda body and can be accessed. +## Examples +Unless otherwise stated, it should be assumed that given examples are lines +occurring within a definition's body code block. + + +``` +a -> a -> b +``` + +Here the first `a` and second `a` are separate identifiers, the latter shadowing +the first one. If one wanted to express that both arguments are of the same +type, `a -> A -> b` would have been used. `b` refers to an identifier from +graph's scope. + +--- + + +``` +a -> b = c +``` + +If such line occurs on the top-level, `a` and `b` are introduced into the +definition scope. Otherwisee, they are introduced into the parent scope. + +Does this introduce the `a` into the module's scope? + +(rules say "only if inline `=` does not introduce a new scope <=> on the top level) + +--- + +``` +a = Int +foo = 5:a +``` + +Does `a` in `5:a` refers to the previous line's `a` or is separate? + +Marcin: they certainly do not shadow; they can either unify or collide as a +redefinition + + + +What if +``` +a = Int +foo = Int : a +``` + + # IDE Connection Discovery IDE presents a definition body as a graph. Code lines of the body of the @@ -196,46 +245,6 @@ similarly describes position in node's expression where the identifier is used. # TO BE REWRITTEN -``` -a -> a -> b -``` - -Here the first `a` and second `a` are separate identifiers, the latter shadowing -the first one. 
If one wanted to express that both arguments are of the same -type, `a -> A -> b` should have been used. - ---- - - - -``` -a -> b = c -``` -Does this introduce the `a` into the module's scope? - -(rules say "only if inline `=` does not introduce a new scope) - ---- - -``` -a = Int -foo = 5:a -``` - -Does `a` in `5:a` refers to the previous line's `a` or is separate? - -Marcin: they certainly do not shadow; they can either unify or collide as a -redefinition - - - -What if -``` -a = Int -foo = Int : a -``` - - TODO At the top level, what exactly is the extent of a definition's scope? From e5c6f0edbb9d93094230f94dd5c227378adb700b Mon Sep 17 00:00:00 2001 From: mwu Date: Fri, 3 Apr 2020 03:59:15 +0200 Subject: [PATCH 3/8] [wip] --- docs/connections.md | 459 ++++++++++++++++++++++++++++++++++++-------- 1 file changed, 376 insertions(+), 83 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index 2180ceec27..52f5d8a3f0 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -2,60 +2,46 @@ This section attempts to describe how the identifiers are introduced and resolved in the language. +## Identifier +Identifier is a name that may denote value or type. Syntactically we recognize: +* variables, being names starting with a lower-case character, like `foo` or + `main`. +* constructors, being names starting with an upper-case character, like `Foo` or + `Option`. +* operators, being special symbols like `+` or `<$>`. -## Definition -Definition is any binding introduced using assignment (`=` operator) syntax. -Definition may be optionally preceeded by a signature in a form `name:type`. +In non-pattern contexts, referring to an existing identifier is +case-insensitive. So `foo` can be referred to as `Foo`. 
-For example, the following is definition: -``` -add : Int -> Int -> Int -add a b = a + b -``` - -or this is a definition with no signature: - -``` -five = 5 -``` - -Definitions can be also introduced within a nested code block: -``` -main = - a = 2 - b = 2 - add a b = a + b - add a b -``` - -Here `main` is a definition and has nested definitions of `a`, `b` and `add`. - -### Term discrepancy with IDE codebase -Note: this document uses term "definition" in a wider term than usually IDE -codebase does. Here any assignment-binding is a definition, no matter if it has -any arguments or not. (i.e. this definition is either gui-definition or a gui-node) +In pattern context, lower-cased names are used to introduce a binding (or a +constraint), while upper-cased name will refer to an already bound identifier. +Operators behave as variables in prefix position (e.g. `+` in `+ a b`) or as +constructors in an infix position (e.g. `,` in `a,b`). ## Scope Scope is the span in the code which shares the bound identifiers set. Scopes can -be seen as span-stree, creating a hierarchical, nested structure. +be seen as a span-stree structure, creating a hierarchical structure. Nested scope is allowed to: * access identifiers from the outer scope; -* shadow identifiers from nested scope by introducing new bindings. - -Otherwise, it is illegal to bind twice the same identifier within a scope. +* shadow identifiers from nested scope by introducing new bindings; +* introduce new constraints on the identifiers from parent (or own) scopes. -// TODO weren't we supposed to support overloading? - -// TODO what is duplicate definition and what is unification? Is there a -difference between `=` and `:` after all? +The same identifier may be bound to multiple times in the same scope +(overloading). It is allowed only for method overloads that differ in the type of +the `this` parameter. The identifier is bound scope-wide. It is visible and usable in the lines before the actual binding occurs. 
Some monads (like IO) can introduce order-dependent -behavior, however. This is not something that GUI can tell by looking at AST. +behavior, however. This is not something that IDE is (or can be) concerned about +when figuring out connections. + +Identifier is bound by using a variable-type identifier in the pattern context. +Exact behavior depends on the language construction that was to introduce the identifier. -Symbols introduced into a scope is not visible outside the scope's subtree. +Identifier introduced into a scope is visible only in the scope's subtree +(lexical scoping). Scopes are introduced by: * module/file (the root scope); @@ -63,6 +49,9 @@ Scopes are introduced by: * module-level definitions: for both signature (if present) and assignment-binding; * `->` operator for its right-hand side. +### Examples + + Example: ``` main : a @@ -90,25 +79,26 @@ test = a -> b -> Here we have root scope, `test`'s scope, then scopes of lambdas (after `a ->` and after `b ->`) and finally scope for code block. -// TODO what difference this make if code block introduces its own scope? +Example: -## Contexts -There are two kinds of context: pattern context and non-pattern context. +``` +main = + foo : a + foo = 2 -Each position in code is either in a pattern context or not. By default code is -in non-pattern context. Pattern context is introduced locally by certain -language constructs: `=`, `:` and `->` operators (i.e. definition bindings, signature type ascription and lambdas). + bar : a + bar = 3 +``` -Indide pattern context each usage of a variable name (identifier starting with a -lower case letter) actually binds this identifier with whatever it is being -matched to. The bound identifier is visible and usable within the target scope. +While `main` as root-level has its own scope, `foo` and `bar` do not. `a` +introduced by type signatures belongs to the `main`'s scope, and is shared by both +nested definitions. -// TODO are operator identifiers bindable? 
Likely they require special rules. - -Pattern context always has the single target scope, where the identifiers are -introduced into. What is the target scope depends on the operator that -introduced pattern context. +## Patterns +Patterns are context in the code where variables can be used to introduce new +identifiers into some scope. Constructors (that also include literals) are used +to pattern match against and potentially destructure more complex values. Pattern context is introduced within: * left-hand side of assignment operator, e.g. `main` in `main = println @@ -116,10 +106,131 @@ Pattern context is introduced within: * right-hand side of a colon operator, e.g. `a` in `foo:a`; * left-hand side of an arrow operator, e.g. `a` in `a -> a + 1`. -Both `=` and `:` introduce identifiers into the scope where they occur, as they -do not introduce any new scope of their own. +Details will follow with description of these operators. + + + +## Assignment +The assignment operator `=` is deeply magical. Its basic form is `name = +body`, where it introduces `name` into the parent scope. + +Example: +``` +five = 5 +``` + +Introduces the name `five` into the parent scope. + +Assignment operator is also used to define functions, extension methods and +perform pattern matching. For each of these cases appropriate desugaring is +applied. See sections below for details for particular cases. + +Roughly speaking, if the name is a variable, it is introduced and its arguments +(if present) are visible only in the definition body. If the name is +constructor, it will be pattern-matched and any variables used +for constructor arguments will be bound. + +If any macros are used in the definition, it is assumed that if it appears in a +pattern context, their vars introduce variables — or otherwise use variables. +Basically, it is similar to if a grouped expression with tokens matched by a +macro was in place. 
+ +In any place where variable is used in the pattern, it can be substituted by +underscore `_` to disregard the value without introducing any identifier. + + +Examples: +``` +foo a b = a + b +``` +introduces name `foo` + +--- + +``` +Foo a b = bar +``` +introduces names `a` and `b` + +--- + +``` +a.hello = print "Hello" +``` +introduces name `hello` -The `->` operator introduces identifiers only into the scope of their right-hand + +### Function definitions +If the assignment's left-hand side is a prefix application chain, where the +left-most name (i.e. the function name) is a variable, the assignment is said to +be a function definition. Each prefix argument is converted into a lambda +argument. + +``` +log_name object = print object.name +``` +is desugared into: +``` +log_name = object -> print object.name +``` + +This desugaring shows why only `log_name` is introduced into the scope, while +`object` is visible only in the definition's body. + +If the operator appears in the function name position, it can be defined as +well: +``` +^ a n = a * a ^ (n-1) +``` + +### Pattern matching +If the assignment's left-hand side is a prefix application chain where the +left-most name is a constructor, it will be desugared into a pattern match. + +Example: +``` +Some value = get_opt +``` + +will be desugared into: + +``` +value = case get_opt of + Some b -> b + _ -> error +``` + +Therefore, `value` will be introduced into the parent scope. + +Using operators in the infix position will also attempt to pattern match its +operands. For example: +``` +x,y = get_position # introduces `x` and `y` +``` + +### Extensions methods +If the application target uses accessor operator `.`, e.g. `Int.add`, the last +segment is the introduced indentifier and the previous segments are used to type +the implicit `this` parameter. + +For example: +``` +Foo.bar = 5 +``` +translated to: +``` +bar this:Foo = 5 +``` + +Which is then desugared into a lambda. The introduced name is only `bar`. 
+ + +## Lambdas +`arg -> value` is the syntax for lambdas. Left-hand side is a pattern for the +argument (lambdas are always unary) and the right-hand side is its body. Lambda +body has its own scope. + +The `->` pattern introduces identifiers only into the scope of their right-hand side, if the lambda is not introduced in what is already a pattern context. Example: @@ -144,54 +255,85 @@ If not for this second rule, the `a` would be visible only in place of expression `b`. However, now it is visible in the outer lambda body and can be accessed. -## Examples -Unless otherwise stated, it should be assumed that given examples are lines -occurring within a definition's body code block. -``` -a -> a -> b -``` +TODO -Here the first `a` and second `a` are separate identifiers, the latter shadowing -the first one. If one wanted to express that both arguments are of the same -type, `a -> A -> b` would have been used. `b` refers to an identifier from -graph's scope. +## Type ascription +The type ascription operator `:` introduces pattern scope for its right hand +side. The basic form is `value:type`. The type identifiers used in the +right-hand side will be constrained to include appropriate values in their value +set. ---- +It is legal to assign constraints on an identifier using `:` multiple times in +any of the scopes where identifier is visible. + +When variable name appears in type pattern, the type denoted by this identifier +will be required to contain given `value`. If variable does not denote any type +visible in the scope, the identifier will be introduced into the current scope. +TODO: Open design question — perhaps variables after `:` should be only allowed +to introduce new identifiers but not to constrain existing ones. + +TODO examples: ``` -a -> b = c +a : 5 +A : 5 +5 : a +5 : A ``` -If such line occurs on the top-level, `a` and `b` are introduced into the -definition scope. Otherwisee, they are introduced into the parent scope. 
+TODO: Open question: does empty type exist? Apparently it makes more sense in +lazy languages, rather than strict ones. -Does this introduce the `a` into the module's scope? -(rules say "only if inline `=` does not introduce a new scope <=> on the top level) + +TODO TODO TODO + + +TODO signatures and their relation with scoping. Difference for root and +non-root definitions. + +Examples: ---- ``` -a = Int -foo = 5:a +add : Int -> Int -> Int +add a b = a + b ``` -Does `a` in `5:a` refers to the previous line's `a` or is separate? +--- + +## Current engine limitations +Note: "current" means "in the scope of the first alpha release of enso", +not "at the moment of writing this document". -Marcin: they certainly do not shadow; they can either unify or collide as a -redefinition +### Extension methods +The extension methods (taking `this` as the first parameter) can be defined +only using the sugared syntax. +While both +``` +foo this:a = print "hello" +``` +and -What if ``` -a = Int -foo = Int : a +a.foo = print "hello" ``` +are equivalent, engine currently supports only the latter. + +// TODO what if non-first argument is named `this` ? Is the magic happening only +for this particular name? + + +### Type ascription +The type ascription and signatures are not properly supported. IDE should +disregard them for the time being. + + # IDE Connection Discovery @@ -203,7 +345,7 @@ node into the graph's scope is used in another node's expression. Some simplifications are currently assumed: * Connections care only about usage of symbols introduced by assignment - definition. For example, variables introduced by `:` operator's right side do + definition. For example, symbols introduced by `:` operator's right side do not form connections. Same for lambda arguments. * we care only about identifiers introduced into graph's scope: anything that appears in subscopes can be disregarded. 
However, IDE must be aware of @@ -226,6 +368,8 @@ code block the list of identifiers it introduces into the graph's scope and the list of identifiers from graph's scope that it uses. + + ## Connection Connection is an ordered pair of endpoints: source and destination. Endpoint is @@ -235,16 +379,165 @@ position in the node's assignment's left-hand side. Destination endpoint similarly describes position in node's expression where the identifier is used. +--- +--- +--- + +# TO BE REWRITTEN --- # TO BE REWRITTEN +## Assignments +Assignment operator is used to define identifiers. Its left-hand side is a pattern +context. Pattern context means that usage of a variable name (identifier +starting with a lower case letter) actually binds this identifier with whatever +it is being matched to. The bound identifier is visible and usable within the +target scope. + +When upper-cased variable is used in a pattern context, it must refer to an +existing identifier and will perform pattern matching. Example: + +``` +Some a = foo +``` + +This introduces `a`, while using `Some` and `foo`. + +// TODO how exactly does this look after desugaring? +`Some = a -> foo` ? + + +Assignments are used to bind values to identifiers. For example: + +``` +foo = 5 +``` +This introduces an identifier `foo` into the containing scope. + +If `foo` was already introduced by a parent scope, it will be shadowed. + +Example: +``` +foo = 5 +main = + foo = 5 # this is a nested scope, shadowing occurs +``` + + +If `foo` was already introduced by the current scope, error will be raised. + +Example: + +``` +foo = 5 +foo = 5 # error, symbol defined twice in the same scope +``` + + + +## Contexts +There are two kinds of context: pattern context and non-pattern context. + +Each position in code is either in a pattern context or not. By default code is +in non-pattern context. Pattern context is introduced locally by certain +language constructs: `=`, `:` and `->` operators (i.e. 
definition bindings, signature type ascription and lambdas). + +Inside a pattern context each usage of a variable name (identifier starting with a +lower case letter) actually binds this identifier with whatever it is being +matched to. The bound identifier is visible and usable within the target scope. + +Pattern context always has the single target scope, where the identifiers are +introduced into. What is the target scope depends on the operator that +introduced pattern context. + +Pattern context is introduced within: +* left-hand side of assignment operator, e.g. `main` in `main = println + "Hello"`; +* right-hand side of a colon operator, e.g. `a` in `foo:a`; +* left-hand side of an arrow operator, e.g. `a` in `a -> a + 1`. + +Both `=` and `:` introduce identifiers into the scope where they occur, as they +do not introduce any new scope of their own. + + +## Examples +Unless otherwise stated, it should be assumed that given examples are lines +occurring within a definition's body code block. + + +``` +a -> a -> b +``` + +Here the first `a` and second `a` are separate identifiers, the latter shadowing +the first one. If one wanted to express that both arguments are of the same +type, `a -> A -> b` would have been used. `b` refers to an identifier from +graph's scope. + +OPEN QUESTION: actually it might be "nice" to have both `a` refer to the same +identifier here. + +--- + +Overloading. + +``` +# root scope + +foo this:a = … + +foo this:a = … +``` + +In this case the `a` for each `foo` definition will be inferred by the compiler. +If the `a` ends up being different for them, they are valid overloads. +Otherwise, it is an error of having multiple definitions for the same name. + + +--- + + +``` +a -> b = c +``` + +If such line occurs on the top-level, `a` and `b` are introduced into the +definition scope. Otherwisee, they are introduced into the parent scope. + +Does this introduce the `a` into the module's scope? 
+ +(rules say "only if inline `=` does not introduce a new scope <=> on the top level) + +There cannot be a `->` on the left-hand side. + +--- + +``` +a = Int +foo = 5:a +``` + +What if + +``` +a = Int +foo = Int : a +``` + + + + + + + + TODO At the top level, what exactly is the extent of a definition's scope? From 0840f81738e2ec27aed43761bc92be8538a8295a Mon Sep 17 00:00:00 2001 From: mwu Date: Sat, 4 Apr 2020 03:54:28 +0200 Subject: [PATCH 4/8] updates --- docs/connections.md | 512 ++++++++++++++++++++++---------------------- 1 file changed, 253 insertions(+), 259 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index 52f5d8a3f0..da6dd40718 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -2,38 +2,59 @@ This section attempts to describe how the identifiers are introduced and resolved in the language. +The purpose is not to specify the whole language, just enough for IDE team +members to be able to reason about where identifiers are introduced and what entity +an identifier usage refers to. + +This is the basis that allows the IDE to describe which connections exist in the displayed graph. + + ## Identifier -Identifier is a name that may denote value or type. Syntactically we recognize: -* variables, being names starting with a lower-case character, like `foo` or - `main`. -* constructors, being names starting with an upper-case character, like `Foo` or - `Option`. -* operators, being special symbols like `+` or `<$>`. +Identifier is a name that may denote value of some type. Syntactically we recognize: +* variables, being names that do not contain upper-cased characters, like `foo2` + or `make_new`; +* constructors, which are like variables but with first character and every + character directly following underscore upper-cased (e.g. `Foo2` or + `Make_New`); +* operators, being names consisting of operator symbols (e.g. `+` or `<$>`). + Specifically, an operator name may contain the following characters + `!$%&*+-/<>?^~|:\,.()[]{}=`. 
Not every sequence of these characters is a valid + operator name, as they could collide with other language constructs. + +Any other names not matching requirements above, like `HTTP`, `foO` or +`Make_new` are not allowed. In non-pattern contexts, referring to an existing identifier is -case-insensitive. So `foo` can be referred to as `Foo`. +case-insensitive. So `foo` can be referred to as `Foo`. Note that `fOo` or `FOO` +are not valid identifiers, as upper-cased letter may appear only as the first +letter or after underscore (e.g. `Make_Request`). In pattern context, lower-cased names are used to introduce a binding (or a constraint), while upper-cased name will refer to an already bound identifier. +Binding means introducing an identifier into scope and associating it with some +value. Identifier can be introduced also without binding it to any specific +value (e.g. as type constraint). + Operators behave as variables in prefix position (e.g. `+` in `+ a b`) or as constructors in an infix position (e.g. `,` in `a,b`). ## Scope -Scope is the span in the code which shares the bound identifiers set. Scopes can -be seen as a span-stree structure, creating a hierarchical structure. +Scope is the span in the code which shares the available identifiers set. Scopes +can be seen as a span-tree structure, covering the whole program code. Nested scope is allowed to: * access identifiers from the outer scope; -* shadow identifiers from nested scope by introducing new bindings; +* shadow identifiers from nested scope with a new binding; * introduce new constraints on the identifiers from parent (or own) scopes. The same identifier may be bound to multiple times in the same scope (overloading). It is allowed only for method overloads that differ in the type of -the `this` parameter. +the `this` parameter. This limitation may be relaxed in the future, if proper +motivating use-cases are found. -The identifier is bound scope-wide. 
It is visible and usable in the lines before -the actual binding occurs. Some monads (like IO) can introduce order-dependent +The identifier is always accessible scope-wide, before and after the line +introducing it. Some monadic contexts (like IO) can introduce order-dependent behavior, however. This is not something that IDE is (or can be) concerned about when figuring out connections. @@ -46,17 +67,20 @@ Identifier introduced into a scope is visible only in the scope's subtree Scopes are introduced by: * module/file (the root scope); * code blocks (i.e. the block that follows the line with an trailing operator); -* module-level definitions: for both signature (if present) and assignment-binding; * `->` operator for its right-hand side. -### Examples +Also some other constructs seemingly introduce scope (like function +definitions) but this is because they are desugared into some construct that +introduces scope (like lambdas). +TODO: Consider if there are any special rules for signatures on definitions, or +is this just type ascription next to a definition. +### Examples Example: ``` -main : a main = - succ = b -> b+1 + succ = a -> a+1 succ 0 ``` @@ -64,11 +88,10 @@ The sample code is a module with a single top-level definition. Here we have four scopes, each one nested in the previous one: * root scope (consisting of unindented lines), where `main` is visible; -* definition `main` scope, where `main` and `a` are visible; -* code block scope (all lines equally indented after - `main =`), where both `main`, `a` and `succ` are visible +* definition and code block (all lines equally indented after + `main =`) scopes, where both `main` and `succ` are visible * lambda body scope (right-hand side after `a ->`), where all `main`, - `succ`, `a` and `b` are visible. + `succ` and `a` are visible. 
Example: ``` @@ -76,7 +99,7 @@ test = a -> b -> sum = a + b ``` -Here we have root scope, `test`'s scope, then scopes of lambdas (after `a ->` +Here we have root scope, then scopes of lambdas (after `a ->` and after `b ->`) and finally scope for code block. @@ -91,7 +114,8 @@ main = bar = 3 ``` -While `main` as root-level has its own scope, `foo` and `bar` do not. `a` +While `main` as root-level has its own scope (as a definition in root it is +treated as method and desugared to lambda), `foo` and `bar` do not. `a` introduced by type signatures belongs to the `main`'s scope, and is shared by both nested definitions. @@ -108,6 +132,17 @@ Pattern context is introduced within: Details will follow with description of these operators. +// TODO What about `case … of` ? + +# Introducing identifiers +Common notation used in the examples uses French quotation marks as following: +* `«name»` for names introduced into the graph's scope. They are potential + source endpoints of the connections in the graph. +* `»name«` for names used from graph's scope. They are potential destination + endpoints of the connections in the graph. + +Before running code, these «» markers should be removed, they are just to +quickly convey expected results in the code sample content. ## Assignment @@ -116,19 +151,18 @@ body`, where it introduces `name` into the parent scope. Example: ``` -five = 5 +«five» = 5 ``` -Introduces the name `five` into the parent scope. +The name `five` introduced into the parent scope is bound to a value of expression `5`. -Assignment operator is also used to define functions, extension methods and -perform pattern matching. For each of these cases appropriate desugaring is -applied. See sections below for details for particular cases. +Assignment operator is used to define aliases, functions, extension methods and +perform pattern matching. For different cases appropriate desugaring is +applied. See sections below for details for particular details. 
-Roughly speaking, if the name is a variable, it is introduced and its arguments -(if present) are visible only in the definition body. If the name is -constructor, it will be pattern-matched and any variables used -for constructor arguments will be bound. +Roughly speaking, if the name is a variable identifier, it is introduced and its +arguments (if present) are visible only in the definition body. If the name is +constructor identifier, it will pattern-match on variables in its arguments positions. If any macros are used in the definition, it is assumed that if it appears in a pattern context, their vars introduce variables — or otherwise use variables. @@ -138,27 +172,71 @@ macro was in place. In any place where variable is used in the pattern, it can be substituted by underscore `_` to disregard the value without introducing any identifier. +Single line can contain at most one assignment. + +If the name introduced by assignment is already visible from parent scope, it +will be shadowed. + +Example: +``` +«foo» = 2 +bar = + «foo» = 5 # shadowing + a = »foo« + 5 # refers to `foo` from line above +``` + +If the name was already assigned to in the current scope, it is not allowed to +bind it again. + +Example: + +``` +«foo» = 5 +«foo» = 5 # error, symbol defined twice in the same scope +``` + +### Specific cases overview Examples: ``` -foo a b = a + b +«foo» a b = a + b ``` -introduces name `foo` + +Only the "base" name (of the prefix application chain) is introduced. Arguments +are visible in the body scope. Therefore, `a` and `b` in the body scope refer to +the function arguments and not to variables from the parent scope. --- ``` -Foo a b = bar +Foo «a» «b» = »bar« ``` -introduces names `a` and `b` +Here we perform pattern matching to introduce `a` and `b`, fields of constructor +`Foo`. The `bar` refers to some identifier from the parent scope which should be +already defined.
--- ``` -a.hello = print "Hello" +a.«hello» = »print« "Hello" ``` -introduces name `hello` +Introduces name `hello` being an extension method defined on `a`. In this +position `a` will denote practically "any type" but is visible only in the +definition body (as it appears as the type of implicit `this` parameter). + +--- +TODO: +Example: +```foo = a -> + a = 5 +``` +TODO: Does the `a = 5` is shadowing? Or is this multiple definition error? If +the block introduces scope, it should shadow. However, it is not clear if the +block's scope should be truly separate from lambda's body scope. +Or perhaps assignment should be allowed to shadow lambda-introduced identifiers? + +--- ### Function definitions If the assignment's left-hand side is a prefix application chain, where the @@ -167,11 +245,15 @@ be a function definition. Each prefix argument is converted into a lambda argument. ``` -log_name object = print object.name +«log_name» object = »print« object.»name« ``` is desugared into: ``` -log_name = object -> print object.name +«log_name» = object -> »print« object.»name« +``` +which in turn can be desugared into: +``` +«log_name» = object -> »print« (»name« object) ``` This desugaring shows why only `log_name` is introduced into the scope, while @@ -180,32 +262,36 @@ This desugaring shows why only `log_name` is introduced into the scope, while If the operator appears in the function name position, it can be defined as well: ``` -^ a n = a * a ^ (n-1) +«^» a n = a * a ^ (n - 1) ``` +This introduces name `^` into the scope. It uses already defined `*` and `-` +operators. (to avoid clutter the operators are not marked with »«) + ### Pattern matching If the assignment's left-hand side is a prefix application chain where the left-most name is a constructor, it will be desugared into a pattern match. 
Example: ``` -Some value = get_opt +»Some« «value» = »get_opt« ``` will be desugared into: ``` -value = case get_opt of - Some b -> b +«value» = case »get_opt« of + »Some« b -> b _ -> error ``` -Therefore, `value` will be introduced into the parent scope. +Therefore, only `value` will be introduced into the parent scope. `Some` and +`get_opt` must be defined, the former being an atom with at least single field. Using operators in the infix position will also attempt to pattern match its operands. For example: ``` -x,y = get_position # introduces `x` and `y` +«x»,«y» = »get_position« ``` ### Extensions methods @@ -215,28 +301,48 @@ the implicit `this` parameter. For example: ``` -Foo.bar = 5 +»Foo«.«bar» = 5 ``` translated to: ``` -bar this:Foo = 5 +«bar» this:»Foo« = 5 ``` Which is then desugared into a lambda. The introduced name is only `bar`. +### Overloading +Only the methods that take `this` as the first parameter can be overloaded. Each +overload of the given name must have different type of `this`. + +However, the type of `this` will be often inferred by the typechecker and it +IDE cannot tell if given overloads are valid or not. + +Example: + +``` +«foo» this:«a» = »body1« + +«foo» this:«b» = »body2« +``` + +In this case `a` and `b` for each `foo` definition will be inferred by the +compiler. If they end up being different types, overloads are valid. If they are +the same, an error will be raised. + + ## Lambdas `arg -> value` is the syntax for lambdas. Left-hand side is a pattern for the argument (lambdas are always unary) and the right-hand side is its body. Lambda -body has its own scope. +body introduces its own scope. The `->` pattern introduces identifiers only into the scope of their right-hand side, if the lambda is not introduced in what is already a pattern context. Example: ``` -succ = a -> a + 1 -foo = a +«succ» = a -> a + 1 +«foo» = »a« ``` Here lambda introduces `a` only into its right-hand side. 
The `a` that is being @@ -248,67 +354,115 @@ are introduced into the scope targeted by the outer pattern context. Example: ``` -(a -> b) -> a.default +(a -> b) -> a.»default« ``` If not for this second rule, the `a` would be visible only in place of expression `b`. However, now it is visible in the outer lambda body and can be -accessed. +accessed. The only externally provided identifier must be `default` method. + +--- +Lambdas may not appear in the assignment's pattern (i.e. values cannot be +pattern-matched into lambda). + +So the following is not valid: +``` +a -> b = foo +``` +--- +Example: +``` +a -> a -> b +``` + +Here the first `a` and second `a` are separate identifiers, the latter shadowing +the first one. If one wanted to express that both arguments are of the same +type, `a -> A -> b` would have been used. `b` refers to an identifier from +graph's scope (it is in the body's position, not pattern). + +OPEN QUESTION: actually it might be "nice" to have both `a` unified in such +case. -TODO ## Type ascription The type ascription operator `:` introduces pattern scope for its right hand -side. The basic form is `value:type`. The type identifiers used in the -right-hand side will be constrained to include appropriate values in their value -set. +side. The basic form is `value:type`. It says that `value` be of the given +`type`, i.e. that all its possible values belong to the set of atoms represented +by type. + +The effect of this can be two-fold. If `value` is of (at least partially) known +type, appropriate constraints will be introduced on the types denoted by +variable names appearing in the pattern context. If the identifier was not +defined, it will appear in the current scope. -It is legal to assign constraints on an identifier using `:` multiple times in -any of the scopes where identifier is visible. +For example: +``` +5 : «a»? 
+``` -When variable name appears in type pattern, the type denoted by this identifier -will be required to contain given `value`. If variable does not denote any type -visible in the scope, the identifier will be introduced into the current scope. +This introduces constraint on the type `a` that its value set must include atom +`5`. If the `a` is already visible in the scope, this constraint will be added. +Otherwise, `a` will be introduced into the current scope with that constraint. -TODO: Open design question — perhaps variables after `:` should be only allowed -to introduce new identifiers but not to constrain existing ones. +// TODO: What if parent scope only ascribes identifier with type constraint but +only nested scope assigns to it? +// TODO: Open design question, if `a` should modify existing variable or should +always try to shadow it. What is the difference between `5 : a` and `5 : A`? +(except the latter not being able to ever introduce a new identifier) + + +When type is known, the type ascription can be used to constrain type of the +value: -TODO examples: ``` -a : 5 -A : 5 -5 : a -5 : A +»a« : 5 ``` -TODO: Open question: does empty type exist? Apparently it makes more sense in -lazy languages, rather then strict ones. +This says that value of `a` is of type `5`. Type `5` has only a single allowed +value: `5`. This will tell compiler to error out if program tries to bind `a` +with any value that is not known to be `5`. + +However, this example refers to some `a` already being visible in scope and does +not introduce any identifiers. -TODO TODO TODO +The `type` in this expression is pattern context and can be used to constrain +the type variables. It is legal to assign constraints on an identifier using `:` +multiple times in any of the scopes where identifier is visible. +Signatures are just type ascriptions that happen to precede the assignments. +They have no special rules currently defined. This area needs further design +work.
+ +TODO: Open question: does empty type (`Void`) exist in the language? TODO signatures and their relation with scoping. Difference for root and non-root definitions. Examples: - ``` -add : Int -> Int -> Int +add : a -> a -> a add a b = a + b ``` ---- +TODO: +* With current rules `a` from the signature gets introduced into parent scope + and will be unified with other uses of `a` in other definitions. +* or, actually, we want this to happen only in the root scope. When in + definition body, the `a` actually should be unified between signatures. + Doesn't sound that clean though. +* Does argument-introduced `a` shadow the signature-introduced `a`? -## Current engine limitations + +# Current engine limitations Note: "current" means "in the scope of the first alpha release of enso", not "at the moment of writing this document". -### Extension methods +## Extension methods The extensions methods (taking `this` as the first parameter) can be defined only using the sugared syntax. @@ -325,17 +479,21 @@ a.foo = print "hello" are equivalent, engine currently supports only the latter. +IDE can assume that all extension methods will be introduced using the +`Type.name` syntax sugar. + // TODO what if non-first argument is named `this` ? Is the magic happening only -for this particuar name? +for this particular name? Is it sensitive to its position in the arguments list? +// TODO What happens if the `this`-taking function is defined in the root scope +where already `this` is implicitly provided? What about taking `this` in a +method defined using the sugared syntax? (e.g. `Int.print this:Int = ...`) -### Type ascription +## Type ascription The type ascription and signatures are not properly supported. IDE should disregard them for the time being. - - # IDE Connection Discovery IDE presents a definition body as a graph. Code lines of the body of the definition are displayed as nodes (unless they're definitions).
@@ -347,27 +505,31 @@ Some simplifications are currently assumed: * Connections care only about usage of symbols introduced by assignment definition. For example, symbols introduced by `:` operator's right side do not form connections. Same for lambda arguments. -* we care only about identifiers introduced into graph's scope: anything that +* We care only about identifiers introduced into graph's scope: anything that appears in subscopes can be disregarded. However, IDE must be aware of shadowing to properly tell if an identifier usage actually refers to an identifier from graph's scope. * There is no graph for the module's root scope, so any special rules for the root scope might be irrelevant. * IDE is concerned about producing correct results for correct programs. It does - not care about diagnosing ill-formed programs, quite the opposite. We want to - keep output as similar to the correct one as possible. (we will often - visualize programs that are in progress of editing) + not care about diagnosing ill-formed or "not yet supported" programs, quite + the opposite. We want to keep output as similar to the correct one as + possible. (we will often visualize programs that are in progress of editing). * For the first release IDE can disregard the type ascription operator (`:`). +// TODO: Specify what is exactly the graph's scope? Is this a lambda body scope +or the scope introduced by the block following the lambda? Likely both of these +should be somehow coalesced, to avoid issues with definitions with inline bodies. -// TODO: what is graph's scope? Is this a definition's scope (if there's such -thing) or code block's scope? +// TODO: Actually, can we display graphs for argument-less blocks being node +bodies? Scoping could get quite strange then. Basically, the problem can be reduced to being able to describe for any line in code block the list of identifiers it introduces into the graph's scope and the list of identifiers from graph's scope that it uses. 
+If an identifier is introduced by one node's assignment left-hand side and is used in +another node's expression, a connection should be recognized. ## Connection @@ -378,174 +540,6 @@ introduces the identifier (source of data), and crumb describes the identifier position in the node's assignment's left-hand side. Destination endpoint similarly describes position in node's expression where the identifier is used. - ---- ---- ---- - - - - - - -# TO BE REWRITTEN ---- - -# TO BE REWRITTEN - - -## Assignments -Assignment operator is used to define identifiers. Its left-hand side is a pattern -context. Pattern context means that usage of a variable name (identifier -starting with a lower case letter) actually binds this identifier with whatever -it is being matched to. The bound identifier is visible and usable within the -target scope. - -When upper-cased variable is used in a pattern context, it must refer to an -existing identifier and will perform pattern matching. Example: - -``` -Some a = foo -``` - -This introduces `a`, while using `Some` and `foo`. - -// TODO how exactly does this look after desugaring? -`Some = a -> foo` ? - - -Assignments are used to bind values to identifiers. For example: - -``` -foo = 5 -``` -This introduces an identifier `foo` into the containing scope. - -If `foo` was already introduced by a parent scope, it will be shadowed. - -Example: -``` -foo = 5 -main = - foo = 5 # this is a nested scope, shadowing occurs -``` - - -If `foo` was already introduced by the current scope, error will be raised. - -Example: - -``` -foo = 5 -foo = 5 # error, symbol defined twice in the same scope -``` - - - -## Contexts -There are two kinds of context: pattern context and non-pattern context. - -Each position in code is either in a pattern context or not. By default code is -in non-pattern context. Pattern context is introduced locally by certain -language constructs: `=`, `:` and `->` operators (i.e.
definition bindings, signature type ascription and lambdas). - -Inside a pattern context each usage of a variable name (identifier starting with a -lower case letter) actually binds this identifier with whatever it is being -matched to. The bound identifier is visible and usable within the target scope. - -Pattern context always has the single target scope, where the identifiers are -introduced into. What is the target scope depends on the operator that -introduced pattern context. - -Pattern context is introduced within: -* left-hand side of assignment operator, e.g. `main` in `main = println - "Hello"`; -* right-hand side of a colon operator, e.g. `a` in `foo:a`; -* left-hand side of an arrow operator, e.g. `a` in `a -> a + 1`. - -Both `=` and `:` introduce identifiers into the scope where they occur, as they -do not introduce any new scope of their own. - - -## Examples -Unless otherwise stated, it should be assumed that given examples are lines -occurring within a definition's body code block. - - -``` -a -> a -> b -``` - -Here the first `a` and second `a` are separate identifiers, the latter shadowing -the first one. If one wanted to express that both arguments are of the same -type, `a -> A -> b` would have been used. `b` refers to an identifier from -graph's scope. - -OPEN QUESTION: actually it might be "nice" to have both `a` refer to the same -identifier here. - ---- - -Overloading. - -``` -# root scope - -foo this:a = … - -foo this:a = … -``` - -In this case the `a` for each `foo` definition will be inferred by the compiler. -If the `a` ends up being different for them, they are valid overloads. -Otherwise, it is an error of having multiple definitions for the same name. - - ---- - - -``` -a -> b = c -``` - -If such line occurs on the top-level, `a` and `b` are introduced into the -definition scope. Otherwisee, they are introduced into the parent scope. - -Does this introduce the `a` into the module's scope? 
- -(rules say "only if inline `=` does not introduce a new scope <=> on the top level) - -There cannot be a `->` on the left-hand side. - ---- - -``` -a = Int -foo = 5:a -``` - -What if - -``` -a = Int -foo = Int : a -``` - - - - - - - - - -TODO -At the top level, what exactly is the extent of a definition's scope? -Does it include the signature? - -Can a definition have multiple signatures? -How to give a signature to something that has no name or has multiple names? - -What is the difference between pattern matching and typing? -What is the difference between `a = 5` and `5:a`? Which is the value and which is the type? -A signature without a definition? \ No newline at end of file +Later, higher layers of the GUI shall merge this information with the "span tree" +describing the structure of the node's pattern and body. (the purpose is +observing connections on the flattened port layout) From 00f67398856a2e048657cdae56cb01bcb60d7d69 Mon Sep 17 00:00:00 2001 From: mwu Date: Mon, 6 Apr 2020 01:34:13 +0200 Subject: [PATCH 5/8] done one more pass --- docs/connections.md | 236 +++++++++++++++++++++++++++----------------- 1 file changed, 144 insertions(+), 92 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index da6dd40718..22cf0ca858 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -1,80 +1,78 @@ # General Rules -This section attempts to describe how the identifiers are introduced and -resolved in the language. +This document describes language rules that are relevant to connection discovery +and operations. The purpose is not to specify the whole language. Just enough +for IDE team members to be able to reason where identifiers are introduced and +what entity identifier usage refers to. -The purpose is not to specify the whole language. Just enough for an IDE team -members to be able to reason where identifiers are introduced and what entity -identifier usage refers to. - -This is the base allowing IDE to describe what connections are in the displayed graph.
+This is the base allowing IDE to describe what connections are in the displayed +graph. The covered topics are mostly identifiers, scopes and their interactions. ## Identifier -Identifier is a name that may denote value of some type. Syntactically we recognize: -* variables, being names that do not contain upper-cased characters, like `foo2` +Identifier is a name that denotes a value, i.e. that is bound to a value. The +compiler also keeps track of its type, i.e. the set of possible values. Syntactically we recognize: +* variable names, being names that do not contain upper-cased characters, like `foo2` or `make_new`; -* constructors, which are like variables but with first character and every +* constructor names, which are like variables but with first character and every character directly following underscore upper-cased (e.g. `Foo2` or `Make_New`); -* operators, being names consisting of operator symbols (e.g. `+` or `<$>`). - Specifically, operator name may contain following characters - `!$%&*+-/<>?^~|:\,.()[]{}=`. Not every sequence of these characters is a valid - operator name, as they could collide with other language constructs. +* operator names, being built solely from operator symbols (e.g. `+` or `<$>`). + Specifically, an operator name may contain only characters from the following + set: `!$%&*+-/<>?^~|:\,.()[]{}=`. Not every sequence of these characters is a valid operator name, as they can collide with other language constructs. -Any other names not matching requirements above, like `HTTP`, `foO` or -`Make_new` are not allowed. +Any other names that do not match requirements above, like `HTTP`, `foO` or +`Make_new` are not allowed and their behavior will not be specified. -In non-pattern contexts, referring to an existing identifier is -case-insensitive. So `foo` can be referred to as `Foo`. Note that `fOo` or `FOO` -are not valid identifiers, as upper-cased letter may appear only as the first -letter or after underscore (e.g. `Make_Request`). 
+In non-pattern contexts, referring to the existing identifier is +case-insensitive. So `foo` can be referred to either as `foo` or `Foo` with no +difference. In pattern context, lower-cased names are used to introduce a +binding (or a constraint), while upper-cased name will refer to an already bound +identifier. In short, variable name is allowed to shadow, while constructor +always unambiguously refers to an externally visible identifier. -In pattern context, lower-cased names are used to introduce a binding (or a -constraint), while upper-cased name will refer to an already bound identifier. +Operator names behave either as: +* variable names when placed in a prefix position (e.g. `+` in `+ a b`); +* or as constructor names when placed in an infix position (e.g. `,` in `a,b`). -Binding means introducing an identifier into scope and associating it with some -value. Identifier can be introduced also without binding it to any specific -value (e.g. as type constraint). +Number and text literals (like `5` or `"Hello"`) are treated as +constructor names. -Operators behave as variables in prefix position (e.g. `+` in `+ a b`) or as -constructors in an infix position (e.g. `,` in `a,b`). +Identifiers can be introduced by a binding (e.g. using assignment or lambda +argument matching) or when adding type constraints (using type ascriptions). ## Scope -Scope is the span in the code which shares the available identifiers set. Scopes -can be seen as a span-tree structure, covering the whole program code. +Scope is the span in the code which shares the set of visible, +available identifiers. Scopes are a span-tree structure, covering the whole program code. Nested scope is allowed to: * access identifiers from the outer scope; * shadow identifiers from nested scope with a new binding; * introduce new constraints on the identifiers from parent (or own) scopes. -The same identifier may be bound to multiple times in the same scope -(overloading).
It is allowed only for method overloads that differ in the type of -the `this` parameter. This limitation may be relaxed in the future, if proper -motivating use-cases are found. +The same identifier may be bound multiple times in the same scope (overloading). +It is allowed only for method overloads that differ in the type of the `this` +parameter. This limitation may be relaxed in the future, if proper motivating use-cases are found. -The identifier is always accessible scope-wide, before and after the line +The identifier is always accessible scope-wide, both before and after the line introducing it. Some monadic contexts (like IO) can introduce order-dependent behavior, however. This is not something that IDE is (or can be) concerned about when figuring out connections. -Identifier is bound by using a variable-type identifier in the pattern context. -Exact behavior depends on the language construction that was to introduce the identifier. +Identifier is bound by using a variable name in the pattern context. Exact behavior depends on the language construction that was to introduce the identifier. Identifier introduced into a scope is visible only in the scope's subtree (lexical scoping). -Scopes are introduced by: -* module/file (the root scope); -* code blocks (i.e. the block that follows the line with an trailing operator); -* `->` operator for its right-hand side. +Scopes, in the core language, are introduced by: +* module (file), i.e. the root scope; +* code blocks, i.e. indented blocks that follow a trailing operator; +* arrow (lambda) `->` operator for its right-hand side. -Also some other constructs seemingly introduce scope (like function -definitions) but this is because they are desugared into some construct that -introduces scope (like lambdas). +Some other constructs seemingly introduce scope (like function definitions) but +this is because they are desugared into some construct that introduces scope (like lambdas). 
TODO: Consider if there are any special rules for signatures on definitions, or -is this just type ascription next to a definition. +is this just type ascription lying next to a definition. ### Examples Example: @@ -116,15 +114,15 @@ main = While `main` as root-level has its own scope (as a definition in root it is treated as method and desugared to lambda), `foo` and `bar` do not. `a` -introduced by type signatures belongs to the `main`'s scope, and is shared by both -nested definitions. +introduced by type signatures belongs to the `main`'s scope, and is shared by +both nested definitions. ## Patterns -Patterns are context in the code where variables can be used to introduce new -identifiers into some scope. Constructors (that also include literals) are used -to pattern match against and potentially destructure more complex values. +Patterns are scopes in the code where variables can be used to introduce new +identifiers into some scope. Constructors names are used to pattern match +against and potentially destructure more complex values. -Pattern context is introduced within: +The following scopes are treated as patterns: * left-hand side of assignment operator, e.g. `main` in `main = println "Hello"`; * right-hand side of a colon operator, e.g. `a` in `foo:a`; @@ -132,7 +130,7 @@ Pattern context is introduced within: Details will follow with description of these operators. -// TODO What about `case … of` ? +// TODO What about `case … of` ? Are there any other pattern-introducing constructs? # Introducing identifiers Common notation used in the examples uses French quotation marks as following: @@ -141,41 +139,49 @@ Common notation used in the examples uses French quotation marks as following: * `»name«` for names used from graph's scope. They are potential destination endpoints of the connections in the graph. -Before running code, these «» markers should be removed, they are just to -quickly convey expected results in the code sample content. 
+Before "executing" code, these «» markers should be removed. They are just to +quickly convey expected results in the code sample content, and not repeat +"introduces `name` and uses `name2`" for each line description. ## Assignment The assignment operator `=` is deeply magical. Its basic form is `name = -body`, where it introduces `name` into the parent scope. +body`. It introduces `name` into the parent scope, by binding it with a value `body`. Example: ``` «five» = 5 ``` -The name `five` introduced into the parent scope is bound to a value of expression `5`. +The name `five` introduced into the parent scope is bound to a value of +expression `5`. -Assignment operator is used to define aliases, functions, extension methods and +Assignment operator is also used to define functions, extension methods and perform pattern matching. For different cases appropriate desugaring is applied. See sections below for details for particular details. -Roughly speaking, if the name is a variable identifier, it is introduced and its +Roughly speaking, if a name is a variable name, it is introduced and its arguments (if present) are visible only in the definition body. If the name is -constructor identifier, it will pattern-match on variables in its arguments positions. +constructor name though, it will pattern-match on variables in its arguments positions. -If any macros are used in the definition, it is assumed that if it appears in a -pattern context, their vars introduce variables — or otherwise use variables. +If macros are used in the definition, it is assumed that if it appears in a +pattern context, variable names it matched shall be bound — or otherwise just +used from the containing scope. Basically, it is similar to if a grouped expression with tokens matched by a macro was in place. -In any place where variable is used in the pattern, it can be substituted by -underscore `_` to disregard the value without introducing any identifier. 
+In any place where variable name is used in a pattern, it can be substituted by +an underscore `_` to disregard the value without introducing any identifier. -Single line can contain at most one assignment. +A single line can contain at most one assignment. The following code is not +valid: +``` +foo = bar = baz # invalid! +``` -If the name introduced by assignment is already visible from parent scope, it -will be shadowed. + +If a name introduced by an assignment is already available in the parent scope, +it becomes shadowed. Example: ``` @@ -183,6 +189,7 @@ Example: bar = «foo» = 5 # shadowing a = »foo« + 5 # refers to `foo` from line above +# here `foo` is `2` again ``` If the name was already assigned to in the current scope, it is not allowed to @@ -195,9 +202,16 @@ Example: «foo» = 5 # error, symbol defined twice in the same scope ``` -### Specific cases overview +The excpetion to this rule are function overloads, described in a separate +section later. + +### Non-trivial cases overview +These include functions (variable name followed by arguments), pattern matching +(constructor name with optional arguments) and extension methods. Each of these +is described in a greater detail below, here are just a few quick examples. + +--- -Examples: ``` «foo» a b = a + b ``` @@ -228,21 +242,23 @@ definition body (as it appears as the type of implicit `this` parameter). --- TODO: Example: -```foo = a -> +``` +foo = a -> a = 5 ``` -TODO: Does the `a = 5` is shadowing? Or is this multiple definition error? If -the block introduces scope, it should shadow. However, it is not clear if the -block's scope should be truly separate from lambda's body scope. -Or perhaps assignment should be allowed to shadow lambda-introduced identifiers? +TODO: Does the `a = 5` is shadowing `a` from lambda argument? Or is this +multiple definition error? If the block introduces scope, it should shadow. 
If +however lambda body is THE scope where argument is introduced and is same as +lambda code block, it should be an error. +Or should it get some special rule? --- ### Function definitions If the assignment's left-hand side is a prefix application chain, where the -left-most name (i.e. the function name) is a variable, the assignment is said to +left-most element (the callable) is a variable name, the assignment is said to be a function definition. Each prefix argument is converted into a lambda -argument. +argument in the assignment body. ``` «log_name» object = »print« object.»name« @@ -270,7 +286,7 @@ operators. (to avoid clutter the operators are not marked with »«) ### Pattern matching If the assignment's left-hand side is a prefix application chain where the -left-most name is a constructor, it will be desugared into a pattern match. +function is a constructor name, it will be desugared into a pattern match. Example: ``` @@ -294,10 +310,14 @@ operands. For example: «x»,«y» = »get_position« ``` +TODO Describe what it gets desugared to when there is more than one variable +name introduced? Now it is inconsistent, if we say that only canonical form of +assignment is `variable_name = value`. + ### Extensions methods If the application target uses accessor operator `.`, e.g. `Int.add`, the last -segment is the introduced indentifier and the previous segments are used to type -the implicit `this` parameter. +segment of target is the introduced indentifier and the previous segments are +used to type the implicit `this` parameter. For example: ``` @@ -310,6 +330,9 @@ translated to: Which is then desugared into a lambda. The introduced name is only `bar`. +If there are any prefix application arguments following the accessor-style +target, they will be treated as arguments following implicit `this`. + ### Overloading Only the methods that take `this` as the first parameter can be overloaded. Each overload of the given name must have different type of `this`. 
@@ -329,15 +352,25 @@ In this case `a` and `b` for each `foo` definition will be inferred by the compiler. If they end up being different types, overloads are valid. If they are the same, an error will be raised. +### Root scope assignments + +Any binding in the rootscope gets an implicit `this` parameter that describes a +module. Example: +``` +main = print "Hello" +``` +`main` here is a function binding that is desugared to lambda. As such its body +has its own, separate scope. That would not have been a case, if such lines +appeared in any non-root code block. ## Lambdas `arg -> value` is the syntax for lambdas. Left-hand side is a pattern for the -argument (lambdas are always unary) and the right-hand side is its body. Lambda -body introduces its own scope. +argument (lambdas are always unary) and the right-hand side is its body. +Right-hand side, i.e. the lambda body, introduces its own scope. -The `->` pattern introduces identifiers only into the scope of their right-hand -side, if the lambda is not introduced in what is already a pattern context. +The `->`'s pattern introduces identifiers only into the scope of the right-hand +side — if the lambda is not introduced in what is already a pattern context. Example: ``` @@ -413,6 +446,8 @@ only nested scope assigns to it? always try to shadow it. What is the difference between `5 : a` and `5 : A`? (except the latter not being able to ever introduce a new identifier) +// TODO: Is it legal or sensible to ascribe a variable when there is no binding? + When type is known, the type ascription can be used to constrain type of the value: @@ -428,7 +463,6 @@ with any value that is not known to be `5`. However, this example refers to some `a` already being visible in scope and does not introduce any identifiers. - The `type` in this expression is pattern context and can be used to constrain the type variables. 
It is legal to assign constraints on an identifier using `:` multiple times in any of the scopes where identifier is visible. @@ -457,6 +491,7 @@ TODO: Doesn't sound that clean though. * Does argument-introduced `a` shadows the signature-introduced `a`? +TODO: `b:A` — does this add constraint to `a` or just `b` ? # Current engine limitations Note: "current" means "in the scope of the first alpha release of enso", @@ -494,13 +529,31 @@ The type ascription and signatures are not properly supported. IDE should disregard them for the time being. -# IDE Connection Discovery +# Node Connections in IDE IDE presents a definition body as a graph. Code lines of the body of the definition are displayed as nodes (unless they're definitions). We want to display connection between nodes, if an identifier introduced by one node into the graph's scope is used in another node's expression. +## Connection +Connection is an ordered pair of endpoints: source and destination. Endpoint is +pair of node ID and crumbs. Source endpoint identifiers the node which +introduces the identifier (source of data), and crumb describes the identifier +position in the node's assignment's left-hand side. Destination endpoint +similarly describes position in node's expression body where the identifier is used. + +Later, higher layers will GUI shall merge this information with the "span tree" +describing the structure of the node's pattern and body. The low level +"double-representation" deals only with AST and is not concerned with view-level +datastructures like expresion span-stree. + + +## Discovery rules +Connections when identifiers from graph scope are used in node expressions. They +are between node that introduces identifier and node that uses identifier. + + Some simplifications are currently assumed: * Connections care only about usage of symbols introduced by assignment definition. 
For example, symbols introduced by `:` operator's right side do @@ -532,14 +585,13 @@ If the identifier is introduced by assignment's left-hand side and is used in the other node's expression, the connection should be recognized. -## Connection +## Connection operations +// TODO: complete in future, when implementing them -Connection is an ordered pair of endpoints: source and destination. Endpoint is -pair of node ID and crumbs. Source endpoint identifiers the node which -introduces the identifier (source of data), and crumb describes the identifier -position in the node's assignment's left-hand side. Destination endpoint -similarly describes position in node's expression where the identifier is used. +Because definitions can be sensitive about their order (e.g. because of IO +monadic context), when creating connections, lines should be reordered to match +the order of topologically sorted nodes from the graph. (when possible) + +In future this behavior should be depend on definition's monadic context +provided by the language server. -Later, higher layers will GUI shall merge this information with the "span tree" -describing the structure of the node's pattern and body. (the purpose is -observing connections on the flattened port layout) From 77506c5e72dd212783e3333a742f1bd58ea72759 Mon Sep 17 00:00:00 2001 From: mwu Date: Mon, 6 Apr 2020 02:47:07 +0200 Subject: [PATCH 6/8] typos --- docs/connections.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index 22cf0ca858..588c1068bd 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -202,7 +202,7 @@ Example: «foo» = 5 # error, symbol defined twice in the same scope ``` -The excpetion to this rule are function overloads, described in a separate +The exception to this rule are function overloads, described in a separate section later. 
### Non-trivial cases overview @@ -216,7 +216,7 @@ is described in a greater detail below, here are just a few quick examples. «foo» a b = a + b ``` -Only the "base" name (of the prefix application chain) is introuced. Arguments +Only the "base" name (of the prefix application chain) is introduced. Arguments are visible in the body scope. Therefore, `a` and `b` in the body scope refer to the function arguments and not to variables from the parent scope. @@ -316,7 +316,7 @@ assignment is `variable_name = value`. ### Extensions methods If the application target uses accessor operator `.`, e.g. `Int.add`, the last -segment of target is the introduced indentifier and the previous segments are +segment of target is the introduced identifier and the previous segments are used to type the implicit `this` parameter. For example: @@ -354,7 +354,7 @@ the same, an error will be raised. ### Root scope assignments -Any binding in the rootscope gets an implicit `this` parameter that describes a +Any binding in the root scope gets an implicit `this` parameter that describes a module. Example: ``` main = print "Hello" @@ -369,8 +369,8 @@ appeared in any non-root code block. argument (lambdas are always unary) and the right-hand side is its body. Right-hand side, i.e. the lambda body, introduces its own scope. -The `->`'s pattern introduces identifiers only into the scope of the right-hand -side — if the lambda is not introduced in what is already a pattern context. +If the lambda is *not* introduced in what is already a pattern context, the `->`'s pattern introduces identifiers into the scope of the right-hand +side. Example: ``` @@ -467,7 +467,7 @@ The `type` in this expression is pattern context and can be used to constrain the type variables. It is legal to assign constraints on an identifier using `:` multiple times in any of the scopes where identifier is visible. -Signatures are just type ascriptions that happen to preceed the assignments. 
+Signatures are just type ascriptions that happen to precede the assignments. They have no special rules currently defined. This area needs further design work. @@ -518,7 +518,7 @@ IDE can assume that all extension methods will be introduced using the `Type.name` syntax sugar. // TODO what if non-first argument is named `this` ? Is the magic happening only -for this particuar name? Is it sensitive for its position in the arguments list? +for this particular name? Is it sensitive for its position in the arguments list? // TODO What happens if the `this`-taking function in defined in the root scope where already `this` is implicitly provided? What about taking `this` in a method defined using the sugared syntax? (e.g. `Int.print this:Int = ...`) @@ -546,7 +546,7 @@ similarly describes position in node's expression body where the identifier is u Later, higher layers will GUI shall merge this information with the "span tree" describing the structure of the node's pattern and body. The low level "double-representation" deals only with AST and is not concerned with view-level -datastructures like expresion span-stree. +data structures like expression span-tree. ## Discovery rules From 9d2c6b8a38d8e1a241e186d376e73e9dea5f4294 Mon Sep 17 00:00:00 2001 From: mwu Date: Wed, 8 Apr 2020 13:08:09 +0200 Subject: [PATCH 7/8] cr feedback, minor fix --- docs/connections.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index 588c1068bd..69710efb1c 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -118,11 +118,11 @@ introduced by type signatures belongs to the `main`'s scope, and is shared by both nested definitions. ## Patterns -Patterns are scopes in the code where variables can be used to introduce new -identifiers into some scope. Constructors names are used to pattern match +Pattern contexts are spans in the code where variables can be used to introduce +new identifiers into some scope. 
Constructors names are used to pattern match against and potentially destructure more complex values. -The following scopes are treated as patterns: +The following spans are pattern contexts: * left-hand side of assignment operator, e.g. `main` in `main = println "Hello"`; * right-hand side of a colon operator, e.g. `a` in `foo:a`; @@ -188,7 +188,7 @@ Example: «foo» = 2 bar = «foo» = 5 # shadowing - a = »foo« + 5 # refers to `foo` from line above + «a» = »foo« + 5 # refers to `foo` from line above # here `foo` is `2` again ``` @@ -420,7 +420,7 @@ case. ## Type ascription -The type ascription operator `:` introduces pattern scope for its right hand +The type ascription operator `:` introduces pattern context for its right hand side. The basic form is `value:type`. It says that `value` be of the given `type`, i.e. that all its possible values belong to the set of atoms represented by type. From 1189e266768fe91b1754355d4337681844a5e9d4 Mon Sep 17 00:00:00 2001 From: mwu Date: Fri, 10 Apr 2020 01:07:15 +0200 Subject: [PATCH 8/8] updates based on Ara's feedback and call with Wojciech --- docs/connections.md | 280 ++++++++++++++++++++++++++++++++++---------- 1 file changed, 218 insertions(+), 62 deletions(-) diff --git a/docs/connections.md b/docs/connections.md index 69710efb1c..8904e808b1 100644 --- a/docs/connections.md +++ b/docs/connections.md @@ -10,15 +10,17 @@ graph. The covered topics are mostly identifiers, scopes and their interactions. ## Identifier Identifier is a name that denotes a value, i.e. that is bound to a value. The -compiler also keeps track of its type, i.e. the set of possible values. Syntactically we recognize: -* variable names, being names that do not contain upper-cased characters, like `foo2` - or `make_new`; +compiler also keeps track of its type, i.e. the set of possible values. 
+Syntactically we recognize: +* variable names, being names that do not contain upper-cased characters, like + `foo2` or `make_new`; * constructor names, which are like variables but with first character and every character directly following underscore upper-cased (e.g. `Foo2` or `Make_New`); * operator names, being built solely from operator symbols (e.g. `+` or `<$>`). Specifically, an operator name may contain only characters from the following - set: `!$%&*+-/<>?^~|:\,.()[]{}=`. Not every sequence of these characters is a valid operator name, as they can collide with other language constructs. + set: `!$%&*+-/<>?^~|:\,.()[]{}=`. Not every sequence of these characters is a + valid operator name, as they can collide with other language constructs. Any other names that do not match requirements above, like `HTTP`, `foO` or `Make_new` are not allowed and their behavior will not be specified. @@ -42,7 +44,8 @@ argument matching) or when adding type constraints (using type ascriptions). ## Scope Scope is the span in the code which shares the identifiers set of visible, -available identifiers. Scopes are a span-tree structure, covering the whole program code. +available identifiers. Scopes are a span-tree structure, covering the whole +program code. Nested scope is allowed to: * access identifiers from the outer scope; @@ -51,14 +54,17 @@ Nested scope is allowed to: The same identifier may be bound multiple times in the same scope (overloading). It is allowed only for method overloads that differ in the type of the `this` -parameter. This limitation may be relaxed in the future, if proper motivating use-cases are found. +parameter. This limitation may be relaxed in the future, if proper motivating +use-cases are found. The identifier is always accessible scope-wide, both before and after the line introducing it. Some monadic contexts (like IO) can introduce order-dependent behavior, however. 
This is not something that IDE is (or can be) concerned about when figuring out connections. -Identifier is bound by using a variable name in the pattern context. Exact behavior depends on the language construction that was to introduce the identifier. +Identifier is bound by using a variable name in the pattern context. Exact +behavior depends on the language construction that was to introduce the +identifier. Identifier introduced into a scope is visible only in the scope's subtree (lexical scoping). @@ -66,13 +72,23 @@ Identifier introduced into a scope is visible only in the scope's subtree Scopes, in the core language, are introduced by: * module (file), i.e. the root scope; * code blocks, i.e. indented blocks that follow a trailing operator; -* arrow (lambda) `->` operator for its right-hand side. +* arrow `->` operator for both of its operands. + +Arrow operator creates a scope both when used to define lambda-expressions and +when used as one of `case … of` arms. Some other constructs seemingly introduce scope (like function definitions) but -this is because they are desugared into some construct that introduces scope (like lambdas). +this is because they are desugared into some construct that introduces scope +(like lambdas): +* any assignment in the root scope (desugared to a method on a module); +* any function, i.e. non-nullary assignment: a new scope for each parameter + (desugared to lambda); +* any method, including extension methods (as above). TODO: Consider if there are any special rules for signatures on definitions, or -is this just type ascription lying next to a definition. +is this just type ascription lying next to a definition. Is there (and should +there be?) a mechanism that makes identifiers defined in the function body +visible in its signature? ### Examples Example: @@ -86,9 +102,11 @@ The sample code is a module with a single top-level definition.
Here we have four scopes, each one nested in the previous one: * root scope (consisting of unindented lines), where `main` is visible; -* definition and code block (all lines equally indented after - `main =`) scopes, where both `main` and `succ` are visible -* lambda body scope (right-hand side after `a ->`), where all `main`, +* `main` definition's scope, where `main` is visible (and an implicit `here` + parameter that we'll ignore for now); +* `main` definition code block's (all lines equally indented after + `main =`) scope, where both `main` and `succ` are visible; +* lambda body scope (expression `a -> a+1`), where all `main`, `succ` and `a` are visible. Example: @@ -97,8 +115,8 @@ test = a -> b -> sum = a + b ``` -Here we have root scope, then scopes of lambdas (after `a ->` -and after `b ->`) and finally scope for code block. +Here we have root scope, definition scope of `test`, then scopes of both lambdas + (`a -> …` and `b -> …`) and finally scope of the code block. Example: @@ -112,15 +130,20 @@ main = bar = 3 ``` -While `main` as root-level has its own scope (as a definition in root it is -treated as method and desugared to lambda), `foo` and `bar` do not. `a` +While `main` as root-level has its own scope, `foo` and `bar` do not. `a` introduced by type signatures belongs to the `main`'s scope, and is shared by both nested definitions. ## Patterns Pattern contexts are spans in the code where variables can be used to introduce -new identifiers into some scope. Constructors names are used to pattern match -against and potentially destructure more complex values. +new identifiers into the containing scope. + +Within a pattern context: +* variable names are matched against corresponding parts of the expression and + are introduced into the scope; +* constructor names require that the matched value is of a given constructor and + allow matching fields recursively; +* any literals (numbers, strings) behave as constructors.
The following spans are pattern contexts: * left-hand side of assignment operator, e.g. `main` in `main = println @@ -128,9 +151,15 @@ The following spans are pattern contexts: * right-hand side of a colon operator, e.g. `a` in `foo:a`; * left-hand side of an arrow operator, e.g. `a` in `a -> a + 1`. +In the core language actually both non-trivial lambdas and assignments are +desugared to the trivial lambdas, monadic bindings and `case … of` expressions. + Details will follow with description of these operators. -// TODO What about `case … of` ? Are there any other pattern-introducing constructs? +Other language constructs also introduce pattern contexts, like `case +expression of`, where each variant's arm is of the form `pattern -> body`. + +// TODO Any other core/language (or builtin) pattern-introducing constructs? # Introducing identifiers Common notation used in the examples uses French quotation marks as following: @@ -145,8 +174,11 @@ quickly convey expected results in the code sample content, and not repeat ## Assignment -The assignment operator `=` is deeply magical. Its basic form is `name = -body`. It introduces `name` into the parent scope, by binding it with a value `body`. +The assignment operator `=` is deeply magical. Its left-hand side introduces a +pattern that is matched against the value of its right-hand side. + +Assignment can appear as the root expression in the block's line, be it a root +module's block or a nested code block. Example: ``` @@ -156,13 +188,14 @@ Example: The name `five` introduced into the parent scope is bound to a value of expression `5`. -Assignment operator is also used to define functions, extension methods and +Assignment operator is used to define functions, extension methods and perform pattern matching. For different cases appropriate desugaring is applied. See sections below for details for particular details.
Roughly speaking, if a name is a variable name, it is introduced and its arguments (if present) are visible only in the definition body. If the name is -constructor name though, it will pattern-match on variables in its arguments positions. +constructor name though, it will pattern-match on variables in its arguments +positions. If macros are used in the definition, it is assumed that if it appears in a pattern context, variable names it matched shall be bound — or otherwise just @@ -172,11 +205,23 @@ macro was in place. In any place where variable name is used in a pattern, it can be substituted by an underscore `_` to disregard the value without introducing any identifier. +Any line with expression `foo` can be replaced with `_ = foo`, unless its value +was used. + +The assignment expression does not yield the value it assigned. A single line can contain at most one assignment. The following code is not valid: ``` -foo = bar = baz # invalid! +# invalid! +foo = bar = baz +``` + +Also, assignment can appear only as the root expression in the line. The +following is not valid: +``` +# invalid! +(foo = bar) + 2 ``` @@ -187,8 +232,8 @@ Example: ``` «foo» = 2 bar = - «foo» = 5 # shadowing - «a» = »foo« + 5 # refers to `foo` from line above + foo = 5 # shadowing + a = foo + 5 # refers to `foo` from line above # here `foo` is `2` again ``` @@ -239,20 +284,6 @@ Introduces name `hello` being an extension method defined on `a`. In this position `a` will denote practically "any type" but is visible only in the definition body (as it appears as the type of implicit `this` parameter). ---- -TODO: -Example: -``` -foo = a -> - a = 5 -``` -TODO: Does the `a = 5` is shadowing `a` from lambda argument? Or is this -multiple definition error? If the block introduces scope, it should shadow. If -however lambda body is THE scope where argument is introduced and is same as -lambda code block, it should be an error. -Or should it get some special rule? 
- ---- ### Function definitions If the assignment's left-hand side is a prefix application chain, where the @@ -291,18 +322,21 @@ function is a constructor name, it will be desugared into a pattern match. Example: ``` »Some« «value» = »get_opt« +tail… ``` will be desugared into: ``` -«value» = case »get_opt« of - »Some« b -> b - _ -> error +»get_opt« >>= case of + »Some« «value» -> tail… + _ -> error ``` Therefore, only `value` will be introduced into the parent scope. `Some` and `get_opt` must be defined, the former being an atom with at least single field. +"At least", because language allows omitting ignored trailing fields of the +constructor. i.e. matching `Some` is the same as matching `Some _`. Using operators in the infix position will also attempt to pattern match its operands. For example: @@ -310,10 +344,6 @@ operands. For example: «x»,«y» = »get_position« ``` -TODO Describe what it gets desugared to when there is more than one variable -name introduced? Now it is inconsistent, if we say that only canonical form of -assignment is `variable_name = value`. - ### Extensions methods If the application target uses accessor operator `.`, e.g. `Int.add`, the last segment of target is the introduced identifier and the previous segments are @@ -328,7 +358,8 @@ translated to: «bar» this:»Foo« = 5 ``` -Which is then desugared into a lambda. The introduced name is only `bar`. +Which is then desugared into a lambda. Only the `bar` identifier is introduced +to the graph's scope. If there are any prefix application arguments following the accessor-style target, they will be treated as arguments following implicit `this`. @@ -352,9 +383,12 @@ In this case `a` and `b` for each `foo` definition will be inferred by the compiler. If they end up being different types, overloads are valid. If they are the same, an error will be raised. +Argument named `this` may appear only as the first argument. It is not allowed +to explicitly use it when using the extension method syntax. 
+ ### Root scope assignments -Any binding in the root scope gets an implicit `this` parameter that describes a +Any binding in the root scope gets an implicit `here` parameter that describes a module. Example: ``` main = print "Hello" @@ -369,8 +403,8 @@ appeared in any non-root code block. argument (lambdas are always unary) and the right-hand side is its body. Right-hand side, i.e. the lambda body, introduces its own scope. -If the lambda is *not* introduced in what is already a pattern context, the `->`'s pattern introduces identifiers into the scope of the right-hand -side. +If the lambda is *not* introduced in what is already a pattern context, the +`->`'s pattern introduces identifiers into the scope of the right-hand side. Example: ``` @@ -395,14 +429,33 @@ expression `b`. However, now it is visible in the outer lambda body and can be accessed. The only externally provided identifier must be `default` method. --- -Lambdas may not appear in the assignment's pattern (i.e. values cannot be -pattern-matched into lambda). + +Example: +``` +foo = a -> + a = 5 +``` + +Here we have a lambda taking parameter `a` and shadowing it in its body. +Because the lambda's block is a scope of its own, the argument can be shadowed. + +--- + +Lambdas may not appear in the pattern position — so they cannot appear on the +left-hand side of an arrow or assignment operator. So the following is not valid: ``` +# invalid a -> b = foo ``` +Nor is this: +``` +# invalid +(a -> b) -> a.default +``` + --- Example: ``` @@ -420,6 +473,9 @@ case. ## Type ascription +NOTE: The type ascription operator is not supported in the first release +timeline and its exact specification is still work in progress. + The type ascription operator `:` introduces pattern context for its right hand side. The basic form is `value:type`. It says that `value` be of the given `type`, i.e.
that all its possible values belong to the set of atoms represented @@ -493,6 +549,104 @@ TODO: TODO: `b:A` — does this add constraint to `a` or just `b` ? +# Advanced desugaring +This section provides more examples of desugaring for various code constructs. +This should give a better understanding of why the rules are as presented. + +Note that the desugaring translated below is very low-level. + +All assignments and code blocks are removed. + + +## Out of order variable usage in block +If within a block an identifier is used in the line before it is assigned, the +`fix` function appropriate for the block's monadic context will be introduced. + +For example: + +``` +test a = + f y + y = g a +``` + +Here `y` is used in the line before it is evaluated. This code after all +desugaring can be treated as following: + +``` +test = a -> + fix + (y' -> + f y' + g a + ) +``` + +## Code blocks +Code blocks are desugared into chains of monadic binds. + +``` +foo = + a = expr1 + expr2 +``` + +Is equivalent to: +``` +foo = + expr1 >>= (a -> expr2) +``` + +Where `>>=` is the monadic bind operator (as in Haskell). + +--- + +If the first line in block is not assignment, it is treated as if it was +assigning into the underscore pattern. + +``` +test = + expr1 + expr2 +``` + +Is same as: + +``` +test = + _ = expr1 + expr2 +``` + +And can be desugared to: + +``` +test = + expr1 >>= (_ -> expr2) +``` + +Which can be also written as +``` +test = + expr1 >> expr2 +``` + +--- + + +If the trailing block line is assignment, it will be bound into `Nothing`: +``` +foo = + pat1 = expr1 +``` +Translates into: + +``` +foo = + expr1 >>= (pat1 -> Nothing) +``` + + # Current engine limitations Note: "current" means "in the scope of the first alpha release of enso", not "at the moment of writing this document". @@ -512,13 +666,14 @@ and a.foo = print "hello" ``` -are equivalent, engine currently supports only the latter. +are equivalent, engine currently supports only the latter. 
IDE can assume that + all extension methods will be introduced using the `Type.name` syntax sugar. + +Also, engine currently requires that all methods are defined in the module's +root scope. -IDE can assume that all extension methods will be introduced using the -`Type.name` syntax sugar. +// TODO interaction between implicit `this` and implicit `here`. -// TODO what if non-first argument is named `this` ? Is the magic happening only -for this particular name? Is it sensitive for its position in the arguments list? // TODO What happens if the `this`-taking function in defined in the root scope where already `this` is implicitly provided? What about taking `this` in a method defined using the sugared syntax? (e.g. `Int.print this:Int = ...`) @@ -570,9 +725,9 @@ Some simplifications are currently assumed: possible. (we will often visualize programs that are in progress of editing). * For the first release IDE can disregard the type ascription operator (`:`). -// TODO: Specify what is exactly the graph's scope? Is this a lambda body scope -or the scope introduced by the block following the lambda? Likely both of these -should be somehow coalesced, to avoid issues with definitions with inline bodies. + +Graph's scope is either scope of the code block (if there is one) or of the +lambda. // TODO: Actually, can we display graphs for argument-less blocks being node bodies? Scoping could get quite strange then. @@ -595,3 +750,4 @@ the order of topologically sorted nodes from the graph. (when possible) In future this behavior should be depend on definition's monadic context provided by the language server. +