NOTES

This project has been going under the eponymous "scheme in c" name for a while
now, but I think it's time I gave it a cooler codename. This is, of course,
better to refer to and like such as.

The problem is that I am absolutely terrible at naming things. After a not-so-
brief stint with naming all my computers after elements, I seem to have
settled on the places-in-songs theme for my computers.

The problem is that this is not a computer, and should not be named as such.

Although - going through songs might not be a bad idea for names.

Possibilities:
- zap (after EJ's song - short and poignant)

Actually, I like zap. This project is now known as Zap/zap. I should decide
on capitalization. Since this isn't a serious project, a serious name doesn't
work. Lower z it is.


Yikes -- apparently someone is watching this on GitHub. I don't really know
what that means. Hopefully there's nothing too embarassing in here, but it is
a rather open journal of my random thoughts. It appears that he also started
watching two other scheme projects, which indicates that I'm not singled out.

I do like the new reader that I wrote (it's spun off into parse.c) and it
seems fairly robust for my purposes. The only improvements I can see are
adding support for more types (which is a must) and potentially allowing
extension of the reader from inside scheme, like the read macros of CL - but I
think that's a long way off.

Now, back to continuations...


OK, so I'm back on Valencia - albeit with a complete disk wipe. I'd been
thinking about various continuation implementation strategies before, but
after reading K&R I've decided to go for some revamps of non-continuation
related areas of this program. Specifically, making the reader more robust.
K&R introduced some functions for reading in strings that could prove more
useful and robust.

Specifically, of course, I'm talking about the scanf function. It already
ignores whitespace, and it might be able to be used to get the tokens from
the string.

Here is a basic outline for a more "robust" reader function:

- read a line
- check () balance (and "" balance)
	- if too many (, read another line
	- if too many ), error out
- flatten whitespace (perhaps insert \0)
- flatten comments (from ; to \n)
- convert case (toupper/tolower)
- begin recursive parse

legit input to the parser will be of the form

<token> 		:= '()' | '(' <token-list> ')'
<token-list> 	:= <token> | <token> ' ' <token-list>

in other words, spaces only appear where necessary. is this the best approach?
I mentioned an idea - inserting \0 to delimit tokens - this would include
the ( and ). This would probably be the best bet, actually

- parsing will be able to be guaranteed a "legit" input
- parsing will no longer rely on manipulating the char**, instead a
  global variable will be used. 

(define (square x) (* x x))


I had thought about doing a read, formatting the buffer, then doing a parse.
It might be better to have parse call read as needed. This has the advantage
that the buffer can be modified at will. However, the current parse relies on
the ability to point to arbitrary objects returned from past read calls. So
no, it would not be a good idea to merge these two into one.

The read goes something like this:

- we read in characters from the input buffer, then write them to the output
  buffer
- each time we go through the loop, we process one character of input - would
  it be better to use something like getchar()? Perhaps so.
- so we get the next character, perform some special formatting on the output
  buffer, and write what we want in there
- each loop pass will assume that the buffer is in a position to be written
  directly to - no formatting will be needed to pad the output at the front
- so we loop while... what?
	- if we don't start with a paren, we're reading in an atom
	- if we do start with a paren, we need to read until we reach its closing
	  paren
	- these can be generalized into one statement:
	  (c = getchar()) != EOF && 
	
	looping continues while:
	- we got a real char (exit with error if we don't)
	- we are in parens OR we read a non-space OR we are in a string
	
	(c != EOF) && (parens || string || !isspace(c))
	
	
	int c = getchar();
	
	(c = getchar()) != EOF
	
	while ( (c = getchar()) != EOF && (parens || string || !isspace(c)) ) {
		
		
	}
	/* check for breaking condition */
	if (c == EOF && (paren || string)) {
		
		/* error, we read to the end and we're still inside a
		   paren/string */
	}
	else {
		/* we assume we're simply done reading */
		
	
	}


OK, I need to slow down for a minute - what I need to do is figure out how
to handle the spacing issue for the parenthesis. I want to be as generous
as possible when it comes to spacing, so an expression like this:
(+(* 2 3)4)
should return 10. the lack of spaces around (* 2 3) should be acceptable.

so this means that we have to have some interesting parsing rules when it
comes to handling parens. first, a paren needs to be padded, but simply
inserting "\0(\0" for ( is not possible.

one thing that I did "standardize" on is that every pass through the loop
the write pointer is pointing at a slot to write the next item (not null)

I have also incorporated some "look ahead" into the algorithm already, and
am largely convinced that this is how to implement the rest of this.

- read next char into c
- check if we're in a string
	- if c is \, look ahead into c
		- if EOF break
		- else write both chars, continue
	- else write char, continue
- check if whitespace
	- continue
- check if char is ( or )
	- write and pad right
- check if char is "
	- set string to !string
	- write char
- check next char
	- if whitespace or ( or ), pad
	
	
OK, fuck it all, let's go back to a completely generative approach
this will actually fit well with my parse algorithm - all I need to do is
replace next with a suitable replacement. unfortunately, the current parse
assumes that it can keep the old pointers while having next generate the new
ones. what this means is that I'll have to either rewrite my parse to avoid
this behavior (perhaps copying the string to local storage) or take care not
to rewrite the old value

so this is what we have to do
we say [read/parse x]
we read the first token from x and pass it to parse
parse can, at its discretion, read more tokens from x


so we have basically 3 functions:

parse - parses the token, optionally getting more
read  - reads the first token and passes it to parse
next  - gets next token

atom *read(FILE *input) [sets global FILE*]

atom *parse() [uses global char*]

void *next() [uses global FILE*, sets global char*]

FILE *i;
char *p;

atom *read(FILE *input) {
	i = input;
	next();
	return parse();
}

char readchar() {
	int c = getc(i);
	if (c != EOF)
		return c;
	printf("read error: unexpected EOF\n");
	exit(1);
}


char *next() {
	static char buff[BUFFSIZE];
	char *c = buff;
	while (isspace(*c = readchar()));
	
	if (c != ')' && c != '(')
		while( !isspace(*(c+1) = readchar()) )
			c++;
	
	else if (c == '"')
		while ( (*(++c) = readchar()) != '"' )
			if (*c == '\\')
				*(++c) = readchar();
	
	*(c+1) = '\0';

	return buff;
}


what this allows us to do is separate


what we're going to do is have


Now that I've had a disk crash, I have the opportunity to rethink the entire
architecture of this program.

First, though, I want to understand continuations.

OK, so the continuation represents the future of the computation. However,
there are examples that clearly show that a continuation also holds some state
as well.

OK, I just got it. This is the example I've been mulling over:

(define mondo-bizarro
  (let ((k (call/cc (λ (c) c))))
    (write 1)
    (call/cc (λ (c) (k c)))
    (write 2)
    (call/cc (λ (c) (k c)))
    (write 3)))

We can conceptualize each of these continuations individually, but it makes
much more sense when we realize that the continuation captured by every single
call/cc (except the initial one) is really capturing the same thread of
computation at different points.

k <- callcc(c -> c)

OK, k is now, for lack of a better term, itself. Call k with anything and you
get a new thread of computation where k is bound to that value.

All right:

write(1)

Writes "1" on the screen, no biggie.

callcc(c -> k(c))

What this does is it takes the current continuation and calls k with it. So
now what's happened is that we've entered a new thread of computation where
k is the "main" continuation.

write(1)

Same as before

callcc(c -> k(c))

OK, here what we've done is we're switching back to the main thread of
computation, because we're calling k with c

This is a critical point - NOTHING IS DONE WITH THE CURRENT CONTINUATION
Once we switch back to the main thread, that "alternate reality" doesn't
exist.

write(2)

Straightforward

callcc(c -> k(c))

This is, largely, the same as before - we enter another reality, print 1, and
reenter this reality.

write(3)

Yeah

And we're done


OK, so now that we've had a fresh dose of reality, what is required to
implement this behavior?

First, let's identify what we needed to keep track of. What this example
dramatically shows is a value having different results in different
continuations. Since the ccc happens within the let, this doesn't appear to be
an exception to the concept that the continuation represents the future of a
computation - the let binding happens automatically and doesn't need to be
checked within the continuation itself.

So what exactly is the future of the computation? It's the recursive call
you're stuck under and your position in a body of expressions (such as begin
or let), but I suspect that given a prudent implementation of begin and let
these two are really one and the same.

What if, on the way down a recursive call, we create the necessary
continuations? This is probably the only way to actually do this. Instead of
"returning" a value from a function, we just call the current continuation
with our result.

Example:

(define x 2)
(define y 3)
(+ x y)

This brings up a good point - at what point can we safely ignore the c? Atomic
level operations will have to be provided with the cc, but as they are atomic
they should be able to resolve their values without needing the cc. In fact,
the atomic operations are really the ones doing the heavy c lifting.

atom *cons(acons *args, nspace *n, cont *c) {

	So the idea is that cons evaluates its two arguments
	and returns a cons cell with them inside. What this means
	is that the call to eval must be accompanied by the cc, here
	*c, which indicates that it is in the middle of a cons
	operation.
	
	What seems inevitable is that each atomic operation must
	also have continuation-compatible counterparts. As in, this
	operation, cons, also has perhaps cons-car and cons-cdr, each
	taking 1 argument.
	
	
	Another option, perhaps, is to just fake all of this. We could
	just keep a tether to the toplevel populated by data structures
	that contain all the calling information. This way, if the
	continuation is "invoked" the entire tree can be recalculated.
	This is not a good idea and won't work.
	
	
	The continuation has to be a function only in the sense that it
	has to be invokable (with 1 argument).
	
}

invoke(cont *c, atom *val) {

	ok, so we're invoking a continuation, what does it return? 


}


Perhaps attempting to meld continuations into my existing infrastructure won't
work.

I think the only option I really have is to implement all of my functions in
the aforementioned continuation-friendly style.__
 

This is quite frustrating - I know what a continuation is, I just don't know
the best way to implement it!

A continuation represents the future of the computation.

A continuation, when invoked, always replaces the current continuation. This
means that it happens at the toplevel.

(+ 2 (* 3 (fact 4)))
          ^

at (fact 4) the continuation is (lambda (x) (+ 2 (* 3 x)))

yes, yes

this might work - just use the existing S-expressions?

Maybe I should focus on capturing the continuation now, and on using it later.
Or maybe not. This provides us with the context, but it does nothing for
modifying the recursive stack used.

Well I looked at the Wikipedia article for CPS and it seems that my concerns
are more or less justified. Continuations are hard to do right. Should my
lisp not include them? I'm not ready to make that sacrifice just yet.

OK, of all the ideas I've had so far, I like going ahead and modifying the
axioms to generate continuations. I probably should add some basic creation
functions for them as well.


It's actually not too bad to recursively create functions that illustrate the
current stack.

Order of parameter evaluation is deliberately unspecified in Scheme - there
should be no continuations that rely on it.


atom *cons(acons *args, nspace *n, cont *c) {

	atom	*car = eval(args->car, n),
			*cdr = eval(args->cdr, n);

	return c(newcons(car, cdr));
	
}

atom *cons-car(atom *car, nspace *n, cont *c) {

	atom *cdr = eval(args->cdr, n, c, etc);
	
	return c(newcons(car, cdr));

}

atom *cons-cdr(atom *cdr, nspace *n, cont *c) {


}

The relationship between eval and continuations

Without a call to eval, there is no recursive evaluation, and thus no
continuation. I suspect that this relationship is meaningful.

Here's what I worry about:


Scratch all that above - it seems that C is kind enough to provide some
functions setjmp and longjmp that enable us to save the runtime stack and
jump back into it when we want to. This is the basic behavior of the
continuation. One thing that I will have to consider is the return value of
this.

call/cc -> setjmp
[invoking cc] -> longjmp

call/cc:
- set var 0
- setjmp
- check cc->var
	- if 0, proceed
	- if 1, we're really in a longjmp
- set cc->var 1
- return eval of provided fun with the cc

[invoke]
- get & eval arg to be returned
- put in cc->arg
- longjmp
- check cc->var
	- if 0, we're really in call/cc
	- if 1, proceed
- 


TODO create scheme init file, move all possible functions there
TODO increase reader robustness (newline)
TODO add support for rationals, floating point, and bigint
TODO add support for continuations
TODO add support for macros (common lisp style)
TODO add garbage collection


Macros

In this implementation, macros can be represented as functions. The trick is
to pass in the appropriate namespace. I think I'll just add a tmac 


An analysis of the shortcuts taken so far:

If this is to evolve into an efficient interpreter, I will need to know which
portions I've cut corners on and attempt to make them more efficient.

Perhaps the most glaring aspect is the fact that every time an object is allocated, malloc() is called. In addition to this, you won't find a call to
free() anywhere. Programs that rely on the efficiency of the garbage collector
have already lost the efficiency battle, but we can still make sure that
memory usage remains reasonable.

In order to create a new object, that object has to come from somewhere. There
are two options in this regard: obtain memory from the OS every time, or keep
a pool of objects that can be used.