-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathNOTES
548 lines (335 loc) · 14.8 KB
/
NOTES
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
This project has been going under the eponymous "scheme in c" name for a while
now, but I think it's time I gave it a cooler codename. This is, of course,
better to refer to and like such as.
The problem is that I am absolutely terrible at naming things. After a not-so-
brief stint with naming all my computers after elements, I seem to have
settled on the places-in-songs theme for my computers.
The problem is that this is not a computer, and should not be named as such.
Although - going through songs might not be a bad idea for names.
Possibilities:
- zap (after EJ's song - short and poignant)
Actually, I like zap. This project is now known as Zap/zap. I should decide
on capitalization. Since this isn't a serious project, a serious name doesn't
work. Lower z it is.
Yikes -- apparently someone is watching this on GitHub. I don't really know
what that means. Hopefully there's nothing too embarassing in here, but it is
a rather open journal of my random thoughts. It appears that he also started
watching two other scheme projects, which indicates that I'm not singled out.
I do like the new reader that I wrote (it's spun off into parse.c) and it
seems fairly robust for my purposes. The only improvements I can see are
adding support for more types (which is a must) and potentially allowing
extension of the reader from inside scheme, like the read macros of CL - but I
think that's a long way off.
Now, back to continuations...
OK, so I'm back on Valencia - albeit with a complete disk wipe. I'd been
thinking about various continuation implementation strategies before, but
after reading K&R I've decided to go for some revamps of non-continuation
related areas of this program. Specifically, making the reader more robust.
K&R introduced some functions for reading in strings that could prove more
useful and robust.
Specifically, of course, I'm talking about the scanf function. It already
ignores whitespace, and it might be able to be used to get the tokens from
the string.
Here is a basic outline for a more "robust" reader function:
- read a line
- check () balance (and "" balance)
- if too many (, read another line
- if too many ), error out
- flatten whitespace (perhaps insert \0)
- flatten comments (from ; to \n)
- convert case (toupper/tolower)
- begin recursive parse
legit input to the parser will be of the form
<token> := '()' | '(' <token-list> ')'
<token-list> := <token> | <token> ' ' <token-list>
in other words, spaces only appear where necessary. is this the best approach?
I mentioned an idea - inserting \0 to delimit tokens - this would include
the ( and ). This would probably be the best bet, actually
- parsing will be able to be guaranteed a "legit" input
- parsing will no longer rely on manipulating the char**, instead a
global variable will be used.
(define (square x) (* x x))
I had thought about doing a read, formatting the buffer, then doing a parse.
It might be better to have parse call read as needed. This has the advantage
that the buffer can be modified at will. However, the current parse relies on
the ability to point to arbitrary objects returned from past read calls. So
no, it would not be a good idea to merge these two into one.
The read goes something like this:
- we read in characters from the input buffer, then write them to the output
buffer
- each time we go through the loop, we process one character of input - would
it be better to use something like getchar()? Perhaps so.
- so we get the next character, perform some special formatting on the output
buffer, and write what we want in there
- each loop pass will assume that the buffer is in a position to be written
directly to - no formatting will be needed to pad the output at the front
- so we loop while... what?
- if we don't start with a paren, we're reading in an atom
- if we do start with a paren, we need to read until we reach its closing
paren
- these can be generalized into one statement:
(c = getchar()) != EOF &&
looping continues while:
- we got a real char (exit with error if we don't)
- we are in parens OR we read a non-space OR we are in a string
(c != EOF) && (parens || string || !isspace(c))
int c = getchar();
(c = getchar()) != EOF
while ( (c = getchar()) != EOF && (parens || string || !isspace(c)) ) {
}
/* check for breaking condition */
if (c == EOF && (paren || string)) {
/* error, we read to the end and we're still inside a
paren/string */
}
else {
/* we assume we're simply done reading */
}
OK, I need to slow down for a minute - what I need to do is figure out how
to handle the spacing issue for the parenthesis. I want to be as generous
as possible when it comes to spacing, so an expression like this:
(+(* 2 3)4)
should return 10. the lack of spaces around (* 2 3) should be acceptable.
so this means that we have to have some interesting parsing rules when it
comes to handling parens. first, a paren needs to be padded, but simply
inserting "\0(\0" for ( is not possible.
one thing that I did "standardize" on is that every pass through the loop
the write pointer is pointing at a slot to write the next item (not null)
I have also incorporated some "look ahead" into the algorithm already, and
am largely convinced that this is how to implement the rest of this.
- read next char into c
- check if we're in a string
- if c is \, look ahead into c
- if EOF break
- else write both chars, continue
- else write char, continue
- check if whitespace
- continue
- check if char is ( or )
- write and pad right
- check if char is "
- set string to !string
- write char
- check next char
- if whitespace or ( or ), pad
OK, fuck it all, let's go back to a completely generative approach
this will actually fit well with my parse algorithm - all I need to do is
replace next with a suitable replacement. unfortunately, the current parse
assumes that it can keep the old pointers while having next generate the new
ones. what this means is that I'll have to either rewrite my parse to avoid
this behavior (perhaps copying the string to local storage) or take care not
to rewrite the old value
so this is what we have to do
we say [read/parse x]
we read the first token from x and pass it to parse
parse can, at its discretion, read more tokens from x
so we have basically 3 functions:
parse - parses the token, optionally getting more
read - reads the first token and passes it to parse
next - gets next token
atom *read(FILE *input) [sets global FILE*]
atom *parse() [uses global char*]
void *next() [uses global FILE*, sets global char*]
FILE *i;
char *p;
atom *read(FILE *input) {
i = input;
next();
return parse();
}
char readchar() {
int c = getc(i);
if (c != EOF)
return c;
printf("read error: unexpected EOF\n");
exit(1);
}
char *next() {
static char buff[BUFFSIZE];
char *c = buff;
while (isspace(*c = readchar()));
if (c != ')' && c != '(')
while( !isspace(*(c+1) = readchar()) )
c++;
else if (c == '"')
while ( (*(++c) = readchar()) != '"' )
if (*c == '\\')
*(++c) = readchar();
*(c+1) = '\0';
return buff;
}
what this allows us to do is separate
what we're going to do is have
Now that I've had a disk crash, I have the opportunity to rethink the entire
architecture of this program.
First, though, I want to understand continuations.
OK, so the continuation represents the future of the computation. However,
there are examples that clearly show that a continuation also holds some state
as well.
OK, I just got it. This is the example I've been mulling over:
(define mondo-bizarro
(let ((k (call/cc (λ (c) c))))
(write 1)
(call/cc (λ (c) (k c)))
(write 2)
(call/cc (λ (c) (k c)))
(write 3)))
We can conceptualize each of these continuations individually, but it makes
much more sense when we realize that the continuation captured by every single
call/cc (except the initial one) is really capturing the same thread of
computation at different points.
k <- callcc(c -> c)
OK, k is now, for lack of a better term, itself. Call k with anything and you
get a new thread of computation where k is bound to that value.
All right:
write(1)
Writes "1" on the screen, no biggie.
callcc(c -> k(c))
What this does is it takes the current continuation and calls k with it. So
now what's happened is that we've entered a new thread of computation where
k is the "main" continuation.
write(1)
Same as before
callcc(c -> k(c))
OK, here what we've done is we're switching back to the main thread of
computation, because we're calling k with c
This is a critical point - NOTHING IS DONE WITH THE CURRENT CONTINUATION
Once we switch back to the main thread, that "alternate reality" doesn't
exist.
write(2)
Straightforward
callcc(c -> k(c))
This is, largely, the same as before - we enter another reality, print 1, and
reenter this reality.
write(3)
Yeah
And we're done
OK, so now that we've had a fresh dose of reality, what is required to
implement this behavior?
First, let's identify what we needed to keep track of. What this example
dramatically shows is a value having different results in different
continuations. Since the ccc happens within the let, this doesn't appear to be
an exception to the concept that the continuation represents the future of a
computation - the let binding happens automatically and doesn't need to be
checked within the continuation itself.
So what exactly is the future of the computation? It's the recursive call
you're stuck under and your position in a body of expressions (such as begin
or let), but I suspect that given a prudent implementation of begin and let
these two are really one and the same.
What if, on the way down a recursive call, we create the necessary
continuations? This is probably the only way to actually do this. Instead of
"returning" a value from a function, we just call the current continuation
with our result.
Example:
(define x 2)
(define y 3)
(+ x y)
This brings up a good point - at what point can we safely ignore the c? Atomic
level operations will have to be provided with the cc, but as they are atomic
they should be able to resolve their values without needing the cc. In fact,
the atomic operations are really the ones doing the heavy c lifting.
atom *cons(acons *args, nspace *n, cont *c) {
So the idea is that cons evaluates its two arguments
and returns a cons cell with them inside. What this means
is that the call to eval must be accompanied by the cc, here
*c, which indicates that it is in the middle of a cons
operation.
What seems inevitable is that each atomic operation must
also have continuation-compatible counterparts. As in, this
operation, cons, also has perhaps cons-car and cons-cdr, each
taking 1 argument.
Another option, perhaps, is to just fake all of this. We could
just keep a tether to the toplevel populated by data structures
that contain all the calling information. This way, if the
continuation is "invoked" the entire tree can be recalculated.
This is not a good idea and won't work.
The continuation has to be a function only in the sense that it
has to be invokable (with 1 argument).
}
invoke(cont *c, atom *val) {
ok, so we're invoking a continuation, what does it return?
}
Perhaps attempting to meld continuations into my existing infrastructure won't
work.
I think the only option I really have is to implement all of my functions in
the aforementioned continuation-friendly style.__
This is quite frustrating - I know what a continuation is, I just don't know
the best way to implement it!
A continuation represents the future of the computation.
A continuation, when invoked, always replaces the current continuation. This
means that it happens at the toplevel.
(+ 2 (* 3 (fact 4)))
^
at (fact 4) the continuation is (lambda (x) (+ 2 (* 3 x)))
yes, yes
this might work - just use the existing S-expressions?
Maybe I should focus on capturing the continuation now, and on using it later.
Or maybe not. This provides us with the context, but it does nothing for
modifying the recursive stack used.
Well I looked at the Wikipedia article for CPS and it seems that my concerns
are more or less justified. Continuations are hard to do right. Should my
lisp not include them? I'm not ready to make that sacrifice just yet.
OK, of all the ideas I've had so far, I like going ahead and modifying the
axioms to generate continuations. I probably should add some basic creation
functions for them as well.
It's actually not too bad to recursively create functions that illustrate the
current stack.
Order of parameter evaluation is deliberately unspecified in Scheme - there
should be no continuations that rely on it.
atom *cons(acons *args, nspace *n, cont *c) {
atom *car = eval(args->car, n),
*cdr = eval(args->cdr, n);
return c(newcons(car, cdr));
}
atom *cons-car(atom *car, nspace *n, cont *c) {
atom *cdr = eval(args->cdr, n, c, etc);
return c(newcons(car, cdr));
}
atom *cons-cdr(atom *cdr, nspace *n, cont *c) {
}
The relationship between eval and continuations
Without a call to eval, there is no recursive evaluation, and thus no
continuation. I suspect that this relationship is meaningful.
Here's what I worry about:
Scratch all that above - it seems that C is kind enough to provide some
functions setjmp and longjmp that enable us to save the runtime stack and
jump back into it when we want to. This is the basic behavior of the
continuation. One thing that I will have to consider is the return value of
this.
call/cc -> setjmp
[invoking cc] -> longjmp
call/cc:
- set var 0
- setjmp
- check cc->var
- if 0, proceed
- if 1, we're really in a longjmp
- set cc->var 1
- return eval of provided fun with the cc
[invoke]
- get & eval arg to be returned
- put in cc->arg
- longjmp
- check cc->var
- if 0, we're really in call/cc
- if 1, proceed
-
TODO create scheme init file, move all possible functions there
TODO increase reader robustness (newline)
TODO add support for rationals, floating point, and bigint
TODO add support for continuations
TODO add support for macros (common lisp style)
TODO add garbage collection
Macros
In this implementation, macros can be represented as functions. The trick is
to pass in the appropriate namespace. I think I'll just add a tmac
An analysis of the shortcuts taken so far:
If this is to evolve into an efficient interpreter, I will need to know which
portions I've cut corners on and attempt to make them more efficient.
Perhaps the most glaring aspect is the fact that every time an object is allocated, malloc() is called. In addition to this, you won't find a call to
free() anywhere. Programs that rely on the efficiency of the garbage collector
have already lost the efficiency battle, but we can still make sure that
memory usage remains reasonable.
In order to create a new object, that object has to come from somewhere. There
are two options in this regard: obtain memory from the OS every time, or keep
a pool of objects that can be used.