From 8bc6202428e86ea3abc9ca31231a50d06aa915cd Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Thu, 14 Nov 2024 08:27:12 +0000
Subject: [PATCH] Define isomorphic string

And define it alongside ASCII and scalar value strings.
---
 infra.bs | 59 +++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 33 insertions(+), 26 deletions(-)

diff --git a/infra.bs b/infra.bs
index 88c9929..6561dd5 100644
--- a/infra.bs
+++ b/infra.bs
@@ -827,11 +827,13 @@ following steps return true:
+

To isomorphic decode a byte sequence input, return a string whose code point length is equal to input's length and whose code points have the same values as the values of input's bytes, in the same order. +

Code points

@@ -946,13 +948,25 @@ leaving them effectively as-is. would consist of the code points U+1F4A9 and U+D800.

A string's -length +length is the number of code units it contains.

A string's -code point length is the number +code point length is the number of code points it contains. +


+ +

To signify strings with additional restrictions on the code points they can contain +this specification defines ASCII strings, isomorphic strings, and +scalar value strings. Using these improves clarity in specifications. + +

An ASCII string is a string whose code points are all +ASCII code points. + +

An isomorphic string is a string whose code points are all in the +range U+0000 NULL to U+00FF (ÿ), inclusive. +

A scalar value string is a string whose code points are all scalar values. @@ -960,6 +974,8 @@ of code points it contains. where UTF-8 encode comes into play. +


+

To convert a string into a scalar value string, replace any surrogates with U+FFFD (�). @@ -1186,22 +1202,15 @@ from start to the end of a string string is the


-

To isomorphic encode a string input, run these steps:

- -
    -
  1. Assert: input contains no code points greater than U+00FF. - -

  2. Return a byte sequence whose length is equal to - input's code point length and whose bytes have the same - values as the values of input's - code points, in the same order. -

+
+

To isomorphic encode an isomorphic string input: return a +byte sequence whose length is equal to input's +code point length and whose bytes have the same values as +the values of input's code points, in the same order. +


-

An ASCII string is a string whose code points are all -ASCII code points. -

To ASCII lowercase a string, replace all ASCII upper alphas in the string with their corresponding code point in ASCII lower alpha. @@ -1213,28 +1222,26 @@ the string with their corresponding code point in ASCII upper a ASCII lowercase of B. -

To ASCII encode a string input, run these steps: - -

    -
  1. Assert: input is an ASCII string. - -

    Note: This precondition ensures that isomorphic encode and - UTF-8 encode return the same byte sequence for this input. +

    +

    To ASCII encode an ASCII string input: return the +isomorphic encoding of input. -

  2. Return the isomorphic encoding of input. -

+

Isomorphic encode and UTF-8 encode return the same byte sequence +for input. + +

To ASCII decode a byte sequence input, run these steps:

    -
  1. Assert: All bytes in input are ASCII bytes. +

  1. Assert: all bytes in input are ASCII bytes.

    Note: This precondition ensures that isomorphic decode and UTF-8 decode return the same string for this input.

  2. Return the isomorphic decoding of input.

- +
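
Reviewer illustration (not part of the patch): the three restricted string types and the isomorphic/ASCII encode algorithms touched by this change can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the specification's normative text. All function names below are hypothetical. Python strings are sequences of code points rather than code units, which makes no difference for the isomorphic and ASCII cases shown here. Where the patch expresses the precondition through the parameter type ("an isomorphic string input"), the sketch raises ValueError instead.

```python
def is_ascii_string(s: str) -> bool:
    # All code points are ASCII code points (U+0000 to U+007F).
    return all(ord(c) <= 0x7F for c in s)


def is_isomorphic_string(s: str) -> bool:
    # All code points are in the range U+0000 to U+00FF, inclusive.
    return all(ord(c) <= 0xFF for c in s)


def is_scalar_value_string(s: str) -> bool:
    # No code point is a surrogate (U+D800 to U+DFFF).
    return not any(0xD800 <= ord(c) <= 0xDFFF for c in s)


def isomorphic_encode(s: str) -> bytes:
    # Assumption: a violated precondition is reported as ValueError.
    if not is_isomorphic_string(s):
        raise ValueError("input is not an isomorphic string")
    # One byte per code point, same values, same order.
    return bytes(ord(c) for c in s)


def isomorphic_decode(data: bytes) -> str:
    # One code point per byte, same values, same order.
    return "".join(chr(b) for b in data)


def ascii_encode(s: str) -> bytes:
    # Assumption: a violated precondition is reported as ValueError.
    if not is_ascii_string(s):
        raise ValueError("input is not an ASCII string")
    return isomorphic_encode(s)
```

For an ASCII string such as "abc", ascii_encode and Python's built-in UTF-8 encoding give the same bytes, which is the property the note in the patch relies on. For a non-ASCII isomorphic string such as "é" the two differ: isomorphic_encode yields the single byte 0xE9, while UTF-8 yields two bytes.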