UTF2(4) MachTen Programmer’s Manual UTF2(4)
NAME
UTF2 - Universal character set Transformation Format encoding of runes
SYNOPSIS
ENCODING "UTF2"
DESCRIPTION
The UTF2 encoding is based on a proposed X-Open multibyte FSS-UCS-TF
(File System Safe Universal Character Set Transformation Format) encoding
as used in Plan 9 from Bell Labs. Although it is capable of representing
more than 16 bits, the current implementation is limited to 16 bits as
defined by the Unicode Standard.
UTF2 representation is backwards compatible with ASCII, so the bytes
0x00-0x7f refer to the ASCII character set. The multibyte encoding of
runes between 0x0080 and 0xffff consists entirely of bytes whose
high-order bit is set.
The actual encoding is represented by the following table:
     [0x0000 - 0x007f] [00000000.0bbbbbbb] -> 0bbbbbbb
     [0x0080 - 0x07ff] [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb
     [0x0800 - 0xffff] [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb
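As an illustration of the table, the following C sketch encodes a single
16-bit rune. It is not the MachTen or 4.4BSD library source, and the name
utf2_encode is hypothetical. The two-byte form carries 5 + 6 = 11 payload
bits, so it reaches up to 0x07ff; the three-byte form carries 4 + 6 + 6 = 16
bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a library API): encode one 16-bit rune as UTF2.
 * Writes 1 to 3 bytes into out, which must have room for 3, and returns
 * the number of bytes written. Always emits the shortest form.
 */
static size_t utf2_encode(uint16_t rune, unsigned char out[3])
{
    assert(out != NULL);
    if (rune <= 0x007f) {
        /* [00000000.0bbbbbbb] -> 0bbbbbbb */
        out[0] = (unsigned char)rune;
        return 1;
    }
    if (rune <= 0x07ff) {
        /* [00000bbb.bbbbbbbb] -> 110bbbbb, 10bbbbbb */
        out[0] = (unsigned char)(0xc0 | (rune >> 6));
        out[1] = (unsigned char)(0x80 | (rune & 0x3f));
        return 2;
    }
    /* [bbbbbbbb.bbbbbbbb] -> 1110bbbb, 10bbbbbb, 10bbbbbb */
    out[0] = (unsigned char)(0xe0 | (rune >> 12));
    out[1] = (unsigned char)(0x80 | ((rune >> 6) & 0x3f));
    out[2] = (unsigned char)(0x80 | (rune & 0x3f));
    return 3;
}
```

For example, the rune 0x20ac encodes as 0xE2 0x82 0xAC. Every byte produced
for a rune of 0x0080 or above has its high-order bit set, which is what
keeps plain ASCII bytes unambiguous inside UTF2 text.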
When a value has more than one representation (for example, 0x00;
0xC0 0x80; 0xE0 0x80 0x80 all denote 0x0000), the shortest representation
is always used for encoding, although the longer ones are still decoded
correctly.
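A minimal decoding sketch in C, assuming a hypothetical utf2_decode
interface that takes a byte pointer and a length. Following the rule that
longer representations are still decoded, it accepts overlong forms rather
than rejecting them, so 0xC0 0x80 and 0xE0 0x80 0x80 both yield 0x0000. It
rejects stray continuation bytes and the lead bytes of the unimplemented
four- to six-byte forms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a library API): decode one UTF2 sequence of at
 * most len bytes starting at s. On success stores the rune in *rune and
 * returns the number of bytes consumed; returns 0 for a malformed or
 * truncated sequence. Overlong forms are deliberately accepted.
 */
static size_t utf2_decode(const unsigned char *s, size_t len, uint16_t *rune)
{
    size_t n, i;
    uint16_t r;

    assert(s != NULL && rune != NULL);
    if (len == 0)
        return 0;
    if ((s[0] & 0x80) == 0x00) {            /* 0bbbbbbb: plain ASCII */
        *rune = s[0];
        return 1;
    } else if ((s[0] & 0xe0) == 0xc0) {     /* 110bbbbb: two-byte form */
        n = 2;
        r = (uint16_t)(s[0] & 0x1f);
    } else if ((s[0] & 0xf0) == 0xe0) {     /* 1110bbbb: three-byte form */
        n = 3;
        r = (uint16_t)(s[0] & 0x0f);
    } else {
        return 0;   /* continuation byte or unsupported lead byte */
    }
    if (len < n)
        return 0;   /* truncated sequence */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xc0) != 0x80)          /* each must be 10bbbbbb */
            return 0;
        r = (uint16_t)((r << 6) | (s[i] & 0x3f));
    }
    *rune = r;
    return n;
}
```

This lenient behavior matches the description above, but modern UTF-8 as
specified in RFC 3629 treats overlong forms as invalid, and decoders for
that encoding must reject them.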
The final three encodings provided by X-Open, which together cover the
entire proposed ISO-10646 31-bit standard, are currently not implemented:
     [00000000.000bbbbb.bbbbbbbb.bbbbbbbb] ->
          11110bbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
     [000000bb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
          111110bb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
     [0bbbbbbb.bbbbbbbb.bbbbbbbb.bbbbbbbb] ->
          1111110b, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb, 10bbbbbb
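For comparison, the full six-form X-Open scheme can be sketched as one loop:
pick the shortest length whose payload (7, 11, 16, 21, 26 or 31 bits) holds
the value, then peel off 6 bits per continuation byte from the end. This is
a hypothetical illustration; fss_utf_encode is not part of the library, and
UTF2 itself stops at three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode a value of up to 31 bits using all six
 * X-Open forms. out must have room for 6 bytes; returns the byte count.
 */
static size_t fss_utf_encode(uint32_t v, unsigned char out[6])
{
    /* Largest value each sequence length can hold: 7, 11, 16, 21, 26, 31 bits. */
    static const uint32_t max[6] = {
        0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff
    };
    /* Lead-byte marker bits for 1- to 6-byte sequences. */
    static const unsigned char lead[6] = {0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc};
    size_t n = 1, i;

    assert(out != NULL && v <= 0x7fffffff);
    while (v > max[n - 1])
        n++;
    /* Fill continuation bytes (10bbbbbb) from the end, 6 bits at a time. */
    for (i = n - 1; i > 0; i--) {
        out[i] = (unsigned char)(0x80 | (v & 0x3f));
        v >>= 6;
    }
    /* The remaining high bits fit in the lead byte's payload. */
    out[0] = (unsigned char)(lead[n - 1] | v);
    return n;
}
```

For example, 0x10000 needs 17 bits and therefore the four-byte form,
0xF0 0x90 0x80 0x80. For values up to 0xffff the output is identical to
the three-row UTF2 table.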
SEE ALSO
mklocale(1), setlocale(3)
4.4BSD September 10, 1996 1