Procedures and constants for encoding and decoding text in the `UTF-8` character encoding.

Collection Info

Collection: core
Path: unicode/utf8
Entries: 59

Constants

26

LOCB

LOCB :: 0b1000_0000

The default lowest continuation byte. This comment is shared with HICB, the default highest continuation byte.
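As a small illustration of how LOCB might be used, the sketch below tests whether a byte is a continuation byte by masking its top two bits. The helper name `is_continuation_byte` is hypothetical and not part of the package.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

// Hypothetical helper: a continuation byte has the bit pattern 10xx_xxxx,
// so masking the top two bits must yield LOCB (0b1000_0000).
is_continuation_byte :: proc(b: u8) -> bool {
	return (b & 0b1100_0000) == utf8.LOCB
}

main :: proc() {
	euro := "€" // encoded as the three bytes 0xe2 0x82 0xac
	for i in 0..<len(euro) {
		// The lead byte 0xe2 is not a continuation byte; the other two are.
		fmt.println(i, is_continuation_byte(euro[i]))
	}
}
```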

SURROGATE_HIGH_MAX

SURROGATE_HIGH_MAX :: 0xdbff

A high (leading) surrogate is in the range SURROGATE_MIN..SURROGATE_HIGH_MAX; a low (trailing) surrogate is in the range SURROGATE_LOW_MIN..SURROGATE_MAX.
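The two ranges can be checked with plain comparisons against the package constants. This is a minimal sketch; `is_high_surrogate` and `is_low_surrogate` are hypothetical helper names introduced here for illustration.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

// Hypothetical helper: true for a high (leading) surrogate.
is_high_surrogate :: proc(r: rune) -> bool {
	return utf8.SURROGATE_MIN <= r && r <= utf8.SURROGATE_HIGH_MAX
}

// Hypothetical helper: true for a low (trailing) surrogate.
is_low_surrogate :: proc(r: rune) -> bool {
	return utf8.SURROGATE_LOW_MIN <= r && r <= utf8.SURROGATE_MAX
}

main :: proc() {
	fmt.println(is_high_surrogate(rune(0xd83d))) // inside the high range
	fmt.println(is_low_surrogate(rune(0xde00)))  // inside the low range
	fmt.println(is_high_surrogate('A'))          // an ordinary code point
}
```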

ZERO_WIDTH_JOINER

ZERO_WIDTH_JOINER :: unicode.ZERO_WIDTH_JOINER

Types

4

Procedures

23

decode_grapheme_clusters

@(require_results)
decode_grapheme_clusters :: proc(str: string, track_graphemes: bool = true, allocator := context.allocator) -> (graphemes: [dynamic]Grapheme, grapheme_count: int, rune_count: int, width: int) {…}

Decode the individual graphemes in a UTF-8 string.

*Allocates Using Provided Allocator*

Inputs:
- str: The input string.
- track_graphemes: Whether or not to allocate and return `graphemes` with extra data about each grapheme.
- allocator: (default: context.allocator)

Returns:
- graphemes: Extra data about each grapheme.
- grapheme_count: The number of graphemes in the string.
- rune_count: The number of runes in the string.
- width: The width of the string in number of monospace cells.
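A minimal usage sketch, assuming the default `track_graphemes` and `context.allocator`. The input string is an illustrative choice: a base letter followed by a combining mark, so the grapheme count and the rune count should differ.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	// "e" followed by U+0301 (combining acute accent), then "a".
	str := "e\u0301a"

	graphemes, grapheme_count, rune_count, width := utf8.decode_grapheme_clusters(str)
	// The dynamic array was allocated with context.allocator, so free it here.
	defer delete(graphemes)

	fmt.println("graphemes:", grapheme_count)
	fmt.println("runes:    ", rune_count)
	fmt.println("width:    ", width)
	fmt.println("entries:  ", len(graphemes))
}
```

Because the procedure allocates, the caller owns `graphemes` and is responsible for deleting it with the same allocator.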

decode_last_rune_in_bytes

@(require_results)
decode_last_rune_in_bytes :: proc "contextless" (s: []u8) -> (rune, int) {…}

decode_last_rune_in_string

@(require_results)
decode_last_rune_in_string :: proc "contextless" (s: string) -> (rune, int) {…}

decode_rune_in_bytes

@(require_results)
decode_rune_in_bytes :: proc "contextless" (s: []u8) -> (rune, int) {…}

full_rune_in_bytes

@(require_results)
full_rune_in_bytes :: proc "contextless" (b: []u8) -> bool {…}

full_rune_in_bytes reports whether the bytes in b begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as an error rune of width 1 (RUNE_ERROR).
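The three cases described above (complete, truncated, invalid) can be sketched as follows. The byte values are illustrative choices, not taken from the package documentation.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	complete  := []u8{0xe2, 0x82, 0xac} // "€", a complete 3-byte sequence
	truncated := complete[:2]           // the final continuation byte is missing
	invalid   := []u8{0xff}             // 0xff can never start a sequence

	fmt.println(utf8.full_rune_in_bytes(complete))  // a full rune
	fmt.println(utf8.full_rune_in_bytes(truncated)) // not yet a full rune
	fmt.println(utf8.full_rune_in_bytes(invalid))   // counts as full (decodes as RUNE_ERROR)
}
```

A typical use is to decide whether a partially filled read buffer needs more bytes before it can be decoded.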

full_rune_in_string

@(require_results)
full_rune_in_string :: proc "contextless" (s: string) -> bool {…}

full_rune_in_string reports whether the bytes in s begin with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as an error rune of width 1 (RUNE_ERROR).

grapheme_count

@(require_results)
grapheme_count :: proc(str: string) -> (graphemes, runes, width: int) {…}

Count the individual graphemes in a UTF-8 string.

Inputs:
- str: The input string.

Returns:
- graphemes: The number of graphemes in the string.
- runes: The number of runes in the string.
- width: The width of the string in number of monospace cells.
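To show how the three counts relate to the byte length, here is a short sketch. The input string is an illustrative choice in which bytes, runes, and graphemes all differ.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	// "cafe" followed by U+0301 (combining acute accent).
	str := "cafe\u0301"

	graphemes, runes, width := utf8.grapheme_count(str)

	fmt.println("bytes:    ", len(str)) // len counts bytes, not characters
	fmt.println("runes:    ", runes)
	fmt.println("graphemes:", graphemes)
	fmt.println("width:    ", width)
}
```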

rune_count_in_bytes

@(require_results)
rune_count_in_bytes :: proc "contextless" (s: []u8) -> int {…}

rune_offset

@(require_results)
rune_offset :: proc "contextless" (s: string, pos: int, start: int = 0) -> int {…}

Returns the byte position of the rune at rune position pos in s, counting from an optional start byte position. Returns -1 if the string ends before that rune is reached.
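A brief sketch of using the returned byte position to slice a string. It assumes that `pos` is a zero-based rune index, which the description above does not state explicitly, so the -1 check is kept for safety.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	s := "héllo" // 'é' occupies two bytes, so rune and byte positions diverge

	// Print the byte position reported for each rune position.
	// Positions past the end of the string yield -1.
	for pos in 0..<7 {
		fmt.println(pos, utf8.rune_offset(s, pos))
	}

	// Use the byte position to slice from a given rune onward.
	if off := utf8.rune_offset(s, 2); off >= 0 {
		fmt.println(s[off:])
	}
}
```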

Procedure Groups

4

full_rune

full_rune :: proc{
	full_rune_in_bytes,
	full_rune_in_string,
}

full_rune reports whether its argument (a byte slice or a string) begins with a full UTF-8 encoding of a rune. An invalid encoding is considered a full rune, since it will convert as an error rune of width 1 (RUNE_ERROR).
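Because full_rune is a procedure group, the same call works for both argument types and the compiler picks the matching member. A minimal sketch with illustrative inputs:

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	text := "€"              // a complete 3-byte sequence as a string
	raw  := []u8{0xe2, 0x82} // the same sequence, cut short, as bytes

	fmt.println(utf8.full_rune(text)) // resolves to full_rune_in_string
	fmt.println(utf8.full_rune(raw))  // resolves to full_rune_in_bytes
}
```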

Variables

2

accept_ranges

accept_ranges: [5]Accept_Range = [5]Accept_Range{{0x80, 0xbf}, {0xa0, 0xbf}, {0x80, 0x9f}, {0x90, 0xbf}, {0x80, 0x8f}}

accept_sizes

accept_sizes: [256]u8 = [256]u8{0x00 ..= 0x7f = 0xf0, 0x80 ..= 0xc1 = 0xf1, 0xc2 ..= 0xdf = 0x02, 0xe0 = 0x13, 0xe1 ..= 0xec = 0x03, 0xed = 0x23, 0xee ..= 0xef = 0x03, 0xf0 = 0x34, 0xf1 ..= 0xf3 = 0x04, 0xf4 = 0x44, 0xf5 ..= 0xff = 0xf1}
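The two tables appear to work together: for a lead byte, the low bits of its accept_sizes entry give the sequence length and the high nibble selects the accept_ranges entry that bounds the second byte. This reading is an assumption inferred from the table values (for example, 0xe0 maps to 0x13, and accept_ranges[1] is {0xa0, 0xbf}, which matches the UTF-8 rule for sequences starting with 0xe0). The helper `describe_lead_byte` below is hypothetical.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

// Hypothetical helper that prints how a lead byte is classified,
// under the assumed layout: low 3 bits = length, high nibble = range index.
describe_lead_byte :: proc(lead: u8) {
	x := utf8.accept_sizes[lead]

	// Entries of 0xf0 and above (0xf0 for ASCII, 0xf1 for invalid lead bytes)
	// carry no range index, so they must not be used to index accept_ranges.
	if x >= 0xf0 {
		fmt.println(lead, "is handled as a single byte")
		return
	}

	size  := int(x & 7)
	index := int(x >> 4)
	fmt.println(lead, "starts a sequence of", size, "bytes; second byte range:", utf8.accept_ranges[index])
}

main :: proc() {
	describe_lead_byte(0x41) // ASCII 'A'
	describe_lead_byte(0xe0) // 3-byte lead with a restricted second byte
	describe_lead_byte(0xf4) // 4-byte lead with a restricted second byte
	describe_lead_byte(0xff) // never valid as a lead byte
}
```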