A scanner and tokenizer for UTF-8-encoded text. It takes a string providing the source, which can then be tokenized through repeated calls to the scan procedure. For compatibility with existing tooling and languages, the NUL character is not allowed. If a UTF-8-encoded byte order mark (BOM) is the first character in the source, it will be discarded. By default, a Scanner skips white space and Odin comments and recognizes all literals defined by the Odin programming language specification. A Scanner may be customized to recognize only a subset of those literals and to recognize different identifiers and white space characters.
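The overview above can be sketched as a minimal scan loop. This is an illustration rather than an official example: the import alias `scanner` and the sample source string are choices made here, and the allocator is passed to token_string explicitly so the call does not depend on a default argument.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "x := 42 // answer")

	// scan returns scanner.EOF once the source is exhausted.
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		// token_string names the kind of token, token_text gives its source text.
		kind := scanner.token_string(tok, context.temp_allocator)
		fmt.printf("%s %q\n", kind, scanner.token_text(&s))
	}
}
```

With the default flags the trailing comment is skipped, so the loop only sees the tokens before it.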

Collection Info

Collection: core
Path: text/scanner
Entries: 29

Constants

12

C_Like_Tokens

C_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_C_Int_Prefixes, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

C_Whitespace

C_Whitespace :: Whitespace{'\t', '\n', '\r', '\v', '\f', ' '}

Odin_Like_Tokens

Odin_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

Odin_Whitespace

Odin_Whitespace :: Whitespace{'\t', '\n', '\r', ' '}

Odin_Whitespace is the default value for the Scanner's whitespace field
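As a sketch of how these constants might be used to customize a Scanner: the example below switches to the C-like token set, clears Skip_Comments so that comments are returned as tokens, and drops '\n' from the whitespace set so that line breaks are returned as characters. The sample source is made up, and the resulting token stream is an expectation based on the flag names rather than a documented guarantee.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "a /* note */\nb")

	// init resets flags and whitespace, so customize after calling it.
	s.flags = scanner.C_Like_Tokens - scanner.Scan_Flags{.Skip_Comments}

	// Same as Odin_Whitespace but without '\n'.
	s.whitespace = scanner.Whitespace{'\t', '\r', ' '}

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		kind := scanner.token_string(tok, context.temp_allocator)
		fmt.printf("%s %q\n", kind, scanner.token_text(&s))
	}
}
```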

Types

5

Scanner

Scanner :: struct {…}

Scanner allows for the reading of Unicode characters and tokens from a string

Procedures

12

init

init :: proc(s: ^Scanner, src: string, filename: string = "") -> ^Scanner {…}

init initializes a scanner with a new source and returns itself. error_count is set to 0, flags is set to Odin_Like_Tokens, whitespace is set to Odin_Whitespace
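Because init resets the scanner, a single Scanner value can be reused for several sources. The following sketch assumes a hypothetical helper, count_tokens, and a made-up filename; neither is part of the package.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// count_tokens is a hypothetical helper: it scans until EOF and counts tokens.
count_tokens :: proc(s: ^scanner.Scanner) -> (n: int) {
	for scanner.scan(s) != scanner.EOF {
		n += 1
	}
	return
}

main :: proc() {
	s: scanner.Scanner
	sources := [?]string{"a + b", "c * d"}

	for src in sources {
		// init returns the scanner it was given, so the call can be passed along directly.
		n := count_tokens(scanner.init(&s, src, "demo.odin"))
		fmt.println(src, "->", n, "tokens")
	}
}
```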

next

next :: proc(s: ^Scanner) -> rune {…}

next reads and returns the next Unicode character. It returns EOF at the end of the source. next does not update the Scanner's pos field. Use 'position(s)' to get the current position

peek

@(require_results)
peek :: proc(s: ^Scanner, n: int = 0) -> (ch: rune) {…}

peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it calls next n times, returns the nth Unicode character, and then restores the Scanner's state
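A small sketch of character-level reading with peek and next, using a made-up source string. It relies only on the behavior described above: peek does not advance, so the following call to next returns the same character.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "héllo")

	ahead   := scanner.peek(&s)    // next character, scanner not advanced
	further := scanner.peek(&s, 1) // look one character further, state restored
	ch      := scanner.next(&s)    // now actually consume a character

	assert(ahead == ch)
	fmt.printf("%c %c %c\n", ahead, further, ch)

	// Read the remaining characters one at a time until EOF.
	for r := scanner.next(&s); r != scanner.EOF; r = scanner.next(&s) {
		fmt.printf("%c ", r)
	}
	fmt.println()
}
```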

peek_token

@(require_results)
peek_token :: proc(s: ^Scanner, n: int = 0) -> (tok: rune) {…}

peek_token returns the next token in the source. It returns EOF if the scanner's position is at or past the last character of the source. If n > 0, it looks n tokens further ahead, returns the nth token, and then restores the Scanner's state
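One-token lookahead might be used as in the sketch below, which checks whether an identifier is followed by '(' before deciding how to treat it. The source string and the "call" versus "name" labels are purely illustrative, and scanner.Ident is assumed to be the token constant for identifiers, which this listing does not show.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "foo(bar)")

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok != scanner.Ident {
			continue
		}
		// Copy the text first, then look at the upcoming token.
		name := scanner.token_text(&s)
		// peek_token restores the scanner, so the loop still sees '(' next.
		if scanner.peek_token(&s) == '(' {
			fmt.println("call:", name)
		} else {
			fmt.println("name:", name)
		}
	}
}
```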

position

@(require_results)
position :: proc(s: ^Scanner) -> Position {…}

position returns the position of the character immediately after the character or token returned by the previous call to next or scan. Use the Scanner's pos field for the position of the most recently scanned token
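The difference between the pos field and position(s) can be seen in a short sketch. The source string and filename are made up, and no particular output format for position_to_string is assumed.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "alpha beta", "demo.txt")

	scanner.scan(&s) // scans the first token

	start := s.pos                 // where the scanned token begins
	after := scanner.position(&s)  // just past the scanned token

	fmt.println("valid:", scanner.position_is_valid(start))
	fmt.println("start:", scanner.position_to_string(start))
	fmt.println("after:", scanner.position_to_string(after))
}
```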

position_is_valid

@(require_results)
position_is_valid :: proc(pos: Position) -> bool {…}

position_is_valid reports whether the position is valid

position_to_string

@(require_results)
position_to_string :: proc(pos: Position, allocator := context.temp_allocator) -> string {…}

scan

scan :: proc(s: ^Scanner) -> (tok: rune) {…}

scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens for which the respective flag is set. It returns EOF at the end of the source. It reports Scanner errors by calling s.error if it is not nil; otherwise it prints the error message to os.stderr
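A sketch of routing scan errors through a custom handler. It assumes the error field has the signature proc(s: ^Scanner, msg: string), which this listing does not show; report is a hypothetical handler, and the unterminated string literal is only there to provoke an error.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// report is a hypothetical handler; the signature is assumed to match s.error.
report :: proc(s: ^scanner.Scanner, msg: string) {
	where_ := scanner.position_to_string(scanner.position(s))
	fmt.eprintf("%s: %s\n", where_, msg)
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "name = \"unterminated")
	s.error = report // set after init, alongside any other customization

	for scanner.scan(&s) != scanner.EOF {
		// Tokens are ignored here; only the error reporting is of interest.
	}
	fmt.println("errors:", s.error_count)
}
```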

token_string

@(require_results)
token_string :: proc(tok: rune, allocator: Allocator) -> string {…}

token_string returns a printable string for a token or Unicode character. By default, it uses the context.temp_allocator to produce the string

token_text

@(require_results)
token_text :: proc(s: ^Scanner) -> string {…}

token_text returns the string of the most recently scanned token
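token_text is typically paired with the value returned by scan to decide how to treat the text. The sketch below assumes the package's token constants Ident, Int, Float, String and Raw_String, which are not shown in this listing; describe is a hypothetical helper.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// describe is a hypothetical helper that labels the most recently scanned token.
describe :: proc(s: ^scanner.Scanner, tok: rune) {
	text := scanner.token_text(s)
	switch tok {
	case scanner.Ident:
		fmt.println("identifier:", text)
	case scanner.Int, scanner.Float:
		fmt.println("number:    ", text)
	case scanner.String, scanner.Raw_String:
		fmt.println("string:    ", text)
	case:
		// Anything else is returned by scan as the character itself.
		fmt.println("other:     ", scanner.token_string(tok, context.temp_allocator))
	}
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, `total = count * 2.5 + "units"`)

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		describe(&s, tok)
	}
}
```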