core/text/scanner package | Odin Programming Language

A scanner and tokenizer for UTF-8-encoded text. It takes a string providing the source, which can then be tokenized through repeated calls to the scan procedure. For compatibility with existing tooling and languages, the NUL character is not allowed. If a UTF-8-encoded byte order mark (BOM) is the first character in the source, it is discarded. By default, a Scanner skips white space and Odin comments and recognizes all literals defined by the Odin programming language specification. A Scanner may be customized to recognize only a subset of those literals and to recognize different identifiers and white space characters.
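The basic workflow described above can be sketched as follows. This is a minimal example rather than canonical usage: the source string and the `scanner` import alias are illustrative choices, and the allocator is passed to token_string explicitly so the call does not depend on a default argument.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "total := count + 42 // trailing comment")

	// scan returns EOF once the source is exhausted.
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		// token_string describes the token; token_text is the matched source text.
		kind := scanner.token_string(tok, context.temp_allocator)
		fmt.printf("%s %q\n", kind, scanner.token_text(&s))
	}
}
```

With the default flags, the trailing comment should be skipped rather than reported as a token.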

Collection Info

Collection: core
Path: text/scanner
Entries: 29

Constants 12

C_Like_Tokens C_Whitespace Char Comment EOF Float Ident Int Odin_Like_Tokens Odin_Whitespace Raw_String String

Types 5

Position Scan_Flag Scan_Flags Scanner Whitespace

Procedures 12

error errorf init next peek peek_token position position_is_valid position_to_string scan token_string token_text

Procedure Groups 0

Variables 0

Source Files

scanner.odin

Constants

12

C_Like_Tokens #

C_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_C_Int_Prefixes, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

C_Whitespace #

C_Whitespace :: Whitespace{'\t', '\n', '\r', '\v', '\f', ' '}

Char #

Char :: -5

Comment #

Comment :: -8

EOF #

EOF :: -1

Float #

Float :: -4

Ident #

Ident :: -2

Int #

Int :: -3

Odin_Like_Tokens #

Odin_Like_Tokens :: Scan_Flags{.Scan_Idents, .Scan_Ints, .Scan_Floats, .Scan_Chars, .Scan_Strings, .Scan_Raw_Strings, .Scan_Comments, .Skip_Comments}

Odin_Whitespace #

Odin_Whitespace :: Whitespace{'\t', '\n', '\r', ' '}

Odin_Whitespace is the default value for the Scanner's whitespace field.

Raw_String #

Raw_String :: -7

String #

String :: -6

Types

5

Position #

Position :: Position

Position represents a source position. A position is valid if line > 0.

Scan_Flag #

Scan_Flag :: Scan_Flag

Scan_Flags #

Scan_Flags :: distinct Scan_Flags

Scanner #

Scanner :: Scanner

Scanner allows for the reading of Unicode characters and tokens from a string.

Whitespace #

Whitespace :: distinct Whitespace

Whitespace only allows for ASCII whitespace characters.

Procedures

12

error #

error :: proc(s: ^Scanner, msg: string) {…}

errorf #

errorf :: proc(s: ^Scanner, format: string, args: ..any) {…}

init #

init :: proc(s: ^Scanner, src: string, filename: string = "") -> ^Scanner {…}

init initializes a scanner with a new source and returns the scanner itself. error_count is set to 0, flags is set to Odin_Like_Tokens, and whitespace is set to Odin_Whitespace.
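Because init resets flags and whitespace to the Odin defaults, any customization has to happen after the call. The sketch below assumes that a character left out of the whitespace set is returned by scan as an ordinary character token. The source string is illustrative.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "a = 1\nb = 2")

	// init has just set the Odin defaults, so override them afterwards.
	s.flags = scanner.C_Like_Tokens
	// Leave '\n' out of the set so newlines are no longer skipped
	// (assumption: they then show up as plain character tokens).
	s.whitespace = scanner.Whitespace{'\t', '\r', ' '}

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		fmt.println(scanner.token_string(tok, context.temp_allocator))
	}
}
```

Treating newlines as tokens this way is one option for line-oriented formats, where the end of a line is significant.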

next #

next :: proc(s: ^Scanner) -> rune {…}

next reads and returns the next Unicode character. It returns EOF at the end of the source. next does not update the Scanner's pos field; use position(s) to get the current position.

peek #

@(require_results)

peek :: proc(s: ^Scanner, n: int = 0) -> (ch: rune) {…}

peek returns the next Unicode character in the source without advancing the scanner. It returns EOF if the scanner has reached the end of the source. If n > 0, it calls next n times, returns the nth Unicode character, and then restores the Scanner's state.
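Character-level reading with next and peek might look like the sketch below. The input string is illustrative, and the example relies only on the behaviour described above: peek does not consume, and next returns EOF at the end of the source.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "héllo")

	// peek looks at the upcoming character without consuming it.
	upcoming := scanner.peek(&s)
	fmt.printf("peeked: %c\n", upcoming)

	// next consumes one Unicode character per call, so multi-byte
	// characters such as 'é' arrive as a single rune.
	for ch := scanner.next(&s); ch != scanner.EOF; ch = scanner.next(&s) {
		fmt.printf("%c ", ch)
	}
	fmt.println()
}
```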

peek_token #

@(require_results)

peek_token :: proc(s: ^Scanner, n: int = 0) -> (tok: rune) {…}

peek_token returns the next token in the source without advancing the scanner. It returns EOF if the scanner has reached the end of the source. If n > 0, it advances n times, returns the nth token, and then restores the Scanner's state.
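One-token lookahead is useful when the meaning of a token depends on what follows it. The sketch below pairs an identifier with a following integer. The `name value` input format is a made-up example, not something defined by the package.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "width 80 height 24")

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		if tok != scanner.Ident {
			continue
		}
		name := scanner.token_text(&s)

		// Look at the following token without consuming it.
		if scanner.peek_token(&s) == scanner.Int {
			scanner.scan(&s) // now actually consume the integer
			fmt.printf("%s = %s\n", name, scanner.token_text(&s))
		}
	}
}
```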

position #

@(require_results)

position :: proc(s: ^Scanner) -> Position {…}

position returns the position of the character immediately after the character or token returned by the previous call to next or scan. Use the Scanner's pos field for the position of the most recently scanned token.

position_is_valid #

@(require_results)

position_is_valid :: proc(pos: Position) -> bool {…}

position_is_valid reports whether the position is valid.

position_to_string #

@(require_results)

position_to_string :: proc(pos: Position, allocator := context.temp_allocator) -> string {…}
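The position procedures are typically combined when reporting where a token was found. The following sketch prints each token together with the position recorded in the Scanner's pos field. The file name `example.txt` and the input are placeholders.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, "alpha\n  beta", "example.txt")

	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		// s.pos holds the position of the token that was just scanned.
		if scanner.position_is_valid(s.pos) {
			// The returned string is allocated with context.temp_allocator by default.
			loc := scanner.position_to_string(s.pos)
			fmt.printf("%s: %s\n", loc, scanner.token_text(&s))
		}
	}
}
```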

scan #

scan :: proc(s: ^Scanner) -> (tok: rune) {…}

scan reads the next token or Unicode character from the source and returns it. It only recognizes tokens for which the respective flag is set. It returns EOF at the end of the source. It reports Scanner errors by calling s.error if it is not nil; otherwise it prints the error message to os.stderr.
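A consumer of scan usually switches on the returned token kind. The sketch below also installs a custom error handler. It assumes that the Scanner's error field has the same signature as the package-level error procedure, and that an unterminated string literal is reported through it. `report` is a hypothetical helper name.

```odin
package main

import "core:fmt"
import scanner "core:text/scanner"

// Hypothetical handler; the signature is assumed to match the s.error field.
report :: proc(s: ^scanner.Scanner, msg: string) {
	fmt.eprintf("scan error: %s\n", msg)
}

main :: proc() {
	s: scanner.Scanner
	scanner.init(&s, `name = "unterminated`)
	s.error = report

	idents, literals, others: int
	for tok := scanner.scan(&s); tok != scanner.EOF; tok = scanner.scan(&s) {
		switch tok {
		case scanner.Ident:
			idents += 1
		case scanner.Int, scanner.Float, scanner.Char, scanner.String, scanner.Raw_String:
			literals += 1
		case:
			// Anything else is returned as the Unicode character itself.
			others += 1
		}
	}
	fmt.printf("idents=%d literals=%d others=%d errors=%d\n",
		idents, literals, others, s.error_count)
}
```

Checking error_count after the loop is a simple way to decide whether the scanned input can be trusted.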

token_string #

@(require_results)

token_string :: proc(tok: rune, allocator: Allocator) -> string {…}

token_string returns a printable string for a token or Unicode character. By default, it uses context.temp_allocator to produce the string.

token_text #

@(require_results)

token_text :: proc(s: ^Scanner) -> string {…}

token_text returns the string of the most recently scanned token.