A complete suite for using Regular Expressions to match and capture text.

Regular expressions are used to describe how a piece of text can match to another, using a pattern language.

Odin's regex library implements the following features:

- Alternation: `apple|cherry`
- Classes: `[0-9_]`
- Classes, negated: `[^0-9_]`
- Shorthands: `\d\s\w`
- Shorthands, negated: `\D\S\W`
- Wildcards: `.`
- Repeat, optional: `a*`
- Repeat, at least once: `a+`
- Repetition: `a{1,2}`
- Optional: `a?`
- Group, capture: `([0-9])`
- Group, non-capture: `(?:[0-9])`
- Start & End Anchors: `^hello$`
- Word Boundaries: `\bhello\b`
- Non-Word Boundaries: `hello\B`

These specifiers can be composed together, such as an optional group: `(?:hello)?`

This package also supports the non-greedy variants of the repeating and optional specifiers by appending a `?` to them.

Of the shorthand classes that are supported, they are all ASCII-based, even when compiling in Unicode mode. This is for the sake of general performance and simplicity, as there are thousands of Unicode codepoints which would qualify as either a digit, space, or word character which could be irrelevant depending on what is being matched.

Here are the shorthand class equivalencies:

- `\d`: `[0-9]`
- `\s`: `[\t\n\f\r ]`
- `\w`: `[0-9A-Z_a-z]`

If you need your own shorthands, you can compose strings together like so:

	MY_HEX :: "[0-9A-Fa-f]"
	PATTERN :: MY_HEX + "-" + MY_HEX

The compiler will handle turning multiple identical classes into references to the same set of matching runes, so there's no penalty for doing it like this.

``Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.'' - Jamie Zawinski

Regular expressions have gathered a reputation over the decades for often being chosen as the wrong tool for the job. Here, we will clarify a few cases in which RegEx might be good or bad.
**When is it a good time to use RegEx?**

- You don't know at compile-time what patterns of text the program will need to match when it's running.
	- As an example, you are making a client which can be configured by the user to trigger on certain text patterns received from a server.
	- For another example, you need a way for users of a text editor to compose matching strings that are more intricate than a simple substring lookup.
- The text you're matching against is small (< 64 KiB) and your patterns aren't overly complicated with branches (alternations, repeats, and optionals).
- If none of the above general impressions apply but your project doesn't warrant long-term maintenance.

**When is it a bad time to use RegEx?**

- You know at compile-time the grammar you're parsing; a hand-made parser has the potential to be more maintainable and readable.
- The grammar you're parsing has certain validation steps that lend themselves to forming complicated expressions, such as e-mail addresses, URIs, dates, postal codes, credit cards, et cetera. Using RegEx to validate these structures is almost always a bad sign.
- The text you're matching against is big (> 1 MiB); you would be better served by first dividing the text into manageable chunks and using some heuristic to locate the most likely location of a match before applying RegEx against it.
- You value high performance and low memory usage; RegEx will always have a certain overhead which increases with the complexity of the pattern.

The implementation of this package has been optimized, but it will never be as thoroughly performant as a hand-made parser. In comparison, there are just too many intermediate steps, assumptions, and generalizations in what it takes to handle a regular expression.
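To make the string-composition idea from the overview concrete, here is a minimal sketch that compiles a composed pattern and runs a single match. The helper name `matches_hex_pair` is hypothetical; the package procedures it calls (`create`, `match_and_allocate_capture`, `destroy_regex`, `destroy_capture`) are the ones documented further down this page.

```odin
package main

import "core:fmt"
import "core:text/regex"

// User-defined shorthand, composed at compile time as in the overview.
MY_HEX  :: "[0-9A-Fa-f]"
PATTERN :: MY_HEX + "-" + MY_HEX

// matches_hex_pair is a hypothetical helper: it compiles PATTERN, runs one
// match against `str`, and reports whether the match succeeded.
matches_hex_pair :: proc(str: string) -> bool {
	rex, err := regex.create(PATTERN)
	if err != nil {
		return false
	}
	defer regex.destroy_regex(rex)

	capture, ok := regex.match_and_allocate_capture(rex, str)
	defer regex.destroy_capture(capture)
	return ok
}

main :: proc() {
	fmt.println(matches_hex_pair("a-F"))
	fmt.println(matches_hex_pair("x-y"))

	// Matching uses the temporary allocator for intermediate work.
	free_all(context.temp_allocator)
}
```

Because the pattern is built from constants, the concatenation happens at compile time and the regex compiler only ever sees one ordinary string.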

Collection Info

- Collection: core
- Path: text/regex
- Entries: 22
Types

9

Capture

Capture :: Capture

This struct corresponds to a set of string captures from a RegEx match. `pos` will contain the start and end positions for each string in `groups`, such that `str[pos[0][0]:pos[0][1]] == groups[0]`.

Match_Iterator

Match_Iterator :: Match_Iterator

An iterator to repeatedly match a pattern against a string, to be used with `*_iterator` procedures.

Procedures

11

create

@(require_results)
create :: proc(pattern: string, flags: bit_set[Flag] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}

Create a regular expression from a string pattern and a set of flags.

*Allocates Using Provided Allocators*

Inputs:
- pattern: The pattern to compile.
- flags: A `bit_set` of RegEx flags.
- permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
- temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)

Returns:
- result: The regular expression.
- err: An error, if one occurred.
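A minimal sketch of calling `create` with a flag and checking the returned error. The helper `try_compile` is hypothetical, and the assumption that an unbalanced group is rejected is mine, not something this page states.

```odin
package main

import "core:fmt"
import "core:text/regex"

// try_compile is a hypothetical helper: it compiles `pattern` with the
// Case_Insensitive flag and reports whether compilation succeeded.
try_compile :: proc(pattern: string) -> bool {
	rex, err := regex.create(pattern, {.Case_Insensitive})
	if err != nil {
		// Printing the error value is enough for a sketch.
		fmt.eprintln("could not compile pattern:", err)
		return false
	}
	// The compiled expression owns memory from the permanent allocator,
	// so it is released once it is no longer needed.
	defer regex.destroy_regex(rex)
	return true
}

main :: proc() {
	fmt.println(try_compile("hel+ope"))

	// Assumption: an unbalanced group is reported as an error.
	fmt.println(try_compile("(hellope"))

	free_all(context.temp_allocator)
}
```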

create_by_user

@(require_results)
create_by_user :: proc(pattern: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Regular_Expression, err: Error) {…}

Create a regular expression from a delimited string pattern, such as one provided by users of a program or those found in a configuration file.

They are in the form of:

	[DELIMITER] [regular expression] [DELIMITER] [flags]

For example, the following strings are valid:

	/hellope/i
	#hellope#i
	•hellope•i
	つhellopeつi

The delimiter is determined by the very first rune in the string. The only restriction is that the delimiter cannot be `\`, as that rune is used to escape the delimiter if found in the middle of the string.

All runes after the closing delimiter will be parsed as flags:

- 'm': Multiline
- 'i': Case_Insensitive
- 'x': Ignore_Whitespace
- 'u': Unicode
- 'n': No_Capture
- '-': No_Optimization

*Allocates Using Provided Allocators*

Inputs:
- pattern: The delimited pattern with optional flags to compile.
- permanent_allocator: The allocator to use for the final regular expression. (default: context.allocator)
- temporary_allocator: The allocator to use for the intermediate compilation stages. (default: context.temp_allocator)

Returns:
- result: The regular expression.
- err: An error, if one occurred.
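As a rough illustration of the delimited form, the sketch below compiles a user-supplied pattern string and tries it against some text. `match_user_pattern` is a hypothetical helper, and the sample inputs are made up for the example.

```odin
package main

import "core:fmt"
import "core:text/regex"

// match_user_pattern is a hypothetical helper: it compiles a delimited
// pattern such as "/hellope/i" and reports whether it matches `str`.
match_user_pattern :: proc(user_pattern, str: string) -> bool {
	rex, err := regex.create_by_user(user_pattern)
	if err != nil {
		fmt.eprintln("bad user pattern:", err)
		return false
	}
	defer regex.destroy_regex(rex)

	capture, ok := regex.match_and_allocate_capture(rex, str)
	defer regex.destroy_capture(capture)
	return ok
}

main :: proc() {
	// The trailing 'i' selects Case_Insensitive, per the flag list above.
	fmt.println(match_user_pattern("/hellope/i", "HELLOPE"))

	// Any other first rune can serve as the delimiter.
	fmt.println(match_user_pattern("#hellope#", "hellope"))

	free_all(context.temp_allocator)
}
```

This form suits configuration files, where the flags need to travel with the pattern in a single string.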

create_iterator

create_iterator :: proc(str: string, pattern: string, flags: bit_set[Flag] = {}, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (result: Match_Iterator, err: Error) {…}

Create a `Match_Iterator` using a string to search, a regular expression to match against it, and a set of flags.

*Allocates Using Provided Allocators*

Inputs:
- str: The string to iterate over.
- pattern: The pattern to match.
- flags: A `bit_set` of RegEx flags.
- permanent_allocator: The allocator to use for the compiled regular expression. (default: context.allocator)
- temporary_allocator: The allocator to use for the intermediate compilation and iteration stages. (default: context.temp_allocator)

Returns:
- result: The `Match_Iterator`.
- err: An error, if one occurred.

destroy_capture

destroy_capture :: proc(capture: Capture, allocator := context.allocator) {…}

Free all data allocated by the `match_and_allocate_capture` procedure.

*Frees Using Provided Allocator*

Inputs:
- capture: A `Capture`.
- allocator: (default: context.allocator)

destroy_iterator

destroy_iterator :: proc(it: Match_Iterator, allocator := context.allocator) {…}

Free all data allocated by the `create_iterator` procedure.

*Frees Using Provided Allocator*

Inputs:
- it: A `Match_Iterator`.
- allocator: (default: context.allocator)

destroy_regex

destroy_regex :: proc(regex: Regular_Expression, allocator := context.allocator) {…}

Free all data allocated by the `create*` procedures.

*Frees Using Provided Allocator*

Inputs:
- regex: A regular expression.
- allocator: (default: context.allocator)

match_and_allocate_capture

@(require_results)
match_and_allocate_capture :: proc(regex: Regular_Expression, str: string, permanent_allocator := context.allocator, temporary_allocator := context.temp_allocator) -> (capture: Capture, success: bool) {…}

Match a regular expression against a string and allocate the results into the returned `capture` structure.

The resulting capture strings will be slices to the string `str`, not wholly copied strings, so they won't need to be individually deleted.

*Allocates Using Provided Allocators*

Inputs:
- regex: The regular expression.
- str: The string to match against.
- permanent_allocator: The allocator to use for the capture results. (default: context.allocator)
- temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)

Returns:
- capture: The capture groups found in the string.
- success: True if the regex matched the string.
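The relationship between `groups` and `pos` can be checked directly. The sketch below matches a pattern with two capture groups and verifies that every group is the slice of `str` described by its position pair. `capture_is_consistent` is a hypothetical helper, and the pattern and input are made up for illustration.

```odin
package main

import "core:fmt"
import "core:text/regex"

// capture_is_consistent is a hypothetical helper: it runs one match and
// checks that str[pos[i][0]:pos[i][1]] == groups[i] for every group.
capture_is_consistent :: proc(pattern, str: string) -> bool {
	rex, err := regex.create(pattern)
	if err != nil {
		return false
	}
	defer regex.destroy_regex(rex)

	capture, ok := regex.match_and_allocate_capture(rex, str)
	// Only the `groups` and `pos` slices are freed here; the strings
	// themselves are slices into `str`.
	defer regex.destroy_capture(capture)
	if !ok {
		return false
	}

	for group, i in capture.groups {
		span := capture.pos[i]
		fmt.printf("group %d: %q at [%d:%d]\n", i, group, span[0], span[1])
		if str[span[0]:span[1]] != group {
			return false
		}
	}
	return true
}

main :: proc() {
	fmt.println(capture_is_consistent("([0-9]+)-([0-9]+)", "10-20"))
	free_all(context.temp_allocator)
}
```

Since the capture strings alias `str`, they are only valid for as long as `str` itself is.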

match_iterator

match_iterator :: proc(it: ^Match_Iterator) -> (result: Capture, index: int, ok: bool) {…}

Iterate over a `Match_Iterator` and return successive captures.

Inputs:
- it: Pointer to the `Match_Iterator` to iterate over.

Returns:
- result: `Capture` for this iteration.
- index: The index returned alongside the capture for this iteration.
- ok: A bool indicating if there was a match, stopping the iteration on `false`.
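Because `match_iterator` returns a value, an index, and a bool, it can drive an Odin `for ... in` loop. The following is a sketch only: `count_matches` is a hypothetical helper, and the input string is made up.

```odin
package main

import "core:fmt"
import "core:text/regex"

// count_matches is a hypothetical helper: it walks every match of
// `pattern` in `str` and returns how many were found.
count_matches :: proc(str, pattern: string) -> (count: int, ok: bool) {
	it, err := regex.create_iterator(str, pattern)
	if err != nil {
		return 0, false
	}
	defer regex.destroy_iterator(it)

	// The loop ends when match_iterator returns false as its last value.
	for capture, idx in regex.match_iterator(&it) {
		if len(capture.groups) > 0 {
			fmt.println(idx, capture.groups[0])
		}
		count += 1
	}
	return count, true
}

main :: proc() {
	n, ok := count_matches("a1 b22 c333", "[0-9]+")
	fmt.println(n, ok)
	free_all(context.temp_allocator)
}
```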

match_with_preallocated_capture

@(require_results)
match_with_preallocated_capture :: proc(regex: Regular_Expression, str: string, capture: ^Capture, temporary_allocator := context.temp_allocator) -> (num_groups: int, success: bool) {…}

Match a regular expression against a string and save the capture results into the provided `capture` structure.

The resulting capture strings will be slices to the string `str`, not wholly copied strings, so they won't need to be individually deleted.

*Allocates Using Provided Allocator*

Inputs:
- regex: The regular expression.
- str: The string to match against.
- capture: A pointer to a Capture structure with `groups` and `pos` already allocated.
- temporary_allocator: The allocator to use for the virtual machine. (default: context.temp_allocator)

Returns:
- num_groups: The number of capture groups set into `capture`.
- success: True if the regex matched the string.

preallocate_capture

@(require_results)
preallocate_capture :: proc(allocator := context.allocator) -> (result: Capture) {…}

Allocate a `Capture` in advance for use with `match`. This can save some time if you plan on performing several matches at once and only need the results between matches.

Inputs:
- allocator: (default: context.allocator)

Returns:
- result: The `Capture` with the maximum number of groups allocated.
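One way to use a preallocated `Capture` is to reuse it across many inputs, as sketched below. `count_matching_lines` is a hypothetical helper and the sample lines are made up; the sketch calls `match_with_preallocated_capture` by its full name rather than through a procedure group.

```odin
package main

import "core:fmt"
import "core:text/regex"

// count_matching_lines is a hypothetical helper: it matches `pattern`
// against each line, reusing a single preallocated Capture throughout.
count_matching_lines :: proc(pattern: string, lines: []string) -> (count: int) {
	rex, err := regex.create(pattern)
	if err != nil {
		return 0
	}
	defer regex.destroy_regex(rex)

	// Allocated once, then overwritten by every match below.
	capture := regex.preallocate_capture()
	defer regex.destroy_capture(capture)

	for line in lines {
		num_groups, ok := regex.match_with_preallocated_capture(rex, line, &capture)
		if ok {
			count += 1
			// Only the first num_groups entries belong to this match.
			fmt.println(capture.groups[:num_groups])
		}
	}
	return
}

main :: proc() {
	lines := []string{"id=1", "name", "id=22"}
	fmt.println(count_matching_lines("id=([0-9]+)", lines))
	free_all(context.temp_allocator)
}
```

Each match overwrites the previous results, so anything that must outlive the next match has to be copied out first.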

reset

reset :: proc(it: ^Match_Iterator) {…}

Reset an iterator, allowing it to be run again as if new.

Inputs:
- it: The iterator to reset.
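A short sketch of reusing one iterator for two passes over the same string. `count_twice` is a hypothetical helper; the assumption is that a reset iterator yields the same matches as a freshly created one, which is how the description above reads.

```odin
package main

import "core:fmt"
import "core:text/regex"

// count_twice is a hypothetical helper: it counts the matches, resets the
// iterator, and counts them again with the same iterator.
count_twice :: proc(str, pattern: string) -> (first, second: int) {
	it, err := regex.create_iterator(str, pattern)
	if err != nil {
		return
	}
	defer regex.destroy_iterator(it)

	for _ in regex.match_iterator(&it) {
		first += 1
	}

	// Rewind so the next loop starts from the beginning of `str`.
	regex.reset(&it)

	for _ in regex.match_iterator(&it) {
		second += 1
	}
	return
}

main :: proc() {
	first, second := count_twice("a1 b22 c333", "[0-9]+")
	fmt.println(first, second)
	free_all(context.temp_allocator)
}
```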

Procedure Groups

2