Struct regex_automata::Regex

source ·
pub struct Regex<D: DFA = DenseDFA<Vec<usize>, usize>> { /* private fields */ }
Expand description

A regular expression that uses deterministic finite automata for fast searching.

A regular expression is comprised of two DFAs, a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must first be run followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.

The type of the DFA used by a Regex corresponds to the D type parameter, which must satisfy the DFA trait. Typically, D is either a DenseDFA or a SparseDFA, where dense DFAs use more memory but search faster, while sparse DFAs use less memory but search more slowly.

By default, a regex’s DFA type parameter is set to DenseDFA<Vec<usize>, usize>. For most in-memory work loads, this is the most convenient type that gives the best search performance.

Sparse DFAs

Since a Regex is generic over the DFA trait, it can be used with any kind of DFA. While this crate constructs dense DFAs by default, it is easy enough to build corresponding sparse DFAs, and then build a regex from them:

use regex_automata::Regex;

// First, build a regex that uses dense DFAs.
let dense_re = Regex::new("foo[0-9]+")?;

// Second, build sparse DFAs from the forward and reverse dense DFAs.
let fwd = dense_re.forward().to_sparse()?;
let rev = dense_re.reverse().to_sparse()?;

// Third, build a new regex from the constituent sparse DFAs.
let sparse_re = Regex::from_dfas(fwd, rev);

// A regex that uses sparse DFAs can be used just like with dense DFAs.
assert_eq!(true, sparse_re.is_match(b"foo123"));

Implementations§

Parse the given regular expression using a default configuration and return the corresponding regex.

The default configuration uses usize for state IDs, premultiplies them and reduces the alphabet size by splitting bytes into equivalence classes. The underlying DFAs are not minimized.

If you want a non-default configuration, then use the RegexBuilder to set your own configuration.

Example
use regex_automata::Regex;

let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(Some((3, 14)), re.find(b"zzzfoo12345barzzz"));

Parse the given regular expression using a default configuration and return the corresponding regex using sparse DFAs.

The default configuration uses usize for state IDs, reduces the alphabet size by splitting bytes into equivalence classes. The underlying DFAs are not minimized.

If you want a non-default configuration, then use the RegexBuilder to set your own configuration.

Example
use regex_automata::Regex;

let re = Regex::new_sparse("foo[0-9]+bar")?;
assert_eq!(Some((3, 14)), re.find(b"zzzfoo12345barzzz"));

Returns true if and only if the given bytes match.

This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.

Example
use regex_automata::Regex;

let re = Regex::new("foo[0-9]+bar")?;
assert_eq!(true, re.is_match(b"foo12345bar"));
assert_eq!(false, re.is_match(b"foobar"));

Returns the first position at which a match is found.

This routine stops scanning input in precisely the same circumstances as is_match. The key difference is that this routine returns the position at which it stopped scanning input if and only if a match was found. If no match is found, then None is returned.

Example
use regex_automata::Regex;

let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some(4), re.shortest_match(b"foo12345"));

// Normally, the end of the leftmost first match here would be 3,
// but the shortest match semantics detect a match earlier.
let re = Regex::new("abc|a")?;
assert_eq!(Some(1), re.shortest_match(b"abc"));

Returns the start and end offset of the leftmost first match. If no match exists, then None is returned.

The “leftmost first” match corresponds to the match with the smallest starting offset, but where the end offset is determined by preferring earlier branches in the original regular expression. For example, Sam|Samwise will match Sam in Samwise, but Samwise|Sam will match Samwise in Samwise.

Generally speaking, the “leftmost first” match is how most backtracking regular expressions tend to work. This is in contrast to POSIX-style regular expressions that yield “leftmost longest” matches. Namely, both Sam|Samwise and Samwise|Sam match Samwise when using leftmost longest semantics.

Example
use regex_automata::Regex;

let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some((3, 11)), re.find(b"zzzfoo12345zzz"));

// Even though a match is found after reading the first byte (`a`),
// the leftmost first match semantics demand that we find the earliest
// match that prefers earlier parts of the pattern over latter parts.
let re = Regex::new("abc|a")?;
assert_eq!(Some((0, 3)), re.find(b"abc"));

Returns the same as is_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.

Returns the same as shortest_match, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.

Returns the same as find, but starts the search at the given offset.

The significance of the starting point is that it takes the surrounding context into consideration. For example, if the DFA is anchored, then a match can only occur when start == 0.

Returns an iterator over all non-overlapping leftmost first matches in the given bytes. If no match exists, then the iterator yields no elements.

Note that if the regex can match the empty string, then it is possible for the iterator to yield a zero-width match at a location that is not a valid UTF-8 boundary (for example, between the code units of a UTF-8 encoded codepoint). This can happen regardless of whether allow_invalid_utf8 was enabled or not.

Example
use regex_automata::Regex;

let re = Regex::new("foo[0-9]+")?;
let text = b"foo1 foo12 foo123";
let matches: Vec<(usize, usize)> = re.find_iter(text).collect();
assert_eq!(matches, vec![(0, 4), (5, 10), (11, 17)]);

Build a new regex from its constituent forward and reverse DFAs.

This is useful when deserializing a regex from some arbitrary memory region. This is also useful for building regexes from other types of DFAs.

Example

This example is a bit a contrived. The usual use of these methods would involve serializing initial_re somewhere and then deserializing it later to build a regex.

use regex_automata::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let (fwd, rev) = (initial_re.forward(), initial_re.reverse());
let re = Regex::from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

This example shows how you might build smaller DFAs, and then use those smaller DFAs to build a new regex.

use regex_automata::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let fwd = initial_re.forward().to_u16()?;
let rev = initial_re.reverse().to_u16()?;
let re = Regex::from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

This example shows how to build a Regex that uses sparse DFAs instead of dense DFAs:

use regex_automata::Regex;

let initial_re = Regex::new("foo[0-9]+")?;
assert_eq!(true, initial_re.is_match(b"foo123"));

let fwd = initial_re.forward().to_sparse()?;
let rev = initial_re.reverse().to_sparse()?;
let re = Regex::from_dfas(fwd, rev);
assert_eq!(true, re.is_match(b"foo123"));

Return the underlying DFA responsible for forward matching.

Return the underlying DFA responsible for reverse matching.

Trait Implementations§

Returns a copy of the value. Read more
Performs copy-assignment from source. Read more
Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

Gets the TypeId of self. Read more
Immutably borrows from an owned value. Read more
Mutably borrows from an owned value. Read more

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning. Read more
Uses borrowed data to replace owned data, usually by cloning. Read more
The type returned in the event of a conversion error.
Performs the conversion.
The type returned in the event of a conversion error.
Performs the conversion.