Struct regex_syntax::hir::Hir
pub struct Hir { /* private fields */ }
A high-level intermediate representation (HIR) for a regular expression.
The HIR of a regular expression represents an intermediate step between its
abstract syntax (a structured description of the concrete syntax) and
compiled byte codes. The purpose of HIR is to make regular expressions
easier to analyze. In particular, the AST is much more complex than the
HIR. For example, while an AST supports arbitrarily nested character
classes, the HIR will flatten all nested classes into a single set. The HIR
will also “compile away” every flag present in the concrete syntax. For
example, users of HIR expressions never need to worry about case folding;
it is handled automatically by the translator (e.g., by translating (?i)A
to [aA]).
If the HIR was produced by a translator that disallows invalid UTF-8, then the HIR is guaranteed to match UTF-8 exclusively.
This type defines its own destructor that uses constant stack space and heap space proportional to the size of the HIR.
The specific type of an HIR expression can be accessed via its kind or into_kind methods. This extra level of indirection exists for two reasons:
- Construction of an HIR expression must use the constructor methods on this Hir type instead of building the HirKind values directly. This permits construction to enforce invariants like “concatenations always consist of two or more sub-expressions.”
- Every HIR expression contains attributes that are defined inductively, and can be computed cheaply during the construction process. For example, one such attribute is whether the expression must match at the beginning of the text.
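The two reasons above can be illustrated with a toy model. The sketch below is not the crate's implementation: MiniHir, MiniKind, and every method on them are hypothetical names invented for illustration. It keeps the kind private so that values can only be built through constructors, lets concat and alternation uphold the "two or more sub-expressions" invariant, and computes an anchored-at-start attribute inductively as each node is built. The rule used for concatenations (look only at the first element) is a simplifying assumption.

```rust
/// Hypothetical, simplified stand-in for `HirKind`.
#[derive(Debug, Clone, PartialEq)]
pub enum MiniKind {
    Empty,
    Literal(char),
    /// Models `^` with multi-line mode disabled.
    StartText,
    Concat(Vec<MiniHir>),
    Alternation(Vec<MiniHir>),
}

/// Hypothetical, simplified stand-in for `Hir`. The fields are private,
/// so callers outside this module must go through the constructors.
#[derive(Debug, Clone, PartialEq)]
pub struct MiniHir {
    kind: MiniKind,
    anchored_start: bool,
}

impl MiniHir {
    pub fn empty() -> MiniHir {
        MiniHir { kind: MiniKind::Empty, anchored_start: false }
    }

    pub fn literal(c: char) -> MiniHir {
        MiniHir { kind: MiniKind::Literal(c), anchored_start: false }
    }

    pub fn start_text() -> MiniHir {
        MiniHir { kind: MiniKind::StartText, anchored_start: true }
    }

    /// Upholds "a concatenation has two or more sub-expressions" by
    /// collapsing the zero- and one-element cases.
    pub fn concat(mut exprs: Vec<MiniHir>) -> MiniHir {
        match exprs.len() {
            0 => MiniHir::empty(),
            1 => exprs.pop().unwrap(),
            _ => {
                // Assumption of this sketch: only the first element matters.
                let anchored_start = exprs[0].anchored_start;
                MiniHir { kind: MiniKind::Concat(exprs), anchored_start }
            }
        }
    }

    /// An alternation is anchored only if every branch is anchored.
    pub fn alternation(mut exprs: Vec<MiniHir>) -> MiniHir {
        match exprs.len() {
            0 => MiniHir::empty(),
            1 => exprs.pop().unwrap(),
            _ => {
                let anchored_start = exprs.iter().all(|e| e.anchored_start);
                MiniHir { kind: MiniKind::Alternation(exprs), anchored_start }
            }
        }
    }

    pub fn kind(&self) -> &MiniKind {
        &self.kind
    }

    pub fn into_kind(self) -> MiniKind {
        self.kind
    }

    /// Constant-time lookup: the value was computed at construction.
    pub fn is_anchored_start(&self) -> bool {
        self.anchored_start
    }
}
```

In this model, an alternation built from two anchored branches reports is_anchored_start() as true, while mixing an anchored and an unanchored branch reports false, and MiniHir::concat(vec![]) collapses to the empty expression.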
Also, an Hir’s fmt::Display implementation prints an HIR as a regular
expression pattern string, and uses constant stack space and heap space
proportional to the size of the Hir.
Implementations
impl Hir
pub fn into_kind(self) -> HirKind
Consumes ownership of this HIR expression and returns its underlying
HirKind.
pub fn empty() -> Hir
Returns an empty HIR expression.
An empty HIR expression always matches, including the empty string.
pub fn literal(lit: Literal) -> Hir
Creates a literal HIR expression.
If the given literal has a Byte variant with an ASCII byte, then this
method panics. This enforces the invariant that Byte variants are
only used to express matching of invalid UTF-8.
pub fn word_boundary(word_boundary: WordBoundary) -> Hir
Creates a word boundary assertion HIR expression.
pub fn repetition(rep: Repetition) -> Hir
Creates a repetition HIR expression.
pub fn concat(exprs: Vec<Hir>) -> Hir
Returns the concatenation of the given expressions.
This flattens the concatenation as appropriate.
pub fn alternation(exprs: Vec<Hir>) -> Hir
Returns the alternation of the given expressions.
This flattens the alternation as appropriate.
pub fn dot(bytes: bool) -> Hir
Build an HIR expression for the pattern . (dot).
A . expression matches any character except for \n. To build an
expression that matches any character, including \n, use the any method.
If bytes is true, then this assumes characters are limited to a
single byte.
pub fn any(bytes: bool) -> Hir
Build an HIR expression for the pattern (?s). (dot with the s flag enabled).
A (?s). expression matches any character, including \n. To build an
expression that matches any character except for \n, use the dot method.
If bytes is true, then this assumes characters are limited to a
single byte.
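To make the difference between dot and any concrete, the sketch below models each as a list of inclusive ranges of character values. The names dot_ranges, any_ranges, and contains, and the range representation itself, are hypothetical. Treating bytes = true as "the largest character value is 0xFF" is this sketch's reading of "characters are limited to a single byte"; the real methods return Hir values, not ranges.

```rust
/// Largest value treated as a "character" in this sketch: 0xFF when
/// `bytes` is true, otherwise the largest Unicode code point.
/// Surrogate code points are ignored for simplicity.
fn max_char(bytes: bool) -> u32 {
    if bytes { 0xFF } else { 0x10FFFF }
}

/// Hypothetical model of `.`: everything except `\n`.
pub fn dot_ranges(bytes: bool) -> Vec<(u32, u32)> {
    let newline = '\n' as u32; // 0x0A
    vec![(0, newline - 1), (newline + 1, max_char(bytes))]
}

/// Hypothetical model of `(?s).`: everything, including `\n`.
pub fn any_ranges(bytes: bool) -> Vec<(u32, u32)> {
    vec![(0, max_char(bytes))]
}

/// Returns true if `c` falls inside one of the inclusive ranges.
pub fn contains(ranges: &[(u32, u32)], c: u32) -> bool {
    ranges.iter().any(|&(lo, hi)| lo <= c && c <= hi)
}
```

Under this model, contains(&dot_ranges(false), '\n' as u32) is false while the same check against any_ranges(false) is true, and a character such as 'λ' falls outside both byte-oriented sets.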
pub fn is_always_utf8(&self) -> bool
Return true if and only if this HIR will always match valid UTF-8.
When this returns false, then it is possible for this HIR expression to match invalid UTF-8.
pub fn is_all_assertions(&self) -> bool
Returns true if and only if this entire HIR expression is made up of zero-width assertions.
This includes expressions like ^$\b\A\z and even ((\b)+())*^, but not ^a.
pub fn is_anchored_start(&self) -> bool
Return true if and only if this HIR is required to match from the
beginning of text. This includes expressions like ^foo, ^(foo|bar),
^foo|^bar but not ^foo|bar.
pub fn is_anchored_end(&self) -> bool
Return true if and only if this HIR is required to match at the end
of text. This includes expressions like foo$, (foo|bar)$,
foo$|bar$ but not foo$|bar.
pub fn is_line_anchored_start(&self) -> bool
Return true if and only if this HIR is required to match from the
beginning of text or the beginning of a line. This includes expressions
like ^foo, (?m)^foo, ^(foo|bar), (?m)^(foo|bar), (?m)^foo|^bar but not
^foo|bar or (?m)^foo|bar.
Note that if is_anchored_start is true, then is_line_anchored_start
will also be true. The reverse implication is not true. For example,
(?m)^foo is line anchored, but is_anchored_start returns false for it.
pub fn is_line_anchored_end(&self) -> bool
Return true if and only if this HIR is required to match at the
end of text or the end of a line. This includes expressions like
foo$, (?m)foo$, (foo|bar)$, (?m)(foo|bar)$, foo$|bar$,
(?m)foo$|bar$, but not foo$|bar or (?m)foo$|bar.
Note that if is_anchored_end is true, then is_line_anchored_end
will also be true. The reverse implication is not true. For example,
(?m)foo$ is line anchored, but is_anchored_end returns false for it.
pub fn is_any_anchored_start(&self) -> bool
Return true if and only if this HIR contains any sub-expression that
is required to match at the beginning of text. Specifically, this
returns true if the ^ symbol (when multiline mode is disabled) or the
\A escape appears anywhere in the regex.
pub fn is_any_anchored_end(&self) -> bool
Return true if and only if this HIR contains any sub-expression that is
required to match at the end of text. Specifically, this returns true
if the $ symbol (when multiline mode is disabled) or the \z escape
appears anywhere in the regex.
pub fn is_match_empty(&self) -> bool
Return true if and only if the empty string is part of the language matched by this regular expression.
This includes a*, a?b*, a{0}, (), ()+, ^$, a|b?, \b and \B, but not a or a+.
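The examples above follow from a simple inductive rule, which the sketch below spells out on a toy pattern type. Pattern, matches_empty, lit, and repeat are hypothetical names, and the sketch uses plain recursion for brevity, whereas the documentation above says such attributes are computed during construction.

```rust
/// Hypothetical, simplified pattern type (not the crate's `HirKind`).
pub enum Pattern {
    /// The empty expression, as in `()`.
    Empty,
    Literal(char),
    /// Any zero-width assertion: `^`, `$`, `\b`, `\B`.
    Assertion,
    /// `sub` repeated at least `min` times.
    Repeat { min: u32, sub: Box<Pattern> },
    Concat(Vec<Pattern>),
    Alternation(Vec<Pattern>),
}

/// Returns true if the empty string is in the language of `p`.
pub fn matches_empty(p: &Pattern) -> bool {
    match p {
        Pattern::Empty | Pattern::Assertion => true,
        Pattern::Literal(_) => false,
        // Zero repetitions consume nothing; otherwise defer to the operand.
        Pattern::Repeat { min, sub } => *min == 0 || matches_empty(sub),
        // Every piece of a concatenation must be able to match empty.
        Pattern::Concat(parts) => parts.iter().all(matches_empty),
        // One empty-matching branch is enough for an alternation.
        Pattern::Alternation(branches) => branches.iter().any(matches_empty),
    }
}

/// Convenience constructor for a literal.
pub fn lit(c: char) -> Pattern {
    Pattern::Literal(c)
}

/// Convenience constructor for a repetition with a minimum count.
pub fn repeat(min: u32, sub: Pattern) -> Pattern {
    Pattern::Repeat { min, sub: Box::new(sub) }
}
```

For instance, a+ is modeled as repeat(1, lit('a')) and does not match empty, while ()+ is modeled as repeat(1, Pattern::Empty) and does.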
pub fn is_literal(&self) -> bool
Return true if and only if this HIR is a simple literal. This is only
true when this HIR expression is either itself a Literal or a
concatenation of only Literals.
For example, f and foo are literals, but f+, (foo), foo(), and the empty
pattern are not (even though some of them contain sub-expressions that
are literals).
pub fn is_alternation_literal(&self) -> bool
Return true if and only if this HIR is either a simple literal or an
alternation of simple literals. This is only true when this HIR
expression is either itself a Literal, a concatenation of only Literals,
or an alternation of only Literals.
For example, f, foo, a|b|c, and foo|bar|baz are alternation literals,
but f+, (foo), foo(), and the empty pattern are not (even though some of
them contain sub-expressions that are literals).
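The relationship between the two predicates can be sketched on a toy expression type. Expr, word, is_literal, and is_alternation_literal below are hypothetical free functions written for illustration, not the crate's methods. Reading "an alternation of only Literals" as "every branch is a simple literal" is an assumption, chosen so that foo|bar|baz qualifies as in the examples above.

```rust
/// Hypothetical, simplified expression type (not the crate's `HirKind`).
pub enum Expr {
    Empty,
    Literal(char),
    Group(Box<Expr>),
    /// One or more repetitions, as in `f+`.
    Repeat(Box<Expr>),
    Concat(Vec<Expr>),
    Alternation(Vec<Expr>),
}

/// A literal, or a non-empty concatenation made up only of literals.
pub fn is_literal(e: &Expr) -> bool {
    match e {
        Expr::Literal(_) => true,
        Expr::Concat(parts) => {
            !parts.is_empty() && parts.iter().all(|p| matches!(p, Expr::Literal(_)))
        }
        _ => false,
    }
}

/// A simple literal, or an alternation whose branches are all simple literals.
pub fn is_alternation_literal(e: &Expr) -> bool {
    match e {
        Expr::Alternation(branches) => {
            !branches.is_empty() && branches.iter().all(is_literal)
        }
        _ => is_literal(e),
    }
}

/// Builds a literal (one char) or a concatenation of literals from `s`.
pub fn word(s: &str) -> Expr {
    let mut chars: Vec<Expr> = s.chars().map(Expr::Literal).collect();
    if chars.len() == 1 {
        chars.pop().unwrap()
    } else {
        Expr::Concat(chars)
    }
}
```

Here word("foo") satisfies both predicates, an alternation of word("foo"), word("bar"), and word("baz") satisfies only is_alternation_literal, and wrapping word("foo") in Expr::Group satisfies neither.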
Trait Implementations
impl Display for Hir
Print a display representation of this Hir.
The result of this is a valid regular expression pattern string.
This implementation uses constant stack space and heap space proportional
to the size of the Hir.
impl Drop for Hir
A custom Drop impl is used for HirKind such that it uses constant stack
space but heap space proportional to the depth of the total Hir.
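A constant-stack destructor is commonly written by replacing recursive destruction with an explicit worklist on the heap. The sketch below shows that general technique on a toy tree. Node, nested_groups, and group_depth are hypothetical names, and this is an illustration of the idea rather than a copy of the crate's Drop impl.

```rust
use std::mem;

/// Hypothetical tree type standing in for a nested expression.
pub enum Node {
    Leaf,
    Group(Box<Node>),
    Concat(Vec<Node>),
}

impl Drop for Node {
    fn drop(&mut self) {
        // Fast path: nodes without nested children need no worklist. This
        // also stops the hollowed-out nodes below from re-entering the loop.
        match *self {
            Node::Leaf => return,
            Node::Group(ref child) if matches!(**child, Node::Leaf) => return,
            Node::Concat(ref children) if children.is_empty() => return,
            _ => {}
        }
        // Move the tree onto a heap-allocated stack and take it apart
        // one node at a time instead of recursing.
        let mut stack = vec![mem::replace(self, Node::Leaf)];
        while let Some(mut node) = stack.pop() {
            match node {
                Node::Leaf => {}
                Node::Group(ref mut child) => {
                    stack.push(mem::replace(&mut **child, Node::Leaf));
                }
                Node::Concat(ref mut children) => {
                    stack.extend(children.drain(..));
                }
            }
            // `node` is dropped here; its children have been detached, so
            // the implicit drop hits the fast path and does not recurse.
        }
    }
}

/// Builds `depth` nested groups around a leaf.
pub fn nested_groups(depth: usize) -> Node {
    let mut node = Node::Leaf;
    for _ in 0..depth {
        node = Node::Group(Box::new(node));
    }
    node
}

/// Counts how many `Group` layers wrap the innermost node, iteratively.
pub fn group_depth(mut node: &Node) -> usize {
    let mut depth = 0;
    while let Node::Group(child) = node {
        depth += 1;
        node = &**child;
    }
    depth
}
```

With this impl, dropping the result of nested_groups(200_000) walks the chain in a loop, so the call stack stays shallow no matter how deeply the groups are nested.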