rex - regular expressions (Lua rexlib)
Lrexlib is a regular expression library for Lua 5.1. It provides bindings of the two principal regular expression library interfaces, POSIX (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) and PCRE (http://www.pcre.org/pcre.txt), to Lua (http://www.lua.org) 5.1.
The makefiles provided build it into shared libraries called, by default, rex_posix.so and rex_pcre.so, which can be used with require or loadlib.
Any optional argument can be supplied as nil (or omitted if it is a trailing one); the library will then use the default value for that argument.
This document uses the following syntax for optional arguments: they are bracketed separately, and commas are left outside brackets, e.g.:
MyFunc (arg1, arg2, [arg3], [arg4])
Throughout this document, the identifier rex is used in place of either rex_posix or rex_pcre, which are the default namespaces for the corresponding libraries. All functions receiving a regular expression pattern as an argument will generate an error if that pattern is found invalid by the underlying POSIX (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) / PCRE (http://www.pcre.org/pcre.txt) library.
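As a rough illustration of the conventions above, the sketch below calls rex.match (described in the next section) with trailing optional arguments omitted and with explicit nil placeholders, and shows an invalid pattern raising an error. It assumes the PCRE build is installed and loadable as rex_pcre; the subject strings are made up for the example.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build; rex_posix should behave the same here

-- Trailing optional arguments may simply be omitted...
local a = rex.match("abc 123", "[0-9]+")

-- ...or passed as nil, in which case the defaults are used.
local b = rex.match("abc 123", "[0-9]+", nil, nil, nil)

-- An invalid pattern (unbalanced parenthesis) raises a Lua error,
-- which pcall reports as a false status.
local ok = pcall(rex.match, "abc", "(")

-- a == "123", b == "123", ok == false
```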
rex.match
rex.match (subj, patt, [init], [cf], [ef], [lo])
The function searches for the first match of the regexp patt in the string subj, starting from offset init, subject to flags cf and ef.
PCRE: A locale lo may be specified; its default value is nil.
Returns on success: 1. All substring matches ("captures"), in the order they appear in the pattern. false is returned for sub-patterns that did not participate in the match. If the pattern specified no captures then the whole matched substring is returned.
Returns on failure: 1. nil. 2. The return value of the underlying pcre_exec/regexec call (a number).
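The return rules for rex.match can be sketched as follows. This is only an illustration: it assumes the PCRE build (rex_pcre) and PCRE pattern syntax, and the subject strings are invented.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

-- Two captures: both are returned, in the order they appear in the pattern.
local key, value = rex.match("name = Lua", "(\\w+)\\s*=\\s*(\\w+)")
-- key == "name", value == "Lua"

-- No captures in the pattern: the whole matched substring is returned.
local whole = rex.match("abc 123 def", "[0-9]+")
-- whole == "123"

-- A sub-pattern that did not participate in the match yields false.
local first, second = rex.match("b", "(a)|(b)")
-- first == false, second == "b"

-- Starting the search from offset 8 skips the first number.
local later = rex.match("abc 123 def 456", "[0-9]+", 8)
-- later == "456"

-- No match at all: the first return value is nil.
local none = rex.match("abc", "[0-9]+")
```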
rex.find
rex.find (subj, patt, [init], [cf], [ef], [lo])
The function searches for the first match of the regexp patt in the string subj, starting from offset init, subject to flags cf and ef.
PCRE: A locale lo may be specified; its default value is nil.
Returns on success: 1. The start point of the match (a number). 2. The end point of the match (a number). 3. All substring matches ("captures"), in the order they appear in the pattern. false is returned for sub-patterns that did not participate in the match.
Returns on failure: 1. nil. 2. The return value of the underlying pcre_exec/regexec call (a number).
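A minimal sketch of rex.find, assuming the PCRE build (rex_pcre) and that start and end points are 1-based, inclusive positions as in Lua's own string.find:

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

-- Start point, end point, then the captures.
local s, e, word = rex.find("hello world", "(w\\w+)")
-- s == 7, e == 11, word == "world"

-- The init argument moves the starting offset of the search.
local s2, e2 = rex.find("hello world", "o", 6)
-- s2 == 8, e2 == 8

-- On failure the first return value is nil.
local miss = rex.find("hello world", "z")
```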
rex.gmatch
rex.gmatch (subj, patt, [cf], [ef], [lo])
The function returns an iterator for repeated matching of the pattern patt in the string subj, subject to flags cf and ef.
PCRE: A locale lo may be specified; its default value is nil.
The iterator function is called by Lua. On every iteration (that is, on every match), it returns all captures in the order they appear in the pattern (or the entire match if the pattern specified no captures). The iteration will continue till the subject fails to match.
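The iteration described above might be used like this. The sketch assumes the PCRE build (rex_pcre); the subject strings and the table names are illustrative only.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

-- With captures, each iteration yields the captures in pattern order.
local found = {}
for k, v in rex.gmatch("a=1, b=2, c=3", "(\\w+)=(\\d+)") do
  found[#found + 1] = k .. ":" .. v
end
local joined = table.concat(found, " ")
-- joined == "a:1 b:2 c:3"

-- Without captures, each iteration yields the entire match.
local words = {}
for w in rex.gmatch("one two  three", "\\w+") do
  words[#words + 1] = w
end
-- #words == 3, words[3] == "three"
```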
rex.gsub
rex.gsub (subj, patt, repl, [n], [cf], [ef], [lo])
The function searches for all matches of the pattern patt in the string subj and substitutes the found matches according to the parameter repl (see details below).
PCRE: A locale lo may be specified.
The optional arguments n and lo default to nil.
Returns: 1. The subject string with the substitutions made. 2. Number of matches found.
The parameter repl can be either a string, a function or a table. The function behaves differently depending on the repl type:
1. If repl is a string then it is treated as a template for substitution, where the %X occurrences in repl are handled in a special way, depending on the value of the character X:
   - if X represents a digit, then each %X occurrence is substituted by the value of the X-th submatch (capture), with the following cases handled specially:
     - each %0 is substituted by the entire match
     - if the pattern contains no captures, then each %1 is substituted by the entire match
     - any other %X where X is greater than the number of captures in the pattern will generate an error ("invalid capture index")
     - if the pattern does contain a capture with number X but that capture didn't participate in the match, then %X is substituted by an empty string
   - if X is any non-digit character then %X is substituted by X
   All parts of repl other than %X are copied to the output string verbatim.
2. If repl is a function then it gets called on each match with the submatches passed as parameters (if there are no submatches then the entire match is passed as the only parameter). The substitution string is derived depending on the first return value of function repl:
   - if it is a string then it is used as the substitution string
   - if it is nil or false then no substitution is made
   - values of other types generate an error
   Though gsub is in general consistent with the API and behavior of Lua's string.gsub, it has one extension with regard to string.gsub behavior:
   - if function repl returns more than one value and its second return value is the literal string "break", then gsub stops searching for further matches in the subject and returns.
3. If repl is a table then the first submatch (or the entire match if there are no submatches) is used as the key and the value stored in repl under that key is used for substitution depending on its type. The rules are the same as for the first return value of function repl described in the above paragraph.
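The three kinds of repl can be sketched as follows. The example assumes the PCRE build (rex_pcre) and, for the last call, that a numeric n caps the number of matches as it does in Lua's string.gsub; the strings and table contents are invented for illustration.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

-- String template: %1 and %2 refer to captures, %0 to the entire match.
local swapped, count = rex.gsub("hello world", "(\\w+) (\\w+)", "%2 %1")
-- swapped == "world hello", count == 1
local wrapped = rex.gsub("a b", "\\w", "<%0>")
-- wrapped == "<a> <b>"

-- Function: called with the captures (here, the entire match);
-- a string result is substituted, nil leaves the match unchanged.
local doubled = rex.gsub("1 2 3", "\\d+", function(d)
  return tostring(tonumber(d) * 2)
end)
-- doubled == "2 4 6"
local kept = rex.gsub("a1b2", "\\d", function(d)
  if d == "1" then return "X" end
  return nil
end)
-- kept == "aXb2"

-- Table: the first capture is used as the lookup key.
local expanded = rex.gsub("$USER at $HOST", "\\$(\\w+)",
                          { USER = "alice", HOST = "example" })
-- expanded == "alice at example"

-- Assumption: n limits the number of matches, as in string.gsub.
local limited, n2 = rex.gsub("aaa", "a", "b", 2)
-- limited == "bba", n2 == 2
```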
rex.split
rex.split (subj, sep, [cf], [ef], [lo])
This function is used for splitting a subject string subj into parts (sections). The sep parameter is a regular expression pattern representing separators between the sections.
The function returns an iterator for repeated matching of the pattern sep in the string subj, subject to flags cf and ef.
PCRE: A locale lo may be specified; its default value is nil.
On every iteration pass, the iterator returns: 1. A subject section (can be an empty string), followed by 2. All captures in the order they appear in the sep pattern (or the entire match if the sep pattern specified no captures). If there is no match (this can occur only in the last iteration), then nothing is returned after the subject section.
The iteration will continue till the end of the subject. Unlike rex.gmatch, there will always be at least one iteration pass, even if there are no matches in the subject.
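A short sketch of the splitting iterator, assuming the PCRE build (rex_pcre); the subject strings and variable names are illustrative.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

-- Each pass yields a section, then the separator that followed it
-- (the entire sep match, since the sep pattern has no captures).
local parts, seps = {}, {}
for section, sep in rex.split("a, b;c", "\\s*[,;]\\s*") do
  parts[#parts + 1] = section
  if sep then seps[#seps + 1] = sep end  -- nothing follows the last section
end
-- parts: "a", "b", "c"; seps: ", " and ";"

-- No separator in the subject: still exactly one iteration pass.
local passes, only = 0, nil
for section in rex.split("abc", ",") do
  passes = passes + 1
  only = section
end
-- passes == 1, only == "abc"
```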
rex.plainfind
rex.plainfind (subj, patt, [init], [ci])
The function searches for the first match of the string patt in the subject subj, starting from offset init.
Here patt is not a regular expression: all its characters stand for themselves.
Both subj and patt can have embedded zeros.
The flag ci specifies case-insensitive search (the current locale is used).
Returns on success: 1. The start point of the match (a number). 2. The end point of the match (a number).
Returns on failure: 1. nil
rex.new
rex.new (patt, [cf], [lo])
The function compiles regular expression patt into a regular expression object whose internal representation corresponds to the library used (PCRE or POSIX regex). The returned result can then be used by the methods r:tfind, r:exec and r:dfa_exec. Regular expression objects are automatically garbage collected.
PCRE: A locale lo may be specified; its default value is nil.
Returns: 1. Compiled regular expression (a userdata).
rex.flags
rex.flags ([tb])
This function returns a table containing numeric values of the constants defined by the used regex library (either PCRE or POSIX). Those constants are keyed by their names (strings). If the table argument tb is supplied then it is used as the output table, else a new table is created.
The constants contained in the returned table can then be used in most functions and methods where compilation flags or execution flags can be specified. They can also be used for comparing with return codes of some functions and methods for determining the reason of failure. For details, see PCRE (http://www.pcre.org/pcre.txt) and POSIX (http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap09.html) documentation.
Returns: 1. A table filled with the results.
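As a hedged example of using these constants as compilation flags, the sketch below looks up a case-insensitivity flag and passes it as cf. It assumes the PCRE build (rex_pcre) and that the flag is exposed under the key CASELESS (mirroring PCRE_CASELESS); check the table returned by rex.flags on your installation for the exact names.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

local flags = rex.flags()

-- Assumption: PCRE_CASELESS is keyed as "CASELESS" in the flags table.
local caseless = flags.CASELESS

-- Without the flag the lowercase pattern does not match...
local strict = rex.match("Hello World", "world")

-- ...with it passed as cf (the fourth argument) it does.
local relaxed = rex.match("Hello World", "world", 1, caseless)
-- relaxed == "World"
```

Since the values are numeric bit flags, several distinct flags can be combined by adding them together before passing them as cf or ef.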
rex.config
[PCRE only. See pcre_config in the PCRE (http://www.pcre.org/pcre.txt) docs.]
rex.config ([tb])
This function returns a table containing the values of the configuration parameters used at PCRE library build-time. Those parameters (numbers) are keyed by their names (strings). If the table argument tb is supplied then it is used as the output table, else a new table is created. The default value of tb is nil.
Returns: 1. A table filled with the results.
rex.version
[PCRE only. See pcre_version in the PCRE (http://www.pcre.org/pcre.txt) docs.]
rex.version ()
This function returns a string containing the version of the used PCRE library and its release date.
r:tfind
r:tfind (subj, [init], [ef])
The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.
Returns on success: 1. The start point of the match (a number). 2. The end point of the match (a number). 3. Substring matches ("captures" in Lua terminology) are returned as a third result, in a table. This table contains false in the positions where the corresponding sub-pattern did not participate in the match. PCRE: if named subpatterns (see the PCRE docs, http://www.pcre.org/pcre.txt) are used then the table also contains substring matches keyed by their corresponding subpattern names (strings).
Returns on failure: 1. nil. 2. The return value of the underlying pcre_exec / regexec call (a number).
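A sketch of compiling a pattern with rex.new and matching it with r:tfind. It assumes the PCRE build (rex_pcre), since it uses PCRE named subpatterns written in the (?P<name>...) form; the subject string is invented.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

-- Compile once; the object can be reused for many subjects.
local r = rex.new("(?P<year>\\d{4})-(?P<month>\\d{2})")

local s, e, caps = r:tfind("released 2007-03")
-- s == 10, e == 16
-- Captures by position...
-- caps[1] == "2007", caps[2] == "03"
-- ...and, with PCRE, also by subpattern name.
-- caps.year == "2007", caps.month == "03"

-- No match: the first return value is nil.
local miss = r:tfind("no date here")
```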
r:exec
r:exec (subj, [init], [ef])
The method searches for the first match of the compiled regexp r in the string subj, starting from offset init, subject to execution flags ef.
Returns on success: 1. The start point of the first match (a number). 2. The end point of the first match (a number). 3. The offsets of substring matches ("captures" in Lua terminology) are returned as a third result, in a table. This table contains false in the positions where the corresponding sub-pattern did not participate in the match. PCRE: if named subpatterns are used then the table also contains substring matches keyed by their corresponding subpattern names (strings).
Returns on failure: 1. nil. 2. The return value of the underlying pcre_exec / regexec call (a number).
Example: If the whole match is at offsets 10,20 and substring matches are at offsets 12,14 and 16,19 then the function returns the following: 10, 20, { 12,14,16,19 }.
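The same layout can be seen in a small runnable sketch, assuming the PCRE build (rex_pcre); the offsets table holds start/end pairs that can be fed to string.sub to recover each capture.

```lua
local rex = require "rex_pcre"  -- assumption: PCRE build is available

local r = rex.new("(\\d+)-(\\d+)")
local subj = "range 10-20 here"

local s, e, offs = r:exec(subj)
-- s == 7, e == 11, offs holds 7, 8, 10, 11

-- Each pair of offsets delimits one capture in the subject.
local first = subj:sub(offs[1], offs[2])   -- "10"
local second = subj:sub(offs[3], offs[4])  -- "20"
```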
r:dfa_exec
[PCRE 6.0 and later. See pcre_dfa_exec in the PCRE (http://www.pcre.org/pcre.txt) docs.]
r:dfa_exec (subj, [init], [ef], [ovecsize], [wscount])
The method matches a compiled regular expression r against a given subject string subj, using a DFA matching algorithm.
Returns on success: 1. The start point of the matches found (a number). 2. A table containing the end points of the matches found, the longer matches first. 3. The return value of the underlying pcre_dfa_exec call (a number).
Returns on failure: 1. nil. 2. The return value of the underlying pcre_dfa_exec call (a number).
Example: If there are 3 matches found starting at offset 10 and ending at offsets 15, 20 and 25 then the function returns the following: 10, { 25,20,15 }, 3.
The following changes are incompatible with Lrexlib version 1.19:
- rex.newPCRE and rex.newPOSIX renamed to rex.new
- rex.flagsPCRE and rex.flagsPOSIX renamed to rex.flags
- rex.versionPCRE renamed to rex.version
- r:match renamed to r:tfind
- r:gmatch removed (similar functionality is provided by function rex.gmatch)
This is version 2.0.
by Reuben Thomas (rrt _ at _ sc3d.org) and Shmuel Zeigerman (shmuz _ at _ actcom.co.il) [maintainer]
Thanks to Thatcher Ulrich for bug and warning fixes. Thanks to Nick Gammon for adding support for PCRE named subpatterns.
Please report bugs and make suggestions to the maintainer, or use the LuaForge trackers and mailing lists.
Lrexlib is copyright Reuben Thomas 2000-2007 and copyright Shmuel Zeigerman 2004-2007, and is released under the MIT license, like Lua (see http://www.lua.org/copyright.html for the full license; it's basically the same as the BSD license). There is no warranty.