=head1 NAME

rex - regular expressions (Lua rexlib)

=head1 OVERVIEW

B<Lrexlib> is a regular expression library for Lua 5.1. The makefiles provided build it into shared libraries called rex_posix.so and rex_pcre.so, which can be used with require or loadlib. The library provides POSIX and PCRE regular expression matching.

=head1 INTRODUCTION

B<Lrexlib> provides bindings of the two principal regular expression library interfaces, POSIX and PCRE, to Lua 5.1.

B<Lrexlib> builds into shared libraries called by default I<rex_posix.so> and I<rex_pcre.so>, which can be used with I<require>.

=head1 NOTES

=over 4

=item 1.

Most functions and methods in Lrexlib have mandatory and optional arguments. There are no dependencies between arguments in Lrexlib's functions and methods. Any optional argument can be supplied as C<nil> (or omitted if it is a trailing one); the library will then use the default value for that argument.

=item 2.

This document uses the following syntax for optional arguments: they are bracketed separately, and commas are left outside brackets, e.g.:

    MyFunc (arg1, arg2, [arg3], [arg4])

=item 3.

Throughout this document, the identifier I<rex> is used in place of either I<rex_posix> or I<rex_pcre>, which are the default namespaces for the corresponding libraries.

=item 4.

All functions receiving a regular expression pattern as an argument will generate an error if that pattern is found invalid by the POSIX / PCRE library in use.

=back

=head1 REFERENCE - Functions

=head2 C<match>

    rex.match (subj, patt, [init], [cf], [ef], [lo])

The function searches for the first match of the regexp C<patt> in the string C<subj>, starting from offset C<init>, subject to flags C<cf> and C<ef>. PCRE: A locale C<lo> may be specified.

=over 4

=item C<subj>

subject. Type: string. Default: n/a.

=item C<patt>

regular expression pattern. Type: string. Default: n/a.

=item C<init>

start offset in the subject (can be negative). Type: number. Default: 1.

=item C<cf>

compilation flags (bitwise OR). Type: number. Default: 0.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=item C<lo>

[PCRE] locale. Type: string. Default: C<nil>.

=back

Returns on success:

=over 4

=item 1.

All substring matches ("captures"), in the order they appear in the pattern. C<false> is returned for sub-patterns that did not participate in the match. If the pattern specified no captures then the whole matched substring is returned.

=back

Returns on failure:

=over 4

=item 1.

C<nil>.

=item 2.

The return value of the underlying I<pcre_exec/regexec> call (a number).

=back

=head2 C<find>

    rex.find (subj, patt, [init], [cf], [ef], [lo])

The function searches for the first match of the regexp C<patt> in the string C<subj>, starting from offset C<init>, subject to flags C<cf> and C<ef>. PCRE: A locale C<lo> may be specified.

=over 4

=item C<subj>

subject. Type: string. Default: n/a.

=item C<patt>

regular expression pattern. Type: string. Default: n/a.

=item C<init>

start offset in the subject (can be negative). Type: number. Default: 1.

=item C<cf>

compilation flags (bitwise OR). Type: number. Default: 0.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=item C<lo>

[PCRE] locale. Type: string. Default: C<nil>.

=back

Returns on success:

=over 4

=item 1.

The start point of the match (a number).

=item 2.

The end point of the match (a number).

=item 3.

All substring matches ("captures"), in the order they appear in the pattern. C<false> is returned for sub-patterns that did not participate in the match.

=back

Returns on failure:

=over 4

=item 1.

C<nil>.

=item 2.

The return value of the underlying I<pcre_exec/regexec> call (a number).

=back

=head2 C<gmatch>

    rex.gmatch (subj, patt, [cf], [ef], [lo])

The function returns an iterator for repeated matching of the pattern C<patt> in the string C<subj>, subject to flags C<cf> and C<ef>. PCRE: A locale C<lo> may be specified.

=over 4

=item C<subj>

subject. Type: string. Default: n/a.

=item C<patt>

regular expression pattern. Type: string. Default: n/a.

=item C<cf>

compilation flags (bitwise OR). Type: number. Default: 0.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=item C<lo>

[PCRE] locale. Type: string. Default: C<nil>.

=back

The iterator function is called by Lua.
On every iteration (that is, on every match), it returns all captures in the order they appear in the pattern (or the entire match if the pattern specified no captures). The iteration continues until the subject fails to match.

=head2 C<gsub>

    rex.gsub (subj, patt, repl, [n], [cf], [ef], [lo])

The function searches for all matches of the pattern C<patt> in the string C<subj> and substitutes the found matches according to the parameter C<repl> (see details below). PCRE: A locale C<lo> may be specified.

=over 4

=item C<subj>

subject. Type: string. Default: n/a.

=item C<patt>

regular expression pattern. Type: string. Default: n/a.

=item C<repl>

substitution source. Type: string, function or table. Default: n/a.

=item C<n>

maximum number of matches to search for; unlimited if not supplied. Type: number. Default: C<nil>.

=item C<cf>

compilation flags (bitwise OR). Type: number. Default: 0.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=item C<lo>

[PCRE] locale. Type: string. Default: C<nil>.

=back

Returns:

=over 4

=item 1.

The subject string with the substitutions made.

=item 2.

Number of matches found.

=back

The parameter C<repl> can be either a string, a function or a table.
The function behaves differently depending on the C<repl> type:

=over

=item 1.

If C<repl> is a string then it is treated as a template for substitution, where the C<%X> occurrences in C<repl> are handled in a special way, depending on the value of the character C<X>:

=over

=item *

if C<X> represents a digit, then each C<%X> occurrence is substituted by the value of the C<X>-th submatch (capture), with the following cases handled specially:

=over

=item *

each C<%0> is substituted by the entire match

=item *

if the pattern contains no captures, then each C<%1> is substituted by the entire match

=item *

any other C<%X> where C<X> is greater than the number of captures in the pattern will generate an error ("invalid capture index")

=item *

if the pattern does contain a capture with number C<X> but that capture didn't participate in the match, then C<%X> is substituted by an empty string

=back

=item *

if C<X> is any non-digit character then C<%X> is substituted by C<X>

=item *

all parts of C<repl> other than C<%X> are copied to the output string verbatim.

=back

=item 2.

If C<repl> is a function then it gets called on each match with the submatches passed as parameters (if there are no submatches then the entire match is passed as the only parameter). The substitution string is derived from the first return value of function C<repl>:

=over

=item *

if it is a string then it is used as a substitution for the current match.

=item *

if it is nothing, C<false> or C<nil> then no substitution is made.

=item *

values of other types generate an error.

=back

Though C<gsub> is in general consistent with the API and behavior of Lua's C<string.gsub>, it has one extension with regard to C<repl> behavior:

=over

=item *

if function C<repl> returns more than one value and its second return value is the literal string "break", then C<gsub> stops searching for further matches in the subject and returns.

=back

=item 3.

If C<repl> is a table then the first submatch (or the entire match if there are no submatches) is used as the key, and the value stored in C<repl> under that key is used for substitution depending on its type.

=over

=item *

If no value is stored under the key but C<repl> has a metatable with the C<__index> field set then the corresponding metamethod will be called to obtain the value.

=item *

The obtained value is used for the substitution following exactly the same rules as for the first return value of function C<repl>, described in the paragraph above.

=back

=back

=head2 C<split>

    rex.split (subj, sep, [cf], [ef], [lo])

This function is used for splitting a subject string C<subj> into parts (sections). The C<sep> parameter is a regular expression pattern representing separators between the sections.

The function returns an iterator for repeated matching of the pattern C<sep> in the string C<subj>, subject to flags C<cf> and C<ef>. PCRE: A locale C<lo> may be specified.

=over

=item C<subj>

subject. Type: string. Default: n/a.

=item C<sep>

separator (regular expression pattern). Type: string. Default: n/a.

=item C<cf>

compilation flags (bitwise OR). Type: number. Default: 0.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=item C<lo>

[PCRE] locale. Type: string. Default: C<nil>.

=back

On every iteration pass, the iterator returns:

=over

=item 1.

A subject section (can be an empty string), followed by

=item 2.

All captures in the order they appear in the C<sep> pattern (or the entire match if the C<sep> pattern specified no captures). If there is no match (this can occur only in the last iteration), then nothing is returned after the subject section.

=back

The iteration continues until the end of the subject. Unlike C<gmatch>, there will always be at least one iteration pass, even if there are no matches in the subject.

=head2 C<plainfind>

    rex.plainfind (subj, patt, [init], [ci])

The function searches for the first match of the string C<patt> in the subject C<subj>, starting from offset C<init>.

=over

=item The string C<patt> is not a regular expression; all its characters stand for themselves.

=item Both strings C<subj> and C<patt> can have embedded zeros.

=item The flag C<ci> specifies case-insensitive search (current locale is used).

=item This function uses neither the PCRE nor the POSIX regex library.

=back

=over

=item C<subj>

subject. Type: string. Default: n/a.

=item C<patt>

text to find. Type: string. Default: n/a.

=item C<init>

start offset in the subject (can be negative). Type: number. Default: 1.

=item C<ci>

case insensitive search. Type: boolean. Default: false.

=back

Returns on success:

=over

=item 1.

The start point of the match (a number).

=item 2.

The end point of the match (a number).

=back

Returns on failure:

=over

=item 1.

C<nil>

=back

=head2 C<new>

    rex.new (patt, [cf], [lo])

The function compiles the regular expression C<patt> into a regular expression object whose internal representation corresponds to the library used (PCRE or POSIX regex). The returned result can then be used by the methods C<tfind>, C<exec> and C<dfa_exec>. Regular expression objects are automatically garbage collected. PCRE: A locale C<lo> may be specified.

=over

=item C<patt>

regular expression pattern. Type: string. Default: n/a.

=item C<cf>

compilation flags (bitwise OR). Type: number. Default: 0.

=item C<lo>

[PCRE] locale. Type: string. Default: C<nil>.

=back

Returns:

=over

=item 1.

Compiled regular expression (a userdata).

=back

=head2 C<flags>

    rex.flags ([tb])

This function returns a table containing numeric values of the constants defined by the regex library in use (either PCRE or POSIX). Those constants are keyed by their names (strings). If the table argument C<tb> is supplied then it is used as the output table, else a new table is created.

The constants contained in the returned table can then be used in most functions and methods where I<compilation flags> or I<execution flags> can be specified. They can also be compared with the return codes of some functions and methods to determine the reason for a failure. For details, see the PCRE and POSIX documentation.

Returns:

=over

=item 1.

A table filled with the results.

=back

=head2 C<config>

[PCRE only. See pcre_config in the PCRE docs.]

    rex.config ([tb])

This function returns a table containing the values of the configuration parameters used at PCRE library build-time. Those parameters (numbers) are keyed by their names (strings). If the table argument C<tb> is supplied then it is used as the output table, else a new table is created.

=over

=item C<tb>

a table for placing results into. Type: table. Default: C<nil>.

=back

Returns:

=over

=item 1.

A table filled with the results.

=back

=head2 C<version>

[PCRE only. See pcre_version in the PCRE docs.]

    rex.version ()

This function returns a string containing the version of the PCRE library in use and its release date.

=head1 REFERENCE - Methods

=head2 C<tfind>

    r:tfind (subj, [init], [ef])

The method searches for the first match of the compiled regexp C<r> in the string C<subj>, starting from offset C<init>, subject to execution flags C<ef>.

=over

=item C<r>

regex object produced by C<new>. Type: userdata. Default: n/a.

=item C<subj>

subject. Type: string. Default: n/a.

=item C<init>

start offset in the subject (can be negative). Type: number. Default: 1.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=back

Returns on success:

=over

=item 1.

The start point of the match (a number).

=item 2.

The end point of the match (a number).

=item 3.

Substring matches ("captures" in Lua terminology) are returned as a third result, in a table. This table contains C<false> in the positions where the corresponding sub-pattern did not participate in the match. PCRE: if I<named subpatterns> (see the PCRE docs) are used then the table also contains substring matches keyed by their corresponding subpattern names (strings).

=back

Returns on failure:

=over

=item 1.

C<nil>.

=item 2.

The return value of the underlying C<pcre_exec> / C<regexec> call (a number).

=back

=head2 C<exec>

    r:exec (subj, [init], [ef])

The method searches for the first match of the compiled regexp C<r> in the string C<subj>, starting from offset C<init>, subject to execution flags C<ef>.

=over

=item C<r>

regex object produced by C<new>. Type: userdata. Default: n/a.

=item C<subj>

subject. Type: string. Default: n/a.

=item C<init>

start offset in the subject (can be negative). Type: number. Default: 1.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=back

Returns on success:

=over

=item 1.

The start point of the first match (a number).

=item 2.

The end point of the first match (a number).

=item 3.

The offsets of substring matches ("captures" in Lua terminology) are returned as a third result, in a table. This table contains C<false> in the positions where the corresponding sub-pattern did not participate in the match. PCRE: if I<named subpatterns> are used then the table also contains substring matches keyed by their corresponding subpattern names (strings).

=back

Returns on failure:

=over

=item 1.

C<nil>.

=item 2.

The return value of the underlying C<pcre_exec> / C<regexec> call (a number).

=back

Example: If the whole match is at offsets 10,20 and substring matches are at offsets 12,14 and 16,19 then the function returns the following: 10, 20, { 12,14,16,19 }.

=head2 C<dfa_exec>

[PCRE 6.0 and later. See C<pcre_dfa_exec> in the PCRE docs.]

    r:dfa_exec (subj, [init], [ef], [ovecsize], [wscount])

The method matches a compiled regular expression C<r> against a given subject string C<subj>, using a DFA matching algorithm.

=over

=item C<r>

regex object produced by C<new>. Type: userdata. Default: n/a.

=item C<subj>

subject. Type: string. Default: n/a.

=item C<init>

start offset in the subject (can be negative). Type: number. Default: 1.

=item C<ef>

execution flags (bitwise OR). Type: number. Default: 0.

=item C<ovecsize>

size of the array for result offsets. Type: number. Default: 100.

=item C<wscount>

number of elements in the working space array. Type: number. Default: 50.

=back

Returns on success:

=over

=item 1.

The start point of the matches found (a number).

=item 2.

A table containing the end points of the matches found, the longer matches first.

=item 3.

The return value of the underlying C<pcre_dfa_exec> call (a number).

=back

Returns on failure:

=over

=item 1.

C<nil>

=item 2.

The return value of the underlying C<pcre_dfa_exec> call (a number).

=back

Example: If there are 3 matches found starting at offset 10 and ending at offsets 15, 20 and 25 then the function returns the following: 10, { 25,20,15 }, 3.

=head1 Incompatibilities with the Previous Version

The following changes are incompatible with Lrexlib version 1.19:

=over

=item 1.

Lua 5.1 is required

=item 2.

Functions C and C renamed to L

=item 3.

Functions C and C renamed to L

=item 4.

Function C renamed to L

=item 5.

Method C renamed to L

=item 6.

Method C removed (similar functionality is provided by function L)

=item 7.

Methods L and L: 2 values are returned on failure

=item 8.

Method L: the returned table may additionally contain named subpatterns (PCRE only)

=back

=head1 VERSION

This is version 2.0.

=head1 CREDITS

by Reuben Thomas (rrt _ at _ sc3d.org) and Shmuel Zeigerman (shmuz _ at _ actcom.co.il) [maintainer]

Thanks to Thatcher Ulrich for bug and warning fixes.

Thanks to Nick Gammon for adding support for PCRE named subpatterns.

=head1 CONTACT

Please report bugs and make suggestions to the maintainer, or use the LuaForge trackers and mailing lists.

=head1 LICENSE

Lrexlib is copyright Reuben Thomas 2000-2007 and copyright Shmuel Zeigerman 2004-2007, and is released under the MIT license, like Lua (see http://www.lua.org/copyright.html for the full license; it's basically the same as the BSD license). There is no warranty.