Skip to main content

Regular Expressions

Regex Matching

Jactl provides two operators for doing regular expression (regex) find and regex subsitutions. The =~ operator is used for matching a string against a regex pattern. It does the equivalent of Java's Matcher.find() to check if a substring matches the given pattern. The regex pattern syntax is the same as that used by the Pattern class in Java so for detail information about how to use regular expressions and what is supported see the Javadoc for the Pattern class.

Some examples:

'abc' =~ 'b'                                         // true
'This should match' =~ '[A-Z][a-z]+ sho.l[a-f]' // true

Regular expression patterns are usually expressed in pattern strings to cut down on the number of backslashes needed. For example /\d\d\s+[A-Z][a-z]*\s+\d{4}/ is easier to read and write than '\d\d\s+[A-Z][a-z]*\s+\d{4}':

'24 Mar 2014' =~ '\\d\\d\\s+[A-Z][a-z]*\\s+\\d{4}'   // pattern written as standard string
'24 Mar 2014' =~ /\d\d\s+[A-Z][a-z]*\s+\d{4}/ // same pattern written as pattern string

The !~ operator tests that the string does not match the pattern:

'24 Mar 2014' !~ /\d\d\s+[A-Z][a-z]*\s+\d{4}/       // false

Modifiers

When using a regex string (a string delimited with /) you can append modifiers to the regex pattern to control the behaviour of the pattern match. For example by appending i you can make the pattern match case-insensitive:

'This is a sentence.' =~ /this/          // false
'This is a sentence.' =~ /this/i // true

More than on modifier can be appended and the order of the modifiers is unimportant. The supported modifiers are:

ModifierDescription
iPattern match will be case-insensitive.
mMulti-line mode: this makes ^ and $ match beginning and endings of lines rather than beginning and ending of entire string.
sDot-all mode: makes . match line terminator (by default . won't match end-of-line characters).
gGlobal find: remembers where previous pattern occurred and starts next find from previous location. This can be used to find all occurrences of a pattern within a string.
nIf a capture group is numeric then interpret as a number.
rRegex match: this forces the pattern string to be interpreted as a regex match for implicit matching (see below). For substitutions this makes the substitution non-destructive.

The g modifier for global find can be used to iterate over a string, finding all instances of a given pattern within the string:

def str = 'This Example Text Is Not Complex'
int i = 0
while (str =~ /Ex/ig) { i++ } // count occurrences of 'ex' (case-insensitive)

i // 3

Note that the 'g' modifier on a regex match is only supported within the condition of a while or for statement. If you try to use it in any other context you will get an error:

def x = 'abc'
x =~ /ab/g // error: Cannot use 'g' modifier outside of condition for while/for loop

The other limitation is that only one occurrence of a pattern with a 'g' modifier can appear within the condition expression. Other regex matches can be present in the condition expression as long as they don't use the 'g' global modifier:

def x = 'abc'; def y = 'xyz'; def i = 0
while (x =~ /[a-c]/g && y =~ /[x-z]/g) { i++ } // error: Regex match with global modifier can only occur once within while/for condition
while (x =~ /[a-c]/g && y =~ /[x-z]/) { i++ }
i // 3

These restrictions are due to what makes sense in terms of how capture variables work (see below) and due to the way in which Jactl saves the state of the match to remember where to continue from the next time.

The g modifier is more useful when used with capture variables as described in the next section.

Capture Variables

With regex patterns you can provide capture groups using ( and ). These groupings then create capture variables $1 $2 etc. for each of the capture groups that correspond to the characters in the source string that match that part of the pattern. There is also a $0 variable that captures the portion of the source string that matches the entire pattern. For example:

def str = 'This Example Text Is Not Complex'
str =~ /p.*(.ex.).*(N[a-z].)/ // true
$1 // Text
$2 // Not
$0 // ple Text Is Not

Note that if there are nested groups, the capture variables numbering is based on the number of ( so far encountered, irrespective of the nesting.

You can use \ to escape the ( and ) characters if you want to match against those characters.

Using capture variables with the g global modifier allows you to extract all parts of a string that match:

def data = 'AAPL=$151.03, MSFT=$255.29, GOOG=$94.02'
def stocks = [:]
while (data =~ /(\w*)=\$(\d+.\d+)/g) { stocks[$1] = $2 as Decimal }
stocks // [AAPL:151.03, MSFT:255.29, GOOG:94.02]

The n modifier will make the capture variable a number if the string captured is numeric. If the number has no decimal place then it will be converted to a long value (as long as the value is not too large). If it has a decimal point then it will become a Decimal value. For example:

'rate=-1234' =~ /(\w+)=([\d-]+)/n          // true
$1 // rate
$2 // -1234
$2 instanceof long // true
'rate=56.789' =~ /(\w+)=([\d-.]+)/n // true
$2 // 56.789
$2 instanceof Decimal // true

Regex Substitution

The =~ operator can also be used to perform regex substitutions. The syntax is s/pattern/replacement/modifiers. The pattern is the regex pattern to search for and the replacement is the string to replace it with. The modifiers are the same as for regex matching.

For example:

def str = 'This Example Text Is Not Complex'
str =~ s/Example/Simple/ // true
str // This Simple Text Is Not Complex

Note that the substitution is done in-place on the string variable. If you want to do a non-destructive substitution then use the r modifier:

def str = 'This Example Text Is Not Complex'
def newStr = str =~ s/Example/Simple/r // true
newStr // This Simple Text Is Not Complex
str // This Example Text Is Not Complex

The g modifier can be used to replace all occurrences of the pattern:

> def str = 'This Example Text Is Not Complex'
str =~ s/Ex/X/ig // true
str // This Xample TXt Is Not ComplX

The replacement string can contain capture variables $1, $2, etc. to refer to the captured parts of the string that matched the pattern:

def str = 'This Example Text Is Not Complex'
str =~ s/(\w+)\s+(\w+)/$2 $1/ // true
str // Example This Text Is Not Complex

The replacement string can also contain embedded expressions using ${}:

def str = 'This Example Text Is Not Complex'
str =~ s/(\w+)\s+(\w+)/${$2.toUpperCase()} ${$1.toLowerCase()}/
str // EXAMPLE this Text Is Not Complex

Implicit Matching

If the =~ operator is not used then the regex match or substitution is performed on the implicit it variable.

There are a couple of situations where an implicit it variable is created, with the most common example being in a closure without explicit parameters defined. For example, one way to find elements of a list matching a given pattern might be:

def list = ['abbc', 'xXyz', 'def']
list.filter{ elem -> elem =~ /bb|xx/i } // ['abbc', 'xXyz']

We can rewrite this using the implicit it parameter that will be declared for the closure passed to filter:

def list = ['abbc', 'xXyz', 'def']
list.filter{ /bb|xx/i } // ['abbc', 'xXyz']

This is also useful when using the -p or -n command line options if running Jactl command line scripts for processing input from files or stdin. Each line is assigned to the implicit it variable so we can use a substitution or pattern match without having to specify the variable name. For example, to replace all occurrences of abc with xyz in a file:

$ jactl -p -e 's/abc/xyz/g'

The r modifier can be used to force a regex match since without any modifiers, and when using implicit match against it, the compiler can't tell whether a regex string or an arithmetic division is wanted:

$ jactl -n -e '/abc/r and println it'

This will print all lines that contain abc.