20 Regular Expressions

Regular expressions are the text processing workhorse of perl. With
regular expressions, you can search strings for patterns, find out what
matched the patterns, and substitute the matched patterns with new strings.

There are three different regular expression operators in perl:

1. match m{PATTERN}
2. substitute s{OLDPATTERN}{NEWPATTERN}
3. transliterate tr{OLD_CHAR_SET}{NEW_CHAR_SET}

The first two operate on patterns. The third, tr, takes literal characters
and character ranges rather than a pattern, but it is bound to a string in
the same way.

Perl allows almost any delimiter in these operators, such as {} or () or //
or ##, or just about any character you wish to use. The most common
delimiters are probably m// and s///, but I prefer to use m{} and s{}{}
because they are clearer for me.

There are two ways to "bind" these operators to a string expression:

1. =~ pattern does match string expression
2. !~ pattern does NOT match string expression
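
The two binding operators can be tried side by side. This is only a minimal sketch: the sub names has_error and is_clean and the sample strings are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this sketch.

# =~ is true when the pattern matches the string.
sub has_error {
    my ($line) = @_;
    return $line =~ m{ERROR} ? 1 : 0;
}

# !~ is true when the pattern does NOT match the string.
sub is_clean {
    my ($line) = @_;
    return $line !~ m{ERROR} ? 1 : 0;
}

foreach my $line ("ERROR: disk full", "all is well") {
    printf "%-18s has_error=%d is_clean=%d\n",
        $line, has_error($line), is_clean($line);
}
```

For any given string and pattern, exactly one of the two tests is true.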

Binding can be thought of as "Object Oriented Programming" for regular
expressions. Generic OOP structure can be represented as

$subject -> verb ( adjectives, adverbs, etc );

Binding in regular expressions can be looked at in a similar fashion:

$string =~ verb ( pattern );

where "verb" is limited to 'm' for match, 's' for substitute, and 'tr'
for transliterate. You may see perl code that simply looks like this:

/patt/;

When no string is bound, the operator works on the default variable $_,
and the 'm' may be left off when the delimiters are slashes. The line
above is functionally equivalent to this:

$_ =~ m/patt/;
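
The implicit binding shows up most often inside loops, where foreach aliases each element to $_. The sketch below assumes a made-up helper named count_matches, which is not part of perl itself.

```perl
use strict;
use warnings;

# Hypothetical helper: count the strings that contain "patt".
sub count_matches {
    my @lines = @_;
    my $count = 0;
    foreach (@lines) {          # each element is aliased to $_
        $count++ if /patt/;     # same as: $_ =~ m/patt/
    }
    return $count;
}

print count_matches("a pattern", "nothing here", "patty"), "\n";   # prints 2
```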

Here are some examples:

# spam filter
my $email = "This is a great Free Offer\n";
if($email =~ m{Free Offer})
{$email="*deleted spam*\n"; }
print "$email\n";

# upgrade my car
my $car = "my car is a toyota\n";
$car =~ s{toyota}{jaguar};
print "$car\n";

# simple encryption, Caesar cipher (ROT13)
my $love_letter = "How I love thee.\n";
$love_letter =~ tr{A-Za-z}{N-ZA-Mn-za-m};
print "encrypted: $love_letter";

# rotating by 13 a second time restores the original
$love_letter =~ tr{A-Za-z}{N-ZA-Mn-za-m};
print "decrypted: $love_letter\n";

> *deleted spam*
> my car is a jaguar
> encrypted: Ubj V ybir gurr.
> decrypted: How I love thee.

The above examples all look for fixed patterns within the string.
Regular expressions also allow you to look for patterns with different
types of "wildcards".

20.1 Variable Interpolation

The braces that surround the pattern act as double-quote marks,
subjecting the pattern to one pass of variable interpolation as if the
pattern were contained in double-quotes. This allows the pattern to be
stored in variables and interpolated when the regular expression is
evaluated.

my $actual = "Toyota";
my $wanted = "Jaguar";
my $car = "My car is a Toyota\n";
$car =~ s{$actual}{$wanted};
print $car;

> My car is a Jaguar
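
One consequence of interpolation is that the contents of the variable are treated as a pattern, metacharacters included. When the variable holds plain text, perl's \Q...\E escape (the same thing the built-in quotemeta function does) turns every metacharacter into a literal. The sketch below is only an illustration; the helper name replace_literal and the sample string are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the first occurrence of a literal
# substring, treating $old as plain text rather than as a pattern.
sub replace_literal {
    my ($text, $old, $new) = @_;
    $text =~ s{\Q$old\E}{$new};   # \Q...\E escapes metacharacters in $old
    return $text;
}

my $price = 'Total: $5.00 (approx)';
print replace_literal($price, '(approx)', '[estimated]'), "\n";
# prints: Total: $5.00 [estimated]
```

Without \Q and \E, the parentheses in '(approx)' would act as capturing parentheses, so only the word "approx" would be replaced and the literal parentheses would stay in the string.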

20.2 Wildcard Example

In the example below, we process an array of lines, each containing the
pattern {filename: } followed by one or more non-whitespace characters
forming the actual filename. Each line also contains the pattern
{size: } followed by one or more digits that indicate the actual size of
that file.

my @lines = split "\n", <<"MARKER";
filename: output.txt size: 1024
filename: input.dat size: 512
filename: address.db size: 1048576
MARKER

foreach my $line (@lines) {
    ####################################
    # \S is a wildcard meaning
    # "anything that is not white-space".
    # the "+" means "one or more"
    ####################################
    if($line =~ m{filename: (\S+)}) {
        my $name = $1;
        ###########################
        # \d is a wildcard meaning
        # "any digit, 0-9".
        ###########################
        # only use $1 if the size pattern matched too
        if($line =~ m{size: (\d+)}) {
            my $size = $1;
            print "$name,$size\n";
        }
    }
}

> output.txt,1024
> input.dat,512
> address.db,1048576

20.3 Defining a Pattern

A pattern can be a literal pattern such as {Free Offer}. It can contain
wildcards such as {\d}. It can also contain metacharacters such as
parentheses. Notice that in the above example the parentheses were in the
pattern but did not occur in the string, yet the pattern matched.

20.4 Metacharacters

Metacharacters do not get interpreted as literal characters. Instead
they tell perl to interpret the metacharacter (and sometimes the
characters around the metacharacter) in a different way. The following are
metacharacters in perl regular expression patterns:

\ | ( ) [ ] { } ^ $ * + ? .

\       (backslash) if the next character combined with this backslash
        forms a character class shortcut, then match that character
        class. If not a shortcut, then simply treat the next character
        as a non-metacharacter.
|       alternation: (patt1 | patt2) means (patt1 OR patt2)
( )     grouping (clustering) and capturing
(?: )   grouping (clustering) only. no capturing. (somewhat faster)
.       match any single character (usually not "\n")
[ ]     define a character class, match any single character in class
*       (quantifier): match previous item zero or more times
+       (quantifier): match previous item one or more times
?       (quantifier): match previous item zero or one time
{ }     (quantifier): match previous item a number of times in given range
^       (position marker): beginning of string (or possibly after "\n")
$       (position marker): end of string (or possibly before "\n")

Examples below. Change the value assigned to $str and re-run the script.
Experiment with what matches and what does not match the different
regular expression patterns.

my $str = "Dear sir, hello and goodday! "
." dogs and cats and sssnakes put me to sleep."
." zzzz. Hummingbirds are ffffast. "
." Sincerely, John";

# | alternation
# match "hello" or "goodbye"
if($str =~ m{hello|goodbye}){warn "alt";}

# () grouping and capturing
# match 'goodday' or 'goodbye'
if($str =~ m{(good(day|bye))})
{warn "group matched, captured '$1'";}

# . any single character
# match 'cat' 'cbt' 'cct' 'c%t' 'c+t' 'c?t' ...
if($str =~ m{c.t}){warn "period";}

# [] define a character class: 'a' or 'o' or 'u'
# match 'cat' 'cot' 'cut'
if($str =~ m{c[aou]t}){warn "class";}

# * quantifier, match previous item zero or more
# match '' or 'z' or 'zz' or 'zzz' or 'zzzzzzzz'
if($str =~ m{z*}){warn "asterisk";}

# + quantifier, match previous item one or more
# match 'snake' 'ssnake' 'sssssssnake'
if($str =~ m{s+nake}){warn "plus sign";}

# ? quantifier, previous item is optional
# match only 'dog' and 'dogs'
if($str =~ m{dogs?}){warn "question";}

# {} quantifier, match previous, 3 <= qty <= 5
# match only 'fffast', 'ffffast', and 'fffffast'
if($str =~ m{f{3,5}ast}){warn "curly brace";}

# ^ position marker, matches beginning of string
# match 'Dear' only if it occurs at start of string
if($str =~ m{^Dear}){warn "caret";}

# $ position marker, matches end of string
# match 'John' only if it occurs at end of string
if($str =~ m{John$}){warn "dollar";}

> alt at ...
> group matched, captured 'goodday' at ...
> period at ...
> class at ...
> asterisk at ...
> plus sign at ...
> question at ...
> curly brace at ...
> caret at ...
> dollar at ...
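
The examples above use every metacharacter except the backslash. The sketch below shows the backslash in both of its roles: turning "$" and "." into literal characters, and forming the \d shortcut. The helper name has_price and the sample strings are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: does the text contain a price such as "$3.50"?
sub has_price {
    my ($text) = @_;
    # \$ and \. are literal characters here; \d is the digit shortcut.
    return $text =~ m{\$\d+\.\d\d} ? 1 : 0;
}

print has_price('Lunch cost $3.50 today'), "\n";   # prints 1
print has_price('Lunch cost 3x50 today'), "\n";    # prints 0
```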

20.5 Capturing and Clustering Parentheses

Normal parentheses will both cluster and capture the pattern they
contain. Clustering affects the order of evaluation similar to the way
parentheses affect the order of evaluation within a mathematical
expression. Normally, multiplication has a higher precedence than
addition. The expression "2 + 3 * 4" does the multiplication first and
then the addition, yielding the result of "14". The expression
"(2 + 3) * 4" forces the addition to occur first, yielding the result
of "20".

Clustering parentheses work in the same fashion. The pattern {cats?}
will apply the "?" quantifier to the letter "s", matching either "cat"
or "cats". The pattern {(cats)?} will apply the "?" quantifier to the
entire pattern within the parentheses, matching "cats" or the empty
string.
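
The difference between {cats?} and {(cats)?} is easiest to see when each pattern has to account for the whole string. The sketch below adds the ^ and $ anchors for that purpose; the sub names are made up for the illustration.

```perl
use strict;
use warnings;

# "?" applies only to the letter "s"
sub matches_plain {
    my ($word) = @_;
    return $word =~ m{^cats?$} ? 1 : 0;
}

# "?" applies to the whole cluster "cats"
sub matches_grouped {
    my ($word) = @_;
    return $word =~ m{^(cats)?$} ? 1 : 0;
}

foreach my $word ('cat', 'cats', '') {
    printf "%-6s plain=%d grouped=%d\n",
        "\"$word\"", matches_plain($word), matches_grouped($word);
}
```

Here "cat" satisfies only the first pattern, the empty string satisfies only the second, and "cats" satisfies both.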

20.5.1 $1, $2, $3, etc Capturing parentheses

Clustering parentheses will also capture the part of the string that
matched the pattern within the parentheses. The captured values are
accessible through some "magical" variables called $1, $2, $3, ... Each
left parenthesis increments the number used to access the captured
string. The left parentheses are counted from left to right as they
occur within the pattern, starting at 1.

my $test="Firstname: John Lastname: Smith";
############################################
# first (\w+) goes to $1, second (\w+) to $2
############################################
$test=~m{Firstname: (\w+) Lastname: (\w+)};
my $first = $1;
my $last = $2;
print "Hello, $first $last\n";

> Hello, John Smith

Because capturing takes a little extra time to store the captured result
into the $1, $2, ... variables, sometimes you just want to cluster without
the overhead of capturing. In the example below, we want to cluster
"day|bye" so that the alternation symbol "|" will go with "day" or
"bye". Without the clustering parentheses, the pattern would match
"goodday" or "bye", rather than "goodday" or "goodbye". The pattern
contains capturing parentheses around the entire pattern, so we do not
need to capture the "day|bye" part of the pattern; therefore we use
cluster-only parentheses.

if($str =~ m{(good(?:day|bye))})
{warn "group matched, captured '$1'";}

Cluster-only parentheses don't capture the enclosed pattern, and they
don't count when determining which magic variable, $1, $2, $3 ..., will
contain the values from the capturing parentheses.

my $test = 'goodday John';
##########################################
# (?: ) is skipped when numbering $1, $2
##########################################
if($test =~ m{(good(?:day|bye)) (\w+)})
{ print "You said $1 to $2\n"; }

> You said goodday to John

20.5.2 Capturing parentheses not capturing

If a regular expression containing capturing parentheses does not match
the string, the magic variables $1, $2, $3, etc will retain whatever
PREVIOUS value they had from the last PREVIOUS regular expression that
did match. This means that you MUST check to make sure the regular
expression matches BEFORE you use the $1, $2, $3, etc variables.

In the example below, the second regular expression does not match,
therefore $1 retains its old value of 'be'. Instead of printing
something like "Name is Horatio", or printing "Name is" and complaining
about an undefined value, perl keeps the old value for $1 and prints
"Name is 'be'".

my $string1 = 'To be, or not to be';
$string1 =~ m{not to (\w+)}; # matches, $1='be'
warn "The question is to $1";

my $string2 = 'that is the question';
$string2 =~ m{I knew him once, (\w+)}; # no match
warn "Name is '$1'";
# no match, so $1 retains its old value 'be'

> The question is to be at ./script.pl line 7.
> Name is 'be' at ./script.pl line 11.
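
One way to avoid the stale $1 is to never read it directly. In list context a successful match returns the captured strings, and a failed match returns an empty list, so the result can be assigned straight to a variable and tested. This is a minimal sketch; the helper name extract_name is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured name, or undef if no match.
sub extract_name {
    my ($text) = @_;
    # List assignment: on failure the list is empty and $name stays undef.
    my ($name) = $text =~ m{I knew him once, (\w+)};
    return $name;
}

# Run an unrelated match first so that $1 holds a leftover 'be'.
'To be, or not to be' =~ m{not to (\w+)};

my $name = extract_name('that is the question');
print defined $name ? "Name is $name\n" : "No name found\n";
# prints: No name found
```

The plain alternative is to wrap the match in an if() and only touch $1 inside the block, as the earlier examples do.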

20.6 Character Classes

The "." metacharacter will match any single character (by default, any
character except "\n"). This is roughly equivalent to a character class
that includes every possible character. You can easily define smaller
character classes of your own using the square brackets []. Whatever
characters are listed within the square brackets are part of that
character class. Perl will then match any one character within that
class.

[aeiouAEIOU] any vowel
[0123456789] any digit

20.6.1 Metacharacters Within Character Classes

Within the square brackets used to define a character class, all
previously defined metacharacters cease to act as metacharacters and are
interpreted as simple literal characters. Character classes have their
own special metacharacters.

\    (backslash) treat the next character as a literal character
-    (hyphen) Indicates a consecutive character range, inclusively.
     [a-f] indicates the letters a,b,c,d,e,f.
     Character ranges are based on ASCII numeric values.
^    (caret) If it is the first character of the class, then this
     indicates the class is any character EXCEPT the ones in the
     square brackets.
     Warning: [^aeiou] means anything but a lower case vowel. This
     is not the same as "any consonant". The class [^aeiou] will
     match punctuation, digits, upper case letters, and unicode
     characters.
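
The warning about [^aeiou] can be checked by counting, one character at a time, how much of a string each class accepts. The sketch below is only an illustration; the sub names and the sample text are made up, and the second class is one possible way to spell "lower case consonant" with ranges.

```perl
use strict;
use warnings;

# Count the characters that are NOT lower case vowels.
sub count_not_vowel {
    my ($text) = @_;
    my $count = 0;
    foreach my $char (split //, $text) {
        $count++ if $char =~ m{[^aeiou]};
    }
    return $count;
}

# Count the characters that are lower case consonants.
sub count_consonant {
    my ($text) = @_;
    my $count = 0;
    foreach my $char (split //, $text) {
        $count++ if $char =~ m{[b-df-hj-np-tv-z]};
    }
    return $count;
}

my $text = 'route 66!';
print count_not_vowel($text), "\n";   # prints 6: r, t, space, 6, 6, !
print count_consonant($text), "\n";   # prints 2: r, t
```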

20.7 Shortcut Character Classes

Perl has shortcut character classes for some of the more common classes.
The classes shown are the equivalents for plain ASCII text.

shortcut  class            description
\d        [0-9]            any *d*igit
\D        [^0-9]           any NON-digit
\s        [ \t\n\r\f]      any white*s*pace
\S        [^ \t\n\r\f]     any NON-whitespace
\w        [a-zA-Z0-9_]     any *w*ord character (valid perl identifier)
\W        [^a-zA-Z0-9_]    any NON-word character
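
The shortcuts are most useful in combination. The sketch below parses a "key = value" line where the key is made of word characters, the value is made of digits, and whitespace is optional everywhere. The helper name parse_setting and the input format are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return (key, value), or an empty list if the
# line does not look like "key = digits".
sub parse_setting {
    my ($line) = @_;
    if ($line =~ m{^\s*(\w+)\s*=\s*(\d+)\s*$}) {
        return ($1, $2);
    }
    return;
}

my ($key, $value) = parse_setting("  max_size =  1024 ");
print "$key=$value\n";   # prints: max_size=1024
```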

20.8 Greedy (Maximal) Quantifiers

Quantifiers are used within regular expressions to indicate how many
times the previous item occurs within the pattern. By default,
quantifiers are "greedy" or "maximal", meaning that they will match as
many characters as possible while still allowing the overall pattern to
match.

*            match zero or more times (match as much as possible)
+            match one or more times (match as much as possible)
?            match zero or one times (match as much as possible)
{count}      match exactly "count" times
{min,}       match at least "min" times (match as much as possible)
{min,max}    match at least "min" and at most "max" times
             (match as much as possible)
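
Greediness is easiest to see by capturing what a quantifier actually consumed. In the sketch below, the sub names and the sample strings are made up for the illustration.

```perl
use strict;
use warnings;

# ".+" grabs as much as it can, so the match runs from the first "<"
# to the LAST ">" in the string.
sub greedy_between {
    my ($text) = @_;
    return $text =~ m{<(.+)>} ? $1 : undef;
}

# "{2,4}" takes the maximum of four when four are available.
sub greedy_run {
    my ($text) = @_;
    return $text =~ m{(a{2,4})} ? $1 : undef;
}

print greedy_between('<b>bold</b> and <i>italic</i>'), "\n";
# prints: b>bold</b> and <i>italic</i
print greedy_run('aaaaaa'), "\n";
# prints: aaaa
```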

20.10 Position Assertions / Position Anchors

Inside a regular expression pattern, some symbols do not translate into
a character or character class. Instead, they translate into a
"position" within the string. If a position anchor occurs within a
pattern, the pattern before and after that anchor must occur at a
certain position within the string.

^     Matches the beginning of the string.
      If the /m (multiline) modifier is present, also matches just
      after any "\n".
$     Matches the end of the string, or just before a "\n" that ends
      the string.
      If the /m (multiline) modifier is present, also matches just
      before any "\n".
\A    Match the beginning of string only. Not affected by /m modifier.
\z    Match the end of string only. Not affected by /m modifier.
\Z    Matches the end of the string only, but will also match just
      before a "\n" if that was the last character in string.
      Not affected by /m modifier.
\b    word "b"oundary
      A word boundary occurs in four places.
      1) at a transition from a \w character to a \W character
      2) at a transition from a \W character to a \w character
      3) at the beginning of the string, if it starts with a \w character
      4) at the end of the string, if it ends with a \w character
\G    usually used with /g modifier (probably want /c modifier too).
      Indicates the position just after the last pattern match
      performed on the string. If this is the first regular expression
      being performed on the string then \G will match the beginning of
      the string. Use the pos() function to get and set the current \G
      position within the string.
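
The anchors in the table can be compared on a two-line string that ends in "\n", and \G can be tried with a tiny tokenizer. This is a minimal sketch; every sub name and sample string is made up for the illustration.

```perl
use strict;
use warnings;

# Each sub returns 1 if the anchored pattern matches, otherwise 0.
sub caret_plain { my ($s) = @_; return $s =~ m{^second}  ? 1 : 0; }
sub caret_multi { my ($s) = @_; return $s =~ m{^second}m ? 1 : 0; }
sub end_big_z   { my ($s) = @_; return $s =~ m{line\Z}   ? 1 : 0; }
sub end_small_z { my ($s) = @_; return $s =~ m{line\z}   ? 1 : 0; }

my $text = "first line\nsecond line\n";
print caret_plain($text), "\n";   # prints 0: ^ only at start of string
print caret_multi($text), "\n";   # prints 1: /m lets ^ match after "\n"
print end_big_z($text),   "\n";   # prints 1: \Z allows a final "\n"
print end_small_z($text), "\n";   # prints 0: \z is the very end only

# \G with /g and /c: each match must start where the last one ended.
sub tokenize {
    my ($input) = @_;
    my @tokens;
    while ($input =~ m{\G(\d+|[a-z]+)}gc) {
        push @tokens, $1;
    }
    return @tokens;
}

print join(',', tokenize("12ab34")), "\n";   # prints: 12,ab,34
```

Because of \G, the tokenizer stops at the first character that is neither a digit nor a lower case letter instead of skipping ahead to the next match.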

20.10.1 The \b Anchor

Use the \b anchor when you want to match a whole word pattern but not
part of a word. This example matches "jump" but not "jumprope":

my $test1='He can jump very high.';
if($test1=~m{\bjump\b})
{ print "test1 matches\n"; }