20 Regular Expressions

Regular expressions are the text processing workhorse of perl. With
regular expressions, you can search strings for patterns, find out what
matched the patterns, and substitute the matched patterns with new strings.


There are three different regular expression operators in perl:

1. match           m{PATTERN}
2. substitute      s{PATTERN}{REPLACEMENT}
3. transliterate   tr{OLD_CHAR_SET}{NEW_CHAR_SET}

Strictly speaking, tr does not use regular expressions at all. It replaces
characters one for one from the first set to the second. It is included
here because it is bound to strings in the same way as m and s.

Perl allows any delimiter in these operators, such as {} or () or // or
## or just about any character you wish to use. The most common
delimiters are probably slashes, as in m// and s///, but I prefer to
use m{} and s{}{} because they are clearer to me. There are two ways to
"bind" these operators to a string expression:

1. =~   apply the operator to the string (for m, true if the pattern DOES match)
2. !~   true if the pattern does NOT match the string expression

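The following sketch makes the delimiter and binding rules concrete. The variable names and the path string are invented for illustration. It writes the same match with three different delimiters, which shows why braces help when the pattern itself contains slashes. It then uses both =~ and !~.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";

# The same match written with three delimiters. With slashes, the
# slashes inside the pattern must be escaped; with {} or ## they need not be.
my $slashes = ($path =~ m/\/local\//) ? 1 : 0;
my $braces  = ($path =~ m{/local/})   ? 1 : 0;
my $hashes  = ($path =~ m#/local/#)   ? 1 : 0;

# =~ is true when the pattern matches; !~ is true when it does NOT match.
my $is_local = ($path =~ m{local})  ? 1 : 0;
my $not_opt  = ($path !~ m{/opt/})  ? 1 : 0;

print "$slashes $braces $hashes\n";   # prints "1 1 1"
print "$is_local $not_opt\n";         # prints "1 1"
```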
Binding can be thought of as "Object Oriented Programming" for regular
expressions. Generic OOP structure can be represented as

$subject -> verb ( adjectives, adverbs, etc );

Binding in regular expressions can be looked at in a similar fashion:

$string =~ verb ( pattern );

where "verb" is limited to 'm' for match, 's' for substitution, and 'tr'
for transliteration. You may see perl code that simply looks like this:

/patt/;

This is functionally equivalent to this:

$_ =~ m/patt/;

When no string is bound with =~ or !~, the operator works on the default
variable $_.

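The implicit $_ form is most common inside loops, where foreach without a loop variable sets $_ to each element in turn. Here is a minimal sketch with made-up data:

```perl
use strict;
use warnings;

my @fruits = ("apple", "banana", "cherry");
my @with_an;

foreach (@fruits) {              # no loop variable, so each element is in $_
    push @with_an, $_ if /an/;   # same as: $_ =~ m/an/
}

print "@with_an\n";              # prints "banana"
```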
Here are some examples:

# spam filter
my $email = "This is a great Free Offer\n";
if ($email =~ m{Free Offer}) {
    $email = "*deleted spam*\n";
}
print $email;

# upgrade my car
my $car = "my car is a toyota\n";
$car =~ s{toyota}{jaguar};
print $car;

# simple encryption, Caesar cipher (ROT13)
my $love_letter = "How I love thee.\n";
$love_letter =~ tr{A-Za-z}{N-ZA-Mn-za-m};
print "encrypted: $love_letter";
$love_letter =~ tr{A-Za-z}{N-ZA-Mn-za-m};
print "decrypted: $love_letter";

> *deleted spam*
> my car is a jaguar
> encrypted: Ubj V ybir gurr.
> decrypted: How I love thee.

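The operators also return useful values, which the examples above ignore. The following sketch reuses the car string. s{}{} returns true if it made a substitution and false otherwise. tr{}{} returns the number of characters it matched. With an empty replacement list (and no modifiers), tr leaves the string unchanged, which makes it a handy character counter.

```perl
use strict;
use warnings;

my $car = "my car is a toyota\n";

# s{}{} returns true if it replaced something, false otherwise.
my $changed = ($car =~ s{toyota}{jaguar});
my $again   = ($car =~ s{toyota}{jaguar});   # nothing left to replace

# tr{}{} returns how many characters it matched. An empty replacement
# list means each character is replaced by itself, so this only counts.
my $vowels = ($car =~ tr{aeiou}{});

print "changed: ", ($changed ? "yes" : "no"), "\n";   # changed: yes
print "again: ",   ($again   ? "yes" : "no"), "\n";   # again: no
print "vowels: $vowels\n";                            # vowels: 6
```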
The above examples all look for fixed patterns within the string.
Regular expressions also allow you to look for patterns with different
types of "wildcards".

20.1 Variable Interpolation

The braces that surround the pattern act as double-quote marks,
subjecting the pattern to one pass of variable interpolation as if the
pattern were contained in double-quotes. This allows the pattern (and,
for s{}{}, the replacement) to be stored in variables and interpolated
when the regular expression runs. The exception is tr{}{}, which does
not interpolate variables.

my $actual = "Toyota";
my $wanted = "Jaguar";
my $car = "My car is a Toyota\n";
$car =~ s{$actual}{$wanted};
print $car;

> My car is a Jaguar

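The interpolated text is itself treated as a pattern, so any metacharacters inside the variable keep their special meaning. An example is "." (see 20.4), which matches any character. When a variable should match literally, perl's \Q...\E escape (the same quoting the quotemeta function performs) turns those metacharacters into literal characters. This is a sketch with invented values:

```perl
use strict;
use warnings;

my $price = "3.50";    # "." is a metacharacter: any single character
my $typo  = "3x50";

# Interpolated as-is, "." matches the "x", so this succeeds.
my $loose  = ($typo =~ m{^$price$}) ? 1 : 0;

# \Q...\E escapes the metacharacters in $price, so "." must be a real period.
my $strict = ($typo =~ m{^\Q$price\E$}) ? 1 : 0;

print "loose: $loose, strict: $strict\n";   # prints "loose: 1, strict: 0"
```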
20.2 Wildcard Example

In the example below, we process an array of lines, each containing the
pattern {filename: } followed by one or more non-whitespace characters
forming the actual filename. Each line also contains the pattern {size: }
followed by one or more digits that indicate the actual size of that
file.

my @lines = split "\n", <<"MARKER";
filename: output.txt size: 1024
filename: input.dat size: 512
filename: address.db size: 1048576
MARKER

foreach my $line (@lines) {
    ####################################
    # \S is a wildcard meaning
    # "anything that is not white-space".
    # the "+" means "one or more"
    ####################################
    if ($line =~ m{filename: (\S+)}) {
        my $name = $1;
        ###########################
        # \d is a wildcard meaning
        # "any digit, 0-9".
        ###########################
        if ($line =~ m{size: (\d+)}) {
            my $size = $1;
            print "$name,$size\n";
        }
    }
}

> output.txt,1024
> input.dat,512
> address.db,1048576

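Both fields can also be captured with a single pattern. In list context, a successful match returns the captured values. A failed match returns an empty list, which makes the if() false. The sketch below relies on that behavior and adds a hypothetical %size_of hash to collect the results.

```perl
use strict;
use warnings;

my @lines = (
    "filename: output.txt size: 1024",
    "filename: input.dat size: 512",
    "filename: address.db size: 1048576",
);

my %size_of;    # hypothetical: filename => size
foreach my $line (@lines) {
    # \s+ allows one or more whitespace characters between the fields
    if (my ($name, $size) = $line =~ m{filename: (\S+)\s+size: (\d+)}) {
        $size_of{$name} = $size;
        print "$name,$size\n";
    }
}
```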
20.3 Defining a Pattern

A pattern can be a literal pattern such as {Free Offer}. It can contain
wildcards such as {\d}. It can also contain metacharacters such as
parentheses. Notice that in the example above, the parentheses were in
the pattern but did not occur in the string, yet the pattern matched.

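To match a parenthesis that really is in the string, escape it with a backslash (the backslash metacharacter is described in 20.4). This is a sketch with an invented string:

```perl
use strict;
use warnings;

my $call = "print(42)";

# Unescaped, ( ) only group "42", so the pattern looks for "print42".
my $grouped = ($call =~ m{print(42)}) ? 1 : 0;

# Escaped, \( and \) match literal parenthesis characters.
my $literal = ($call =~ m{print\(42\)}) ? 1 : 0;

print "grouped: $grouped, literal: $literal\n";   # prints "grouped: 0, literal: 1"
```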
20.4 Metacharacters

Metacharacters are not interpreted as literal characters. Instead,
they tell perl to interpret the metacharacter (and sometimes the
characters around it) in a different way. The following are
metacharacters in perl regular expression patterns:

\ | ( ) [ ] { } ^ $ * + ? .

\       (backslash) if the backslash and the next character together form
        a shortcut such as \d (or an escape such as \n), match that. If
        not, treat the next character as a non-metacharacter, so \.
        matches a literal period.
|       alternation: (patt1|patt2) means (patt1 OR patt2)
( )     grouping (clustering) and capturing
(?: )   grouping (clustering) only, no capturing (somewhat faster)
.       match any single character (except "\n", unless the /s modifier
        is used)
[ ]     define a character class; match any single character in the class
*       (quantifier): match the previous item zero or more times
+       (quantifier): match the previous item one or more times
?       (quantifier): match the previous item zero or one time
{ }     (quantifier): match the previous item a number of times in a
        given range
^       (position marker): beginning of string (or, with /m, beginning
        of any line)
$       (position marker): end of string (or just before a final "\n";
        with /m, end of any line)

Examples below. Change the value assigned to $str and re-run the script.
Experiment with what matches and what does not match the different
regular expression patterns.

my $str = "Dear sir, hello and goodday! "
        . " dogs and cats and sssnakes put me to sleep."
        . " zzzz. Hummingbirds are ffffast. "
        . " Sincerely, John";

# | alternation
# match "hello" or "goodbye"
if ($str =~ m{hello|goodbye}) { warn "alt"; }

# () grouping and capturing
# match 'goodday' or 'goodbye'
if ($str =~ m{(good(day|bye))}) { warn "group matched, captured '$1'"; }

# . any single character
# match 'cat' 'cbt' 'cct' 'c%t' 'c+t' 'c?t' ...
if ($str =~ m{c.t}) { warn "period"; }

# [] define a character class: 'a' or 'o' or 'u'
# match 'cat' 'cot' 'cut'
if ($str =~ m{c[aou]t}) { warn "class"; }

# * quantifier, match previous item zero or more times
# match '' or 'z' or 'zz' or 'zzz' or 'zzzzzzzz'
# (zero z's also counts, so this matches ANY string)
if ($str =~ m{z*}) { warn "asterisk"; }

# + quantifier, match previous item one or more times
# match 'snake' 'ssnake' 'sssssssnake'
if ($str =~ m{s+nake}) { warn "plus sign"; }

# ? quantifier, previous item is optional
# match 'dog' or 'dogs'
if ($str =~ m{dogs?}) { warn "question"; }

# {} quantifier, match previous item 3 <= qty <= 5 times
# match 'fffast', 'ffffast', or 'fffffast'
if ($str =~ m{f{3,5}ast}) { warn "curly brace"; }

# ^ position marker, matches beginning of string
# match 'Dear' only if it occurs at start of string
if ($str =~ m{^Dear}) { warn "caret"; }

# $ position marker, matches end of string
# match 'John' only if it occurs at end of string
if ($str =~ m{John$}) { warn "dollar"; }

> alt at ...
> group matched, captured 'goodday' at ...
> period at ...
> class at ...
> asterisk at ...
> plus sign at ...
> question at ...
> curly brace at ...
> caret at ...
> dollar at ...

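The asterisk example always succeeds because z* is satisfied by zero z's at the very start of any string. The following sketch uses an invented word. It captures what z* actually matched and compares it with z+, which requires at least one z.

```perl
use strict;
use warnings;

my $word = "hello";

my $star     = ($word =~ m{(z*)}) ? 1 : 0;   # succeeds: zero z's is enough
my $captured = $1;                            # the empty string ""
my $plus     = ($word =~ m{z+})   ? 1 : 0;   # fails: needs at least one z

print "star: $star, captured: '$captured', plus: $plus\n";
# prints "star: 1, captured: '', plus: 0"
```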
20.5 Capturing and Clustering Parentheses

Normal parentheses will both cluster and capture the pattern they
contain. Clustering affects the order of evaluation similar to the way
parentheses affect the order of evaluation within a mathematical
expression. Normally, multiplication has a higher precedence than
addition. The expression "2 + 3 * 4" does the multiplication first and
then the addition, yielding the result of "14". The expression
"(2 + 3) * 4" forces the addition to occur first, yielding the result
of "20".

Clustering parentheses work in the same fashion. The pattern {cats?}
will apply the "?" quantifier to the letter "s", matching either "cat"
or "cats". The pattern {(cats)?} will apply the "?" quantifier to the
entire pattern within the parentheses, matching "cats" or the empty
string.

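The difference is easiest to see with the ^ and $ anchors from 20.4, which force each pattern to account for the whole string. The following sketch tests three invented strings against both patterns. The %result hash exists only to record the outcomes.

```perl
use strict;
use warnings;

my %result;
foreach my $w ("cat", "cats", "") {
    # "?" applies only to the "s"
    my $s_optional     = ($w =~ m{^cats?$})   ? "yes" : "no";
    # "?" applies to the whole group "cats"
    my $group_optional = ($w =~ m{^(cats)?$}) ? "yes" : "no";
    $result{$w} = "$s_optional/$group_optional";
    print "'$w': cats? $s_optional, (cats)? $group_optional\n";
}
# 'cat': cats? yes, (cats)? no
# 'cats': cats? yes, (cats)? yes
# '': cats? no, (cats)? yes
```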
20.5.1 $1, $2, $3, etc. Capturing parentheses

Normal parentheses also capture the part of the string that matched the
pattern within the parentheses. The captured values are accessible
through some "magical" variables called $1, $2, $3, ... Each left
parenthesis increments the number used to access the captured string.
The left parentheses are counted from left to right as they occur
within the pattern, starting at 1.

my $test = "Firstname: John Lastname: Smith";
############################################
#                     $1              $2
$test =~ m{Firstname: (\w+) Lastname: (\w+)};
my $first = $1;
my $last = $2;
print "Hello, $first $last\n";

> Hello, John Smith

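When parentheses are nested, the numbering still follows the order of the LEFT parentheses. An outer group therefore gets a lower number than the groups inside it. The following sketch uses an invented date string and variable names.

```perl
use strict;
use warnings;

my $stamp = "Date: 2021-12-17";
my ($year_month, $year, $month, $day);

# $1 = outer group, $2 and $3 = the groups inside it, $4 = last group
if ($stamp =~ m{((\d+)-(\d+))-(\d+)}) {
    ($year_month, $year, $month, $day) = ($1, $2, $3, $4);
    print "$year_month | $year | $month | $day\n";
    # prints "2021-12 | 2021 | 12 | 17"
}
```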
Because capturing takes a little extra time to store the captured result
into the $1, $2, ... variables, sometimes you just want to cluster without
the overhead of capturing. In the example below, we want to cluster
"day|bye" so that the alternation symbol "|" chooses between "day" and
"bye". Without the clustering parentheses, the pattern would match
"goodday" or "bye", rather than "goodday" or "goodbye". The pattern
contains capturing parentheses around the entire pattern, so we do not
need to capture the "day|bye" part of the pattern. Therefore we use
cluster-only parentheses.

if ($str =~ m{(good(?:day|bye))}) {
    warn "group matched, captured '$1'";
}

Cluster-only parentheses don't capture the enclosed pattern, and they
don't count when determining which magic variable, $1, $2, $3 ..., will
contain the values from the capturing parentheses.

my $test = 'goodday John';
##########################################
#              $1                $2
if ($test =~ m{(good(?:day|bye)) (\w+)}) {
    print "You said $1 to $2\n";
}

> You said goodday to John

20.5.2 Capturing parentheses not capturing

If a regular expression containing capturing parentheses does not match
the string, the magic variables $1, $2, $3, etc. are NOT cleared. They
retain whatever value they had from the last SUCCESSFUL match. This means
that you MUST check that the regular expression matched BEFORE you use
the $1, $2, $3, etc. variables.

In the example below, the second regular expression does not match,
so $1 retains its old value of 'be'. Perl does not print something like
"Name is Horatio", nor does it print "Name is" with a warning about an
undefined value. Instead, it keeps the old value of $1 and prints
"Name is 'be'".

my $string1 = 'To be, or not to be';
$string1 =~ m{not to (\w+)}; # matches, $1='be'
warn "The question is to $1";

my $string2 = 'that is the question';
$string2 =~ m{I knew him once, (\w+)}; # no match
warn "Name is '$1'";
# no match, so $1 retains its old value 'be'

> The question is to be at ./script.pl line 7.
> Name is 'be' at ./script.pl line 11.

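The following sketch shows two safe patterns, reusing the strings from the example above. One option is to read $1 only inside an if() that tested the match. The other is to assign the match to a list, which leaves the variable undefined on a failed match instead of picking up a stale value. The $name and $name2 variables are added for illustration.

```perl
use strict;
use warnings;

my $string1 = 'To be, or not to be';
$string1 =~ m{not to (\w+)};          # matches, $1 = 'be'

my $string2 = 'that is the question';

# Option 1: only read $1 after a successful match.
my $name;
if ($string2 =~ m{I knew him once, (\w+)}) {
    $name = $1;
}

# Option 2: list assignment. A failed match returns an empty list,
# so $name2 stays undefined rather than picking up the old 'be'.
my ($name2) = $string2 =~ m{I knew him once, (\w+)};

print defined $name  ? "Name is '$name'\n"  : "No name found\n";
print defined $name2 ? "Name is '$name2'\n" : "No name found\n";
```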
20.6 Character Classes

The "." metacharacter will match any single character (except "\n",
unless the /s modifier is used). This is nearly equivalent to a
character class that includes every possible character. You can easily
define smaller character classes of your own using the square brackets
[]. Whatever characters are listed within the square brackets are part
of that character class. Perl will then match any one character within
that class.

[aeiouAEIOU]   any vowel
[0123456789]   any digit

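The following short sketch uses the vowel class with perl's grep to keep only the words (invented here) that contain at least one vowel.

```perl
use strict;
use warnings;

my @words = ("rhythm", "Apple", "sky", "queue");

# m{} with no =~ tests $_, which grep sets to each element in turn
my @has_vowel = grep { m{[aeiouAEIOU]} } @words;

print "@has_vowel\n";   # prints "Apple queue"
```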
20.6.1 Metacharacters Within Character Classes

Within the square brackets used to define a character class, most of the
previously defined metacharacters cease to act as metacharacters and are
interpreted as simple literal characters. Character classes have their
own special metacharacters.

\   (backslash) demeta the next character, i.e. treat it as a literal
    (shortcuts such as \d still work inside a class)
-   (hyphen) indicates a consecutive character range, inclusively.
    [a-f] indicates the letters a,b,c,d,e,f.
    Character ranges are based on the characters' numeric (ASCII) values.
^   If it is the first character of the class, then this indicates the
    class is any character EXCEPT the ones in the square brackets.
    Warning: [^aeiou] means anything but a lowercase vowel. This is not
    the same as "any consonant". The class [^aeiou] will also match
    uppercase vowels, punctuation, numbers, whitespace, and unicode
    characters.

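The following sketch uses invented test characters to make the warning concrete. [^aeiou] accepts an uppercase vowel, a digit, punctuation, and a space. An explicit class of consonant ranges accepts only the consonant. The ^ and $ anchors outside the brackets make each pattern test the whole one-character string.

```perl
use strict;
use warnings;

my @chars = ("b", "E", "7", "!", " ", "a");

# Everything except a lowercase vowel: only "a" is rejected.
my @not_vowel = grep { m{^[^aeiou]$} } @chars;

# Consonants only, spelled out as ranges that skip the vowels.
my @consonant = grep { m{^[b-df-hj-np-tv-zB-DF-HJ-NP-TV-Z]$} } @chars;

print scalar(@not_vowel), " chars match [^aeiou]\n";   # prints "5 chars match [^aeiou]"
print "consonants: @consonant\n";                      # prints "consonants: b"
```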
20.7 Shortcut Character Classes

Perl has shortcut character classes for some more common classes.

shortcut   class           description
\d         [0-9]           any *d*igit
\D         [^0-9]          any NON-digit
\s         [ \t\n\r\f]     any white*s*pace
\S         [^ \t\n\r\f]    any NON-whitespace
\w         [a-zA-Z0-9_]    any *w*ord character (a character allowed in a perl identifier)
\W         [^a-zA-Z0-9_]   any NON-word character

The bracketed equivalents are exact only for plain ASCII text. On Unicode
strings, \d, \s, and \w also match other Unicode digits, whitespace, and
word characters, and the /a modifier restricts them to ASCII. (In perl
5.18 and later, \s also matches the vertical tab.)

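The following sketch puts the shortcuts to work on an invented log line. \w captures a user name that includes an underscore and digits, \d pulls out the numbers, and \s+ splits on runs of whitespace.

```perl
use strict;
use warnings;

my $log = "user_42 logged in at 10:15";

my ($user)       = $log =~ m{^(\w+)};        # "user_42": letters, digits, "_"
my ($hour, $min) = $log =~ m{(\d+):(\d+)};   # "10" and "15"
my @fields       = split /\s+/, $log;        # 5 whitespace-separated fields

print "$user $hour $min ", scalar(@fields), "\n";   # prints "user_42 10 15 5"
```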
20.8 Greedy (Maximal) Quantifiers

Quantifiers are used within regular expressions to indicate how many
times the preceding item (a character, character class, or group) must
occur. By default, quantifiers are "greedy" or "maximal", meaning that
they match as many characters as possible while still allowing the
whole pattern to match.

*

match zero or more times (match as much as possible)

+

match one or more times (match as much as possible)

?

match zero or one time (match as much as possible)

{count}

match exactly "count" times

{min,}

match at least "min" times (match as much as possible)

{min,max}

match at least "min" and at most "max" times (match as much as possible)

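A minimal sketch of greedy matching on made-up strings. Each capture shows a quantifier taking as much as it can while the overall pattern still succeeds:

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# .* is greedy: it runs on to the LAST ">" in the string,
# not the first one
my ($greedy) = $html =~ m{(<.*>)};
print "$greedy\n";    # prints "<b>bold</b> and <i>italic</i>"

# + is greedy: a+ takes every "a" in the run
my ($run) = 'caaat' =~ m{(a+)};
print "$run\n";       # prints "aaa"

# {min,max} is greedy: \d{2,4} takes 4 digits when 4 are available
my ($digits) = '123456' =~ m{(\d{2,4})};
print "$digits\n";    # prints "1234"

# ? matches zero or one time, so "colou?r" accepts both spellings
my @hits = grep { m{^colou?r$} } qw(color colour colouur);
print "@hits\n";      # prints "color colour"
```

The first case is the classic surprise. A greedy .* skips past the first ">" and captures the entire string, because backtracking only gives back as few characters as needed.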
20.10 Position Assertions / Position Anchors

Inside a regular expression pattern, some symbols do not translate into
a character or character class. Instead, they translate into a
"position" within the string. An anchor consumes no characters. It only
requires that the parts of the pattern before and after it meet at that
particular position in the string.

^

Matches the beginning of the string.

If the /m (multiline) modifier is present, also matches just after any
"\n" that is not the last character of the string (that is, at the
start of every line).

$

Matches the end of the string, or just before a "\n" that is the last
character of the string.

If the /m (multiline) modifier is present, also matches just before any
"\n" (that is, at the end of every line).

\A

Matches the beginning of the string only. Not affected by the /m modifier.

\z

Matches the end of the string only. Not affected by the /m modifier.

\Z

Matches the end of the string, or just before a "\n" if that is the
last character in the string. The "\n" is not removed. \Z simply
matches where the string would end after a chomp().

\b

word "b"oundary

A word boundary occurs in four places.

1) at a transition from a \w character to a \W character

2) at a transition from a \W character to a \w character

3) at the beginning of the string, if the first character is a \w character

4) at the end of the string, if the last character is a \w character

\B

NOT \b: matches at any position that is not a word boundary.

\G

Usually used with the /g modifier. You probably want the /c modifier
too, so that a failed match does not reset the position.

Matches the position where the previous m//g match on the string left
off, that is, just after the last character it matched. If no m//g
match has yet been performed on the string, \G matches the beginning of
the string. Use the pos() function to get and set the current \G
position within the string.

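The sketch below exercises these anchors on made-up strings. Part 1 compares ^, $, \A, \z, \Z and \b with and without /m. Part 2 uses a hypothetical tokenize() with an assumed token format (numbers, words, and the operators = + - * /) to show how \G and /gc let each match start exactly where the previous one stopped. It is an illustration, not a complete lexer.

```perl
use strict;
use warnings;

# ---- Part 1: ^, $, \A, \z, \Z and \b on a two-line string ----
my $text = "first line\nsecond line\n";

# Without /m, ^ matches only at the very start of the string
my @starts_plain = $text =~ m{^(\w+)}g;        # ("first")

# With /m, ^ also matches just after each embedded "\n"
my @starts_multi = $text =~ m{^(\w+)}mg;       # ("first", "second")

# \A ignores /m and still matches only at the very start
my @starts_a = $text =~ m{\A(\w+)}mg;          # ("first")

# $ and \Z may match just before the final "\n"; \z may not
my $dollar_ok  = ($text =~ m{line$})  ? 1 : 0; # 1
my $big_z_ok   = ($text =~ m{line\Z}) ? 1 : 0; # 1
my $small_z_ok = ($text =~ m{line\z}) ? 1 : 0; # 0

# \b at the start of a string needs a \w character there
my $b_word  = ('hello'  =~ m{^\b}) ? 1 : 0;    # 1
my $b_punct = ('!hello' =~ m{^\b}) ? 1 : 0;    # 0

# ---- Part 2: \G with /gc as a tiny (hypothetical) tokenizer ----
sub tokenize {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;                           # \G starts at offset 0
    while (pos($input) < length($input)) {
        # /c keeps pos() in place when an alternative fails,
        # so the next elsif tries again from the same spot
        if    ($input =~ m{\G\s+}gc)        { next }
        elsif ($input =~ m{\G(\d+)}gc)      { push @tokens, "NUM:$1" }
        elsif ($input =~ m{\G(\w+)}gc)      { push @tokens, "WORD:$1" }
        elsif ($input =~ m{\G([=+\-*/])}gc) { push @tokens, "OP:$1" }
        else  { die "unexpected character at position ", pos($input), "\n" }
    }
    return @tokens;
}

my @tokens = tokenize('x = 42 + y');
print "@tokens\n";    # prints "WORD:x OP:= NUM:42 OP:+ WORD:y"
```

Part 2 relies on /c. Without it, the first failed alternative (for example \G\s+ at "x") would reset pos() to undef. The next \G would then match at the start of the string again instead of at the current offset.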
20.10.1 The \b Anchor

Use the \b anchor when you want to match a pattern as a whole word
rather than as part of a longer word. This example matches "jump" but
not "jumprope":

my $test1='He can jump very high.';
if($test1=~m{\bjump\b})
{ print "test1 matches\n"; }
