Regex Capturing Groups

Summary: in this tutorial, you will learn how to use regex capturing groups to group and capture parts of a match.

Introduction to regex capturing groups

Suppose you have a URI with the following format:

'posts/25'

The URI has a resource name (posts) and id (25). The resource name is a string, while the resource id is an integer.

To match this URI, you can use the following pattern:

\w+/\d+

The following describes the pattern:

  • \w+ – matches one or more word characters (letters, digits, or underscores).
  • / – matches the forward slash (/).
  • \d+ – matches one or more digits.

Since the pattern contains a forward slash (/), it's more readable to use curly braces as the delimiters of the regular expression, because the slash then needs no escaping:

'{\w+/\d+}'
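To make the delimiter choice concrete, here is a minimal sketch that writes the same pattern with three delimiters. The variable names are illustrative and not part of the tutorial's code; the only point is that the slash must be escaped when the slash itself is the delimiter.

```php
<?php

$uri = 'posts/25';

// With slash delimiters, the slash inside the pattern must be escaped.
$withSlashes = '/\w+\/\d+/';

// With curly braces (or another character such as #), no escaping is needed.
$withBraces = '{\w+/\d+}';
$withHashes = '#\w+/\d+#';

$results = [];
foreach ([$withSlashes, $withBraces, $withHashes] as $pattern) {
    $results[] = preg_match($pattern, $uri);
}

var_dump($results); // each call returns 1 (one match)
```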

The following uses the preg_match() function to match the URI:

<?php

$uri = 'posts/25';
$pattern = '{\w+/\d+}';

if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}

Here’s the output:

Array
(
[0] => posts/25
)
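Note that the pattern is not anchored, so preg_match() finds it anywhere in the subject string. The following sketch compares it with an anchored variant that uses ^ and $, assuming you want the whole URI to match. The sample subject string is made up for illustration.

```php
<?php

$unanchored = '{\w+/\d+}';
$anchored   = '{^\w+/\d+$}';

// Hypothetical input with extra text around the URI.
$subject = 'see posts/25?draft';

$loose  = preg_match($unanchored, $subject, $m); // 1, $m[0] is 'posts/25'
$strict = preg_match($anchored, $subject);       // 0, extra text is rejected
$exact  = preg_match($anchored, 'posts/25');     // 1

var_dump($loose, $m[0], $strict, $exact);
```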

To get the id from the URI, you can use a capturing group.

A capturing group allows you to get a part of the match as a separate item in the result array.

To create a capturing group, you place part of the pattern in parentheses (...). For example, to capture the id from the URI above, you can use the following regular expression with a capturing group that captures the \d+ part:

'{\w+/(\d+)}'

The following shows the updated code with the capturing group:

<?php

$uri = 'posts/25';
$pattern = '{\w+/(\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}

Output:

Array
(
[0] => posts/25
[1] => 25
)
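The captured id is a string, even though it holds only digits. As a small sketch, assuming you need the id as an integer (for example, to look up a record), you can cast it explicitly:

```php
<?php

$uri = 'posts/25';

if (preg_match('{\w+/(\d+)}', $uri, $matches)) {
    // preg_match() always captures strings, so convert explicitly.
    $id = (int) $matches[1];
    var_dump($matches[1]); // string(2) "25"
    var_dump($id);         // int(25)
}
```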

The $matches array now includes both the full match and the captured group. You can also use multiple capturing groups, like this:

<?php

$uri = 'posts/25';
$pattern = '{(\w+)/(\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}

Output:

Array
(
[0] => posts/25
[1] => posts
[2] => 25
)
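Because the groups are numbered from left to right starting at 1, one possible way to read them is array destructuring. This is a sketch rather than the tutorial's own approach, and the variable names $resource and $id are chosen here for illustration:

```php
<?php

$uri = 'posts/25';

if (preg_match('{(\w+)/(\d+)}', $uri, $matches)) {
    // Skip index 0 (the full match) and unpack the two groups.
    [, $resource, $id] = $matches;
    echo "$resource #$id", PHP_EOL; // posts #25
}
```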

Regex named groups

You can put the ?<name> syntax immediately after the opening parenthesis to name a capturing group. For example:

<?php

$uri = 'posts/25';
$pattern = '{(?<controller>\w+)/(?<id>\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}

Output:

Array
(
[0] => posts/25
[controller] => posts
[1] => posts
[id] => 25
[2] => 25
)

In this example, we assign the first part of the URI the name controller and the second part the name id.
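With named groups, you can read a value by its name instead of its position. The sketch below also shows the (?P<name>...) spelling, which PCRE accepts as an equivalent of (?<name>...); the $seen array is only there to collect the results for display.

```php
<?php

$uri = 'posts/25';

// Two equivalent spellings of the same named groups.
$patterns = [
    '{(?<controller>\w+)/(?<id>\d+)}',
    '{(?P<controller>\w+)/(?P<id>\d+)}',
];

$seen = [];
foreach ($patterns as $pattern) {
    if (preg_match($pattern, $uri, $matches)) {
        // Read the groups by name instead of by position.
        $seen[] = $matches['controller'] . ':' . $matches['id'];
    }
}

print_r($seen);
```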

To get only controller and id from the $matches array, you can pass the $matches array to the array_filter() function like this:

<?php

$uri = 'posts/25';
$pattern = '{(?<controller>\w+)/(?<id>\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    $parts = array_filter($matches, fn($key) => is_string($key), ARRAY_FILTER_USE_KEY);
    print_r($parts);
}

Output:

Array
(
[controller] => posts
[id] => 25
)

Note that PHP MVC frameworks often use this technique to resolve a URI to a controller and its parameters.
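To illustrate the idea, here is a deliberately tiny routing sketch. It is not how any particular framework works: the $routes table, the dispatch() function, and the handlers are all hypothetical names invented for this example.

```php
<?php

// Hypothetical route table: pattern => handler. Names are illustrative only.
$routes = [
    '{^(?<controller>\w+)/(?<id>\d+)$}' => fn(array $p) => "show {$p['controller']} {$p['id']}",
    '{^(?<controller>\w+)$}'            => fn(array $p) => "list {$p['controller']}",
];

function dispatch(array $routes, string $uri): ?string
{
    foreach ($routes as $pattern => $handler) {
        if (preg_match($pattern, $uri, $matches)) {
            // Keep only the named groups and hand them to the handler.
            $params = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
            return $handler($params);
        }
    }
    return null; // no route matched
}

echo dispatch($routes, 'posts/25'), PHP_EOL; // show posts 25
echo dispatch($routes, 'posts'), PHP_EOL;    // list posts
```

The patterns in this sketch are anchored with ^ and $ so that a route only matches the complete URI.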

More regex capturing group examples

Suppose you need to match the following pattern:

controller/year/month/day

And you want to capture the controller, year, month, and day.

To do that, you can give each capturing group a name, as in the following pattern:

<?php

// controller/year/month/day
$uri = 'posts/2021/09/12';

$pattern = '{(?<controller>\w+)/(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})}';

if (preg_match($pattern, $uri, $matches)) {
    // keep only the entries with string keys (the named groups)
    $parts = array_filter($matches, fn($key) => is_string($key), ARRAY_FILTER_USE_KEY);
    print_r($parts);
}

Output:

Array
(
[controller] => posts
[year] => 2021
[month] => 09
[day] => 12
)
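The pattern only checks the shape of the date, so a URI such as posts/2021/13/45 would also match. As a sketch, assuming you want to reject impossible dates, you can cast the captured strings to integers and pass them to PHP's built-in checkdate() function:

```php
<?php

$uri = 'posts/2021/09/12';
$pattern = '{(?<controller>\w+)/(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})}';

if (preg_match($pattern, $uri, $matches)) {
    // Captures are strings such as '09', so cast before using them as numbers.
    $year  = (int) $matches['year'];
    $month = (int) $matches['month'];
    $day   = (int) $matches['day'];

    // The regex only checks the shape; checkdate() checks the calendar.
    $valid = checkdate($month, $day, $year);
    var_dump($valid); // bool(true)
}
```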

Reference regex capturing groups in replacement strings

Suppose you have a person's name in first name, last name order (e.g., 'John Doe'), and you want to reformat it in reverse order as 'Doe, John':

$name = 'John Doe'; // turns into 'Doe, John'

To match the name format, you can use the following regular expression:

'{\w+ \w+}'

To capture the first name and last name in the matches array, you can put the \w+ pattern in parentheses:

'{(\w+) (\w+)}'

The preg_replace() function allows you to reference a capturing group by its number using the $n format, where n is the capturing group number.

So in the following pattern:

'{(\w+) (\w+)}'

$1 references the capturing group for the first name, and $2 references the capturing group for the last name.

The following shows how to use the preg_replace() function to swap the first name and last name and place a comma between them:

<?php

$name = 'John Doe';
$pattern = '{(\w+) (\w+)}';

echo preg_replace($pattern, '$2, $1', $name);

Output:

Doe, John
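Two related details are worth a short sketch. First, when a group reference is immediately followed by a digit, the ${n} form keeps the group number unambiguous. Second, replacement strings in preg_replace() take numbered references, so if you prefer named groups, one option is preg_replace_callback(), which passes the match array to a callback. The sample input 'item 7' and the variable names are illustrative.

```php
<?php

// ${1}0 means "group 1, then a literal 0"; $10 would be read as group 10.
$padded = preg_replace('{(\d)}', '${1}0', 'item 7');
echo $padded, PHP_EOL; // item 70

// Named groups are read from the match array inside a callback.
$swapped = preg_replace_callback(
    '{(?<first>\w+) (?<last>\w+)}',
    fn(array $m) => "{$m['last']}, {$m['first']}",
    'John Doe'
);
echo $swapped, PHP_EOL; // Doe, John
```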

Summary

  • Use a regex capturing group to get a part of the match as a separate item in the result array.
  • Put a part of the pattern in parentheses (...) to create a capturing group.
  • Assign a name to a capturing group by putting ?<name> immediately after the opening parenthesis: (?<name>...).
  • Use $n in a replacement string to reference a capturing group, where n is the capturing group number.
