Regex Capturing Groups
Summary: in this tutorial, you will learn how to use regex capturing groups to group and capture parts of a match.
Introduction to regex capturing groups
Suppose you have a URI with the following format:
'posts/25'
The URI has a resource name (posts) and an id (25). The resource name is a string, while the resource id is an integer.
To match this URI, you can use the following pattern:
\w+/\d+
The following describes the pattern:
\w+ – matches one or more word characters (letters, digits, or underscores).
/ – matches the forward slash.
\d+ – matches one or more digits.
Since the pattern contains the forward slash (/), it’s more readable to use curly braces as the delimiters to form the regular expression:
"{\w+/\d+}"
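To see why the delimiter choice matters, here is a minimal sketch that writes the same pattern with both slash and curly-brace delimiters. The variable names $withSlashes and $withBraces are illustrative only.

```php
<?php
$uri = 'posts/25';

// With slash delimiters, the literal slash inside the pattern must be escaped.
$withSlashes = '/\w+\/\d+/';

// With curly-brace delimiters, the slash needs no escaping.
$withBraces = '{\w+/\d+}';

var_dump(preg_match($withSlashes, $uri)); // int(1)
var_dump(preg_match($withBraces, $uri));  // int(1)
```

Both forms match the same text; the curly-brace form simply avoids the backslash before the slash.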
The following uses the preg_match() function to match the URI:
$uri = 'posts/25';
$pattern = '{\w+/\d+}';
if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}
Here’s the output:
Array
(
    [0] => posts/25
)
To get the id from the URI, you can use a capturing group.
A capturing group allows you to get a part of the match as a separate item in the result array.
To create a capturing group, you place part of the pattern in parentheses (...). For example, to capture the id from the URI above, you can use the following regular expression with a capturing group that captures the \d+ part:
'{\w+/(\d+)}'
The following shows the updated code with the capturing group:
$uri = 'posts/25';
$pattern = '{\w+/(\d+)}';
if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}
Output:
Array
(
    [0] => posts/25
    [1] => 25
)
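Since the resource id is meant to be an integer, the captured text usually needs a cast before use. A minimal sketch, where the variable name $id is only illustrative:

```php
<?php
$uri = 'posts/25';
$pattern = '{\w+/(\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    // Captured groups are always strings; cast when an integer is needed.
    $id = (int) $matches[1];
    var_dump($id); // int(25)
}
```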
The $matches array now includes both the whole match and the captured group. You can also have multiple capturing groups, like this:
$uri = 'posts/25';
$pattern = '{(\w+)/(\d+)}';
if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}
Output:
Array
(
    [0] => posts/25
    [1] => posts
    [2] => 25
)
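With several groups, array destructuring is one convenient way to move the captures into variables. The sketch below assumes PHP 7.1 or later for the short list syntax; $resource and $id are illustrative names.

```php
<?php
$uri = 'posts/25';
$pattern = '{(\w+)/(\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    // Skip index 0 (the whole match) and unpack groups 1 and 2.
    [, $resource, $id] = $matches;
    echo "$resource #$id"; // posts #25
}
```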
Regex named groups
You can put the ?<name> syntax immediately after the opening parenthesis to name a capturing group. For example:
$uri = 'posts/25';
$pattern = '{(?<controller>\w+)/(?<id>\d+)}';
if (preg_match($pattern, $uri, $matches)) {
    print_r($matches);
}
Output:
Array
(
    [0] => posts/25
    [controller] => posts
    [1] => posts
    [id] => 25
    [2] => 25
)
In this example, we assign the first part of the URI the name controller and the second part the name id.
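The named keys can be read directly from the $matches array, and the numeric keys remain available. A minimal sketch reusing the same pattern (the variable names $controller and $id are illustrative):

```php
<?php
$uri = 'posts/25';
$pattern = '{(?<controller>\w+)/(?<id>\d+)}';

if (preg_match($pattern, $uri, $matches)) {
    // Named groups can be read by name; the numeric keys still exist.
    $controller = $matches['controller'];
    $id = (int) $matches['id'];
    var_dump($controller === $matches[1]); // bool(true)
}
```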
To get only controller and id from the $matches array, you can pass the $matches array to the array_filter() function like this:
$uri = 'posts/25';
$pattern = '{(?<controller>\w+)/(?<id>\d+)}';
if (preg_match($pattern, $uri, $matches)) {
    $parts = array_filter($matches, fn($key) => is_string($key), ARRAY_FILTER_USE_KEY);
    print_r($parts);
}
Output:
Array
(
    [controller] => posts
    [id] => 25
)
Note that PHP MVC frameworks often use this technique to map a URI to a controller and its parameters.
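As a rough illustration of that idea, the sketch below dispatches a URI to a handler based on the named groups. Everything here is hypothetical: the dispatch() function, the $handlers array, and the added ^ and $ anchors are assumptions for the example and are not taken from any particular framework. It assumes PHP 7.4 or later for arrow functions.

```php
<?php
// Hypothetical handlers keyed by controller name; not from any real framework.
$handlers = [
    'posts' => fn(int $id): string => "Showing post $id",
    'users' => fn(int $id): string => "Showing user $id",
];

function dispatch(string $uri, array $handlers): string
{
    // The ^ and $ anchors are an addition so that the whole URI must match.
    $pattern = '{^(?<controller>\w+)/(?<id>\d+)$}';

    if (!preg_match($pattern, $uri, $matches)) {
        return 'Not found';
    }

    // Look up the handler by the captured controller name.
    $handler = $handlers[$matches['controller']] ?? null;

    return $handler ? $handler((int) $matches['id']) : 'Not found';
}

echo dispatch('posts/25', $handlers); // Showing post 25
```

A URI such as 'comments/1' or 'posts/abc' falls through to 'Not found', either because no handler is registered or because the pattern does not match.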
Another regex capturing groups example
Suppose you need to match the following pattern:
controller/year/month/day
And you want to capture the controller, year, month, and day.
To do that, you can use named capturing groups in a pattern like the following:
// controller/year/month/day
$uri = 'posts/2021/09/12';
$pattern = '{(?<controller>\w+)/(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})}';
if (preg_match($pattern, $uri, $matches)) {
    // keep only the string keys, i.e. the named groups
    $parts = array_filter($matches, fn($key) => is_string($key), ARRAY_FILTER_USE_KEY);
    print_r($parts);
}
Output:
Array
(
    [controller] => posts
    [year] => 2021
    [month] => 09
    [day] => 12
)
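The pattern only checks the number of digits, so a URI such as 'posts/2021/99/99' would also match. One possible follow-up, sketched here as an assumption rather than part of the original example, is to pass the captured values to PHP's built-in checkdate() function; the variable name $isValid is illustrative.

```php
<?php
$uri = 'posts/2021/09/12';
$pattern = '{(?<controller>\w+)/(?<year>\d{4})/(?<month>\d{2})/(?<day>\d{2})}';

if (preg_match($pattern, $uri, $matches)) {
    // \d{2} accepts values such as 99, so check the calendar date separately.
    // checkdate() takes its arguments in month, day, year order.
    $isValid = checkdate(
        (int) $matches['month'],
        (int) $matches['day'],
        (int) $matches['year']
    );
    var_dump($isValid); // bool(true)
}
```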
Reference regex capturing groups in replacement strings
Suppose you have the name of a person in first name, last name order (e.g., 'John Doe') and you want to reformat it in the reverse order, like 'Doe, John':
$name = 'John Doe'; // turns into 'Doe, John'
To match the name format, you can use the following regular expression:
'{\w+ \w+}'
To capture the first name and last name in the matches array, you can put each \w+ pattern in parentheses:
'{(\w+) (\w+)}'
The preg_replace() function allows you to reference a capturing group by its number using the $n format, where n is the capturing group number.
So for the following pattern:
'{(\w+) (\w+)}'
$1 references the capturing group for the first name and $2 references the capturing group for the last name.
The following shows how to use the preg_replace() function to swap the first name and last name and place a comma between them:
$name = 'John Doe';
$pattern = '{(\w+) (\w+)}';
echo preg_replace($pattern, '$2, $1', $name);
Output:
Doe, John
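When the groups are named, one alternative is preg_replace_callback(), whose callback receives the match array and can read the groups by name. This is a sketch of that alternative and not the approach used above; the group names first and last and the variable $result are illustrative, and it assumes PHP 7.4 or later for the arrow function.

```php
<?php
$name = 'John Doe';
$pattern = '{(?<first>\w+) (?<last>\w+)}';

// The callback receives the same kind of array that preg_match() fills.
$result = preg_replace_callback(
    $pattern,
    fn(array $m): string => $m['last'] . ', ' . $m['first'],
    $name
);

echo $result; // Doe, John
```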
Summary
- Use a regex capturing group to get a part of the match as a separate item in the result array.
- Put a part of the pattern in parentheses (...) to create a capturing group.
- Assign a capturing group a name by putting ?<name> immediately after the opening parenthesis: (?<name>...).
- Use $n to reference a capturing group in a replacement string, where n is the capturing group number.