Regex Backreferences
Summary: in this tutorial, you’ll learn how to use regex backreferences and see some of their practical applications.
Introduction to Regex Backreferences
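When constructing a regular expression, you can put \n within the pattern, where n is a number such as 1 or 2. In this case, \n is a backreference to capturing group number n: it matches the same text that the group matched earlier in the pattern.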
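To make the idea concrete, here is a minimal sketch that is not part of the original tutorial: the pattern and the input string 'hello' are chosen only for illustration. It looks for any word character that is immediately followed by itself.

```php
<?php
// (\w) captures one word character; \1 then requires that same character again.
$pattern = '/(\w)\1/';

if (preg_match($pattern, 'hello', $matches)) {
    echo $matches[0] . PHP_EOL; // ll (the whole match)
    echo $matches[1] . PHP_EOL; // l  (capturing group 1)
}
```

Note that \1 does not mean "any word character again"; it means "exactly the text that group 1 captured".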
Regex Backreference examples
Let’s take some examples of using regex backreferences.
1) Using regex backreferences to find doubled words in the text
Suppose you have a text that has some doubled words. For example:
$str = "It's the the PHP 8.0";
In this example, the word 'the' is doubled in the text.
To detect the doubled word, you can use the following regular expression:
'/\b(\w+)\s+\1\b/'
In this regular expression:
- The \b matches a word boundary.
- The (\w+) matches one or more word characters. It’s also a capturing group with the number 1.
- The \s+ matches one or more whitespace characters, including spaces.
- The \1 is a backreference that references capturing group 1.
Here’s the complete code:
$str = "It's the the PHP 8.0";
$pattern = '/\b(\w+)\s+\1\b/';
if (preg_match($pattern, $str, $matches)) {
    print_r($matches);
}
Output:
Array
(
    [0] => the the
    [1] => the
)
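Detecting the doubled word is only half the job if the goal is to clean up the text. The following is a minimal sketch, assuming you want to keep a single copy of the repeated word. It reuses the same pattern with preg_replace() and the replacement reference $1, which stands for the text captured by group 1. The variable name $cleaned is only illustrative.

```php
<?php
$str = "It's the the PHP 8.0";
$pattern = '/\b(\w+)\s+\1\b/';

// Replace the whole match ("the the") with the text of group 1 ("the").
$cleaned = preg_replace($pattern, '$1', $str);

echo $cleaned . PHP_EOL; // It's the PHP 8.0
```

Inside the pattern the group is referenced as \1, while in the replacement string it is written as $1.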
2) Using regex backreferences to match the text in single & double quotes
Suppose you need to get the text inside double quotes ("), for example:
"text here"
or single quotes:
'text here'
But not mixed between double and single quotes like this:
'will not match."
To do that, you can use the backreferences as shown in the following regular expression:
'/([\'"])(.*?)\1/'
In this regular expression:
- The [\'"] is a character class that matches a single quote or a double quote. Since the pattern is written in a single-quoted PHP string, the single quote must be escaped with the backslash character (\).
- The ([\'"]) creates the first capturing group with group number 1. It captures the opening quote.
- The (.*?) creates the second capturing group. Its quantifier is non-greedy, so it matches as few characters (except the newline) as possible.
- The \1 is a backreference that references the first capturing group. It requires the closing quote to be the same character as the opening quote.
Here’s the complete code:
$messages = [
    'They said: "PHP is awesome"',
    "They said: 'PHP is awesome'",
    'They said: "PHP\'s awesome"'
];
$pattern = '/([\'"])(.*?)\1/';
foreach ($messages as $message) {
    if (preg_match($pattern, $message, $matches)) {
        echo $matches[0] . PHP_EOL;
    }
}
Output:
"PHP is awesome"
'PHP is awesome'
"PHP's awesome"
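Two details of this pattern are worth checking directly: the second capturing group holds the text without the surrounding quotes, and a string with mismatched quotes produces no match. The sketch below is an illustration under those assumptions. The variable names $quoted, $mixed and $found are hypothetical and not part of the original example.

```php
<?php
$pattern = '/([\'"])(.*?)\1/';

// Group 2 holds the text between the quotes.
$quoted = 'They said: "PHP\'s awesome"';
preg_match($pattern, $quoted, $matches);
echo $matches[2] . PHP_EOL; // PHP's awesome

// Mixed quotes: the opening ' has no closing ', and the " has no partner.
$mixed = '\'will not match."';
$found = preg_match($pattern, $mixed);
var_dump($found); // int(0)
```

The single quote inside "PHP's awesome" does not end the match, because \1 only accepts the double quote that opened it.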
Summary
- The \n in the pattern is a backreference that references the capturing group n, where n is an integer greater than zero.