Regex Backreferences

Summary: in this tutorial, you’ll learn how to use regex backreferences and see some of their practical applications.

Introduction to Regex Backreferences

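When constructing a regular expression, you can put \n within the pattern, where n is the number of a capturing group (for example, \1 or \2). Here n stands for a number, not the newline escape. The \n is a backreference to capturing group n: it matches the same text that the group matched earlier in the pattern.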
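
As a minimal sketch of this idea, the pattern below uses \1 to require the same character that group 1 has just captured. The sample words and variable names are illustrative assumptions, not part of the examples that follow:

```php
<?php

// Group 1 captures a single word character; \1 must then match
// that exact same character again, so the pattern finds doubled letters.
$pattern = '/(\w)\1/';

// Hypothetical sample inputs, chosen only for illustration.
$words = ['book', 'regex', 'coffee'];

foreach ($words as $word) {
    if (preg_match($pattern, $word, $matches)) {
        // $matches[0] is the whole match; $matches[1] is group 1.
        echo $word . ': ' . $matches[0] . PHP_EOL;
    } else {
        echo $word . ': no match' . PHP_EOL;
    }
}
```

With these inputs, 'book' and 'coffee' each contain a doubled letter, while 'regex' does not.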

Regex Backreference examples

Let’s look at some examples of using regex backreferences.

1) Using regex backreferences to find doubled words in text

Suppose you have a text that has some doubled words. For example:

$str = "It's the the PHP 8.0";

In this example, the word 'the' is doubled in the text.

To detect the doubled word, you can use the following regular expression:

'/\b(\w+)\s+\1\b/'

In this regular expression:

  • \b matches a word boundary.
  • (\w+) matches one or more word characters. It’s also a capturing group with the number 1.
  • \s+ matches one or more whitespace characters, such as spaces, tabs, and newlines.
  • \1 is a backreference to capturing group 1, so it matches the same word that the group captured.

Here’s the complete code:

<?php

$str = "It's the the PHP 8.0";
$pattern = '/\b(\w+)\s+\1\b/';

if (preg_match($pattern, $str, $matches)) {
    print_r($matches);
}

Output:

Array
(
    [0] => the the
    [1] => the
)

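
Once a doubled word is detected, the same pattern can be used to remove the duplicate. The following is a minimal sketch that assumes preg_replace() with '$1' as the replacement, so each matched pair collapses to a single copy of the word. The $fixed variable name is illustrative:

```php
<?php

$str = "It's the the PHP 8.0";
$pattern = '/\b(\w+)\s+\1\b/';

// Replace each "word word" pair with a single copy of the word.
// In the replacement string, $1 refers to capturing group 1.
$fixed = preg_replace($pattern, '$1', $str);

echo $fixed . PHP_EOL; // It's the PHP 8.0
```

Note that the pattern is case-sensitive, so a pair such as "The the" is left unchanged.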

2) Using regex backreferences to match text in single or double quotes

Suppose you need to get the text inside double quotes ("), for example:

"text here"

or single quotes:

'text here'

But not text that mixes single and double quotes, like this:

'will not match."

To do that, you can use a backreference, as shown in the following regular expression:

'/([\'"])(.*?)\1/'

In this regular expression:

  • The [\'"] is a character class that matches a single character: either a single quote or a double quote. Since the pattern is written as a single-quoted PHP string, the single quote must be escaped with the backslash character (\).
  • The ([\'"]) creates the first capturing group with group number 1.
  • The (.*?) creates the second capturing group. Its non-greedy quantifier *? matches as few characters (except the newline) as possible.
  • The \1 is a backreference to the first capturing group, so it matches the same kind of quote that opened the text.

Here’s the complete code:

<?php

$messages = [
    'They said: "PHP is awesome"',
    "They said: 'PHP is awesome'",
    'They said: "PHP\'s awesome"'
];

$pattern = '/([\'"])(.*?)\1/';

foreach ($messages as $message) {
    if (preg_match($pattern, $message, $matches)) {
        echo $matches[0] . PHP_EOL;
    }
}

Output:

"PHP is awesome"
'PHP is awesome'
"PHP's awesome"

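
The code above prints the whole match, including the quotes. As a short sketch, the second capturing group can be read to get only the text between the quotes, and the same pattern can be checked against the mixed-quote case. The $mixed variable name is an illustrative assumption:

```php
<?php

$pattern = '/([\'"])(.*?)\1/';

// Group 2 holds the text between the quotes, without the quotes themselves.
if (preg_match($pattern, 'They said: "PHP is awesome"', $matches)) {
    echo $matches[2] . PHP_EOL; // PHP is awesome
}

// Mixed quotes: the opening ' is never closed by another ',
// and the lone " has no partner either, so there is no match.
$mixed = "'will not match.\"";
var_dump(preg_match($pattern, $mixed)); // int(0)
```

Because \1 must repeat whichever quote group 1 captured, a string that opens with one kind of quote and closes with the other is not matched.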

Summary

  • The \n in a pattern is a backreference to capturing group n, where n is an integer greater than zero. It matches the same text that group n captured.
