Regex Character Classes
Summary: in this tutorial, you’ll learn about the regex character classes and how to create regular expressions with patterns that match a set of characters.
A character class is a set of characters, for example, letters, digits, or whitespace characters.
A character class lets you write a regular expression whose pattern matches any single character that belongs to the set.
Note that a character class is also known as a character set.
The digit character class
The \d character class matches any single digit from 0 to 9. The following example uses it to match each digit in a phone number:
$pattern = '/\d/';
$phone = '(650)-543-2100';
if (preg_match_all($pattern, $phone, $matches)) {
    print_r($matches[0]);
}
Output:
Array
(
[0] => 6
[1] => 5
[2] => 0
[3] => 5
[4] => 4
[5] => 3
[6] => 2
[7] => 1
[8] => 0
[9] => 0
)
In this example, the preg_match_all() function finds 10 matches, one for each digit in the phone number, and returns 10.
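Because preg_match_all() returns the number of matches, you can also use the digit character class just to count digits. The following is a minimal sketch assuming PHP 8. It wraps the call in countDigits(), a hypothetical helper that is not part of PHP, and it omits the $matches argument because only the count is needed.

```php
<?php
// countDigits() is a hypothetical helper for illustration, not a PHP built-in.
// preg_match_all() returns how many times \d matched, i.e. the number of digits.
function countDigits(string $str): int
{
    return preg_match_all('/\d/', $str);
}

echo countDigits('(650)-543-2100'), PHP_EOL; // 10
echo countDigits('PHP 8.0'), PHP_EOL;        // 2
echo countDigits('no digits'), PHP_EOL;      // 0
```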
The word character class
The \w character class matches any single word character: an ASCII letter, a digit, or the underscore (_).
The following example uses the word character class to match every word character in a string:
$pattern = '/\w/';
$str = 'PHP 8.0';
if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]);
}
Output:
Array
(
[0] => P
[1] => H
[2] => P
[3] => 8
[4] => 0
)
Notice that the regular expression /\w/ doesn't match the space or the dot (.).
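To see that the underscore counts as a word character while other punctuation does not, you can join the matches back into a string. The following is a minimal sketch. The sample string 'user_name-42' is an assumption chosen for illustration.

```php
<?php
// Sample input chosen for illustration: letters, an underscore, a hyphen, digits.
$str = 'user_name-42';

// Collect every single word character (\w) in the string.
preg_match_all('/\w/', $str, $matches);

// The hyphen is not a word character, so it is the only character dropped.
echo implode('', $matches[0]), PHP_EOL; // user_name42
```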
The whitespace character class
The \s character class matches any single whitespace character, which is one of the following:
- " " (ASCII 32, 0x20), an ordinary space.
- "\t" (ASCII 9, 0x09), a tab.
- "\n" (ASCII 10, 0x0A), a new line (line feed).
- "\r" (ASCII 13, 0x0D), a carriage return.
- "\v" (ASCII 11, 0x0B), a vertical tab.
- "\f" (ASCII 12, 0x0C), a form feed.
Note that \s does not match the NUL byte ("\0").
The following example uses the whitespace character class to count the whitespace characters in a string:
$pattern = '/\s/';
$str = 'PHP version 8.0';
echo preg_match_all($pattern, $str, $matches);
It outputs 2 because the string contains two spaces.
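The whitespace class matches more than the ordinary space. The following sketch puts a space, a tab, a newline, a carriage return, a vertical tab, a form feed, and a NUL byte into one string, separated by the letter x. It then prints the ASCII codes of the characters that \s matched. It assumes PHP 7.3 or later, whose PCRE2 engine includes the vertical tab in \s.

```php
<?php
// A space, tab, newline, carriage return, vertical tab, form feed,
// and NUL byte, separated by 'x' (which \s does not match).
$chars = [' ', "\t", "\n", "\r", "\v", "\f", "\0"];
$str = implode('x', $chars);

$count = preg_match_all('/\s/', $str, $matches);

echo $count, PHP_EOL; // 6: every character except the NUL byte
// Print the ASCII code of each matched character.
echo implode(',', array_map('ord', $matches[0])), PHP_EOL; // 32,9,10,13,11,12
```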
Inverse character classes
Each of these character classes has an inverse class written with the same letter in uppercase:
- \D is the inverse of \d; it matches any character except a digit.
- \S is the inverse of \s; it matches any character except a whitespace character.
- \W is the inverse of \w; it matches any character except a word character.
The following example uses the \D character class to match every character except digits:
$pattern = '/\D/';
$phone = '(650)-543-2100';
if (preg_match_all($pattern, $phone, $matches)) {
    print_r($matches[0]);
}
Output:
Array
(
[0] => (
[1] => )
[2] => -
[3] => -
)
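The inverse classes are handy for cleaning up input. The following sketch uses preg_replace() with \D to strip everything except the digits from the phone number. It then counts the \S and \W matches in a sample string. The variable names are only illustrative.

```php
<?php
$phone = '(650)-543-2100';

// Replace every non-digit (\D) with an empty string, keeping only digits.
$digits = preg_replace('/\D/', '', $phone);
echo $digits, PHP_EOL; // 6505432100

$str = 'PHP version 8.0';

// \S matches the 13 characters that are not spaces.
$nonSpace = preg_match_all('/\S/', $str);
echo $nonSpace, PHP_EOL; // 13

// \W matches the two spaces and the dot.
$nonWord = preg_match_all('/\W/', $str);
echo $nonWord, PHP_EOL; // 3
```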
The dot (.) character class
The dot (.) is a special character class that matches any single character except a newline (\n).
The following example uses the dot (.) character class to match every character except the newline:
$pattern = '/./';
$str = "PHP\n";
if (preg_match_all($pattern, $str, $matches)) {
    print_r($matches[0]);
}
Output:
Array
(
[0] => P
[1] => H
[2] => P
)
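If you need the dot to match newlines as well, PHP's PCRE functions support the s (DOTALL) pattern modifier. The following short sketch compares the two patterns on the same string.

```php
<?php
$str = "PHP\n";

// Default behavior: . matches P, H, and P but not the trailing newline.
$withoutS = preg_match_all('/./', $str);
echo $withoutS, PHP_EOL; // 3

// With the s modifier, . also matches the newline.
$withS = preg_match_all('/./s', $str);
echo $withS, PHP_EOL; // 4
```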
Summary
- Use the \d character class to match any single digit.
- Use the \w character class to match any single word character.
- Use the \s character class to match any single whitespace character.
- The \D, \W, and \S character classes are the inverse sets of the \d, \w, and \s character classes.
- Use the dot character class (.) to match any character except a newline.