Python Regex Word Boundary
Summary: in this tutorial, you’ll learn how to construct regular expressions that match word boundary positions in a string.
Introduction to the Python regex word boundary
A string has the following positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character (\w).
- Between two characters in the string, if one is a word character (\w) and the other is not (\W, the inverse of the word character set \w).
- After the last character in the string, if the last character is a word character (\w).
The following picture shows the word boundary positions in the string "PYTHON 3!":
In this example, the "PYTHON 3!" string has four word boundary positions:
- Before the letter P (criterion #1)
- After the letter N (criterion #2)
- Before the digit 3 (criterion #2)
- After the digit 3 (criterion #2, because the last character ! is not a word character)
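To check these positions yourself, you can search for the bare \b pattern, which matches an empty string at each word boundary. The following is a minimal sketch; the variable name positions is only illustrative:

```python
import re

s = "PYTHON 3!"

# \b is a zero-width assertion, so each match is an empty string;
# match.start() gives the index of the boundary position.
positions = [match.start() for match in re.finditer(r'\b', s)]

print(positions)  # [0, 6, 7, 8]
```

Index 0 is before P, 6 is after N, 7 is before 3, and 8 is after 3. There is no boundary at the end of the string because the last character ! is not a word character.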
Regular expressions use \b to represent a word boundary. For example, you can use \b to match a whole word with the following pattern:
r'\bword\b'
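The r prefix matters here. In an ordinary Python string literal, '\b' is the backspace character, so the regex engine never sees a word boundary. The short sketch below illustrates the difference, using a made-up sample sentence:

```python
import re

text = "a word in a sentence, not swordfish"

# Raw string: the regex engine receives \b and treats it as a word boundary.
raw_result = re.findall(r'\bword\b', text)

# Ordinary string: '\b' is a backspace character (ASCII 8), so the pattern
# looks for backspace + 'word' + backspace, which is not in the text.
plain_result = re.findall('\bword\b', text)

print(raw_result)    # ['word']
print(plain_result)  # []
```

Note that the raw-string pattern does not match the word inside swordfish, because there is no word boundary between s and w.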
The following example matches the word Python in a string:
import re

s = 'CPython is the implementation of Python in C'
# Without word boundaries, 'Python' also matches inside 'CPython'
matches = re.finditer('Python', s)
for match in matches:
    print(match.group())
It returns two matches, one in the word CPython and another in the word Python.
Python
Python
However, if you use the word boundary \b, the program returns one match:
import re

s = 'CPython is the implementation of Python in C'
matches = re.finditer(r'\bPython\b', s)
for match in matches:
    print(match)  # print the Match object to show where the match occurs
Output:
<re.Match object; span=(33, 39), match='Python'>
In this example, the r'\bPython\b' pattern matches only the whole word Python in the string 'CPython is the implementation of Python in C'. It skips the Python inside CPython because there is no word boundary between C and P.
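If the word to search for is stored in a variable, one way to build the pattern is to wrap it in \b on both sides. The sketch below uses a hypothetical helper named find_whole_word (not part of the re module) and calls re.escape() so that any special characters in the word are treated literally:

```python
import re

def find_whole_word(word, text):
    """Return the (start, end) span of every whole-word match (illustrative helper)."""
    # re.escape() guards against regex metacharacters in the word;
    # the rf prefix keeps \b raw while still interpolating the word.
    pattern = rf'\b{re.escape(word)}\b'
    return [match.span() for match in re.finditer(pattern, text)]

s = 'CPython is the implementation of Python in C'

python_spans = find_whole_word('Python', s)
c_spans = find_whole_word('C', s)

print(python_spans)  # [(33, 39)]
print(c_spans)       # [(43, 44)]
```

This approach assumes the word starts and ends with word characters. If it does not, \b will not behave as a whole-word delimiter at that end.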
Summary
- \b represents a word boundary in a string.
- Use the r'\bword\b' pattern to match word as a whole word.