Python Regex Lookahead
Summary: in this tutorial, you’ll learn about Python regex lookahead and negative lookahead.
Introduction to the Python regex lookahead
Sometimes, you want to match X but only if it is followed by Y. In this case, you can use the lookahead in regular expressions.
The syntax of the lookahead is as follows:
X(?=Y)
This syntax matches X only if it is immediately followed by Y. The Y part is only checked; it is not included in the match.
For example, suppose you have the following string:
'1 Python is about 4 feet long'
And you want to match the number (4) that is followed by a space and the literal string feet, not the number 1. In this case, you can use the following pattern that contains a lookahead:
\d+(?=\s*feet)
In this pattern:
- \d+ is the combination of the digit character set and the + quantifier; it matches one or more digits.
- ?= is the lookahead syntax.
- \s* is the combination of the whitespace character set and the * quantifier; it matches zero or more whitespace characters.
- feet matches the literal string feet.
The following code uses the above pattern to match the number that is followed by zero or more spaces and the literal string feet:
import re

s = '1 Python is about 4 feet long'
# Raw string, so the backslashes reach the regex engine unchanged
pattern = r'\d+(?=\s*feet)'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
4
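To see that the lookahead only checks the text after the number without consuming it, the following minimal sketch compares the lookahead pattern with a plain pattern that has no lookahead. The variable names `with_lookahead` and `without_lookahead` are illustrative only:

```python
import re

s = '1 Python is about 4 feet long'

# With a lookahead, ' feet' is checked but is not part of the match.
with_lookahead = re.search(r'\d+(?=\s*feet)', s)
print(with_lookahead.group())  # 4
print(with_lookahead.span())   # (18, 19)

# Without a lookahead, ' feet' is consumed and becomes part of the match.
without_lookahead = re.search(r'\d+\s*feet', s)
print(without_lookahead.group())  # 4 feet
print(without_lookahead.span())   # (18, 24)
```

The span of the lookahead match covers only the digit, so anything you do with the match (such as replacing it) leaves the word feet untouched.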
Regex multiple lookaheads
Regex allows you to have multiple lookaheads with the following syntax:
X(?=Y)(?=Z)
In this syntax, the regex engine will perform the following steps:
- Find X.
- Test if Y is immediately after X; skip if it isn’t.
- Test if Z is also immediately after X; skip if it isn’t.
- If both tests pass, X is a match; otherwise, search for the next match.
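So the X(?=Y)(?=Z) pattern matches X only if both Y and Z match at the position right after X. Lookaheads do not consume any text, so both are tested from the same position.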
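As an illustration of multiple lookaheads, the following sketch matches a number only if it is followed by feet and the word long also appears somewhere later in the string. The second lookahead `(?=.*long)` and the second sample string are assumptions made for this example:

```python
import re

# Both lookaheads are tested at the same position, right after the digits.
pattern = r'\d+(?=\s*feet)(?=.*long)'

s1 = '1 Python is about 4 feet long'
s2 = '1 Python is about 4 feet tall'

result1 = re.findall(pattern, s1)
result2 = re.findall(pattern, s2)

print(result1)  # ['4']
print(result2)  # []
```

In the second string, the first lookahead passes for the number 4 but the second one fails, so the pattern finds no match.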
Regex negative lookaheads
Suppose you want to match only the number 1 in the following text but not the number 4:
'1 Python is about 4 feet long'
To do that, you can use the negative lookahead syntax:
X(?!Y)
The X(?!Y) pattern matches X only if it is not followed by Y. In this example, X is \d+ and Y is \s*feet, so the pattern matches a number that is not followed by the literal string feet:
import re

s = '1 Python is about 4 feet long'
# Raw string, so the backslashes reach the regex engine unchanged
pattern = r'\d+(?!\s*feet)'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
1
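Note that this pattern works here because each number has a single digit. With multi-digit numbers, the regex engine can backtrack and match only part of a number. The following sketch shows this behavior and one possible way to avoid it; the sample string and the adjusted pattern `\d+(?!\d|\s*feet)` are assumptions made for this example, not the only fix:

```python
import re

s = '12 Pythons are about 40 feet long'

# Naive pattern: \d+ backtracks from '40' to '4', and '4' is followed
# by '0 feet', which does not match \s*feet, so '4' is accepted.
naive = re.findall(r'\d+(?!\s*feet)', s)
print(naive)  # ['12', '4']

# Adjusted pattern: also reject a position that is followed by another
# digit, so the engine cannot stop in the middle of a number.
strict = re.findall(r'\d+(?!\d|\s*feet)', s)
print(strict)  # ['12']
```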
Summary
- Use the Python regex lookahead X(?=Y) to match X only if it is followed by Y.
- Use the negative regex lookahead X(?!Y) to match X only if it is not followed by Y.