Python Regex Lookahead
Summary: in this tutorial, you’ll learn about Python regex lookahead and negative lookahead.
Introduction to the Python regex lookahead
Sometimes, you want to match X but only if it is followed by Y. In this case, you can use the lookahead in regular expressions.
The syntax of the lookahead is as follows:
X(?=Y)
This pattern matches X only if it is immediately followed by Y. Y itself is not part of the match, because a lookahead checks the text without consuming it.
For example, suppose you have the following string:
'1 Python is about 4 feet long'
And you want to match the number (4) that is followed by optional whitespace and the literal string feet, but not the number 1. In this case, you can use the following pattern that contains a lookahead:
\d+(?=\s*feet)
In this pattern:
- \d+ is the combination of the digit character set with the + quantifier that matches one or more digits.
- ?= is the lookahead syntax.
- \s* is the combination of the whitespace character set and the * quantifier that matches zero or more whitespace characters.
- feet matches the literal string feet.
The following code uses the above pattern to match the number that is followed by zero or more spaces and the literal string feet:
import re
s = '1 Python is about 4 feet long'
# Raw string, so the backslashes reach the regex engine unchanged
pattern = r'\d+(?=\s*feet)'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
4
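The output contains only 4 because the lookahead checks for feet without including it in the match. A minimal sketch that makes this visible by comparing the match with and without the lookahead (the variable names are illustrative, not part of the original example):

```python
import re

s = '1 Python is about 4 feet long'

# With a lookahead, "feet" is checked but left out of the match.
with_lookahead = re.search(r'\d+(?=\s*feet)', s)
print(with_lookahead.group())          # 4
print(with_lookahead.span())           # (18, 19)
print(repr(s[with_lookahead.end():]))  # ' feet long'

# Without a lookahead, the same text becomes part of the match.
without_lookahead = re.search(r'\d+\s*feet', s)
print(without_lookahead.group())       # 4 feet
```

The text after the match still starts with " feet", so a later pattern or replacement can still see it.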
Regex multiple lookaheads
Regex allows you to have multiple lookaheads with the following syntax:
X(?=Y)(?=Z)
In this syntax, the regex engine will perform the following steps:
- Find X
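- Test if Y is immediately after X; skip if it isn’t.
- Test if Z is also immediately after X; skip if it isn’t. A lookahead consumes no characters, so this test starts at the same position as the test for Y, not after Y.
- If both tests pass, X is a match; otherwise, search for the next match.
So the X(?=Y)(?=Z) pattern matches X only when the text right after X satisfies both Y and Z at the same time.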
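As a rough illustration of multiple lookaheads, the sketch below reuses the tutorial's sample string. The second lookaheads, (?=.*long) and (?=\s*long), are assumptions added for this example only:

```python
import re

s = '1 Python is about 4 feet long'

# Both lookaheads start right after the digits.
# (?=\s*feet) needs "feet" next; (?=.*long) needs "long" somewhere later.
both_pass = re.findall(r'\d+(?=\s*feet)(?=.*long)', s)
print(both_pass)     # ['4']

# The second lookahead is also tested right after the digits, not after
# "feet", so requiring "long" at that position fails.
second_fails = re.findall(r'\d+(?=\s*feet)(?=\s*long)', s)
print(second_fails)  # []
```

The second call returns an empty list because both lookaheads are checked from the same position, right after the digits.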
Regex negative lookaheads
Suppose you want to match only the number 1 in the following text but not the number 4:
'1 Python is about 4 feet long'
To do that, you can use the negative lookahead syntax:
X(?!Y)
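The X(?!Y) pattern matches X only if it is not followed by Y. In this example, X is \d+ and Y is \s*feet, so the pattern matches a number that is not followed by optional whitespace and the literal string feet: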
import re
s = '1 Python is about 4 feet long'
# Raw string, so the backslashes reach the regex engine unchanged
pattern = r'\d+(?!\s*feet)'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
1
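One caveat: this pattern works here because both numbers have a single digit. With longer numbers, \d+ can backtrack to a shorter run of digits that is followed by another digit rather than by feet, so part of the unwanted number still matches. The sketch below uses a hypothetical sample string, and the extra \d alternative is just one possible way to guard against this:

```python
import re

s = '10 pythons are 40 feet long'

# \d+ can back off to "4", and "4" is followed by "0", not by "feet".
naive = re.findall(r'\d+(?!\s*feet)', s)
print(naive)     # ['10', '4']

# Also rejecting a following digit forces the whole number to be considered.
stricter = re.findall(r'\d+(?!\d|\s*feet)', s)
print(stricter)  # ['10']
```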
Summary
- Use the Python regex lookahead X(?=Y) that matches X only if it is followed by Y.
- Use the negative regex lookahead X(?!Y) that matches X only if it is not followed by Y.