Python Regex Lookbehind
Summary: in this tutorial, you’ll learn about Python regex lookbehind and negative lookbehind.
Introduction to the Python regex lookbehind
In regular expressions, the lookbehind matches an element if there is another specific element before it. The lookbehind has the following syntax:
(?<=Y)X
In this syntax, the pattern matches X only if Y comes immediately before it. Y itself is not included in the match.
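As a minimal sketch of this syntax, the snippet below uses a multi-character Y. The sample string and the USD prefix are made up for illustration and are not part of the tutorial's own example:

```python
import re

# Hypothetical sample text: two amounts with different currency prefixes.
text = 'USD 20, EUR 30'

# Y is 'USD ' and X is \d+; only digits preceded by 'USD ' are matched.
found = re.findall(r'(?<=USD )\d+', text)
print(found)  # ['20']
```

The number 30 is skipped because the text immediately before it is 'EUR ', not 'USD '.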
For example, suppose you have the following string and want to match the number 500, not the number 1:
'1 phone costs $500'
To do that, you can use the following regular expression with a lookbehind:
(?<=\$)\d+
In this pattern:
- The (?<=\$) matches an element only if there is a literal $ before it. Since $ is a special character in regex, we use the backslash character \ to escape it. As a result, the regex engine treats \$ as a regular character $.
- The \d+ matches one or more digits.
The following example uses a regular expression with a lookbehind to match a number that has the $ sign before it:
import re

s = '1 phone costs $500'
# Raw string so that \$ and \d reach the regex engine unchanged
pattern = r'(?<=\$)\d+'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
500
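To illustrate that the lookbehind only checks for the $ sign without including it in the result, here is a small sketch that compares it with a plain pattern using a capturing group. The variable names are illustrative, and the capturing-group pattern is an alternative shown for contrast, not something the example above uses:

```python
import re

s = '1 phone costs $500'

# Lookbehind: the $ must precede the digits but is not part of the match.
lookbehind_match = re.search(r'(?<=\$)\d+', s)
print(lookbehind_match.group())  # 500
print(lookbehind_match.span())   # (15, 18)

# Capturing group: the $ is consumed, so the whole match includes it.
group_match = re.search(r'\$(\d+)', s)
print(group_match.group(0))      # $500
print(group_match.group(1))      # 500
```

Both approaches can extract 500; the lookbehind keeps the whole match free of the $ sign, while the capturing group requires reading group 1.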
Negative lookbehind
The negative lookbehind has the following syntax:
(?<!Y)X
This pattern matches X only if there is no Y immediately before it.
The following example uses a negative lookbehind to match a number that doesn’t have the $ sign before it:
import re

s = '1 phone costs $500'
pattern = r'\b(?<!\$)\d+\b'
matches = re.finditer(pattern, s)
for match in matches:
    print(match.group())
Output:
1
In the regular expression:
r'\b(?<!\$)\d+\b'
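- The \b matches a word boundary. It ensures that the pattern matches whole numbers only, not digits from the middle of a number.
- The (?<!\$) is a negative lookbehind that succeeds only if the character before the current position is not the $ sign.
- The \d+ matches a number with one or more digits.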
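The role of the word boundary can be seen by running the same negative lookbehind with and without \b. This is a minimal sketch that reuses the string from the example above; the variable names are illustrative:

```python
import re

s = '1 phone costs $500'

# Without \b, the engine can start inside 500: the first 0 is preceded
# by 5 rather than $, so the lookbehind succeeds and 00 is matched.
without_boundary = re.findall(r'(?<!\$)\d+', s)
print(without_boundary)  # ['1', '00']

# With \b, a match can only start and end at the edge of a number.
with_boundary = re.findall(r'\b(?<!\$)\d+\b', s)
print(with_boundary)     # ['1']
```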
Summary
- A lookbehind (?<=Y)X matches X only if there is element Y before it.
- A negative lookbehind (?<!Y)X matches X only if there’s no element Y before it.