Regular Expression: Word Boundaries

Created with Sketch.

Regular Expression: Word Boundaries

Summary: in this tutorial, you’ll learn how to use the word boundary in regular expressions.

The (\b) is an anchor like the caret (^) and the dollar sign ($). It matches a position that is called a “word boundary”. The word boundary match is zero-length.

The following three positions are qualified as word boundaries:

  • Before the first character in a string if the first character is a word character.
  • After the last character in a string if the last character is a word character.
  • Between two characters in a string if one is a word character and the other is not.

Simply put, the word boundary \b allows you to carry the match the whole word using a regular expression in the following form:

\bword\b

Code language: JavaScript (javascript)

For example, in the string Hello, JS! the following positions qualify as a word boundary:

The following example returns 'JS' because 'Hello, JS!' matches the regular expression /\bJS\b/:

console.log('Hello, JS!'.match(/\bJS\b/)); // true

Code language: JavaScript (javascript)

Output:

["JS"]

Code language: JavaScript (javascript)

However, the 'Hello, JScript' doesn’t match /\bJS\b/:

console.log('Hello, JSscript!'.match(/\bJS\b/)); // null

Code language: JavaScript (javascript)

Note that without \b, the /JS/ matches both 'Hello, JS' and 'Hello, JScript':

console.log('Hello, JSscript!'.match(/JS/)); // ["JS"]
console.log('Hello, JS!'.match(/JS/)); // ["JS"]

Code language: JavaScript (javascript)

It’s possible to use the word boundary with digits.

For example, the regular expression \b\d\d\d\d\b matches a 4-digit number surrounded by characters different from \w:

console.log('ES 2015'.match(/\b\d\d\d\d\b/));

Code language: JavaScript (javascript)

Output:

["2015"]

Code language: JavaScript (javascript)

The following example uses the word boundary to find the time that has the format hh:mm e.g., 09:15:

let str = 'I start coding JS at 05:30 AM';
let re = /\b\d\d:\d\d\b/;
let result = str.match(re);

console.log(result);

Code language: JavaScript (javascript)

Output:

["05:30"]

Code language: JavaScript (javascript)

It’s important to note that the \b doesn’t work for non-Latin alphabets.

As you have seen so far, the patterns \d\d\d\d and \d\d has been used to match a four-digit or a two-digit number.

It’ll be easier and more flexible if you use quantifiers that will be covered in the quantifiers tutorial. Basically, you can use \d{4} instead of \d\d\d\d, which is much shorter.

Summary

  • The \b anchor matches a word boundary position.

Leave a Reply

Your email address will not be published. Required fields are marked *