Learning Regex with JavaScript

Regular expressions, or regex for short, are patterns that allow you to describe, match, or parse text. With regex, you can find and replace text, validate input data, extract information, and more. In this article, you will learn the basics of regex syntax, how to create and test regex patterns in JavaScript, and how to use regex methods and features to perform various tasks.

Regex Syntax

A regex pattern is composed of simple characters, such as `abc`, or a combination of simple and special characters, such as `ab*c` or `Chapter (\d+)\.\d*`. The simple characters match themselves in the text, while the special characters have different meanings depending on the context. For example, the `*` after `b` means "0 or more occurrences of the preceding item", so `ab*c` matches `ac`, `abc`, `abbc`, `abbbc`, and so on.
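
As a quick check of the two patterns mentioned above, the following sketch tries `ab*c` and `Chapter (\d+)\.\d*` against a few sample strings. The sample strings and variable names are made up for illustration.

```javascript
// `*` allows zero or more "b" characters between "a" and "c".
const abStarC = /ab*c/;
const samples = ["ac", "abc", "abbbc", "abd"];
const results = samples.map((s) => abStarC.test(s));
console.log(results); // [true, true, true, false]

// The parentheses capture the chapter number; `\.` matches a literal dot.
const chapter = /Chapter (\d+)\.\d*/;
const found = "See Chapter 12.3 for details".match(chapter);
console.log(found[0]); // "Chapter 12.3"
console.log(found[1]); // "12"
```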

There are many special characters in regex, and they can be grouped into different categories, such as:

  • Quantifiers: These specify how many times a character or a group of characters can occur. For example, `+` means "1 or more", `?` means "0 or 1", and `{n,m}` means "between n and m, inclusive". So, `a+b?c{2,4}` matches `acc`, `abcc`, `aabccc`, and `abcccc`, but not `ac`, `abc`, or `abbcc`.
  • Character classes: These define a set of characters, any one of which can match a single character in the text. For example, `\d` means "any digit", `\w` means "any word character" (a letter, digit, or underscore), and `[aeiou]` means "any lowercase vowel". So, `\d\w+\d` matches `1abc2`, `3def4`, and `5ghi6`, and also `123` because digits are word characters, but not `abc`, `12`, or `1-2`.
  • Anchors: These match a position in the text rather than a character. For example, `^` means "the beginning of the input" (or of each line when the `m` flag is set), `$` means "the end of the input" (or of each line with `m`), and `\b` means "a word boundary". So, `^\d+$` matches `123`, `456`, and `789`, but not `abc`, `1a2`, or ` 123 `.
  • Groups and Captures: These allow you to group a part of the pattern and capture the match for later use. For example, `(a|b)c` means "either `a` or `b` followed by `c`", and `(a|b)c\1` means "either `a` or `b` followed by `c` and then the same character that the first group captured". So, `(a|b)c` matches `ac` and `bc`, but not `cc` or `ab`, and `(a|b)c\1` matches `aca` and `bcb`, but not `acc`, `bcc`, or `acb`.
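
The four categories above can be tried out with a small table of checks. This is only a sketch: the `checks` array and its inputs are illustrative, and `test()` reports whether the pattern matches anywhere in the input, not whether it matches the whole input.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)]
const checks = [
  [/a+b?c{2,4}/, "abccc", true],  // quantifiers: three "c" is within 2..4
  [/a+b?c{2,4}/, "ac", false],    // only one "c"
  [/\d\w+\d/, "1abc2", true],     // character classes
  [/\d\w+\d/, "abc", false],      // no digits at all
  [/^\d+$/, "123", true],         // anchors: digits from start to end
  [/^\d+$/, " 123 ", false],      // spaces break the anchored match
  [/(a|b)c\1/, "bcb", true],      // backreference repeats the captured "b"
  [/(a|b)c\1/, "acb", false],     // captured "a", but "b" follows
];

for (const [pattern, input, expected] of checks) {
  const actual = pattern.test(input);
  console.log(`${pattern} on "${input}": ${actual} (expected ${expected})`);
}
```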

Regex has further special characters and features beyond these, such as lookaround assertions (lookahead and lookbehind), named backreferences, and flags.
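
To give a flavor of two of these features, here is a minimal sketch of a lookahead and a named backreference. The sample strings and the group name `quote` are assumptions chosen for illustration.

```javascript
// Lookahead: match digits only when they are followed by "px",
// without including "px" in the match itself.
const pixels = "width: 100px; height: 50%".match(/\d+(?=px)/);
console.log(pixels[0]); // "100"

// Named backreference: \k<quote> must repeat whatever the group
// named "quote" captured, so the closing quote has to match the opening one.
const quoted = /(?<quote>['"]).*?\k<quote>/;
console.log(quoted.test(`say "hi"`)); // true
console.log(quoted.test(`say "hi'`)); // false
```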

Creating and Testing Regex Patterns in JavaScript

There are two ways to create a regex pattern in JavaScript:

  • Using a regex literal, which consists of a pattern enclosed between slashes, as follows:
    const re = /ab+c/;
  • Using the constructor function of the RegExp object, as follows:
    const re = new RegExp("ab+c");

The regex literal is compiled when the script is loaded, so it is the more efficient choice if the pattern is constant. The constructor function compiles the pattern at runtime, so it is more flexible if the pattern is dynamic or not known in advance, for example when it is built from user input.
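
A minimal sketch of the dynamic case follows. The helpers `escapeRegExp` and `makeWordMatcher` are hypothetical names introduced here, not built-in functions; the escaping step assumes the runtime value should be matched literally.

```javascript
// Hypothetical helper: escape characters that are special in a regex,
// so text known only at runtime is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build a case-insensitive whole-word pattern at runtime.
function makeWordMatcher(word) {
  // Inside a string literal each backslash is doubled: "\\b" becomes \b in the pattern.
  return new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
}

const matcher = makeWordMatcher("3.14");
console.log(matcher.source);             // \b3\.14\b
console.log(matcher.test("pi is 3.14")); // true
console.log(matcher.test("pi is 3x14")); // false, the dot was escaped
```

Without the escaping step, the `.` in `3.14` would act as a wildcard and the second test would also succeed.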

To test a regex pattern against a string, you can use the following methods:

  • The `test()` method of the RegExp object, which returns a boolean value indicating whether the pattern matches the string or not. For example:
    const re = /\d+/;
    console.log(re.test("abc")); // false
    console.log(re.test("123")); // true
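  • The `match()` method of the String object. Without the `g` flag, it returns an array containing the first full match followed by its captured groups; with the `g` flag, it returns an array of all full matches. In both cases it returns null if no match is found. For example: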
    const re = /(\w+) (\w+)/;
    console.log("John Doe".match(re)); // ["John Doe", "John", "Doe"]
    console.log("Jane Smith".match(re)); // ["Jane Smith", "Jane", "Smith"]
    console.log("abc".match(re)); // null
  • The `exec()` method of the RegExp object, which returns an array similar to the one from `match()`, or null if there is no match. When the regex has the `g` or `y` flag, `exec()` also updates the `lastIndex` property of the RegExp object, so each call resumes where the previous match ended. This makes it useful for global or sticky matching, where you can iterate over the matches in a loop. For example:
    const re = /(\w+) (\w+)/g;
    let result;
    while ((result = re.exec("John Doe Jane Smith")) !== null) {
      console.log(result); // first ["John Doe", "John", "Doe"], then ["Jane Smith", "Jane", "Smith"]
      console.log(re.lastIndex); // first 8, then 19
    }
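
The `exec()` loop above can also be written with `String.prototype.matchAll()`, a different method from the ones listed, shown here as an alternative sketch. The variable names are illustrative.

```javascript
const text = "John Doe Jane Smith";
const pairPattern = /(\w+) (\w+)/g; // matchAll() requires the g flag

// Each item is a match array like the one exec() returns, including .index.
const pairs = [];
for (const m of text.matchAll(pairPattern)) {
  pairs.push({ first: m[1], last: m[2], index: m.index });
}
console.log(pairs);
// [ { first: "John", last: "Doe", index: 0 },
//   { first: "Jane", last: "Smith", index: 9 } ]

// matchAll() works on a copy of the regex, so lastIndex on the original is untouched.
console.log(pairPattern.lastIndex); // 0
```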

Using Regex Methods and Features in JavaScript

Besides testing a regex pattern, you can also use regex to perform other tasks, such as:

  • Replacing text with the `replace()` method of the String object, which takes a regex pattern and a replacement string or a function as arguments, and returns a new string with the match replaced. Only the first match is replaced unless the pattern has the `g` flag. For example:
    const re = /(\w+) (\w+)/;
    console.log("John Doe".replace(re, "$2, $1")); // Doe, John
    console.log("Jane Smith".replace(re, (match, p1, p2) => p2.toUpperCase() + ", " + p1.toLowerCase())); // SMITH, jane
  • Splitting text with the `split()` method of the String object, which takes a regex pattern as an argument, and returns an array of the substrings separated by the matches. For example:
    const re = /[,;\s]+/;
    console.log("red,green;blue yellow".split(re)); // ["red", "green", "blue", "yellow"]
  • Searching text with the `search()` method of the String object, which takes a regex pattern as an argument, and returns the index of the first match, or -1 if no match is found. For example:
    const re = /\d+/;
    console.log("abc".search(re)); // -1
    console.log("123".search(re)); // 0
    console.log("abc123".search(re)); // 3
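
The following sketch combines these methods on a date-like string and shows how the `g` flag changes what `replace()` does. The sample text and the `isoDate` pattern are assumptions for illustration; the pattern only checks the shape of the text, not whether the date is valid.

```javascript
const messy = "2024-01-15 and 2023-12-31";
const isoDate = /(\d{4})-(\d{2})-(\d{2})/;

// Without the g flag, only the first match is replaced.
const firstOnly = messy.replace(isoDate, "$3/$2/$1");
console.log(firstOnly); // "15/01/2024 and 2023-12-31"

// With the g flag, every match is replaced.
const isoDateAll = new RegExp(isoDate.source, "g");
const allDates = messy.replace(isoDateAll, "$3/$2/$1");
console.log(allDates); // "15/01/2024 and 31/12/2023"

// search() reports where the first match starts.
const position = "due 2024-01-15".search(isoDate);
console.log(position); // 4
```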

You can also use flags to modify the behavior of a regex pattern. Flags are characters that follow the regex pattern, either after the closing slash of the regex literal or in the second argument of the RegExp constructor. The six most commonly used flags in JavaScript are listed below; newer engines also support `d` (match indices) and `v` (Unicode sets).

  • `g` for global matching, which means finding all matches in the string, not just the first one.
  • `i` for case-insensitive matching, which means ignoring the difference between uppercase and lowercase letters.
  • `m` for multiline matching, which means `^` and `$` match at the beginning and end of each line, not just at the beginning and end of the whole string.
  • `s` for dotall matching, which means allowing the dot `.` to match any character, including newline characters.
  • `u` for Unicode matching, which means treating the pattern and the input as sequences of Unicode code points rather than UTF-16 code units.
  • `y` for sticky matching, which means attempting a match only at the exact index given by the `lastIndex` property of the RegExp object, without searching further along the string.

You can use one or more flags together, for example:

const re = /hello/gi;
console.log("Hello world".match(re)); // ["Hello"]
console.log("HELLO WORLD".match(re)); // ["HELLO"]
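
The remaining flags are easiest to understand by comparing the same pattern with and without them. The sketch below does that for `m`, `s`, `u`, and `y`; the sample strings and variable names are illustrative.

```javascript
const lines = "one\ntwo\nthree";

// m: ^ and $ match at the start and end of every line.
const withoutM = lines.match(/^\w+$/g);
const withM = lines.match(/^\w+$/gm);
console.log(withoutM); // null
console.log(withM);    // ["one", "two", "three"]

// s: the dot also matches newline characters.
const dotPlain = /one.two/.test(lines);
const dotAll = /one.two/s.test(lines);
console.log(dotPlain, dotAll); // false true

// u: the dot matches a whole code point, not half of a surrogate pair.
const emoji = "\u{1F600}"; // one code point, two UTF-16 code units
const unitMode = /^.$/.test(emoji);
const codePointMode = /^.$/u.test(emoji);
console.log(unitMode, codePointMode); // false true

// y: the match must start exactly at lastIndex.
const sticky = /two/y;
const atZero = sticky.test(lines); // "two" does not start at index 0
sticky.lastIndex = 4;
const atFour = sticky.test(lines); // "two" starts at index 4
console.log(atZero, atFour); // false true
```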

Conclusion

Regex is a powerful tool for text processing, and JavaScript provides various methods and features to use regex effectively. In this article, you learned the basics of regex syntax, how to create and test regex patterns in JavaScript, and how to use regex methods and features to perform various tasks. You can practice your regex skills with online regex testers. Happy regexing!
