October 15, 2024

Python Regular Expressions (Regex)

Regular expressions, often shortened to regex or regexp, are sequences of characters that define search patterns. They are used to find, extract, or replace text that matches those patterns. Python’s re module provides support for working with regular expressions.

1. Importing the re Module

To work with regular expressions in Python, you need to import the re module:

import re

2. Basic Regex Functions

The re module provides several functions for working with regular expressions:

  • re.match(): Determines if the regex matches at the beginning of the string.
  • re.search(): Scans through a string, looking for any location where the regex matches.
  • re.findall(): Returns a list of all non-overlapping matches of a pattern in a string.
  • re.finditer(): Returns an iterator yielding match objects over all non-overlapping matches.
  • re.sub(): Replaces matches of the pattern with a replacement string (all matches by default).
  • re.split(): Splits the string by the occurrences of the pattern.
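
re.finditer() is the one function in this list without its own subsection, so here is a minimal sketch of it. The sample text and the variable name found are made up for illustration:

```python
import re

text = "There are 2 apples and 15 oranges."

# finditer yields one match object per non-overlapping match,
# so the position of each match is available along with its text.
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(found)
```

Because it returns an iterator, finditer() can be a reasonable choice when you need positions or when the text is large and you do not want every match in memory at once.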

2.1 re.match()

The re.match() function checks for a match only at the beginning of the string:

import re

pattern = r"hello"
text = "hello world"

match = re.match(pattern, text)

if match:
    print("Match found:", match.group())
else:
    print("No match found.")
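
To make the "only at the beginning" rule concrete, the following sketch compares three calls on the same text. The variable names are illustrative; re.fullmatch() is included only for contrast, since re.match() does not require the whole string to match:

```python
import re

text = "hello world"

at_start = re.match(r"hello", text)      # pattern starts at position 0, so it matches
not_at_start = re.match(r"world", text)  # "world" occurs later, so the result is None
whole = re.fullmatch(r"hello", text)     # fullmatch needs the entire string to match

print(at_start is not None)   # True
print(not_at_start is None)   # True
print(whole is None)          # True
```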
    

2.2 re.search()

The re.search() function searches the entire string for the first match of the pattern:

import re

pattern = r"world"
text = "hello world"

match = re.search(pattern, text)

if match:
    print("Match found:", match.group())
else:
    print("No match found.")
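
The match object returned by re.search() also records where the match was found. A small sketch, with a made-up text in which the pattern occurs twice:

```python
import re

text = "hello world, wonderful world"

match = re.search(r"world", text)

# search stops at the first occurrence; span() gives its (start, end) indexes.
first_span = match.span() if match else None
print(first_span)  # (6, 11)

# When nothing matches, search returns None rather than raising an error.
missing = re.search(r"python", text)
print(missing)  # None
```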
    

2.3 re.findall()

The re.findall() function returns a list of all matches in the string:

import re

pattern = r"\d+"
text = "There are 2 apples and 5 oranges."

matches = re.findall(pattern, text)
print("Matches found:", matches)
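
One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups instead of the whole match. The sketch below reuses the sentence from the example; the variable names are illustrative:

```python
import re

text = "There are 2 apples and 5 oranges."

# No groups: each item is the full matched string.
plain = re.findall(r"\d+", text)
print(plain)  # ['2', '5']

# Two capturing groups: each item is a tuple with one entry per group.
pairs = re.findall(r"(\d+) (\w+)", text)
print(pairs)  # [('2', 'apples'), ('5', 'oranges')]
```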
    

2.4 re.sub()

The re.sub() function replaces all occurrences of the pattern with a specified string:

import re

pattern = r"apples"
replacement = "bananas"
text = "I like apples and apples are tasty."

new_text = re.sub(pattern, replacement, text)
print("Updated text:", new_text)
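
re.sub() can do more than a plain replace-all. The sketch below shows the count argument, a backreference in the replacement, and re.subn(), which also reports how many replacements were made. The sample strings are made up:

```python
import re

text = "I like apples and apples are tasty."

# count=1 limits the replacement to the first match.
first_only = re.sub(r"apples", "bananas", text, count=1)
print(first_only)  # I like bananas and apples are tasty.

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+) and (\w+)", r"\2 and \1", "apples and oranges")
print(swapped)  # oranges and apples

# subn returns the new string together with the number of replacements.
new_text, n = re.subn(r"apples", "bananas", text)
print(n)  # 2
```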
    

2.5 re.split()

The re.split() function splits the string at each occurrence of the pattern:

import re

pattern = r"\s+"
text = "Split this text into words."

split_text = re.split(pattern, text)
print("Split text:", split_text)
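
A pattern lets re.split() handle several separators at once, which str.split() cannot do. A minimal sketch with a made-up input string:

```python
import re

text = "one, two;three   four"

# Any run of commas, semicolons, or whitespace counts as one separator.
parts = re.split(r"[,;\s]+", text)
print(parts)  # ['one', 'two', 'three', 'four']

# maxsplit limits the number of splits; the rest of the string is left intact.
limited = re.split(r"[,;\s]+", text, maxsplit=1)
print(limited)  # ['one', 'two;three   four']

# A capturing group keeps the separators in the result.
with_seps = re.split(r"(;)", "a;b")
print(with_seps)  # ['a', ';', 'b']
```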
    

3. Special Characters in Regular Expressions

Regular expressions use special characters to define patterns:

  • .: Matches any single character except a newline.
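  • ^: Matches the start of the string (and, with re.MULTILINE, the start of each line).
  • $: Matches the end of the string or just before a trailing newline (and, with re.MULTILINE, the end of each line).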
  • *: Matches 0 or more repetitions of the preceding element.
  • +: Matches 1 or more repetitions of the preceding element.
  • ?: Matches 0 or 1 repetition of the preceding element.
  • {n}: Matches exactly n repetitions of the preceding element.
  • {n,}: Matches n or more repetitions of the preceding element.
  • {n,m}: Matches between n and m repetitions of the preceding element.
  • \: Escapes a special character, or signals a special sequence.
  • [...]: Matches any single character in the set.
  • [^...]: Matches any single character not in the set.
  • (...): Groups elements into a single element.
  • |: Matches either the expression before or the expression after.
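
A few of the special characters above in action. This is only a sketch; the sample strings and variable names are made up for illustration:

```python
import re

# . matches any character except a newline, so "c\nt" is skipped.
dot = re.findall(r"c.t", "cat cot c\nt cut")
print(dot)  # ['cat', 'cot', 'cut']

# ? makes the preceding element optional.
optional = re.findall(r"colou?r", "color colour")
print(optional)  # ['color', 'colour']

# {2,3} is greedy: it takes three "a"s out of "aaaa" and leaves one unmatched.
counted = re.findall(r"a{2,3}", "a aa aaa aaaa")
print(counted)  # ['aa', 'aaa', 'aaa']

# [^...] matches any single character that is not in the set.
consonants = re.findall(r"[^aeiou]", "regex")
print(consonants)  # ['r', 'g', 'x']

# (...) groups "ab" so that + repeats it as a unit.
repeated = re.search(r"(ab)+", "ababab").group()
print(repeated)  # ababab

# | matches either alternative.
either = re.findall(r"cat|dog", "cat dog bird")
print(either)  # ['cat', 'dog']

# A backslash turns the special character . into a literal dot.
literal_dots = re.findall(r"\.", "a.b.c")
print(literal_dots)  # ['.', '.']
```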

4. Special Sequences

Special sequences represent predefined character sets:

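  • \d: Matches any digit. For str patterns this includes all Unicode decimal digits; with the re.ASCII flag it is equivalent to [0-9].
  • \D: Matches any non-digit character, the opposite of \d.
  • \w: Matches any word character: letters, digits, and the underscore. For str patterns this includes Unicode letters and digits; with re.ASCII it is equivalent to [a-zA-Z0-9_].
  • \W: Matches any non-word character, the opposite of \w.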
  • \s: Matches any whitespace character (spaces, tabs, newlines).
  • \S: Matches any non-whitespace character.
  • \b: Matches the empty string at the beginning or end of a word.
  • \B: Matches the empty string not at the beginning or end of a word.
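
The sketch below tries several of these sequences on made-up sample strings. The last two lines show that, for str patterns, \d is Unicode-aware unless the re.ASCII flag is passed:

```python
import re

text = "Order 66 shipped to unit_7 on 2024-10-15."

digits = re.findall(r"\d+", text)       # runs of digits
words = re.findall(r"\w+", text)        # runs of word characters (underscore included)
fields = re.split(r"\s+", "a \t b\nc")  # any run of whitespace is one separator

print(digits)  # ['66', '7', '2024', '10', '15']
print(fields)  # ['a', 'b', 'c']

# \b only matches at a word boundary, so "concat" and "cats" are skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")
print(len(whole_words))  # 2

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
unicode_digits = re.findall(r"\d", "3\u0663")
ascii_digits = re.findall(r"\d", "3\u0663", re.ASCII)
print(len(unicode_digits), len(ascii_digits))  # 2 1
```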

5. Compiling Regular Expressions

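You can compile a regular expression pattern into a regex object for reuse. The re module already caches recently used patterns, so the speed gain is usually small; the main benefits are that the pattern is defined once and that its methods (findall(), search(), sub(), and so on) can be called directly: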

import re

# Compile the pattern
pattern = re.compile(r"\d+")

# Use the compiled pattern to search
text = "There are 2 apples and 5 oranges."
matches = pattern.findall(text)
print("Matches found:", matches)
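
A compiled pattern pays off most when the same pattern is applied to many strings. A minimal sketch, assuming a made-up list of lines and the illustrative name number for the compiled object:

```python
import re

# Compile once, then reuse the same object for every line.
number = re.compile(r"\d+")

lines = ["There are 2 apples", "no digits here", "5 oranges and 12 pears"]

# The compiled object has the same methods as the module-level functions.
counts = [number.findall(line) for line in lines]
print(counts)  # [['2'], [], ['5', '12']]

masked = number.sub("#", lines[2])
print(masked)  # # oranges and # pears

first = number.search(lines[2]).group()
print(first)  # 5
```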
    

6. Example: Validating an Email Address

Here’s a simplified example of using regex to validate an email address. The pattern covers common address shapes, not every address the email standards allow:

import re

def validate_email(email):
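    # Simplified pattern: accepts common addresses, not a full standards check.
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"
    # fullmatch requires the whole string to match, so a trailing newline is rejected.
    return re.fullmatch(pattern, email) is not None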

email = "example@example.com"
if validate_email(email):
    print("Valid email address.")
else:
    print("Invalid email address.")
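
A pattern anchored with ^ and $ and checked with re.match() still accepts a string that ends in a newline, because $ also matches just before a trailing newline. The sketch below compares that approach with re.fullmatch() on a few made-up inputs. It uses the simplified pattern from this section with the hyphen moved to the end of the last character class; the names EMAIL, is_email_match, and is_email_fullmatch are hypothetical:

```python
import re

# Simplified pattern, written without anchors so both checks can share it.
EMAIL = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"

def is_email_match(email):
    """Check with an anchored pattern and re.match()."""
    return re.match("^" + EMAIL + "$", email) is not None

def is_email_fullmatch(email):
    """Check with re.fullmatch(), which must consume the whole string."""
    return re.fullmatch(EMAIL, email) is not None

samples = [
    "example@example.com",
    "user.name+tag@mail.example.org",
    "no-at-sign.com",
    "user@localhost",
    "example@example.com\n",  # trailing newline
]

# Map each sample to (result with match, result with fullmatch).
results = {s: (is_email_match(s), is_email_fullmatch(s)) for s in samples}
for sample, outcome in results.items():
    print(repr(sample), outcome)
```

Only the last sample differs between the two checks: re.match() with $ accepts it, while re.fullmatch() rejects it.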
    

Regular expressions are a powerful tool for pattern matching and string manipulation in Python. By mastering the re module and the various regex functions, you can perform complex text processing tasks efficiently. Whether you need to validate input, search for patterns, or replace text, regex provides a flexible solution for all these tasks.