Regular expressions, often shortened to regex or regexp, are sequences of characters that form search patterns. They are used to find or manipulate patterns in strings. Python’s re module provides support for working with regular expressions.
1. Importing the re Module
To work with regular expressions in Python, you need to import the re module:
import re
2. Basic Regex Functions
The re module provides several functions for working with regular expressions:
re.match(): Determines if the regex matches at the beginning of the string.
re.search(): Scans through a string, looking for any location where the regex matches.
re.findall(): Returns a list of all non-overlapping matches of a pattern in a string.
re.finditer(): Returns an iterator yielding match objects over all non-overlapping matches.
re.sub(): Replaces one or many matches with a string.
re.split(): Splits the string by the occurrences of the pattern.
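Of the functions listed above, re.finditer() is the only one without its own subsection below, so here is a minimal sketch of it. The sample sentence and the variable name number_spans are illustrative assumptions, not part of the original text.

```python
import re

text = "There are 2 apples and 15 oranges."

# finditer() yields one match object per non-overlapping match,
# so both the matched text and its position are available.
number_spans = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]

print(number_spans)  # [('2', 10, 11), ('15', 23, 25)]
```

Because the matches are produced lazily, this form tends to suit long inputs where building the full list that re.findall() returns is unnecessary.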
2.1 re.match()
The re.match() function checks for a match only at the beginning of the string:
import re
pattern = r"hello"
text = "hello world"
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match found.")
2.2 re.search()
The re.search() function searches the entire string for the first match of the pattern:
import re
pattern = r"world"
text = "hello world"
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())
else:
    print("No match found.")
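The difference between re.match() and re.search() is easiest to see when both are run with the same pattern on the same string. The sketch below also includes re.fullmatch(), a related function from the same module that this article does not otherwise cover. The variable names are illustrative.

```python
import re

text = "hello world"

# match() only tries at the start of the string, so "world" is not found.
at_start = re.match(r"world", text)

# search() scans forward and finds "world" starting at index 6.
anywhere = re.search(r"world", text)

# fullmatch() succeeds only if the whole string matches the pattern.
whole = re.fullmatch(r"hello \w+", text)

print(at_start)         # None
print(anywhere.span())  # (6, 11)
print(whole.group())    # hello world
```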
2.3 re.findall()
The re.findall() function returns a list of all matches in the string:
import re
pattern = r"\d+"
text = "There are 2 apples and 5 oranges."
matches = re.findall(pattern, text)
print("Matches found:", matches)
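One detail worth knowing is that the shape of the list returned by re.findall() depends on the capturing groups in the pattern. The following sketch reuses the sentence from the example above. The second pattern is an assumption added for illustration.

```python
import re

text = "There are 2 apples and 5 oranges."

# No groups: the result is a list of the full matches.
numbers = re.findall(r"\d+", text)

# Two groups: the result is a list of (count, item) tuples.
pairs = re.findall(r"(\d+) (\w+)", text)

print(numbers)  # ['2', '5']
print(pairs)    # [('2', 'apples'), ('5', 'oranges')]
```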
2.4 re.sub()
The re.sub() function replaces all occurrences of the pattern with a specified string:
import re
pattern = r"apples"
replacement = "bananas"
text = "I like apples and apples are tasty."
new_text = re.sub(pattern, replacement, text)
print("Updated text:", new_text)
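Beyond a plain replacement string, re.sub() also accepts a count argument, group references in the replacement, and a function as the replacement. The sketch below shows all three. The sample strings and variable names are made up for illustration.

```python
import re

text = "I like apples and apples are tasty."

# count=1 limits the substitution to the first match.
first_only = re.sub(r"apples", "bananas", text, count=1)

# \1 and \2 in the replacement refer to the captured groups;
# here they swap the year and month of a sample date string.
swapped = re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "Released 2024-05")

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "2 apples and 5 oranges")

print(first_only)  # I like bananas and apples are tasty.
print(swapped)     # Released 05/2024
print(doubled)     # 4 apples and 10 oranges
```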
2.5 re.split()
The re.split() function splits the string at each occurrence of the pattern:
import re
pattern = r"\s+"
text = "Split this text into words."
split_text = re.split(pattern, text)
print("Split text:", split_text)
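Two further behaviours of re.split() are sketched here: the maxsplit argument, and keeping the separators by wrapping the pattern in a capturing group. The second input string is an illustrative assumption.

```python
import re

text = "Split this text into words."

# maxsplit=2 stops after two splits; the remainder stays in one piece.
limited = re.split(r"\s+", text, maxsplit=2)

# A capturing group in the pattern keeps the separators in the result.
with_separators = re.split(r"([,;])", "a,b;c")

print(limited)          # ['Split', 'this', 'text into words.']
print(with_separators)  # ['a', ',', 'b', ';', 'c']
```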
3. Special Characters in Regular Expressions
Regular expressions use special characters to define patterns:
.: Matches any single character except a newline (unless re.DOTALL is set).
^: Matches the start of the string (and of each line in re.MULTILINE mode).
$: Matches the end of the string or just before a trailing newline (and the end of each line in re.MULTILINE mode).
*: Matches 0 or more repetitions of the preceding element.
+: Matches 1 or more repetitions of the preceding element.
?: Matches 0 or 1 repetition of the preceding element.
{n}: Matches exactly n repetitions of the preceding element.
{n,}: Matches n or more repetitions of the preceding element.
{n,m}: Matches between n and m repetitions of the preceding element.
\: Escapes a special character, or signals a special sequence.
[...]: Matches any single character in the set.
[^...]: Matches any single character not in the set.
(...): Groups elements into a single element and captures the matched text.
|: Matches either the expression before or the expression after.
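The special characters listed above can be tried out one at a time. This is a small sketch with made-up sample strings, where each line exercises one entry from the list.

```python
import re

# . matches any single character except a newline
dot = re.findall(r"c.t", "cat cut c\nt")            # ['cat', 'cut']

# *, + and ? control how often the preceding element may repeat
star = re.findall(r"ab*", "a ab abbb")               # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")               # ['ab', 'abbb']
optional = re.findall(r"colou?r", "color colour")    # ['color', 'colour']

# {n,m} bounds the number of repetitions
bounded = re.findall(r"o{2,3}", "o oo ooo oooo")     # ['oo', 'ooo', 'ooo']

# [...] and [^...] match one character in, or not in, a set
vowels = re.findall(r"[aeiou]", "regex")             # ['e', 'e']
consonants = re.findall(r"[^aeiou]", "regex")        # ['r', 'g', 'x']

# \ escapes a special character so that it matches literally
literal_dots = re.findall(r"\.", "a.b.c")            # ['.', '.']

# ^ and $ anchor the pattern to the start and end of the string
starts_with_hello = bool(re.search(r"^hello", "hello world"))  # True
ends_with_hello = bool(re.search(r"hello$", "hello world"))    # False

# (...) groups alternatives separated by | and captures the chosen one
pet = re.search(r"(cat|dog)s", "hot dogs")
print(pet.group(), pet.group(1))  # dogs dog
```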
4. Special Sequences
Special sequences represent predefined character sets:
\d: Matches any decimal digit. This is equivalent to [0-9] when re.ASCII is set; for str patterns it also matches other Unicode digits.
\D: Matches any non-digit character, the complement of \d.
\w: Matches any word character (letters, digits, and underscore). This is equivalent to [a-zA-Z0-9_] when re.ASCII is set; for str patterns it also matches Unicode word characters.
\W: Matches any non-word character, the complement of \w.
\s: Matches any whitespace character (spaces, tabs, newlines).
\S: Matches any non-whitespace character.
\b: Matches the empty string at the beginning or end of a word.
\B: Matches the empty string not at the beginning or end of a word.
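The special sequences above can be checked in the same way. In this sketch the sample strings are illustrative, and the last two lines show how the re.ASCII flag narrows \d to the digits 0 to 9 (the character "\u0663" is the Arabic-Indic digit three).

```python
import re

digits = re.findall(r"\d", "a1b22")                 # ['1', '2', '2']
words = re.findall(r"\w+", "snake_case, 42!")       # ['snake_case', '42']
spaces = re.findall(r"\s", "a b\tc\n")              # [' ', '\t', '\n']

# \b matches only at a word boundary, so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")   # ['cat', 'cat']

# \B matches only inside a word, so only the "cat" inside "concat" is found.
inside_word = re.findall(r"\Bcat", "cat concat")          # ['cat']

# For str patterns \d covers Unicode decimal digits; re.ASCII restricts it to 0-9.
unicode_digits = re.findall(r"\d", "3\u0663")                 # ['3', '\u0663']
ascii_digits = re.findall(r"\d", "3\u0663", flags=re.ASCII)   # ['3']
```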
5. Compiling Regular Expressions
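You can compile a regular expression pattern into a regex object for reuse. This can improve performance when the same pattern is used many times, although the module-level functions also cache recently compiled patterns, so the gain is usually modest: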
import re
# Compile the pattern
pattern = re.compile(r"\d+")
# Use the compiled pattern to search
text = "There are 2 apples and 5 oranges."
matches = pattern.findall(text)
print("Matches found:", matches)
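A compiled pattern object exposes the same operations as the module-level functions (findall, search, sub, and so on) as methods, and flags are supplied once at compile time. The following sketch assumes a small list of sample lines. The names number_re and hello_re are illustrative.

```python
import re

# Compile once, then reuse the pattern object for every input line.
number_re = re.compile(r"\d+")

lines = ["2 apples", "no numbers here", "5 oranges and 12 pears"]
counts = [number_re.findall(line) for line in lines]
print(counts)  # [['2'], [], ['5', '12']]

# Flags are passed when compiling; re.IGNORECASE makes matching case-insensitive.
hello_re = re.compile(r"hello", re.IGNORECASE)
found = hello_re.search("Say HELLO").group()
print(found)  # HELLO
```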
6. Example: Validating an Email Address
Here’s an example of how to use regex to validate an email address:
import re
def validate_email(email):
    # Simplified pattern: local part, "@", domain label, ".", rest of the domain.
    pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+"
    # fullmatch() requires the whole string to match the pattern.
    return re.fullmatch(pattern, email) is not None
email = "example@example.com"
if validate_email(email):
    print("Valid email address.")
else:
    print("Invalid email address.")
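To get a feel for what a simplified pattern like this accepts and rejects, it helps to run it over a handful of inputs. The sketch below is self-contained: is_valid_email is a hypothetical helper written for this illustration, using the same character classes as the pattern above together with re.fullmatch(). The sample addresses are made up. It is not a complete validator for real-world email addresses.

```python
import re

# Hypothetical helper for illustration; same character classes as the article's pattern.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+")

def is_valid_email(email: str) -> bool:
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_RE.fullmatch(email) is not None

samples = [
    "example@example.com",              # accepted
    "first.last+tag@mail.example.org",  # accepted
    "no-at-sign.example.com",           # rejected: no "@"
    "user@localhost",                   # rejected: no "." after the "@"
    "example@example.com\n",            # rejected: trailing newline
    "user@example..com",                # accepted, although the domain has two dots in a row
]

results = {s: is_valid_email(s) for s in samples}
for sample, ok in results.items():
    print(repr(sample), "->", "valid" if ok else "invalid")
```

The last sample shows a limit of the simplified pattern: it checks the overall shape of an address, not whether the domain is well formed.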
Regular expressions are a powerful tool for pattern matching and string manipulation in Python. By mastering the re module and the various regex functions, you can perform complex text processing tasks efficiently. Whether you need to validate input, search for patterns, or replace text, regex provides a flexible solution for all these tasks.