Introduction
Regular expressions (regex) in Python provide powerful tools for string manipulation, and re.sub() is the primary function for replacing text using regex patterns. Whether you need to clean data, format text, or modify large datasets efficiently, mastering re.sub() is essential. This guide covers basic and advanced use cases, helping you understand regex replacement in Python with practical, real-world examples.
Basic Usage of re.sub()
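The re.sub() function replaces every non-overlapping match of a regex pattern in a string and returns the result as a new string. Python strings are immutable, so the input string is left unchanged.
Syntax: re.sub(pattern, repl, string, count=0, flags=0)
Example: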
import re
pattern = r"apple"
replacement = "orange"
text = "I like apple pie and apple juice."
result = re.sub(pattern, replacement, text)
print(result) # Output: I like orange pie and orange juice.
This replaces all occurrences of “apple” with “orange.”
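By default re.sub() replaces every match. The optional count argument caps the number of replacements, and the companion function re.subn() returns the new string together with the number of substitutions made. A minimal sketch reusing the apple example above:

```python
import re

text = "I like apple pie and apple juice."

# count=1 stops after the first replacement (count=0, the default, means "no limit")
first_only = re.sub(r"apple", "orange", text, count=1)
print(first_only)  # Output: I like orange pie and apple juice.

# re.subn() returns a (new_string, number_of_substitutions) tuple
new_text, n = re.subn(r"apple", "orange", text)
print(new_text, n)  # Output: I like orange pie and orange juice. 2
```

re.subn() is handy when you need to know whether anything was replaced at all, without comparing the old and new strings.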
Using Flags in re.sub()
Flags modify regex behavior. Some commonly used flags include:
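- re.IGNORECASE (re.I): case-insensitive matching
- re.MULTILINE (re.M): ^ and $ match at the start and end of each line, not only of the whole string
- re.DOTALL (re.S): . also matches newline characters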
Example: Case-insensitive replacement
text = "Python is FUN. Learning python is great!"
result = re.sub(r"python", "Java", text, flags=re.IGNORECASE)
print(result) # Output: Java is FUN. Learning Java is great!
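The other two flags are easiest to understand side by side with and without the flag. The sample strings below are made up for illustration. Note that flags should be passed by keyword: the fourth positional parameter of re.sub() is count, so re.sub(pattern, repl, text, re.IGNORECASE) would be read as a count, and Python 3.13 deprecates passing count and flags positionally.

```python
import re

text = "first line\nsecond line"

# Without MULTILINE, ^ matches only at the very start of the string
only_first = re.sub(r"^", "> ", text)
# only_first == "> first line\nsecond line"

# With MULTILINE, ^ also matches right after each newline
quoted = re.sub(r"^", "> ", text, flags=re.MULTILINE)
# quoted == "> first line\n> second line"

html = "keep<!-- a\nb -->this"
# Without DOTALL, . cannot cross the newline, so the comment is left alone
unchanged = re.sub(r"<!--.*?-->", "", html)
# With DOTALL, . matches the newline too and the whole comment is removed
stripped = re.sub(r"<!--.*?-->", "", html, flags=re.DOTALL)
print(stripped)  # Output: keepthis

# Several flags can be combined with the | operator
combined = re.sub(r"^PYTHON", "Java", "python rocks\nPython rules",
                  flags=re.IGNORECASE | re.MULTILINE)
# combined == "Java rocks\nJava rules"
```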
Handling Special Characters in Regex Replacement
Some characters have special meanings in regex (e.g., . matches any character except a newline). To match them literally, escape them with a backslash (\), or pass the literal text through re.escape().
Example: Replacing dots in an IP address
text = "192.168.1.1"
result = re.sub(r"\.", "-", text)
print(result) # Output: 192-168-1-1
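When the text to match comes from a variable or user input, re.escape() does the escaping for you. The sketch below, using made-up sample strings, shows why an unescaped dot over-matches. It also shows a related pitfall on the replacement side: backslashes in a replacement string are processed as escapes, whereas the return value of a replacement function is inserted literally.

```python
import re

text = "version 1.1 and model 1x1"

# Unescaped, "." matches any character, so "1x1" is replaced too
loose = re.sub("1.1", "X", text)
# re.escape() backslash-escapes special characters so the dot is literal
exact = re.sub(re.escape("1.1"), "X", text)
print(loose)  # Output: version X and model X
print(exact)  # Output: version X and model 1x1

# A function's return value is used as-is, so a single backslash needs no escaping
path = re.sub(r"/", lambda m: "\\", "C:/Users/me")
print(path)  # Output: C:\Users\me
```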
Using Backreferences and Captured Groups
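Captured groups allow dynamic replacements: \1, \2, and so on insert the text matched by the first, second, and later groups. Write the replacement as a raw string (r"...") so Python does not interpret the backslashes before re sees them.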
Example: Swapping first and last names
text = "Doe, John"
result = re.sub(r"(\w+), (\w+)", r"\2 \1", text)
print(result) # Output: John Doe
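Groups can also be named with (?P<name>...) and referenced as \g<name>, which keeps longer replacements readable. The \g<1> form is useful when a literal digit follows a group reference. A short sketch with made-up input strings:

```python
import re

# Named groups: (?P<name>...) in the pattern, \g<name> in the replacement
iso = "Released on 2025-02-23"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
european = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", iso)
print(european)  # Output: Released on 23/02/2025

# r"\10" would mean group 10, while r"\g<1>0" means group 1 followed by "0"
scaled = re.sub(r"(\d+)", r"\g<1>0", "5 apples")
print(scaled)  # Output: 50 apples
```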
Replacing Multiple Patterns Simultaneously
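You can use alternation (|) to match several alternative patterns in one call, or a character class when the alternatives are single characters.
Example: Collapsing runs of spaces and tabs (\s matches any whitespace character, including spaces, tabs, and newlines)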
text = "Hello   World\tPython"
result = re.sub(r"\s+", " ", text)
print(result) # Output: Hello World Python
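For alternatives that are whole words rather than single characters, alternation is the tool. This sketch uses invented sample sentences; wrapping the alternatives in a group lets them share the rest of the pattern and be reused in the replacement.

```python
import re

text = "I have a cat, a dog, and a hamster."

# Each alternative separated by | is tried at every position
result = re.sub(r"cat|dog|hamster", "pet", text)
print(result)  # Output: I have a pet, a pet, and a pet.

# Grouping the alternatives lets them share the surrounding pattern
plural = re.sub(r"\b(cat|dog)s\b", r"\1 friends", "cats and dogs")
print(plural)  # Output: cat friends and dog friends
```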
Another approach is using a dictionary for complex replacements:
replacements = {"Python": "Java", "C++": "Rust"}
pattern = re.compile("|".join(re.escape(k) for k in replacements.keys()))
text = "I love Python and C++."
result = pattern.sub(lambda m: replacements[m.group(0)], text)
print(result) # Output: I love Java and Rust.
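One caveat with the dictionary approach: Python's alternation is ordered, so at each position the first alternative that matches wins, not the longest one. If one key is a prefix of another, the shorter key can shadow the longer one. A sketch with a hypothetical mapping, sorting the keys longest-first to avoid this:

```python
import re

replacements = {"Java": "Kotlin", "JavaScript": "TypeScript"}
text = "I know JavaScript and Java."

# Insertion order puts "Java" first, so it also matches inside "JavaScript"
naive = re.compile("|".join(re.escape(k) for k in replacements))
naive_result = naive.sub(lambda m: replacements[m.group(0)], text)
print(naive_result)  # Output: I know KotlinScript and Kotlin.

# Sorting keys longest-first lets the longer key match before its prefix
keys = sorted(replacements, key=len, reverse=True)
ordered = re.compile("|".join(re.escape(k) for k in keys))
result = ordered.sub(lambda m: replacements[m.group(0)], text)
print(result)  # Output: I know TypeScript and Kotlin.
```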
Comparison: re.sub() vs. str.replace() vs. str.translate()
Method | Use Case | Regex Support |
---|---|---|
re.sub() | Complex patterns, backreferences, and dynamic replacements | ✅ Yes |
str.replace() | Simple, static replacements | ❌ No |
str.translate() | Character-level replacements using mappings | ❌ No |
Example: Using str.replace()
text = "Hello World"
print(text.replace("World", "Python")) # Output: Hello Python
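The table lists str.translate() for character-level replacements. Its mapping is usually built with str.maketrans(); here is a small sketch with made-up strings:

```python
# str.maketrans() builds a character-to-character mapping for str.translate()
table = str.maketrans("aeo", "430")
leet = "hello world".translate(table)
print(leet)  # Output: h3ll0 w0rld

# The optional third argument lists characters to delete
no_punct = "Hi! How are you?".translate(str.maketrans("", "", "!?"))
print(no_punct)  # Output: Hi How are you
```

Because it works one character at a time without pattern matching, str.translate() is a good fit when every replacement is a single character.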
Real-World Applications of re.sub()
- Data Cleaning: Remove special characters from user input.
- Text Formatting: Convert snake_case to CamelCase.
- Log File Parsing: Extract and replace timestamps.
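The first two items can be sketched in a few lines. The inputs are hypothetical, the cleaning rule (keep only ASCII letters, digits, and spaces) is one possible choice that also discards accented letters, and snake_to_camel is an illustrative helper name, not a library function.

```python
import re

# Data cleaning: drop everything except ASCII letters, digits, and spaces
raw = "Hello, World! #2025"
clean = re.sub(r"[^A-Za-z0-9 ]", "", raw)
print(clean)  # Output: Hello World 2025

def snake_to_camel(name):
    # Match a lowercase letter at the start or after "_" and uppercase it;
    # the "_" is part of the match, so it disappears from the result
    return re.sub(r"(?:^|_)([a-z])", lambda m: m.group(1).upper(), name)

camel = snake_to_camel("user_account_id")
print(camel)  # Output: UserAccountId
```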
Example: Replacing dates in logs
log = "Error on 2025-02-23 at 10:30 AM"
date_pattern = r"\d{4}-\d{2}-\d{2}"
result = re.sub(date_pattern, "[DATE]", log)
print(result) # Output: Error on [DATE] at 10:30 AM
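When the same patterns run over many log lines, compiling them once with re.compile() avoids repeating the pattern string and lets you call .sub() directly. The log lines and the 12-hour time format below are assumptions made for illustration; real logs may need different patterns.

```python
import re

# Compile once, reuse for every line
DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
TIME = re.compile(r"\b\d{1,2}:\d{2} (?:AM|PM)\b")  # assumes "10:30 AM"-style times

log_lines = [
    "Error on 2025-02-23 at 10:30 AM",
    "Warning on 2025-02-24 at 9:05 PM",
]

masked = [TIME.sub("[TIME]", DATE.sub("[DATE]", line)) for line in log_lines]
for line in masked:
    print(line)
# Output:
# Error on [DATE] at [TIME]
# Warning on [DATE] at [TIME]
```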
Advanced Usage: Callback Functions in re.sub()
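For complex replacements, pass a function instead of a string as repl. re.sub() calls it once for each match, passing a match object, and uses the string it returns as the replacement.
Example: Converting matched words to uppercase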
def capitalize_match(match):
    # Return the whole matched word in uppercase
    return match.group(0).upper()
text = "hello world, python is fun!"
result = re.sub(r"\b\w+\b", capitalize_match, text)
print(result) # Output: HELLO WORLD, PYTHON IS FUN!
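Callbacks pay off when the replacement has to be computed from the match, which a static replacement string cannot do. A sketch with invented inputs: the callback must return a string, so numbers are converted back with str(), and short callbacks are often written inline as a lambda.

```python
import re

def double_number(match):
    # Convert the matched digits to int, double them, and return a str
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"\d+", double_number, "3 apples and 10 pears")
print(doubled)  # Output: 6 apples and 20 pears

# Title-case only words of four or more lowercase letters
titled = re.sub(r"\b[a-z]{4,}\b", lambda m: m.group(0).title(), "the quick brown fox")
print(titled)  # Output: the Quick Brown fox
```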
Conclusion
Python’s re.sub() is a powerful tool for text replacement, offering flexibility through regex patterns, flags, and dynamic replacements. Understanding its advanced features can significantly enhance your ability to manipulate text efficiently.
For further details, refer to the official Python documentation for the re module. Let's keep exploring Python.