Python Regex Replace: The Ultimate Guide

Introduction

Regular expressions (regex) in Python provide powerful tools for string manipulation, and re.sub() is the primary function for replacing text using regex patterns. Whether you need to clean data, format text, or modify large datasets efficiently, mastering re.sub() is essential. This guide covers basic and advanced use cases, helping you understand regex replacement in Python with practical, real-world examples.

Basic Usage of re.sub()

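The re.sub() function replaces every non-overlapping occurrence of a regex pattern in a string and returns the result as a new string; the original string is left unchanged.

Syntax: re.sub(pattern, repl, string, count=0, flags=0)

Example: Replacing a word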

import re

pattern = r"apple"
replacement = "orange"
text = "I like apple pie and apple juice."

result = re.sub(pattern, replacement, text)
print(result)  # Output: I like orange pie and orange juice.

This replaces all occurrences of “apple” with “orange.”
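
To limit the number of replacements, re.sub() accepts an optional count argument, and the related re.subn() function also reports how many substitutions were made. A minimal sketch, reusing the sentence from the example above (the variable names first_only and n are only illustrative):

```python
import re

text = "I like apple pie and apple juice."

# count=1 replaces only the first match; the default count=0 replaces all.
first_only = re.sub(r"apple", "orange", text, count=1)
print(first_only)  # Output: I like orange pie and apple juice.

# re.subn() returns a (new_string, number_of_replacements) tuple.
result, n = re.subn(r"apple", "orange", text)
print(result, n)  # Output: I like orange pie and orange juice. 2
```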

Using Flags in re.sub()

Flags modify regex behavior. Some commonly used flags include:

  • re.IGNORECASE (case-insensitive search)
  • re.MULTILINE (^ and $ match at the start and end of every line, not just the whole string)
  • re.DOTALL (. also matches newline characters)

Example: Case-insensitive replacement

text = "Python is FUN. Learning python is great!"
result = re.sub(r"python", "Java", text, flags=re.IGNORECASE)
print(result)  # Output: Java is FUN. Learning Java is great!
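
The other two flags are easiest to see side by side. The following sketch uses made-up sample strings (text and html are hypothetical inputs) to show how re.MULTILINE changes what ^ matches and how re.DOTALL changes what . matches:

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ matches at the start of every line.
bulleted = re.sub(r"^", "- ", text, flags=re.MULTILINE)
print(bulleted)
# Output:
# - first line
# - second line

html = "<b>bold\ntext</b> stays"

# Without re.DOTALL, . stops at the newline, so the pattern never matches.
unchanged = re.sub(r"<b>.*?</b>", "", html)
print(unchanged == html)  # Output: True

# With re.DOTALL, . also matches "\n", so the whole tag is removed.
stripped = re.sub(r"<b>.*?</b>", "", html, flags=re.DOTALL)
print(repr(stripped))  # Output: ' stays'
```

Flags can be combined with the | operator, for example flags=re.IGNORECASE | re.MULTILINE.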

Handling Special Characters in Regex Replacement

Some characters have special meanings in regex (e.g., . matches any character). To match them literally, escape with \.

Example: Replacing dots in an IP address

text = "192.168.1.1"
result = re.sub(r"\.", "-", text)
print(result)  # Output: 192-168-1-1
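
When the text to match comes from a variable rather than a hand-written pattern, escaping each character manually is error-prone. One option is re.escape(), which escapes the special characters for you. A small sketch, assuming a hypothetical user_input value:

```python
import re

user_input = "1.5"  # hypothetical literal text we want to replace
text = "Version 1.5 replaces version 125."

# Unescaped, "." matches any character, so "125" is replaced as well.
naive = re.sub(user_input, "2.0", text)
print(naive)  # Output: Version 2.0 replaces version 2.0.

# re.escape() escapes the metacharacters so only the literal "1.5" matches.
safe = re.sub(re.escape(user_input), "2.0", text)
print(safe)  # Output: Version 2.0 replaces version 125.
```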

Using Backreferences and Captured Groups

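Parentheses in a pattern create captured groups, which the replacement string can reference as \1, \2, etc. Write the replacement as a raw string (r"...") so the backslashes reach re.sub() unchanged.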

Example: Swapping first and last names

text = "Doe, John"
result = re.sub(r"(\w+), (\w+)", r"\2 \1", text)
print(result)  # Output: John Doe
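
Numbered references get hard to read as patterns grow. As an alternative sketch, the same swap can be written with named groups, using (?P<name>...) in the pattern and \g<name> in the replacement (the group names last and first are chosen here for illustration):

```python
import re

text = "Doe, John"

# (?P<name>...) names a group; \g<name> refers to it in the replacement.
result = re.sub(r"(?P<last>\w+), (?P<first>\w+)", r"\g<first> \g<last>", text)
print(result)  # Output: John Doe

# \g<1> also avoids ambiguity when a digit follows the reference:
# r"\10" would mean group 10, while r"\g<1>0" means group 1 followed by "0".
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)  # Output: a10b20
```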

Replacing Multiple Patterns Simultaneously

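A single pattern can cover several kinds of text in one call, either with alternation (|) or with a character class such as \s, which matches spaces, tabs, and newlines.

Example: Collapsing extra spaces and tabs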

text = "Hello    World\tPython"
result = re.sub(r"\s+", " ", text)
print(result)  # Output: Hello World Python
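
The example above relies on the \s class rather than alternation. For comparison, here is a minimal sketch of alternation itself, normalizing three different separators (";", ",", and the word "and") to a single comma; the sample sentence is invented for illustration:

```python
import re

text = "cats; dogs, birds and fish"

# (?:...) groups the alternatives without capturing them.
# Each alternative, plus any surrounding whitespace, becomes ", ".
result = re.sub(r"\s*(?:;|,|\band\b)\s*", ", ", text)
print(result)  # Output: cats, dogs, birds, fish
```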

Another approach is using a dictionary for complex replacements:

replacements = {"Python": "Java", "C++": "Rust"}
pattern = re.compile("|".join(re.escape(k) for k in replacements.keys()))

text = "I love Python and C++."
result = pattern.sub(lambda m: replacements[m.group(0)], text)
print(result)  # Output: I love Java and Rust.
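
One caveat with the dictionary approach: alternation tries alternatives from left to right, so if one key is a prefix of another, the shorter key can win. Sorting the keys longest-first is one way around this. The sketch below uses a hypothetical mapping to show the difference:

```python
import re

# Hypothetical mapping where one key ("C") is a prefix of another ("C++").
replacements = {"C": "Go", "C++": "Rust"}
text = "I use C and C++."

# Keys in dictionary order: "C" is tried first and matches inside "C++".
naive = re.compile("|".join(re.escape(k) for k in replacements))
print(naive.sub(lambda m: replacements[m.group(0)], text))
# Output: I use Go and Go++.

# Longest-first ordering makes "C++" win where it applies.
keys = sorted(replacements, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))
result = pattern.sub(lambda m: replacements[m.group(0)], text)
print(result)  # Output: I use Go and Rust.
```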

Comparison: re.sub() vs. str.replace() vs. str.translate()

Method          | Use Case                                                   | Regex Support
re.sub()        | Complex patterns, backreferences, and dynamic replacements | ✅ Yes
str.replace()   | Simple, static replacements                                | ❌ No
str.translate() | Character-level replacements using mappings                | ❌ No

Example: Using str.replace()

text = "Hello World"
print(text.replace("World", "Python"))  # Output: Hello Python
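
For completeness, here is a comparable sketch of str.translate(). It works on single characters only, using a table built with str.maketrans(); the particular mapping is arbitrary and chosen just to show substitution and deletion together:

```python
text = "Hello, World!"

# str.maketrans(from_chars, to_chars, delete_chars) builds the mapping:
# "l" -> "L", "o" -> "0", and the characters "," and "!" are deleted.
table = str.maketrans("lo", "L0", ",!")
translated = text.translate(table)
print(translated)  # Output: HeLL0 W0rLd
```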

Real-World Applications of re.sub()

  1. Data Cleaning: Remove special characters from user input.
  2. Text Formatting: Convert snake_case to CamelCase.
  3. Log File Parsing: Extract and replace timestamps.

Example: Masking dates in logs

log = "Error on 2025-02-23 at 10:30 AM"
date_pattern = r"\d{4}-\d{2}-\d{2}"
result = re.sub(date_pattern, "[DATE]", log)
print(result)  # Output: Error on [DATE] at 10:30 AM
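
The first two applications in the list above can be sketched in a similar way. The sample input and the helper name snake_to_camel are hypothetical, and the set of characters to keep during cleaning is an assumption that will vary by project:

```python
import re

# Data cleaning: drop everything except ASCII letters, digits, and whitespace.
raw = "Hello!! Wor#ld @2025"
cleaned = re.sub(r"[^A-Za-z0-9\s]", "", raw)
print(cleaned)  # Output: Hello World 2025

# Text formatting: snake_case -> CamelCase.
def snake_to_camel(name):
    # Uppercase the first letter and any letter that follows an underscore,
    # dropping the underscore itself.
    return re.sub(r"(?:^|_)([a-z])", lambda m: m.group(1).upper(), name)

print(snake_to_camel("user_first_name"))  # Output: UserFirstName
```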

Advanced Usage: Callback Functions in re.sub()

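For complex replacements, pass a function instead of a string. re.sub() calls the function once per match with a match object and uses the string it returns as the replacement.

Example: Converting matched words to uppercase

def uppercase_match(match):
    # match.group(0) is the full text of the current match
    return match.group(0).upper()

text = "hello world, python is fun!"
result = re.sub(r"\b\w+\b", uppercase_match, text)
print(result)  # Output: HELLO WORLD, PYTHON IS FUN!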
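
A callback is most useful when the replacement has to be computed from the match, which a plain replacement string cannot do. As a small sketch, the hypothetical double_number function below doubles every integer it finds:

```python
import re

def double_number(match):
    # Convert the matched digits to an int, double it, and return a string.
    return str(int(match.group(0)) * 2)

text = "3 apples and 12 oranges"
result = re.sub(r"\d+", double_number, text)
print(result)  # Output: 6 apples and 24 oranges
```

The callback must return a string, which is why the result is wrapped in str().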

Conclusion

Python’s re.sub() is a powerful tool for text replacement, offering flexibility through regex patterns, flags, and dynamic replacements. Understanding its advanced features can significantly enhance your ability to manipulate text efficiently.

For further details, refer to the official Python documentation on regular expressions.
