Python Regular Expressions: Grouping and Capturing

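Groups let you capture parts of a match, refer back to captured text with backreferences, and treat a sub-pattern as a single unit for quantifiers and alternation.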

Basic Groups

Python

import re

text = "2024-05-19"
pattern = r"(\d{4})-(\d{2})-(\d{2})"

match = re.search(pattern, text)
print(match.group(0))  # 2024-05-19 (the entire match)
print(match.group(1))  # 2024 (first group)
print(match.group(2))  # 05 (second group)
print(match.group(3))  # 19 (third group)

print(match.groups())  # ('2024', '05', '19')
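
`re.search` returns `None` when nothing matches, so calling `.group()` directly on its result raises `AttributeError` for non-matching input. The following is a minimal sketch of a guarded version; the helper name `parse_date` and the constant `DATE_PATTERN` are hypothetical and not part of the example above.

```python
import re

DATE_PATTERN = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def parse_date(text):
    """Return (year, month, day) as ints, or None if no date is found.

    parse_date is a hypothetical helper introduced for illustration.
    """
    match = DATE_PATTERN.search(text)
    if match is None:  # no match: avoid AttributeError on .group()
        return None
    # group() accepts several indices and returns a tuple of those groups
    year, month, day = match.group(1, 2, 3)
    return int(year), int(month), int(day)

print(parse_date("released on 2024-05-19"))  # (2024, 5, 19)
print(parse_date("no date here"))            # None
```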

Nested Groups

Python

import re

text = "user@example.com"
pattern = r"((\w+)@(\w+)\.(\w+))"

match = re.search(pattern, text)
print(match.group(1))  # user@example.com (the whole address; groups are numbered by their opening parenthesis)
print(match.group(2))  # user
print(match.group(3))  # example
print(match.group(4))  # com
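
Group numbers follow the order of the opening parentheses, counted from left to right, regardless of nesting depth. As a small sketch of that rule, the variant below adds one more nested group around the domain (an extension of the pattern above, chosen here only for illustration) and checks the numbering with `Pattern.groups` and `Match.span`.

```python
import re

# Opening parentheses, left to right:
#   1: whole address   2: local part   3: full domain
#   4: domain name     5: top-level domain
pattern = re.compile(r"((\w+)@((\w+)\.(\w+)))")
match = pattern.search("user@example.com")

print(pattern.groups)   # 5 (number of capturing groups in the pattern)
print(match.group(1))   # user@example.com
print(match.group(2))   # user
print(match.group(3))   # example.com
print(match.group(4))   # example
print(match.group(5))   # com
print(match.span(3))    # (5, 16) start and end offsets of group 3
```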

Named Groups

Python

import re

text = "2024-05-19"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, text)
print(match.group('year'))   # 2024
print(match.group('month'))  # 05
print(match.group('day'))    # 19

# Get all named groups as a dict
print(match.groupdict())  # {'year': '2024', 'month': '05', 'day': '19'}
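
Named groups still receive a number, so both access styles work on the same match. The sketch below also uses the `match["name"]` subscript form (available since Python 3.6) and `Pattern.groupindex`, which maps names to numbers; the variable name `date_parts` is an illustrative choice.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = pattern.search("2024-05-19")

print(match.group(1))            # 2024 (a named group can still be read by number)
print(match["month"])            # 05 (subscript access, Python 3.6+)
print(dict(pattern.groupindex))  # {'year': 1, 'month': 2, 'day': 3}

# Convert every named group to int in one step
date_parts = {name: int(value) for name, value in match.groupdict().items()}
print(date_parts)                # {'year': 2024, 'month': 5, 'day': 19}
```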

Backreferences

Python

import re

# Match a repeated word
text = "hello hello world"
pattern = r"(\w+)\s+\1"  # \1 refers back to the first group

match = re.search(pattern, text)
print(match.group(0))  # hello hello

# Match a pair of matching quotes
text = "'quoted text'"
pattern = r"(['\"])(.*?)\1"  # the closing quote must be the same character as the opening one

match = re.search(pattern, text)
print(match.group(2))  # quoted text
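
A backreference can be combined with `re.sub` to remove what it detects. The sketch below collapses doubled words; the `\b` word boundaries are an addition to the pattern above, included so that the "is" inside "this" is not treated as the start of a repeated word.

```python
import re

text = "this is is a test test sentence"

# \b keeps the match on whole words; \1 must repeat the captured word exactly
pattern = re.compile(r"\b(\w+)\s+\1\b")

# In the replacement string, \1 inserts the captured word once
deduplicated = pattern.sub(r"\1", text)
print(deduplicated)  # this is a test sentence
```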

Backreferences to Named Groups

Python

import re

text = "<div>content</div>"
pattern = r"<(?P<tag>\w+)>.*?</(?P=tag)>"  # (?P=name) refers back to a named group

match = re.search(pattern, text)
print(match.group('tag'))  # div
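
Inside the pattern a named group is referenced with `(?P=name)`; inside a `re.sub` replacement string the corresponding syntax is `\g<name>`. The sketch below shows both, using an extra named group `body` and a bracketed output format that are illustrative choices, not part of the example above.

```python
import re

pattern = re.compile(r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>")

html = "<b>bold</b> and <i>italic</i>"
# \g<name> in the replacement inserts the text captured by that named group
result = pattern.sub(r"[\g<tag>: \g<body>]", html)
print(result)  # [b: bold] and [i: italic]

# The closing tag must repeat the opening tag, so mismatched tags do not match
print(pattern.search("<div>content</span>"))  # None
```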

Non-Capturing Groups

Python

import re

# (?:...) is a non-capturing group; it does not receive a group number
text = "http://example.com"
pattern = r"(?:http|ftp)://(\w+)\.(\w+)"

match = re.search(pattern, text)
print(match.groups())  # ('example', 'com') (the protocol is not captured)
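
The difference matters most with `re.findall`, whose result shape depends on the number of capturing groups. As a rough comparison, the sketch below runs the same pattern with a capturing and a non-capturing protocol group over a sample string invented for this illustration.

```python
import re

text = "http://example.com ftp://files.org"

# Capturing the protocol adds a third element to every tuple
capturing = re.findall(r"(http|ftp)://(\w+)\.(\w+)", text)
print(capturing)      # [('http', 'example', 'com'), ('ftp', 'files', 'org')]

# (?:...) still provides the alternation but leaves the tuples unchanged
non_capturing = re.findall(r"(?:http|ftp)://(\w+)\.(\w+)", text)
print(non_capturing)  # [('example', 'com'), ('files', 'org')]
```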

Practical Examples

Python

import re

# Extract the parts of a phone number
text = "Phone: 010-12345678"
pattern = r"(\d{3})-(\d{8})"
match = re.search(pattern, text)
if match:
    print(f"Area code: {match.group(1)}, number: {match.group(2)}")

# Rearrange a date format
text = "2024-05-19"
pattern = r"(\d{4})-(\d{2})-(\d{2})"
new_format = re.sub(pattern, r"\2/\3/\1", text)
print(new_format)  # 05/19/2024
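
When the replacement needs more than reordering, `re.sub` also accepts a function that receives each match object. The sketch below uses a hypothetical callback `to_us_format` to drop leading zeros, and then shows the `\g<1>` form, which keeps a group number apart from a literal digit that follows it (a plain `\10` would be read as a reference to group 10).

```python
import re

DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

def to_us_format(match):
    """Replacement callback (hypothetical): drop leading zeros from month and day."""
    year, month, day = match.groups()
    return f"{int(month)}/{int(day)}/{year}"

converted = DATE.sub(to_us_format, "Due 2024-05-09, renew by 2024-12-31")
print(converted)  # Due 5/9/2024, renew by 12/31/2024

# \g<1> separates the group number from the literal "0" that follows it
scaled = re.sub(r"(\d+)px", r"\g<1>0px", "width: 12px")
print(scaled)  # width: 120px
```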

Capturing with findall

Python

import re

text = "a1 b2 c3 d4"
pattern = r"(\w)(\d)"

matches = re.findall(pattern, text)
print(matches)  # [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]

# Without groups, findall returns the full matches
matches = re.findall(r"\w\d", text)
print(matches)  # ['a1', 'b2', 'c3', 'd4']
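
Two further cases are easy to trip over: a pattern with exactly one group returns a flat list of that group's text, and a group under a quantifier reports only its last repetition. The sketch below illustrates both, with sample strings chosen for this illustration.

```python
import re

text = "a1 b2 c3 d4"

# One capturing group: findall returns that group's text, not the full match
digits = re.findall(r"\w(\d)", text)
print(digits)  # ['1', '2', '3', '4']

# A capturing group under a quantifier reports only its last repetition
last_repeat = re.findall(r"(ab)+", "abab ab")
print(last_repeat)  # ['ab', 'ab']

# A non-capturing group keeps the grouping and returns the full matches
full = re.findall(r"(?:ab)+", "abab ab")
print(full)  # ['abab', 'ab']
```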

Iterating with finditer

Python

import re

text = "a1 b2 c3"
pattern = r"(\w)(\d)"

for match in re.finditer(pattern, text):
    print(f"Match: {match.group(0)}, letter: {match.group(1)}, digit: {match.group(2)}")
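
Unlike `findall`, `finditer` yields full match objects, so positions and named groups are available for every hit. A minimal sketch, assuming a slightly stricter pattern (`[a-z]` instead of `\w`) and a results list named `hits`, both chosen for illustration:

```python
import re

text = "a1 b2 c3"
pattern = re.compile(r"(?P<letter>[a-z])(?P<digit>\d)")

hits = []
for match in pattern.finditer(text):
    # start() and end() give the offsets of the whole match in text
    hits.append((match.start(), match.end(), match["letter"], int(match["digit"])))

print(hits)  # [(0, 2, 'a', 1), (3, 5, 'b', 2), (6, 8, 'c', 3)]
```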

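Group Types Compared

Type	Syntax	Notes
Capturing group	(...)	Captures its content; can be referenced by number
Named group	(?P<name>...)	Captures its content; also accessible by name
Non-capturing group	(?:...)	Groups only; captures nothing
Backreference	\1, \2, ... or (?P=name)	Matches the text captured earlier by that group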

Optional Groups and Missing Matches

Python

import re

text = "color colour"
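pattern = r"col(ou)?r"  # optional group

matches = re.findall(pattern, text)
print(matches)  # ['', 'ou'] (findall yields '' for a group that did not participate)

# Check whether the group participated in the match
match = re.search(pattern, "color")
print(match.group(1))  # None (the optional group did not participate)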

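Key Points

(...) is a capturing group; match.group(n) returns its content
(?P<name>...) is a named group, accessible by name
\1, \2, ... refer back to the group with that number
(?P=name) refers back to a named group
(?:...) is a non-capturing group and receives no group number
findall returns a list of tuples when the pattern has two or more groups, a list of strings for a single group, and the full matches when there are none
Groups are used to extract, rearrange, and validate matched content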
