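Python Regular Expressions: Groups and Captures
Groups let you capture parts of a match, refer back to text captured earlier (backreferences), and treat several tokens as one logical unit. They are one of the more advanced features of regular expressions.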
Basic groups
```python
import re

text = "2024-05-19"
pattern = r"(\d{4})-(\d{2})-(\d{2})"
match = re.search(pattern, text)
print(match.group(0))  # 2024-05-19 (the entire match)
print(match.group(1))  # 2024 (first group)
print(match.group(2))  # 05 (second group)
print(match.group(3))  # 19 (third group)
print(match.groups())  # ('2024', '05', '19')
```
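`group()` also accepts several group numbers at once, and the position of each group in the searched string is available through `start()`, `end()` and `span()`. The sketch below reuses the date pattern on a slightly longer, made-up input string:

```python
import re

match = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Date: 2024-05-19")

# Passing several group numbers returns a tuple of their texts
year_and_day = match.group(1, 3)
print(year_and_day)  # ('2024', '19')

# span(n) gives the (start, end) slice of group n in the original string
print(match.span(0))  # (6, 16)
print(match.span(2))  # (11, 13)
print(match.start(1), match.end(1))  # 6 10
```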
Nested groups
```python
import re

text = "user@example.com"
pattern = r"((\w+)@(\w+)\.(\w+))"
match = re.search(pattern, text)
print(match.group(1))  # user@example.com (the whole address)
print(match.group(2))  # user
print(match.group(3))  # example
print(match.group(4))  # com
```
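With nested groups, numbering follows the order of the opening parentheses, counted from left to right, no matter how deeply a group is nested. A small, made-up pattern makes the rule visible; the compiled pattern's `groups` attribute reports how many capturing groups it has:

```python
import re

# Opening parentheses, left to right:
# 1 = (a(b)(c(d))), 2 = (b), 3 = (c(d)), 4 = (d)
pattern = re.compile(r"(a(b)(c(d)))")
print(pattern.groups)  # 4

match = pattern.search("abcd")
print(match.groups())  # ('abcd', 'b', 'cd', 'd')
```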
Named groups
```python
import re

text = "2024-05-19"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.search(pattern, text)
print(match.group('year'))   # 2024
print(match.group('month'))  # 05
print(match.group('day'))    # 19
# Get all named groups as a dict
print(match.groupdict())  # {'year': '2024', 'month': '05', 'day': '19'}
```
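Named groups are still numbered like ordinary groups, so both access styles work on the same match. Match objects also support indexing (Python 3.6 and later), and a compiled pattern exposes its name-to-number mapping through `groupindex`:

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = pattern.search("2024-05-19")

# Named groups also get numbers, in the order of their opening parentheses
print(match.group(1) == match.group("year"))  # True

# A match object can be indexed by group name or number
print(match["month"])  # 05

# The compiled pattern maps each group name to its number
print(dict(pattern.groupindex))  # {'year': 1, 'month': 2, 'day': 3}
```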
Backreferences
```python
import re

# Match a repeated word
text = "hello hello world"
pattern = r"(\w+)\s+\1"  # \1 refers to the text captured by group 1
match = re.search(pattern, text)
print(match.group(0))  # hello hello

# Match text enclosed in a matching pair of quotes
text = "'quoted text'"
pattern = r"(['\"])(.*?)\1"  # the closing quote must be the same as the opening one
match = re.search(pattern, text)
print(match.group(2))  # quoted text
```
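The repeated-word pattern also fires inside longer words: in "the theme" it matches "the the". Adding word boundaries (`\b`) restricts it to whole words, and the same backreference can drive `re.sub` to collapse repeats. The sample strings below are made up for illustration:

```python
import re

loose = r"(\w+)\s+\1"
strict = r"\b(\w+)\s+\1\b"  # \b anchors both copies at word boundaries

print(re.search(loose, "the theme").group(0))  # the the (false positive)
print(re.search(strict, "the theme"))          # None

# Collapse a run of the same word into a single copy
deduped = re.sub(r"\b(\w+)(?:\s+\1\b)+", r"\1", "hello hello hello world")
print(deduped)  # hello world
```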
Backreferences to named groups
```python
import re

text = "<div>content</div>"
pattern = r"<(?P<tag>\w+)>.*?</(?P=tag)>"  # (?P=name) refers to a named group
match = re.search(pattern, text)
print(match.group('tag'))  # div
```
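`(?P=name)` only works inside the pattern itself. To put a named group's text into the replacement string of `re.sub`, Python uses `\g<name>`. A sketch with made-up, HTML-like input; the second group name `body` is illustrative:

```python
import re

pattern = r"<(?P<tag>\w+)>(?P<body>.*?)</(?P=tag)>"

# The closing tag must repeat the captured name, so mismatched tags do not match
print(re.search(pattern, "<div>content</span>"))  # None

# In a replacement string, \g<name> inserts the text of a named group
result = re.sub(pattern, r"[\g<tag>] \g<body>", "<b>bold</b> and <i>italic</i>")
print(result)  # [b] bold and [i] italic
```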
Non-capturing groups
```python
import re

# (?:...) groups without capturing, so it does not take a group number
text = "http://example.com"
pattern = r"(?:http|ftp)://(\w+)\.(\w+)"
match = re.search(pattern, text)
print(match.groups())  # ('example', 'com') (the protocol is not captured)
```
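Non-capturing groups matter most when a quantifier is applied to a group. A repeated capturing group keeps only its last repetition, and `findall` then returns that instead of the whole match. The input string here is made up:

```python
import re

text = "ababab abx"

# With a capturing group, findall returns only the group's last repetition
print(re.findall(r"(ab)+", text))    # ['ab', 'ab']

# With a non-capturing group, findall returns the whole match
print(re.findall(r"(?:ab)+", text))  # ['ababab', 'ab']

# A compiled pattern's .groups attribute counts only capturing groups
print(re.compile(r"(http|ftp)://(\w+)\.(\w+)").groups)    # 3
print(re.compile(r"(?:http|ftp)://(\w+)\.(\w+)").groups)  # 2
```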
Examples of using groups
```python
import re

# Extract the parts of a phone number
text = "Phone: 010-12345678"
pattern = r"(\d{3})-(\d{8})"
match = re.search(pattern, text)
if match:
    print(f"Area code: {match.group(1)}, number: {match.group(2)}")

# Rearrange a date
text = "2024-05-19"
pattern = r"(\d{4})-(\d{2})-(\d{2})"
new_format = re.sub(pattern, r"\2/\3/\1", text)
print(new_format)  # 05/19/2024
```
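Besides numbered references such as `\2`, a replacement can use `\g<name>` for named groups, or a function that receives each match object, which helps when the new text needs some computation. The output formats and the helper name `to_us_format` are illustrative:

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

# Named references in the replacement string
print(re.sub(pattern, r"\g<day>.\g<month>.\g<year>", "2024-05-19"))  # 19.05.2024

# Hypothetical helper: re.sub calls it once per match; int() drops leading zeros
def to_us_format(m):
    return f"{int(m['month'])}/{int(m['day'])}/{m['year']}"

print(re.sub(pattern, to_us_format, "from 2024-05-19 to 2024-06-01"))
# from 5/19/2024 to 6/1/2024
```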
Capturing with findall
```python
import re

text = "a1 b2 c3 d4"
pattern = r"(\w)(\d)"
matches = re.findall(pattern, text)
print(matches)  # [('a', '1'), ('b', '2'), ('c', '3'), ('d', '4')]

# Without groups, findall returns the full matches
matches = re.findall(r"\w\d", text)
print(matches)  # ['a1', 'b2', 'c3', 'd4']
```
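There is a third case: when the pattern has exactly one capturing group, `findall` returns a flat list of that group's text rather than a list of 1-tuples. Non-capturing groups do not count toward this:

```python
import re

text = "a1 b2 c3 d4"

# Exactly one capturing group: a flat list of that group's text
print(re.findall(r"(\w)\d", text))  # ['a', 'b', 'c', 'd']

# (?:...) is not counted, so this pattern also has one capturing group
print(re.findall(r"(?:\w)(\d)", text))  # ['1', '2', '3', '4']
```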
Iterating with finditer
```python
import re

text = "a1 b2 c3"
pattern = r"(\w)(\d)"
for match in re.finditer(pattern, text):
    print(f"Match: {match.group(0)}, letter: {match.group(1)}, digit: {match.group(2)}")
```
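Unlike `findall`, `finditer` yields full match objects, so each match's position and named groups remain available. This sketch swaps in named groups (`letter` and `digit` are illustrative names) and a `[a-z]` class to collect both:

```python
import re

text = "a1 b2 c3"
pattern = r"(?P<letter>[a-z])(?P<digit>\d)"

# Each item pairs the named groups with the (start, end) span of the match
records = [(m.groupdict(), m.span()) for m in re.finditer(pattern, text)]
for record in records:
    print(record)
# ({'letter': 'a', 'digit': '1'}, (0, 2))
# ({'letter': 'b', 'digit': '2'}, (3, 5))
# ({'letter': 'c', 'digit': '3'}, (6, 8))
```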
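Group types compared
| Type | Syntax | Characteristics |
|---|---|---|
| Capturing group | `(...)` | Captures the matched text; can be referenced later |
| Named group | `(?P<name>...)` | Captured text is accessed by name |
| Non-capturing group | `(?:...)` | Groups only; captures nothing |
| Backreference | `\1`, `\2`, and so on, or `(?P=name)` | Matches the same text a group has already captured |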
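Optional groups that do not match
```python
import re

text = "color colour"
pattern = r"col(ou)?r"  # optional group
matches = re.findall(pattern, text)
print(matches)  # ['', 'ou'] (findall reports a non-participating group as '')

# Check whether the group took part in the match
match = re.search(pattern, "color")
print(match.group(1))  # None (group() returns None for a non-participating group)
```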
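Key points
- `(...)` is a capturing group; read its text with `match.group(n)`.
- `(?P<name>...)` is a named group, accessed by name.
- `\1`, `\2`, and so on refer back to the text captured by group 1, 2, and so on.
- `(?P=name)` refers back to a named group.
- `(?:...)` is a non-capturing group and does not take a group number.
- `findall` returns a list of tuples when the pattern has two or more groups, a list of strings when it has exactly one, and the full matches when it has none.
- Groups are used to extract, rearrange, and validate parts of a match.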