
RE(regular expression)

sunshout 2009. 9. 30. 20:30

Python is very strong at string processing.
Let's take a look at RE (regular expressions).

For example, running
netstat -at
prints something like:

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:33504 *:* LISTEN
tcp 0 0 *:7200 *:* LISTEN
tcp 0 0 192.168.122.1:domain *:* LISTEN
tcp 0 0 *:ssh *:* LISTEN
tcp 0 0 localhost:ipp *:* LISTEN
tcp 0 0 localhost:33274 *:* LISTEN
tcp 0 0 localhost:6010 *:* LISTEN
tcp 0 0 localhost:6011 *:* LISTEN

Given output like the above,
each line can be split into its columns as follows:


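import re
line = 'tcp 0 0 localhost:33504 *:* LISTEN'  # one row of the netstat output
temp2 = re.split(r'\s+', line)

# Splits line on \s+ (one or more whitespace characters)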
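As a minimal sketch of applying that split to the whole output, the snippet below skips the two header lines and turns each remaining row into a dict. The sample text is copied from the output above, and the column names in COLUMNS are assumptions chosen for illustration, not something netstat provides.

```python
import re

# Sample rows copied from the netstat output above (assumed input).
output = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp        0      0 localhost:33504         *:*      LISTEN
tcp        0      0 *:ssh                   *:*      LISTEN
"""

# Hypothetical column names chosen for this sketch.
COLUMNS = ["proto", "recv_q", "send_q", "local", "foreign", "state"]

rows = []
for line in output.splitlines()[2:]:          # skip the two header lines
    fields = re.split(r"\s+", line.strip())   # strip() avoids empty tokens at the ends
    if len(fields) == len(COLUMNS):           # ignore rows with an unexpected shape
        rows.append(dict(zip(COLUMNS, fields)))

for row in rows:
    print(row["local"], row["state"])
```

For plain whitespace, line.split() with no arguments yields the same tokens; re.split is the better fit once the delimiter is a real pattern.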

Parsing a string and storing the pieces as named tokens

email4 = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)\.(?P<suffix>[a-z]{3})')
match = email4.match('guido@python.org')
match.groupdict()
{'domain': 'python', 'suffix': 'org', 'user': 'guido'}
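One thing the example above does not show is that match() returns None when the pattern does not match, so calling groupdict() directly would raise AttributeError. A minimal sketch of a guard follows; parse_email is a hypothetical helper name introduced here, not part of the re module.

```python
import re

# Same pattern as above: user, domain, and a three-letter lowercase suffix.
EMAIL = re.compile(r'(?P<user>[\w.]+)@(?P<domain>\w+)\.(?P<suffix>[a-z]{3})')

def parse_email(text):
    """Return the named groups as a dict, or None if text does not match."""
    m = EMAIL.match(text)   # match() anchors only at the start of text
    return m.groupdict() if m else None

print(parse_email('guido@python.org'))   # dict with user, domain, suffix
print(parse_email('guido@python.io'))    # None: the suffix must be exactly 3 letters
print(parse_email('not an email'))       # None
```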

import re

# Three ARN resource formats; only the last assignment is used below.
a = 'arn:aws:service:region:account-id:resource-id'
a = 'arn:aws:service:region:account-id:resource-type/resource-id'
a = 'arn:aws:service:region:account-id:resource-type:resource-id'
#a = 'arn:aws:service:region::resource-id'

# One named group per colon-separated ARN field.
p = (r"(?P<arn>arn):"
     r"(?P<partition>aws|aws-cn|aws-us-gov):"
     r"(?P<service>[A-Za-z0-9_\-]*):"
     r"(?P<region>[A-Za-z0-9_\-]*):"
     r"(?P<account>[A-Za-z0-9_\-]*):"
     r"(?P<resources>[A-Za-z0-9_\-:/]*)")

r = re.compile(p)
match = r.match(a)
d = match.groupdict()
print(d)

# The resource part may be "id", "type:id", or "type/id".
b = "resource-id"
b = "resource-type:resource-id"
#b = "resource-type/resource-id"

# Split on either '/' or ':'
print(re.split('/|:', b))
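The two steps above (match the ARN fields, then split the resource part) can be combined into one function. This is only a sketch under assumptions: parse_arn and the keys resource_type and resource_id are hypothetical names introduced here, and the pattern covers only the three partitions listed above.

```python
import re

ARN = re.compile(
    r"arn:"
    r"(?P<partition>aws|aws-cn|aws-us-gov):"
    r"(?P<service>[A-Za-z0-9_\-]*):"
    r"(?P<region>[A-Za-z0-9_\-]*):"
    r"(?P<account>[A-Za-z0-9_\-]*):"
    r"(?P<resources>[A-Za-z0-9_\-:/]*)"
)

def parse_arn(arn):
    """Return a dict of ARN fields, or None if the string does not match."""
    m = ARN.fullmatch(arn)          # fullmatch() rejects trailing garbage
    if m is None:
        return None
    d = m.groupdict()
    # Split on the first '/' or ':' only, so the id itself may contain them.
    parts = re.split(r"[/:]", d.pop("resources"), maxsplit=1)
    d["resource_type"] = parts[0] if len(parts) == 2 else None
    d["resource_id"] = parts[-1]
    return d

print(parse_arn('arn:aws:service:region:account-id:resource-type/resource-id'))
print(parse_arn('arn:aws:service:region::resource-id'))
```

Using `*` rather than `+` in each field is what lets the last call succeed with an empty account.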

a = 'data.aws.lifecycle=spot'
b = 'data.auto_scaling_group!='

# k: key, o: comparison operator, v: value (may be empty)
item = re.compile(r'(?P<k>[\w.]+)(?P<o>=|!=|<=|<|>=|>)(?P<v>[\w.]*)')

m = item.match(a)
m.groupdict()
{'k': 'data.aws.lifecycle', 'o': '=', 'v': 'spot'}

item.match(b).groupdict()
{'k': 'data.auto_scaling_group', 'o': '!=', 'v': ''}
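As a small sketch of how this pattern might be wrapped for reuse, the helper below returns a (key, operator, value) tuple or None. parse_filter is a hypothetical name, and the third sample expression is made up for illustration.

```python
import re

ITEM = re.compile(r'(?P<k>[\w.]+)(?P<o>=|!=|<=|<|>=|>)(?P<v>[\w.]*)')

def parse_filter(expr):
    """Return (key, operator, value), or None if expr is not a valid filter."""
    m = ITEM.fullmatch(expr)   # fullmatch() requires the whole string to match
    return (m['k'], m['o'], m['v']) if m else None

for expr in ['data.aws.lifecycle=spot', 'data.auto_scaling_group!=', 'data.cpu<=4']:
    print(parse_filter(expr))
```

The order of the alternatives matters when match() is used: if `<` were listed before `<=`, matching 'data.cpu<=4' would capture '<' as the operator, leave the value empty, and still report success. Listing the two-character operators first, as the pattern above does, avoids that.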

 

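When you want a negative search (a negative lookahead, (?!...)), i.e. to find the point where the text stops consisting of allowed characters:

m = re.search(r'(?![a-zA-Z0-9\-]).*', 'plugin-dfsdfds_gggg')
m
<re.Match object; span=(14, 19), match='_gggg'>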
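A lookahead is zero-width: it checks the next character without consuming it, so search() moves forward until it reaches a position where the next character is not in the class. The sketch below compares it with a negated character class, [^...], and shows a case that is easy to overlook. The variable names and the second test string are assumptions made for this example.

```python
import re

text = 'plugin-dfsdfds_gggg'

# Negative lookahead: match from the first position whose next char is NOT allowed.
m1 = re.search(r'(?![a-zA-Z0-9\-]).*', text)
# Negated character class: match just the first disallowed character.
m2 = re.search(r'[^a-zA-Z0-9\-]', text)

print(m1.group(), m1.span())
print(m2.group(), m2.span())

# On a string made only of allowed characters the two behave differently:
clean = 'plugin-abc'
m3 = re.search(r'(?![a-zA-Z0-9\-]).*', clean)  # empty match at the very end
m4 = re.search(r'[^a-zA-Z0-9\-]', clean)       # no match at all
print(m3.span(), m4)
```

Because the lookahead also succeeds at the end of the string, m3 is a match object (with an empty match) rather than None. When the goal is only to test whether a disallowed character exists, the negated class is the safer check.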