
Python - Regular Expressions


 

Regular expression operations in Python

https://docs.python.org/3/library/re.html

 


 

 

Test site for regular expressions

https://regex101.com/

 


 

The Zen of Python

 

Python Easter Egg

$> python -c "import this"     # running this prints the text below

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.

....

 

 

Basic Pattern Expressions

Start the Python interpreter and enter the commands below in order.

import re
line = "Beautiful is better than ugly."
matches = re.findall("Beautiful", line)
print(matches)

output : ['Beautiful']

matches2 = re.findall("beautiful", line, re.IGNORECASE)
print(matches2)

output :   ['Beautiful']
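To make the effect of the flag visible, here is a minimal sketch that runs the same lowercase pattern with and without re.IGNORECASE. The variable names no_flag and with_flag are only illustrative. Note that re.findall returns an empty list, not None, when nothing matches.

```python
import re

line = "Beautiful is better than ugly."

# Case-sensitive search: lowercase "beautiful" does not occur in the line.
no_flag = re.findall("beautiful", line)

# Case-insensitive search: the match is returned exactly as it appears in the text.
with_flag = re.findall("beautiful", line, re.IGNORECASE)

print(no_flag)    # []
print(with_flag)  # ['Beautiful']
```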

 

 

Start or End Matching

zen2 = """Although never is often better than * right * now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea - - let's do more of those!"""

m = re.findall(r"^If", zen2, re.MULTILINE)       # ^ : lines that start with "If"
m2 = re.findall(r"idea\.", zen2, re.MULTILINE)   # \. matches a literal '.', so this finds "idea."
m22 = re.findall(r"idea.*", zen2, re.MULTILINE)  # "idea" followed by zero or more of any character (up to the end of the line)
m3 = re.findall(r"idea.$", zen2, re.MULTILINE)   # $ : "idea" plus any one character at the end of a line
print(m, m2, m22, m3)

 


 

output : ['If', 'If'] ['idea.', 'idea.'] ['idea.', 'idea.', "idea - - let's do more of those!"] ['idea.', 'idea.']
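Every call above passes re.MULTILINE. As a rough sketch of why, the snippet below runs the same ^If pattern with and without the flag on a short made-up string (the text variable is not from the Zen, just an illustration).

```python
import re

# Hypothetical three-line sample; only lines 2 and 3 start with "If".
text = "first line\nIf second\nIf third idea."

# Without re.MULTILINE, ^ matches only at the very start of the whole string.
single = re.findall(r"^If", text)

# With re.MULTILINE, ^ also matches right after every newline.
multi = re.findall(r"^If", text, re.MULTILINE)

print(single)  # []
print(multi)   # ['If', 'If']
```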

 

 

Select One and None

 

import re

string = "Two aa too"

# m = re.findall("t[ow]o", string)  #[ow] : ‘o’ or ‘w’
m = re.findall("t[ow]o", string, re.IGNORECASE)
print(m)

 

output :  ['Two', 'too']

 

 

m = re.findall("t[^w]o", string, re.IGNORECASE) # ^ inside [] means NOT: any single character except 'w'
print(m)

output :  ['too']
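A negated class still consumes exactly one character, and that character can be anything other than the excluded ones. A small sketch on a made-up sample string (sample is an assumption for illustration, not from the post):

```python
import re

# Hypothetical sample with several three-character candidates.
sample = "two tao t-o too"

# [ow] accepts only 'o' or 'w' in the middle position.
either = re.findall("t[ow]o", sample)

# [^w] accepts any single character except 'w', including '-' and 'a'.
not_w = re.findall("t[^w]o", sample)

print(either)  # ['two', 'too']
print(not_w)   # ['tao', 't-o', 'too']
```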

 

 

Find Numbers

 

import re

string = "123?45yy7890 hi 999 hello"

m1 = re.findall(r"\d", string)         # \d matches a single digit
m2 = re.findall("[0-9]{1,2}", string)  # [0-9]: a digit from 0 to 9; {1,2}: one or two of them in a row
m3 = re.findall("[1-5]{1,2}", string)  # [1-5]: a digit from 1 to 5; {1,2}: one or two of them in a row

print("m1=", m1)

output :  m1= ['1', '2', '3', '4', '5', '7', '8', '9', '0', '9', '9', '9']


print("m2=", m2)

output :  m2= ['12', '3', '45', '78', '90', '99', '9']

 

print("m3=", m3)
output :  m3= ['12', '3', '45']
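The {1,2} quantifier cuts long digit runs into pieces. If the goal is whole numbers instead, one possible variation (not part of the original examples) is the + quantifier, which takes one or more digits at a time:

```python
import re

string = "123?45yy7890 hi 999 hello"

# \d+ grabs each run of consecutive digits as a single match.
runs = re.findall(r"\d+", string)

# findall returns strings; convert them if numeric values are needed.
numbers = [int(r) for r in runs]

print(runs)     # ['123', '45', '7890', '999']
print(numbers)  # [123, 45, 7890, 999]
```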

 

 

Compiled Pattern

 

import re

string = "123?45yy7890 hi 999 hello"

# pattern = re.compile("[0-9]{1,3}")  # [0-9]: a digit from 0 to 9; {1,3}: one to three in a row
pattern = re.compile(r"(\d{1,3})")    # one to three digits, captured as a group

mm = re.findall(pattern, string)   # mm is a list
print(mm)

output :  ['123', '45', '789', '0', '999']


for m in re.finditer(pattern, string): # each m is a Match object; m.groups() returns a tuple
    print(m.groups())
output : 

('123',)
('45',)
('789',)
('0',)
('999',)
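A compiled pattern also has its own findall and finditer methods, so it does not have to be passed back into the module-level functions. The sketch below uses those methods and also reads the position of each match with span(). The names found and spans are illustrative.

```python
import re

string = "123?45yy7890 hi 999 hello"
pattern = re.compile(r"(\d{1,3})")

# Same result as re.findall(pattern, string), called on the compiled object.
found = pattern.findall(string)

# Each Match object exposes the captured text and its (start, end) position.
spans = [(m.group(1), m.span()) for m in pattern.finditer(string)]

print(found)  # ['123', '45', '789', '0', '999']
print(spans)  # [('123', (0, 3)), ('45', (4, 6)), ('789', (8, 11)), ('0', (11, 12)), ('999', (16, 19))]
```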

 

 

Compiled Pattern (Cont'd)

 

import re

string = "aaaaaaa<hr>This</hr>"

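pattern = re.compile("<(.*)>")  # cf. re.compile("<.*>") has no group, so findall returns the whole match: '<hr>This</hr>'
# Parentheses ( ) form a capture group: findall returns only the text matched inside them.
# .* is greedy, so it runs up to the last '>' in the string.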

mm = re.findall(pattern, string)
print(mm)

output :  ['hr>This</hr']

 


for m in re.finditer(pattern, string):
    print(m.groups())  # tuple of all captured groups (the argument to groups() is a default value, not a group index)

output :  ('hr>This</hr',)
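The result 'hr>This</hr' comes from .* being greedy. If the intent is to pull out each tag name separately, two common alternatives are a non-greedy quantifier or a negated character class. This is a sketch of those alternatives, not something the original example does:

```python
import re

string = "aaaaaaa<hr>This</hr>"

# Greedy: .* extends to the last '>' in the string.
greedy = re.findall(r"<(.*)>", string)

# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall(r"<(.*?)>", string)

# Negated class: anything except '>' between the angle brackets.
no_angle = re.findall(r"<([^>]*)>", string)

print(greedy)    # ['hr>This</hr']
print(lazy)      # ['hr', '/hr']
print(no_angle)  # ['hr', '/hr']
```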

 

re.sub() : replace

 

import re

string = "https://aaa.bbb.com/"

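x = re.sub("(http(s)*|:|/)", '', string)  # delete every "http"/"https", ':' and '/' (the variable is string, not the builtin str)

print(x)

output :  aaa.bbb.com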
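That pattern removes every ':' and '/' anywhere in the string, which is fine for a bare host but also eats path separators. As a hedged sketch, the snippet below compares it with an anchored pattern that strips only the leading scheme and one trailing slash. The second URL is a made-up example.

```python
import re

broad = r"(http(s)*|:|/)"       # pattern from the example above
anchored = r"^https?://|/$"     # assumed alternative: scheme at the start, slash at the end

url1 = "https://aaa.bbb.com/"
url2 = "https://aaa.bbb.com/a/b"  # hypothetical URL with a path

print(re.sub(broad, "", url1))     # aaa.bbb.com
print(re.sub(anchored, "", url1))  # aaa.bbb.com
print(re.sub(broad, "", url2))     # aaa.bbb.comab
print(re.sub(anchored, "", url2))  # aaa.bbb.com/a/b
```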

 

 

Try This

 

From The Zen of Python, find the sentences that contain 'is better than',

and print, in lowercase, what is better than what.

  1. Print only the better item
  2. Better item > worse item
    ex.
    beautiful > ugly
    simple > complex
    now > never
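One possible approach to the exercise, as a sketch only. It assumes a short hand-copied excerpt (zen_excerpt) rather than the full text, and it assumes each side of the comparison is a single word. Other solutions are equally valid.

```python
import re

# Hand-copied excerpt; the last line must NOT match ("is often better than").
zen_excerpt = """Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Now is better than never.
Although never is often better than *right* now."""

# Two capture groups: the better word and the worse word, one sentence per line.
pattern = re.compile(r"^(\w+) is better than (\w+)\.$", re.MULTILINE)

pairs = [(m.group(1).lower(), m.group(2).lower())
         for m in pattern.finditer(zen_excerpt)]

# Part 1: only the better item.
for better, _ in pairs:
    print(better)

# Part 2: better > worse.
for better, worse in pairs:
    print(f"{better} > {worse}")
```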

 
