When Crawling Requires a POST Request

Ideally a GET request would return everything you need, but some sites only return results after you fill in a form tag and submit it as a POST request.

In that case, copy the form data from the request payload, send a POST request with those values, and then process the data that comes back.
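To make the idea concrete, here is a minimal sketch of what "sending form data as a POST request" means on the wire. It uses only the standard library and sends nothing over the network. The URL and field names are hypothetical and are not taken from any real site.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and form fields, for illustration only.
FORM_URL = "https://example.com/search"
form_fields = {"city": "Seoul", "nights": "1"}

# A browser submits a plain HTML form as a URL-encoded request body.
body = urlencode(form_fields).encode("utf-8")

# Supplying `data` makes urllib treat the request as POST.
# The request object is only built here; it is never sent.
request = Request(FORM_URL, data=body)

print(request.get_method())   # POST
print(body.decode("utf-8"))   # city=Seoul&nights=1
```

Passing a dict to `requests.post(url, data=...)` produces the same kind of URL-encoded body, so the values copied from the payload can be sent as a plain dict.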

The Toyoko Inn hotel site is one example.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup

import requests

# Keep the browser window open after the script finishes
chrome_options = Options()
chrome_options.add_experimental_option("detach", True)

# Suppress unnecessary error messages
chrome_options.add_experimental_option("excludeSwitches", ["enable-logging"])

# Install the driver through ChromeDriverManager and create the service
service = Service(executable_path=ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)

#---- end of basic selenium setup ----#
# Note: the POST request below goes through requests, not this driver.

url = "https://www.toyoko-inn.com/korea/search"


# The site only returns results when the POST request carries the form data shown in the payload.

data_obj = {
    "lcl_id": "ko",
    "prcssng_dvsn": "dtl",    
    "sel_area_txt": "한국",
    "sel_htl_txt": "토요코인 서울강남",
    "chck_in": "2022/09/26",
    "inn_date": "1",
    "sel_area": "8",
    "sel_htl": "00282",
    "rsrv_num": "1",
    "sel_ldgngPpl": "1"
}

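response = requests.post(url, data=data_obj)
html = response.text
# The parser name must be passed as a string
soup = BeautifulSoup(html, "html.parser")
beds = soup.select("ul.btnLink03")

# A list that contains at least one link is treated as an available room
for bed in beds:
    links = bed.select('a')
    if len(links) > 0:
        print("Rooms available")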
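The vacancy check above can be tried without hitting the site. The sketch below swaps BeautifulSoup for the standard library's `HTMLParser` so it runs with no extra packages, and applies the same rule (any `a` inside a `ul` with class `btnLink03`). The `VacancyParser` and `has_vacancy` names and the two HTML fragments are made up for illustration; the real response markup may differ.

```python
from html.parser import HTMLParser


class VacancyParser(HTMLParser):
    """Counts <a> tags nested inside <ul class="btnLink03"> elements."""

    def __init__(self):
        super().__init__()
        self.ul_stack = []   # one bool per open <ul>: is it a btnLink03 list?
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            classes = (dict(attrs).get("class") or "").split()
            self.ul_stack.append("btnLink03" in classes)
        elif tag == "a" and any(self.ul_stack):
            self.link_count += 1

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_stack:
            self.ul_stack.pop()


def has_vacancy(html: str) -> bool:
    """Return True if any btnLink03 list in the HTML contains a link."""
    parser = VacancyParser()
    parser.feed(html)
    return parser.link_count > 0


# Made-up fragments standing in for a saved response body.
available = '<ul class="btnLink03"><li><a href="/reserve">Reserve</a></li></ul>'
sold_out = '<ul class="btnLink03"><li>Sold out</li></ul>'

print(has_vacancy(available))  # True
print(has_vacancy(sold_out))   # False
```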

The site only returns the data when the POST request carries the form data, and only then can the response be crawled.
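Since the check-in date in the example is hard-coded, one possible refinement is a small helper that fills in the stay details. This is a sketch under assumptions: the field names and fixed values come from the example, but reading `inn_date` as the number of nights, `rsrv_num` as the number of rooms, and `sel_ldgngPpl` as the number of guests is a guess based on the names, and `build_payload` is a hypothetical helper.

```python
from datetime import date


def build_payload(check_in: date, nights: int = 1, rooms: int = 1, guests: int = 1) -> dict:
    """Return the example's form fields with the stay details filled in."""
    return {
        "lcl_id": "ko",
        "prcssng_dvsn": "dtl",
        "sel_area_txt": "한국",
        "sel_htl_txt": "토요코인 서울강남",
        # The example uses the YYYY/MM/DD format for the check-in date.
        "chck_in": check_in.strftime("%Y/%m/%d"),
        "inn_date": str(nights),       # assumed: number of nights
        "sel_area": "8",
        "sel_htl": "00282",
        "rsrv_num": str(rooms),        # assumed: number of rooms
        "sel_ldgngPpl": str(guests),   # assumed: number of guests
    }


payload = build_payload(date(2022, 9, 26))
print(payload["chck_in"])  # 2022/09/26
```

The resulting dict can be passed as `data=` to `requests.post` in place of the hand-written one.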
