Naver Shopping Crawling

## parser_custom.py

```python
from bs4 import BeautifulSoup

def getProductInfo(li):
    # print(li)
    img = li.find("img")
    alt = img['alt']  # the product name is taken from the thumbnail's alt text
    # If an item has no <span class="num">, fall back to "_price_reload",
    # the class tried in an earlier try/except version of this function.
    priceReload = li.find("span", {"class": "num"}) or li.find("span", {"class": "_price_reload"})
    aTit = li.find("a", {"class": "link"})
    href = aTit['href']

    return {"name": alt, "price": priceReload.text.replace(",", ""), "link": href}


def parse(pageString):
    bsObj = BeautifulSoup(pageString, "html.parser")
    ul = bsObj.find("ul", {"class": "goods_list"})
    if ul is None:
        # The result list is missing (markup changed or the request was blocked)
        return []
    lis = ul.find_all("li", {"class": "_itemSection"})
    # print(len(lis))
    # print(lis[0])

    products = []
    for li in lis:  # Use lis[:1] to test with a single item. The 20th item appears to cause a problem.
        product = getProductInfo(li)
        products.append(product)

    return products
```
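The inline comment in `parse()` notes that one item (the 20th) seemed to break the loop. A more tolerant variant can skip items that lack any expected element instead of raising `AttributeError` or `TypeError`. This is a sketch under stated assumptions: the sample HTML is hypothetical, modelled only on the selectors `parser_custom.py` uses (`goods_list`, `_itemSection`, `num`, `link`), not on the real page, and `get_product_info` / `parse_tolerant` are hypothetical names that are not part of the original scripts.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup built from the selectors in parser_custom.py;
# the real Naver Shopping page is far larger and may be structured differently.
SAMPLE_HTML = """
<ul class="goods_list">
  <li class="_itemSection">
    <img alt="Power strip A">
    <span class="num">12,900</span>
    <a class="link" href="https://example.com/a">Power strip A</a>
  </li>
  <li class="_itemSection">
    <img alt="Item without a price span">
    <a class="link" href="https://example.com/b">Item B</a>
  </li>
</ul>
"""


def get_product_info(li):
    """Return a product dict, or None if any expected element is missing."""
    img = li.find("img")
    price = li.find("span", {"class": "num"}) or li.find("span", {"class": "_price_reload"})
    link = li.find("a", {"class": "link"})
    if img is None or price is None or link is None:
        return None
    return {
        "name": img.get("alt", ""),
        "price": price.get_text(strip=True).replace(",", ""),
        "link": link.get("href", ""),
    }


def parse_tolerant(page_string):
    """Like parse(), but skips incomplete items instead of raising."""
    soup = BeautifulSoup(page_string, "html.parser")
    ul = soup.find("ul", {"class": "goods_list"})
    if ul is None:
        return []
    products = []
    for li in ul.find_all("li", {"class": "_itemSection"}):
        info = get_product_info(li)
        if info is not None:  # drop items missing an image, price, or link
            products.append(info)
    return products


print(parse_tolerant(SAMPLE_HTML))
# [{'name': 'Power strip A', 'price': '12900', 'link': 'https://example.com/a'}]
```

Skipping silently drops incomplete items from the results; returning a placeholder dict with empty fields is the alternative if every row must be kept.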

## stage_naver_shopping_paging.py

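```python
import json

import requests
from parser_custom import parse

def crawl(productName, pageNo):
    url = "https://search.shopping.naver.com/search/all.nhn"
    # Let requests URL-encode the Korean query instead of formatting it into the string
    params = {"query": productName, "pagingIndex": pageNo, "cat_id": "", "frm": "NVSHATC"}
    data = requests.get(url, params=params, timeout=10)
    print(data, data.url)
    data.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    return data.content

productName = "콘센트"  # search keyword: "power outlet"

totalProducts = []
for pageNo in range(1, 10 + 1):  # from page 1 to page N
    pageString = crawl(productName, pageNo)
    products = parse(pageString)
    totalProducts += products

print(totalProducts)
print(len(totalProducts))

# Close the file automatically and keep Korean text readable in the JSON output
with open("./products.json", "w", encoding="utf-8") as file:
    json.dump(totalProducts, file, ensure_ascii=False)
```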
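The paging loop always requests pages 1 to 10, even if the results run out earlier. A possible refinement, sketched here under assumptions, builds each URL with `urllib.parse.urlencode` (so the Korean query is percent-encoded as UTF-8) and stops at the first page that yields no products. `build_search_url`, `crawl_pages`, and the `fetch` / `parse` hooks are hypothetical names; the 1-second `delay` is an arbitrary politeness default; and "an empty page means no more results" is an assumption about the site, not something the original post verifies.

```python
import time
from urllib.parse import urlencode

# Endpoint and parameters copied from crawl() in the script above.
BASE_URL = "https://search.shopping.naver.com/search/all.nhn"


def build_search_url(query, page_no):
    """Build the search URL with the (Korean) query percent-encoded as UTF-8."""
    params = {"query": query, "pagingIndex": page_no, "cat_id": "", "frm": "NVSHATC"}
    return BASE_URL + "?" + urlencode(params)


def crawl_pages(fetch, parse, query, max_pages=10, delay=1.0):
    """Collect products from pages 1..max_pages, stopping at the first empty page.

    fetch(url) -> page content and parse(content) -> list of product dicts are
    hypothetical hooks, e.g. lambda u: requests.get(u, timeout=10).content and
    parse from parser_custom.py. Sleeps `delay` seconds between requests.
    """
    results = []
    for page_no in range(1, max_pages + 1):
        products = parse(fetch(build_search_url(query, page_no)))
        if not products:  # assume no more results: stop requesting further pages
            break
        results.extend(products)
        if page_no < max_pages:
            time.sleep(delay)
    return results
```

Injecting `fetch` and `parse` keeps the loop testable without network access; in the real script they would wrap `requests.get` and the `parse` function from `parser_custom.py`.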

## analyze.py

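```python
import pandas as pd

df = pd.read_json("./products.json")

# print(df.count())

# strings_to_urls=False makes xlsxwriter write the product links as plain text
# instead of hyperlinks (Excel limits hyperlink length and count per sheet).
# Since pandas 1.3 writer options go through engine_kwargs; ExcelWriter(options=...)
# and writer.save() were removed in pandas 2.0.
with pd.ExcelWriter("products.xlsx", engine="xlsxwriter",
                    engine_kwargs={"options": {"strings_to_urls": False}}) as writer:
    df.to_excel(writer, sheet_name="sheet1")
```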
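`analyze.py` only exports the data to Excel. If the prices are to be sorted or summarized, `price` should be numeric: `parser_custom.py` stores it as a digit string, and whether `read_json` has already inferred a numeric dtype depends on the data, so an explicit conversion is a safe step. The sketch below is illustrative only: the three rows are hypothetical sample data standing in for `products.json`, in the same `name` / `price` / `link` shape the parser produces.

```python
import pandas as pd

# Hypothetical sample rows in the shape parser_custom.py returns;
# in analyze.py, df would come from pd.read_json("./products.json").
rows = [
    {"name": "Power strip A", "price": "12900", "link": "https://example.com/a"},
    {"name": "Power strip B", "price": "8500", "link": "https://example.com/b"},
    {"name": "Power strip C", "price": "", "link": "https://example.com/c"},
]
df = pd.DataFrame(rows)

# Convert price to numbers; errors="coerce" turns non-numeric values
# (such as an empty string) into NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# Cheapest first, ignoring rows without a usable price.
cheapest = df.dropna(subset=["price"]).sort_values("price")
print(cheapest[["name", "price"]])

# Count, mean, min, max, and quartiles of the numeric prices (NaN excluded).
print(df["price"].describe())
```

`pd.to_numeric` is harmless if the column is already numeric, so the same line can be dropped into `analyze.py` right after `read_json`.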

License: Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND)
