[엑셀로 웹 스크래핑2] Scrape webpage data using Excel & VBA

This post is going to get a bit technical- If you’re just looking for a quick tool to scrape data off pages and you’re not trying to set up the code yourself, I’d recommend this post: Search source code of any sites with Excel. You can just download the completed VBA file, and get some ideas about practical marketing applications of why you would want to scrape webpage data.

Considerations for coding a basic VBA script to scrape webpage data

In the past, I’ve highlighted one method of coding VBA to scrape webpage data by using Internet Explorer and making VBA calls. While IE is good for certain applications, if all you need is to scrape webpage source HTML code then oHTTP calls allow VBA to get through a lot more pages in a shorter amount of time, and generally encounter fewer errors.

If you’ve never coded VBA, I’d recommend looking at my introductory piece before diving into this particular code.

Writing the code

Since Excel will be the driving thing, generally the starting place is going to be some sort of list of URLs that you need to extract data from. The basic outline of what you will be coding is:

 

For each Cell in URL-List
  oHTTP Navigate to Cell.Value
  Open site source
  Perform some action with HTML file
  Output results of action in some way
Next

 

The actual code you’ll need to have the ability to scrape webpage data follows. First, you’ll need to define your variables and set your HTTP object.

 

Sub Site_Source_Scraper()
    Dim navtar As String
    Dim oHTTP As Object
    Dim page_html As String
    Dim cell As Range
    Set oHTTP = CreateObject("msxml2.ServerXMLHTTP")

 

In this first block of code, we are simply defining the variables and the objects we’ll need for the script. Navtar is the string that is the URL for the website we’ll be scraping data off of, oHTTP is HTTP object we’ll be using for most of our commands, page_html the string of the complete html file for the page, and of course cell is just the cell we are currently getting a URL from. Lastly, setting oHTTP as a Server XML object allows us to make URL calls with oHTTP.

The next section is the loop that does the actual scraping.

 

경축! 아무것도 안하여 에스천사게임즈가 새로운 모습으로 재오픈 하였습니다.
어린이용이며, 설치가 필요없는 브라우저 게임입니다.
https://s1004games.com

    For Each cell In Selection
        cell.Activate
        navtar = Replace(Replace(Replace(ActiveCell.Value, "https://", ""), "http://", ""), "www.", "") 'Clean URL
        navtar = "http://" & navtar
        On Error GoTo HTTPErr:
        oHTTP.Open "GET", navtar, False
        oHTTP.send
        page_html = (oHTTP.responseText)
        'put all the awesome code here!
LoopPickup:
    Next
Exit Sub

 

The whole block of code is incased in a for-loop that will lookup the webpages for all the cells selected when the script is executed. The first thing that happens inside of the loop is cleaning the URL. The URL will have to be of the form http://URLHERE.COM (or .net, .org, etc.). If there is a www. in front of the URL it will cause an error. The Replace( command is there to basically pull out all the extraneous info from the URL. Then, in the next line, “http://” is added to the beginning of the URL.

To avoid errors from crashing the script, the On Error right before the .Open command will push any problems to the error handler (see below). The actual “get this data” happens next. The .Open command takes the form OBJECT.Open “GET”, theURL, False to navigate to a page; after that a .Send will execute the command. The result is the variable “page_html”. The variable is a giant string of the entire HTML of the webpage. If you’re looking for ideas of things to do with this, see my post on why a marketer would be searching through HTML..

Lastly, the error handler must be built.

 

HTTPErr:
    If Err.Number <> 0 Then
        ActiveCell.Offset(0, 1).Value = "Error: " & Err.Description
    End If
    Resume LoopPickup:
End Sub

 

When you put those 3 parts together you’ll have a VBA script that can let you search through a massive list of sites instantly, scraping webpage data for whatever you might need. Leave comments below about what you used this for!

Complete VBA code:

 

Sub Site_Source_Scraper()
    Dim navtar As String
    Dim oHTTP As Object
    Dim page_html As String
    Dim cell As Range
    Set oHTTP = CreateObject("msxml2.ServerXMLHTTP")
    
    For Each cell In Selection
        cell.Activate
        navtar = Replace(Replace(Replace(ActiveCell.Value, "https://", ""), "http://", ""), "www.", "") 'Clean URL
        navtar = "http://" & navtar
        On Error GoTo HTTPErr:
        oHTTP.Open "GET", navtar, False
        oHTTP.send
        page_html = (oHTTP.responseText)
        'put all the awesome code here!
LoopPickup:
    Next
Exit Sub

HTTPErr:
    If Err.Number <> 0 Then
        ActiveCell.Offset(0, 1).Value = "Error: " & Err.Description
    End If
    Resume LoopPickup:
End Sub

 

[출처] http://tipsformarketers.com/scrape-webpage-data-using-excel-vba/

본 웹사이트는 광고를 포함하고 있습니다.
광고 클릭에서 발생하는 수익금은 모두 웹사이트 서버의 유지 및 관리, 그리고 기술 콘텐츠 향상을 위해 쓰여집니다.
번호 제목 글쓴이 날짜 조회 수
1195 [ 一日30分 인생승리의 학습법] VBA Web Scraping: How Can VBA Be Used To Scrape Website Data? file 졸리운_곰 2024.04.13 3
1194 [ 一日30分 인생승리의 학습법] 윈도우 실행파일 구조(PE파일) file 졸리운_곰 2024.03.31 3
1193 [ 一日30分 인생승리의 학습법] [Analysis] PE(Portable Executable) 파일 포맷 공부 file 졸리운_곰 2024.03.31 3
1192 [ 一日30分 인생승리의 학습법] 성공하는 메타버스의 3가지 조건 file 졸리운_곰 2024.03.30 7
1191 [ 一日30分 인생승리의 학습법] REST, REST API, RESTful 과 HATEOAS file 졸리운_곰 2024.03.10 9
1190 [ 一日30分 인생승리의 학습법] 렌더링 삼형제 CSR, SSR, SSG 이해하기 file 졸리운_곰 2024.03.10 2
1189 [ 一日30分 인생승리의 학습법] 엑셀 VBA에서 셀레니움 사용을 위한 Selenium Basic 설치 file 졸리운_곰 2024.02.23 11
1188 [ 一日30分 인생승리의 학습법]500 Lines or Less Blockcode: A Visual Programming Toolkit : 500줄 이하의 블록코드: 시각적 프로그래밍 툴킷 졸리운_곰 2024.02.12 4
1187 [ 一日30分 인생승리의 학습법] 구글 클라이언트(앱) 아이디를 발급받으려면 어떻게 해야 하나요? 졸리운_곰 2024.01.28 3
1186 [ 一日30分 인생승리의 학습법] 빅뱅 프로젝트를 성공적으로 오픈하기 위한 팁 졸리운_곰 2023.12.27 16
1185 [ 一日30分 인생승리의 학습법]“빅뱅 전환보다 단계적 전환 방식이 이상적 애자일팀과 협업 쉽게 체질 개선을” file 졸리운_곰 2023.12.27 12
1184 [ 一日30分 인생승리의 학습법] Big-bang / phased 접근 file 졸리운_곰 2023.12.27 3
1183 [ 一日30分 인생승리의 학습법] CodeDragon 메뉴 데이터 전환의 개념 이해 - 데이터 전환의 개념, 데이터 전환방식, 데이터 전환방식 및 장단점 비교, 데이터전환 이후 검토해야 할 사항 졸리운_곰 2023.12.27 5
1182 [ 一日30分 인생승리의 학습법] 블록체인과 IPFS를 이용한 안전한 데이터 공유 플랫폼 - 분쟁 해결 시스템 file 졸리운_곰 2023.12.27 6
1181 [ 一日30分 인생승리의 학습법] 블록체인과 IPFS를 이용한 안전한 데이터 공유 플랫폼 - 개념과 리뷰 시스템 file 졸리운_곰 2023.12.27 4
1180 [ 一日30分 인생승리의 학습법] 소켓 CLOSE_WAIT 발생 현상 및 처리 방안 file 졸리운_곰 2023.12.03 7
1179 [ 一日30分 인생승리의 학습법] robots 설정하기 졸리운_곰 2023.12.03 3
1178 [ 一日30分 인생승리의 학습법] A Tutorial and Elementary Trajectory Model for the Differential Steering System of Robot Wheel Actuators : 로봇 휠 액츄에이터의 차동 조향 시스템에 대한 튜토리얼 및 기본 궤적 모델 file 졸리운_곰 2023.11.29 6
1177 [ 一日30分 인생승리의 학습법] Streamline Your MLOps Journey with CodeProject.AI Server : CodeProject.AI 서버로 MLOps 여정을 간소화하세요 file 졸리운_곰 2023.11.25 2
1176 [ 一日30分 인생승리의 학습법] Comparing Self-Hosted AI Servers: A Guide for Developers / : 자체 호스팅 AI 서버 비교: 개발자를 위한 가이드 file 졸리운_곰 2023.11.25 10
대표 김성준 주소 : 경기 용인 분당수지 U타워 등록번호 : 142-07-27414
통신판매업 신고 : 제2012-용인수지-0185호 출판업 신고 : 수지구청 제 123호 개인정보보호최고책임자 : 김성준 sjkim70@stechstar.com
대표전화 : 010-4589-2193 [fax] 02-6280-1294 COPYRIGHT(C) stechstar.com ALL RIGHTS RESERVED