Basic PHP Web Scraping Script Tutorial

Now that we have a basic idea of what web scraping is, let's get into some very simple scripts that do it. As mentioned before, I'm going to assume you have PHP and curl enabled on your server or desktop. With those installed, let's get into the basics of PHP web scraping. I'll post a small program and then walk through what it does.

 

Let's start with the most basic of scraping scripts.

Whole script -

This is exactly what the code inside our .php file is going to look like on the server, minus the line numbers of course.

1. <?php
2. $url = 'http://www.oooff.com';
3. $output = file_get_contents($url);
4. echo $output;
5. ?>

Script Explanation -

So what have we done here?

I'm not going to cover lines 1 and 5, as we already know they just tell the PHP interpreter that the code between them is to be processed as PHP.

Line 2.
$url = 'http://www.oooff.com';
As you know, in PHP the $ symbol marks a variable, or "holder of things". Here we're assigning the root URL of Oooff.com to a variable so that we don't have to type the whole URL string each time. We can just use $url in its place.
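To make the "use $url in its place" idea concrete, here is a minimal sketch. The path /about.php is purely hypothetical and only illustrates reusing the variable; it is not a page the tutorial refers to.

```php
<?php
// Store the root URL once...
$url = 'http://www.oooff.com';

// ...then reuse it instead of retyping the whole string.
// '/about.php' is a made-up path, used only for illustration.
$about = $url . '/about.php';

echo $url . "\n";
echo $about . "\n";
```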

Line 3.
$output = file_get_contents($url);
Here again we're declaring the variable $output on the fly and then calling one of PHP's built-in functions, file_get_contents(). This function goes to a URL or file and pulls in all the data it holds, returning it as a string. For example, if you had a file called 'domains.txt' in the same directory as this script, you could write $output = file_get_contents('domains.txt'); to pull all the data from that file into the variable $output for use in your script. We can do the same thing with a URL (as long as allow_url_fopen is enabled in your PHP configuration, which it is by default), so after this line we'll have all the HTML from the homepage of Oooff.com held in the variable $output.
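The local-file case might look something like the sketch below. The file contents and the temp-directory location are assumptions made only so the example runs on its own; in practice domains.txt would already sit next to your script.

```php
<?php
// Assumption: we create a small sample domains.txt in the temp directory
// so this example is self-contained. The contents are made up.
$path = sys_get_temp_dir() . '/domains.txt';
file_put_contents($path, "example.com\nexample.org\n");

// file_get_contents() returns the whole file as one string, or false on failure.
$output = file_get_contents($path);
if ($output === false) {
    exit("Could not read $path\n");
}

// Split the string into one entry per line for later processing.
$domains = explode("\n", trim($output));
echo count($domains) . " domains loaded\n";
```

Checking for false before using $output is optional in a throwaway script, but it saves confusion when a file name is mistyped.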

Line 4.
echo $output;
Very simply, this just prints whatever is held in our variable to the screen. Pretty basic.
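The script above assumes the fetch always succeeds. As a rough sketch of a slightly more defensive version, the helper below (fetch_page is a hypothetical name, not part of PHP or of the original script) returns null when the request fails and uses a stream context to set a timeout. The 5-second value is an arbitrary choice.

```php
<?php
// Hypothetical helper: fetch a URL or file, returning its contents or null on failure.
function fetch_page(string $source): ?string
{
    // Give up on slow HTTP requests after 5 seconds (arbitrary value).
    $context = stream_context_create(['http' => ['timeout' => 5]]);

    // @ suppresses PHP's warning so we can handle the failure ourselves.
    $output = @file_get_contents($source, false, $context);

    return $output === false ? null : $output;
}

$url = 'http://www.oooff.com';
$output = fetch_page($url);

if ($output === null) {
    echo "Could not fetch $url\n";
} else {
    echo $output;
}
```

This behaves like the original script when the site is reachable, and prints a short message instead of a PHP warning when it is not.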

Trying things out -

OK, so now that we have a solid understanding of what these lines of code are doing, let's copy or type them into a file on your server where you run PHP files. Once you've created your file, navigate in your browser to wherever that file is located. So if the file is in a directory/folder called phpfiles in the web root directory on your local machine, you'd go to http://localhost/phpfiles/phpfile.php, assuming you named your file phpfile.php.

Click here to see what your scraped php result should look like!


Download the file here

Other things to try -

Now, let's try a couple of things to make sure you have it down.

1. Try and get the data from http://endhousepayments.com

2. Try calling and echoing the page two times and see what happens.

Conclusion -

And there you have it: we've made the most basic scraping script there is. Now you have an idea of how we get data from the internet to work on in our basic PHP program.

In the next tutorial I'll show you how to take that data and do some basic processing on it.

Next: Basic Data Scraping Using Curl and PHP

 

[Source] http://www.oooff.com/php-scripts/basic-php-scrape-tutorial/basic-php-scraping.php

 

 

 

 
