Basic PHP Web Scraping Script Tutorial

Now that we have a basic idea of what web scraping is, let's get into some very simple scripts to do it. As mentioned before, I'm going to assume you have PHP and curl enabled on your server or desktop. With those installed, let's get into the basics of PHP web scraping. I'll be posting a small program and then walking through what we have done.

 

Let's start with the most basic of scraping scripts.

Whole script -

So this is exactly what the code inside our .php file is going to look like on the server, minus the line numbers of course.

1. <?php
2. $url = 'http://www.oooff.com';
3. $output = file_get_contents($url);
4. echo $output;
5. ?>

Script Explanation -

So what have we done here?

I'm not going to cover lines 1 and 5, as we already know they just tell the server that the code between them is to be processed as PHP.

Line 2.
$url = 'http://www.oooff.com';
As you know, in PHP the $ symbol marks a variable, or "holder of things". So here we're assigning the root URL of Oooff.com to a variable so that we don't have to type the whole URL string each time. We can just use $url in its place.

Line 3.
$output = file_get_contents($url);
Here again we're declaring the variable $output on the fly and then calling one of PHP's internal functions, called file_get_contents(). This function goes to a URL or file and pulls all the data that's held in it. For example, if you had a file called 'domains.txt' in the same directory as this script, you'd do something like $output = file_get_contents('domains.txt'); and this would pull all the data from that file and load it into the variable $output to be used in your script. We can do the same thing with domains, so after this line we'll have all the HTML from the homepage of Oooff.com held in the variable $output.
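
To make the 'domains.txt' case concrete, here is a minimal sketch of reading a local file the same way. The file contents and the temp-directory location are assumptions of mine, chosen only so the example runs on its own.

```php
<?php
// Assumed setup (not from the tutorial): create a small domains.txt in the
// system temp directory so this sketch is self-contained.
$path = sys_get_temp_dir() . '/domains.txt';
file_put_contents($path, "example.com\nexample.org\n");

// file_get_contents() loads the whole file into one string.
$output = file_get_contents($path);

// Print the contents, then how many bytes were read.
echo $output;
echo strlen($output) . " bytes read\n";
```

Swapping $path for a URL string is all it takes to go from reading a file to reading a web page.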

Line 4.
echo $output;
Very simply, this is just going to print whatever is held in our variable to the screen. Pretty basic.
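
One thing you may notice: because echo sends the HTML straight to the browser, the browser renders it as a page rather than showing you the markup. If you'd rather look at the markup itself, something like the sketch below should work. The helper name as_visible_source() and the sample string are my own inventions for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: escape raw HTML so a browser displays it literally
// instead of rendering it.
function as_visible_source(string $html): string
{
    return '<pre>' . htmlspecialchars($html, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// A stand-in for the $output that the scraping script fetches.
$output = '<h1>Hello & welcome</h1>';
echo as_visible_source($output);
```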

Trying things out -

OK, so now that we have a solid understanding of what these lines of code are doing, let's copy or type them into a file on the server where you run PHP files. Once you've created your file, navigate in your browser to wherever that file is located. So if the file is in a directory/folder called phpfiles in the root directory on your local machine, you'd go to http://localhost/phpfiles/phpfile.php, assuming you named your file phpfile.php.

Click here to see what your scraped php result should look like!


Download the file here

Other things to try -

Now, let's try a couple of things to make sure you have it down.

1. Try and get the data from http://endhousepayments.com

2. Try calling and echoing the page 2 times and see what happens.
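
While experimenting with different addresses, you will sooner or later hit one that does not load. file_get_contents() returns false in that case, so a small wrapper like the sketch below can make the failure easier to spot. The name fetch_or_null() and the sample file are assumptions of mine. The demo reads a local stand-in file so it runs offline, and passing a URL instead relies on PHP's allow_url_fopen setting being enabled.

```php
<?php
// Hypothetical helper: return the fetched data, or null if the fetch failed.
// The @ suppresses PHP's warning so the failure can be reported by our code.
function fetch_or_null(string $source): ?string
{
    $data = @file_get_contents($source);
    return $data === false ? null : $data;
}

// Stand-in source so the sketch runs offline; swap in a URL to try it for real.
$source = sys_get_temp_dir() . '/scrape-sample.html';
file_put_contents($source, '<p>sample page</p>');

$page = fetch_or_null($source);
echo $page ?? "Could not fetch $source\n";

// A source that does not exist yields null instead of a warning.
$missing = fetch_or_null(sys_get_temp_dir() . '/no-such-file-for-this-demo.html');
var_dump($missing === null);
```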

Conclusion -

And there you have it: we've made the most basic scraping script there is. You now have the idea of how we get data from the internet to work on in our basic PHP program.

In the next tutorial I'll show you how to take that data and do some basic processing on it.

Next: Basic Data Scraping Using Curl and PHP

 

[Source] http://www.oooff.com/php-scripts/basic-php-scrape-tutorial/basic-php-scraping.php
