Top 50 open source web crawlers for data mining
2016.08.10 12:07
A web crawler (also known as an ant, automatic indexer, bot, web spider, web robot, or web scutter) is an automated program or script that methodically scans, or “crawls”, through web pages to create an index of the data it is set to look for. This process is called web crawling or spidering.
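To make the "scan pages, build an index" idea concrete, here is a minimal sketch in Python using only the standard library. It is not how any crawler in the list below is implemented. The names `crawl`, `LinkAndTextParser`, and `SITE`, and the `example.test` URLs, are hypothetical. The `fetch` argument is assumed to be any caller-supplied function that returns a page's HTML for a URL, or `None` on failure. A real crawler would also respect robots.txt, rate-limit its requests, and skip script and style content.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkAndTextParser(HTMLParser):
    """Collects href targets of <a> tags and the words found in page text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl starting at start_url.

    fetch: hypothetical caller-supplied function, URL -> HTML text or None.
    Returns an inverted index mapping each word to the set of URLs containing it.
    """
    index = {}
    seen = {start_url}          # URLs already queued, to avoid revisiting
    queue = deque([start_url])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited += 1
        parser = LinkAndTextParser()
        parser.feed(html)
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            # Resolve relative links against the current page and drop #fragments
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return index


# A tiny in-memory "web" (made up for illustration) so the sketch runs offline.
SITE = {
    "http://example.test/": '<a href="/a">A</a> home page',
    "http://example.test/a": '<a href="/">back</a> crawler notes',
}

index = crawl("http://example.test/", SITE.get)
```

Passing `SITE.get` as `fetch` keeps the example independent of the network; swapping in a function built on an HTTP client would turn the same loop into a live crawler.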
Web crawlers have various uses, but essentially they collect (mine) data from the Internet. Most search engines use them to keep their data up to date and to discover new content on the Internet. Analytics companies and market researchers use web crawlers to determine customer and market trends in a given geography. In this article, we present the top 50 open source web crawlers available on the web for data mining.
Name | Language | Platform |
Heritrix | Java | Linux |
Nutch | Java | Cross-platform |
Scrapy | Python | Cross-platform |
DataparkSearch | C++ | Cross-platform |
GNU Wget | C | Linux |
GRUB | C#, C, Python, Perl | Cross-platform |
ht://Dig | C++ | Unix |
HTTrack | C/C++ | Cross-platform |
ICDL Crawler | C++ | Cross-platform |
mnoGoSearch | C | Windows |
Norconex HTTP Collector | Java | Cross-platform |
Open Search Server | C/C++, Java, PHP | Cross-platform |
PHP-Crawler | PHP | Cross-platform |
YaCy | Java | Cross-platform |
WebSPHINX | Java | Cross-platform |
WebLech | Java | Cross-platform |
Arale | Java | Cross-platform |
JSpider | Java | Cross-platform |
HyperSpider | Java | Cross-platform |
Arachnid | Java | Cross-platform |
Spindle | Java | Cross-platform |
Spider | Java | Cross-platform |
LARM | Java | Cross-platform |
Metis | Java | Cross-platform |
SimpleSpider | Java | Cross-platform |
Grunk | Java | Cross-platform |
CAPEK | Java | Cross-platform |
Aperture | Java | Cross-platform |
Smart and Simple Web Crawler | Java | Cross-platform |
Web Harvest | Java | Cross-platform |
Aspseek | C++ | Linux |
Bixo | Java | Cross-platform |
crawler4j | Java | Cross-platform |
Ebot | Erlang | Linux |
Hounder | Java | Cross-platform |
Hyper Estraier | C/C++ | Cross-platform |
OpenWebSpider | C#, PHP | Cross-platform |
Pavuk | C | Linux |
Sphider | PHP | Cross-platform |
Xapian | C++ | Cross-platform |
Arachnode.net | C# | Windows |
Crawwwler | C++ | Java |
Distributed Web Crawler | C, Java, Python | Cross-platform |
iCrawler | Java | Cross-platform |
pycreep | Java | Cross-platform |
Opese | C++ | Linux |
Andjing | Java | |
Ccrawler | C# | Windows |
WebEater | Java | Cross-platform |
JoBo | Java | Cross-platform |
[Source] http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/