Big Data Hadoop Alternatives: What They Offer and Who Uses Them: Finding Alternatives to MapReduce and Hadoop

Many people, particularly those new to the concept of Big Data, think of Big Data and Hadoop as almost one and the same. But there are frameworks other than Hadoop that are gaining popularity. The costs of implementing Hadoop can be quite substantial, and so organizations are exploring other options.

Alternatives to Hadoop for big and unstructured data are emerging.

The two top Hadoop vendors, Hortonworks and Cloudera, aren't exactly suffering from an increase in competition at this point, but more organizations are discovering that Big Data comprises more than the Hadoop ecosystem. Following are some of these Big Data alternatives to Hadoop.

Apache Spark

Apache Spark promises faster processing than Hadoop MapReduce, along with well-designed application programming interfaces. This open source framework runs in memory on a cluster and is not tied to Hadoop's two-stage MapReduce paradigm, which makes repeated access to the same data faster. It can also read data directly from the Hadoop Distributed File System (HDFS).

Spark requires a lot of memory, however, because it loads data into memory and keeps it there unless told otherwise. Spark excels at iterative computations that pass over the same data multiple times, and it performs best when all of the data fits in memory. For one-pass extract-transform-load (ETL) jobs, MapReduce still has the edge. Spark is also easier to program and has an interactive mode, but Hadoop MapReduce still has more security features.
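The difference between re-reading data on every pass and keeping it in memory can be illustrated with a minimal pure-Python sketch. It does not use Spark's API; `CountingSource` and both `iterate_*` functions are hypothetical names invented for this illustration, and the "read" counter simply stands in for a full scan from storage.

```python
from typing import List


class CountingSource:
    """Hypothetical data source that counts how often it is scanned."""

    def __init__(self, records: List[float]) -> None:
        self._records = records
        self.reads = 0

    def load(self) -> List[float]:
        # Each call models one full scan from storage.
        self.reads += 1
        return list(self._records)


def iterate_without_cache(source: CountingSource, passes: int) -> float:
    """Re-read the input on every pass, as a chain of independent jobs would."""
    estimate = 0.0
    for _ in range(passes):
        data = source.load()
        estimate = 0.5 * (estimate + sum(data) / len(data))
    return estimate


def iterate_with_cache(source: CountingSource, passes: int) -> float:
    """Load once, keep the data in memory, and reuse it on every pass."""
    cached = source.load()
    estimate = 0.0
    for _ in range(passes):
        estimate = 0.5 * (estimate + sum(cached) / len(cached))
    return estimate
```

Both functions compute the same result. The cached version scans the source once regardless of the number of passes, at the cost of holding the whole dataset in memory, which mirrors the trade-off described above.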

Cluster Map Reduce

Cluster Map Reduce was developed by Chitika, a Massachusetts-based online ad company. Chitika had been using HDFS with MapReduce, then began using a file system called Gluster for its analytical data warehouse. The team tried bridging Gluster with MapReduce using existing tools, but wanted a more efficient solution, so they built Cluster Map Reduce.

Cluster Map Reduce provides a Hadoop-like framework for running MapReduce jobs in a distributed environment. By simplifying data movement and minimizing dependencies that can slow data retrieval, Chitika was able to create something faster. Compared to Hadoop, it also offers:

• More straightforward construction of queries
• Lighter footprint
• Greater ability to customize future iterations in Perl or Python (or other languages)
• Resilience to failure in server nodes

Cluster Map Reduce makes better use of hardware, allowing the same workload to be completed on fewer nodes than Hadoop requires.
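For readers unfamiliar with the MapReduce model that both Hadoop and Cluster Map Reduce are built around, the following is a minimal single-process sketch of its map, shuffle, and reduce steps, using word counting as the example. It is not Chitika's or Hadoop's code; all function names are hypothetical, and a real framework would distribute each phase across nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    pairs: List[Tuple[str, int]] = []
    for line in lines:
        for word in line.lower().split():
            pairs.append((word, 1))
    return pairs


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as happens between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence on one machine."""
    return reduce_phase(shuffle(map_phase(lines)))
```

Calling `word_count(["big data", "Big Hadoop"])` counts "big" twice and "data" and "hadoop" once each. In a distributed setting, the shuffle step is where data moves between nodes, which is the kind of data movement the text says Cluster Map Reduce set out to simplify.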

Some Hadoop alternatives move data more efficiently through analytical back-end processes.

High Performance Computing Cluster

High Performance Computing Cluster (HPCC) is an open source, massively parallel processing platform. It incorporates a data refinery cluster called Thor, a query cluster called Roxie, plus middleware components, external communications, and client interfaces. An HPCC environment may include only Thor clusters, or both Thor and Roxie clusters.

Thor functions as a distributed file system with parallel processing spread across nodes. It consumes, transforms, links, and indexes data. Roxie offers separate high-performance online query processing as well as data warehousing capabilities. HPCC uses Enterprise Control Language (ECL), a language designed for Big Data manipulation that is compiled and optimized into C++ and can be extended using C++ libraries.

Hydra

Hydra is a distributed task processing system developed by social bookmarking service AddThis. It's available under an open source Apache license and can tackle some Big Data tasks that Hadoop struggles with. The company needed a scalable distributed system to deliver real-time analysis of data to customers, and Hadoop wasn't an option for AddThis at the time, so they created Hydra.

Hydra supports streaming and batch operations using a tree-based data structure so it can store and process data across clusters that may have thousands of nodes. AddThis engineer Chris Burroughs describes Hydra thus: "It ingests streams of data (think log files) and builds trees that are aggregates, summaries, or transformations of the data. These trees can be used by humans to explore (tiny queries), as part of a machine learning pipeline (big queries), or to support live consoles on websites (lots of queries)." Hydra can use HDFS, but it also operates on native file systems.
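The idea of ingesting a stream of records and building a tree of aggregates can be sketched in a few lines of Python. This is only an illustration of the concept in the quote above, not Hydra's actual data structure or API; the `Node` class, the `build_tree` and `query` functions, and the `date` and `page` field names are all hypothetical.

```python
from typing import Dict, Iterable, List


class Node:
    """One tree node holding a hit count and its child nodes."""

    def __init__(self) -> None:
        self.count = 0
        self.children: Dict[str, "Node"] = {}


def build_tree(records: Iterable[Dict[str, str]], levels: List[str]) -> Node:
    """Consume records one at a time, counting hits along each path."""
    root = Node()
    for record in records:
        node = root
        node.count += 1
        for field in levels:
            # Descend one level per field, creating the child on first sight.
            node = node.children.setdefault(record[field], Node())
            node.count += 1
    return root


def query(root: Node, path: List[str]) -> int:
    """Return the aggregate count stored at a path, or 0 if it is absent."""
    node = root
    for key in path:
        child = node.children.get(key)
        if child is None:
            return 0
        node = child
    return node.count
```

With log records keyed by `date` and then `page`, `query(tree, ["2024-01-01"])` would return the number of hits on that day, and adding a page to the path narrows the count further. Because the aggregates are updated as each record arrives, small lookups like these do not need to rescan the raw logs.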

Conclusion

For all its power and benefits, Hadoop does have drawbacks. The way it moves data is complex, and it is not always the most efficient way to process Big Data and unstructured data. The automatic association between Big Data and Hadoop is becoming looser as more alternatives are developed. Some have speed advantages, while others support stream processing or make more efficient use of hardware. Anyone who deals with Big Data or unstructured data would be wise to evaluate these alternatives against their own needs.

 

 

 

[Source] https://datafloq.com/read/Big-Data-Hadoop-Alternatives/1135
