Big Data Hadoop Alternatives: What They Offer and Who Uses Them: Finding Alternatives to MapReduce and Hadoop


Many people, particularly those new to the concept of Big Data, think of Big Data and Hadoop as almost one and the same. But there are frameworks other than Hadoop that are gaining popularity. The costs of implementing Hadoop can be quite substantial, and so organizations are exploring other options.

Alternatives to Hadoop for big and unstructured data are emerging.

The two top Hadoop vendors, Hortonworks and Cloudera, aren't exactly suffering from an increase in competition at this point, but more organizations are discovering that Big Data comprises more than the Hadoop ecosystem. Following are some of these Big Data alternatives to Hadoop.

Apache Spark

Apache Spark promises faster processing than Hadoop MapReduce along with well-designed application programming interfaces. This open source framework runs in memory on a cluster and is not tied to the two-stage Hadoop MapReduce paradigm, so repeated access to the same data is faster. It can also read data directly from the Hadoop Distributed File System (HDFS).

It requires a lot of memory, however, because it keeps working data in memory rather than writing it back to disk between steps. Spark excels at iterative computations that pass over the same data multiple times, and it performs best when all the data fits in memory. For one-pass extract-transform-load (ETL) jobs, however, MapReduce is still the better fit. Spark is also easier to program and has an interactive mode, but Hadoop MapReduce still has more security features.
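The difference between iterative and one-pass workloads can be sketched with a toy model in plain Python. This is not Spark's API: `SlowStore` and `iterate` are hypothetical names, and the "store" simply counts how often the full dataset is read, standing in for disk or HDFS access.

```python
# Toy model of why keeping data in memory helps iterative jobs.
# Nothing here is Spark's API; SlowStore and iterate are hypothetical names.

class SlowStore:
    """Stands in for disk or HDFS and counts how often the data is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1
        return list(self._values)


def iterate(store, passes, cache):
    """Run `passes` passes that each need the full dataset; return the estimate."""
    cached = store.load() if cache else None
    estimate = 0.0
    for _ in range(passes):
        data = cached if cache else store.load()
        # Each pass moves the estimate halfway toward the mean of the data.
        mean = sum(data) / len(data)
        estimate += 0.5 * (mean - estimate)
    return estimate


uncached = SlowStore(range(10))
cached = SlowStore(range(10))
result_uncached = iterate(uncached, passes=5, cache=False)
result_cached = iterate(cached, passes=5, cache=True)

# Five passes: the uncached run reads the store five times, the cached run once.
print(uncached.reads, cached.reads)
```

With `passes=1` both variants read the store exactly once, which is the toy version of why caching buys little for a one-pass ETL job.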

Cluster Map Reduce

Cluster Map Reduce was developed by Chitika, a Massachusetts-based online ad company. The company had been using HDFS with MapReduce, then moved its analytical data warehouse to a file system called Gluster. Its engineers tried bridging Gluster with MapReduce using existing tools but wanted a more efficient solution, so they built Cluster Map Reduce.

Cluster Map Reduce provides a Hadoop-like framework for MapReduce jobs run in a distributed environment. By simplifying data movement and minimizing the dependencies that can slow data retrieval, Chitika was able to create something faster. Compared to Hadoop, it also offers:

• More straightforward construction of queries
• A lighter footprint
• Greater ability to customize future iterations in Perl or Python (or other languages)
• Resilience to failure in server nodes

Cluster Map Reduce makes better use of hardware, allowing the same workload to be completed on fewer nodes than Hadoop requires.
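For readers new to the model that both Hadoop and Cluster Map Reduce run, the map, shuffle, and reduce steps can be sketched in a few lines of single-process Python. This is only an illustration of the programming model, not Chitika's code or Hadoop's API; the function names and the word-count job are assumptions chosen for the example.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce model (illustrative names only).

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Run map over every line, shuffle by key, then reduce each group."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["big data", "Big clusters", "data data"])
print(counts)
```

In a real framework the map and reduce calls run on different nodes and the shuffle moves data across the network, which is where the data-movement costs mentioned above come from.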

Some Hadoop alternatives move data more efficiently through analytical back-end processes.

High Performance Computing Cluster

A massively parallel processing platform, High Performance Computing Cluster (HPCC) is open source and incorporates a data refinery cluster called Thor, a query cluster called Roxie, plus middleware components, external communications, and client interfaces. An HPCC environment may include only Thor clusters, or both Thor and Roxie clusters.

Thor functions as a distributed file system with parallel processing spread across nodes. It consumes, transforms, links, and indexes data. Roxie offers separate high-performance online query processing as well as data warehousing capabilities. HPCC uses Enterprise Control Language (ECL), a language designed for Big Data manipulation that is compiled into optimized C++ and is easily extended using C++ libraries.
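The division of labor between a batch refinery and a separate query layer can be sketched as follows. This is a toy Python illustration, not ECL and not HPCC's API: `refine` and `lookup` are hypothetical functions, and the records are made up for the example.

```python
# Toy sketch of a refine-then-query split (hypothetical names, not HPCC's API).

def refine(raw_records):
    """Batch step: clean the records and build an index keyed by customer."""
    index = {}
    for record in raw_records:
        customer = record["customer"].strip().lower()
        index.setdefault(customer, []).append(record["amount"])
    return index


def lookup(index, customer):
    """Query step: answer from the prebuilt index without rescanning raw data."""
    amounts = index.get(customer.strip().lower(), [])
    return {"orders": len(amounts), "total": sum(amounts)}


raw = [
    {"customer": " Alice ", "amount": 30},
    {"customer": "bob", "amount": 5},
    {"customer": "ALICE", "amount": 12},
]
index = refine(raw)
print(lookup(index, "alice"))
```

The expensive cleaning and indexing happens once in the batch step, so each online query only touches the index.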

Hydra

Hydra is a distributed task processing system developed by social bookmarking service AddThis. The company needed a scalable distributed system to deliver real-time analysis of data to customers, and Hadoop wasn't an option at the time, so it built Hydra. Hydra is available under an open source Apache license and can tackle some Big Data tasks that Hadoop struggles with.

Hydra supports streaming and batch operations using a tree-based data structure so it can store and process data across clusters that may have thousands of nodes. AddThis engineer Chris Burroughs describes Hydra thus: "It ingests streams of data (think log files) and builds trees that are aggregates, summaries, or transformations of the data. These trees can be used by humans to explore (tiny queries), as part of a machine learning pipeline (big queries), or to support live consoles on websites (lots of queries)." Hydra can use HDFS, but it also operates on native file systems.
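The idea of ingesting a stream and building a tree of aggregates can be sketched in plain Python. This is a minimal illustration under assumptions, not Hydra's implementation or API: the log format, the nesting order (date, then site, then event), and the `ingest` and `query` helpers are all hypothetical.

```python
# Minimal sketch of a tree of aggregates built from a log stream.
# Hypothetical names and log format; this is not Hydra's API.

def new_node():
    return {"count": 0, "children": {}}


def ingest(root, path):
    """Add one event to the tree, bumping the count at every level of `path`."""
    node = root
    node["count"] += 1
    for key in path:
        node = node["children"].setdefault(key, new_node())
        node["count"] += 1


def query(root, path):
    """Return the aggregate count stored at `path`, or 0 if it was never seen."""
    node = root
    for key in path:
        node = node["children"].get(key)
        if node is None:
            return 0
    return node["count"]


# Made-up log lines in the form "date site event".
log_lines = [
    "2015-01-01 example.com click",
    "2015-01-01 example.com share",
    "2015-01-01 example.org click",
    "2015-01-02 example.com click",
]

tree = new_node()
for line in log_lines:
    ingest(tree, line.split())

print(query(tree, []))
print(query(tree, ["2015-01-01"]))
print(query(tree, ["2015-01-01", "example.com"]))
```

Because every level stores its own running count, a query walks a short path instead of rescanning the log, which is what makes many small queries cheap.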

Conclusion

For all its tremendous power and benefits, Hadoop does have drawbacks. The way it moves data is complex, and it is not always the most efficient way to process Big Data and unstructured data. The automatic association between Big Data and Hadoop is becoming looser as more alternatives to Hadoop are developed. Some have speed advantages, while others allow streaming processing or make more efficient use of hardware. Hadoop alternatives are emerging, and those who deal with Big Data or unstructured data are wise to scope them out when considering their own needs.

 

 

 

[Source] https://datafloq.com/read/Big-Data-Hadoop-Alternatives/1135
