Hadoop’s progression from a large-scale, batch-oriented analytics tool to an ecosystem full of vendors, applications, tools and services has coincided with the rise of the big data market.
While Hadoop has become almost synonymous with the market in which it operates, it is not the only option. Hadoop is well suited to very large scale data analysis, which is one of the reasons why companies such as Barclays, Facebook, eBay and more are using it.
Despite its success, Hadoop has its critics, who argue that it is overly complex and poorly suited to smaller jobs.
Here are five Hadoop alternatives that may better suit your business needs:
- Pachyderm
Pachyderm, put simply, is designed to let users store and analyse data using containers.
The company has built an open source platform to use containers for running big data analytics processing jobs. One of the benefits of using this is that users don’t have to know anything about how MapReduce works, nor do they have to write any lines of Java, which is what Hadoop is mostly written in.
Pachyderm hopes that this makes it more accessible and easier to use than Hadoop, and therefore more appealing to developers.
With containers growing significantly in popularity over the past couple of years, Pachyderm is in a good position to capitalise on the increased interest in the area.
The software is available on GitHub, and users only have to implement an HTTP server that fits inside a Docker container. The company says: “if you can fit it in a Docker container, Pachyderm will distribute it over petabytes of data for you.”
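To give a sense of how small such a job can be, here is a minimal sketch of an HTTP server of the kind that could be packaged into a container. It uses only the Python standard library. The word-count job, the single POST endpoint and the helper names (`count_words`, `JobHandler`, `start_server`, `post_text`) are assumptions made for illustration; the article does not describe the interface Pachyderm actually expects.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def count_words(text):
    """Return a word -> occurrence count mapping for the given text."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


class JobHandler(BaseHTTPRequestHandler):
    """Hypothetical job endpoint: POST raw text, receive JSON word counts."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        body = json.dumps(count_words(text)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), JobHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_text(server, text):
    """Send text to the running server and decode the JSON reply."""
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("POST", "/", body=text.encode("utf-8"))
    reply = conn.getresponse().read()
    conn.close()
    return json.loads(reply.decode("utf-8"))
```

The sketch binds to 127.0.0.1 so that it can be tried locally; inside a container the server would normally listen on all interfaces so that the platform can reach it.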
- Apache Spark
What can be said about Apache Spark that hasn’t been said already? The general-purpose compute engine, typically used with Hadoop data, is increasingly seen as the future of Hadoop thanks to its popularity, its speed and its support for a wide range of applications.
However, while it is typically associated with Hadoop implementations, it can be used with a number of different data stores and does not have to rely on Hadoop. It can, for example, use Apache Cassandra or Amazon S3.
Spark is even capable of having no dependence on Hadoop at all, running as an independent analytics tool.
Spark’s flexibility has helped make it one of the hottest topics in the world of big data, and with companies like IBM aligning their analytics around it, the future is looking bright.
- Google BigQuery
Google seemingly has its fingers in every pie and as the inspiration for the creation of Hadoop, it is no surprise that the company has an effective alternative.
The fully-managed platform for large-scale analytics allows users to work with SQL and not have to worry about managing the infrastructure or database.
The RESTful web service is designed to enable interactive analysis of huge datasets, working in conjunction with Google storage.
Users may be wary that it is cloud-based, which could lead to latency issues when dealing with large amounts of data, but given Google’s omnipresence it is unlikely that data will ever have to travel far, so latency shouldn’t be a big issue.
Some key benefits include its ability to work with MapReduce and Google’s proactive approach to adding new features and generally improving the offering.
- Presto
Presto is an open source distributed SQL query engine designed for running interactive analytic queries against data of all sizes. Facebook created it in 2012 as it looked for an interactive system optimised for low query latency.
Presto is capable of concurrently using a number of data stores, something that neither Spark nor Hadoop can do. This is possible through connectors that provide interfaces for metadata, data locations, and data access.
The benefit of this is that users don’t have to move data around from place to place in order to analyse it.
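The connector idea can be illustrated with a toy model. The sketch below is not Presto’s actual connector interface (Presto itself is written in Java); the `Connector` and `Engine` classes, their method names and the `hive` and `mysql` catalogue labels are all hypothetical. It only shows how one engine can join rows from two separate stores when each store sits behind a small interface for metadata, data locations and data access.

```python
class Connector:
    """Toy connector exposing metadata, data locations and data access."""

    def __init__(self, name, tables):
        self.name = name
        self._tables = tables  # table name -> list of row dicts

    def list_tables(self):
        """Metadata: which tables this store offers."""
        return sorted(self._tables)

    def locate(self, table):
        """Data location: a made-up address for the table."""
        return "%s://%s" % (self.name, table)

    def scan(self, table):
        """Data access: iterate over the rows of a table."""
        return iter(self._tables[table])


class Engine:
    """Toy query engine that reads through registered connectors."""

    def __init__(self):
        self._connectors = {}

    def register(self, connector):
        self._connectors[connector.name] = connector

    def scan(self, qualified_name):
        """Resolve 'catalogue.table' to the right connector and scan it."""
        catalogue, table = qualified_name.split(".", 1)
        return self._connectors[catalogue].scan(table)

    def join(self, left, right, key):
        """Hash join two tables, possibly from different stores, on key."""
        index = {}
        for row in self.scan(right):
            index.setdefault(row[key], []).append(row)
        for row in self.scan(left):
            for match in index.get(row[key], []):
                yield {**match, **row}


engine = Engine()
engine.register(Connector("hive", {"clicks": [{"user_id": 1, "url": "/home"},
                                              {"user_id": 2, "url": "/about"}]}))
engine.register(Connector("mysql", {"users": [{"user_id": 1, "name": "Ada"}]}))
joined = list(engine.join("hive.clicks", "mysql.users", "user_id"))
```

Neither data set is copied into the other store; the engine reads both where they live and combines the rows itself.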
Like Spark, Presto is capable of offering real-time analytics, something that is in increasing demand from enterprises.
- Hydra
Developed by the social bookmarking service AddThis, which was recently acquired by Oracle, Hydra is a distributed task processing system that is available under the Apache License.
It is capable of delivering real-time analytics to its users and was developed to meet a need for a scalable, distributed system.
Having decided that Hadoop wasn’t a viable option at the time, AddThis created Hydra in order to handle both streaming and batch operations through its tree-based structure.
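As a rough illustration of how a tree can serve both modes, the sketch below folds records into a tree of counters, either one at a time (streaming) or from a list (batch). This is an assumption about the general idea only and does not reflect Hydra’s real data model or API; `TreeNode`, `ingest`, `ingest_batch` and the `date` and `url` fields are hypothetical names.

```python
class TreeNode:
    """One node of the aggregate tree: a hit counter plus child nodes."""

    def __init__(self):
        self.hits = 0
        self.children = {}

    def child(self, key):
        """Return the child for key, creating it on first use."""
        if key not in self.children:
            self.children[key] = TreeNode()
        return self.children[key]


def ingest(root, record, path_fields):
    """Streaming mode: fold a single record into the tree."""
    node = root
    node.hits += 1
    for field in path_fields:
        node = node.child(record[field])
        node.hits += 1


def ingest_batch(root, records, path_fields):
    """Batch mode: the same update applied to a whole collection."""
    for record in records:
        ingest(root, record, path_fields)


root = TreeNode()
path = ["date", "url"]
ingest_batch(root, [{"date": "2016-01-01", "url": "/a"},
                    {"date": "2016-01-01", "url": "/b"}], path)
ingest(root, {"date": "2016-01-01", "url": "/a"}, path)
```

Because both modes share the same update step, a query against the tree sees a consistent structure regardless of how the data arrived.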
This article was originally published on www.bigdataanalyticsnews.com and can be viewed in full

