
The latest open source Big Data project to be advanced to top-level status by the Apache Software Foundation (ASF) is Kudu, a “columnar storage engine built for the Apache Hadoop ecosystem designed to enable flexible, high-performance analytic pipelines.” The project reportedly fills in an architectural gap left open by other storage options, providing a missing piece to fill out the columnar storage puzzle.
One of many open source Big Data projects championed by Hadoop distributor Cloudera Inc., Kudu provides another storage option for the Hadoop framework to complement the Hadoop Distributed File System (HDFS) and HBase, the company said in debuting the technology last fall before moving it to ASF as an incubating project.
In the ASF scheme of things, projects moved from the incubation stage to top-level status have demonstrated good governance under the organization’s meritocratic process and principles. In being advanced to top-level status, Kudu enters a growing arena. It follows at least one other ASF columnar storage project, Apache Parquet (Cloudera again, with Twitter), which was moved up in April of last year. Another similar offering is Apache ORC, described as “the smallest, fastest columnar storage for Hadoop workloads.” Cloudera earlier this year introduced Apache Arrow, “a fast, interoperable in-memory columnar data structure standard,” in the hope that it becomes a de facto reference for in-memory processing and interchange.
Along with those projects, Kudu has developed some momentum of its own.
“Under the Apache Incubator, the Kudu community has grown to more than 45 developers and hundreds of users,” said Todd Lipcon, vice president of Apache Kudu and software engineer at Cloudera, in a news release today. “We are excited to be recognized for our strong open source community and are looking forward to our upcoming 1.0 release.”
Earlier this month, Kudu was moved to version 0.9.1, according to the Apache Kudu Blog, which posts weekly updates on the status of the project.
In anticipation of that 1.0 release mentioned by Lipcon (no timetable given), developers can get their hands on the source code in the form of a limited-functionality beta or a Kudu Quickstart Virtual Machine. To help with such early explorations of the technology, Kudu Developer Documentation is available on the GitHub source code repository.
Noting that Kudu was designed for “fast analytics on fast (rapidly changing) data,” the project site states, “Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer. As a new complement to HDFS and Apache HBase, Kudu gives architects the flexibility to address a wider variety of use cases without exotic workarounds.”
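To make the idea of combining fast inserts/updates with columnar scans concrete, here is a toy, in-memory sketch in Python. It is not Kudu's API or implementation; the class `ToyColumnTable` and its methods `upsert` and `scan` are hypothetical names invented for illustration. It only shows why keeping each column in its own array lets an analytic query read a single column, while a key index still allows a row to be updated in place.

```python
from typing import Any, Dict, List


class ToyColumnTable:
    """Hypothetical toy table that stores each column in its own list."""

    def __init__(self, key: str, columns: List[str]) -> None:
        self.key = key
        # One list per column, including the key column.
        self.columns: Dict[str, List[Any]] = {name: [] for name in [key] + columns}
        # Maps a primary-key value to its row position in every column list.
        self.row_index: Dict[Any, int] = {}

    def upsert(self, row: Dict[str, Any]) -> None:
        """Insert a new row, or overwrite the row that has the same key."""
        pos = self.row_index.get(row[self.key])
        if pos is None:
            self.row_index[row[self.key]] = len(self.columns[self.key])
            for name, values in self.columns.items():
                values.append(row.get(name))
        else:
            for name, value in row.items():
                self.columns[name][pos] = value

    def scan(self, column: str) -> List[Any]:
        """Return a copy of one column without touching the others."""
        return list(self.columns[column])


# Time-series style usage: two sensors, one of which reports an update.
table = ToyColumnTable(key="sensor_id", columns=["ts", "temp"])
table.upsert({"sensor_id": "a", "ts": 1, "temp": 20.5})
table.upsert({"sensor_id": "b", "ts": 1, "temp": 18.0})
table.upsert({"sensor_id": "a", "ts": 2, "temp": 21.0})  # updates sensor "a" in place

temps = table.scan("temp")
average_temp = sum(temps) / len(temps)
```

A real storage engine also has to handle persistence, distribution and concurrency, none of which this sketch attempts.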
Before Kudu, such workarounds were required “when a use case requires the simultaneous availability of capabilities that cannot all be provided by a single tool,” Cloudera’s introductory blog post said. In such cases, “customers are forced to build hybrid architectures that stitch multiple tools together. Customers often choose to ingest and update data in one storage system, but later reorganize this data to optimize for an analytical reporting use-case served from another.”
Kudu, with optimization for fast scanning, is especially useful for tasks such as hosting time-series data (a growing use case with the burgeoning Internet of Things, or IoT) and different kinds of operational data, today’s news release said, noting that it’s already being used in the retail, online service delivery, risk management and digital advertising industries.
Touting a “bring your own SQL” philosophy, Kudu can be accessed from various query engines, such as the Apache projects Drill, Spark and Impala. Impala, another Cloudera-championed project that can work with Kudu, may itself move from incubation to top-level status.
“The Internet of Things, cybersecurity and other fast data drivers highlight the demands that real-time analytics place on Big Data platforms,” said Arvind Prabhakar, ASF member and CTO of StreamSets, in today’s announcement. “Apache Kudu fills a key architectural gap by providing an elegant solution spanning both traditional analytics and fast data access. StreamSets provides native support for Apache Kudu to help build real-time ingestion and analytics for our users.”
This article was originally published on adtmag.com.

