
There is no workload in the datacenter that can’t, in theory and in practice, be supplied as a service from a public cloud. Big Data as a Service, or BDaaS for short, is an emerging category of services that delivers data processing for analytics in the cloud, and it is getting a lot of buzz these days – and for good reason. These BDaaS products vary in features, functions, and target use cases, but all address the same basic problem: big data and data warehousing in the cloud are deceptively challenging, and customers want to abstract away the complexity.
Data analytics in the cloud is especially tough for companies with extensive datacenter investments hoping to create a hybrid architecture. Enterprises rarely contemplate wholesale migration of their IT infrastructure to the cloud, regardless of what Amazon Web Services would have you believe. It is just not feasible for most existing companies. Instead, a common goal is to create hybrid architectures, which leverage the on-premises systems and processes that are working well, but augment that infrastructure with new cloud resources and analytic capabilities.
The business drivers for hybrid data architectures include better enabling data science, freeing up capacity on costly data warehouses, adding new data lakes or data pipelines, sharing and monetizing data, and collecting new data sources, especially high-volume cloud sources like social, mobile, or sensor data. But it turns out that hybrid big data architectures are much easier on PowerPoint slides than in the real world. Even new cloud databases or data-warehousing-as-a-service resources (for example, Amazon Redshift and its equivalents) have been challenging for many enterprises to integrate with on-premises infrastructures – let alone more complex big data technologies.
While the components for cloud data processing are readily available, many companies lack the time and skills needed for integration, implementation and operations. According to Gartner analyst Adam Ronthal, “The ‘some assembly required’ approach for effectively integrating a range of data management and analytics-related services in large cloud service provider (CSP) ecosystems can be daunting to new cloud adopters.”
The irony is that the cloud promises the biggest impact for older enterprises, which are often the most challenged by integration, architecture and a lack of cloud skills. Newer companies have had the advantage of architecting and staffing for the cloud, or at least with cloud capabilities solidly in mind. And there is the well-covered “consumerization” trend that leaves workers wondering why it is so hard for their enterprise employers to deliver the type of integrated cloud capabilities that they get from Apple. It is a frustrating situation for business and IT leaders alike, articulated colorfully by one CIO:
“It took us about eight months to create a new data warehousing environment in the cloud. We hired someone, sent a few people to training and brought in consultants. Now it is live, but it is like accessing a space station. Getting data up there and using it requires a significant effort – a major mission every time. So, cloud is not part of our normal processes and it is not saving us money yet. We’re barely using it.”
This is a common conversation in data circles, which is why several vendors have developed BDaaS in hopes of addressing these challenges.
What BDaaS Is And Is Not
BDaaS offerings vary greatly today. As the category matures, there will likely be more consistency in service functions, but for now, many analysts are painting it in broad brushstrokes. According to Gartner, “Vendors are combining components of analytic platforms in the cloud with multiple processing engines, hybrid on-premises integration, and secure data movement.” And Forrester reports: “Big data as a service technology provides capture management and operations capability delivered as-a-service in the public or hybrid cloud. Uses generally include SQL analytics (data warehouse or data mart), data lake, machine learning, and operational analytics application support.” (From Big Data Tech Radar, Q1 2016.)
There is general agreement on a few key requirements: BDaaS services are always in the cloud. They provide data processing and analytic execution, using data processing technologies such as massively parallel SQL, Hadoop, or Spark. And, BDaaS vendors provide cloud operations and maintenance. But BDaaS may look very different across vendors and picking a supplier requires careful evaluation.
Ultimately, BDaaS is about enabling analytics. Some services are targeted to the data scientist, some more to the data engineer or data warehouse professional supporting business intelligence or analytics programs. BDaaS may simply replace an existing data warehouse or data mart, and analysts may not even know (or need to know or care) that the underlying platform has changed.
BDaaS is new enough that there are some common misconceptions related to the moniker. BDaaS is not the same as “data as a service” and vendors generally do not sell datasets. Another area of confusion is the availability of canned analytics or reports, especially in industries like retail, which are more accustomed to analytics outsourcing. Most BDaaS providers are focused on the processing platform, and do not prescribe which analytics to run or what questions to ask. To put it simply, a company’s analysts or strategic partners are still coming up with the questions and queries, and BDaaS makes it easy for them to get results quickly.
BDaaS Technical Characteristics
A technical review of BDaaS offerings yields more similarities and differences.
All are cloud-based, though with variations. Some leverage the public cloud infrastructure of Microsoft Azure, Amazon Web Services, Google Cloud Platform, or others. Others run in the BDaaS providers’ own clouds. Some are single tenant, running on dedicated servers or only sharing physical infrastructure. Many services with the BDaaS label are multi-tenant, where several customers share server infrastructure, a model that may reduce costs, but increases security and compliance concerns for regulated industries. Some BDaaS vendors support multiple public cloud platforms; some allow companies to move workloads between different clouds or on-premises platforms.
The core function of BDaaS is data processing and analytic execution. Some BDaaS providers are (or were) also labeled “Hadoop as a Service” or “Spark as a Service,” with automation that makes adopting those new technologies easier. But BDaaS is definitely not all about Hadoop.
More BDaaS vendors are offering multiple processing engines now, such as Hadoop, Spark, massively parallel SQL, or others. This gives enterprises more choices for matching the data technology to the workload. For example, some big data use cases call for handling large volumes of data with batch processing, which can be handled well via Hadoop or Spark. Other cases, such as ad hoc business intelligence, require less storage, but more compute power, and are better delivered by massively parallel SQL processing.
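The matching of engine to workload described above can be sketched as a simple rule of thumb. This is a minimal illustration only, not any vendor's selection logic: the `Workload` fields, the `suggest_engine` function, and the 10 TB cutoff are all hypothetical names and values chosen for the example, and the fallback for small batch jobs is an assumption the article does not address.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an analytics workload."""
    name: str
    data_volume_tb: float  # approximate data processed per run
    interactive: bool      # True for ad hoc, compute-heavy queries


# Hypothetical cutoff for "large volume"; chosen only for illustration.
LARGE_VOLUME_TB = 10.0


def suggest_engine(workload: Workload) -> str:
    """Return an engine family following the article's rule of thumb."""
    if workload.interactive:
        # Ad hoc BI: less storage, more compute power.
        return "massively parallel SQL"
    if workload.data_volume_tb >= LARGE_VOLUME_TB:
        # Large-volume batch processing.
        return "Hadoop or Spark"
    # Assumption: the article gives no guidance for small batch jobs.
    return "no strong preference"


if __name__ == "__main__":
    for w in (
        Workload("nightly clickstream batch", 50.0, interactive=False),
        Workload("ad hoc BI dashboard", 0.5, interactive=True),
    ):
        print(f"{w.name}: {suggest_engine(w)}")
```

In practice the decision would also depend on cost, data location, and which engines a given BDaaS provider actually offers, so a rule like this is a starting point for evaluation rather than a substitute for it.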
In all cases, exactly how the technology is provisioned and configured has a huge impact on cost. Provisioning and optimization are very challenging due to frequent changes and rapid innovation in both the cloud and data technology markets. Some BDaaS providers make it easy to move datasets between different engines; others require building your own integrations. Some BDaaS vendors have their own analytics interfaces; others support industry-standard visualization tools (Tableau, Spotfire, and so forth) or analytic languages like R and Python. BDaaS vendors have different approaches, which should be carefully evaluated.
The key thing about any cloud-based service is that it reduces operations and support costs, even if the infrastructure can be more expensive (by some measures) than on-premises gear. Perhaps the most obvious, yet under-appreciated, characteristic of BDaaS providers is their expertise in cloud operations, security, maintenance and upgrades. BDaaS enables incredible economies of scale, meaning a highly skilled BDaaS team can handle cloud operations so you don’t have to. That often includes patching, upgrades, operating system updates and the like. Some enterprises may also require specialized encryption, security, audit, or compliance controls from their BDaaS provider. While those operational activities may sound like no big deal, they are also the ones that most often require troubleshooting, special skills and late nights, particularly for security issues. These features are, in essence, what makes any software enterprise-grade.
These production and operations requirements have caught many companies by surprise, particularly on Hadoop-related projects. While learning and installing a new technology like Hadoop or Spark might be a fun side project, troubleshooting a cloud operating system upgrade during a rapidly closing maintenance window is not so glamorous. BDaaS providers have very different enterprise SLAs, monitoring, and management capabilities, so enterprises must closely review what will meet their requirements.
BDaaS has been transformational for many adopters. It is fast to implement, which means that IT can use BDaaS for quick wins, like delivering new analytic capabilities, freeing data warehouse capacity or implementing a new data lake in days, with no special skills required. For data scientists and analysts, BDaaS instantly enables cloud-scale compute and storage, along with expanded analytic capabilities.
In the long term, BDaaS radically simplifies data access for all enterprise workers, partners and customers – and enables faster integration and use of new data. BDaaS also future-proofs companies against the constant change inherent to the data and cloud markets, by providing regular upgrades and enhancements to leverage new technologies and best practices. Compared with on-premises or DIY cloud projects, BDaaS is significantly cheaper and has low maintenance overhead, thanks to its “as a service” delivery model.
All of this is why BDaaS is quickly gaining traction in enterprises. There are a variety of services with the label, but as you dig in, you will find differences in architecture, capabilities, and costs that will make it easy to shortlist the best options.
This article was originally published on www.nextplatform.com, where it can be viewed in full.

