You want to get started with a big data project at your company, but you’re unfamiliar with Hadoop and you’re unsure your project will deliver value. Relax. Many organizations are struggling to implement Hadoop for a variety of reasons. In “The Current State of Hadoop in the Enterprise,” by the International Institute for Analytics and sponsored by SAS, you’ll find a handy list of five steps to maximize the value of a Hadoop big data project for your organization. It’s a great start. Here are some further considerations based on those recommendations:
1. Identify and define use cases that deliver competitive advantage and are strategic in nature.
First, choose your target. Let’s say you want to study customer behavior. Your focus should be on new data types that are not currently being studied in other initiatives, such as an enterprise data warehouse. It’s likely you will want to examine clickstream data, which tells you how customers are behaving online, and social media data, which tells you what people are saying about your brand.
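As a concrete (hypothetical) illustration, a first pass over clickstream data often just counts events per customer to surface behavioral patterns. The sketch below assumes a simple record layout with `customer_id` and `page` fields — these names are illustrative, not a real schema:

```python
from collections import Counter

def page_view_counts(events):
    """Count page views per (customer, page) pair from raw clickstream events.

    Each event is a dict with (assumed) keys 'customer_id' and 'page'.
    """
    counts = Counter()
    for event in events:
        counts[(event["customer_id"], event["page"])] += 1
    return counts

# Example: three events from two customers
events = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},
]
print(page_view_counts(events)[("c1", "/pricing")])  # → 2
```

In practice this kind of aggregation would run inside the Hadoop cluster (via Hive, Pig, or MapReduce) rather than in local Python, but the logic is the same.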
Make sure your Hadoop project has a high profile and can deliver measurable value—such as more sales or repeat customers—fairly quickly. This will help justify your project and pave the way for future projects.
A good way to help identify and define use cases is the SAS Business Analytic Modernization Assessment (BAMA) service. Meant to help broaden the use of analytics in an organization, the BAMA is a workshop that facilitates conversation between IT and business units. Both sides work collaboratively to understand the key challenges with their current and future analytical processes.
2. Evaluate if and how Hadoop fits into your existing data and analytics architecture.
For many organizations, business intelligence and analytics projects such as data warehouses have been going on for decades. Even though the data storage cost of Hadoop might be significantly less than your data warehouse, it's a mistake to scrap your warehouse investment for the sake of undertaking the same efforts in Hadoop. While Hadoop is ideal for storing things like sensor data, it's not so good for real-time processing of a small number of records. Analytics expert Tom Davenport says many companies are storing large quantities of new data types in Hadoop clusters, and then moving that data to an enterprise data warehouse as needed for production applications.1
Let’s assume you have done an assessment and focused one of your Hadoop implementations on customer behavior. Next you’ll need to evaluate where the data supporting that behavioral analysis lives. The cost of using a traditional data warehouse for storage of clickstream data can skyrocket. Hadoop can store large amounts of data at a reasonable cost, but that is not the end of the story. To achieve your organization’s objective of better understanding customer behavior, you’ll need powerful analytics able to exploit the customer clickstream data now stored in Hadoop clusters.
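The pattern Davenport describes — land raw data cheaply in Hadoop, then promote only the compact results that production applications need into the warehouse — can be sketched in miniature. The record layout below is an assumption for illustration:

```python
def promote_aggregates(raw_clicks):
    """Summarize raw clickstream rows into per-customer aggregates.

    The raw rows would stay in cheap storage (e.g., HDFS); only the
    small summary rows would be loaded into the data warehouse.
    """
    summary = {}
    for row in raw_clicks:
        cust = row["customer_id"]
        summary.setdefault(cust, {"clicks": 0, "pages": set()})
        summary[cust]["clicks"] += 1
        summary[cust]["pages"].add(row["page"])
    # Warehouse-friendly rows: compact, fixed columns, no raw detail
    return [
        {"customer_id": c, "clicks": s["clicks"], "distinct_pages": len(s["pages"])}
        for c, s in sorted(summary.items())
    ]
```

The design point is that the warehouse receives a small, structured extract, while the bulky raw detail stays in the lower-cost Hadoop tier where it can still be re-analyzed later.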
3. Augment Hadoop with data management, data discovery and analytics to deliver value.
Once you’ve established the need to use Hadoop for your largest, fastest-moving data, you’ll need tools to manage, manipulate and analyze that data. But those tools must be able to keep pace.
Let’s say you’re storing sensor data in Hadoop. What are you doing with it? Alone it may not tell you much, but if you can join it with third-party data to build an analytics-based table, you may glean some valuable insights. This can pay dividends where mechanical devices are involved—for example, an analyst could predict where failures in an aircraft might occur so that maintenance can be performed to keep planes in the air, increasing revenue and saving money. That kind of bottom-line benefit is important for your project to be successful.
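The sensor-plus-third-party join above can be sketched as a toy example. Everything here is hypothetical — the field names (`unit`, `part`, `temp_c`) and the idea of manufacturer-supplied temperature limits are assumptions chosen to illustrate the join, not a real aircraft-maintenance schema:

```python
def flag_maintenance(sensor_readings, temp_limits):
    """Join sensor readings with third-party limits per part type and
    flag units whose reading exceeds the limit for that part.

    sensor_readings: list of dicts with keys 'unit', 'part', 'temp_c'
    temp_limits: dict mapping part type -> max safe temperature (third-party data)
    """
    flagged = []
    for r in sensor_readings:
        limit = temp_limits.get(r["part"])
        if limit is not None and r["temp_c"] > limit:
            flagged.append(r["unit"])
    return flagged

readings = [
    {"unit": "engine-1", "part": "turbine", "temp_c": 640},
    {"unit": "engine-2", "part": "turbine", "temp_c": 598},
]
limits = {"turbine": 620}  # hypothetical third-party manufacturer data
print(flag_maintenance(readings, limits))  # → ['engine-1']
```

A production version would be a table join running at cluster scale (for example in Hive), with a statistical model in place of the simple threshold, but the shape of the analysis is the same: sensor data alone says little until it is joined with reference data that gives it meaning.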
Streamlining your overall time to value will help you further realize the power of Hadoop. How can you do this? Be sure you can access and load your data—in Hadoop or elsewhere—as quickly as you need. Explore billions of rows of data in seconds, and work with your data inside Hadoop—without the need to move the data to a separate analytical platform. Ensuring high efficiency for your analytical process top-to-bottom is the key to delivering value from your Hadoop implementation.
4. Re-evaluate your data integration and data governance needs.
Remember that the results of your data analytics project may be used to determine major business strategies. Data integration and governance are as important as ever. You need to know where the data came from and that it’s clean. Data governance takes it a step beyond technology to incorporate people and processes. Find a technology partner such as SAS that has years of experience bringing IT and business divisions together and helping to develop data standards suited to your particular organizational culture. Your data governance practices should enable you to have a high level of confidence that when the data is manipulated, the results will have value and they will be auditable.
5. Assess skills/talent gaps early and develop a plan to mitigate those gaps before deployment.
Big data is still a relatively new field, and the skills required to manage a project effectively can be surprisingly scarce. Productive use of Hadoop requires expertise in tools and languages like Sqoop, Hive, Pig and MapReduce.
You should also determine whether a data scientist is needed to make sense of the big data project and connect it with your business's mission and strategy. It may be that a traditional business analyst can fill the need. For example, with an intuitive interface like the one included in SAS Data Loader for Hadoop, a user can acquire, discover, transform, cleanse, integrate and deliver data without being an expert in Sqoop, Hive, or Pig. But if you do hire a data scientist, it makes sense to let that person focus on the tasks for which they are best equipped, such as modeling, rather than writing MapReduce. Ultimately, organizations that get the best results have a firm grasp of the skills needed—and come up with a plan to fill the gaps—before they embark on a Hadoop project.
This article was originally published on www.techtarget.com, where it can be viewed in full.