
Big data can bring huge benefits to businesses of all sizes. However, as with any business project, proper preparation and planning are essential, especially when it comes to infrastructure. Until recently it was hard for companies to get into big data without making heavy infrastructure investments (expensive data warehouses, software, analytics staff, etc.). But times have changed. Cloud computing in particular has opened up a lot of options for using big data, as it means businesses can tap into big data without having to invest in massive on-site storage and data processing facilities.
In order to get going with big data and turn it into insights and business value, it’s likely you’ll need to make investments in the following key infrastructure elements: data collection, data storage, data analysis, and data visualization/output. Let’s look at each area in turn.
Data collection
This is where the data arrives at your company. It includes everything from your sales records, customer database, feedback, social media channels, marketing lists and email archives to any data gleaned from monitoring or measuring aspects of your operations. You may already have the data you need, but chances are you will need to source some or all of it.
If you do need to source new data, this may require new infrastructure investments. Infrastructure requirements for capturing data depend on the type or types of data required, but key options might include: sensors (that could sit in devices, machines, buildings, or on vehicles, packaging, or anywhere else you would like to capture data from); apps which generate user data (for example, a customer app which allows customers to order more easily); CCTV video; beacons (such as iBeacons from Apple, which allow you to capture and transmit data to and from mobile phones); changes to your website that prompt customers for more information; and social media profiles.
With a little technical knowledge, you can set many of these systems up yourself, or you can partner with a data company to set up the systems and capture the data on your behalf. Accessing external data sources, such as social media sites, may require little or no infrastructure changes on your part, since you’re accessing data that someone else is capturing and managing. If you’ve got a computer and an internet connection, you’re pretty much good to go.
Data storage
This is where you keep your data once it is gathered from your sources. As the volume of data generated and stored by companies has exploded, sophisticated but accessible systems and tools have been developed to help with this task. The main storage options include: a traditional data warehouse; a data lake; a distributed/cloud-based storage system; and your company server or a computer hard disk.
Regular hard disks are available at very high capacities and for very little cost these days and, if you’re a small business, this may be all you need. But when you start to deal with storing and analyzing a large amount of data, or if data is going to be a key part of your business going forward, a more sophisticated distributed system such as Hadoop, often run in the cloud, may be called for.
I think cloud-based storage is a brilliant option for most businesses. It’s flexible, you don’t need physical systems on-site and it reduces your data security burden. It’s also considerably cheaper than investing in expensive dedicated systems and data warehouses.
Data analysis
When you want to use the data you have stored to find out something useful, you will need to process and analyze it. So this layer is all about turning data into insights. This is where programming languages and platforms come into play.
There are three basic steps in this process: 1. preparing the data (identifying, cleaning and formatting the data so it is ready for analysis); 2. building the analytic model; and 3. drawing a conclusion from the insights gained.
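The three steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a recommended toolchain: the sales records, the field names (`region`, `amount`) and the three function names are all hypothetical, and the "model" is nothing more than an average per region.

```python
from statistics import mean

# Hypothetical raw sales records; names and values are made up for illustration.
RAW_SALES = [
    {"region": " North ", "amount": "120.50"},
    {"region": "south", "amount": "80"},
    {"region": "North", "amount": ""},  # missing amount, dropped in step 1
    {"region": "South", "amount": "100.00"},
    {"region": "north", "amount": "99.50"},
]


def prepare(records):
    """Step 1: clean and format the data so it is ready for analysis."""
    cleaned = []
    for row in records:
        region = row["region"].strip().title()
        amount = row["amount"].strip()
        if not region or not amount:
            continue  # skip incomplete rows
        cleaned.append({"region": region, "amount": float(amount)})
    return cleaned


def build_model(cleaned):
    """Step 2: a deliberately simple 'model': the average sale per region."""
    by_region = {}
    for row in cleaned:
        by_region.setdefault(row["region"], []).append(row["amount"])
    return {region: mean(amounts) for region, amounts in by_region.items()}


def conclude(model):
    """Step 3: turn the model output into a one-line conclusion."""
    best = max(model, key=model.get)
    return f"{best} has the highest average sale ({model[best]:.2f})"


if __name__ == "__main__":
    print(conclude(build_model(prepare(RAW_SALES))))
```

In a real project each step would be far more involved, but the shape stays the same: tidy the data first, then model it, then state the conclusion in plain language.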
Software exists from vendors such as IBM, Oracle and Google to help you do all of this: turning raw data into insights. Google has BigQuery, which is designed to let anyone with a bit of data science knowledge run queries against vast datasets. Other analytics options include Cloudera, Microsoft HDInsight and Amazon Web Services. And many startups are piling into the market, offering simple solutions that claim to let you feed in all of your data and sit back while they highlight the most important insights and suggest actions for you to take.
Data visualization/output
This is how the insights gleaned from analyzing the data are passed on to the people who need them, i.e. the decision makers in your company. Clear and concise communication is essential, and this output can take the form of brief reports, charts, figures and key recommendations.
All too often I see businesses bury the real nuggets of information that could really impact strategy in a 50-page report or a complicated graphic that no one understands. It’s clearly unrealistic to expect busy people to wade through mountains of data with endless spreadsheet appendices and extract the key messages. Remember: if the key insights aren’t clearly presented, they won’t result in action.
Key data output options include management dashboards, commercial data visualization platforms that make the data attractive and easy to understand, and simple graphics (like charts and graphs) that communicate insights. In my experience, for most smaller businesses looking to improve their decision making, simple graphics or visualization tools like word clouds are more than enough to present insights from data.
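As a rough illustration of how little is needed for a simple graphic, the sketch below prints a plain-text bar chart using only the Python standard library. The function name `text_bar_chart` and the monthly order figures are hypothetical, chosen purely for the example; a spreadsheet or charting tool would do the same job.

```python
def text_bar_chart(values, width=20):
    """Render a dict of label -> non-negative number as a plain-text bar chart."""
    if not values:
        return ""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value.
        bar_len = round(value / largest * width) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_len} {value}")
    return "\n".join(lines)


# Hypothetical monthly order counts, made up for illustration.
MONTHLY_ORDERS = {"Jan": 40, "Feb": 55, "Mar": 80}

if __name__ == "__main__":
    print(text_bar_chart(MONTHLY_ORDERS))
```

The point is the same as in the text: one small, readable chart that shows the trend at a glance is more likely to lead to action than a long appendix of figures.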
Together, these four areas represent the key infrastructure requirements for big data projects.
This article was originally published on www.forbes.com and can be viewed in full

