
The internet of things involves placing sensors on everything from cars to refrigerators to humans and transmitting the data they collect over an internet connection to a central repository for storage. Once there, it becomes raw material for big data, which is the analysis of all that information.
Big data, however, extends far beyond just the internet of things (IoT). Big data projects can analyze data from traditional or modern databases and even unstructured data. Big data can also correlate the seemingly unrelated information that sensors collect with information in traditional databases to improve organizational efficiency. For example, a shipping company may use sensors in its vehicles to direct drivers along routes that improve delivery efficiency and reduce fuel costs.
A big data or IoT project can lead to enhanced productivity, better health or simply a more enjoyable life. As users become more comfortable with the concept, and technology allows for the less obtrusive installation of more devices, the amount of data organizations gather increases exponentially. The challenge is to store this data, which is notably different in both type and quantity from traditional storage data.
Storage demands for a big data, IoT project
From a storage perspective, IoT and big data are similar, but they have different demands. The storage response for an IoT project is dependent on the use case. For sensors, an IoT storage system needs to handle rapid input from potentially millions of sensors simultaneously. Because the data these sensors produce is often tiny, the target storage system needs to store what might amount to trillions of small files without impairing performance.
But an IoT project can also include surveillance images from cameras or drones. This data type is typically a continuous stream, so its storage is dependent on high bandwidth and the ability to store fewer but much larger, high-capacity files than the sensor use case. What makes the challenges even more daunting is that it is not uncommon for an organization to require storage for both IoT use cases.
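The contrast between the two IoT use cases can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only illustrative: the `Workload` class, the device counts, write sizes and bit rates are all assumptions chosen for the example, not figures from any particular deployment. It shows how a sensor fleet and a camera fleet can need similar bandwidth while differing by orders of magnitude in file count.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical description of one IoT write pattern."""
    name: str
    sources: int            # devices writing concurrently
    writes_per_sec: float   # files written per device per second
    bytes_per_write: int    # size of each file in bytes

    def files_per_day(self) -> float:
        # Total number of files the storage system must absorb per day.
        return self.sources * self.writes_per_sec * SECONDS_PER_DAY

    def bandwidth_mb_per_sec(self) -> float:
        # Sustained ingest bandwidth in megabytes per second.
        return self.sources * self.writes_per_sec * self.bytes_per_write / 1e6


# Assumed: one million sensors, each writing a 200-byte reading every second.
sensors = Workload("sensors", sources=1_000_000, writes_per_sec=1.0,
                   bytes_per_write=200)

# Assumed: 500 cameras at 4 Mbit/s, each closing a 300 MB file every 10 minutes.
cameras = Workload("cameras", sources=500, writes_per_sec=1 / 600,
                   bytes_per_write=300_000_000)

for w in (sensors, cameras):
    print(f"{w.name}: {w.files_per_day():,.0f} files/day, "
          f"{w.bandwidth_mb_per_sec():,.0f} MB/s")
```

Under these assumed numbers, both fleets land in the same few hundred megabytes per second, but the sensors generate roughly a million times more files per day, which is why the two cases stress storage so differently.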
From a big data perspective, the storage system needs to have access to all, or at least most, of the data that the IoT project creates. You can also use the big data project to analyze existing databases and other unstructured data, as well as to correlate the disparate data sets.
By far, the most common foundation for big data is Hadoop. Hadoop creates a cluster of processing servers, with the Hadoop Distributed File System (HDFS) spreading the data across them, and assigns an analytics job to the least busy node in the cluster. The intent is for the data that the node needs to analyze to be local on that node. This scenario eliminates the need for an expensive network infrastructure and enables the use of low-cost, server-class storage instead of expensive, shared enterprise-level storage.
The data footprint and storage I/O requirements of IoT and big data differ from those of the traditional data center application. First, IoT data is typically a continuous feed. Data sizes can vary from minuscule to enormous. The number of files to store can reach into the trillions. This makes it easy to quickly create large amounts of data, and, as a result, there is a constant demand for capacity growth.
And that growth must scale quickly and in ways that aren’t disruptive. Storage systems for an IoT project also need to scale cost-effectively so that an organization can store petabytes of data for a long time. That requires low administration costs and burdens. Most IT staff simply cannot manage a dozen storage systems from six different vendors. IT professionals need to drive their storage hardware requirements to one to three storage systems that cover Tier 1 and Tier 2 applications, as well as the immense amount of unstructured data that IoT and big data create.
Finding the answers to your IoT project challenges
IoT and big data create a number of challenges for IT professionals. IoT has two different file storage needs, and most organizations will eventually need both. The first requires high-rate, random ingestion of trillions of small files. The second requires high-bandwidth streaming of far fewer, but much larger, files. It is extremely rare for a single storage system to provide both of these capabilities. Typically, a system is tuned either for handling trillions of small files or for streaming large files.
There is also the challenge of Hadoop’s local storage design. Data protection takes place by replicating copies of data between nodes. Most organizations will select a three-way replication as a default. This means these challenges, from a capacity perspective, are now multiplied by a factor of three, plus the data already residing on the IoT storage systems.
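The capacity multiplication described above is simple arithmetic, sketched here in Python. The function names and the 500 TB dataset are hypothetical, and the sketch ignores whatever protection overhead the IoT storage system adds to its own copy.

```python
def hadoop_raw_capacity_tb(dataset_tb: float, replication_factor: int = 3) -> float:
    """Raw disk consumed inside the Hadoop cluster for one logical dataset."""
    return dataset_tb * replication_factor


def total_raw_capacity_tb(dataset_tb: float, replication_factor: int = 3) -> float:
    """Hadoop replicas plus the original copy still held on the IoT storage."""
    return hadoop_raw_capacity_tb(dataset_tb, replication_factor) + dataset_tb


# Assumed example: 500 TB of IoT data analyzed with three-way replication.
dataset_tb = 500
print(hadoop_raw_capacity_tb(dataset_tb))  # capacity used by the Hadoop cluster
print(total_raw_capacity_tb(dataset_tb))   # cluster plus the IoT storage copy
```

With these assumed inputs, 500 TB of logical data occupies 1,500 TB in the cluster and 2,000 TB across both systems.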
Another challenge in the Hadoop design is that the most available node in the cluster to process the job may not actually have the data stored on it. This means the job will have a less capable node handling it, or the job needs to transfer the data to the most capable node.
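The trade-off between running a job where the data lives and moving the data to a less busy node can be illustrated with a toy model. This is not how Hadoop's scheduler is implemented; the `Node` class, the load-based slowdown formula and all of the numbers are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float      # fraction of the node already busy, 0.0 up to (not including) 1.0
    has_data: bool   # True if the job's input data is stored locally


def estimated_seconds(node: Node, compute_s: float, data_gb: float,
                      network_gb_per_s: float) -> float:
    # Assumed model: a busy node runs the job proportionally slower,
    # and a node without the data must first pull it over the network.
    run = compute_s / (1.0 - node.load)
    transfer = 0.0 if node.has_data else data_gb / network_gb_per_s
    return run + transfer


def pick_node(nodes, compute_s, data_gb, network_gb_per_s) -> Node:
    # Choose the node with the lowest estimated completion time.
    return min(nodes, key=lambda n: estimated_seconds(
        n, compute_s, data_gb, network_gb_per_s))


nodes = [
    Node("a", load=0.8, has_data=True),
    Node("b", load=0.1, has_data=False),
    Node("c", load=0.5, has_data=True),
]

fast_net = pick_node(nodes, compute_s=60, data_gb=10, network_gb_per_s=1.0)
slow_net = pick_node(nodes, compute_s=60, data_gb=10, network_gb_per_s=0.1)
print(fast_net.name, slow_net.name)
```

In this model, a fast network makes it worthwhile to ship the data to the idle node, while a slow network favors the busier node that already holds the data. That is the tension the local storage design creates.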
The central question then becomes: Can a single storage system solve all of these problems?
The answer depends on the use case. Object storage systems are obvious candidates to be the back-end storage devices for IoT data. Experience shows us that they are more than adequate to support Hadoop environments.
For IoT environments, object storage systems are adept at handling environments with high object counts. Most object storage systems can also be the back-end storage device for Hadoop environments, either through Amazon Simple Storage Service (S3) compatibility or, in some cases, native HDFS support. Providing the Hadoop infrastructure with a shared storage back end adds network latency, but it lessens the burden on the single master control node. It also eliminates the need for 3X replication, because most object storage systems use a parity-based data protection scheme, such as erasure coding.
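To see why erasure coding reduces capacity consumption compared with 3X replication, consider a scheme that splits each object into k data fragments and adds m parity fragments, so raw usage is (k + m) / k times the logical size. The sketch below assumes a 10+4 layout and a 500 TB dataset purely as an example; actual fragment counts vary by product and configuration.

```python
def replication_raw_tb(dataset_tb: float, copies: int = 3) -> float:
    """Raw capacity when every block is stored `copies` times."""
    return dataset_tb * copies


def erasure_coded_raw_tb(dataset_tb: float, data_fragments: int,
                         parity_fragments: int) -> float:
    """Raw capacity for a k+m erasure code: (k + m) / k times the data."""
    return dataset_tb * (data_fragments + parity_fragments) / data_fragments


dataset_tb = 500          # assumed logical dataset size
k, m = 10, 4              # assumed erasure coding layout (10 data + 4 parity)

replicated = replication_raw_tb(dataset_tb)          # three full copies
erasure_coded = erasure_coded_raw_tb(dataset_tb, k, m)
print(replicated, erasure_coded)
```

Under these assumptions, the erasure-coded layout needs less than half the raw capacity of three-way replication, and it can still survive the loss of up to m fragments of an object, where three copies survive the loss of two.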
The other advantage of using an object storage system is that the IoT devices can send data directly to the same storage the Hadoop environment is using. Sharing the data reduces capacity consumption and avoids time wasted waiting for data to transfer between an IoT data storage device and a Hadoop storage device.
The challenge with that design is the data center will likely still need another storage system for its production application environment. The organization may also need to store and process video data from IP cameras and similar IoT devices. If that’s the case, then some object storage systems may not be appropriate, and tuning others to handle both large and small files effectively at the same time would not be optimal.
Beyond object storage
The protocols within the data center are starting to blend. Many storage systems on the market can provide a variety of protocol support, including object, network file system (NFS), server message block (SMB), internet small computer system interface (iSCSI) and even Fibre Channel (FC).
Each protocol performs well with different use cases. For example, FC is ideal for mission-critical databases, but often considered too expensive for Tier 2 and Tier 3 applications. iSCSI is often the protocol of choice for the lower-priority applications. NFS is excellent for high-performance file sharing and is gaining traction as a storage area for virtual machine images. Even for a big data or IoT project, there are times when NFS is more appropriate than object storage.
Most data centers will have to select at least one storage system to complement their primary storage system. While object storage is capturing a lot of attention, high-performance, cost-effective NFS/SMB answers are making a comeback. These systems scale out like object storage systems do, often have a similar erasure coding type of data protection and support a wide variety of protocols. In some cases, they can perform all of the above.
Which strategy an organization chooses will depend on what type of IoT and big data it expects to manage, and the scope of the project. Another consideration is the age and suitability of its current storage assets to solve the IoT and big data problems. If the data center’s current production storage is supporting high-performance requirements of Tier 1 and Tier 2 applications, adding object storage on the back end may be ideal.
If the performance requirement of the Tier 1 and Tier 2 applications is somewhat more modest, then a single storage infrastructure that delivers all protocols may be of interest. While these more general purpose systems don’t tend to perform as well as focused systems, they often provide more than adequate performance for a typical data center. Plus they offer the benefit of consolidation to a single storage system. The result should be lower costs and an increase in operational simplicity.
IoT and big data can change how an organization conducts its business. The insight that the combination can provide allows a company to make significant improvements to the way it creates new products and responds to customers. But these initiatives have a significant impact on an IT infrastructure, especially storage.
IT professionals need a strategy for a big data and IoT project that allows the storage infrastructure to live up to its full potential. The right products are available to meet the challenge, whether that’s for high file counts and high capacity or a consolidated storage answer.
This article was originally published on www.techtarget.com and can be viewed in full

