Big Data and the Internet of Things (IoT)

Abhikhya Ashi
Dec 11, 2020


The Internet has come a long way from shaky dial-up connections to the interconnected world we live in today. This new wave of interconnectedness is called the IoT (Internet of Things), or sometimes the IoE (Internet of Everything).

What is IoT?

In the simplest terms, IoT refers to the inter-networking of physical devices. These devices can make network connections and exchange data with a cloud infrastructure or with other devices in their network. Many of them are information-generating devices such as GPS receivers, temperature sensors and pressure sensors, or information-receiving devices, mainly actuators such as motors and relays. Collectively, these are known as the edge points of the network. The other components are the gateways, which aggregate information from edge devices, and the cloud, which stores and processes the information received from gateways and devices.

In some architectures, less complex data processing is done at the edge devices or gateways, an approach called fog computing. In such a configuration, the “connected objects” can be sensed and controlled remotely while much of the data is processed close to where it is generated, which reduces the amount of data transported to the cloud for storage and processing. It is also possible for the devices to communicate or share information with each other, run machine learning techniques on the data and make decisions by themselves.
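As a minimal sketch of this idea, assume a hypothetical gateway that receives raw temperature readings from an edge sensor. The Python snippet below summarizes each batch locally and forwards only the summary, plus any readings above an assumed alert threshold, instead of every raw value. The names `summarize_batch` and `ALERT_THRESHOLD_C` are illustrative and do not belong to any IoT framework.

```python
from statistics import mean

# Assumed threshold; a real deployment would tune this per sensor type.
ALERT_THRESHOLD_C = 80.0

def summarize_batch(device_id, readings):
    """Reduce a batch of raw readings to one compact record for the cloud."""
    if not readings:
        raise ValueError("readings must not be empty")
    return {
        "device_id": device_id,
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
        # Only out-of-range values are forwarded individually.
        "alerts": [r for r in readings if r > ALERT_THRESHOLD_C],
    }

raw = [21.5, 21.7, 21.6, 85.2, 21.4]
summary = summarize_batch("sensor-17", raw)
print(summary)
```

Five raw values become one record here. With thousands of readings per batch, the saving in data sent upstream grows accordingly, at the cost of losing the individual values that were not flagged.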

The launch of IPv6 has made it possible to connect billions of devices to the same network, enabling the creation of more sophisticated networks. Other technology standards such as Li-Fi, BLE and Z-Wave give IoT a push by providing low-energy means of communication between devices in the network.

Conceptual architecture of an IoT system:

Today, IoT applications range from tracking down your lost keys or mobile phone using Bluetooth and other wireless technologies, to remotely monitoring and managing your home to cut down on bills and resource usage, to engaging with the data exhaust produced by your city or neighborhood. These are just a few examples of what can be achieved with sensors, actuators and networked intelligence. IoT will soon be part of many aspects of our lives, such as consumer goods, smart homes and cities, manufacturing and transportation.

IoT’s Big Data Problem

The continuum of sensors and devices interconnected through a variety of communication protocols such as Bluetooth, BLE, ZigBee and GSM generates huge volumes of data every second. With billions of such devices connected to the same network, the amount of data generated can run into several terabytes (millions of megabytes) per second. For example, at the 2015 Paris Air Show, Bombardier showcased its C Series jetliner, whose Pratt & Whitney Geared Turbofan engines are fitted with 5,000 sensors that generate up to 10 GB of data per second. There are many similar industrial cases that produce terabytes of operational data every day. Processing data at this scale is where “Big Data technologies” come in.
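To put the 10 GB per second figure in perspective, here is a back-of-the-envelope calculation in Python. It assumes, purely for illustration, that the peak rate is sustained for the whole period and that decimal units apply (1 TB = 1,000 GB). Real systems rarely run at peak continuously, so treat the result as an upper bound. The helper `daily_volume_tb` is a hypothetical name.

```python
# Assumption: the quoted peak rate is sustained, and 1 TB = 1,000 GB.
GB_PER_SECOND = 10

def daily_volume_tb(gb_per_second, hours=24.0):
    """Return the volume in terabytes produced over the given number of hours."""
    return gb_per_second * hours * 3600 / 1000

print(daily_volume_tb(GB_PER_SECOND))       # a full day at the peak rate
print(daily_volume_tb(GB_PER_SECOND, 2.0))  # a hypothetical 2-hour flight
```

At the peak rate, a single source would produce 864 TB in a day, or 72 TB over a two-hour flight.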

Big data and IoT are developing at a tremendous pace, affecting all areas of technology and business and increasing the benefits for organizations and individuals. The growth of data produced via IoT has widely affected the big data landscape. Collecting and processing huge amounts of data from the various sensors in an IoT environment makes big data analytics challenging. The analytics range from simple drill-downs to complex optimizations performed on the ingested data.

IoT analytics challenges:

IoT data is created by devices operating remotely under widely varying environmental conditions, and it is communicated over long distances, often across different networking technologies. This makes the analysis challenging. Some of the problems are:

Data volume: With millions of IoT devices and their various sensors sending data on a regular basis, the data flowing into an organization can grow large very quickly. Organizations need to adapt to processing this volume of data on an ongoing basis. The data volumes and computing resources that IoT demands may soon outpace those of all other organizational data combined.

Problems with time and space: IoT devices are located in various time zones and geographical locations. This additional information needs to be captured for precise analytics, which further increases the volume and complexity of the data.

Data quality: The quality of the generated data is the deciding factor in IoT analytics, so the data needs to be trustworthy. The quality of the analytics depends on how clean and authentic the data is and on how quickly value can be derived from it.
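The time, location and quality concerns above can be sketched together in a small ingestion step. The Python example below assumes that each device reports an ISO 8601 timestamp with a UTC offset, and that plausible temperatures fall within an assumed range. The record layout, `clean_reading` and `VALID_RANGE_C` are hypothetical choices made for illustration.

```python
from datetime import datetime, timezone

# Assumed plausible range for a temperature sensor, in degrees Celsius.
VALID_RANGE_C = (-40.0, 125.0)

def clean_reading(raw):
    """Return a normalized reading, or None if it fails basic checks."""
    try:
        ts = datetime.fromisoformat(raw["timestamp"])
        value = float(raw["value"])
    except (KeyError, TypeError, ValueError):
        return None  # missing or malformed fields
    if ts.tzinfo is None:
        return None  # no UTC offset: cannot be placed on a global timeline
    if not (VALID_RANGE_C[0] <= value <= VALID_RANGE_C[1]):
        return None  # physically implausible value
    return {
        "device_id": raw.get("device_id"),
        "timestamp_utc": ts.astimezone(timezone.utc).isoformat(),
        "location": raw.get("location"),
        "value": value,
    }

readings = [
    {"device_id": "a1", "timestamp": "2020-12-11T09:30:00+05:30",
     "location": (12.97, 77.59), "value": 24.1},
    {"device_id": "b2", "timestamp": "2020-12-10T23:00:00-05:00",
     "location": (40.71, -74.01), "value": "3.5"},
    {"device_id": "c3", "timestamp": "2020-12-11T04:00:00", "value": 22.0},       # no offset
    {"device_id": "d4", "timestamp": "2020-12-11T04:00:00+00:00", "value": 999},  # out of range
]
cleaned = [r for r in map(clean_reading, readings) if r is not None]
for r in cleaned:
    print(r["device_id"], r["timestamp_utc"], r["value"])
```

The first two readings were taken at the same instant in different time zones, so they end up with identical UTC timestamps. The last two are rejected. A production pipeline would more likely quarantine rejected records for inspection than silently drop them.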

The key challenge for big data technologies is to visualize and uncover insights from the various types of IoT data: structured, unstructured, real-time and so on.

Big IoT Data Analytics

Big IoT data analytics should include both:

Batch Analytics:

Tasks that require huge volumes of data are typically handled by batch operations. The datasets can be processed from permanent distributed storage using Hadoop MapReduce, or with in-memory computations using Apache Spark. Apache Pig and Hive are used for data querying and analysis. Since these run on cheap commodity servers in a distributed manner, they are well suited to processing historical data and deriving insights and predictive models from it.
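The MapReduce model can be illustrated without a cluster. The following is a single-process sketch in plain Python, not Hadoop or Spark code. It computes the average temperature per device using explicit map, shuffle and reduce steps. The function names and the sample records are made up for illustration.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs: here, the device id and a (sum, count) pair."""
    device_id, temperature = record
    yield device_id, (temperature, 1)

def shuffle(pairs):
    """Group values by key, as a framework would do between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine the partial (sum, count) pairs into an average."""
    total = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return key, total / count

records = [("pump-1", 70.0), ("pump-2", 64.0), ("pump-1", 74.0),
           ("pump-2", 66.0), ("pump-1", 72.0)]
pairs = (pair for record in records for pair in map_phase(record))
averages = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(averages)
```

In a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data across the network. The logic of each phase stays the same.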

Today, many modern data analytics tools such as Teradata or Tableau can connect to data stored in HDFS, process it and generate reports and dashboards.

(Pseudo) Real-time Analytics:

This type of analytics applies to systems that depend on instantaneous feedback based on the data received from sensors. An example is an IoT-based health care system that receives data from numerous sensors on a patient’s body. One important feature of such a system is to aggregate real-time data from the sensors and run algorithms that automatically detect situations needing immediate medical attention. Once such a situation is detected, a medical provider or an emergency response system should be notified immediately. In this case the analysis-response cycle should take only a few seconds, as every second could be a matter of life and death. Other scenarios include fraud detection and security breach detection, where unusual behavior is flagged for immediate action.
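The detect-and-notify cycle described above can be sketched with a rolling window over incoming readings. The Python example below is a toy illustration only. The class `HeartRateMonitor`, the window size and the thresholds are assumptions chosen for the example, not clinical guidance or a real monitoring API.

```python
from collections import deque

class HeartRateMonitor:
    """Flags a patient when the rolling average leaves an assumed safe band."""

    def __init__(self, window=5, low=40.0, high=130.0):
        self.readings = deque(maxlen=window)  # keeps only the latest readings
        self.low = low
        self.high = high

    def ingest(self, bpm):
        """Add one reading and return an alert string, or None."""
        self.readings.append(bpm)
        if len(self.readings) < self.readings.maxlen:
            return None  # not enough data yet to judge
        avg = sum(self.readings) / len(self.readings)
        if avg < self.low or avg > self.high:
            return f"ALERT: rolling average {avg:.1f} bpm outside {self.low}-{self.high}"
        return None

monitor = HeartRateMonitor(window=3)
stream = [72, 75, 74, 140, 150, 155]
alerts = [monitor.ingest(bpm) for bpm in stream]
for bpm, alert in zip(stream, alerts):
    print(bpm, alert or "ok")
```

Averaging over a window avoids raising an alarm on a single noisy reading, at the cost of a short delay before a genuine change is flagged. In this run only the last reading triggers an alert.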

A classic Hadoop-based solution might not work in the above cases because it relies on MapReduce, which is considerably slower as it involves costly disk I/O operations. The solution is to augment the Hadoop ecosystem with a faster real-time engine such as Spark or Storm.

The following are some options for implementing the real-time layer:

  1. Apache Storm, Kafka and Trident: highly scalable, reliable, distributed and fast real-time computing for processing high-velocity data
  2. Spark Streaming: an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams
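Spark Streaming handles a live stream by dividing it into small batches (micro-batches) and processing each one in turn. The sketch below imitates that model in plain Python. It uses neither Spark nor Storm. `micro_batches` is a hypothetical helper that assumes the events arrive ordered by timestamp.

```python
from itertools import groupby

def micro_batches(events, interval_s):
    """Group (timestamp_s, value) events into consecutive fixed-width batches.

    Assumes the events arrive ordered by timestamp.
    """
    for batch_index, group in groupby(events, key=lambda e: int(e[0] // interval_s)):
        yield batch_index * interval_s, [value for _, value in group]

events = [(0.2, 5), (0.9, 7), (1.1, 6), (2.5, 9), (2.7, 11)]
results = [(start, len(vals), max(vals))
           for start, vals in micro_batches(events, interval_s=1)]
print(results)
```

Each tuple in `results` holds the batch start time, the number of events in the batch and the maximum value seen. A shorter interval lowers latency but means more, smaller batches to schedule.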

How industries are using Big IoT data:

GE’s Big Bet on Data and Analytics

GE is pursuing a new vision for operational technology (OT) built on top of industrial machinery. It connects machines via the cloud and uses data analytics to help predict breakdowns and assess the overall health of the machinery.

The Supermarket of the Future

The supermarket of the future will enhance the shopping experience by combining off-the-shelf technology with airy layouts, easy-to-reach items and informative screens suspended at eye level. Imagine being able to access every bit of information about the produce you are buying, from the location and climatic conditions where it grew and the chemical treatments it received to its journey to the shelf right in front of you. Coop Italia’s supermarket is designed to offer shoppers such a rich experience.

For more industrial IoT case studies, refer to “10 Case Studies for the Industrial Internet of Things” on IoT Central.

References:

11 Internet of Things (IoT) Protocols You Need to Know About
Definition fog computing (fog networking, fogging)
Internet Of Aircraft Things: An Industry Set To Be Transformed

About the Author

Suhasini is a Senior Consultant @ Mastech InfoTrellis with expertise in designing and developing enterprise applications using Microsoft technologies. She also has an avid interest in Big Data and IoT technologies.

Originally published at