Musings of a Chief Analytics Officer: From Collections to Connections to IoT Analytics
At the recently concluded “Gartner Data & Analytics Summit 2017”, there was an interesting session, “Collections vs. Connections”. The session delved into the current practice of “data hoarding” and how it is utterly useless unless there is a concerted effort to explore the many multi-dimensional connections hidden in the vast trove of data; hence the importance of connections. Extending this thought to IoT, I see a similar trend at large.
Data generated by people and data generated by machines are actually quite different. How?
- Data generated by things or machines is actually quite predictable: a sensor is programmed to produce only a specific type of data, such as temperature, pressure, or rotation. Machines sending signals at specified intervals also make the data highly temporal (see the sketch after this list).
- Data generated by people, on the other hand, is highly unpredictable, depending on what a person is doing: transactions, searches, viewing preferences, music, photos, tax returns.
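To make the contrast concrete, here is a minimal sketch in Python, with entirely hypothetical names: a machine-generated reading has a fixed schema arriving on a fixed cadence, while a people-generated event can take almost any shape.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Machine-generated data: a fixed schema, emitted at a fixed interval.
@dataclass(frozen=True)
class SensorReading:
    device_id: str   # which sensor emitted the reading
    metric: str      # e.g., "temperature", "pressure", "rpm"
    value: float     # the observed value
    ts: datetime     # emission timestamp; the fixed cadence makes it highly temporal

machine_event = SensorReading("pump-017", "temperature", 71.4,
                              datetime.now(timezone.utc))

# People-generated data: shape and content vary with whatever the person is doing.
people_event = {"user": "u-42", "action": "search", "query": "ceiling fans"}
```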
In the IoT space, much of the work so far has gone into making the machines talk (sensors transmitting data), but I personally don’t subscribe to the notion that collecting the sensor data, loading it into a data lake, and then performing offline data science/machine learning to create predictive models qualifies as doing IoT. IoT is much more than that. To me, IoT means not only doing really big data management with all this sensor-transmitted data, but also having the capability to do streaming data management and purpose-driven analytics at the edge of the network (both at the edge and beyond it).
Connections allow information to be exchanged between the product and its operating environment, its maker, its users, and other products and systems. A few examples: elevators are using IoT to reduce wait times by as much as 50%, predicting demand patterns, calculating the fastest time to destination, and assigning the appropriate elevator to move passengers quickly. In consumer goods, ceiling fans are sensing when a person enters a room and engaging automatically, regulating speed based on temperature and humidity, and recognizing individual user preferences and adjusting accordingly.
Specific to IoT and the connected new world where everything is increasingly becoming “smart”, we are witnessing a data-in-motion set that is location-based, streaming, and highly temporal in nature. The data that tells you one of your capital-intensive heavy equipment machines is about to break down is not valuable if it’s just stored, right? This data-in-motion is most valuable when it’s captured, processed, analyzed, and acted upon at that very moment!
If you are not managing real-time streaming data, performing real-time analytics, and making real-time decisions at the edge, then you are not doing IoT or IoT analytics.
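As a rough sketch of that principle (reusing the hypothetical SensorReading shape from earlier, with hypothetical callbacks), the essential point is that the decision happens inside the stream, not after the data lands in storage:

```python
from typing import Callable, Iterable

def process_stream(readings: Iterable[SensorReading],
                   is_critical: Callable[[SensorReading], bool],
                   act: Callable[[SensorReading], None]) -> None:
    # Capture, process, analyze, and act the moment a reading arrives;
    # nothing here waits for the data to be persisted first.
    for reading in readings:
        if is_critical(reading):
            act(reading)  # e.g., raise a work order before the machine breaks down
```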
Essential Complexity: To crunch at the edge or sail through the Cloud
The IoT realm is all about machine scale: machine-to-machine generated data comprising discrete, fast observations (e.g., temperature, vibration, pressure, humidity) at very high signal rates (thousands of messages per second). These observations rarely change (e.g., temperature operates within an acceptably small range); however, when the values do change, one must be quick enough to spot the change, analyze the associated pattern, and immediately provide interventions or corrective measures.
This is precisely why collecting all the sensor-generated data is of less importance; what matters is how quickly you spot anomalies and act on them. IoT analytics is hence not about connecting the devices and collecting everything they emit; it is all about analyzing the data from these connected things.
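One way to picture “spotting anomalies quickly” on values that rarely change is a cheap running baseline that flags the rare departure. This is only an illustrative sketch (an exponentially weighted mean/variance check), not a prescription for any particular platform:

```python
class DriftDetector:
    """Flags a value that departs sharply from an exponentially weighted baseline."""

    def __init__(self, alpha: float = 0.05, threshold: float = 3.0, warmup: int = 30):
        self.alpha = alpha          # smoothing factor for the running mean/variance
        self.threshold = threshold  # deviations (in std units) that count as anomalous
        self.warmup = warmup        # readings to observe before raising alerts
        self.mean = None
        self.var = 0.0
        self.n = 0

    def is_anomaly(self, value: float) -> bool:
        self.n += 1
        if self.mean is None:       # first observation seeds the baseline
            self.mean = value
            return False
        deviation = value - self.mean
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.mean += self.alpha * deviation
        std = self.var ** 0.5
        return self.n > self.warmup and abs(deviation) > self.threshold * std
```

Because sensor values sit in a narrow band most of the time, the baseline stays stable, and the occasional large deviation stands out immediately instead of waiting for a batch job over stored data.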
Performing analytics at the edge or executing analytics models at the edge?
“At the edge” refers to the devices or sensors embedded in a machine that generate data about the operations and performance of that specific device or sensor. These devices or sensors generate huge volumes of real-time data. Just collecting this data doesn’t do anything to directly create business advantage. It is what you do with that data that drives the business value, which brings us to the interesting question: what do we really mean by IoT analytics?
“Performing analytics” at the edge means collecting the data, storing the data, preparing the data, running analytic algorithms, validating the analytic outputs, and then acting on the insights, right there at the edge.
“Executing the analytic models” at the edge, by contrast, means running pre-built analytic models (e.g., scores, rules, recommendations, anomaly detection) at the edge. These models are first built offline by bringing the detailed sensor data to a data and analytics platform (the data lake) and then applying sophisticated algorithms to the IoT historical data. Once the models are built and tested, they can be deployed at the edge through a JVM on the machine itself, or deployed on the cloud to act on the streaming data transported from the machines.
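A minimal sketch of that offline step, with hypothetical file names and a deliberately simple “model” (three-sigma control limits learned from data-lake history), might look like this:

```python
import json
import statistics

# Stand-in for detailed sensor history pulled from the data lake.
historical_temperatures = [70.1, 70.4, 69.8, 71.0, 70.6, 70.2, 69.9, 70.7]

def build_edge_model(history: list) -> dict:
    # Offline: learn simple control limits from the IoT history.
    mean = statistics.fmean(history)
    std = statistics.stdev(history)
    return {"metric": "temperature", "low": mean - 3 * std, "high": mean + 3 * std}

# Serialize the pre-built model so it can be shipped to the device or to the cloud.
with open("edge_model.json", "w") as f:
    json.dump(build_edge_model(historical_temperatures), f)
```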
It’s one thing to “execute the analytic models” at the edge, but something entirely different to actually “perform analytics” there. Why?
The edge typically consists of an environment that is minimal on memory and processing power, so heavy-duty activities like storing data and executables, and then running those executables, become a challenge. For example, if your fridge has a place to house a Java Virtual Machine (JVM) and an analytic model (i.e., a lightweight rules-based model), then you can execute the analytic model on the fridge itself. On the other hand, you can stream data from the fridge to a network and then execute the analytic model on the network.
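Continuing the sketch above (in Python rather than on a JVM, and with the same hypothetical edge_model.json), “executing the model” on the device amounts to little more than loading the pre-built limits and applying a bounds check, which is cheap enough for a memory-constrained environment:

```python
import json

# Edge: load the pre-built model once, then apply it to each reading as it arrives.
with open("edge_model.json") as f:
    model = json.load(f)

def execute_model(value: float) -> str:
    # Executing (not building) analytics at the edge: a lightweight rules check.
    if value < model["low"] or value > model["high"]:
        return "alert"  # e.g., notify the service network or throttle the component
    return "ok"
```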
Smart Things, Digital Twins, IoT Analytics Platforms
Analytics has become an integral part of business strategy, and the connected world is pushing the envelope further to make everything smart: smart homes, smart products, smart plants, you name it. But as with everything else, using the right tool for the right problem is the key to success.
For manufacturers, the changes are huge. Historically, once their product leaves the factory, they have little visibility into actual product usage and behavior, unless there are complaints about product quality and subsequent returns, or recalls initiated by the manufacturer. With smart, connected products, manufacturers get the ability to experience true closed-loop product lifecycle management, where they can track, manage, and analyze product information at any phase of its lifecycle, at any time and any place in the world.
However, the creation of smart things is not easy.
It requires capabilities to securely collect and respond to data from customers, suppliers, and now the products themselves. It requires a digital transformation cutting across the entire product management lifecycle starting from product design to raw materials sourcing to production to sales and service.
Solution: A purpose-built IoT ecosystem and analytic platform.
Finally, even if you decide to do all the sensor/IoT analysis at the edge, you would still need to bring the raw IoT data into the data lake for more extensive analysis and to persist the IoT history. Why? How else do you operate, maintain, or repair systems when you aren’t in physical proximity to them?
You need a bridge between the physical and digital worlds (this is where the Digital Twin and Digital Thread come in). Sensors gather data about the real-time status and working condition of the physical item (the machine) and relay this information to a cloud-based system that receives and processes everything the sensors monitor. This physical-digital bridge creates a virtual and visual environment where the input is analyzed to understand product behavior in real time, spot trends and usage patterns, and, most importantly, innovate on your next set of product offerings using business drivers and other contextual data.
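As a rough illustration of that bridge (reusing the SensorReading sketch from earlier; all names are hypothetical), a digital twin is, at its simplest, a cloud-side object that mirrors the latest known state of its physical counterpart:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

class DigitalTwin:
    """A cloud-side virtual counterpart mirroring a physical machine's state."""

    def __init__(self, device_id: str):
        self.device_id = device_id
        self.state: Dict[str, float] = {}        # latest value per metric
        self.last_seen: Optional[datetime] = None

    def ingest(self, reading: SensorReading) -> None:
        # Sensors relay real-time condition data; the twin keeps the mirror current.
        self.state[reading.metric] = reading.value
        self.last_seen = reading.ts

    def is_stale(self, max_silence: timedelta = timedelta(minutes=5)) -> bool:
        # A machine that has gone silent is itself a signal worth investigating.
        if self.last_seen is None:
            return True
        return datetime.now(timezone.utc) - self.last_seen > max_silence
```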
Result: This pairing of the virtual and physical worlds allows analysis of data and monitoring of systems to head off problems before they even occur, prevent downtime, develop new opportunities, and even plan for the future using simulations.