(republished on LinkedIn)

Nowadays the Internet of Things (IoT) is a huge cosmos of different sensors and actuators. They promise deep insights into processes and new possibilities for control. Combining different sensors makes it possible to identify specific situations or anomalies. However, such combinations can become extremely complex, and costs or other constraints often hinder the introduction of IoT and dedicated sensors. Additionally, these sensors might not be attachable to existing installations, which makes them unsuitable for retrofitting scenarios. Then it might be time to consider cameras as an alternative IoT sensor.

Cameras are cheap but powerful. Consider how much information a single image already carries; moving pictures carry even more. A video stream can communicate a lot, for example movement, state changes, objects and even emotions.

Cameras at Tesla

How much information a camera can deliver is demonstrated by Tesla’s Autopilot functionality. Most manufacturers use a combination of depth and distance sensors. In contrast, Tesla bases its Autopilot mainly on cameras, video streams and software that analyses the streams in real time.

The following video on YouTube gives an impression of what the camera-based input looks like.

Tesla’s general idea is that cameras provide enough information to steer a car autonomously, provided they give a 360° view. As a result, the software becomes the key component of the solution, because it has to understand the video stream. It is the part that must be adapted and made intelligent enough to react to different conditions. From my point of view this is a very interesting concept, as it might result in greater flexibility: the software can simply be updated, and no additional sensors need to be installed in the car to improve Autopilot performance. However, the coming years will show whether this is the right approach.

AI drives image processing

AI makes this possible. Over the last couple of years, computer vision has made huge progress, driven by machine learning and the underlying processing algorithms. Many pre-trained machine learning models are available for cloud and on-premise settings that help extract information from video streams for different use cases. This means that you often do not have to train your own AI.

Here are a few examples of IoT use cases that can benefit from camera input streams:

  • Detecting the number of people in a room
  • Counting the visitors or passers-by of a shop
  • Detecting state changes of a machine by tracking control lights or movements
  • Recognising paths and tracking goods
  • Counting objects, for example produced goods
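To make the machine-state use case more concrete, here is a minimal sketch that decides whether a control light is on by averaging the brightness of a fixed region in a grayscale frame (a NumPy array, as OpenCV returns after colour conversion). The region coordinates and the threshold of 200 are hypothetical values that would have to be calibrated for a real camera and machine; this is an illustration, not the approach used in our prototype.

```python
import numpy as np

# Hypothetical region of interest around the control light: (x, y, width, height).
LIGHT_ROI = (100, 50, 20, 20)
# Hypothetical brightness threshold on a 0-255 scale; must be calibrated per setup.
ON_THRESHOLD = 200.0


def light_is_on(gray_frame: np.ndarray, roi=LIGHT_ROI, threshold=ON_THRESHOLD) -> bool:
    """Return True if the mean brightness inside the ROI exceeds the threshold."""
    x, y, w, h = roi
    patch = gray_frame[y:y + h, x:x + w]  # NumPy indexes rows (y) before columns (x)
    return float(patch.mean()) > threshold


def state_changes(frames, roi=LIGHT_ROI, threshold=ON_THRESHOLD):
    """Yield (frame_index, is_on) whenever the light switches; the first frame reports its initial state."""
    previous = None
    for index, frame in enumerate(frames):
        current = light_is_on(frame, roi, threshold)
        if current != previous:
            yield index, current
            previous = current
```

Reporting only transitions keeps the output small: a machine that stays in one state for hours produces a single event instead of one per frame.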

Prototyping and demonstration

Because we had to run a workshop, we restructured our knowledge about AI and built a few demonstrators.


We used the Python programming language in combination with OpenCV, an open-source computer vision library. OpenCV already ships with pre-trained machine learning models, and additional ones can be found on the Internet or obtained directly from OpenCV.

We used a Haar cascade classifier to detect faces in a camera input stream and thus determine whether there are people in a room. With the camera positioned at the entrance to our offices, we can now tell whether the office is empty or colleagues are already in.

By placing a second device in front of the coffee machine, we can tell how often the machine is visited. Although these are simple examples, they show what becomes possible if you apply such a device in, for example, production.
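Counting visits requires turning per-frame detections, such as the face check described above, into events. One possible sketch, assuming a sequence of booleans with one entry per frame, counts a new visit only after a hypothetical minimum number of consecutive frames without a face, so that brief detection drop-outs during a single visit are not counted twice. At 15 frames per second, the default of 15 frames corresponds to about one second.

```python
from typing import Iterable


def count_visits(presence: Iterable[bool], min_absent_frames: int = 15) -> int:
    """Count visits in a per-frame sequence of 'someone is in front of the machine' flags.

    A visit starts when someone appears after at least ``min_absent_frames``
    consecutive empty frames; shorter gaps are treated as part of the same visit.
    """
    visits = 0
    absent_run = min_absent_frames  # start as if the area had been empty for a while
    for someone_there in presence:
        if someone_there:
            if absent_run >= min_absent_frames:
                visits += 1
            absent_run = 0
        else:
            absent_run += 1
    return visits
```

The threshold trades accuracy in two directions: too small and a person turning away briefly counts as two visits; too large and two colleagues arriving shortly after each other count as one.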

To demonstrate and check the implementation, we drew green rectangles around the recognised faces and recorded the annotated video. An extract of such a result is shown above.

All in all, it took us two hours to get everything set up and running and to gather information from our environment. From my point of view, that is impressively efficient, and only possible thanks to the great and open-minded developer community.

The example shows how powerful the combination of AI and video streams is for detecting specific situations in the wild, and that it might be worth investing some time in this technology to evaluate its opportunities. It might help to broaden knowledge about processes and machines and, therefore, generate new insights.