Outlier Detection on Multiple Temperature Sensors

Last updated 1 year ago

Introduction

When running large-scale services, continuously monitoring asset temperatures can provide essential information for smooth long-term operation. Whether the assets are large office spaces, machinery in a production line, or server racks in a data center, many sensors are often deployed at once. If one or more sensors report temperature values that deviate too far from the norm, preventive steps can be taken to avoid further degradation.

Due to their small size and long battery life, Disruptive Technologies (DT) Wireless Temperature Sensors are well suited for monitoring large numbers of assets in parallel. The sensors can be deployed in almost any environment, and by measuring the temperature every 15 minutes, the data trend and behavior can be monitored and possible outliers caught in real time.

In this application note, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is applied to a stream of 25 temperature sensors with the aim of catching outlier events. As shown in Figure 1, the data from most sensors are similar in both level and trend. Sudden spikes or level shifts caught by the algorithm are therefore considered outliers for which appropriate action can be taken.

Sensor Placement

If the aim is to highlight outlier behavior in the temperature originating from a specific device or environment, certain considerations should be taken when mounting the sensors. For instance, if room temperatures throughout a building are the source of interest, sensors should be placed away from external influences such as air conditioning or direct sunlight. Otherwise, the algorithm might classify said external intervention as an outlier, resulting in false alarms.

DT Studio Project Configuration

The implementation is built around using the DT Developer API to interact with a single DT Studio project containing all temperature sensors for which outlier detection is performed. If not already done, a project needs to be created and configured to enable the API functionality.

Project Authentication

Devices

The script will use all temperature devices in the target project. Note that DBSCAN works better the more devices you include, preferably 10 or more.

Example Code

Source Access

Environment Setup

The code has been written in and tested for Python 3.9+. Dependencies can be installed using pip and the provided requirements text file.

pip3 install -r requirements.txt

Using your authentication details, set the following environment variables.

sensor_stream.py
export DT_SERVICE_ACCOUNT_KEY_ID='<YOUR_SERVICE_ACCOUNT_KEY_ID>'
export DT_SERVICE_ACCOUNT_SECRET='<YOUR_SERVICE_ACCOUNT_SECRET>'
export DT_SERVICE_ACCOUNT_EMAIL='<YOUR_SERVICE_ACCOUNT_EMAIL>'
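
As a minimal sketch of how a script could pick these variables up, the snippet below reads the three names from the environment and fails early if any is missing. The helper name load_credentials is hypothetical and not taken from the example repository; only the variable names come from the export lines above.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = (
    "DT_SERVICE_ACCOUNT_KEY_ID",
    "DT_SERVICE_ACCOUNT_SECRET",
    "DT_SERVICE_ACCOUNT_EMAIL",
)


def load_credentials(environ=os.environ):
    """Return the three credentials as a dict, or raise if any is unset.

    Hypothetical helper for illustration; the example code may read
    its credentials differently.
    """
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing before any request is made gives a clearer error than an authentication failure later in the run.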

Usage

If the example code is correctly authenticated to the DT Studio project as described above, running the script main.py will start streaming data from each temperature sensor in the project and perform outlier detection as new data arrive.

python3 main.py

Use the -h flag to print additional flags available.

Implementation Details

Classifying data for outlier detection is an ongoing research field that has seen many approaches over the years. Lately, machine learning techniques have been the new frontier in this area at the cost of complexity. In contrast, clustering techniques can be comparably simple while still providing good performance. In particular, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm has been found to provide good performance with relatively little parameter tweaking [1].

Preprocessing

Depending on the application, time-series data are often feature-engineered before being passed to a classification scheme. However, each sample in a time series of length N can also be considered a feature in an N-dimensional space and be used directly. During testing, this was found to result in much better performance than extracting the mean, kurtosis, skew, and other typical time-series features as cluster input.
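
To illustrate how a window of raw samples could be turned into such an N-dimensional point, the sketch below linearly interpolates one sensor's samples onto N evenly spaced timestamps, in line with the uniform resampling described in the caption of Figure 3. The function name resample_uniform and the choice of linear interpolation are assumptions made for illustration; the example repository may resample differently. Plain Python is used to keep the sketch dependency-free.

```python
def resample_uniform(timestamps, values, start, end, n):
    """Interpolate samples onto n evenly spaced points in [start, end].

    timestamps must be sorted ascending (for example, seconds).
    Grid points before the first or after the last sample take the
    nearest known value. Hypothetical helper for illustration.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    if not timestamps or len(timestamps) != len(values):
        raise ValueError("timestamps and values must be non-empty and equal in length")

    step = (end - start) / (n - 1)
    resampled = []
    j = 0  # index of the latest sample at or before the current grid point
    for i in range(n):
        t = start + i * step
        while j + 1 < len(timestamps) and timestamps[j + 1] <= t:
            j += 1
        if t <= timestamps[0]:
            resampled.append(values[0])
        elif j + 1 >= len(timestamps):
            resampled.append(values[-1])
        else:
            t0, t1 = timestamps[j], timestamps[j + 1]
            weight = (t - t0) / (t1 - t0)
            resampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return resampled
```

Resampling every sensor onto the same grid ensures that all series have the same length N, so each can be treated as a point in the same N-dimensional space.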

DBSCAN

Real-time Application

The script can be extended to work in real time by utilizing the disruptive.Stream module in our Python API. Below is a short visualization of how outlier classification can work in real time. The GIF is significantly sped up here.
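
One way to prepare for such real-time use is to keep a rolling buffer of recent samples per device and update it whenever a new event arrives. The sketch below shows only that buffering step; the class name RollingWindows and its methods are hypothetical, the 24-hour default follows the window described in the caption of Figure 3, and how events are received and unpacked from disruptive.Stream is deliberately left out.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 24 * 60 * 60  # 24-hour window, as in the Figure 3 caption


class RollingWindows:
    """Keeps the most recent window of samples for each device.

    Hypothetical helper for illustration, not part of the Python API.
    """

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        # device_id -> deque of (timestamp, celsius), oldest first
        self.samples = defaultdict(deque)

    def add(self, device_id, timestamp, celsius):
        """Append one sample and drop samples older than the window."""
        buffer = self.samples[device_id]
        buffer.append((timestamp, celsius))
        cutoff = timestamp - self.window_seconds
        while buffer and buffer[0][0] < cutoff:
            buffer.popleft()

    def series(self, device_id):
        """Return the buffered timestamps and values as two lists."""
        buffer = self.samples[device_id]
        return [t for t, _ in buffer], [v for _, v in buffer]
```

After each call to add, the buffered series of all devices can be resampled and clustered again, so the classification follows the stream.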

References

For authenticating the developer API against a Service Account in your DT Studio project, three separate authentication details have to be located, later to be used in the example code. If you're unfamiliar with the concept, refer to our Introduction to Service Accounts.

An example code repository is provided in this application note. It illustrates one way of detecting outliers in multistream data and is meant to serve as a precursor for further development and implementation. It uses our Python API to interact with the DT Studio project.

The example code source is publicly hosted on the official Disruptive Technologies GitHub repository under the MIT license. It can be found by following this link.

Compared to the likes of k-means clustering, DBSCAN does not require prior knowledge about the number of clusters in the data. It is also unsupervised, simplifying its use in many applications. One feature that makes it particularly useful for outlier detection is its notion of noise in the data. If a point does not fit in any cluster, it is classified as noise instead of being assigned to the closest match. Figure 4 shows the result of applying DBSCAN to some synthetic data with two features. This website (https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/) provides excellent animated visualizations of the clustering procedure.

When grouping the features into clusters, DBSCAN uses a distance metric, here the Euclidean distance, to determine if two or more points should be linked. For this, the two search parameters ϵ and p must be given, where ϵ is the search radius and p the minimum number of points that can define a cluster. When scanning the dataset, each N-dimensional point is classified as one of three possible categories. A core point is defined as one that neighbors at least p other points within a distance of ϵ. A border point is one that can be reached from a core point but does not fulfill the core requirement itself, marking the edge of a cluster. If a point cannot be reached from any core point, it is defined as noise. Figure 5 shows an example of how points are classified to form a cluster.
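
The three categories can be sketched in a few lines of plain Python. This is a brute-force illustration of the definitions only, not the clustering code used in the example repository; the function name classify_points is hypothetical. Note one assumption: the neighbor count here includes the point itself, which is the convention scikit-learn uses for its min_samples parameter. If "p other points" is read strictly, compare against p + 1 instead.

```python
import math

NOISE, BORDER, CORE = "noise", "border", "core"


def classify_points(points, eps, p):
    """Label each point as core, border, or noise.

    Assumption: a point counts as its own neighbor, so a core point
    has at least p points (itself included) within distance eps.
    """
    n = len(points)
    # neighbors[i] lists every index within eps of point i, including i.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(found) >= p for found in neighbors]

    labels = []
    for i in range(n):
        if is_core[i]:
            labels.append(CORE)
        elif any(is_core[j] for j in neighbors[i]):
            # Within reach of a core point, but not dense enough itself.
            labels.append(BORDER)
        else:
            labels.append(NOISE)
    return labels
```

The pairwise distance search is quadratic in the number of points, which is acceptable for a few dozen sensors but would call for a spatial index on larger datasets.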

Finding a balance between generalized behavior and performance is one of the challenges when choosing ϵ and p. Here, if we assume that an outlier does not correlate with other potential outliers, setting p=2 should result in said outliers being classified as noise by DBSCAN, as there should be no other similar series. On the other hand, ϵ is dynamically recalculated on each call to compensate for changes in the data. By finding the average of all time series in a window, ϵ is calculated as the median Euclidean distance from each series to that average.
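
The ϵ calculation described above can be sketched as follows. The function names dynamic_eps and find_outliers are hypothetical, and the sketch reads "the average" as the element-wise mean across all series in the window, which is an interpretation of the text rather than a confirmed detail of the example code. The outlier check relies on the same assumption as scikit-learn's min_samples, namely that a point counts toward its own neighborhood. With p=2 a series is then noise exactly when no other series lies within ϵ of it.

```python
import math
import statistics


def dynamic_eps(window):
    """Median Euclidean distance from each series to the mean series.

    window is a list of equally long series, one per sensor.
    """
    n_series = len(window)
    length = len(window[0])
    # Element-wise mean across sensors (assumed reading of "the average").
    mean_series = [
        sum(series[k] for series in window) / n_series for k in range(length)
    ]
    distances = [math.dist(series, mean_series) for series in window]
    return statistics.median(distances)


def find_outliers(window):
    """Return indices of series with no other series within eps.

    Equivalent to DBSCAN noise for p=2 when a point counts as its own
    neighbor.
    """
    eps = dynamic_eps(window)
    outliers = []
    for i, series in enumerate(window):
        has_neighbor = any(
            j != i and math.dist(series, other) <= eps
            for j, other in enumerate(window)
        )
        if not has_neighbor:
            outliers.append(i)
    return outliers
```

Because the median is used, a single strongly deviating series has little effect on ϵ, while a general change in the spread of the data moves it.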

[1] https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html
[2] https://www.naftaliharris.com/blog/visualizing-dbscan-clustering/
[3] https://en.wikipedia.org/wiki/Euclidean_distance
Figure 1: One week of temperature data from 25 DT Wireless Temperature Sensors where outlier events in the data caught by the DBSCAN algorithm are highlighted for visibility.
Figure 3: Windowing of the most recent 24 hours of data, which are uniformly resampled before being provided as an N-dimensional input for the DBSCAN clustering algorithm.
Figure 4: DBSCAN applied to data in 2 dimensions, identifying two individual clusters and noise.
Figure 5: Cluster generation procedure of the DBSCAN algorithm where the ϵ neighborhood is found for each point, classifying said point as either noise, border, or core.
Figure 6: DBSCAN being continuously applied to 25 different temperature data streams in real time as the data arrive, highlighting outlier data that differ from the rest.