The Smart Grid: June 2020

Thursday, June 25, 2020

Investigating Synthetic Data for Load Disaggregation

Electrical consumption data contain a wealth of information, and their collection at scale is facilitated by the deployment of smart meters. Data collected this way are an aggregation of the power demands of all appliances within a building, so inferences about the operation of individual devices cannot be drawn directly. By using methods to disaggregate data collected from a single measurement location, however, appliance-level detail can often be reconstructed. A major impediment to improving such disaggregation algorithms lies in the way they have been evaluated so far: their performance is generally assessed using a small number of publicly available electricity consumption data sets recorded in actual buildings. As a result, algorithm parameters are often tuned to produce optimal results for these data sets and do not necessarily generalize well to different input data.

"We propose to break this tradition by presenting a toolchain to create synthetic benchmarking data sets for the evaluation of disaggregation performance in this work. Generated synthetic data with a configurable amount of concurrent appliance activity is subsequently used to comparatively evaluate eight existing disaggregation algorithms." - Christoph Klemenjak

Instead of attempting to compile a benchmarking corpus from existing data sets, we present a methodical way to synthetically create data sets of definable disaggregation complexity. A high degree of realism can be accomplished by using accurate models of existing appliances and user activities. By feeding synthetically generated data with gradually increasing levels of concurrent appliance activity to state-of-the-art disaggregation algorithms, we determine their sensitivity to specific data characteristics at a much finer granularity than a fixed set of recorded data permits.
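To make the idea of "configurable concurrent appliance activity" concrete, here is a minimal sketch in Python. It is not the method used in the paper or in ANTgen: the appliance names, power levels, durations, and the `synthesize` function are all hypothetical, and each appliance is reduced to a single rectangular activation. The sketch only illustrates how an aggregate signal can be built from per-appliance traces while capping how many appliances run at the same time.

```python
import random

# Hypothetical appliance models: name -> (power in watts, run duration in samples).
# These values are illustrative assumptions, not measured signatures.
APPLIANCES = {
    "kettle": (2000.0, 3),
    "fridge": (120.0, 10),
    "washing_machine": (500.0, 20),
    "tv": (90.0, 30),
}


def synthesize(num_samples, num_concurrent, seed=0):
    """Return (aggregate, per_appliance_traces) with bounded overlap.

    At most num_concurrent appliances draw power at any sample. An appliance
    that cannot be placed within 100 random attempts is left switched off.
    """
    rng = random.Random(seed)
    traces = {name: [0.0] * num_samples for name in APPLIANCES}
    active = [0] * num_samples  # number of appliances running at each sample

    for name, (power, duration) in APPLIANCES.items():
        for _ in range(100):
            start = rng.randrange(0, num_samples - duration + 1)
            window = range(start, start + duration)
            # Accept the start time only if the concurrency cap holds throughout.
            if all(active[t] < num_concurrent for t in window):
                for t in window:
                    traces[name][t] = power
                    active[t] += 1
                break

    # The aggregate is the sample-wise sum of all appliance traces.
    aggregate = [sum(trace[t] for trace in traces.values())
                 for t in range(num_samples)]
    return aggregate, traces
```

Calling `synthesize(60, 1)` yields a signal in which activations never overlap, while `synthesize(60, 3)` permits up to three simultaneous appliances. Sweeping this cap gives a series of data sets of increasing difficulty, with the per-appliance traces available as ground truth.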

We present a toolchain, ANTgen, that generates synthetic macroscopic load signatures for use with NILM (load disaggregation) tools. By default, it runs in scripted mode (i.e., with no graphical user interface) and processes an input configuration file into a set of CSV output files containing power consumption values and the timestamps of their occurrence, as well as a file summarizing the events that occurred during the simulation. If you find this tool useful and use it (or parts of it), we ask you to cite the following work in your publications:
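As a rough illustration of the kind of output described above (timestamped power values plus a summary of events), the following Python sketch serializes a power trace to CSV and derives on/off events from it. The column names, the timestamp format, the threshold, and both function names are assumptions made for this example; they do not describe ANTgen's actual file layout or interface.

```python
import csv
import io
from datetime import datetime, timedelta


def trace_to_csv(powers, start, step_seconds=1):
    """Serialize power samples as 'timestamp,power_w' rows and return the CSV text.

    The two-column layout is a hypothetical format chosen for this sketch.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["timestamp", "power_w"])
    for i, power in enumerate(powers):
        ts = start + timedelta(seconds=i * step_seconds)
        writer.writerow([ts.isoformat(), f"{power:.1f}"])
    return buffer.getvalue()


def extract_events(powers, threshold=10.0):
    """Return (sample_index, 'on' | 'off') pairs where power crosses the threshold."""
    events = []
    was_on = False
    for i, power in enumerate(powers):
        is_on = power > threshold
        if is_on != was_on:
            events.append((i, "on" if is_on else "off"))
        was_on = is_on
    return events
```

For example, `trace_to_csv([0, 0, 2000, 2000, 0], datetime(2020, 6, 25))` produces a header line followed by one row per sample, and `extract_events` on the same list reports a switch-on at index 2 and a switch-off at index 4.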

Andreas Reinhardt and Christoph Klemenjak. 2020. How does Load Disaggregation Performance Depend on Data Characteristics? Insights from a Benchmarking Study. In Proceedings of the Eleventh ACM International Conference on Future Energy Systems (e-Energy ’20). Association for Computing Machinery, New York, NY, USA, 167–177.

Learn more about the authors Andreas Reinhardt and Christoph Klemenjak.