1 Introduction

The ECO (Electricity Consumption and Occupancy) data set is a comprehensive open-source (Creative Commons License CC BY 4.0) data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides:

1 Hz aggregate consumption data. Each measurement contains data on current, voltage, and phase shift for each of the three phases in the household.
1 Hz plug-level data measured from selected appliances.
Occupancy information measured through a tablet computer (manual labeling) and a passive infrared sensor (in some of the households).

2 Story

Gunther, a member of the state electricity department, has been assigned to analyze the per-appliance power consumption of three households:

‘04’ where the Tribbianis live,
‘05’ that belongs to the Gellers, and
‘06’, home of the Bings

Daily patterns of power consumption can be understood by examining the hourly usage of various appliances. Similarly, the daily use of various appliances can be used to understand monthly patterns of the same.

The appliances in use at a given hour or on a given day hint at what the household is doing at that time, and when such observations are studied over a longer period, they reveal patterns.

3 Data

Data from three households are provided, comprising power consumption of particular plugs/appliances per second over a period of time. The plugs data provides appliance-level consumption.

The files are coded in the following way. The first two digits (e.g., 01-06) identify the household. The next part, one of {sm, plugs, occupancy}, identifies the type of data (i.e., readings from the smart meter, the plugs, or the occupancy ground truth). Lastly, where applicable, the suffix indicates the format of the data (either Matlab or plain CSV).

For the plugs data, the first two digits (e.g., 01-08) identify the appliance.
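
The naming scheme can be unpacked with a small helper. The sketch below assumes that plug files sit at paths of the form `<root>/<house>/<house>_plugs_csv/<appliance>/<YYYY-MM-DD>.csv`, which is the layout implied by the loading code later in this document. `parse_plug_path` is a hypothetical helper name, not part of the data set's tooling.

```python
from datetime import date
from pathlib import PurePosixPath


def parse_plug_path(path):
    """Split a plug file path into (house, appliance, day).

    Assumed layout: <root>/<house>/<house>_plugs_csv/<appliance>/<YYYY-MM-DD>.csv
    """
    p = PurePosixPath(path)
    day = date.fromisoformat(p.stem)             # file name without '.csv'
    appliance = p.parent.name                    # e.g. '01'
    house = p.parent.parent.name.split('_')[0]   # '04_plugs_csv' -> '04'
    return house, appliance, day


example = parse_plug_path('eco/04/04_plugs_csv/01/2012-06-27.csv')
print(example)
```

Reading the parts from the path components, rather than from fixed character offsets, keeps the helper independent of how long the root path is.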

4 Data Science Questions

How do different households use different appliances on an hourly scale? Is there a common trend?
Based upon hourly usage of an appliance, can it be figured out what the household members generally do in a day? Accordingly, what suggestions can be given?
How is the plug-level data distributed among different appliances?

5 Data preparation

Gunther prepared the data for the 3 households by merging all of it into two dataframes:

hourly_df: This dataframe was created to study the trends of the power consumption of an appliance on an hourly basis.
daily_df: This dataframe was created to study the trends of the power consumption of an appliance on a daily basis.

The measurement period for this data set is from 01.06.12 to 23.01.13. Each value in the plug data is the real power measured by the plug of a particular appliance in a particular household.

For hourly analysis, he chose to group the data by household, appliance and hour so that it gives the power consumption of a particular appliance for 24 hours over a period of 7 months.

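Note: Every raw reading is in watts. Because the readings are taken once per second, the hourly and daily sums computed below are strictly energies in watt-seconds (joules), even where the charts label them as watts.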
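
Since each reading is an instantaneous power in watts taken once per second, adding readings up yields an energy in watt-seconds rather than a power. The following is a minimal sketch of the conversion to kilowatt-hours, assuming exactly one reading per second. The function name is illustrative only.

```python
SECONDS_PER_HOUR = 3600


def watt_seconds_to_kwh(total_watt_seconds):
    """Convert a sum of 1 Hz watt readings (W*s, i.e. joules) to kWh."""
    return total_watt_seconds / (SECONDS_PER_HOUR * 1000)


# A constant 100 W load sampled once per second for one hour
readings = [100.0] * SECONDS_PER_HOUR
energy_kwh = watt_seconds_to_kwh(sum(readings))
print(energy_kwh)
```

Applied to a daily sum of about 2.63e+06, such as the first Fridge row shown later in daily_df, this gives roughly 0.73 kWh for that day.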

Let's walk through the codebook that he created.

5.1 Importing the libraries

This step needs no explanation. Required packages must always be loaded.

Code

import pandas as pd
import altair as alt
import plotly.graph_objects as go

import glob
import os

import warnings
warnings.filterwarnings('ignore')

5.2 Data wrangling, munging and cleaning

This is an interesting section. We will see how the data was merged and what other techniques were used to preprocess it.

The function below, as its name suggests, returns the appliance name for the plug number listed in each household's documentation file.

Code

# Function to get the appliance name from their respective number. 
# Please note that appliances in different households might have the same name but different numbers.
def get_appliance_name(house, appliance_num):
  # Create a dictionary for the appliances of household 04 as per 04_doc.txt
  house_04_plugs = {'01': 'Fridge', '02': 'Kitchen appliances', '03': 'Lamp', '04': 'Stereo and laptop', '05': 'Freezer', 
                  '06': 'Tablet', '07': 'Entertainment', '08': 'Microwave'}
  
  # Create a dictionary for the appliances of household 05 as per 05_doc.txt
  house_05_plugs = {'01': 'Tablet', '02': 'Coffee machine', '03': 'Fountain', '04': 'Microwave', '05': 'Fridge', 
                '06': 'Entertainment', '07': 'PC', '08': 'Kettle'}

  # Create a dictionary for the appliances of household 06 as per 06_doc.txt
  house_06_plugs = { '01': 'Lamp', '02': 'Laptop', '03': 'Router', '04': 'Coffee machine', '05': 'Entertainment', 
                  '06': 'Fridge', '07': 'Kettle'}
  
  # 'if-else' case that will return the respective name for the given number
  if house == '04':
      return house_04_plugs[appliance_num]
  elif house == '05':
      return house_05_plugs[appliance_num]
  else:
      return house_06_plugs[appliance_num]
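
Note that the final `else` branch above returns a house 06 appliance for any house value other than '04' or '05'. As an alternative sketch, a single nested dictionary can hold all three mappings and fail loudly on an unknown house or plug number. The names `PLUGS` and `lookup_appliance` are hypothetical, and the entries are copied from the dictionaries above.

```python
# Appliance names per household, copied from the dictionaries above
PLUGS = {
    '04': {'01': 'Fridge', '02': 'Kitchen appliances', '03': 'Lamp',
           '04': 'Stereo and laptop', '05': 'Freezer', '06': 'Tablet',
           '07': 'Entertainment', '08': 'Microwave'},
    '05': {'01': 'Tablet', '02': 'Coffee machine', '03': 'Fountain',
           '04': 'Microwave', '05': 'Fridge', '06': 'Entertainment',
           '07': 'PC', '08': 'Kettle'},
    '06': {'01': 'Lamp', '02': 'Laptop', '03': 'Router',
           '04': 'Coffee machine', '05': 'Entertainment',
           '06': 'Fridge', '07': 'Kettle'},
}


def lookup_appliance(house, appliance_num):
    """Return the appliance name, or raise ValueError for unknown keys."""
    try:
        return PLUGS[house][appliance_num]
    except KeyError:
        raise ValueError(
            f"unknown house/appliance: {house!r}/{appliance_num!r}"
        ) from None


print(lookup_appliance('05', '08'))
```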

Next comes the function that Gunther used to create the dataframe (daily_df) mentioned above. It accepts the dataframe read from a csv file, the filename, and the house number, and returns a one-row dataframe containing the household, the appliance name, the date, and the total consumption of the appliance on that day.

Code

# Function that returns a dataframe comprising of the appliance's total daily power consumption
def consump_per_day(df, filename, house):
    # Create a empty dataframe with columns household, appliance, date and total consumption
    daily_consump = pd.DataFrame(columns=['household', 'appliance', 'date', 'total_consumption'])
    # House number
    daily_consump['household'] = [house]
    # Extract appliance number from filename using string slicing. Pass the house number and appliance to the utility function created above to get the name of the appliance.
    daily_consump['appliance'] = [get_appliance_name(house, filename[-17:-15])]
    # Extract the date from filename using string slicing. Convert the date using pd.to_datetime() in a YYYY-MM-DD format.
    daily_consump['date'] = [pd.to_datetime(filename[-14:-4]).strftime('%Y-%m-%d')]
    # The csv file contains one reading per second, i.e. 86,400 rows for a full day. Sum the readings to get the total for the 24 hours of that day.
    daily_consump['total_consumption'] = [df['consumption'].sum()]

    # Return the created dataframe
    return daily_consump

After creating the dataframe for daily power consumption, Gunther defined the function below for the hourly analysis. It accepts the dataframe read from a csv file, the filename, and the house number, and returns a dataframe with one row per hour containing the household, the appliance name, the date, the hour, and the consumption of the appliance in that hour.

Code

# Function that returns a dataframe comprising of the appliance's total hourly power consumption
def consump_per_hour(df, filename, house):
    # Extract appliance number from filename using string slicing. Pass the house number and appliance to the utility function created above to get the name of the appliance.
    plug_name = get_appliance_name(house, filename[-17:-15])
    # Extract the date from filename using string slicing. Convert the date using pd.to_datetime() in a YYYY-MM-DD format.
    date = pd.to_datetime(filename[-14:-4]).strftime('%Y-%m-%d')
    # Create a empty dataframe with columns household, appliance, date, hour and hourly consumption
    hourly_consump = pd.DataFrame(columns=['household', 'appliance', 'date', 'hour', 'hourly_consumption'])
    # Total number of seconds i.e 86,400 will be used ahead to calculate the hour
    total_time = len(df['consumption'])

    # List that will append the house number
    house_list= []
    # List that will append the appliance name
    appliance_list= []
    # List that will append the date
    date_list= []
    # List that will append the hour
    hour_list= []
    # List that will append the hourly consumption
    consump_list= []

    # Initialize variable for the while loop
    i = 0
    while i < total_time:
      # Append the values to their respective lists created above
      house_list.append(house)
      appliance_list.append(plug_name)
      date_list.append(date)
      # Offset of 3600 has been used to calculate the hourly power consumption
      consump_list.append(df['consumption'][i:i+3600].sum())
      i += 3600
      # Calculate the hour and append it to the list
      hour_list.append(i//3600)

    # Use the lists to fill up the dataframe.
    hourly_consump['household'] = house_list
    hourly_consump['appliance'] = appliance_list
    hourly_consump['date'] = date_list
    hourly_consump['hour'] = hour_list
    hourly_consump['hourly_consumption'] = consump_list

    # Return the created dataframe
    return hourly_consump
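
The while loop above slices the readings in steps of 3600 rows. The same hourly totals can be sketched with a vectorized groupby, shown here on synthetic data. `hourly_sums` is a hypothetical helper, and like the loop it assumes one reading per second with no rows missing from the file.

```python
import numpy as np
import pandas as pd


def hourly_sums(consumption):
    """Sum a per-second Series into hourly totals labelled 1..24."""
    # Row position // 3600 gives the zero-based hour; +1 matches the loop above
    hour = np.arange(len(consumption)) // 3600 + 1
    return consumption.groupby(hour).sum()


# Synthetic day: a constant 2 W load for 86,400 seconds
demo = pd.Series(np.full(86_400, 2.0))
totals = hourly_sums(demo)
print(totals.head(3))
```

As in the original function, hour 1 covers the first 3600 seconds of the day (00:00 to 01:00), so the labels mark the end of each hour rather than its start.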

Now that we’ve seen the three utility functions, it’s time to look at the driver function. Gunther used the function below to create the required dataframes. It takes the path as an input argument and returns a tuple of two dataframes (daily_df and hourly_df). Let's walk through what it does.

Code

# Function to create the required dataframe for analysis.
def create_plug_df(path):
  # House number list for the 3 households
  house_list = ['04', '05', '06']
  # This string will be used to create the complete path
  plug_st = '_plugs_csv'
  # Create utility lists for power consumption
  # This list will contain individual dataframes for total daily power consumption of every appliance in every house
  total_daily_consump = []
  # This list will contain individual dataframes for total hourly power consumption of every appliance in every house
  total_hourly_consump = []
  
  # iterate over the houses
  for house in house_list:
    # Since there are multiple folders for multiple appliances per house, lets create a variable which can be used later on to iterate over, get data and do calculations
    folders = 0
    # Create the path using the input argument, the house-number and the plug_st string.
    # The path should look like: eco/0X/0X_plugs_csv/
    DIR = path+ '/'+house+'/'+house+plug_st+'/'
    # Calculate the number of folders for each appliance by walking in the above directory path using os.walk(DIR)
    for _, dirnames, filenames in os.walk(DIR):
        folders += len(dirnames)
    
    # Create utility lists for power consumption
    # This list will contain individual dataframes for total daily power consumption of every appliance per household
    plug_daily_use = []
    # This list will contain individual dataframes for total hourly power consumption of every appliance per household
    plug_hourly_use = []

    # iterate over the number of folders/appliances
    for i in range(1, folders+1):
      # Form the file path using the directory path created above and the folder number
      file_path = DIR+ '0' + str(i)+ '/'
      # Get every file in the folder using glob
      all_files = glob.glob(os.path.join(file_path, "*.csv"))
      # Create utility lists for days and hours.
      # This list will contain individual dataframes for total daily power consumption per appliance per household
      day_list = []
      # This list will contain individual dataframes for total hourly power consumption per appliance per household
      hour_list = []
      # Iterate over the files in the folder
      for filename in all_files:
        # Read the csv file
        df = pd.read_csv(filename, names=['consumption'], header=None)
        # Take care of missing data. Head over to the missing data section to check the significance of the below code line.
        df = df.replace(-1, 0)
        
        # Form the dataframe that contains the total daily consumption of the appliance in the house
        day_df = consump_per_day(df, filename, house)
        # Form the dataframe that contains the total hourly consumption of the appliance in the house
        hour_df = consump_per_hour(df, filename, house)

        # Append the dataframes to their respective lists.
        day_list.append(day_df)
        hour_list.append(hour_df)
      
      # Concatenate the dataframes in the list to get the merged dataframe that contains the power consumption of every appliance per house
      day_frame = pd.concat(day_list, axis=0, ignore_index=True).sort_values(by='date')
      hour_frame = pd.concat(hour_list, axis=0, ignore_index=True).sort_values(by=['date', 'hour'])

      # Append the dataframes to their respective lists
      plug_daily_use.append(day_frame)
      plug_hourly_use.append(hour_frame)

    # Concatenate the dataframes in the list to get the merged dataframe that contains the power consumption of every appliance in every house
    house_daily_consump= pd.concat(plug_daily_use, axis=0, ignore_index=True)
    house_hourly_consump= pd.concat(plug_hourly_use, axis=0, ignore_index=True)
    
    # Append the dataframes to their respective lists
    total_daily_consump.append(house_daily_consump)
    total_hourly_consump.append(house_hourly_consump)
  
  # Concatenate the dataframes in the list to get the merged dataframe that contains the total daily power consumption of every appliance in every house 
  daily_df = pd.concat(total_daily_consump, axis=0, ignore_index=True)
  # Concatenate the dataframes in the list to get the merged dataframe that contains the total hourly power consumption of every appliance in every house 
  hourly_df = pd.concat(total_hourly_consump, axis=0, ignore_index=True)

  # Return the created dataframes
  return daily_df, hourly_df
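
The driver above counts folders with os.walk and rebuilds each folder name as '0' + str(i), which assumes contiguously numbered, single-digit appliance folders. As a hedged alternative, a glob pattern can collect the files directly. The helper name `list_plug_files` is hypothetical, and the throwaway directory tree exists only to demonstrate the traversal.

```python
import tempfile
from pathlib import Path


def list_plug_files(root, house):
    """Return sorted CSV paths under <root>/<house>/<house>_plugs_csv/<appliance>/."""
    plug_dir = Path(root) / house / f"{house}_plugs_csv"
    return sorted(plug_dir.glob('*/*.csv'))


# Build a tiny throwaway tree to demonstrate the traversal
with tempfile.TemporaryDirectory() as tmp:
    for appliance in ('01', '02'):
        folder = Path(tmp) / '04' / '04_plugs_csv' / appliance
        folder.mkdir(parents=True)
        (folder / '2012-06-27.csv').write_text('1.0\n')
    found = [p.relative_to(tmp).as_posix() for p in list_plug_files(tmp, '04')]

print(found)
```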

Now that the driver and utility functions have been defined, it’s time to execute them with the necessary input argument(s). The code cell below does exactly that. Notice that Gunther kept the data outside the cloned GitHub folder because it was too large to be hosted on GitHub. Let's see what the two dataframes required for this analysis look like.

Code

# Define path
path = './eco'
# Execute the driver function to get the tuple containing dataframes
plug_df = create_plug_df(path)
# Get the daily_df
daily_df = plug_df[0]
# Get the hourly_df
hourly_df = plug_df[1]

Below is what the dataframe containing total daily power consumption per appliance per household looks like.

Code

daily_df.head()

	household	appliance	date	total_consumption
0	04	Fridge	2012-06-27	2.633352e+06
1	04	Fridge	2012-06-28	2.357230e+06
2	04	Fridge	2012-06-29	2.197841e+06
3	04	Fridge	2012-06-30	3.081946e+06
4	04	Fridge	2012-07-01	2.777104e+06

Below is what the dataframe containing total hourly power consumption per appliance per household looks like.

Code

hourly_df.head()

	household	appliance	date	hour	hourly_consumption
0	04	Fridge	2012-06-27	1	92781.80686
1	04	Fridge	2012-06-27	2	109692.01842
2	04	Fridge	2012-06-27	3	71697.31514
3	04	Fridge	2012-06-27	4	96917.73950
4	04	Fridge	2012-06-27	5	110460.75213

5.2.1 Taking care of missing data

The documentation mentions that missing values are present in the data and that they are denoted as ‘-1’. Looking at the loading code above, we can see that Gunther replaced these values with ‘0’. Replacing them with zero is a reasonable choice because a ‘-1’ would distort the sums and make the visualizations misleading. Note, however, that a zero is counted as a real reading, so averages over periods with missing data are biased downward.
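
To make the effect of this choice concrete, the sketch below compares replacing ‘-1’ with 0 against replacing it with NaN on a few made-up readings. The values are hypothetical and only illustrate the mechanics. The NaN variant is an alternative, not what Gunther's code does.

```python
import numpy as np
import pandas as pd

# Hypothetical per-second readings; -1 marks a missing value
raw = pd.Series([50.0, -1.0, 70.0, -1.0], name='consumption')

n_missing = int((raw == -1).sum())   # count the gaps before replacing them
as_zero = raw.replace(-1, 0)         # the approach used in this analysis
as_nan = raw.replace(-1, np.nan)     # alternative: keep the gaps as NaN

print(n_missing)
print(as_zero.sum(), as_zero.mean())
print(as_nan.sum(), as_nan.mean())   # NaN is skipped by default
```

Both variants give the same sum, but the mean differs because the zeros are counted as readings while the NaN values are skipped.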

Just to make sure that information provided was true, Gunther created the below function to have a look at the missing value statistics of a dataframe.

Code

# Define a function that returns a data-frame of missing data statistics
def missing_val_stats(df):
    # Collect one row of statistics per column and build the data-frame once at the end.
    # (DataFrame.append was removed in pandas 2.0, so the rows are gathered in a list.)
    rows = []

    for c in df.columns:
        rows.append({
            # Column
            'column': c,
            # Unique values in the column
            'unique_val': df[c].unique(),
            # Number of unique values in the column (NaN counts as a value)
            'num_unique_val': df[c].nunique(dropna=False),
            # Number of unique values in the column without nan
            'num_unique_val_nona': df[c].nunique(),
            # Number of missing values in the column
            'num_miss': df[c].isnull().sum(),
            # Percentage of missing values in the column
            'pct_miss': round(df[c].isnull().sum() / len(df), 3) * 100,
        })

    # Return the created dataframe
    return pd.DataFrame(rows, columns=['column', 'unique_val', 'num_unique_val',
                                       'num_unique_val_nona', 'num_miss', 'pct_miss'])

Let's find out if there is any missing data in either of the two dataframes created.

Code

# Missing value statistics for daily_df
missing_val_stats_daily = missing_val_stats(daily_df)
missing_val_stats_daily

column	unique_val	num_unique_val	num_unique_val_nona
household	[04, 05, 06]	3	3
appliance	[Fridge, Kitchen appliances, Lamp, Stereo and ...	14	14
date	[2012-06-27, 2012-06-28, 2012-06-29, 2012-06-3...	220	220
total_consumption	[2633352.2361399997, 2357230.3723299997, 21978...	3678	3678

Code

# Missing value statistics for hourly_df
missing_val_stats_hourly = missing_val_stats(hourly_df)
missing_val_stats_hourly

column	unique_val	num_unique_val	num_unique_val_nona
household	[04, 05, 06]	3	3
appliance	[Fridge, Kitchen appliances, Lamp, Stereo and ...	14	14
date	[2012-06-27, 2012-06-28, 2012-06-29, 2012-06-3...	220	220
hour	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	24	24
hourly_consumption	[92781.80686, 109692.01842000001, 71697.315139...	55271	55271

For every column, the number of unique values is the same with and without NaN, so neither dataframe contains missing values. Gunther has handled the data well.

Now that the data has been pre-processed, let's move on to the EDA section.

6 EDA

Gunther began by looking at the total daily power consumption of every appliance per house, by date. He built a heatmap with the date (YYYY-MM-DD) on the x axis, the appliance on the y axis, and the total daily power consumption encoded as color.

6.0.1 Heatmap of total daily power consumption of every appliance per house, by date for the entire period

Code

alt.renderers.enable('default')
# Define dropdown menus for filtering
house_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')

# Create selection objects
house_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='Select')
selection = alt.selection_single(fields=['appliance'], name='Select')

# Filter the dataframe as per total power consumption and appliance
plugs_filtered = daily_df[['total_consumption','appliance']]
# Get the average daily consumption of each appliance per house using groupby and mean()
plugs_averages = pd.DataFrame(plugs_filtered.groupby('appliance')['total_consumption'].mean()) 
# Reset index of the groupby object
plugs_averages = plugs_averages.reset_index()

# Plot the heatmap
heatmap_daily_consump = (alt.Chart(daily_df)
 .mark_rect()
 .encode(x='date:O',
         y='appliance:N',
         color=alt.Color('total_consumption:Q',
                         scale=alt.Scale(scheme='orangered'),
                         legend=alt.Legend(type='symbol')
                        ),
         tooltip=['total_consumption','appliance:N'],
        ).add_selection(house_selection).transform_filter(selection).transform_filter(house_selection)
)

# Add title to the heatmap
heatmap_daily_consump.title ="Total daily power consumption in Watts of appliance per house"
# Add x-axis label to the heatmap
heatmap_daily_consump.encoding.x.title = 'Date (YYYY-MM-DD)'
# Add y-axis label to the heatmap
heatmap_daily_consump.encoding.y.title = 'Appliance name'

# Display the heatmap
heatmap_daily_consump

6.0.2 Findings

Using the information from the heatmap above, Gunther found that, out of all appliances in every household, the Freezer uses the most power.

The Freezer belongs to the Tribbianis (House number ‘04’). Joey Tribbiani should be aware of how much power this plug consumes and get used to more home-cooked food rather than depending on frozen food. This would help him live a healthier life and save some money on the electricity bill too.

Entertainment devices and the Fridge are the most heavily used appliances in the Geller household (House number ‘05’). Maybe Ross Geller is used to keeping a six-pack of beer handy in the fridge so he can just chill later while watching a dinosaur documentary on TV.

As for the Bings (House number ‘06’), it looks like Chandler Bing is without a job: Entertainment devices consume most of the power in this house. Comparing the 3 houses, the Bings also have the lowest total power consumption. Either the Bings are saving, or the hypothesis about Chandler Bing is correct.
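
Rankings like these can be cross-checked numerically. The plugs_averages frame in the heatmap code groups by appliance name only, so a Fridge in one house is pooled with a Fridge in another. The sketch below groups by household as well. `demo_daily` is a hypothetical stand-in for daily_df with made-up values, used only so the example runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for daily_df with the same column names
demo_daily = pd.DataFrame({
    'household': ['04', '04', '04', '04', '05', '05'],
    'appliance': ['Fridge', 'Fridge', 'Freezer', 'Freezer', 'Fridge', 'Fridge'],
    'total_consumption': [10.0, 20.0, 40.0, 60.0, 30.0, 50.0],
})

# Mean daily total per (household, appliance), largest first within each house
avg_daily = (demo_daily
             .groupby(['household', 'appliance'], as_index=False)['total_consumption']
             .mean()
             .sort_values(['household', 'total_consumption'], ascending=[True, False])
             .reset_index(drop=True))
print(avg_daily)
```

Running the same chain on the real daily_df would list each household's appliances from highest to lowest average daily consumption.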

6.0.3 Histogram showcasing total daily power consumption of all appliances per house.

Gunther plotted the chart below, which shows the total daily power consumption of all appliances per house in the dataset as bars over time.

He has also provided the list of appliances available in each household below so that it’s easier for the audience to play with the chart and observe the trends.

Appliances available in house 04:
1. Entertainment
2. Freezer
3. Fridge
4. Kitchen appliances
5. Lamp
6. Microwave
7. Stereo and laptop
8. Tablet

Appliances available in house 05:
1. Coffee machine
2. Entertainment
3. Fountain
4. Fridge
5. Kettle
6. Microwave
7. PC
8. Tablet

Appliances available in house 06:
1. Coffee machine
2. Entertainment
3. Fridge
4. Kettle
5. Lamp
6. Laptop
7. Router

Code

alt.renderers.enable('default')
# Define dropdown menus for filtering
house_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')
appliance_dropdown = alt.binding_select(options=list(daily_df['appliance'].unique()), name='Appliance')

# Create selection objects
house_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='House')
appliance_selection = alt.selection_single(fields=['appliance'], bind=appliance_dropdown, name='Appliance')

# Create the histogram
hist = alt.Chart(daily_df).mark_bar().encode(
    alt.X('date:T', title='Month'),
    alt.Y('total_consumption:Q', title='Daily Power Consumption (Watts)'),
    alt.Color('appliance:N', title='Appliance', scale=alt.Scale(scheme='category20')),
    alt.Column('household:N', title='House', sort=list(daily_df['household'].unique())),
    tooltip=['date:T', 'appliance', 'total_consumption']
).properties(
    title='Daily Power Consumption of different appliances in different Houses'   # title of the chart
).add_selection(
    appliance_selection, house_selection
).transform_filter(
    appliance_selection
).transform_filter(
    house_selection
)

# Display the histogram
hist

6.0.4 Findings

We see a pattern similar to the heatmap. One advantage of this plot over the heatmap is its two dropdowns, which make the chart more interactive, and this is perhaps why Gunther chose it. We can filter the chart using the dropdown for the house and further filter the bars using the dropdown for the appliance.

The above chart also supports the hypothesis that the Freezer is by far the appliance/plug that uses the most power, meaning it will likely be the differentiating factor that determines whether a household is on the higher or lower end of power consumption.

7 Results

7.1 Plotly

Let's have a look at Gunther's analysis of the average hourly power consumption of the appliances used in different households.

To calculate the average hourly power consumption of every appliance, Gunther used ‘groupby’ to group hourly_df by household, appliance and hour, and then took the mean of each group.

Code

plug_df = hourly_df.groupby(['household', 'appliance', 'hour']).hourly_consumption.mean().reset_index()
plug_df.rename(columns={'hourly_consumption': 'avg_hourly_consump'}, inplace=True)
plug_df

	household	appliance	hour	avg_hourly_consump
0	04	Entertainment	1	33151.008792
1	04	Entertainment	2	33105.222588
2	04	Entertainment	3	33432.210976
3	04	Entertainment	4	33560.410371
4	04	Entertainment	5	34033.396577
...	...	...	...	...
547	06	Router	20	70623.687756
548	06	Router	21	70188.732572
549	06	Router	22	70078.231133
550	06	Router	23	69755.983373
551	06	Router	24	69847.458119

552 rows × 4 columns
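
A grouped table like this also makes it easy to look up the hour at which each appliance peaks, which is the kind of reading the findings below rely on. This is a minimal sketch, with `demo_plug` as a hypothetical stand-in for plug_df filled with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for plug_df with the same column names
demo_plug = pd.DataFrame({
    'household': ['04'] * 6,
    'appliance': ['Microwave'] * 3 + ['Kitchen appliances'] * 3,
    'hour': [7, 12, 19, 7, 12, 19],
    'avg_hourly_consump': [5.0, 90.0, 60.0, 80.0, 10.0, 40.0],
})

# Row label of the maximum within each (household, appliance) group
peak_idx = demo_plug.groupby(['household', 'appliance'])['avg_hourly_consump'].idxmax()
peaks = demo_plug.loc[peak_idx, ['household', 'appliance', 'hour']].reset_index(drop=True)
print(peaks)
```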

Now that the data has been grouped, Gunther has plotted a bar chart for his analysis.

7.1.1 Design Decisions:

Visual Encoding : Color is used as the visual encoding that separates the appliances from each other. This helps the audience keep track of individual appliances instead of getting lost among multiple bars, and it keeps the graph simple and pleasing to the eye.
Axis Choices : Appliances are placed on the x axis so that their names read horizontally. Average power consumption is placed on the y axis to mark values.
Frames : Every hour of the day is a separate frame, so that daily patterns can be studied by following how power consumption changes from hour to hour.
Dropdown : A dropdown for the house number lets the user interact with the graph and see how the appliances of each house behave at every hour of the day. This interactivity further helps in engaging the audience.
Titles and Axis labels : Titles and axis labels describe the analysis so that the audience knows what the chart shows.

Code

# GET UNIQUE VALUES
appliances = plug_df["appliance"].unique()
hours = plug_df["hour"].unique()
houses = plug_df["household"].unique()

# INITIALIZE GRAPH OBJECT
fig = go.Figure()

# Make a trace for each possible plotting scenario by iterating over houses and hours in a nested for loop.
for house in houses:
    for hr in hours:
        # ISOLATE ONE HOUR OF DATA FOR PLOTTING
        df_hour = plug_df.query(f"hour == {hr}")

        # Add trace to the figure
        fig.add_trace(
            # Specify the type of the trace
            go.Bar(x = df_hour.loc[df_hour["household"] == house]['appliance'], 
                   y=df_hour.loc[df_hour["household"] == house]['avg_hourly_consump'],
                   # Specify whether or not to make data-visible when rendered
                   visible= False))

# Change the color of individual bars
fig.update_traces(marker_color=['red', 'green', 'blue', 'yellow', 'brown', 'purple', 'orange', 'pink'])

# MAKE FIRST TRACE VISIBLE
fig.data[0].visible = True

# Create and add slider for hours
# Step 1: Define steps for slider
steps = []
for i,house in enumerate(houses):
    step = []
    for j,hour in enumerate(hours):
        visible = [False] * len(fig.data)
        visible[i*len(hours)+j] = True
        step_dict = {'method': 'update', 
        'args': [{'visible': visible}],
        'label':str(hour)}
        step.append(step_dict)
    steps.append(step)

# Step 2: Define sliders for the figure
sliders=[]
for i,house in enumerate(houses):   
    slider = [
        dict(
            active=0, 
            currentvalue={"prefix": "~hours: "},
            transition={"duration": 300, "easing": "cubic-in-out"},
            pad={"t": 50},
            steps=steps[i])
    ]
    sliders.append(slider)

# Initialize slider for number of hours
fig.update_layout(sliders=sliders[0])

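# DEFINE VISIBILITY OF PLOTS FOR DROPDOWN BUTTONS
# Traces are ordered house by house, so the first trace of each house sits at a multiple of len(hours)
button_visible = []
for i in range(0,len(fig.data),len(hours)):
    visible = [False] * len(fig.data)
    visible[i] = True
    button_visible.append(visible)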

# DEFINE BUTTONS FOR DROPDOWN
buttons =[]
for i,house in enumerate(houses):
    button = dict(
        label=house,

        # MODIFICATION TYPE
        method="update",

        # BOOLEAN VALUES FOR EACH TRACE
        # Note that each house has its own slider
        args=[{"visible" :button_visible[i]},{"sliders" : sliders[i]}],
    )
    buttons.append(button)

# ADD DROPDOWN TO CHANGE TYPE
fig.update_layout(
    updatemenus=[
        dict(
            buttons=buttons,
            # VARIABLES FOR DROPDOWN DIRECTION AND LOCATION
            direction="down",
            showactive=True,
            pad={"r": 10, "t": 10},
            x=0.935,
            xanchor="left",
            y=1.3,
            yanchor="top",
        ),
    ]
)

# Set figure layout
fig.update_layout(
    title="Average hourly power consumption of different appliances in different households",
    xaxis_title="Appliances",
    yaxis_title="Average power consumption (W)",
)

fig.show()
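
The slider steps and dropdown buttons above both depend on how traces are laid out: one trace per (house, hour) pair, added house by house. The index arithmetic can be sketched in isolation as follows. `visibility_mask` is a hypothetical helper written for illustration and is not part of Plotly.

```python
def visibility_mask(house_idx, hour_idx, n_houses, n_hours):
    """Boolean list with exactly one True, at the trace for (house, hour)."""
    mask = [False] * (n_houses * n_hours)
    mask[house_idx * n_hours + hour_idx] = True
    return mask


# 3 houses x 24 hours: the first hour of the second house is trace 24
mask = visibility_mask(house_idx=1, hour_idx=0, n_houses=3, n_hours=24)
print(mask.index(True))
```

This is why the first trace of each house sits at a multiple of the number of hours, not of the number of houses.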

7.1.2 Findings

Below are Gunther’s findings and insights regarding the bar chart plotted above.

1. House number ‘04’:

There is very little power consumption, and hardly any change in it, from 1 a.m. to 6 a.m., which is probably because the Tribbianis are asleep at that time. This becomes our first finding: the Tribbianis generally sleep from around 12 a.m. to 6 a.m.

The first major change occurs at 7 a.m., when the consumption of Kitchen appliances rises. From the documentation we know that the kitchen appliances comprise the coffee machine, the bread baking machine and the toaster. This tells us that the Tribbianis tend to have their breakfast around 7 a.m. Breakfast seems to consist of coffee with some bread, as the usage of other appliances such as the Microwave is very low around this time.

Another large change can be seen at 12 p.m., when the consumption of the Microwave shoots up. This tells us about the lunch time of the Tribbianis. It could also mean that this is a working household: only the Microwave is heavily used around this time, which suggests that they just come home to grab a quick lunch.

The Entertainment bar rises for the first time around 2 p.m. This could mean that, after making or heating his lunch, Joey Tribbiani generally likes to enjoy his meal with some entertainment.

Microwave and Kitchen appliances usage shoots up again around 7 p.m., which could be their dinner time.

Entertainment rises again around 8 - 9 p.m., which is again right after their meal.

2. House number ‘05’:

Again, there is no change and also very less power consumption from the time 1 a.m to 5 a.m which might be because the Gellers are sleeping too at that time. But here’s something absurd that happens at 5 a.m. The entertainment shoots up a bit from 5 a.m to 6 a.m. It can be that someone wakes up at that time to watch maybe their favourite TV series episode which might be airing in a different continent at that moment.

The first major change occurs at 7 a.m where consumption of Coffee machine goes high. The Gellers wake up at the this time, have their morning coffee and start their routine.

Next we witness that at 9 a.m, the power consumption of PC goes up. It can be that someone in the Geller house (Ross/Monica) is working remotely and their job starts at 9 a.m.

Again at 12 pm a similar huge change that was observed for the Tribbiani househole can be observed where the consumption of Microwave increases. This can mean that Gellers too prefer to have their lunch at the same time. Upon seeing the usage of PC, we can rightfully assume that this is a working household and that they too prefer to just grab their lunch on the go.

1 p.m again is the coffee break since the Coffee machine consumption goes higher.

One crucial trend that we can witness is that from 5.pm to 11 p.m, the entertainment bar goes high. This can mean that the Gellers got free from their daily jobs and now are watching TV or playing PS/XBOX or maybe listeing to their favourite songs on a speaker. This interval accounts for their leisure time.

Gellers like to have coffee during their leisure time too as we can observe that at 6 p.m, the consumption of coffee machine increases again!

From 9 p.m - 11 p.m, the power usage of Fountain increases. Again this can be a part of the leisure time of the Gellers. Maybe they are chilling in their living/ bed room with a view of the fountain to relax themselves.

Overall, the Gellers look like a caffeine-dependent working household who relax once they are free from their jobs.

3. House number ‘06’:

One thing to note for this household is that the entertainment bar always remains high. This can mean either that the Bings use entertainment devices almost constantly or that the readings were not recorded correctly. The same holds for the router: it appears never to be switched off, which suggests that the Bings keep their Wi-Fi on at all times.

At 7 a.m., we see a major change, with the bar for the Coffee machine going up. This suggests that the Bings sleep from 1 a.m. to 7 a.m. However, given the entertainment bar, it is not clear that every Bing sleeps during this interval.

At 8 a.m., the consumption of the Kettle rises. At 9 a.m., the bar for the Coffee machine increases again. This suggests that the Bings have their breakfast and coffee between 7 a.m. and 9 a.m.

The data look somewhat unusual for a household. First, this house has no microwave or other kitchen appliance for preparing food. There is a lamp, but it is barely used. Entertainment and Router account for most of the hourly average power consumption in a day.

This can mean either that the data recorded for this household are incorrect, which would be a limitation of the analysis, or that this is not a household at all but perhaps an entertainment centre.
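One way to make the "always high" observation checkable is to flag appliances whose hourly averages barely change over the day. The following is a minimal sketch, not part of Gunther's analysis: the `hourly` frame, its column names, its values, and the 0.2 cutoff are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical hourly averages (watts) for two appliances at four hours of the day.
hourly = pd.DataFrame({
    "appliance": ["Entertainment"] * 4 + ["Coffee machine"] * 4,
    "hour": [0, 7, 12, 20] * 2,
    "mean_power": [95.0, 100.0, 98.0, 102.0, 0.0, 60.0, 5.0, 0.0],
})


def flat_profiles(df, max_rel_spread=0.2):
    """Return appliances whose hourly mean power barely varies over the day.

    The relative spread is (max - min) / mean of the hourly averages.
    The 0.2 default cutoff is an arbitrary, illustrative choice.
    """
    stats = df.groupby("appliance")["mean_power"].agg(["min", "max", "mean"])
    # Drop appliances that never draw power so the division below is safe.
    stats = stats[stats["mean"] > 0]
    rel_spread = (stats["max"] - stats["min"]) / stats["mean"]
    return sorted(rel_spread[rel_spread < max_rel_spread].index)


flagged = flat_profiles(hourly)
print(flagged)
```

A flagged appliance is not necessarily a recording error; it only marks a profile that deserves a closer look, such as the entertainment and router readings discussed above.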

7.1.3 Summary

Gunther’s analysis shows how different households use different appliances on an hourly basis. There is a common trend of waking up in the morning, having a morning coffee to kickstart the day, and a common time for lunch and entertainment.

Gunther’s visualization of the hourly usage of each appliance allowed us to infer the activities of each household. It also allowed us to guess whether it was a working household and whether its members were regular coffee drinkers.

Using appropriate visualization, Gunther has been able to answer the first two data science questions.
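The hourly view behind these observations can be reproduced with a simple aggregation. The sketch below assumes per-second readings with hypothetical column names (`timestamp`, `appliance`, `power`) and made-up values; it averages power by hour of day and then reports the hour at which each appliance peaks.

```python
import pandas as pd

# Hypothetical per-second readings: timestamp, appliance name, power in watts.
readings = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-07-01 07:00:00", "2012-07-01 07:00:01", "2012-07-02 07:30:00",
        "2012-07-01 12:00:00", "2012-07-01 07:00:00", "2012-07-01 12:00:00",
        "2012-07-01 12:00:01",
    ]),
    "appliance": ["Coffee machine"] * 4 + ["Microwave"] * 3,
    "power": [800.0, 1000.0, 600.0, 0.0, 0.0, 1200.0, 800.0],
})


def hourly_profile(df):
    """Average power per appliance for each hour of the day (0-23).

    Rows are appliances, columns are hours; readings from different days
    that fall in the same hour are averaged together.
    """
    with_hour = df.assign(hour=df["timestamp"].dt.hour)
    return (with_hour.groupby(["appliance", "hour"])["power"]
            .mean()
            .unstack("hour"))


profile = hourly_profile(readings)
# Hour of the day at which each appliance draws the most power on average.
peak_hours = profile.idxmax(axis=1)
print(profile)
print(peak_hours)
```

With real data, the `profile` table is what a grouped bar chart per hour would display, and `peak_hours` gives a quick summary of statements such as "the Coffee machine peaks at 7 a.m.".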

7.2 Altair

The distribution of power consumption among various appliances was another area of concern for Gunther, who was looking for notable variations that would indicate inconsistent usage patterns. To examine this, he created a pair of linked Altair plots that let users view both the overall distribution and the distribution for each individual appliance. The interactive feature makes it simple to compare overall and individual appliance usage, because users can click on a specific appliance bar to get a segmented view of the relevant distribution. This gives a comprehensive view and helps users spot potential opportunities for more effective energy use.

Overall, the mix of Altair graphs provides a practical and educational way to examine how much power is used by various plugs, providing insights into how to best use energy.

7.2.1 Design Decisions:

Visual Encoding : Color has been chosen as the visual encoding that separates the appliances from each other. This keeps the audience's attention on the graph so that they do not get lost among multiple lines. For the histogram (the bar chart of appliances), a single color (blue) has been chosen. Initially everything is selected, so the audience sees every line in the chart. Once the audience selects an appliance from the histogram, the line chart is filtered to that appliance, which keeps the graph uncluttered.
Axis Choices : For the histogram, appliances are placed on the x axis; for the line chart, the date is placed on the x axis. Total daily power consumption is placed on the y axis of both charts.
Selector : Every appliance can be chosen from the histogram, which acts as a filter panel. This lets the audience see the trends of different appliances in the line chart over a period of time.
Legend : A legend has been included that shows the line color of every appliance. This helps the audience distinguish the appliances and pick the one whose power consumption trend they wish to observe.
Titles and Axis labels : Titles and axis labels describe the quantities being analysed so that the audience knows what each chart shows.

Code

import altair as alt

alt.renderers.enable('default')

# Note: selection_single / add_selection are the Altair 4 names;
# Altair 5 renames them to selection_point / add_params.

# Define a dropdown menu and a matching selection for filtering by household.
# house_selection is defined here but is not attached to either chart below.
house_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')
house_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='House')

# Create the click selection on appliance that links the two charts
selection = alt.selection_single(fields=['appliance'], name='Select')

# Bar color: blue when the appliance is selected, grey otherwise
color = alt.condition(selection,
                      alt.value('#0569e3'),
                      alt.value('#c2c3c4'))

# Create a bar chart of the total daily consumption of different appliances
bar = (alt.Chart(daily_df)
       .mark_bar()
       .encode(
           y=alt.Y('total_consumption:Q', axis=alt.Axis(title='Daily Consumption (Watts)')),
           x=alt.X('appliance:N', sort='-y', axis=alt.Axis(title='Appliance')),
           color=color
       )
       # properties() returns a new chart, so it is applied in the chain
       .properties(width=600, height=600,
                   title='Daily Power Consumption of Appliance')
       .add_selection(selection)
      )

# Line color: one color per appliance when selected, white (hidden) otherwise
color2 = alt.condition(selection,
                       alt.Color('appliance:N'),
                       alt.value('white'))

# Create a line chart of the total daily consumption of different appliances
line1 = (alt.Chart(daily_df)
         .mark_line()
         .encode(x=alt.X('date:T', title='Date (YYYY-MM-DD) (Zoom in to see dates)'),
                 y=alt.Y('total_consumption:Q', title='Daily Consumption (W)'),
                 color=color2,
                 tooltip=['date:T', 'appliance', 'total_consumption']
                )
         .properties(width=800, height=400,
                     title='Daily Consumption by Appliance in every household')
         .interactive()
        )

# Display the charts side by side
bar | line1

7.2.2 Findings

The most surprising finding of Gunther’s analysis is that the freezer’s power usage has multiple peaks, separated by a sizable gap. This pattern is probably caused by a change of season: there is a big spike in freezer usage in the summer followed by a marked decline in the winter. The usage patterns of the other appliances, by contrast, are more consistent and vary less.

This finding helps in answering the third data science question.
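A seasonal effect of this kind can be checked numerically by averaging daily totals per season. The sketch below is illustrative only: the `daily` frame mimics the columns used in the plots (`date`, `appliance`, `total_consumption`) with made-up values, and the month-to-season mapping assumes Northern Hemisphere meteorological seasons.

```python
import pandas as pd

# Hypothetical daily totals with the same columns as the plotted data.
daily = pd.DataFrame({
    "date": pd.to_datetime(["2012-07-10", "2012-07-11",
                            "2012-12-10", "2012-12-11"]),
    "appliance": ["Freezer"] * 4,
    "total_consumption": [900.0, 1100.0, 400.0, 600.0],
})

# Assumed mapping: Northern Hemisphere meteorological seasons.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_mean(df, appliance):
    """Mean daily consumption of one appliance, grouped by season."""
    subset = df[df["appliance"] == appliance]
    season = subset["date"].dt.month.map(SEASON_BY_MONTH).rename("season")
    return subset.groupby(season)["total_consumption"].mean()


freezer = seasonal_mean(daily, "Freezer")
print(freezer)
```

A large gap between the summer and winter means would support the seasonal explanation; similar means would suggest looking for another cause of the peaks.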

7.2.3 Summary

Gunther's analysis emphasizes how important it is to take seasonal factors into account when examining power consumption patterns in order to optimize energy use. Understanding the seasonal fluctuations in appliance usage enables households to adopt efficient energy-saving measures, such as reducing appliance usage during times of high demand, to lower overall power consumption.

8 References

Wilhelm Kleiminger, Christian Beckel, and Silvia Santini. Household Occupancy Monitoring Using Electricity Meters. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015). Osaka, Japan, September 2015.
Christian Beckel, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms. Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014). Memphis, TN, USA. ACM, November 2014.
ANLY-503 - Lab 4.1
ANLY-503 - Lab 4.2
https://plotly.com/graphing-libraries/