The ECO (Electricity Consumption and Occupancy) data set is a comprehensive open-source (Creative Commons License CC BY 4.0) data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides:
1 Hz aggregate consumption data. Each measurement contains data on current, voltage, and phase shift for each of the three phases in the household.
1 Hz plug-level data measured from selected appliances.
Occupancy information measured through a tablet computer (manual labeling) and a passive infrared sensor (in some of the households).
2 Story
Gunther is a member of the state electricity department who has been assigned to do an analysis over the power consumption per appliance of 3 particular households. These households are namely:
‘04’ where the Tribbianis live,
‘05’ that belongs to the Gellers, and
‘06’, home of the Bings
Daily patterns of power consumption can be understood by examining the hourly usage of various appliances. Similarly, the daily use of various appliances can be used to understand monthly patterns of the same.
Usage of different appliances could mean what the household generally does around that time or on that day and when studied over a period of time, these insights can turn into patterns.
3 Data
Data from three households are provided, comprising power consumption of particular plugs/appliances per second over a period of time. The plugs data provides appliance-level consumption.
The files are coded in the following way. The first two digits (e.g., 01-06) identify the household. The next part {sm, plugs, occupancy} identifies the type of data (i.e., readings from the smart meter, the plugs, or the occupancy ground truth). Lastly, where applicable, the suffix indicates the format of the data (either Matlab or plain CSV).
For the plugs data, the first two digits (e.g., 01-08) identify the appliance.
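The naming scheme can be turned into a small parser. The sketch below is illustrative only: it assumes plug files are laid out as `<root>/<house>/<house>_plugs_csv/<appliance>/<YYYY-MM-DD>.csv`, which is the layout the code later in this walkthrough relies on. `PlugFile` and `parse_plug_path` are hypothetical names, not part of the ECO distribution.

```python
from datetime import date, datetime
from pathlib import PurePosixPath
from typing import NamedTuple


class PlugFile(NamedTuple):
    household: str   # e.g. '04'
    data_type: str   # 'plugs' for plug-level files
    appliance: str   # e.g. '01'
    day: date


def parse_plug_path(path: str) -> PlugFile:
    """Split an assumed path such as 'eco/04/04_plugs_csv/01/2012-06-27.csv'."""
    p = PurePosixPath(path)
    # The folder holding the file is the appliance number
    appliance = p.parent.name
    # The folder above it is '<house>_<type>_<format>', e.g. '04_plugs_csv'
    household, data_type, _fmt = p.parent.parent.name.split('_')
    # The file name (without extension) is the recording date
    day = datetime.strptime(p.stem, '%Y-%m-%d').date()
    return PlugFile(household, data_type, appliance, day)


info = parse_plug_path('eco/04/04_plugs_csv/01/2012-06-27.csv')
print(info)
```

Parsing by path component rather than by fixed character offsets keeps the lookup working if the root folder name changes length.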
4 Data Science Questions
How do different households use different appliances on an hourly scale? Is there a common trend?
Based upon hourly usage of an appliance, can it be figured out what the household members generally do in a day? Accordingly, what suggestions can be given?
How is the plug-level data distributed among different appliances?
5 Data preparation
Gunther used the data for the 3 households by merging the entire data in two different frames:
hourly_df: This dataframe was created to study the trends of the power consumption of an appliance on an hourly basis.
daily_df: This dataframe was created to study the trends of the power consumption of an appliance on a daily basis.
The measurement period for this data set runs from 01.06.12 to 23.01.13. Each value in the data is the real power measured by the plug of a particular appliance in a household.
For hourly analysis, he chose to group the data by household, appliance and hour so that it gives the power consumption of a particular appliance for 24 hours over a period of 7 months.
Note: Every power consumption value is in Watts.
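One consequence of the 1 Hz sampling is worth spelling out: adding up per-second readings in watts yields watt-seconds (joules), so the daily and hourly totals in this analysis are best read as energy figures. The following sketch, which assumes every sample stands for exactly one second, converts such a total to kilowatt-hours. `watt_seconds_to_kwh` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 3600


def watt_seconds_to_kwh(total_watt_seconds: float) -> float:
    """Convert a sum of 1 Hz power samples (W x 1 s each) to kWh."""
    watt_hours = total_watt_seconds / SECONDS_PER_HOUR
    return watt_hours / 1000


# A constant 100 W load for a full day (86,400 one-second samples)
daily_total = 100 * 86_400
daily_kwh = watt_seconds_to_kwh(daily_total)
print(daily_kwh)  # 2.4
```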
Let’s walk through the codebook that he created.
5.1 Importing the libraries
This step needs no explanation. Required packages must always be loaded.
Code
import glob
import os
import warnings

import altair as alt
import pandas as pd
import plotly.graph_objects as go

warnings.filterwarnings('ignore')
5.2 Data wrangling, munging and cleaning
This is an interesting section. We will see how the data was merged and which other techniques were used to preprocess it.
The function below, as its name suggests, returns the appliance name for the number listed in the documentation of each household.
Code
# Function to get the appliance name from its number.
# Note that appliances in different households may share a name but have different numbers.
def get_appliance_name(house, appliance_num):
    # Appliances of household 04 as per 04_doc.txt
    house_04_plugs = {'01': 'Fridge', '02': 'Kitchen appliances', '03': 'Lamp',
                      '04': 'Stereo and laptop', '05': 'Freezer', '06': 'Tablet',
                      '07': 'Entertainment', '08': 'Microwave'}
    # Appliances of household 05 as per 05_doc.txt
    house_05_plugs = {'01': 'Tablet', '02': 'Coffee machine', '03': 'Fountain',
                      '04': 'Microwave', '05': 'Fridge', '06': 'Entertainment',
                      '07': 'PC', '08': 'Kettle'}
    # Appliances of household 06 as per 06_doc.txt
    house_06_plugs = {'01': 'Lamp', '02': 'Laptop', '03': 'Router',
                      '04': 'Coffee machine', '05': 'Entertainment', '06': 'Fridge',
                      '07': 'Kettle'}
    # Return the name for the given house and appliance number
    if house == '04':
        return house_04_plugs[appliance_num]
    elif house == '05':
        return house_05_plugs[appliance_num]
    else:
        return house_06_plugs[appliance_num]
Next comes the function Gunther used to create the daily_df dataframe mentioned above. It accepts the dataframe read from a CSV file, the filename, and the house number, and returns a one-row dataframe containing the household, the appliance, the date, and the total power consumed by the appliance on that day.
Code
# Function that returns a dataframe with the appliance's total daily power consumption
def consump_per_day(df, filename, house):
    # Create an empty dataframe with columns household, appliance, date and total consumption
    daily_consump = pd.DataFrame(columns=['household', 'appliance', 'date', 'total_consumption'])
    # House number
    daily_consump['household'] = [house]
    # Extract the appliance number from the filename by string slicing,
    # then look up the appliance name with the utility function above
    daily_consump['appliance'] = [get_appliance_name(house, filename[-17:-15])]
    # Extract the date from the filename by string slicing and format it as YYYY-MM-DD
    daily_consump['date'] = [pd.to_datetime(filename[-14:-4]).strftime('%Y-%m-%d')]
    # The csv file holds one reading per second (86,400 rows);
    # sum them to get the total for 24 hours
    daily_consump['total_consumption'] = [df['consumption'].sum()]
    # Return the created dataframe
    return daily_consump
After creating the dataframe for daily power consumption, Gunther defined the function below for the hourly analysis. It accepts the dataframe read from a CSV file, the filename, and the house number, and returns a dataframe containing the household, the appliance, the date, the hour, and the power consumed by the appliance in that hour.
Code
# Function that returns a dataframe with the appliance's total hourly power consumption
def consump_per_hour(df, filename, house):
    # Extract the appliance number from the filename by string slicing,
    # then look up the appliance name with the utility function above
    plug_name = get_appliance_name(house, filename[-17:-15])
    # Extract the date from the filename by string slicing and format it as YYYY-MM-DD
    date = pd.to_datetime(filename[-14:-4]).strftime('%Y-%m-%d')
    # Create an empty dataframe with columns household, appliance, date, hour and hourly consumption
    hourly_consump = pd.DataFrame(columns=['household', 'appliance', 'date', 'hour', 'hourly_consumption'])
    # Total number of seconds (86,400), used below to step through the hours
    total_time = len(df['consumption'])
    # Lists that collect the house number, appliance name, date, hour and hourly consumption
    house_list = []
    appliance_list = []
    date_list = []
    hour_list = []
    consump_list = []
    # Initialize variable for the while loop
    i = 0
    while i < total_time:
        # Append the values to their respective lists
        house_list.append(house)
        appliance_list.append(plug_name)
        date_list.append(date)
        # Sum a block of 3600 seconds to get the hourly power consumption
        consump_list.append(df['consumption'][i:i+3600].sum())
        i += 3600
        # Hours are labelled 1 to 24: hour n covers the n-th hour of the day
        hour_list.append(i // 3600)
    # Use the lists to fill the dataframe
    hourly_consump['household'] = house_list
    hourly_consump['appliance'] = appliance_list
    hourly_consump['date'] = date_list
    hourly_consump['hour'] = hour_list
    hourly_consump['hourly_consumption'] = consump_list
    # Return the created dataframe
    return hourly_consump
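The same hourly totals can be computed without an explicit loop. This is a sketch of an alternative, not the function used above; it assumes one row per second in recording order, and `hourly_totals` is a hypothetical name. Hours are numbered 1 to 24 to match the labels produced by `consump_per_hour`.

```python
import numpy as np
import pandas as pd


def hourly_totals(df: pd.DataFrame) -> pd.Series:
    """Sum the 'consumption' column in consecutive blocks of 3,600 rows."""
    # Row position 0..3599 -> hour 1, 3600..7199 -> hour 2, and so on
    hour = np.arange(len(df)) // 3600 + 1
    totals = df['consumption'].groupby(hour).sum()
    totals.index.name = 'hour'
    return totals


# Synthetic day: a constant reading of 1 for each of the 86,400 seconds
demo = pd.DataFrame({'consumption': np.ones(86_400)})
totals = hourly_totals(demo)
print(totals.head(3))
```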
Now that we’ve seen three utility functions, it’s time to look at the driver function. Gunther used the function below to create the required dataframes. It takes the path as an input argument and returns a tuple of two dataframes (daily_df and hourly_df). Let’s try to understand what he has done.
Code
# Function to create the required dataframes for analysis.
def create_plug_df(path):
    # House number list for the 3 households
    house_list = ['04', '05', '06']
    # This string will be used to create the complete path
    plug_st = '_plugs_csv'
    # Lists of dataframes with the total daily and hourly power consumption
    # of every appliance in every house
    total_daily_consump = []
    total_hourly_consump = []
    # Iterate over the houses
    for house in house_list:
        # Each appliance has its own folder; count the folders so we can iterate over them
        folders = 0
        # Create the path using the input argument, the house number and the plug_st string.
        # The path should look like: eco/0X/0X_plugs_csv/
        DIR = path + '/' + house + '/' + house + plug_st + '/'
        # Count the appliance folders by walking the directory with os.walk(DIR)
        for _, dirnames, filenames in os.walk(DIR):
            folders += len(dirnames)
        # Lists of dataframes with the total daily and hourly power consumption
        # of every appliance in this household
        plug_daily_use = []
        plug_hourly_use = []
        # Iterate over the folders/appliances
        for i in range(1, folders + 1):
            # Form the folder path from the directory path and the folder number
            file_path = DIR + '0' + str(i) + '/'
            # Get every file in the folder using glob
            all_files = glob.glob(os.path.join(file_path, "*.csv"))
            # Lists of dataframes with the total daily and hourly power consumption
            # of this appliance
            day_list = []
            hour_list = []
            # Iterate over the files in the folder
            for filename in all_files:
                # Read the csv file
                df = pd.read_csv(filename, names=['consumption'], header=None)
                # Take care of missing data (see the missing data section)
                df = df.replace(-1, 0)
                # Total daily consumption of the appliance in the house
                day_df = consump_per_day(df, filename, house)
                # Total hourly consumption of the appliance in the house
                hour_df = consump_per_hour(df, filename, house)
                # Append the dataframes to their respective lists
                day_list.append(day_df)
                hour_list.append(hour_df)
            # Merge the per-file dataframes of this appliance
            day_frame = pd.concat(day_list, axis=0, ignore_index=True).sort_values(by='date')
            hour_frame = pd.concat(hour_list, axis=0, ignore_index=True).sort_values(by=['date', 'hour'])
            # Append the dataframes to their respective lists
            plug_daily_use.append(day_frame)
            plug_hourly_use.append(hour_frame)
        # Merge the per-appliance dataframes of this house
        house_daily_consump = pd.concat(plug_daily_use, axis=0, ignore_index=True)
        house_hourly_consump = pd.concat(plug_hourly_use, axis=0, ignore_index=True)
        # Append the dataframes to their respective lists
        total_daily_consump.append(house_daily_consump)
        total_hourly_consump.append(house_hourly_consump)
    # Merge the per-house dataframes into the total daily power consumption of every appliance
    daily_df = pd.concat(total_daily_consump, axis=0, ignore_index=True)
    # Merge the per-house dataframes into the total hourly power consumption of every appliance
    hourly_df = pd.concat(total_hourly_consump, axis=0, ignore_index=True)
    # Return the created dataframes
    return daily_df, hourly_df
Now that the driver and utility functions have been defined, it’s time to execute them with the necessary input argument(s). The code cell below does exactly that. Notice that Gunther kept the data outside the cloned GitHub folder because the data was too large to be hosted on GitHub. Let’s see what the two dataframes required for this analysis look like.
Code
# Define path
path = './eco'
# Execute the driver function to get the tuple containing dataframes
plug_df = create_plug_df(path)
# Get the daily_df
daily_df = plug_df[0]
# Get the hourly_df
hourly_df = plug_df[1]
Below is what the dataframe containing total daily power consumption per appliance per household looks like.
Code
daily_df.head()
  household appliance        date  total_consumption
0        04    Fridge  2012-06-27       2.633352e+06
1        04    Fridge  2012-06-28       2.357230e+06
2        04    Fridge  2012-06-29       2.197841e+06
3        04    Fridge  2012-06-30       3.081946e+06
4        04    Fridge  2012-07-01       2.777104e+06
Below is what the dataframe containing total hourly power consumption per appliance per household looks like.
Code
hourly_df.head()
  household appliance        date  hour  hourly_consumption
0        04    Fridge  2012-06-27     1         92781.80686
1        04    Fridge  2012-06-27     2        109692.01842
2        04    Fridge  2012-06-27     3         71697.31514
3        04    Fridge  2012-06-27     4         96917.73950
4        04    Fridge  2012-06-27     5        110460.75213
5.2.1 Taking care of missing data
The documentation mentions that missing values are present in the data and are denoted as ‘-1’. Looking at how the dataframes were created, we can see that Gunther replaced these values with ‘0’. Replacing them with zero is a reasonable choice because a ‘-1’ would distort the sums and make the visualizations misleading, whereas a zero adds nothing to the totals.
To verify the result, Gunther created the function below to look at the missing-value statistics of a dataframe.
Code
# Define a function that returns a dataframe of missing data statistics
def missing_val_stats(df):
    # Collect one single-row dataframe per column
    rows = []
    for c in df.columns:
        tmp = pd.DataFrame()
        # Column
        tmp['column'] = [c]
        # Unique values in the column
        tmp['unique_val'] = [df[c].unique()]
        # Number of unique values in the column
        tmp['num_unique_val'] = len(list(df[c].unique()))
        # Number of unique values in the column without nan
        tmp['num_unique_val_nona'] = int(df[c].nunique())
        # Number of missing values in the column
        tmp['num_miss'] = df[c].isnull().sum()
        # Percentage of missing values in the column
        tmp['pct_miss'] = (df[c].isnull().sum() / len(df)).round(3) * 100
        rows.append(tmp)
    # DataFrame.append was removed in pandas 2.0, so concatenate the rows instead
    df_stats = pd.concat(rows)
    # Return the created dataframe
    return df_stats
Let’s find out whether there is any missing data in either of the two dataframes created.
Code
# Missing value statistics for daily_df
missing_val_stats_daily = missing_val_stats(daily_df)
missing_val_stats_daily
              column                                         unique_val  num_unique_val  num_unique_val_nona  num_miss  pct_miss
0          household                                       [04, 05, 06]               3                    3         0       0.0
0          appliance  [Fridge, Kitchen appliances, Lamp, Stereo and ...              14                   14         0       0.0
0               date  [2012-06-27, 2012-06-28, 2012-06-29, 2012-06-3...             220                  220         0       0.0
0  total_consumption  [2633352.2361399997, 2357230.3723299997, 21978...            3678                 3678         0       0.0
Code
# Missing value statistics for hourly_df
missing_val_stats_hourly = missing_val_stats(hourly_df)
missing_val_stats_hourly
               column                                         unique_val  num_unique_val  num_unique_val_nona  num_miss  pct_miss
0           household                                       [04, 05, 06]               3                    3         0       0.0
0           appliance  [Fridge, Kitchen appliances, Lamp, Stereo and ...              14                   14         0       0.0
0                date  [2012-06-27, 2012-06-28, 2012-06-29, 2012-06-3...             220                  220         0       0.0
0                hour  [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...              24                   24         0       0.0
0  hourly_consumption  [92781.80686, 109692.01842000001, 71697.315139...           55271                55271         0       0.0
We can clearly see that there are no missing values in either dataframe. Gunther has handled the data well.
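Because the ‘-1’ sentinels were replaced with ‘0’ before aggregation, `isnull()` has nothing left to find. To measure how much data was actually missing, the sentinels would have to be counted on the raw readings. A minimal sketch with made-up readings (the values and the name `raw` are illustrative, not taken from the data set):

```python
import numpy as np
import pandas as pd

# Hypothetical raw plug readings; -1 marks a missing sample
raw = pd.DataFrame({'consumption': [50.2, -1, 48.9, -1, -1, 51.0]})

# Count the sentinels before they are replaced
n_missing = int((raw['consumption'] == -1).sum())
pct_missing = 100 * n_missing / len(raw)

# Treating sentinels as NaN keeps them out of sums and means
cleaned = raw['consumption'].replace(-1, np.nan)
mean_power = cleaned.mean()  # NaN values are skipped by default

print(n_missing, pct_missing, round(mean_power, 2))
```

Which treatment is preferable depends on the question: zero-filling understates consumption on days with gaps, while NaN leaves the gaps out of averages.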
Now that the data has been pre-processed, let’s move on to the EDA section.
6 EDA
Gunther began by looking at the total daily power consumption of every appliance per house, by date. He built a heatmap with the date (YYYY-MM-DD) on the x axis, the appliance on the y axis, and the total daily power consumption (Watts) encoded as color.
6.0.1 Heatmap of average daily power consumption of every appliance per house, by date for the entire period
Code
alt.renderers.enable('default')

# Define dropdown menu for filtering
house_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')

# Create selection objects
house_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='Select')
selection = alt.selection_single(fields=['appliance'], name='Select')

# Keep only the total power consumption and appliance columns
plugs_filtered = daily_df[['total_consumption', 'appliance']]
# Get the average daily consumption of each appliance using groupby and mean()
plugs_averages = pd.DataFrame(plugs_filtered.groupby('appliance')['total_consumption'].mean())
# Reset index of the groupby result
plugs_averages = plugs_averages.reset_index()

# Plot the heatmap
heatmap_daily_consump = (
    alt.Chart(daily_df)
    .mark_rect()
    .encode(
        x='date:O',
        y='appliance:N',
        color=alt.Color('total_consumption:Q',
                        scale=alt.Scale(scheme='orangered'),
                        legend=alt.Legend(type='symbol')),
        tooltip=['total_consumption', 'appliance:N'],
    )
    .add_selection(house_selection)
    .transform_filter(selection)
    .transform_filter(house_selection)
)

# Add title to the heatmap
heatmap_daily_consump.title = "Average daily power consumption in Watts of appliance per house"
# Add x-axis label to the heatmap
heatmap_daily_consump.encoding.x.title = 'Date (YYYY-MM-DD)'
# Add y-axis label to the heatmap
heatmap_daily_consump.encoding.y.title = 'Appliance name'

# Display the heatmap
heatmap_daily_consump
6.0.2 Findings
Using the information from the heatmap above, Gunther found that, of all appliances across the three households, the Freezer uses the most power.
The Freezer belongs to the Tribbianis (house number ‘04’). Joey Tribbiani could keep an eye on the power consumption of this plug and rely more on home-cooked meals than on frozen food. That would help him live a healthier life and save some money on the electricity bill too.
Entertainment devices and the Fridge are the most heavily used appliances in the Geller household (house number ‘05’). Maybe Ross Geller is used to keeping a six-pack of beer handy in the fridge so he can chill later while watching a dinosaur documentary on TV.
As for the Bings (house number ‘06’), it looks like Chandler Bing is without a job: entertainment devices consume most of the power in this house. If we compare the power consumption of the three houses, the Bings have the lowest total. Either the Bings are saving, or the hypothesis about Chandler Bing is correct.
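Claims like these can be cross-checked numerically. The sketch below ranks appliances within each house by mean daily total and compares the houses overall. It uses a small invented frame in place of `daily_df`, so the numbers are illustrative only.

```python
import pandas as pd

# Invented stand-in for daily_df
daily_df = pd.DataFrame({
    'household': ['04', '04', '04', '04', '06', '06'],
    'appliance': ['Freezer', 'Freezer', 'Lamp', 'Lamp', 'Router', 'Router'],
    'total_consumption': [4.0e6, 5.0e6, 1.0e5, 3.0e5, 6.0e5, 8.0e5],
})

# Mean daily total per appliance, highest first within each house
ranking = (
    daily_df
    .groupby(['household', 'appliance'], as_index=False)['total_consumption']
    .mean()
    .sort_values(['household', 'total_consumption'], ascending=[True, False])
    .reset_index(drop=True)
)

# Overall total per house, for comparing households
house_totals = daily_df.groupby('household')['total_consumption'].sum()

print(ranking)
print(house_totals)
```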
6.0.3 Histogram showcasing total daily power consumption of all appliances per house.
Gunther has plotted the below chart that displays histogram of the total daily power consumption of all appliances per house in the dataset.
He has also provided the list of appliances available in each household below so that it’s easier for the audience to play with the chart and observe the trends.
Appliances available in house 04: 1. Entertainment 2. Freezer 3. Fridge 4. Kitchen appliances 5. Lamp 6. Microwave 7. Stereo and laptop 8. Tablet
Appliances available in house 05: 1. Coffee machine 2. Entertainment 3. Fountain 4. Fridge 5. Kettle 6. Microwave 7. PC 8. Tablet
Appliances available in house 06: 1. Coffee machine 2. Entertainment 3. Fridge 4. Kettle 5. Lamp 6. Laptop 7. Router
Code
alt.renderers.enable('default')

# Define dropdown menus for filtering
house_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')
appliance_dropdown = alt.binding_select(options=list(daily_df['appliance'].unique()), name='Appliance')

# Create selection objects
house_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='House')
appliance_selection = alt.selection_single(fields=['appliance'], bind=appliance_dropdown, name='Appliance')

# Create the histogram
hist = alt.Chart(daily_df).mark_bar().encode(
    alt.X('date:T', title='Month'),
    alt.Y('total_consumption:Q', title='Daily Power Consumption (Watts)'),
    alt.Color('appliance:N', title='Appliance', scale=alt.Scale(scheme='category20')),
    # household holds string labels ('04', '05', '06'), so it is encoded as nominal
    alt.Column('household:N', title='House', sort=list(daily_df['household'].unique())),
    tooltip=['date:T', 'appliance', 'total_consumption']
).properties(
    # Title of the chart
    title='Daily Power Consumption of different appliances in different Houses'
).add_selection(
    appliance_selection,
    house_selection
).transform_filter(
    appliance_selection
).transform_filter(
    house_selection
)

# Display the histogram
hist
6.0.4 Findings
We see a pattern similar to the heatmap. One advantage of this plot over the heatmap is the pair of dropdowns that make the chart more interactive, which is perhaps why Gunther chose to plot it. We can filter the chart using the dropdown for the house and then filter the histograms further using the dropdown for the appliances.
The above chart also supports the hypothesis that the Freezer is by far the appliance/plug that uses the most power, meaning it will likely be the differentiating factor that determines whether a household is on the higher or lower end of power consumption.
7 Results
7.1 Plotly
Let’s have a look at Gunther’s analysis of the average hourly power consumption of the appliances used in different households.
To calculate the average hourly power consumption of every appliance, Gunther used ‘groupby’ to group hourly_df by household, appliance and hour.
With the data grouped, Gunther plotted a bar chart for his analysis.
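The grouping step itself is not shown in the codebook. A sketch of what it could look like follows. The column name `avg_hourly_consump` and the reuse of the name `plug_df` are taken from the plotting code in this section, while the tiny input frame is invented so that the example runs on its own.

```python
import pandas as pd

# Stand-in for hourly_df with a few invented rows
hourly_df = pd.DataFrame({
    'household': ['04', '04', '04', '05'],
    'appliance': ['Fridge', 'Fridge', 'Fridge', 'PC'],
    'date': ['2012-06-27', '2012-06-28', '2012-06-27', '2012-06-27'],
    'hour': [1, 1, 2, 1],
    'hourly_consumption': [100.0, 300.0, 50.0, 80.0],
})

# Average each appliance's consumption per household and hour across all dates
plug_df = (
    hourly_df
    .groupby(['household', 'appliance', 'hour'], as_index=False)['hourly_consumption']
    .mean()
    .rename(columns={'hourly_consumption': 'avg_hourly_consump'})
)
print(plug_df)
```

Note that this rebinds `plug_df`, which earlier held the tuple returned by `create_plug_df`. That is an assumption about the missing step, inferred from how the plotting code uses the name.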
7.1.1 Design Decisions:
Visual Encoding : Color has been chosen as a visual encoder which splits the appliances from each other. This has been done to keep the attention span of the audience on the graph and not get lost in multiple bars. This even makes the graphs look simple and elegant and pleasing to the eye.
Axis Choices : Appliances have been kept on the x axis. Visually it makes more sense to keep the text in a horizontal way and thus this decision has been made. Average power consumption has been kept at the y axis to mark values.
Frames : Every single hour of the day has been kept as a frame to study daily patterns. This is important as it is very pertinent to study the change in power consumption on an hourly basis.
Dropdown : A dropdown for the house number has been included, which lets the user interact with the graph and see the changes across different appliances of different houses for every hour of the day. This makes the graph more interactive, which further helps in engaging the audience.
Titles and Axis labels : Titles and axis labels have been kept as per the analysis so that audience knows what is there in the chart.
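The dropdown and the hour frames both work by toggling trace visibility, so it helps to be explicit about how traces are indexed. If one trace is added per (house, hour) pair with houses in the outer loop, the trace for house `i` and hour `j` sits at position `i * n_hours + j`. A small sketch of building a visibility mask under that assumption (`visibility_mask` is a hypothetical helper, independent of Plotly):

```python
def visibility_mask(house_idx: int, hour_idx: int, n_houses: int, n_hours: int) -> list:
    """Return a mask with True only at the trace for (house_idx, hour_idx)."""
    mask = [False] * (n_houses * n_hours)
    mask[house_idx * n_hours + hour_idx] = True
    return mask


# With 3 houses and 24 hourly frames, the first frame of the second house is trace 24
mask = visibility_mask(house_idx=1, hour_idx=0, n_houses=3, n_hours=24)
first_visible = mask.index(True)
print(first_visible)  # 24
```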
Code
# GET UNIQUE VALUES
appliances = plug_df["appliance"].unique()
hours = plug_df["hour"].unique()
houses = plug_df["household"].unique()

# INITIALIZE GRAPH OBJECT
fig = go.Figure()

# Make a trace for each possible plotting scenario by looping over houses and hours
for house in houses:
    for hr in hours:
        # ISOLATE ONE HOUR OF DATA FOR PLOTTING
        df_hour = plug_df.query(f"hour == {hr}")
        # Add trace to the figure
        fig.add_trace(
            # Specify the type of the trace
            go.Bar(
                x=df_hour.loc[df_hour["household"] == house]['appliance'],
                y=df_hour.loc[df_hour["household"] == house]['avg_hourly_consump'],
                # Hide every trace when first rendered
                visible=False))

# Change the color of individual bars
fig.update_traces(marker_color=['red', 'green', 'blue', 'yellow', 'brown', 'purple', 'orange', 'pink'])

# MAKE FIRST TRACE VISIBLE
fig.data[0].visible = True

# Create and add slider for hours
# Step 1: Define steps for slider
steps = []
for i, house in enumerate(houses):
    step = []
    for j, hour in enumerate(hours):
        visible = [False] * len(fig.data)
        visible[i * len(hours) + j] = True
        step_dict = {'method': 'update', 'args': [{'visible': visible}], 'label': str(hour)}
        step.append(step_dict)
    steps.append(step)

# Step 2: Define sliders for the figure
sliders = []
for i, house in enumerate(houses):
    slider = [dict(
        active=0,
        currentvalue={"prefix": "~hours: "},
        transition={"duration": 300, "easing": "cubic-in-out"},
        pad={"t": 50},
        steps=steps[i])
    ]
    sliders.append(slider)

# Initialize slider for number of hours
fig.update_layout(sliders=sliders[0])

# DEFINE VISIBILITY OF PLOTS FOR DROPDOWN BUTTONS
# Each house owns len(hours) consecutive traces, so step by len(hours)
# to reach the first trace of each house
button_visible = []
for i in range(0, len(fig.data), len(hours)):
    visible = [False] * len(fig.data)
    visible[i] = True
    button_visible.append(visible)

# DEFINE BUTTONS FOR DROPDOWN
buttons = []
for i, house in enumerate(houses):
    button = dict(
        label=house,
        # MODIFICATION TYPE
        method="update",
        # BOOLEAN VALUES FOR EACH TRACE
        # Note that each house has its own slider
        args=[{"visible": button_visible[i]}, {"sliders": sliders[i]}],
    )
    buttons.append(button)

# ADD DROPDOWN TO CHANGE TYPE
fig.update_layout(
    updatemenus=[dict(
        buttons=buttons,
        # VARIABLES FOR DROPDOWN DIRECTION AND LOCATION
        direction="down",
        showactive=True,
        pad={"r": 10, "t": 10},
        x=0.935,
        xanchor="left",
        y=1.3,
        yanchor="top",
    ),
    ])

# Set figure layout
fig.update_layout(
    title="Average hourly power consumption of different appliances in different households",
    xaxis_title="Appliances",
    yaxis_title="Average power consumption (W)",
)
fig.show()
7.1.2 Findings
Below are Gunther’s findings and insights regarding the bar chart plotted above.
1. House number ‘04’:
Power consumption is very low and barely changes from 1 a.m. to 6 a.m., probably because the Tribbianis are asleep. This is our first finding: the Tribbianis generally sleep from around 12 a.m. to 6 a.m.
The first major change occurs at 7 a.m., when the consumption of Kitchen Appliances rises. From the documentation we know that the kitchen appliances comprise the coffee machine, the bread baking machine and the toaster. This suggests that the Tribbianis have breakfast around 7 a.m. Their breakfast appears to be coffee with some bread, as the usage of other kitchen appliances such as the microwave is very low around this time.
Another large change can be seen at 12 p.m., when the consumption of the Microwave shoots up. This points to the Tribbianis’ lunch time. It could also mean that this is a working household: only the microwave is heavily used around this time, which suggests they come home just to grab a quick lunch.
The entertainment bar rises for the first time around 2 p.m. This could mean that, after making or heating lunch, Joey Tribbiani generally likes to enjoy his meal with some entertainment.
Microwave and Kitchen Appliances usage shoots up again around 7 p.m., which could be their dinner time.
Entertainment rises again around 8 - 9 p.m., which is again right after their meal.
2. House number ‘05’:
Again, power consumption is very low and barely changes from 1 a.m. to 5 a.m., probably because the Gellers are asleep too. But something odd happens at 5 a.m.: entertainment usage rises a bit from 5 a.m. to 6 a.m. Perhaps someone wakes up at that time to watch an episode of a favourite TV series that is airing on a different continent at that moment.
The first major change occurs at 7 a.m., when the consumption of the Coffee machine rises. The Gellers wake up at this time, have their morning coffee and start their routine.
Next, at 9 a.m., the power consumption of the PC goes up. It may be that someone in the Geller house (Ross/Monica) is working remotely and their job starts at 9 a.m.
At 12 p.m. we see a large change similar to the one observed for the Tribbiani household: the consumption of the Microwave increases. This can mean that the Gellers prefer to have their lunch at the same time. Given the usage of the PC, we can reasonably assume that this is a working household and that they too prefer to grab a quick lunch.
1 p.m. is again a coffee break, since the Coffee machine consumption goes up.
One important trend is that from 5 p.m. to 11 p.m. the entertainment bar stays high. This can mean that the Gellers have finished their daily jobs and are now watching TV, playing PS/XBOX or maybe listening to their favourite songs on a speaker. This interval accounts for their leisure time.
The Gellers like to have coffee during their leisure time too: at 6 p.m., the consumption of the coffee machine increases again!
From 9 p.m. to 11 p.m., the power usage of the Fountain increases. Again, this can be part of the Gellers’ leisure time. Maybe they are relaxing in their living room or bedroom with a view of the fountain.
Overall, the Gellers look like a caffeine-dependent working household who relax once they are done with their jobs.
3. House number ‘06’:
One thing to note for this household is that the entertainment bar always remains high. This can mean either that the Bings use entertainment devices a great deal or that the readings were not recorded correctly. The same is true of the router. It looks like the router in this household is never switched off, which suggests that the Bings are constantly using wifi.
At 7 a.m., we see a major change, with the bar for the Coffee machine going up. We can say that the Bings sleep from 1 a.m. to 7 a.m. However, given the entertainment bar, it is not clear that every Bing sleeps during this interval.
At 8 a.m., the consumption of the Kettle rises. At 9 a.m., the bar for the Coffee machine increases again. This can mean that the Bings prefer to have their breakfast and coffee in the 7 a.m. to 9 a.m. interval.
This data seems a bit unusual for a household. First of all, this house does not have a microwave or any other kitchen appliance for food. There is a lamp, but it is barely used. Entertainment and Router account for most of the hourly average power consumption in a day.
This can mean either that the data recorded for this household is incorrect, which would be a limitation, or that this is not a household at all but perhaps an entertainment centre.
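Readings such as ‘coffee at 7 a.m.’ amount to finding, for each appliance, the hour with the highest average consumption. A sketch of that lookup follows, using an invented stand-in for the grouped frame. `plug_df` and `avg_hourly_consump` follow the names in the plotting code; the values are made up.

```python
import pandas as pd

# Invented stand-in for the grouped hourly averages
plug_df = pd.DataFrame({
    'household': ['05'] * 6,
    'appliance': ['Coffee machine'] * 3 + ['PC'] * 3,
    'hour': [7, 13, 18, 9, 10, 23],
    'avg_hourly_consump': [900.0, 700.0, 650.0, 400.0, 380.0, 20.0],
})

# Row label of the maximum within each (household, appliance) group
peak_rows = plug_df.groupby(['household', 'appliance'])['avg_hourly_consump'].idxmax()
# Look up the full rows for those labels
peak_hours = plug_df.loc[peak_rows, ['household', 'appliance', 'hour', 'avg_hourly_consump']]
print(peak_hours)
```

A single peak hour is only a summary. Appliances with several daily peaks, such as a coffee machine used morning and evening, still need the hour-by-hour view.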
7.1.3 Summary
Using Gunther’s analysis, we are able to witness how different households use different appliances on an hourly basis. We see that there is a common trend of waking up in the morning, having the morning coffee to kickstart the day, and a common time for lunch and entertainment.
Gunther’s visualization of the hourly usage of an appliance in a household led us to assume the activities of that particular household. It also led us to predict whether it was a working household or not and whether the house members preferred having coffee or not.
Using appropriate visualization, Gunther has been able to answer the first two data science questions.
7.2 Altair
The distribution of power consumption across appliances was another concern for Gunther, who was looking for notable variations that point to inconsistent usage patterns. He built a pair of linked Altair plots for this: one shows the overall distribution across appliances, and the other shows each appliance’s consumption over time. Clicking an appliance’s bar in the first plot isolates that appliance in the second, which makes it easy to compare overall and individual usage and to spot opportunities for more efficient energy use.
Overall, the combination of Altair charts offers a practical way to examine how much power the various plugs use and where energy could be used better.
7.2.1 Design Decisions:
Visual Encoding: Color separates the appliances from each other in the line chart, so the audience can follow one appliance without getting lost among multiple lines. The bar chart uses a single color (blue). Initially everything is selected, so the audience sees every line. Once the audience selects an appliance in the bar chart, the line chart shows only that appliance, which keeps the view uncluttered.
Axis Choices: The bar chart has the appliances on the x axis, and the line chart has the date on the x axis. Both charts have total daily power consumption on the y axis.
Selector: Every appliance can be selected by clicking its bar, so the bar chart doubles as a filter panel. This lets the audience follow the trend of each appliance in the line chart over time.
Legend: A legend maps each line color to its appliance. It helps the audience tell the appliances apart and pick the one whose power consumption trend they wish to observe.
Titles and Axis labels: Titles and axis labels describe what each chart shows, so the audience knows what they are looking at.
Code
import altair as alt

alt.renderers.enable('default')

# daily_df is the daily dataframe built in the data preparation section.

# Selection object: clicking a bar selects that appliance.
# Note: Altair 5 renames selection_single/add_selection to selection_point/add_params.
selection = alt.selection_single(fields=['appliance'], name='Select')

# Color the selected bar blue and grey out the others
color = alt.condition(selection, alt.value('#0569e3'), alt.value('#c2c3c4'))

# Bar chart of the total daily consumption of the different appliances
bar = (
    alt.Chart(daily_df)
    .mark_bar()
    .encode(
        x=alt.X('appliance:N', sort='-y', axis=alt.Axis(title='Appliance')),
        y=alt.Y('total_consumption:Q', axis=alt.Axis(title='Daily Consumption (Watts)')),
        color=color,
    )
    # The original called bar.properties(...) without keeping the result, so the size was never applied
    .properties(width=600, height=600, title='Daily Power Consumption of Appliance')
    .add_selection(selection)
)

# Color each line by appliance when selected; draw unselected lines in white
color2 = alt.condition(selection, alt.Color('appliance:N'), alt.value('white'))

# Line chart of the total daily consumption of the different appliances over time
line1 = (
    alt.Chart(daily_df)
    .mark_line()
    .encode(
        x=alt.X('date:T', title='Date (YYYY-MM-DD) (Zoom in to see dates)'),
        y=alt.Y('total_consumption:Q', title='Daily Consumption (W)'),
        color=color2,
        tooltip=['date:T', 'appliance', 'total_consumption'],
    )
    .properties(width=800, height=400, title='Daily Consumption by Appliance in every household')
    .interactive()
)

# Display the charts side by side
bar | line1
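A note on units: the y axis of both charts shows total_consumption, which the data preparation step computes as the sum of the 1 Hz plug readings for a day. Strictly speaking, a sum of per-second power samples in watts is an energy in watt-seconds rather than a power in watts. The conversion to kilowatt-hours can be sketched as follows, assuming exactly one sample per second. The function name summed_watts_to_kwh is introduced here for illustration and is not part of Gunther's code-book.

```python
SECONDS_PER_HOUR = 3600


def summed_watts_to_kwh(summed_watts, sample_period_s=1.0):
    """Convert a sum of power samples (W), taken every sample_period_s seconds, to kWh."""
    watt_seconds = summed_watts * sample_period_s  # energy in joules
    watt_hours = watt_seconds / SECONDS_PER_HOUR
    return watt_hours / 1000.0


# Hypothetical appliance drawing a constant 100 W for a full day of 86,400 samples
daily_sum = 100.0 * 86_400
print(summed_watts_to_kwh(daily_sum))  # 100 W over 24 h is 2.4 kWh
```

Because the conversion is a constant factor, it changes only the axis scale. The shapes of the charts and the ranking of the appliances stay the same.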
7.2.2 Findings
The most surprising finding of Gunther’s analysis is that the freezer’s power usage has multiple peaks separated by a sizable gap. This pattern is probably caused by the change of season: freezer usage spikes in the summer and declines markedly in the winter. The usage patterns of the other appliances are more consistent and vary less.
This finding helps in answering the third data science question.
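One way to test the seasonal reading of the freezer pattern is to average the daily totals per calendar month and compare the summer and winter months of the measurement period (June 2012 to January 2013). The sketch below uses plain Python and assumes (date, total consumption) pairs for a single appliance. The function name monthly_means and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical (date, total_consumption) pairs for one appliance.
# The numbers are invented for illustration and are not ECO measurements.
records = [
    (date(2012, 7, 1), 900.0),
    (date(2012, 7, 2), 1100.0),
    (date(2012, 12, 1), 400.0),
    (date(2012, 12, 2), 600.0),
]


def monthly_means(records):
    """Return the mean daily total per (year, month), in chronological order."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, value in records:
        key = (day.year, day.month)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sorted(sums)}


means = monthly_means(records)
for (year, month), mean in means.items():
    print(f"{year}-{month:02d}: {mean:.1f}")
```

If the monthly means on the real data fall steadily from summer to winter, the seasonal explanation gains support. A gap in the recordings would instead show up as months with few or no days, which is worth checking before drawing conclusions.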
7.2.3 Summary
Gunther’s analysis shows how important it is to take seasonal effects into account when examining power consumption patterns. Understanding the seasonal fluctuations in appliance usage lets households adopt targeted energy-saving measures, such as reducing appliance usage during periods of high demand, to lower their overall power consumption.
8 References
Wilhelm Kleiminger, Christian Beckel, and Silvia Santini. Household Occupancy Monitoring Using Electricity Meters. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015). Osaka, Japan, September 2015.
Christian Beckel, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms. Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014). Memphis, TN, USA. ACM, November 2014.
---author: - name: Raghav Sharma affiliations: - name: Georgetown University---# IntroductionThe ECO (Electricity Consumption and Occupancy) data set is a comprehensive open-source (Creative Commons License CC BY 4.0) data set for non-intrusive load monitoring and occupancy detection research. It was collected in 6 Swiss households over a period of 8 months. For each of the households, the ECO data set provides:1. 1 Hz aggregate consumption data. Each measurement contains data on current, voltage, and phase shift for each of the three phases in the household.2. 1 Hz plug-level data measured from selected appliances.3. Occupancy information measured through a tablet computer (manual labeling) and a passive infrared sensor (in some of the households).# StoryGunther is a member of the state electricity department who has been assigned to do an analysis over the power consumption per appliance of 3 particular households. These households are namely:1. '04' where the Tribbianis live,2. '05' that belongs to the Gellers, and3. '06', home of the BingsDaily patterns of power consumption can be understood by examining the hourly usage of various appliances. Similarly, the daily use of various appliances can be used to understand monthly patterns of the same. Usage of different appliances could mean what the household generally does around that time or on that day and when studied over a period of time, these insights can turn into patterns.# DataData from three households are provided, comprising power consumption of particular plugs/appliances per second over a period of time. The plugs data provides appliance-level consumption.The files are coded in the following way. The first two digits (e.g., 01-06 identify the household). The next part {sm, plugs, occupancy} identifies the type of data (i.e., readings from the smart meter, the plugs and the occupancy ground truth). 
Lastly, where applicable, the suffix indicates the format of the data (either Matlab or plain CSV).For the plugs data, the first two digits(e.g, 01-08 identify the appliance).# Data Science Questions1. How do different households use different appliances on an hourly scale? Is there a common trend?2. Based upon hourly usage of an appliance, can it be figured out what the household members generally do in a day? Accordingly, what suggestions can be given?3. How is the plug-level data distributed among different appliances?# Data preparationGunther used the data for the 3 households by merging the entire data in two different frames:1. <b>hourly_df:</b> This dataframe was created to study the trends of the power consumption of an appliance on an hourly basis.2. <b>daily_df:</b> This dataframe was created to study the trends of the power consumption of an appliance on a daily basis.The Measurement period for this data set is from 01.06.12 to 23.01.13. The data provides a value which is the real power measured by the plug of a particular appliance of every household. For hourly analysis, he chose to group the data by household, appliance and hour so that it gives the power consumption of a particular appliance for 24 hours over a period of 7 months.<b>Note:</b> Every power consumption value is in Watts.Lets walk through code-book that he created.## Importing the librariesThis step needs no explanation. Required packages must always be loaded.```{python}import pandas as pdimport altair as altimport plotly.graph_objects as goimport globimport osimport warningswarnings.filterwarnings('ignore')```## Data wrangling, munging and cleaningThis is an interesting section. 
We will witness how the data was merged and what other techiniques were used to preprocess it.The function below as it's name suggests, will return the appliance name against the number mentioned in the document for different households.```{python}# Function to get the appliance name from their respective number. # Please note that appliances in different households might have the same name but different numbers.def get_appliance_name(house, appliance_num):# Create a dictionary for the appliances of household 04 as per 04_doc.txt house_04_plugs = {'01': 'Fridge', '02': 'Kitchen appliances', '03': 'Lamp', '04': 'Stereo and laptop', '05': 'Freezer', '06': 'Tablet', '07': 'Entertainment', '08': 'Microwave'}# Create a dictionary for the appliances of household 05 as per 05_doc.txt house_05_plugs = {'01': 'Tablet', '02': 'Coffee machine', '03': 'Fountain', '04': 'Microwave', '05': 'Fridge', '06': 'Entertainment', '07': 'PC', '08': 'Kettle'}# Create a dictionary for the appliances of household 06 as per 06_doc.txt house_06_plugs = { '01': 'Lamp', '02': 'Laptop', '03': 'Router', '04': 'Coffee machine', '05': 'Entertainment', '06': 'Fridge', '07': 'Kettle'}# 'if-else' case that will return the respective name for the given numberif house =='04':return house_04_plugs[appliance_num]elif house =='05':return house_05_plugs[appliance_num]else:return house_06_plugs[appliance_num]```Next we come across the function which Gunther might've used for creating the dataframe(daily_df) mentioned above. 
This function accepts the csv file, the filename and the house number and returns a dataframe containing the arguments passed along with the total power consumed by the appliance in a day.```{python}# Function that returns a dataframe comprising of the appliance's total daily power consumptiondef consump_per_day(df, filename, house):# Create a empty dataframe with columns household, appliance, date and total consumption daily_consump = pd.DataFrame(columns=['household', 'appliance', 'date', 'total_consumption'])# House number daily_consump['household'] = [house]# Extract appliance number from filename using string slicing. Pass the house number and appliance to the utility function created above to get the name of the appliance. daily_consump['appliance'] = [get_appliance_name(house, filename[-17:-15])]# Extract the date from filename using string slicing. Convert the date using pd.to_datetime() in a YYYY-MM-DD format. daily_consump['date'] = [pd.to_datetime(filename[-14:-4]).strftime('%Y-%m-%d')]# Since the csv file contains power consumption per second and has 86,400 rows. Sum up the consumption so that we get the total power consumption for 24 hours/day. daily_consump['total_consumption'] = [df['consumption'].sum()]# Return the created dataframereturn daily_consump```After creating the dataframe for daily power consumption, Gunther defined the below function for the hourly analysis. This function accepts the csv file, the filename and the house number and returns a dataframe containing the arguments passed along with the hour and the power consumed by the appliance in that particular hour.```{python}# Function that returns a dataframe comprising of the appliance's total hourly power consumptiondef consump_per_hour(df, filename, house):# Extract appliance number from filename using string slicing. Pass the house number and appliance to the utility function created above to get the name of the appliance. 
plug_name = get_appliance_name(house, filename[-17:-15])# Extract the date from filename using string slicing. Convert the date using pd.to_datetime() in a YYYY-MM-DD format. date = pd.to_datetime(filename[-14:-4]).strftime('%Y-%m-%d')# Create a empty dataframe with columns household, appliance, date, hour and hourly consumption hourly_consump = pd.DataFrame(columns=['household', 'appliance', 'date', 'hour', 'hourly_consumption'])# Total number of seconds i.e 86,400 will be used ahead to calculate the hour total_time =len(df['consumption'])# List that will append the house number house_list= []# List that will append the appliance name appliance_list= []# List that will append the date date_list= []# List that will append the hour hour_list= []# List that will append the hourly consumption consump_list= []# Initialize variable for the while loop i =0while i < total_time:# Append the values to their respective lists created above house_list.append(house) appliance_list.append(plug_name) date_list.append(date)# Offset of 3600 has been used to calculate the hourly power consumption consump_list.append(df['consumption'][i:i+3600].sum()) i +=3600# Calculate the hour and append it to the list hour_list.append(i//3600)# Use the lists to fill up the dataframe. hourly_consump['household'] = house_list hourly_consump['appliance'] = appliance_list hourly_consump['date'] = date_list hourly_consump['hour'] = hour_list hourly_consump['hourly_consumption'] = consump_list# Return the created dataframereturn hourly_consump```Now that we've seen three utility functions, it's time that we come across the driver method.Gunther used the below function to create the dataframes required. It takes the path as an input argument and returns a tuple of two dataframes (daily_df and hourly_df). 
Lets try to understand what he has done so far.```{python}# Function to create the required dataframe for analysis.def create_plug_df(path):# House number list for the 3 households house_list = ['04', '05', '06']# This string will be used to create the complete path plug_st ='_plugs_csv'# Create utility lists for power consumption# This list will contain individual dataframes for total daily power consumption of every appliance in every house total_daily_consump = []# This list will contain individual dataframes for total hourly power consumption of every appliance in every house total_hourly_consump = []# iterate over the housesfor house in house_list:# Since there are multiple folders for multiple appliances per house, lets create a variable which can be used later on to iterate over, get data and do calculations folders =0# Create the path using the input argument, the house-number and the plug_st sting.# The path should look like: eco/0X/0X_plugs_csv/ DIR = path+'/'+house+'/'+house+plug_st+'/'# Calculate the number of folders for each appliance by walking in the above directory path using os.walk(DIR)for _, dirnames, filenames in os.walk(DIR): folders +=len(dirnames)# Create utility lists for power consumption# This list will contain individual dataframes for total daily power consumption of every appliance per household plug_daily_use = []# This list will contain individual dataframes for total hourly power consumption of every appliance per household plug_hourly_use = []# iterate over the number of folders/appliancesfor i inrange(1, folders+1):# Form the file path using the directory path created above and the folder number file_path = DIR+'0'+str(i)+'/'# Get every file in the folder using glob all_files = glob.glob(os.path.join(file_path, "*.csv"))# Create utility lists for days and hours.# This list will contain individual dataframes for total daily power consumption per appliance per household day_list = []# This list will contain individual dataframes for 
total hourly power consumption per appliance per household hour_list = []# Iterate over the files in the folderfor filename in all_files:# Read the csv file df = pd.read_csv(filename, names=['consumption'], header=None)# Take care of missing data. Head over to the missing data section to check the significance of the below code line. df = df.replace(-1, 0)# Form the dataframe that contains the total daily consumption of the appliance in the house day_df = consump_per_day(df, filename, house)# Form the dataframe that contains the total hourly consumption of the appliance in the house hour_df = consump_per_hour(df, filename, house)# Append the dataframes to their respective lists. day_list.append(day_df) hour_list.append(hour_df)# Concatenate the dataframes in the list to get the merged dataframe that contains the power consumption of every appliance per house day_frame = pd.concat(day_list, axis=0, ignore_index=True).sort_values(by='date') hour_frame = pd.concat(hour_list, axis=0, ignore_index=True).sort_values(by=['date', 'hour'])# Append the dataframes to their respective lists plug_daily_use.append(day_frame) plug_hourly_use.append(hour_frame)# Concatenate the dataframes in the list to get the merged dataframe that contains the power consumption of every appliance in every house house_daily_consump= pd.concat(plug_daily_use, axis=0, ignore_index=True) house_hourly_consump= pd.concat(plug_hourly_use, axis=0, ignore_index=True)# Append the dataframes to their respective lists total_daily_consump.append(house_daily_consump) total_hourly_consump.append(house_hourly_consump)# Concatenate the dataframes in the list to get the merged dataframe that contains the total daily power consumption of every appliance in every house daily_df = pd.concat(total_daily_consump, axis=0, ignore_index=True)# Concatenate the dataframes in the list to get the merged dataframe that contains the total hourly power consumption of every appliance in every house hourly_df = 
pd.concat(total_hourly_consump, axis=0, ignore_index=True)# Return the created dataframesreturn daily_df, hourly_df```Now that the driver and utility functions have been defined, it's time to execute them with the necessary input argument(s). Below code cell does exactly that. Notice that Gunther kept the data outside of the github cloned folder reason being the size of the data was too large to be hosted on github. Lets see the how the two required dataframes for this analysis look like.```{python}# Define pathpath ='./eco'# Execute the driver function to get the tuple containing dataframesplug_df = create_plug_df(path)# Get the daily_dfdaily_df = plug_df[0]# Get the hourly_dfhourly_df = plug_df[1]```Below is how the dataframe containing total daily power consumption per appliance per household looks like.```{python}daily_df.head()```Below is how the dataframe containing total hourly power consumption per appliance per household looks like.```{python}hourly_df.head()```### Taking care of missing dataIt's mentioned in the document that missing values are present in the data. However, they are denoted as '-1'. After looking at the dataframes created, we can say that Gunther has replaced these values with '0'. Replacing with zero sounds optimal reason being '-1' may have an effect over the visualization and can make it absurd. 
A zero value won't hurt the analysis.Just to make sure that information provided was true, Gunther created the below function to have a look at the missing value statistics of a dataframe.```{python}# Define a function that returns a data-frame of missing data statisticsdef missing_val_stats(df):# Define columns of the data-frame df_stats = pd.DataFrame(columns = ['column', 'unique_val', 'num_unique_val', 'num_unique_val_nona', 'num_miss', 'pct_miss']) tmp = pd.DataFrame()for c in df.columns:# Column tmp['column'] = [c]# Unique values in the column tmp['unique_val'] = [df[c].unique()]# Number of unique values in the column tmp['num_unique_val'] =len(list(df[c].unique()))# Number of unique values in the column without nan tmp['num_unique_val_nona'] =int(df[c].nunique())# Number of missing values in the column tmp['num_miss'] = df[c].isnull().sum()# Percentage of missing values in the column tmp['pct_miss'] = (df[c].isnull().sum()/len(df)).round(3)*100# Append the values to the dataframe df_stats = df_stats.append(tmp)# Return the created dataframereturn df_stats```Lets find out if there is any missing data in either of the two dataframes created.```{python}# Missing value statistics for daily_dfmissing_val_stats_daily = missing_val_stats(daily_df)missing_val_stats_daily``````{python}# Missing value statistics for hourly_dfmissing_val_stats_hourly = missing_val_stats(hourly_df)missing_val_stats_hourly```We can clearly see that there are no missing values in both the dataframes. Gunther also has very well handled the data.Now that the data has been pre-processed, lets move on to the EDA section.# EDAGunther began by looking at the total daily power consumption of every appliance per house, by date. 
He built a heatmap with the date(YYYY-MM-DD) on the x axis and the total daily power consumption(Watts) on the y axis.### Heatmap of average daily power consumption of every appliance per house, by date for the entire period```{python}alt.renderers.enable('default')# Define dropdown menus for filteringhouse_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')# Create selection objectshouse_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='Select')selection = alt.selection_single(fields=['appliance'], name='Select')# Filter the dataframe as per total power consumption and applianceplugs_filtered = daily_df[['total_consumption','appliance']]# Get the average daily consumption of each appliance per house using groupby and mean()plugs_averages = pd.DataFrame(plugs_filtered.groupby('appliance')['total_consumption'].mean()) # Reset index of the groupby objectplugs_averages = plugs_averages.reset_index()# Plot the heatmapheatmap_daily_consump = (alt.Chart(daily_df) .mark_rect() .encode(x='date:O', y='appliance:N', color=alt.Color('total_consumption:Q', scale=alt.Scale(scheme='orangered'), legend=alt.Legend(type='symbol') ), tooltip=['total_consumption','appliance:N'], ).add_selection(house_selection).transform_filter(selection).transform_filter(house_selection))# Add title to the heatmapheatmap_daily_consump.title ="Average daily power consumption in Watts of appliance per house"# Add x-axis label to the heatmapheatmap_daily_consump.encoding.x.title ='Date (YYYY-MM_DD)'# Add y-axis label to the heatmapheatmap_daily_consump.encoding.y.title ='Appliance name'# Display the heatmapheatmap_daily_consump```### FindingsUsing the information from the heatmap above, Gunther found out that out of all appliances in every household, Freezer uses the most power. Freezer belongs to the Tribbianis(House number '04'). 
Joey Tribbiani needs to know the power consumption of this plug and get used to more homely foods rather than just depending upon eating frozen stuff. This will help him live a healthier life and save some money on the electricity bill too.Entertainment devices and Fridge account for the highest used appliances in the Geller (House number '05') household. Maybe Ross Geller is used to keeping a six-pack of beer handy in the fridge so he can just chill later while watching a dinosaur documentary on TV.As for the Bings (House number '06'), looks like Chandler Bing is without a job. Entertainment devices consume most of the power in this house. Even if we compare the power consumption of the 3 houses, the Bings will have lowest total power consumption. Either the Bings are saving, or the former hypothesis about Chandler Bing is correct. ### Histogram showcasing total daily power consumption of all appliances per house.Gunther has plotted the below chart that displays histogram of the total daily power consumption of all appliances per house in the dataset.He has also provided the list of appliances available in each household below so that it's easier for the audience to play with the chart and observe the trends.Appliances available in house 04:<br>1. Entertainment<br>2. Freezer<br>3. Fridge<br>4. Kitchen appliances<br>5. Lamp<br>6. Microwave<br>7. Stereo and laptop<br>8. Tablet<br><br>Appliances available in house 05:<br>1. Coffee machine<br>2. Entertainment<br>3. Fountain<br>4. Fridge<br>5. Kettle<br>6. Microwave<br>7. PC<br>8. Tablet<br><br>Appliances available in house 06:<br>1. Coffee machine<br>2. Entertainment<br>3. Fridge<br>4. Kettle<br>5. Lamp<br>6. Laptop<br>7. 
Router<br>```{python}alt.renderers.enable('default')# Define dropdown menus for filteringhouse_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')appliance_dropdown = alt.binding_select(options=list(daily_df['appliance'].unique()), name='Appliance')# Create selection objectshouse_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='House')appliance_selection = alt.selection_single(fields=['appliance'], bind=appliance_dropdown, name='Appliance')# Create the histogramhist = alt.Chart(daily_df).mark_bar().encode( alt.X('date:T', title='Month'), alt.Y('total_consumption:Q', title='Daily Power Consumption (Watts)'), alt.Color('appliance:N', title='Appliance', scale=alt.Scale(scheme='category20')), alt.Column('household:Q', title='House', sort=list(daily_df['household'].unique())), tooltip=['date:T', 'appliance', 'total_consumption']).properties( title='Daily Power Consumption of different appliances in different Houses'# title of the chart).add_selection( appliance_selection, house_selection).transform_filter( appliance_selection).transform_filter( house_selection)# Display the histogramhist```### FindingsWe see a similar pattern like the heatmap. One advantage of this plot over the heatmap is the two dropdowns that make this chart more interactive. Perhaps why Gunther chose to plot this chart. We are able to filter the chart using the dropdown for house and further filter the histograms using the dropdown for the appliances. 
The above chart also solidifies the hypothesis that Freezer is by far the appliance/plug that uses the most power, meaning it will likely be the differentiting factor that determines if a household is on the higher or lower end of power consumption.# Results## PlotlyLets have a look at the analysis of the average hourly power consumption of the appliances used in different households done by Gunther.To calculate the average hourly power consumption of every appliance, Gunther has used 'groupby' to filter hourly_df as per household, appliance and hour.```{python}plug_df = hourly_df.groupby(['household', 'appliance', 'hour']).hourly_consumption.mean().reset_index()plug_df.rename(columns={'hourly_consumption': 'avg_hourly_consump'}, inplace=True)plug_df```Now that the data has been grouped, Gunther has plotted a bar chart for his analysis.### Design Decisions:1. Visual Encoding : Color has been chosen as a visual encoder which splits the appliances from each other. This has been done to keep the attention span of the audience on the graph and not get lost in multiple bars. This even makes the graphs look simple and elegant and pleasing to the eye.2. Axis Choices : Appliances have been kept on the x axis. Visually it makes more sense to keep the text in a horizontal way and thus this decision has been made. Average power consumption has been kept at the y axis to mark values.3. Frames : Every single hour of the day has been kept as a frame to study daily patterns. This is important as it is very pertinent to study the change in power consumption on an hourly basis.4. Dropdown : A dropwdown for house number has been included which helps the user interact with the graph and see the changes across different appliances of different houses every hour of the day. This makes the graph more interactive which further helps in engaging the audience. 5. 
Titles and Axis labels : Titles and axis labels have been kept as per the analysis so that audience knows what is there in the chart.```{python}# GET UNIQUE VALUESappliances = plug_df["appliance"].unique()hours = plug_df["hour"].unique()houses = plug_df["household"].unique()# INITIALIZE GRAPH OBJECTfig = go.Figure()# Make a trace for each possible plotting scenario, by iterating over a double for loop over houses and date.for house in houses:for hr in hours:# ISOLATE ONE HOUR OF DATA FOR PLOTTING df_hour = plug_df.query(f"hour == {hr}")# Add trace to the figure fig.add_trace(# Specify the type of the trace go.Bar(x = df_hour.loc[df_hour["household"] == house]['appliance'], y=df_hour.loc[df_hour["household"] == house]['avg_hourly_consump'],# Specify whether or not to make data-visible when rendered visible=False))# Change the color of individual barsfig.update_traces(marker_color=['red', 'green', 'blue', 'yellow', 'brown', 'purple', 'orange', 'pink'])# MAKE FIRST TRACE VISIBLEfig.data[0].visible =True# Create and add slider for hours# Step 1: Define steps for slidersteps = []for i,house inenumerate(houses): step = []for j,hour inenumerate(hours): visible = [False] *len(fig.data) visible[i*len(hours)+j] =True step_dict = {'method': 'update', 'args': [{'visible': visible}],'label':str(hour)} step.append(step_dict) steps.append(step)# Step 2: Define sliders for the figuresliders=[]for i,house inenumerate(houses): slider = [dict( active=0, currentvalue={"prefix": "~hours: "}, transition={"duration": 300, "easing": "cubic-in-out"}, pad={"t": 50}, steps=steps[i]) ] sliders.append(slider)# Initialize slider for number of hoursfig.update_layout(sliders=sliders[0])# DEFINE VISIBILITY OF PLOTS FOR DROPDOWN BUTTONSbutton_visible = []for i inrange(0,len(fig.data),len(houses)): visible = [False] *len(fig.data) visible[i] =True button_visible.append(visible)# DEFINE BUTTONS FOR DROPDOWNbuttons =[]for i,house inenumerate(houses): button =dict( label=house,# MODIFICATION TYPE 
method="update",# BOOLEAN VALUES FOR EACH TRACE# Note that each house has it's own slider args=[{"visible" :button_visible[i]},{"sliders" : sliders[i]}], ) buttons.append(button)# ADD DROPDOWN TO CHANGE TYPEfig.update_layout( updatemenus=[dict( buttons=buttons,# VARIABLES FOR DROPDOWN DIRECTION AND LOCATION direction="down", showactive=True, pad={"r": 10, "t": 10}, x=0.935, xanchor="left", y=1.3, yanchor="top", ), ])# Set figure layoutfig.update_layout( title="Average hourly power consumption of different appliances in different households", xaxis_title="Appliances", yaxis_title="Average power consumption (W)",)fig.show()```### FindingsBelow are Gunther's findings and insights regarding the bar chart plotted above.<b>1. House number '04':</b>There is no change and also very less power consumption from the time 1 a.m to 6 a.m which might be because the Tribbianis generally sleep at that time.This becomes our first finding, that the Tribbianis generally sleep from around 12 a.m to 6 a.m.The first major change occurs at 7 a.m where consumption of Kitchen Appliances goes high. From the documentation we know the kitchen appliances constitute coffee machine, the bread baking machine and the toaster. This tells us that the Tribbianis tend to have their breakfast around 7 a.m. Their breakfast habits seems to be them having coffee with some bread as the usage of other kitchen appliances like microwave is very low around this time.Another huge change can be seen at 12 pm where the consumption of Microwave shoots up. This tells us about the lunch time of the Tribbianis. This could potentially also mean that this is working household as the usage of only microwave is only high around this time which could mean that they just came to grab their lunch on the go.The entertainment bar goes high for the first time around 2 pm. 
This could mean after making or heating their lunches, the Joey Tribbiani genrally likes to enjoy his meal with some entertainment.Microwave and Kitchen Appliances usage shoots up again around 7 pm which could be their dinner time.The entertainment goes high up again around 8 - 9 pm which is again right after their meal.<b>2. House number '05':</b>Again, there is no change and also very less power consumption from the time 1 a.m to 5 a.m which might be because the Gellers are sleeping too at that time. But here's something absurd that happens at 5 a.m. The entertainment shoots up a bit from 5 a.m to 6 a.m. It can be that someone wakes up at that time to watch maybe their favourite TV series episode which might be airing in a different continent at that moment.The first major change occurs at 7 a.m where consumption of Coffee machine goes high. The Gellers wake up at the this time, have their morning coffee and start their routine. Next we witness that at 9 a.m, the power consumption of PC goes up. It can be that someone in the Geller house (Ross/Monica) is working remotely and their job starts at 9 a.m.Again at 12 pm a similar huge change that was observed for the Tribbiani househole can be observed where the consumption of Microwave increases. This can mean that Gellers too prefer to have their lunch at the same time. Upon seeing the usage of PC, we can rightfully assume that this is a working household and that they too prefer to just grab their lunch on the go.1 p.m again is the coffee break since the Coffee machine consumption goes higher. One crucial trend that we can witness is that from 5.pm to 11 p.m, the entertainment bar goes high. This can mean that the Gellers got free from their daily jobs and now are watching TV or playing PS/XBOX or maybe listeing to their favourite songs on a speaker. 
This interval accounts for their leisure time.

The Gellers like to have coffee during their leisure time too: at 6 p.m., the consumption of the Coffee machine increases again.

From 9 p.m. to 11 p.m., the power usage of the Fountain increases. This may also be part of the Gellers' leisure time; maybe they relax in their living room or bedroom with a view of the fountain.

Overall, the Gellers look like a caffeine-dependent working household that relaxes once the workday is over.

<b>3. House number '06':</b>

One thing to note for this household is that the entertainment bar always remains high. This can mean either that the Bings consume a great deal of entertainment or that the readings have not been recorded correctly. The same holds for the router: it appears that the router in this household is never switched off, which suggests that the Bings use Wi-Fi constantly.

At 7 a.m., we see a major change, with the bar for the Coffee machine going up. We can say that the Bings sleep from 1 a.m. to 7 a.m. However, given the entertainment bar, it is not clear that every Bing sleeps during this interval.

At 8 a.m., the consumption of the Kettle rises. At 9 a.m., the bar for the Coffee machine increases again. This may mean that the Bings prefer to have their breakfast and coffee between 7 a.m. and 9 a.m.

This data seems a bit unusual for a household. First of all, this house does not have a microwave or any other kitchen appliance for preparing food. There is a lamp, but it is barely used. Entertainment and Router account for most of the hourly average power consumption in a day.

This can mean either that the data recorded for this household is incorrect, which would be a limitation, or that this is not a household at all but perhaps an entertainment centre.

### Summary

Using Gunther's analysis, we can see how different households use different appliances on an hourly basis.
We see a common trend of waking up in the morning, having a morning coffee to kickstart the day, and a common time for lunch and entertainment.

Gunther's visualization of the hourly usage of each appliance allowed us to infer the activities of each household. It also allowed us to guess whether a household was a working household and whether its members were coffee drinkers.

Using appropriate visualization, Gunther has been able to answer the first two data science questions.

## Altair

The distribution of power consumption among the various appliances was another area of concern for Gunther, who was looking for notable variations that would indicate inconsistent usage patterns. To examine this, he created a pair of linked Altair plots that let users view both the overall distribution and the distribution for each individual appliance. The plots are interactive: clicking on a specific appliance bar filters the second chart to that appliance, which makes it simple to compare overall and individual appliance usage. This gives a comprehensive view and helps users spot opportunities for more effective energy use. Overall, the combination of Altair charts provides a practical and informative way to examine how much power is used by the various plugs.

### Design Decisions:

1. Visual Encoding: Color has been chosen as the visual encoding that separates the appliances from each other. This keeps the audience's attention on the graph so that they do not get lost among multiple lines. For the bar chart, a single color (blue) has been chosen. Initially everything is selected, so the audience sees every line in the chart. Once the audience selects an appliance in the bar chart, the line chart is filtered to that appliance, which keeps the graph clean and easy to read. <br><br>2. 
Axis Choices: For the bar chart, appliances are placed on the x axis; for the line chart, the date is placed on the x axis. Total daily power consumption is placed on the y axis of both charts.<br><br>3. Selector: Every appliance can be selected in the bar chart, which acts as a filter panel. This lets the audience see the trends of different appliances in the line chart over a period of time.<br><br>4. Legend: A legend has been included that shows the line color of every appliance. It helps the audience distinguish the appliances and select the one whose power consumption trend they wish to observe.<br><br>5. Titles and Axis labels: Titles and axis labels have been chosen to match the analysis so that the audience knows what each chart shows.

```{python}
alt.renderers.enable('default')

# Create a copy of the DataFrame
plugs_copy = daily_df.copy()

# Define dropdown menus for filtering
house_dropdown = alt.binding_select(options=list(daily_df['household'].unique()), name='House')

# Create selection objects
house_selection = alt.selection_single(fields=['household'], bind=house_dropdown, name='House')

# Create a selection object
selection = alt.selection_single(fields=['appliance'], name='Select')

# Set the color based on whether an appliance is selected or not
color = alt.condition(selection, alt.value('#0569e3'), alt.value('#c2c3c4'))

# Create a bar chart of the total daily consumption of different appliances
bar = (
    alt.Chart(daily_df)
    .mark_bar()
    .encode(
        y=alt.Y('total_consumption:Q', axis=alt.Axis(title='Daily Consumption (Watts)')),
        x=alt.X('appliance:N', sort='-y', axis=alt.Axis(title='Appliance')),
        color=color,
    )
    .properties(title="Daily Power Consumption of Appliance")
    .add_selection(selection)
)

# Set chart properties (properties() returns a new chart, so reassign it)
bar = bar.properties(
    width=600,
    height=600,
    title='Daily Power Consumption of Appliance',
)

# Set the color based on whether an appliance is selected or not
color2 = alt.condition(selection, alt.Color('appliance:N'), alt.value('white'))

# Create a line chart of the total daily consumption of different appliances
line1 = (
    alt.Chart(daily_df)
    .mark_line()
    .encode(
        x=alt.X('date:T'),
        y='total_consumption:Q',
        color=color2,
        tooltip=['date:T', 'appliance', 'total_consumption'],
    )
    .properties(width=800, height=400)
).interactive()

# Set chart layout
line1.title = "Daily Consumption by Appliance in every household"
line1.encoding.x.title = 'Date (YYYY-MM-DD) (Zoom in to see dates)'
line1.encoding.y.title = 'Daily Consumption (W)'

# The selection is registered on the bar chart only; the line chart reads it
# through color2, so it is not added to line1 a second time.

# Display the charts
bar | line1
```

### Findings

The most surprising finding of Gunther's analysis is that the freezer's power usage has multiple peaks separated by a sizable gap. This pattern is probably caused by the change of season: there is a big spike in freezer usage in the summer, followed by a marked decline in the winter. The usage patterns of the other appliances, by contrast, are more consistent and vary less.

This finding helps in answering the third data science question.

### Summary

Gunther's analysis shows how important it is to take seasonal factors into account when examining power consumption patterns in order to optimize energy use. Understanding the seasonal fluctuations in appliance usage enables households to adopt efficient energy-saving measures, such as reducing appliance usage during times of high demand, to lower overall power consumption.

# References

1. Wilhelm Kleiminger, Christian Beckel, and Silvia Santini. Household Occupancy Monitoring Using Electricity Meters. Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp 2015). Osaka, Japan, September 2015.
2. Christian Beckel, Wilhelm Kleiminger, Romano Cicchetti, Thorsten Staake, and Silvia Santini. The ECO Data Set and the Performance of Non-Intrusive Load Monitoring Algorithms.
Proceedings of the 1st ACM International Conference on Embedded Systems for Energy-Efficient Buildings (BuildSys 2014). Memphis, TN, USA. ACM, November 2014.
3. ANLY-503 - Lab 4.1
4. ANLY-503 - Lab 4.2
5. <a href='https://plotly.com/graphing-libraries/'>https://plotly.com/graphing-libraries/</a>