Scope of the notebook
This notebook collects some explorations of Altair’s most interesting features, using the dataset from Kaggle’s House Prices competition.
For a basic tutorial on Altair, I created a notebook with the Titanic dataset!
import altair as alt
import numpy as np
import pandas as pd
alt.renderers.enable('html')
train = pd.read_csv('train.csv')
Charts
Bar Chart with Highlighted Bar
A basic bar chart in which a bar is highlighted when its column accounts for more than 10% of all missing values in the dataset.
missing = train.isnull().sum()*100/train.isnull().sum().sum()
missing = missing[missing > 0].reset_index()
missing.columns = ['Column', 'Count missing']
missing.head()
| | Column | Count missing |
|---|---|---|
| 0 | LotFrontage | 3.718593 |
| 1 | Alley | 19.655420 |
| 2 | MasVnrType | 0.114860 |
| 3 | MasVnrArea | 0.114860 |
| 4 | BsmtQual | 0.531228 |
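Note that `Count missing`, as computed by the formula above, is each column's share of all missing cells in the table, not the percentage of rows missing within that column. The sketch below contrasts the two measures on a small made-up DataFrame (`toy` and its values are hypothetical and are not taken from `train.csv`):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, 3 missing cells in total.
toy = pd.DataFrame({
    'A': [1.0, np.nan, np.nan, 4.0],   # 2 of 4 rows missing
    'B': [np.nan, 2.0, 3.0, 4.0],      # 1 of 4 rows missing
    'C': [1.0, 2.0, 3.0, 4.0],         # nothing missing
})

# Same formula as in the notebook: share of all missing cells, in percent.
share_of_missing = toy.isnull().sum() * 100 / toy.isnull().sum().sum()

# Alternative measure: percentage of rows missing within each column.
pct_missing_per_column = toy.isnull().mean() * 100

print(share_of_missing.round(2).to_dict())
print(pct_missing_per_column.to_dict())
```

In this toy example, column `A` holds two thirds of all missing cells, yet only half of its own rows are missing. Which measure to plot depends on the question being asked; the chart in this section uses the first one.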
alt.Chart(missing).mark_bar().encode(
    x=alt.X('Column', sort='-y'),
    y='Count missing',
    color=alt.condition(
        alt.datum['Count missing'] > 10,  # If count missing is > 10%, returns True,
        alt.value('orange'),              # which sets the bar orange.
        alt.value('steelblue')            # And if it's not true it sets the bar steelblue.
    )
).properties(
    width=500,
    height=300
).configure_axis(
    grid=False
)
Boxplot
Creation of a basic boxplot using the .mark_boxplot() method.
alt.Chart(train).mark_boxplot(extent='min-max').encode(
    x='OverallQual:O',
    y='SalePrice:Q',
    color='OverallQual:N'
).properties(
    width=500,
    height=300
)
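The `extent='min-max'` argument makes the whiskers span the full range of the data, instead of the more common rule where whiskers stop within 1.5 times the interquartile range and remaining points are drawn as outliers. Here is a rough sketch of the difference in plain Python; the `prices` list is hypothetical, and the quartile method used by the renderer may differ slightly from `statistics.quantiles`:

```python
from statistics import quantiles

# Hypothetical values with one extreme point.
prices = [1, 2, 3, 4, 5, 6, 7, 8, 100]

q1, median, q3 = quantiles(prices, n=4, method='inclusive')
iqr = q3 - q1

# extent='min-max': whiskers reach the smallest and largest value.
minmax_whiskers = (min(prices), max(prices))

# 1.5 * IQR rule: whiskers stop at the most extreme points inside the fences;
# anything beyond the fences would be shown as an individual outlier.
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [p for p in prices if lower_fence <= p <= upper_fence]
iqr_whiskers = (min(inside), max(inside))
outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print(minmax_whiskers, iqr_whiskers, outliers)
```

With `min-max`, no points are drawn separately, so a long whisker can hide the fact that only a few houses sit far from the rest.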
Heatmaps
Creation of a basic heatmap using the .mark_rect() method.
alt.Chart(train).mark_rect().encode(
    x='MSZoning',
    y='ExterQual',
    color='average(SalePrice)'
).properties(
    width=500,
    height=300
)
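The `'average(SalePrice)'` shorthand asks for one mean per (`MSZoning`, `ExterQual`) cell. A roughly equivalent table can be built in pandas with `pivot_table`; the sketch below uses a handful of hypothetical rows (`toy`) rather than the real training data:

```python
import pandas as pd

# Hypothetical rows standing in for train.csv.
toy = pd.DataFrame({
    'MSZoning':  ['RL', 'RL', 'RL', 'RM', 'RM'],
    'ExterQual': ['Gd', 'Gd', 'TA', 'TA', 'TA'],
    'SalePrice': [200000, 220000, 150000, 120000, 100000],
})

# One cell per (ExterQual, MSZoning) pair, holding the mean SalePrice.
cell_means = toy.pivot_table(
    index='ExterQual', columns='MSZoning', values='SalePrice', aggfunc='mean'
)
print(cell_means)
```

Combinations with no rows (here `Gd` in `RM`) come out as `NaN` in the pivot table; in the heatmap such cells should simply have no rectangle.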
Bindings, Selections & Conditions
Here you can select a value of the KitchenQual feature from a dropdown menu and see how the graph changes color!
# The dropdown label previously read 'Lot Shape', although it selects KitchenQual values.
input_dropdown = alt.binding_select(options=list(train['KitchenQual'].unique()), name='KitchenQual ')
selection = alt.selection_single(fields=['KitchenQual'], bind=input_dropdown)
color = alt.condition(selection,
                      alt.Color('KitchenQual:N', legend=None),
                      alt.value('lightgray'))

alt.Chart(train).mark_point().encode(
    x='GrLivArea',
    y='SalePrice',
    color=color
).properties(
    width=500,
    height=300
).add_selection(
    selection
).configure_axis(
    grid=False
)
Interactive Chart with Cross-Highlight
In this more advanced example, I use the ExterQual feature as a filter for a binned heatmap.
Click on the bar chart bars to change the heatmap!
pts = alt.selection(type="single", encodings=['x'])

rect = alt.Chart(train).mark_rect().encode(
    x=alt.X('GrLivArea', bin=alt.Bin(maxbins=40)),
    y=alt.Y('GarageArea', bin=alt.Bin(maxbins=40)),
    color='average(SalePrice)',
).properties(
    width=500,
    height=300
).transform_filter(
    pts
)

bar = alt.Chart(train).mark_bar().encode(
    x='ExterQual:N',
    y='count()',
    color=alt.condition(pts, alt.ColorValue("steelblue"), alt.ColorValue("grey"))
).properties(
    width=550,
    height=200
).add_selection(
    pts
)

alt.vconcat(
    rect,
    bar
).resolve_legend(
    color="independent",
    size="independent"
).configure_axis(
    grid=False
)
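Conceptually, clicking a bar filters the rows to one `ExterQual` value before the heatmap bins and averages them. The sketch below mimics that order of operations in pandas. The function name `binned_mean_price`, the fixed bin widths, and the `toy` rows are all assumptions made for illustration; Altair's `maxbins` chooses its own bin boundaries, so the real bins will differ:

```python
import pandas as pd

# Hypothetical rows standing in for train.csv.
toy = pd.DataFrame({
    'ExterQual':  ['Gd', 'Gd', 'TA', 'TA'],
    'GrLivArea':  [1500, 1600, 900, 2500],
    'GarageArea': [400, 450, 200, 600],
    'SalePrice':  [200000, 220000, 100000, 300000],
})

def binned_mean_price(df, selected_quality=None, step_liv=1000, step_garage=250):
    """Filter by ExterQual (if given), then average SalePrice per 2-D bin."""
    if selected_quality is not None:
        df = df[df['ExterQual'] == selected_quality]
    # Fixed-width bins, labelled by their lower edge (a simplification).
    liv_bin = (df['GrLivArea'] // step_liv) * step_liv
    garage_bin = (df['GarageArea'] // step_garage) * step_garage
    return df.groupby([liv_bin, garage_bin])['SalePrice'].mean()

all_rows = binned_mean_price(toy)         # nothing selected: every row is used
only_gd = binned_mean_price(toy, 'Gd')    # after "clicking" the Gd bar
print(all_rows)
print(only_gd)
```

Because the filter runs before the aggregation, both the set of populated cells and their average prices change with each click.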
Dot Dash Plot
A Dot Dash Plot is essentially a scatter plot with both axes removed and replaced by barcode plots (also known as strip plots), which let you see the distribution of each measure used in the scatter plot.
# Configure the options common to all layers
brush = alt.selection(type='interval')
base = alt.Chart(train).add_selection(brush)

# Configure the points
points = base.mark_point().encode(
    x=alt.X('GrLivArea', title=''),
    y=alt.Y('SalePrice', title=''),
    color=alt.condition(brush, 'KitchenQual', alt.value('grey'))
)

# Configure the ticks
tick_axis = alt.Axis(labels=False, domain=False, ticks=False)

x_ticks = base.mark_tick().encode(
    alt.X('GrLivArea', axis=tick_axis),
    alt.Y('KitchenQual', title='', axis=tick_axis),
    color=alt.condition(brush, 'KitchenQual', alt.value('lightgrey'))
)

y_ticks = base.mark_tick().encode(
    alt.X('KitchenQual', title='', axis=tick_axis),
    alt.Y('SalePrice', axis=tick_axis),
    color=alt.condition(brush, 'KitchenQual', alt.value('lightgrey'))
)

# Build the chart
(
    y_ticks | (points & x_ticks)
).configure_axis(
    grid=False
)
Multifeature Scatter Plot
Let’s create a scatter plot with multiple feature encodings.
With .interactive() you can zoom in. You can also click on the legend to select specific KitchenQual values.
selection = alt.selection_multi(fields=['KitchenQual'], bind='legend')

alt.Chart(train).mark_circle().encode(
    alt.X('GrLivArea', scale=alt.Scale(zero=False)),
    alt.Y('GarageArea', scale=alt.Scale(zero=False, padding=1)),
    color='KitchenQual',
    size=alt.Size('SalePrice', bin=alt.Bin(maxbins=10), title='SalePrice'),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).properties(
    width=500,
    height=500
).add_selection(
    selection
).configure_axis(
    grid=False
).interactive()
Scatter Matrix
Scatter matrices are among the most common graphs you’ll see on Kaggle. A scatter matrix consists of pairwise scatter plots arranged in a grid, which is useful for visualizing the relationships between several pairs of variables at once.
In Altair this can be achieved using a RepeatChart. Let’s see how!
alt.Chart(train).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='KitchenQual'
).properties(
    width=300,
    height=300
).repeat(
    # Here we tell Altair we want to repeat our scatter plots for each row-column pair
    row=['GrLivArea', 'GarageArea', 'TotalBsmtSF'],
    column=['TotalBsmtSF', 'GarageArea', 'GrLivArea']
).configure_axis(
    grid=False
)
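To make the row-column pairing concrete, the grid of panels can be enumerated in plain Python. This is only an illustrative sketch of the layout (the names `panels` and `self_pairs` are made up here), not how Altair builds the chart internally:

```python
from itertools import product

rows = ['GrLivArea', 'GarageArea', 'TotalBsmtSF']
columns = ['TotalBsmtSF', 'GarageArea', 'GrLivArea']

# Each panel plots the column field on x and the row field on y.
panels = [{'x': col, 'y': row} for row, col in product(rows, columns)]

# Grid positions (row index, column index) where a variable meets itself.
self_pairs = [
    (i // len(columns), i % len(columns))
    for i, panel in enumerate(panels)
    if panel['x'] == panel['y']
]

print(len(panels))
print(self_pairs)
```

Because the `column` list is in the reverse order of the `row` list, the panels that plot a variable against itself fall on the anti-diagonal of the 3×3 grid rather than on the main diagonal.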