Scope of the notebook
This notebook explores some of Altair’s most interesting features using the data from Kaggle’s House Prices competition.
For a basic tutorial on Altair, I created a notebook with the Titanic dataset!
import altair as alt
import numpy as np
import pandas as pd
alt.renderers.enable('html')
train = pd.read_csv('train.csv')
Charts
Bar Chart with Highlighted Bar
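A basic bar chart whose bars are highlighted when a column accounts for more than 10% of all missing values in the dataset.
# Each column's share (in %) of all missing values in the dataset
missing = train.isnull().sum()*100/train.isnull().sum().sum()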
missing = missing[missing > 0].reset_index()
missing.columns = ['Column', 'Count missing']
missing.head()
| | Column | Count missing |
|---|---|---|
| 0 | LotFrontage | 3.718593 |
| 1 | Alley | 19.655420 |
| 2 | MasVnrType | 0.114860 |
| 3 | MasVnrArea | 0.114860 |
| 4 | BsmtQual | 0.531228 |
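Note that the values computed above are each column's share of all missing cells in the dataset, not the percentage of rows missing within that column. The toy example below is a minimal sketch contrasting the two; the DataFrame `toy` and its columns are made up for illustration and are not part of the House Prices data.

```python
import pandas as pd

# Hypothetical toy data (not from train.csv): 'a' has 2 missing, 'b' has 1, 'c' has 0
toy = pd.DataFrame({
    'a': [1, None, None, 4],
    'b': [None, 2, 3, 4],
    'c': [1, 2, 3, 4],
})

n_missing = toy.isnull().sum()

# Share of all missing cells that fall in each column (what the notebook computes)
share_of_missing = n_missing * 100 / n_missing.sum()

# Percentage of rows that are missing within each column (a common alternative)
pct_missing = toy.isnull().mean() * 100

print(share_of_missing.round(2).to_dict())
print(pct_missing.to_dict())
```

With the second definition, a threshold such as 10 would mean "more than 10% of this column's rows are empty", which may or may not be what you want to highlight.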
alt.Chart(missing).mark_bar().encode(
    x=alt.X('Column', sort='-y'),
    y='Count missing',
    color=alt.condition(
        alt.datum['Count missing'] > 10,  # If the column holds more than 10% of all missing values,
        alt.value('orange'),              # the bar is drawn in orange;
        alt.value('steelblue')            # otherwise it is drawn in steelblue.
    )
).properties(
    width=500,
    height=300
).configure_axis(
    grid=False
)
Boxplot
Create a basic boxplot with the .mark_boxplot() method.
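The boxplot code in this section passes extent='min-max', which draws the whiskers out to the minimum and maximum instead of using the default 1.5 * IQR rule. Here is a minimal sketch in plain Python that compares the two whisker rules on a made-up list of numbers. It uses statistics.quantiles for the quartiles, which is an assumption of this sketch and may not match Vega-Lite's quartile computation exactly.

```python
import statistics

# Made-up sample with one extreme value (not taken from the House Prices data)
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]

q1, _, q3 = statistics.quantiles(values, n=4, method='inclusive')
iqr = q3 - q1

# 1.5 * IQR rule: whiskers stop at the most extreme points inside the fences
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [v for v in values if low_fence <= v <= high_fence]
tukey_whiskers = (min(inside), max(inside))
outliers = [v for v in values if v < low_fence or v > high_fence]

# 'min-max' rule: whiskers span the full data range, so nothing is drawn as an outlier
minmax_whiskers = (min(values), max(values))

print(tukey_whiskers, outliers)
print(minmax_whiskers)
```

For a skewed variable like SalePrice, the min-max whiskers hide individual outlier points but make the full price range of each quality level visible.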
alt.Chart(train).mark_boxplot(extent='min-max').encode(
    x='OverallQual:O',
    y='SalePrice:Q',
    color='OverallQual:N'
).properties(
    width=500,
    height=300
)
Heatmaps
Create a basic heatmap with the .mark_rect() method.
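The 'average(SalePrice)' shorthand used for the color channel asks Altair to compute one mean per (x, y) cell. As a rough sketch of the equivalent aggregation in pandas, using a small made-up frame rather than train.csv:

```python
import pandas as pd

# Hypothetical rows that mimic three columns of the House Prices data
toy = pd.DataFrame({
    'MSZoning':  ['RL', 'RL', 'RL', 'RM', 'RM'],
    'ExterQual': ['Gd', 'Gd', 'TA', 'TA', 'TA'],
    'SalePrice': [200000, 220000, 150000, 120000, 100000],
})

# One mean per (MSZoning, ExterQual) pair, i.e. one value per heatmap cell
cell_means = toy.groupby(['MSZoning', 'ExterQual'])['SalePrice'].mean()

# Wide layout: rows are ExterQual and columns are MSZoning, like the chart's y and x axes
grid = cell_means.unstack('MSZoning')
print(grid)
```

Combinations that never occur in the data (here Gd within RM) come out as NaN in the pandas grid; in the chart they simply show up as empty cells.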
alt.Chart(train).mark_rect().encode(
    x='MSZoning',
    y='ExterQual',
    color='average(SalePrice)'
).properties(
    width=500,
    height=300
)
Bindings, Selections & Conditions
Here you can select a KitchenQual value from a dropdown menu and see how the chart's colors change!
input_dropdown = alt.binding_select(options=list(train['KitchenQual'].unique()), name='KitchenQual ')
selection = alt.selection_single(fields=['KitchenQual'], bind=input_dropdown)
color = alt.condition(selection,
                      alt.Color('KitchenQual:N', legend=None),
                      alt.value('lightgray'))
alt.Chart(train).mark_point().encode(
    x='GrLivArea',
    y='SalePrice',
    color=color
).properties(
    width=500,
    height=300
).add_selection(
    selection
).configure_axis(
    grid=False
)
Interactive Chart with Cross-Highlight
In this more advanced example, I use the ExterQual feature as a filter for a binned heatmap.
Click a bar in the bar chart to filter the heatmap!
pts = alt.selection(type="single", encodings=['x'])
rect = alt.Chart(train).mark_rect().encode(
    x=alt.X('GrLivArea', bin=alt.Bin(maxbins=40)),
    y=alt.Y('GarageArea', bin=alt.Bin(maxbins=40)),
    color='average(SalePrice)',
).properties(
    width=500,
    height=300
).transform_filter(
    pts
)
bar = alt.Chart(train).mark_bar().encode(
    x='ExterQual:N',
    y='count()',
    color=alt.condition(pts, alt.ColorValue("steelblue"), alt.ColorValue("grey"))
).properties(
    width=550,
    height=200
).add_selection(
    pts
)
alt.vconcat(
    rect,
    bar
).resolve_legend(
    color="independent",
    size="independent"
).configure_axis(
    grid=False
)
Dot Dash Plot
A Dot Dash Plot is basically a scatter plot with both axes removed and replaced with barcode plots (aka strip plots), which let you see the distribution of each measure used in the scatter plot.
# Configure the options common to all layers
brush = alt.selection(type='interval')
base = alt.Chart(train).add_selection(brush)
# Configure the points
points = base.mark_point().encode(
    x=alt.X('GrLivArea', title=''),
    y=alt.Y('SalePrice', title=''),
    color=alt.condition(brush, 'KitchenQual', alt.value('grey'))
)
# Configure the ticks
tick_axis = alt.Axis(labels=False, domain=False, ticks=False)
x_ticks = base.mark_tick().encode(
    alt.X('GrLivArea', axis=tick_axis),
    alt.Y('KitchenQual', title='', axis=tick_axis),
    color=alt.condition(brush, 'KitchenQual', alt.value('lightgrey'))
)
y_ticks = base.mark_tick().encode(
    alt.X('KitchenQual', title='', axis=tick_axis),
    alt.Y('SalePrice', axis=tick_axis),
    color=alt.condition(brush, 'KitchenQual', alt.value('lightgrey'))
)
# Build the chart
(
    y_ticks | (points & x_ticks)
).configure_axis(
    grid=False
)
Multifeature Scatter Plot
Let’s create a scatter plot with multiple feature encodings.
With .interactive() you can zoom and pan. You can also click on the legend to select specific KitchenQual values.
selection = alt.selection_multi(fields=['KitchenQual'], bind='legend')
alt.Chart(train).mark_circle().encode(
    alt.X('GrLivArea', scale=alt.Scale(zero=False)),
    alt.Y('GarageArea', scale=alt.Scale(zero=False, padding=1)),
    color='KitchenQual',
    size=alt.Size('SalePrice', bin=alt.Bin(maxbins=10), title='SalePrice'),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
).properties(
    width=500,
    height=500
).add_selection(
    selection
).configure_axis(
    grid=False
).interactive()
Scatter Matrix
Scatter matrices are among the most common charts you’ll see on Kaggle. A scatter matrix consists of pairwise scatter plots arranged in a grid, which is useful for visualizing the relationship between each pair of variables.
In Altair this can be achieved using a RepeatChart. Let’s see how!
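Conceptually, a repeated chart draws one panel for every combination of a row field and a column field. The short sketch below enumerates those panels in plain Python for the field lists used in this notebook's scatter matrix. It does not call Altair, and the variable names are purely illustrative.

```python
from itertools import product

rows = ['GrLivArea', 'GarageArea', 'TotalBsmtSF']      # y-axis field of each grid row
columns = ['TotalBsmtSF', 'GarageArea', 'GrLivArea']   # x-axis field of each grid column

# One (row index, column index, y field, x field) entry per panel
panels = [
    (r, c, y_field, x_field)
    for (r, y_field), (c, x_field) in product(enumerate(rows), enumerate(columns))
]

# Panels that plot a variable against itself
self_pairs = [(r, c) for r, c, y_field, x_field in panels if y_field == x_field]

print(len(panels))
print(self_pairs)
```

Because the column list is the reverse of the row list, the panels that plot a variable against itself fall on the anti-diagonal of the grid rather than on the main diagonal.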
alt.Chart(train).mark_circle().encode(
    alt.X(alt.repeat("column"), type='quantitative'),
    alt.Y(alt.repeat("row"), type='quantitative'),
    color='KitchenQual'
).properties(
    width=300,
    height=300
).repeat(
    # Here we tell Altair to repeat our scatter plot for each row-column pair
    row=['GrLivArea', 'GarageArea', 'TotalBsmtSF'],
    column=['TotalBsmtSF', 'GarageArea', 'GrLivArea']
).configure_axis(
    grid=False
)