# Pandas meets probabilities

## Showcasing Pandas on-the-fly computation capabilities to calculate probabilities on timeseries.

Posted by dr. Etienne B. Roesch on April 19, 2017

## New Notebook

#### Showcasing Jupyter uNotebook Translator Layout

tl;dr Pandas is extremely good at manipulating data. It can also compute new information on the fly. In this example, I compute the probability that a variable (called connection_status) be of values {-1, 0, 1, 2} given past values in a timeseries, for the same half hour, every week on the same day.

The problem: Given timeseries data representing the connection status for a device, coded {-1, 0, 1, 2}, what is the probability that the device be in any given state, for a given day and a given half hour?

The solution: I first extract the data and use Pandas to chunk and label the data for any given half hour in the past, and then use this subset of data to compute the probability for all four possible states. Plotting this with stacked histogram gives a nice representation of how, for a given half hour, the device can be in several states.

Python version 2.7.12 |Continuum Analytics, Inc.| (default, Jul 2 2016, 17:43:17)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)]
Pandas version 0.19.2
Matplotlib version 2.0.0

CPU times: user 13.9 s, sys: 204 ms, total: 14.1 s
Wall time: 14.1 s

CPU times: user 75.3 ms, sys: 27.7 ms, total: 103 ms
Wall time: 103 ms

CPU times: user 1.53 s, sys: 8.54 ms, total: 1.54 s
Wall time: 1.54 s

CPU times: user 15.5 ms, sys: 3.07 ms, total: 18.6 ms
Wall time: 17.3 ms

CPU times: user 53.2 ms, sys: 3.45 ms, total: 56.6 ms
Wall time: 56.6 ms

half_hour
0 0 0.040000
1 0.746000
2 0.214000
1 0 0.040000
1 0.810000
2 0.150000
2 0 0.040000
1 0.840000
2 0.120000
3 0 0.040000
1 0.840000
2 0.120000
4 0 0.040000
1 0.898000
2 0.062000
5 0 0.040000
1 0.900000
2 0.060000
6 0 0.040000
1 0.914000
2 0.046000
7 0 0.040000
1 0.960000
8 0 0.040000
1 0.960000
9 0 0.040000
1 0.960000
10 0 0.040000
1 0.960000
11 0 0.040080
...
38 0 0.139535
1 0.153488
2 0.706977
39 0 0.129310
1 0.219828
2 0.650862
40 0 0.116466
1 0.297189
2 0.586345
41 0 0.104478
1 0.447761
2 0.447761
42 0 0.104478
1 0.503731
2 0.391791
43 0 0.104478
1 0.559701
2 0.335821
44 0 0.104869
1 0.621723
2 0.273408
45 0 0.104478
1 0.671642
2 0.223881
46 0 0.080906
1 0.627832
2 0.291262
47 0 0.079755
1 0.644172
2 0.276074
Name: connection_status, dtype: float64

<matplotlib.text.Text at 0x113eaff50>

<matplotlib.figure.Figure at 0x118256c90>