What is the appropriate SQL statement/s to produce a stacked bar chart for multiple random variable distributions?


The Objective

Graph the difference between the probability distributions of sets of random variables that come from a number of different sources using a stacked bar chart (or another type of graph suited for such visualization).

The Problem

I’m not sure of the right combination of SQL SELECT statement(s)/condition(s) to present the data to the graphing tool in the right way to achieve the objective above.

The Method

Summary

Feed the source data stored in Google Cloud Platform (GCP)’s BigQuery into Google Data Studio and use a stacked bar graph to visualize the numerous probability distributions.

The Source Data

I’m dealing with simple binary outcomes, either a heads or tails. One trial (== one second) per source yields between 0 to 200 how many heads (1s) were counted.

The schema of the data we have stored in BigQuery is as follows:

time      | source1 | source 2 | ... sourceN --------------------------------------------- 09:00:00  | 99      | 110      | 95         | 09:00:01  | 101     | 115      | 107        | 09:00:02  | 99      | 91       | 97         | 

For a Single Source

The following works fine if I’m trying to graph the results from just a single source. And as the law of probability would say, the graphed distribution is moreorless in fit with the expected bell curve.

SELECT   source1 AS src1,   COUNT(source1) AS count FROM   `rnddata` WHERE   recorded_at >= "2021-03-11 08:45:00" AND   recorded_at < "2021-03-11 09:45:00" GROUP BY src1 ORDER BY src1 
src1 | count 75   | 1 77   | 2 80   | 2 81   | 5 82   | 6 83   | 20 84   | 10 85   | 14 86   | 31 87   | 33 88   | 48 89   | 74 90   | 74 91   | 89 92   | 107 93   | 133 94   | 135 95   | 168 96   | 160 97   | 192 98   | 190 99   | 195 100  | 189 101  | 197 102  | 191 103  | 188 104  | 180 105  | 201 106  | 130 107  | 128 108  | 107 109  | 96 110  | 64 111  | 65 112  | 45 113  | 40 114  | 19 115  | 22 116  | 18 117  | 9 118  | 8 119  | 6 120  | 3 121  | 4 125  | 1 

For Multiple Sources

I’m thinking that if I can have multiple dimensions (src1, src2… srcN) with a single count column for the metric, then I will be able to graph the dimensions as stacked bars on the X axis and have the count metric set to the X axis.

I imagine the output could be something like this:

src1 | src2 | count 75   | null | 1 77   | 77   | 2 80   | 00   | 2 null | 80   | 5 82   | 82   | 6 83   | null | 20 83   | 11   | null 84   | 84   | 10 

I did a bit of research and experimenting with UNIONS and JOINS and INTERSECTS but couldn’t quiet get the above output. Any advice on how to form such an SQL statement to do that, or offer a better presentation (schema) of the data for the graphing purposes, would be greatly appreciated.