Explain how to interpret the matrices output by histogram functions such as hist2d and histdd
I couldn't quite understand the density array part of the output of hist2d and histdd. Please clarify.
You aren't the only one. I wouldn't worry too much about it. You are unlikely to use this in a real-world scenario: stakeholders have difficulty understanding ordinary histogram plots, so as a data analyst you would not present them with a chart like this.
For the sake of understanding, consider only the first two rows of our matrixA array, where X is the first row and Y is the second.
Take the first 1-D array in the result of the 2-D histogram and read its bins as the rows of the 2-D result array, and
read the second 1-D array as the columns of the 2-D result array. This means that:
0-0.75 is row 1 and 0.75-1.5 is row 2
2-3.75 is column 1 and 3.75-5.5 is column 2
So row 1, column 1 of the 2-D result array tells you the number of (X, Y) pairs from matrixA whose X falls in that row's range and whose Y falls in that column's range.
For example, row 1 (0-0.75) and column 3 (5.5-7.25) contain 2 pairs (X=0 and Y=6, which occurs twice in matrixA), which is why we have 2 in row 1, column 3 of our 2-D result array.
I hope this helps
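To see this for yourself, here is a minimal sketch that should reproduce the result discussed in this thread. It assumes the output comes from NumPy's np.histogram2d, that the first two rows of matrixA are the five (x, y) points listed further down the thread, and that bins=4 was used; adjust the values if your matrix differs.

```python
import numpy as np

# Assumed data: the five (x, y) points quoted in this thread.
x = np.array([1, 0, 0, 3, 1])  # assumed first row of the matrix (X)
y = np.array([3, 6, 6, 2, 9])  # assumed second row of the matrix (Y)

# bins=4 is an assumption chosen to match the 4x4 result in the thread.
density, x_edges, y_edges = np.histogram2d(x, y, bins=4)

print(density)  # rows follow the X bins, columns follow the Y bins
print(x_edges)  # 5 edges define 4 X bins
print(y_edges)  # 5 edges define 4 Y bins

# "Row 1, column 3" in the 1-based wording above is index [0, 2] in NumPy.
print(density[0, 2])
```

The last line prints the count for the cell where X is in 0-0.75 and Y is in 5.5-7.25.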
What I think is:
There are two coordinates, (0, 6) and (0, 6), in matrix_A that fall in the bins of row (0-0.75) and column (5.5-7.25),
so we get the number 2 in the density array,
and there are 5 coordinates in total (2+1+1+1 = 5), which is the sum of the element values in the density array.
Am I right?
Thanks!
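A quick way to check that reasoning is to sum the density array and list its non-zero cells. This is only a sketch: the array is typed in by hand from the output posted in this thread, not recomputed.

```python
import numpy as np

# Density array copied by hand from the output posted in this thread.
density = np.array([[0., 0., 2., 0.],
                    [1., 0., 0., 1.],
                    [0., 0., 0., 0.],
                    [1., 0., 0., 0.]])

# Every point lands in exactly one cell, so the sum is the number of points.
total = density.sum()
print(total)

# List the occupied cells with 1-based row/column numbers, as used in the thread.
rows, cols = np.nonzero(density)
for r, c in zip(rows, cols):
    print("row", r + 1, "column", c + 1, "count", density[r, c])
```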
density = array([[0., 0., 2., 0.],
                 [1., 0., 0., 1.],
                 [0., 0., 0., 0.],
                 [1., 0., 0., 0.]])
X = array([0. , 0.75, 1.5 , 2.25, 3. ])
Y = array([2. , 3.75, 5.5 , 7.25, 9. ])
The number 2 in the density array is located at the 1st row and 3rd column. Use this position to look up the bins in the bin edges: the 1st bin from the X array is 0 - 0.75, and the 3rd bin from the Y array is 5.5 - 7.25.
What does that mean? It tells you there are 2 points (I am referring to the number 2 in the density array here) whose value in matrix_A[0] is between 0 and 0.75 and whose value at the same position in matrix_A[1] is between 5.5 and 7.25.
Those points are (0, 6) and (0, 6): the matrix_A[0] values are 0 and 0, and the matching matrix_A[1] values are 6 and 6.
Another example:
Number 1 at the 4th row, 1st column of the density array:
4 ---> 4th bin in the X array: 2.25-3
1 ---> 1st bin in the Y array: 2-3.75
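How many points have a matrix_A[0] value between 2.25 and 3 and, at the same position, a matrix_A[1] value between 2 and 3.75? Only 1: the point (3, 2).
Note that both conditions must hold for the same point. On its own, matrix_A[1] has two values between 2 and 3.75 (3 and 2), but only one of them is paired with a matrix_A[0] value between 2.25 and 3.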
Just remember the closed-open interval rule for the bin edges: each bin includes its left edge and excludes its right edge. As a reminder, the last bin is actually closed-closed, so it includes its right edge as well.
I hope this helps.
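If you want to check a single cell by hand, here is a small sketch using boolean masks. It assumes matrix_A[0] and matrix_A[1] hold the x and y values of the five points listed elsewhere in this thread, and it looks at the 4th X bin (the last one, so closed on both ends) and the 1st Y bin (closed-open).

```python
import numpy as np

# Assumed data: x and y values of the five points quoted in this thread.
x = np.array([1, 0, 0, 3, 1])
y = np.array([3, 6, 6, 2, 9])

# 4th X bin is the last bin, so both edges are included: 2.25 <= x <= 3
in_x_bin = (x >= 2.25) & (x <= 3)
# 1st Y bin is closed-open: 2 <= y < 3.75
in_y_bin = (y >= 2) & (y < 3.75)

x_only = int(in_x_bin.sum())             # x values in the X bin, ignoring y
y_only = int(in_y_bin.sum())             # y values in the Y bin, ignoring x
both = int((in_x_bin & in_y_bin).sum())  # points inside the cell itself

print(x_only, y_only, both)
```

The cell count in the density array corresponds to `both`, the number of points that satisfy the two conditions at once, which can be smaller than either of the separate counts.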
I think it could be easier to understand this lesson if you write down the points of matrix_A in the format (x,y):
(1,3) (0,6) (0,6) (3,2) (1,9)
Then you take the x of each point and check which interval of array([0. , 0.75, 1.5 , 2.25, 3. ]) it fits into. The intervals here are (0-0.75), (0.75-1.5), (1.5-2.25), (2.25-3)
Similarly, check the y against the intervals in array([2. , 3.75, 5.5 , 7.25, 9. ]). The intervals here are (2.0-3.75), (3.75-5.5), (5.5-7.25), (7.25-9.0)
For example, point (1,3) fits into these intervals: (0.75-1.5, 2.0-3.75), point (0,6) -> (0-0.75, 5.5-7.25), point (3,2) -> (2.25-3.0, 2.0-3.75), point (1,9) -> (0.75-1.5, 7.25-9.0)
So the 2D array you see shows the frequency/density, that is, the number of points that fall into each possible pair of intervals. In total you have 4x4 = 16 possible pairs of intervals. For example, for the pair (0-0.75, 5.5-7.25) you have 2 points -> (0,6), (0,6). As shown on the graph of the 2D histogram, the (0-0.75, 5.5-7.25) cell has a darker shade, so it has a higher density (2) than the other occupied cells (1).
It was pretty confusing to me at first too, but I hope this helps
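The by-hand procedure above can be sketched in code as well. This is only an illustration, not how the library works internally: `bin_index` is a helper made up for this example, the edges are copied from the output in this thread, and every value is assumed to lie inside the edge range.

```python
import numpy as np

points = [(1, 3), (0, 6), (0, 6), (3, 2), (1, 9)]
x_edges = np.array([0., 0.75, 1.5, 2.25, 3.])
y_edges = np.array([2., 3.75, 5.5, 7.25, 9.])

def bin_index(value, edges):
    """Hypothetical helper: 0-based bin of `value`, assuming it lies within the edges."""
    # Closed-open bins: a value equal to an inner edge belongs to the bin on its right.
    i = int(np.searchsorted(edges, value, side="right")) - 1
    # The last bin is closed on both sides, so the right-most edge folds into it.
    return min(i, len(edges) - 2)

counts = np.zeros((len(x_edges) - 1, len(y_edges) - 1), dtype=int)
for px, py in points:
    row = bin_index(px, x_edges)  # x picks the row
    col = bin_index(py, y_edges)  # y picks the column
    counts[row, col] += 1
    print((px, py), "-> row", row + 1, "column", col + 1)

print(counts)
```

Each printed line shows which cell a point lands in, and the final array should match the density array from the 2D histogram.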