Violin plot inconsistency between list and numpy #12178
Closed as not planned
Labels
Documentation, Good first issue, status: closed as inactive, status: inactive
Bug report
When the same data is passed to violinplot as a list of lists and as a NumPy array, the results are not the same.
Code for reproduction
Actual outcome
Expected outcome
Both should be the same, shouldn't they?
Actually, the docs do specify, in terms I did not understand at first, that it will
"Make a violin plot for each column of dataset or each vector in sequence dataset. "
It is clearer in the hist docs, which end with the note
"Note that the ndarray form is transposed relative to the list form."
The difference between the two outcomes, I think, comes from how one thinks about the data: vectors in a matrix or lists in a list (which correspond to the columns of an array or the rows of an array, respectively).
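For illustration, here is a minimal sketch of the difference described above. It is not the original reproduction code; the sample data, the variable names, and the choice of the Agg backend are assumptions made for this example. It counts the violin bodies returned by violinplot for a list of vectors, for the equivalent 2D array, and for the transposed array.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Illustrative data (an assumption, not from the issue): 3 vectors of 20 samples.
rng = np.random.default_rng(0)
rows = [rng.normal(size=20).tolist() for _ in range(3)]
arr = np.array(rows)  # shape (3, 20)

fig, (ax_list, ax_array, ax_t) = plt.subplots(1, 3)

# List of vectors: one violin per inner list.
n_list = len(ax_list.violinplot(rows)["bodies"])
# 2D array: one violin per column.
n_array = len(ax_array.violinplot(arr)["bodies"])
# Transposed array: columns are the original vectors again.
n_transposed = len(ax_t.violinplot(arr.T)["bodies"])

print(n_list, n_array, n_transposed)
plt.close(fig)
```

With this data, the list input should give 3 violins and the array input 20, while passing `arr.T` matches the list form.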
I discussed this with some colleagues who work with Matlab, and they do think of the vector as the base unit of a 2D matrix. For me, the base unit is a row of a 2D array.
In the plot function, when plotting [[1,2,3], [4,5,6]], we get 3 curves, so I think the matrix/vector way of thinking is predominant in matplotlib.
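The observation about plot can be checked with a short sketch like the following. The variable names and the Agg backend are assumptions for this example. It counts the Line2D objects returned for the nested list and looks at the y-values of the first one.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# A nested list is treated as a 2D array: each column becomes one curve.
lines = ax.plot([[1, 2, 3], [4, 5, 6]])
n_lines = len(lines)

# The first curve holds the first column, i.e. the values 1 and 4.
first_curve = [int(v) for v in lines[0].get_ydata()]

print(n_lines, first_curve)
plt.close(fig)
```

So for plot, even a list of lists is read column-wise, unlike the list form accepted by violinplot and hist.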
So should we change the behavior of violinplot and hist for lists so that it matches the behavior for arrays?
After writing this, I believe the best solution is simply to state this difference more clearly in the violinplot docs, without changing the code. I can do this, but I would like some feedback from different points of view first.
Best,
RP