Skip to content

Adding Scatterplot Matrix to FigureFactory #417

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
May 12, 2016
Merged

Conversation

Kully
Copy link
Contributor

@Kully Kully commented Mar 4, 2016

@cldougl

This is the first PR for the Scatterplot Matrix in the FigureFactory class in tools.py. The function currently takes in only pandas dataframes, does sanity checks on the dataframe (more than one column, either strings or numbers within a column, etc.) and then builds a scatterplot matrix based on this input. There is an option for indexing by one of the columns which must be inputted by the user (parameter: index) and for either a scatterplot or histogram along the main diagonal (parameter: diagonal).

Things I am working on implementing:

  • I want to integrate a jitter parameter that adds random noise to the scatterplot so that densities can be easily seen in a plot. I already have a jitter parameter, but it only works in certain places.
  • I also am working to add to more options for diagonal plots, such as a 2d density and box plot.
  • Adding more parameters and options to the scatterplot (eg. showgrid) is also another thing I am working on expanding.
    dkjb stats
    heart rate against time

@theengineear
Copy link
Contributor

Hi @Kully, I've committed a bunch of stuff to this repo, I'll take a little weight off @cldougl 's shoulders and give this a quick review!

FWIW, for things like this, I'd be really great to get a little pic of the graphs this is making in the top comment.

def create_scatterplotmatrix(df, useindex = False, index = " ",
diagonal = "Scatter", symbol = 0, size = 6,
height = " ", width = " ", jitter = 0,
title = "Scatterplot Matrix"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We typically keep pretty lengthy docs for things like this

This way, when users run help, the information is right there.

@theengineear
Copy link
Contributor

@Kully thanks for submitting this! The logic flow is super clear. I do have some general concerns:

  • Tests! This 100% needs to be tested before we can pull it into the repo
  • Refact to reuse more of the code (DRY comments)
  • Some Python convention changes (general and specific to this repo)

Let me know when you want another round of review, happy to cruise through again. Thanks for the awesome work so far!

@Kully
Copy link
Contributor Author

Kully commented Mar 5, 2016

@theengineear Thank you so much for the feedback. I would definitely want some more feedback in another round of review.

@theengineear
Copy link
Contributor

@Kully please feel free to reach out with any questions Git, Python, otherwise!

@jackparmer
Copy link
Contributor

Looking good!

I'd love to see an optional theme kwarg that styles the subplots. A lot of folks like the seaborn and ggplot2 style, but it takes a lot of lines of layout code to achieve in plotly, especially with subplots.

Here are some nice examples.

image
via http://stackoverflow.com/questions/28197901/how-to-use-color-coding-for-intervals-for-scatter-plot-matrix-in-a-python-pandas

image
via https://plot.ly/ipython-notebooks/cufflinks/

image
via https://www.pindrop.com/visualizing-data-in-the-space-it-resides-in/

@theengineear
Copy link
Contributor

@Kully Awesome! I'm a little swamped until Tuesday, I probably won't be able to give this a good look until then. Feel free to give me a poke if I don't respond by Tuesday night though! I don't want this PR to loose steam!

@cldougl
Copy link
Member

cldougl commented Mar 21, 2016

@Kully awesome -- I'll take a look at this tonight as well!!

@theengineear
Copy link
Contributor

@Kully Taking a look now.

@theengineear
Copy link
Contributor

@Kully The tests are failing because you need to update the default schema. The short story is that we force new merges to keep master updated with our plot schema. You'll need to run:

make update_default_schema

in your terminal to get tests to pass again.

Let me know if you need any help with that!

diagonal = "Scatter", symbol = 0, size = 6,
height = " ", width = " ", jitter = 0,
title = "Scatterplot Matrix"):
def _scatterplot_no_index(matrix, headers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🐄 , any reason you didn't use the full 79 characters of line length here?

@cldougl
Copy link
Member

cldougl commented May 2, 2016

💃💃💃 when tests pass

@theengineear
Copy link
Contributor

Ditto: 👯 after tests!

}

self.maxDiff = None
self.assertEqual(test_scatter_plot_matrix.data,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kully looks like you left .data appended to test_scatter_plot_matrix

@Kully
Copy link
Contributor Author

Kully commented May 3, 2016

@cldougl Just removed the .data

@theengineear
Copy link
Contributor

@Kully and @cldougl let me know if you'd like any help debugging test failures. One quick tip to checkout would be the 2-->3 gotcha with .keys and .values for dicts. Things like that behave in slightly different ways from python 2 to python 3. http://blog.labix.org/2008/06/27/watch-out-for-listdictkeys-in-python-3

Have you been able to run the Python 3 test suite locally successfully?

@Kully
Copy link
Contributor Author

Kully commented May 3, 2016

@theengineear We tried 3.4 and it was working locally. But we haven't tried 3.3 yet.

@Kully
Copy link
Contributor Author

Kully commented May 11, 2016

@cldougl @theengineear
Very weird.

I changed the diag param from 'histogram' to both 'scatter' and 'box' (and the respective expected scatter plot matrices) and it worked! Will check out 'histogram' option again and see if it fails.

@Kully
Copy link
Contributor Author

Kully commented May 11, 2016

@cldougl @theengineear @jackparmer

Good and ready for a merge!

@theengineear
Copy link
Contributor

Not a huge deal, but in the future, it's helpful for reviewing when the commit history is clean:

image

I'm guessing those commits were related to failures only occurring on Circle?

Even if that's the case, you can rewrite git history (as long as no one is branching off your branch) with git rebase -i https://www.atlassian.com/git/tutorials/rewriting-history/

That's a little advanced git stuff, but it can come in really handy for things like this.

Again, just in the future 😸. Don't worry about it now.

@theengineear
Copy link
Contributor

💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃 💃

@Kully Kully merged commit 748db23 into master May 12, 2016
@Kully Kully deleted the Scatter_Plot_Matrix branch June 20, 2016 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants