Showing posts with label EDA. Show all posts
Showing posts with label EDA. Show all posts

Tuesday, February 27, 2018

Stemgraphic v.0.5.x: stem-and-leaf EDA and visualization for numbers, categoricals and text


 Stemgraphic open source


In 2016 at PyDataCarolinas, I open-sourced my stem-and-leaf toolkit for exploratory data analysis and visualization. Later, in October 2016 I had posted the link to the video.



Stemgraphic.alpha


With the 0.5.x releases, I've introduced the categorical and text support. In the next few weeks, I'll be introducing some of the features, particularly those found in the new stemgraphic.alpha module of the stemgraphic package, such as back-to-back plots and stem-and-leaf heatmaps:




But if you want to get started, check out stemgraphic.org, and the github repo (especially the notebooks).

Github Repo

https://github.com/fdion/stemgraphic


Francois Dion
@f_dion

Thursday, October 20, 2016

Stemgraphic, a new visualization tool

PyData Carolinas 2016

At PyData Carolinas 2016 I presented the talk Stemgraphic: A Stem-and-Leaf Plot for the Age of Big Data.

Intro

The stem-and-leaf plot is one of the most powerful tools not found in a data scientist or statistician’s toolbox. If we go back in time thirty some years we find the exact opposite. What happened to the stem-and-leaf plot? Finding the answer led me to design and implement an improved graphical version of the stem-and-leaf plot, as a python package. As a companion to the talk, a printed research paper was provided to the audience (a PDF is now available through artchiv.es)

The talk




Thanks to the organizers of PyData Carolinas, videos of all the talks and tutorials have been posted on youtube. In just 30 minutes, this is a great way to learn more about stemgraphic and the history of the stem-and-leaf plot for EDA work. This updated version does include the animated intro sequence, but unfortunately the sound was recorded from the microphone, and not the mixer. You can see the intro sequence in higher audio and video quality on the main page of the website below.

Stemgraphic.org

I've created a web site for stemgraphic, as I'll be posting some tutorials and demo some of the more advanced features, particularly as to how stemgraphic can be used in a data science pipeline, as a data wrangling tool, as an intermediary to big data on HDFS, as a visual validation for building models and as a superior distribution plot, particularly when faced with non uniform distributions or distributions showing a high degree of skewness (long tails).

Github Repo

https://github.com/fdion/stemgraphic


Francois Dion
@f_dion