Conducting Statistical and Data Analysis

Whether we are established researchers or just starting out as Ph.D. candidates, we are all looking for ways to work smarter and more efficiently.

There are always new apps coming out. I recommend that you look for quality over quantity and, more than that, go with what works for you.  Today I want to talk to you about statistical and data analysis tools.  If you are doing research, chances are you are collecting data and need to analyze it.  I am speaking from the perspective of an engineer.  Those of you in other disciplines may have already found software that works for you, such as SPSS, R, or Origin.  All of these are great tools, and if they work for you, by all means keep using them.  This post is more for those of us who are just starting out and don’t really know where to begin, especially since some of those packages have more bells and whistles than you might need.  When I perform data analysis, I like using Python, primarily through Jupyter Notebooks, and Microsoft® Excel with a third-party add-in.

Python is an open-source programming language with supporting statistical packages and visualization tools, such as Plotly and Matplotlib, that allow you to perform data analysis, data visualization, and statistical analyses.  These supporting packages are also open source and easy to install.  Anaconda is a good distribution of open-source packages that includes Jupyter Notebooks, pairs well with IDEs such as PyCharm, and can be used on Windows, Linux, or macOS.  Additionally, Python is one of the most commonly used programming languages, which means that there is a large amount of help available on the Internet.  So as you are starting out, searching YouTube videos and Stack Overflow can help you conduct and optimize your data analysis, learn coding techniques, and troubleshoot when you get error messages.  Remember, Google is your friend.  I found Python to be incredibly helpful in efficiently performing my data analysis and statistical analyses when I was doing my Ph.D., and I have continued to use it in my work.  I think if you have a great deal of data, such as lots of CSV or ASCII files to analyze, it is worth the time to figure out how to analyze it using Python.  I found that Microsoft® Excel couldn’t handle the large amount of data I had, and importing and inspecting large datasets in Excel was cumbersome and invited error.
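To give a feel for what analyzing lots of CSV files can look like, here is a minimal sketch that uses only Python's standard library. The function name summarize_csv_folder, the folder name, and the column name are all made up for illustration, and it assumes each file has a header row and one numeric column of interest. It is one possible starting point, not the only way to do it.

```python
import csv
import statistics
from pathlib import Path


def summarize_csv_folder(folder, column):
    """Summarize one numeric column in every CSV file found in `folder`.

    Returns a dict mapping each file name to its count, mean, and
    sample standard deviation. `folder` and `column` are placeholders
    for your own data layout.
    """
    results = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            # Skip rows where the column is missing or blank.
            values = [float(row[column]) for row in reader if row.get(column)]
        if len(values) < 2:
            # The sample standard deviation needs at least two points.
            continue
        results[path.name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
        }
    return results
```

Calling something like summarize_csv_folder("data", "force_N") would then process every file in a (hypothetical) data folder in one pass, which is exactly the kind of repetitive work that becomes tedious and error-prone by hand in a spreadsheet.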

For those of you who are uncomfortable with coding, don’t have the time to learn it for your current project, or are working on a small dataset where Excel is more appropriate, there is a free Microsoft® Excel add-in called Real-Statistics that you can load and apply to your data directly inside Excel.  It has a full suite of statistical analysis tools.  I have used this package for years, and it is really quite helpful in expanding the functionality of Excel.  The Real-Statistics website also has an amazing set of educational information that explains in detail how the various statistical functions work and when and how to use them within the add-in.

Whatever you choose to use, I encourage you to invest time in finding ways to automate and streamline your data analysis.  It will save you time, save your sanity, and provide you with ready-to-publish information.