SPSS Users, It’s Time to Switch to Python

Nicoletta Tancred
Published in Geek Culture
5 min read · Jun 24, 2021

When I started my Ph.D., SPSS seemed like the only way to do data analysis, and I accepted all the faults that came with it. I didn’t understand statistics that well, and I was trying to keep up and do my best. I had heard of R, but the thought of learning a programming language that only benefits data analysis didn’t appeal to me. I wished there was a better option in my field but struggled to find anything. That is, until I met someone who was using Python. Their data analysis skills were far beyond mine: they used Python to create neural networks, hospital dashboards, and much more. I asked them if there was a way to do data analysis as I knew it in SPSS, something simple like an independent t-test. They showed me it was as simple as this:

import pandas as pd
from scipy import stats

# Load the data, then compare two columns (the column names are placeholders)
df = pd.read_csv('your_csv_name.csv')
stats.ttest_ind(df['column_a'], df['column_b'])

This outputs an independent-samples t-test: the t-statistic and the p-value for the two columns chosen. The simplicity and ease of use astounded me, and since then I haven’t turned back. For those of you who are familiar with Python, this is very much old news. However, as someone only used to SPSS, this changed my perspective on data analysis.
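If you don’t have a CSV file to hand, here is a minimal sketch of the same test that you can run as-is. The column names and scores are made up purely for illustration, and it assumes pandas and SciPy are installed.

```python
import pandas as pd
from scipy import stats

# Made-up scores for two independent groups (illustrative numbers only)
df = pd.DataFrame({
    "group_a": [23, 25, 28, 30, 32, 35, 36, 38],
    "group_b": [18, 20, 22, 24, 25, 27, 29, 30],
})

# Independent-samples t-test comparing the two columns
result = stats.ttest_ind(df["group_a"], df["group_b"])
t_stat, p_value = result.statistic, result.pvalue

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

By default `ttest_ind` assumes the two groups have equal variances. If that assumption is doubtful for your data, passing `equal_var=False` runs Welch’s t-test instead.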

SPSS was first introduced in the late 1960s, and it was, and still is, a great tool. It has its place, as it offers a whole range of ways to validate your datasets. I also found it personally valuable: it helped me understand how to run a Principal Component Analysis to determine factors in survey results.

While SPSS will always have a place in my data and statistics journey, since switching to Python I’ve found a freedom in my data analysis that I did not think possible. It’s not scary, it’s not too hard, and it is worth it. Python is one of the most widely used languages in data science and can be used in a range of fields. So, what makes Python a better option, particularly when coming from SPSS?

1. It’s Free

While I’m fortunate enough that my university provides SPSS for free, without it I would have to pay more than $100 a month. That’s not including a subscription to sites like Laerd (an online guide to running analyses in SPSS) or books on how to use SPSS. Python is free, and all the packages I use are free.

2. Large Online Community and Resources

I found that because Python is used so much, there is no end to the community help you can find. The resources and packages provided by people are a godsend. With SPSS, information is scarce and usually requires payment. I also found that as I started expanding my data analysis in SPSS, I needed more tools and more resources. Python, by comparison, is much more flexible, because you can design the analysis that you need and get help doing it.

3. Data Visualisation

What I found most infuriating about SPSS was how inflexible it is when it comes to how your data looks. You do have options for data output, but when it comes to displaying your dataset cleanly and cohesively, SPSS output is just not appealing. Using Python to import a package that makes your dataset look nicer is as easy as:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="dark")

# Simulate data from a bivariate Gaussian
n = 10000
mean = [0, 0]
cov = [(2, .4), (.4, .2)]
rng = np.random.RandomState(0)
x, y = rng.multivariate_normal(mean, cov, n).T

# Draw a combo histogram and scatterplot with density contours
f, ax = plt.subplots(figsize=(6, 6))
sns.scatterplot(x=x, y=y, s=5, color=".15")
sns.histplot(x=x, y=y, bins=50, pthresh=.1, cmap="mako")
sns.kdeplot(x=x, y=y, levels=5, color="w", linewidths=1)
Image by seaborn on seaborn

When importing packages like matplotlib and seaborn, your data can have an array of styles, colours and themes. You can even make your own style and use it throughout your research. I’ve even seen someone use the colours of different Pokemon types for bar graphs. SPSS just doesn’t have these options.
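To give a rough idea of what a reusable personal style could look like, here is a small sketch using only matplotlib. The palette name, the hex codes, and the `plot_counts` helper are all my own illustrative choices, not official colours or part of any library.

```python
import matplotlib.pyplot as plt

# Hypothetical palette: these hex codes are illustrative, not official colours
TYPE_COLOURS = {"Fire": "#EE8130", "Water": "#6390F0", "Grass": "#7AC74C"}


def plot_counts(counts, palette, path):
    """Draw a bar chart with one fixed colour per category and save it as a PNG."""
    labels = list(counts)
    colours = [palette[label] for label in labels]

    fig, ax = plt.subplots(figsize=(5, 4))
    ax.bar(labels, [counts[label] for label in labels], color=colours)
    ax.set_ylabel("Count")

    fig.savefig(path, dpi=150)
    plt.close(fig)
    return colours
```

Calling `plot_counts({"Fire": 12, "Water": 18, "Grass": 9}, TYPE_COLOURS, "type_counts.png")` would then give every chart in a project the same colours for the same categories.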

Image by seaborn on seaborn

4. Data Output

As my learning and understanding of Python has increased, so has my understanding of how to build functions. Functions aren’t anything new in coding, but as someone completely new to coding, I love them. I can determine which part of a dataset is analysed and in what way, decide what I need to know to validate the data, have it run through different parts of the function, and produce the data visualisation I was just discussing. Let’s say I want to run a one-way ANOVA in Python. I have a function that gives me the descriptive statistics of what I’m comparing, plus the F statistic and p-value; if the p-value is significant, it then goes into a post hoc analysis using Tukey’s test. The function outputs all of this into my coding environment, with the reason why it stopped at the first p-value or went on to do a post hoc analysis, exports all the tables to Excel, and saves the data visualisation as a PNG file.

Then, I can run it again because it’s a function. If my colleagues expect me to report the data in a certain way I can rewrite the output to exactly what is needed and export it in any way I want. All done with just a few lines of code in Python.
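A stripped-down sketch of a function like the one described above might look something like this. It is not my exact function: the name `one_way_anova`, its parameters, and the 0.05 threshold are assumptions for illustration, and it relies on `scipy.stats.tukey_hsd`, which needs SciPy 1.8 or newer.

```python
import pandas as pd
from scipy import stats


def one_way_anova(df, value_col, group_col, alpha=0.05):
    """One-way ANOVA, followed by Tukey's HSD only if the ANOVA is significant."""
    # Descriptive statistics for each group being compared
    descriptives = df.groupby(group_col)[value_col].agg(["count", "mean", "std"])
    names = list(descriptives.index)
    samples = [df.loc[df[group_col] == name, value_col].to_numpy() for name in names]

    # Omnibus test across all groups
    f_stat, p_value = stats.f_oneway(*samples)
    report = {"descriptives": descriptives, "f": f_stat, "p": p_value, "posthoc": None}

    if p_value < alpha:
        report["reason"] = f"p < {alpha}, so Tukey's HSD post hoc test was run"
        tukey = stats.tukey_hsd(*samples)
        # Pairwise p-values, labelled with the group names
        report["posthoc"] = pd.DataFrame(tukey.pvalue, index=names, columns=names)
    else:
        report["reason"] = f"p >= {alpha}, so no post hoc test was run"
    return report


# Made-up data for illustration: three teaching methods, five scores each
scores = pd.DataFrame({
    "method": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
    "score": [4, 5, 6, 5, 4, 7, 8, 9, 8, 7, 10, 11, 12, 11, 10],
})
report = one_way_anova(scores, "score", "method")
print(report["reason"])
```

From here, the tables in `report` could be written out with pandas’ `to_excel` and a figure saved with matplotlib’s `savefig`, which is how the exporting step could be added.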

To summarize, Python is worth the learning curve. I feel that part of the reason many people are scared to start learning Python is that it is a programming language. When you see a block of code, it’s terrifying, particularly if you are coming from a field in which coding is not necessary. If you work in something like psychology, education, or government, you might think this is a skill you don’t need. I would argue that you do. All you need to learn Python is a computer, access to the internet, and a basic understanding of this coding principle: If -> This: Then -> That. If you understand this concept, then learning Python is that easy.
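In Python, that “If -> This: Then -> That” idea is written with `if` and `else`. Here is a tiny sketch; the function name `interpret_p_value` and the 0.05 cut-off are just assumptions for the example.

```python
def interpret_p_value(p_value, alpha=0.05):
    """Return a plain-language reading of a p-value."""
    # If the p-value is below the threshold: then call the result significant
    if p_value < alpha:
        return "significant"
    # Otherwise: call it not significant
    else:
        return "not significant"


print(interpret_p_value(0.03))
print(interpret_p_value(0.20))
```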

About the author: Nicoletta Tancred is a current PhD candidate in Australia, studying Human-Computer Interaction and Games Design, and the owner and editor of The Games Development Journal.