In my previous role as a Consumer Data Analyst, I often found myself teaching “a bit of SPSS” to people who had no previous exposure to it but needed some basic knowledge to support me and my team in our work.
In my current position my work is still very much SPSS based, but in an environment full of BI professionals who are very savvy in all sorts of database-related magic like SQL, SSIS and other acronyms I can’t even guess, yet still quite suspicious of SPSS. We are trying to close the knowledge gap by shedding some light on what SPSS is and how we use it.
The material in these blog posts can be used as additional material for classroom-based training, or it can help anyone who wants to do some self-learning on SPSS.
1st Class: What is SPSS?
Below is a quick introduction to what SPSS really is, along with some useful tips on how to get started with the software.
SPSS is a powerful statistical software package that enables quick data manipulation, data presentation and complicated statistical calculations on big sets of data. “SPSS” stands for “Statistical Package for the Social Sciences”. It was launched in 1968 and acquired by IBM in 2009. You can download a trial version (accessible for a couple of weeks) from IBM’s homepage: https://www-01.ibm.com/marketing/iwm/iwm/web/preLogin.do?source=SWG-STATS-DESKTOP_TRIAL
In my examples I will concentrate on standard Consumer Insight data retrieved from online surveys. In this case a “big set of data” means a few thousand rows (usually 2,000-40,000) and anywhere from several hundred to a few thousand columns (typically 500-5,000).
To really get started, this is what an SPSS file looks like when you open it:
Every row is a Respondent and every column is an answer that the respondent gave to a particular question (or part of a question) in the online survey.
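For readers who think more easily in code than in grids, here is a minimal Python sketch of that layout. It is not SPSS syntax and it does not read a real SPSS file; the column names (resp_id, q1_like, q2_halo and so on) and the numeric codes are made up purely for illustration.

```python
# Hypothetical survey data: all names and codes below are invented.
# Each row is one respondent; each column is one answer (or part of one).
columns = ["resp_id", "q1_like", "q2_halo", "q2_fallout", "q2_medal"]
rows = [
    [1001, 1, 1, 0, 1],
    [1002, 3, 0, 1, 0],
    [1003, 5, 0, 0, 0],
]


def answers_for(respondent_row):
    """Pair every column name with the code stored for that respondent."""
    return dict(zip(columns, respondent_row))


first = answers_for(rows[0])
print(first)
print(len(rows), "respondents x", len(columns), "columns")
```

Note how the multi-choice question (q2) takes up one 0/1 column per answer option, which is one reason why survey files end up with so many columns.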
You’ll be able to see more details of your data (more than numeric codes) by clicking on the “Value labels” icon in the toolbar at the top:
If you hover over a column header, it will show you the variable label:
Even more details can be accessed by clicking on “Variable View” in the bottom-left corner of the same window:
Hints & tips: you can achieve the same by double-clicking on the variable name (the top, grey cell of each column), and swap back by double-clicking on the first grey cell of any row in Variable View:
The most helpful items in “Variable View” are the following columns:
- Label: in a clean, well-structured file it tells you the exact text of the question and/or shows the answer option for multi-choice questions (e.g. Q: Which of these games have you played during the last month? A1: Halo, A2: Fallout, A3: Medal of Honor, etc. The respondent can select more than one answer option.)
- Values: it shows all answer options for single-choice questions (e.g. Q: How much do you like playing this game? A1: Love it, A2: Sort of like it, A3: It’s OK, A4: I don’t like it, A5: I hate it. The respondent can only select one answer option.)
- Measure: some statistical functions can only be performed on certain types of variables (e.g. you can only run “Means” on SCALE variables). For example, you could phrase the above question as “Please tell us on a 10-point scale how much you like playing this game”; in this case you can run standard statistics, e.g. means, on the variable. The same wouldn’t be accurate on the original 5-point verbal scale, whether it is stored as Nominal or, more precisely, Ordinal, as the distance between the individual values is subjective.
Because of this, it is good to know where to check the correct “Variable level” (another name for “Measure”, commonly used in SPSS syntax).
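To make the three columns a bit more tangible, here is a small Python sketch of the same idea. It is only an illustration under my own assumptions: the `variables` dictionary is a hypothetical stand-in for Variable View, not an SPSS API, and the guard in `safe_mean` is a convention of this sketch rather than a description of how SPSS itself enforces measurement levels.

```python
from statistics import mean

# Hypothetical stand-in for two rows of Variable View (not an SPSS API).
variables = {
    "q1_like": {
        "label": "How much do you like playing this game?",
        "values": {1: "Love it", 2: "Sort of like it", 3: "It's OK",
                   4: "I don't like it", 5: "I hate it"},
        "measure": "ordinal",
    },
    "q1_like_10pt": {
        "label": "On a 10-point scale, how much do you like playing this game?",
        "values": {},
        "measure": "scale",
    },
}


def with_labels(name, codes):
    """Translate stored numeric codes into their value labels."""
    labels = variables[name]["values"]
    return [labels.get(code, code) for code in codes]


def safe_mean(name, codes):
    """Return the mean only when the variable's measure is 'scale'."""
    measure = variables[name]["measure"]
    if measure != "scale":
        raise ValueError(f"{name} is {measure}; a mean is not meaningful")
    return mean(codes)


print(with_labels("q1_like", [1, 3, 5]))
print(safe_mean("q1_like_10pt", [8, 6, 10]))
```

Calling `safe_mean("q1_like", ...)` raises an error on purpose: the codes 1 to 5 are just labels in a fixed order, so averaging them would suggest a precision the question never had.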
With all this information you are now in a good position to start using the data in your SPSS file.