Learning Resources for Social Science Students: Introduction to Quantitative Methods
what is statistics? it’s about us
Social science researchers use qualitative and quantitative methods to approximate the truth and/or answer their research question(s), which are often questions about populations.
Statistics is a quantitative method. Statisticians have built tools, tests, and measures to manage uncertainty and precision; to determine whether data can be trusted (that is, how much confidence to place in it); to capture variability in a dataset; to describe how data is distributed; and to estimate possible margins of error.
I created this post to serve as a resource for students in introductory quantitative courses in the social sciences. I searched the internet for resources that provide a solid conceptual understanding without getting too bogged down in the mathematical details.
All photos have been linked to their original owners. I will be continuously editing and updating this post with resources.
statistics at an introductory level for social sciences
A chart by itself only shows what has been plotted; it says nothing about the context of the data. Good statisticians do not rely on charts alone to convey what the data means. They explain to their audience how to read the chart and what it means. All types of visualizations are just tools that researchers use to help explain or display data.
Critical researchers who use statistics are very cautious about making any claims of truth or causal relationships. This caution is reflected in language that may be confusing to a non-statistician, e.g. “fail to reject the null hypothesis.”
Statistics is all about managing uncertainty and accounting for it. How statisticians evaluate the validity and reliability of their data/dataset contributes to how they talk about the limitations of their data, and thus the observations or meanings they infer from it.
It is important to remember that in social science research, quantitative data is decontextualized from the circumstances in which it was collected. For instance, when asked the same question, people may understand or perceive it differently and may not answer it the way we intended it to be understood. Furthermore, in statistics, social scientists often have to categorize things, e.g. gender, ethnicity, race, etc. By putting things into categories, we are effectively delineating what something is and what it is not. While we do our best to be inclusive and accurate in how we define specific categories, our categories cannot be perfect because social life is messy. When we use secondary data, i.e. data that we have not collected ourselves, we must analyze the circumstances of how the data was collected: how the questions were asked, how concepts were defined, etc. No secondary dataset will be perfect in helping us answer our research question. However, some datasets are better than others, depending on what our research question(s) require.
No researcher, and no interpretation of data, is free from bias. Bias can exist in many layers of the research process: from the get-go, in how data collection instruments are designed, through to how statisticians evaluate the data and make meaning of it, especially in relation to their research questions. Numbers can also give a false sense of precision. To reduce bias, one must first be aware of it.
Researchers who use statistics are trained to ask, when looking at data, whether a relationship or pattern they perceive really exists. We must always remember that correlation does not mean causation. This is imperative when applying statistics to social science research, as there is so much complexity that needs to be accounted for before we make any assertions of ‘truth.’
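To make "correlation does not mean causation" a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the variables (temperature, ice cream sales, sunburn cases), the numbers, and the helper function pearson_r are my own assumptions, not data or code from any of the linked resources. Two variables are simulated so that each depends on temperature but neither depends on the other. They still come out strongly correlated, and the correlation largely fades once we only compare days with nearly the same temperature.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical third variable (a confounder): daily temperature in degrees.
temperature = [rng.uniform(10, 35) for _ in range(2000)]

# Both outcomes are driven by temperature plus random noise.
# Neither one appears in the formula for the other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Correlation across all days: strong, even though there is no causal link.
r_all_days = pearson_r(ice_cream_sales, sunburn_cases)

# Roughly "hold temperature constant" by keeping only days in a narrow band.
similar_days = [i for i, t in enumerate(temperature) if 20 <= t < 21]
r_similar_days = pearson_r(
    [ice_cream_sales[i] for i in similar_days],
    [sunburn_cases[i] for i in similar_days],
)

print(f"All days: r = {r_all_days:.2f}")
print(f"Days between 20 and 21 degrees: r = {r_similar_days:.2f}")
```

Restricting to a narrow temperature band is only a rough way of accounting for a third variable. Real social science data rarely offers such a clean confounder, which is part of why researchers are so careful with causal language.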
Introductory SPSS Resources
Kent State University has detailed resources on working with data in SPSS.
Navigating the SPSS environment: libguides.library.kent.edu/SPSS/Environment
Recoding variables: libguides.library.kent.edu/SPSS/RecodeVariables
Cross tabs: libguides.library.kent.edu/SPSS/Crosstabs
Frequency tables: libguides.library.kent.edu/SPSS/FrequenciesCategorical
Empire State College also has great resources on how to look at variables and data in SPSS.
data collection & sampling
Types of Variables
A variable is something that varies; it can be measured, controlled, and manipulated.
“A dependent variable is what you measure in the experiment and what is affected during the experiment. The dependent variable responds to the independent variable.
It is called dependent because it "depends" on the independent variable. In a scientific experiment, you cannot have a dependent variable without an independent variable.” - https://labwrite.ncsu.edu/po/dependentvar.htm
“An independent variable is the variable you have control over, what you can choose and manipulate. It is usually what you think will affect the dependent variable. In some cases, you may not be able to manipulate the independent variable.
It may be something that is already there and is fixed, something you would like to evaluate with respect to how it affects something else, the dependent variable like color, kind, time.” - https://labwrite.ncsu.edu/po/independentvar.htm
Resources on identifying dependent and independent variables
Identify dependent and independent variables and apply them to contexts - self-quiz at the end
An exhaustive list of variable types
“A dichotomous variable is a variable that contains precisely two distinct values.” - https://www.spss-tutorials.com/what-is-a-dichotomous-variable
“Continuous Variables would (literally) take forever to count. In fact, you would get to “forever” and never finish counting them.” - https://www.statisticshowto.datasciencecentral.com/discrete-vs-continuous-variables/
Resources on understanding measurement levels:
univariate // bivariate // multivariate data
Univariate: This type of data consists of only one variable. The analysis of univariate data is thus the simplest form of analysis, since the information deals with only one quantity that changes.
Bivariate: This type of data involves two different variables. The analysis of this type of data deals with causes and relationships, and it is done to find out the relationship between the two variables.
Multivariate: When the data involves three or more variables, it is categorized as multivariate.
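As a rough illustration of the difference between the three, here is a small Python sketch. The survey responses, the variable names (age_group, employed, hours_online), and the helper group_means are all hypothetical and invented for this example. The same tiny dataset is summarized with one variable at a time, then two, then three.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical survey responses (made-up values, for illustration only).
respondents = [
    {"age_group": "18-29", "employed": True, "hours_online": 2.0},
    {"age_group": "30+", "employed": True, "hours_online": 3.0},
    {"age_group": "18-29", "employed": False, "hours_online": 5.0},
    {"age_group": "30+", "employed": False, "hours_online": 4.0},
    {"age_group": "18-29", "employed": True, "hours_online": 1.0},
    {"age_group": "30+", "employed": False, "hours_online": 6.0},
]


def group_means(rows, keys, value):
    """Mean of `value` for every combination of the grouping `keys`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row[value])
    return {group: mean(vals) for group, vals in groups.items()}


# Univariate: describe one variable on its own.
hours = [r["hours_online"] for r in respondents]
univariate = {
    "mean": mean(hours),
    "median": median(hours),
    "range": max(hours) - min(hours),
}

# Bivariate: look at one variable in relation to a second one.
bivariate = group_means(respondents, ["employed"], "hours_online")

# Multivariate: bring in a third variable as well.
multivariate = group_means(respondents, ["age_group", "employed"], "hours_online")

print(univariate)
print(bivariate)
print(multivariate)
```

Comparing group means like this only describes how the variables are associated in this made-up sample; it does not by itself show that one causes the other.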
What is a distribution?
mean, median, mode, range
the normal distribution
measures of spread
z-scores // confidence intervals
null hypothesis // type-i, type-ii errors
P-value // z-score // alpha
misleading // Bad data
There are a lot of statistics resources out there, so I suggest looking at the different ways people have discussed or approached a particular concept or measure, because someone else's explanation may click better for you than how it's presented in the course.
Great online general statistics resources
The DeSTRESS project: promoting statistical literacy by sharing, adapting, and creating resources to contextualise statistics for the social sciences