Learning Resources for Social Science Students: Introduction to Quantitative Methods


what is statistics? it’s about us

Social science researchers use qualitative and quantitative methods to approximate the truth and/or answer their research question(s), often questions about populations.

Statistics is a quantitative method. Statisticians have built tools, tests, and measures to manage uncertainty; gauge precision; decide how much confidence to place in the data; account for variability in a dataset; describe how the data are distributed; and estimate possible margins of error.

I created this post to serve as a resource for students in introductory quantitative courses in the social sciences. I searched the internet and vetted resources that provide a solid conceptual understanding without getting bogged down in the mathematical details.

All photos have been linked to their original owners. I will continue editing and updating this post with new resources.

Think you're good at guessing stats? Guess again. Whether we consider ourselves math people or not, our ability to understand and work with numbers is terribly limited, says data visualization expert Alan Smith. In this delightful talk, Smith explores the mismatch between what we know and what we think we know.
 

statistics at an introductory level for social sciences

This video walks students through the ins and outs of generalizations. This is important to understand no matter what type of data we use or what kind of analysis we conduct.

A chart on its own shows only what has been plotted; it says nothing about the context in which the data was collected. Good statisticians do not rely on charts alone to convey what the data means. They explain to their audience how to read the chart and what it shows. All types of visualizations are just tools that researchers use to help explain or display data.

All critical researchers who use statistics are very cautious about making any claims of truth or causal relationships, which is reflected in language that may be confusing to a non-statistician, e.g. “fail to reject the null hypothesis.”

Statistics is all about managing uncertainty and accounting for it. How statisticians evaluate the validity and reliability of their data/dataset contributes to how they talk about the limitations of their data, and thus the observations or meanings they infer from it.

It is important to remember that in social science research, quantitative data is decontextualized from the circumstances in which it was collected. For instance, when asked the same question, people may understand or perceive it differently and may not answer it the way we intended it to be understood. Furthermore, in statistics, social scientists often have to categorize things, e.g. gender, ethnicity, race, etc. By putting things into categories, we are effectively delineating what something is and what it is not; while we do our best to be inclusive and accurate in defining specific categories, our categories cannot be perfect because social life is messy. When we use secondary data, i.e. data that we have not collected ourselves, we must analyze the circumstances in which the data was collected: how questions were asked, how concepts were defined, and so on. No secondary dataset will be perfect in helping us answer our research question. However, some datasets are better than others, depending on our needs and the context of our research question(s).

No researcher's interpretation of data is free from bias. Bias can exist in many layers of the research process: from how data collection instruments are designed at the outset to how statisticians evaluate and make meaning of the data, especially in relation to research questions. Numbers can also give a false sense of precision. To reduce bias, one must first be aware of it.

Researchers who use statistics are trained to ask, when looking at data, whether a relationship or pattern they perceive really exists. We must always remember that correlation does not mean causation. This is imperative when applying statistics to social science research, as there is so much complexity that needs to be accounted for before we make any assertions of ‘truth.’


Introductory SPSS Resources

Recoding variables in SPSS is demonstrated in this SPSS Tutorial. See our full video series on Descriptive Statistics in SPSS here: http://bit.ly/1mjJt9g Learn how to recode string variables into numeric variables, how to recode into different variables, and how to recode into the same variables in SPSS.

data collection & sampling

A brief overview of statistics and the common vocabulary used in the field.

Today we're going to talk about good and bad surveys. From user feedback surveys, telephone polls, and those questionnaires at your doctor's office, surveys are everywhere, but with their ease to create and distribute, they're also susceptible to bias and error.


Types of Variables

A variable is something that varies; it can be measured, controlled, or manipulated.

Dependent Variable

“A dependent variable is what you measure in the experiment and what is affected during the experiment. The dependent variable responds to the independent variable.

It is called dependent because it "depends" on the independent variable. In a scientific experiment, you cannot have a dependent variable without an independent variable.” - https://labwrite.ncsu.edu/po/dependentvar.htm

Independent Variable

“An independent variable is the variable you have control over, what you can choose and manipulate. It is usually what you think will affect the dependent variable. In some cases, you may not be able to manipulate the independent variable.

It may be something that is already there and is fixed, something you would like to evaluate with respect to how it affects something else, the dependent variable like color, kind, time.” - https://labwrite.ncsu.edu/po/independentvar.htm

Dichotomous

“A dichotomous variable is a variable that contains precisely two distinct values.” - https://www.spss-tutorials.com/what-is-a-dichotomous-variable

Continuous

“Continuous Variables would (literally) take forever to count. In fact, you would get to “forever” and never finish counting them.” - https://www.statisticshowto.datasciencecentral.com/discrete-vs-continuous-variables/


measurement levels

Levels of measurement can be split into two groups: qualitative and quantitative data. They are very intuitive, so don't worry. Qualitative data can be nominal or ordinal. Nominal variables are like the categories we talked about just now - Mercedes, BMW or Audi, or like the four seasons - winter, spring, summer and autumn.

univariate // bivariate // multivariate data

Univariate data

This type of data consists of only one variable. The analysis of univariate data is thus the simplest form of analysis since the information deals with only one quantity that changes.

Bivariate data

This type of data involves two different variables. The analysis of this type of data deals with causes and relationships and the analysis is done to find out the relationship between the two variables.

Multivariate data

When the data involves three or more variables, it is categorized under multivariate.

- https://www.geeksforgeeks.org/univariate-bivariate-and-multivariate-data-and-its-analysis/

Let's go on a journey through univariate analysis and learn about descriptive statistics in research!
Let's learn about Chi-square, t-test, and ANOVA!

What is a distribution?

When collecting data to make observations about the world it usually just isn't possible to collect ALL THE DATA. So instead of asking every single person about student loan debt, for instance, we take a sample of the population, and then use the shape of our sample to make inferences about the true underlying distribution of our data.
In statistics, when we use the term distribution, we usually mean a probability distribution. Good examples are the Normal distribution, the Binomial distribution, and the Uniform distribution. A distribution is a function that shows the possible values for a variable and how often they occur. Think about a die.
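
To make the die example concrete, here is a minimal Python sketch. It assumes a fair six-sided die (the text only says to think about a die, so fairness is an assumption), writes out its probability distribution, and compares it with the shape of a simulated sample. The seed and the number of rolls are arbitrary choices for illustration.

```python
import random
from collections import Counter
from fractions import Fraction

# Theoretical distribution of a six-sided die, ASSUMED to be fair:
# each face from 1 to 6 has probability 1/6.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Simulate a sample of rolls; the seed and sample size are arbitrary.
rng = random.Random(42)
rolls = [rng.randint(1, 6) for _ in range(6000)]

# Relative frequency of each face in the simulated sample.
counts = Counter(rolls)
sample_proportions = {face: counts[face] / len(rolls) for face in range(1, 7)}

# Print face, theoretical probability, and observed proportion side by side.
for face in range(1, 7):
    print(face, round(float(die_distribution[face]), 3),
          round(sample_proportions[face], 3))
```

The observed proportions will not match 1/6 exactly, which is the point: a sample only approximates the underlying distribution.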

mean, median, mode, range

Understand and learn how to calculate the Mode, Median, Mean, Range, and Standard Deviation

Today we're going to talk about measures of central tendency - those are the numbers that tend to hang out in the middle of our data: the mean, the median, and mode.
This video will introduce you to the three measures of central tendency: mean, median and mode. Even if you are familiar with these terms, please stick around, as we will explore their upsides and shortfalls. The first measure we will study is the mean, also known as the simple average.
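
As a small companion to these videos, the sketch below computes the three measures with Python's built-in statistics module. The income figures are made up for illustration, and the last value is deliberately extreme so that the mean and the median disagree.

```python
import statistics

# HYPOTHETICAL monthly incomes (in dollars) for seven respondents;
# the last value is a deliberately extreme outlier.
incomes = [2100, 2400, 2400, 2600, 2900, 3100, 15000]

mean_income = statistics.mean(incomes)      # sum of values / number of values
median_income = statistics.median(incomes)  # middle value once sorted
mode_income = statistics.mode(incomes)      # most frequent value

print("mean:", round(mean_income, 2))
print("median:", median_income)
print("mode:", mode_income)
```

Here a single very high income pulls the mean well above the median, which is one of the shortfalls of the mean that the video mentions.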

Shuyi Chiou's animation explains the implications of the Central Limit Theorem.


the normal distribution

You have surely seen a normal distribution before as it is the most common one. The statistical term for it is Gaussian distribution, but many people call it the Bell Curve as it is shaped like a bell. It is symmetrical and its mean, median and mode are equal.
Today is the day we finally talk about the normal distribution! The normal distribution is incredibly important in statistics because distributions of means are normally distributed even if populations aren't.
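
The claim about distributions of means can be explored with a rough simulation. The sketch below assumes a right-skewed exponential population and a sample size of 50; both are arbitrary choices, not taken from the videos. In a symmetric distribution the mean and median sit close together, so the gap between them is used here as a crude check of symmetry.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# A strongly right-skewed "population": exponential with mean 1 (an assumption).
population = [rng.expovariate(1.0) for _ in range(100_000)]

def sample_mean(data, n):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.fmean(rng.sample(data, n))

# Distribution of means from 2,000 samples of 50 observations each.
means = [sample_mean(population, 50) for _ in range(2000)]

pop_mean = statistics.fmean(population)
pop_median = statistics.median(population)
means_mean = statistics.fmean(means)
means_median = statistics.median(means)

print("population:   mean", round(pop_mean, 3), "median", round(pop_median, 3))
print("sample means: mean", round(means_mean, 3), "median", round(means_median, 3))
```

In the population the mean and median are far apart, while for the sample means they nearly coincide. The sample means are only approximately normal, and the approximation improves as the sample size grows.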

Learn about the importance of density curves and their properties. Both of these concepts will be explained in this video.

Learn about the basics of what people mean when they are talking about percentiles.


measures of spread

Today, we're looking at measures of spread, or dispersion, which we use to understand how well medians and means represent the data, and how reliable our conclusions are. They can help understand test scores, income inequality, spot stock bubbles, and plan gambling junkets. They're pretty useful, and now you're going to know how to calculate them!
See how outliers can affect measures of spread (range and standard deviation) and measures of centre (mode, median and mean).
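
The effect of an outlier on the range and standard deviation can be sketched in a few lines of Python. The test scores below are hypothetical, and the helper name describe_spread is introduced only for this illustration.

```python
import statistics

scores = [62, 65, 68, 70, 71, 73, 75]  # HYPOTHETICAL test scores
scores_with_outlier = scores + [100]   # same scores plus one extreme value

def describe_spread(values):
    """Return the range and the sample standard deviation of the values."""
    value_range = max(values) - min(values)
    std_dev = statistics.stdev(values)  # sample SD (divides by n - 1)
    return value_range, std_dev

range_before, sd_before = describe_spread(scores)
range_after, sd_after = describe_spread(scores_with_outlier)

print("without outlier: range", range_before, "SD", round(sd_before, 2))
print("with outlier:    range", range_after, "SD", round(sd_after, 2))
```

One extra score is enough to widen both measures considerably, which is why it helps to look for outliers before interpreting spread.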

z-scores // confidence intervals

Today we're going to talk about how we compare things that aren't exactly the same - or aren't measured in the same way. For example, you might want to know whether a 1200 on the SAT is better than a 25 on the ACT.
Today we're going to talk about confidence intervals. Confidence intervals allow us to quantify our uncertainty, by allowing us to define a range of values for our predictions and assigning a likelihood that something falls within that range.
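
Both ideas can be sketched with Python's standard library. The SAT and ACT means and standard deviations below are placeholders chosen for illustration, not official figures, and the study-hours sample is hypothetical. The interval uses a normal approximation; for a sample this small, a t-based interval would usually be preferred and would be somewhat wider.

```python
import math
import statistics
from statistics import NormalDist

def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / std_dev

# PLACEHOLDER population figures, assumed for illustration only.
z_sat = z_score(1200, mean=1000, std_dev=200)
z_act = z_score(25, mean=21, std_dev=5)

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a sample mean."""
    n = len(sample)
    centre = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return centre - critical * std_error, centre + critical * std_error

# HYPOTHETICAL sample: weekly study hours reported by 20 students.
hours = [10, 12, 9, 15, 11, 14, 13, 8, 12, 10,
         11, 13, 9, 16, 12, 10, 14, 11, 13, 12]
low, high = mean_confidence_interval(hours)

print("z for SAT 1200:", z_sat, "| z for ACT 25:", z_act)
print("95% CI for mean study hours:", round(low, 2), "to", round(high, 2))
```

Under these assumed figures the SAT score sits further above its mean than the ACT score does, so the z-scores put the two tests on a common scale.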

skewness

The most commonly used tool to measure asymmetry is skewness. Skewness indicates whether the observations in a data set are concentrated on one side. Why is skewness important? Skewness tells us a lot about where the data is situated. The mean, median and mode should be used together to get a good understanding of the dataset.

Learn about symmetry and skewness with respect to histograms, boxplots, and stemplots.
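
Skewness can also be computed directly. The sketch below uses the simple moment-based formula (third central moment divided by the second central moment raised to the power 1.5); statistical packages often apply a small-sample correction, so their output may differ slightly. Both datasets are hypothetical.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5, using central moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12, 20]  # HYPOTHETICAL, long right tail
symmetric = [1, 2, 3, 4, 5, 6, 7]                   # HYPOTHETICAL, symmetric

for name, data in [("right-skewed", right_skewed), ("symmetric", symmetric)]:
    print(name,
          "skewness:", round(skewness(data), 2),
          "mean:", round(statistics.fmean(data), 2),
          "median:", statistics.median(data))
```

A positive value indicates a longer right tail, and in that case the mean is pulled above the median, which is why the measures of centre are best read together.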


null hypothesis // type-i, type-ii errors

A null hypothesis is a precise statement about a population that we try to reject with sample data.

In general, we can have two types of errors - type I error and type II error. Sounds a bit boring, but this will be a fun lecture, I promise! First we will define the problems, and then we will see some interesting examples.
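
One way to get a feel for the two error types is to simulate many studies. The sketch below assumes a two-sided z-test with a known population standard deviation, 30 observations per study, and a true effect of 0.3 when the null hypothesis is false. All of these settings are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean

rng = random.Random(1)  # fixed seed so the sketch is repeatable
ALPHA = 0.05            # significance level
N = 30                  # observations per simulated study
SIGMA = 1.0             # population SD, ASSUMED known for a z-test
STUDIES = 4000          # number of simulated studies per scenario

def rejects_null(true_mean):
    """Simulate one study and test H0: mean = 0 with a two-sided z-test."""
    sample = [rng.gauss(true_mean, SIGMA) for _ in range(N)]
    z = fmean(sample) / (SIGMA / N ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < ALPHA

# H0 is true (the mean really is 0): every rejection is a type I error.
type_i_rate = sum(rejects_null(0.0) for _ in range(STUDIES)) / STUDIES

# H0 is false (the mean is really 0.3): every non-rejection is a type II error.
type_ii_rate = sum(not rejects_null(0.3) for _ in range(STUDIES)) / STUDIES

print("type I error rate:", round(type_i_rate, 3))
print("type II error rate:", round(type_ii_rate, 3))
```

The type I rate lands close to the chosen significance level, while the type II rate depends on the size of the true effect and the sample size.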

p-value // z-score // alpha

Today we're going to begin our three-part unit on p-values. In this episode we'll talk about Null Hypothesis Significance Testing (or NHST) which is a framework for comparing two sets of information.
Last week we introduced p-values as a way to set a predetermined cutoff when testing if something seems unusual enough to reject our null hypothesis - that they are the same.
We're going to finish up our discussion of p-values by taking a closer look at how they can get it wrong, and what we can do to minimize those errors.

A visual tutorial on p values, critical values, z scores and alpha.
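
The relationship between these quantities can be sketched with the standard normal distribution from Python's statistics module. The observed z-score of 2.3 is a hypothetical value, and the function names are introduced only for this illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def two_sided_p_value(z):
    """Probability of a z-score at least this extreme if H0 is true."""
    return 2 * (1 - standard_normal.cdf(abs(z)))

def critical_value(alpha):
    """The z-score that cuts off alpha / 2 in each tail."""
    return standard_normal.inv_cdf(1 - alpha / 2)

alpha = 0.05
z_observed = 2.3  # HYPOTHETICAL test statistic

p = two_sided_p_value(z_observed)
z_crit = critical_value(alpha)
reject = p < alpha

print("p-value:", round(p, 4))
print("critical value:", round(z_crit, 3))
print("reject the null hypothesis:", reject)
```

Comparing the p-value with alpha and comparing the observed z-score with the critical value are two views of the same decision, so they always agree.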


misleading // bad data

View full lesson: http://ed.ted.com/lessons/how-statistics-can-be-misleading-mark-liddell Statistics are persuasive. So much so that people, organizations, and whole countries base some of their most important decisions on organized data. But any set of statistics might have something lurking inside it that can turn the results completely upside down. Mark Liddell investigates Simpson's paradox.
We've talked a lot in this series about how often you see data and statistics in the news and on social media - which is ALL THE TIME! But how do you know who and what you can trust? Today, we're going to talk about how we, as consumers, can spot flawed studies, sensationalized articles, and just plain poor reporting.
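
Simpson's paradox, the subject of the first video above, is easier to believe once you see the arithmetic. The sketch below uses entirely made-up pass counts for two hypothetical tutoring programs, A and B, split by students' prior preparation.

```python
from fractions import Fraction

# HYPOTHETICAL counts: (passed, enrolled) for two tutoring programs,
# split by students' prior preparation.
results = {
    "A": {"strong": (18, 20), "weak": (30, 100)},
    "B": {"strong": (80, 100), "weak": (4, 20)},
}

def rate(passed, enrolled):
    """Pass rate as an exact fraction."""
    return Fraction(passed, enrolled)

def overall_rate(groups):
    """Pass rate after pooling all groups of one program together."""
    passed = sum(p for p, _ in groups.values())
    enrolled = sum(n for _, n in groups.values())
    return rate(passed, enrolled)

for program, groups in results.items():
    by_group = {name: float(rate(*counts)) for name, counts in groups.items()}
    print(program, by_group, "overall:", float(overall_rate(groups)))
```

Program A has the higher pass rate within each preparation group, yet program B looks better once the groups are pooled, because most of B's students started out well prepared. The grouping variable turns the result upside down.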

more resources

There are a lot of statistics resources out there, so I suggest looking at the different ways people have discussed or approached a particular concept or measure, because someone else's explanation may click better for you than how it's presented in the course.

Great online general statistics resources