Learning Resources for Social Science Students: Introduction to Quantitative Methods


what is statistics? it’s about us

Social science researchers use qualitative and quantitative methods to approximate the truth and answer their research question(s), which are often questions about populations.

Statistics is a quantitative method. Statisticians have built tools, tests, and measures for managing uncertainty: gauging precision, deciding how much confidence to place in the data, describing variability and how the data is distributed, and estimating possible margins of error.

I created this post to serve as a resource for students in introductory quantitative courses in the social sciences. I searched the internet for resources that provide a solid conceptual understanding without getting bogged down in the mathematical details.

All photos are linked to the sites where they were originally found. I will be continuously editing and updating this post with resources.

Think you're good at guessing stats? Guess again. Whether we consider ourselves math people or not, our ability to understand and work with numbers is terribly limited, says data visualization expert Alan Smith. In this delightful talk, Smith explores the mismatch between what we know and what we think we know.
 

fyi: statistics at an introductory level:

This video walks students through the ins and outs of generalizations. This is important to understand no matter what type of data we have or what kind of analysis we are doing.

Statisticians have created their own language and concepts to describe data and the types of patterns seen in data. A chart on its own only shows what was plotted; it says nothing about the context of the data. This is why good statisticians do not rely on charts alone to convey what the data means: they explain to their audience how to read the chart and what it means. All types of visualizations are just tools that researchers use to help explain or display data.

All researchers who use statistics are very cautious about making any claims of truth or causal relationships. This may be reflected in language that seems redundant or confusing, e.g. “fail to reject the null hypothesis”.

Statistics is all about managing uncertainty and accounting for it. How statisticians evaluate the validity and reliability of their data/dataset contributes to how they talk about the limitations of their data, and thus the observations or meanings they infer from it.

It’s important to remember that in social science research, quantitative data is often decontextualized from the circumstances in which it was collected. For instance, when asked the same question, people may understand or perceive it differently and may not answer it the way we intended it to be understood. Furthermore, in statistics, social scientists often have to categorize things, e.g. gender, ethnicity, and race. By putting things into categories, we are effectively delineating what something is and what it isn’t; while we do our best to be inclusive and accurate in how we define categories, our categories cannot be perfect because social life is messy. This is why, when we use secondary data, i.e. data that we have not collected ourselves, we must examine the circumstances in which the data was collected: how questions were asked, how concepts were defined, and so on. No secondary dataset will be perfect in helping us answer our research question. However, some datasets are better than others, depending on our needs and the context of our research question(s).

No researcher, and no interpretation of data, is free from bias. Bias can exist in many layers of the research process, from how data collection instruments were first created to how statisticians evaluate the data and make meaning of it, especially in relation to research questions. Numbers can also give a false sense of precision. To reduce bias, one must first be aware of it.

Researchers who use statistics are trained to question whether a relationship or pattern they perceive in how data is shown actually exists. We must always remember that correlation does not mean causation. This is especially important when applying statistics to social science research, as there is so much complexity that needs to be accounted for before we make any assertions of ‘truth’.


data collection & sampling

A brief overview of statistics and the common vocabulary used in the field.

Today we're going to talk about good and bad surveys. From user feedback surveys and telephone polls to those questionnaires at your doctor's office, surveys are everywhere, but because they are so easy to create and distribute, they're also susceptible to bias and error.


measurement levels

Levels of measurement can be split into two groups: qualitative and quantitative data. They are very intuitive, so don't worry. Qualitative data can be nominal or ordinal. Nominal variables are like the categories we talked about just now - Mercedes, BMW or Audi, or like the four seasons - winter, spring, summer and autumn.

What is a distribution?

When collecting data to make observations about the world it usually just isn't possible to collect ALL THE DATA. So instead of asking every single person about student loan debt, for instance, we take a sample of the population, and then use the shape of our samples to make inferences about the true underlying distribution of our data.
In statistics, when we use the term distribution, we usually mean a probability distribution. Good examples are the Normal distribution, the Binomial distribution, and the Uniform distribution. A distribution is a function that shows the possible values for a variable and how often they occur. Think about a die.
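To make the die example concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: a fair six-sided die, 6000 simulated rolls, and a fixed random seed chosen so the run is repeatable. It compares the theoretical distribution (every face has probability 1/6) with the proportions seen in the simulated rolls.

```python
from collections import Counter
from fractions import Fraction
import random

# Theoretical distribution of a fair six-sided die:
# each possible value (1 to 6) occurs with probability 1/6.
die_distribution = {face: Fraction(1, 6) for face in range(1, 7)}

# Empirical distribution from simulated rolls.
# The seed and the number of rolls are arbitrary choices for this sketch.
rng = random.Random(0)
rolls = [rng.randint(1, 6) for _ in range(6000)]
counts = Counter(rolls)
empirical = {face: counts[face] / len(rolls) for face in range(1, 7)}

for face in range(1, 7):
    print(face, float(die_distribution[face]), round(empirical[face], 3))
```

The simulated proportions will not match 1/6 exactly, which is the point: a sample only approximates the underlying distribution.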

mean, median, mode, range

Understand and learn how to calculate the Mode, Median, Mean, Range, and Standard Deviation

Today we're going to talk about measures of central tendency - those are the numbers that tend to hang out in the middle of our data: the mean, the median, and mode.
This video will introduce you to the three measures of central tendency: mean, median and mode. Even if you are familiar with these terms, please stick around, as we will explore their upsides and shortfalls. The first measure we will study is the mean, also known as the simple average.
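As a quick companion to these videos, the three measures can be computed with Python's built-in statistics module. The scores below are made-up values for illustration; the last one is deliberately extreme to show how the mean gets pulled toward an outlier while the median and mode stay put.

```python
import statistics

# Hypothetical data; 25 is an intentional outlier.
scores = [2, 3, 3, 4, 5, 7, 25]

mean_score = statistics.mean(scores)      # sum of the values divided by how many there are
median_score = statistics.median(scores)  # middle value once the data is sorted
mode_score = statistics.mode(scores)      # most frequently occurring value

print("mean:", mean_score)
print("median:", median_score)
print("mode:", mode_score)
```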

Shuyi Chiou's animation explains the implications of the Central Limit Theorem.


the normal distribution

You have surely seen a normal distribution before as it is the most common one. The statistical term for it is Gaussian distribution, but many people call it the Bell Curve as it is shaped like a bell. It is symmetrical and its mean, median and mode are equal.
Today is the day we finally talk about the normal distribution! The normal distribution is incredibly important in statistics because distributions of sample means are approximately normally distributed, given large enough samples, even if populations aren't.
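The claim about means can be checked with a small simulation. This is a sketch under assumptions of my choosing: the population is exponential with mean 1 (strongly right-skewed, so clearly not normal), each sample has 50 observations, and 2000 samples are drawn with a fixed seed. If the Central Limit Theorem holds up, the sample means should centre near 1, have a standard deviation near 1/sqrt(50), and roughly 68% of them should fall within one standard deviation of their mean, as with a normal distribution.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable


def draw_sample(size):
    """Draw `size` values from a right-skewed (exponential, mean 1) population."""
    return [rng.expovariate(1.0) for _ in range(size)]


sample_size = 50
sample_means = [statistics.mean(draw_sample(sample_size)) for _ in range(2000)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1 / math.sqrt(sample_size)

# Share of sample means within one standard deviation of their centre.
share_within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / len(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("sd of sample means:", round(sd_of_means, 3), "expected about", round(expected_sd, 3))
print("share within one sd:", round(share_within_one_sd, 3))
```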

Learn about the importance of density curves and their properties.

Learn about the basics of what people mean when they are talking about percentiles.
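Here is a short Python sketch of the idea, using made-up exam scores. Note that textbooks and software use several slightly different percentile conventions; this sketch assumes the "share of values at or below a score" definition for percentile rank, and uses the default method of `statistics.quantiles` for the quartile cut points.

```python
import statistics

# Hypothetical exam scores, already sorted for readability.
exam_scores = [55, 61, 64, 68, 70, 72, 75, 78, 81, 85, 88, 93]


def percentile_rank(values, score):
    """Percentage of values that are at or below `score`."""
    return 100 * sum(v <= score for v in values) / len(values)


# Cut points for the 25th, 50th and 75th percentiles.
quartiles = statistics.quantiles(exam_scores, n=4)

print("percentile rank of 81:", percentile_rank(exam_scores, 81))
print("quartile cut points:", quartiles)
```

The middle cut point is the 50th percentile, which is the same thing as the median.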


measures of spread

Today, we're looking at measures of spread, or dispersion, which we use to understand how well medians and means represent the data, and how reliable our conclusions are. They can help understand test scores, income inequality, spot stock bubbles, and plan gambling junkets. They're pretty useful, and now you're going to know how to calculate them!
See how outliers can affect measures of spread (range and standard deviation) and measures of centre (mode, median and mean).
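The effect of an outlier is easy to see in a few lines of Python. The incomes below are invented for illustration (in thousands), and `describe` is just a helper name used in this sketch. Adding a single extreme value stretches the range, the standard deviation, and the mean, while the median barely moves.

```python
import statistics


def describe(values):
    """Return a few measures of spread and centre for a list of numbers."""
    return {
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical incomes in thousands.
incomes = [30, 32, 35, 36, 38, 40, 41]
with_outlier = incomes + [250]  # same data plus one extreme value

print("without outlier:", describe(incomes))
print("with outlier:   ", describe(with_outlier))
```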

z-scores // confidence intervals

Today we're going to talk about how we compare things that aren't exactly the same - or aren't measured in the same way. For example, if you wanted to know if a 1200 on the SAT is better than the 25 on the ACT.
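The SAT versus ACT comparison boils down to converting each score to a z-score: how many standard deviations it sits above or below its own test's mean. In the sketch below, the means and standard deviations are placeholder numbers I made up so the arithmetic is easy to follow; they are not official test statistics.

```python
def z_score(value, mean, sd):
    """Number of standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


# Assumed (not official) test statistics, for illustration only.
sat_z = z_score(1200, mean=1000, sd=200)
act_z = z_score(25, mean=21, sd=5)

print("SAT z-score:", sat_z)
print("ACT z-score:", act_z)
print("SAT score is relatively higher:", sat_z > act_z)
```

Under these assumed numbers, the 1200 sits further above its mean than the 25 does. With different means and standard deviations the answer could flip, which is why the comparison needs z-scores in the first place.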
Today we're going to talk about confidence intervals. Confidence intervals allow us to quantify our uncertainty, by allowing us to define a range of values for our predictions and assigning a likelihood that something falls within that range.

univariate // bivariate analysis

Let's go on a journey through univariate analysis and learn about descriptive statistics in research!
Let's learn about Chi-square, t-test, and ANOVA!

skewness

The most commonly used tool to measure asymmetry is skewness. Skewness indicates whether the observations in a data set are concentrated on one side. Why is skewness important? Skewness tells us a lot about where the data is situated. The mean, median and mode should be used together to get a good understanding of the dataset.
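Skewness can be computed by hand as the average of the cubed z-scores. That is one common definition (the population, moment-based one); statistical software often applies a small-sample correction, so its output may differ slightly. The two datasets below are made up: one has a long right tail, the other is perfectly symmetric.

```python
import statistics


def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical datasets.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]  # one large value creates a right tail
symmetric = [1, 2, 3, 4, 5, 6, 7]

print("right-skewed:", round(skewness(right_skewed), 3))
print("symmetric:", round(skewness(symmetric), 3))
print("mean vs median (right-skewed):",
      statistics.mean(right_skewed), statistics.median(right_skewed))
```

In the right-skewed data the mean is pulled above the median, which is the pattern the mean, median and mode reveal when used together.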

Learn about symmetry and skewness with respect to histograms, boxplots, and stemplots.


null hypothesis // type-i, type-ii errors

A null hypothesis is a precise statement about a population that we try to reject with sample data.

In general, we can have two types of errors - type I error and type II error. Sounds a bit boring, but this will be a fun lecture, I promise! First we will define the problems, and then we will see some interesting examples.
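The two error types can be made tangible with a simulation. This is a sketch under assumptions chosen for illustration: a two-sided z-test with a known standard deviation of 1, samples of 30, alpha of 0.05, and a "true effect" of 0.3 for the second scenario. When the null hypothesis is actually true, every rejection is a type I error; when it is actually false, every failure to reject is a type II error.

```python
import math
import random
from statistics import NormalDist, fmean

rng = random.Random(1)  # fixed seed so the run is repeatable
alpha = 0.05
critical_z = NormalDist().inv_cdf(1 - alpha / 2)


def rejects_null(true_mean, null_mean=0.0, sd=1.0, n=30):
    """Draw one sample and report whether a two-sided z-test rejects the null."""
    sample = [rng.gauss(true_mean, sd) for _ in range(n)]
    z = (fmean(sample) - null_mean) / (sd / math.sqrt(n))
    return abs(z) > critical_z


trials = 4000

# Null is true (true mean equals the null mean): rejections are type I errors.
type_i_rate = sum(rejects_null(true_mean=0.0) for _ in range(trials)) / trials

# Null is false (true mean is 0.3): failures to reject are type II errors.
type_ii_rate = sum(not rejects_null(true_mean=0.3) for _ in range(trials)) / trials

print("type I error rate:", round(type_i_rate, 3))
print("type II error rate:", round(type_ii_rate, 3))
```

The type I rate should land close to alpha, since alpha is exactly the error rate we agree to tolerate when the null is true. The type II rate depends on the effect size and sample size assumed here.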

p-value // z-score // alpha

Today we're going to begin our three-part unit on p-values. In this episode we'll talk about Null Hypothesis Significance Testing (or NHST) which is a framework for comparing two sets of information.
Last week we introduced p-values as a way to set a predetermined cutoff when testing if something seems unusual enough to reject our null hypothesis - that they are the same.
We're going to finish up our discussion of p-values by taking a closer look at how they can get it wrong, and what we can do to minimize those errors.

A visual tutorial on p values, critical values, z scores and alpha.
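To see how p-values, critical values, z-scores and alpha fit together, here is a minimal one-sample z-test in Python. Everything in it is an assumption for illustration: the null mean of 100, the known population standard deviation of 15, the sample of 100 people with a mean of 103, and the helper name `two_sided_p_value`.

```python
import math
from statistics import NormalDist


def two_sided_p_value(sample_mean, null_mean, population_sd, n):
    """Return the z-score and two-sided p-value for a one-sample z-test."""
    z = (sample_mean - null_mean) / (population_sd / math.sqrt(n))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


alpha = 0.05  # cutoff chosen before looking at the data
critical_z = NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical study numbers.
z, p = two_sided_p_value(sample_mean=103, null_mean=100, population_sd=15, n=100)
reject_null = p < alpha

print("z-score:", round(z, 3))
print("critical value:", round(critical_z, 3))
print("p-value:", round(p, 4))
print("reject the null hypothesis:", reject_null)
```

Comparing the p-value to alpha and comparing the z-score to the critical value are two views of the same decision: they always agree.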


misleading // bad data

View full lesson: http://ed.ted.com/lessons/how-statistics-can-be-misleading-mark-liddell Statistics are persuasive. So much so that people, organizations, and whole countries base some of their most important decisions on organized data. But any set of statistics might have something lurking inside it that can turn the results completely upside down. Mark Liddell investigates Simpson's paradox.
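Simpson's paradox is easier to believe once you see the arithmetic. The counts below are invented for illustration (they are not from the video or from any real study): two treatments, A and B, each tried on "easy" and "hard" cases. A has the higher success rate within each group, yet B looks better when the groups are lumped together, because B was mostly given the easy cases.

```python
# (successes, total) for each treatment and case type; made-up counts.
results = {
    "A": {"easy": (90, 100), "hard": (300, 1000)},
    "B": {"easy": (800, 1000), "hard": (20, 100)},
}


def rate(successes, total):
    """Success rate as a proportion."""
    return successes / total


def overall_rate(groups):
    """Success rate after pooling all case types together."""
    successes = sum(s for s, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return successes / total


for treatment, groups in results.items():
    for case_type, (successes, total) in groups.items():
        print(treatment, case_type, round(rate(successes, total), 3))
    print(treatment, "overall", round(overall_rate(groups), 3))
```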
We've talked a lot in this series about how often you see data and statistics in the news and on social media - which is ALL THE TIME! But how do you know who and what you can trust? Today, we're going to talk about how we, as consumers, can spot flawed studies, sensationalized articles, and just plain poor reporting.

more resources

There are a lot of statistics resources out there, so I suggest reading or looking at the different ways people have discussed or approached a particular concept or measure, because someone else's explanation may click better for you than how it's presented in the course.

Great online general statistics resources

Some supplementary SPSS resources. Note that these resources are probably based on an older version of SPSS, so the interface may look a bit different: