-
(speaker)
What is hypothesis testing?
-
Hypothesis testing is used to determine
-
whether there is enough evidence
in a sample of data
-
to infer that a certain condition
is true for the entire population.
-
Therefore, it is a method
to test an assumption or theory
-
about a parameter
of a population based on a sample.
-
What is the population
and what is the sample?
-
The population
is the whole group we are interested in.
-
If you want to study the average height
-
of all adults in the United States,
-
then the population
would be all adults in the United States.
-
The sample
is the smaller group we actually study
-
chosen from the population.
-
For example, 150 adults
were selected from the United States,
-
and now we want to use the sample
-
to make a statement about the population.
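-
To make the difference between population and sample concrete, here is a minimal sketch in Python.
The population is simulated, and the numbers (a mean height of 170 cm, a standard deviation of 10 cm, 100,000 adults) are made-up assumptions for illustration, not real data.
Only the sample size of 150 comes from the example above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 100,000 adult heights in cm (made-up numbers).
population = [random.gauss(170, 10) for _ in range(100_000)]

# The sample: 150 adults drawn at random, without replacement.
sample = random.sample(population, 150)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.1f} cm")
print(f"sample mean:     {sample_mean:.1f} cm")
```

In practice we only ever see the sample mean. The population mean is known here only because the population was simulated.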
-
And here are the six steps for doing that.
-
Number one:
hypothesis.
-
First, we need a statement, a hypothesis,
-
that we want to test.
-
For example, you want to know
-
whether a drug will have a positive effect
-
on blood pressure
in people with high blood pressure.
-
But what's next?
-
In our hypothesis, we stated
-
that we would like to study people
with high blood pressure.
-
So our population is all people
with high blood pressure
-
in, for example, the US.
-
Obviously, we cannot collect data
from the whole population,
-
so we take a sample from the population.
-
Now we use this sample to make a statement
-
about the population.
-
But how do we do that?
-
For this, we need a hypothesis test.
-
Hypothesis testing is a method
-
for testing a claim
about a parameter in a population
-
using data measured in a sample.
-
Great, that's exactly what we need.
-
There are many different hypothesis tests,
-
and at the end of this video,
-
I will give you a guide
on how to find the right test.
-
And, of course, you can find videos
-
about many more hypothesis tests
on our channel.
-
But how does a hypothesis test work?
-
When we conduct a hypothesis test,
-
we start with a research hypothesis,
-
also called the alternative hypothesis.
-
This is the hypothesis
we are trying to find evidence for.
-
In our case, the research hypothesis
-
is that the drug has an effect
on blood pressure,
-
but we cannot test this hypothesis
directly
-
with a classical hypothesis test,
-
so we test the opposite hypothesis
-
that the drug has no effect
on blood pressure.
-
But what does that mean?
-
First, we assume that the drug
has no effect in the population.
-
We therefore assume that, in general,
-
people who take the drug
and people who don't take the drug
-
have the same blood pressure on average.
-
If we now take a random sample,
-
and it turns out that the drug
has a large effect in the sample,
-
then we can ask how likely
it is to draw such a sample
-
or one that deviates even more
-
if the drug actually has no effect.
-
So in reality, on average,
there is no difference in the population.
-
If this probability is very low,
we can ask ourselves,
-
maybe the drug has an effect
in the population,
-
and we may have enough evidence
to reject the null hypothesis
-
that the drug has no effect.
-
And it is this probability
that is called the "p-value".
-
Let's summarize this
in three simple steps:
-
number one,
the null hypothesis states
-
that there is no difference
in the population;
-
number two,
the hypothesis test calculates
-
how much the sample deviates
from the null hypothesis;
-
number three,
the p-value indicates the probability
-
of getting a sample
that deviates as much as our sample,
-
or one that deviates even more
than our sample,
-
assuming the null hypothesis is true.
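-
The three steps above can be sketched in Python with a permutation test.
This is one possible way to get a p-value, chosen here because it follows the definition closely; it is not necessarily the test you would use in practice.
The blood pressure readings and the function name `permutation_p_value` are made up for illustration.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    # Step 2: how much does our sample deviate from "no difference"?
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Step 1: under the null hypothesis the group labels do not matter,
        # so we can shuffle them and see what differences arise by chance.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Step 3: share of shuffles that deviate as much as our sample, or more.
    return at_least_as_extreme / n_permutations

# Made-up systolic blood pressure readings in mmHg, for illustration only.
drug = [138, 142, 135, 130, 141, 133, 137, 129, 136, 134]
placebo = [148, 151, 145, 150, 143, 152, 147, 149, 144, 146]

p_value = permutation_p_value(drug, placebo)
print(f"p-value: {p_value:.4f}")
```

Because the p-value is estimated from a finite number of shuffles, it is an approximation, and more permutations give a finer estimate.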
-
But at what point
is the p-value small enough
-
for us to reject the null hypothesis?
-
This brings us to the next point,
-
statistical significance.
-
If the p-value is less than
a predetermined threshold,
-
the result
is considered statistically significant.
-
This means that a result like ours would be unlikely
-
if the null hypothesis were true,
-
and that we have enough evidence
-
to reject the null hypothesis.
-
This threshold is often 0.05.
-
Therefore, a small p-value suggests
-
that the observed data or sample
-
is inconsistent with the null hypothesis.
-
This leads us
to reject the null hypothesis
-
in favor of the alternative hypothesis.
-
A large p-value suggests
that the observed data
-
is consistent with the null hypothesis,
-
and we will not reject it.
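-
The decision rule just described can be written down as a small Python sketch.
The function name `decide` is hypothetical, and the default threshold of 0.05 is simply the common choice mentioned above, not a fixed rule.

```python
def decide(p_value, alpha=0.05):
    """Return the test decision for a p-value and a significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    # Reject only if the p-value is strictly below the threshold.
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

for p in (0.003, 0.049, 0.05, 0.30):
    print(f"p = {p:<5} -> {decide(p)}")
```

Note that the threshold has to be chosen before looking at the data, which is why it is passed in as a parameter rather than tuned afterwards.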
-
But note, there is always a risk
of making an error.
-
A small p-value does not prove
-
that the alternative hypothesis is true.
-
It is only saying
that it is unlikely to get such a result
-
or a more extreme one
when the null hypothesis is true.
-
And again, if the null hypothesis is true,
-
there is no difference in the population.
-
And the other way around,
-
a large p-value does not prove
-
that the null hypothesis is true.
-
It is only saying
that it is likely to get such a result
-
or a more extreme one
when the null hypothesis is true.
-
So there are two types of errors,
-
which are called Type I and Type II errors.
-
Let's start with the Type I error.
-
In hypothesis testing,
a Type I error occurs
-
when a true null hypothesis is rejected.
-
So in reality,
the null hypothesis is true,
-
but we make the decision
to reject the null hypothesis.
-
In our example, it means
that the drug actually had no effect.
-
So in reality, there is no difference
in blood pressure.
-
Whether the drug is taken or not,
-
the blood pressure
remains the same in both cases.
-
But our sample
happened to be so far off the true value
-
that we mistakenly thought
the drug was working.
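-
A small simulation can show how often this kind of mistake happens.
In this sketch the drug truly has no effect, and we repeat the study many times.
For simplicity it assumes a two-sample z-test with a known standard deviation; the blood pressure values (mean 150, standard deviation 10), the group size, and the function names are all made-up assumptions.

```python
import math
import random
from statistics import NormalDist, mean

def z_test_p_value(group_a, group_b, sigma):
    """Two-sided p-value of a two-sample z-test with known standard deviation."""
    standard_error = sigma * math.sqrt(1 / len(group_a) + 1 / len(group_b))
    z = (mean(group_a) - mean(group_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_effect, n_per_group=30, sigma=10.0, alpha=0.05,
                   n_trials=2_000, seed=1):
    """Share of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        placebo = [rng.gauss(150, sigma) for _ in range(n_per_group)]
        drug = [rng.gauss(150 - true_effect, sigma) for _ in range(n_per_group)]
        if z_test_p_value(drug, placebo, sigma) < alpha:
            rejections += 1
    return rejections / n_trials

# The drug has no effect, so every rejection here is a Type I error.
type_1_rate = rejection_rate(true_effect=0.0)
print(f"estimated Type I error rate: {type_1_rate:.3f}")
```

With a threshold of 0.05, the estimated rate should land close to 0.05, because the threshold is exactly the Type I error rate we are willing to accept.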
-
And a Type II error occurs
-
when a false null hypothesis
is not rejected.
-
So in reality,
the null hypothesis is false,
-
but we make the decision
not to reject the null hypothesis.
-
In our example,
this means the drug actually did work;
-
there is a difference between
-
those who have taken the drug
and those who have not,
-
but it was just a coincidence
-
that the sample taken
did not show much difference,
-
and we mistakenly thought
the drug was not working.
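-
How likely a Type II error is depends on the size of the true effect and on the sample size.
As a rough sketch, the following Python code computes that probability for a two-sided two-sample z-test with a known standard deviation.
The z-test, the effect of 5 mmHg, the standard deviation of 10, and the function name `type_2_error_rate` are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def type_2_error_rate(true_effect, n_per_group, sigma, alpha=0.05):
    """Probability of not rejecting the null hypothesis although an effect exists.

    Assumes a two-sided two-sample z-test with known standard deviation sigma
    and equal group sizes.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = sigma * math.sqrt(2 / n_per_group)
    shift = true_effect / standard_error
    # Power: probability that the test statistic falls beyond either critical value.
    power = std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)
    return 1 - power

for n in (10, 30, 100):
    beta = type_2_error_rate(true_effect=5.0, n_per_group=n, sigma=10.0)
    print(f"n = {n:>3} per group -> Type II error rate: {beta:.2f}")
```

The larger the sample, the smaller the chance that a real effect goes unnoticed.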
-
And now I'll show you how DATAtab
-
helps you to find
a suitable hypothesis test,
-
and, of course, calculates it
and interprets the results for you.
-
Let's go to datatab.net,
-
and copy your own data in here.
-
We will just use this example dataset.
-
After copying your data into the table,
-
the variables appear down here.
-
DATAtab automatically tries to determine
-
the correct level of measurement,
-
but you can also change it up here.
-
Now we just click on "Hypothesis Testing"
-
and select the variables we want to use
-
for the calculation
of the hypothesis test.
-
DATAtab
will then suggest a suitable test.
-
For example,
in this case, a chi-squared test,
-
or in that case, an analysis of variance.
-
Then you will see the hypotheses
and the results.
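-
To give an idea of what a chi-squared test computes, here is a minimal Python sketch for a 2x2 table.
It is not the implementation of the tool shown in the video; it is a plain Pearson chi-squared test of independence without continuity correction, and the function name `chi_squared_2x2` and the counts are made up.
It uses the fact that, with one degree of freedom, the p-value can be written with the complementary error function.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). One degree of freedom, no continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # For df = 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Made-up counts: rows = drug / placebo, columns = improved / not improved.
statistic, p_value = chi_squared_2x2([[30, 10], [10, 30]])
print(f"chi-squared = {statistic:.2f}, p-value = {p_value:.6f}")
```

The sketch assumes that every expected count is greater than zero; for very small counts an exact test is usually the safer choice.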
-
If you are not sure
how to interpret the results,
-
click on "Summary in words".
-
Further, you can check the assumptions
-
and decide whether you want to calculate
a parametric
-
or a non-parametric test.
-
You can find out the difference
-
between parametric and non-parametric
tests in my next video.
-
Thanks for watching
and I hope you enjoyed the video.