
What is Statistics? A Beginner's Guide to Statistics (Data Analytics)!

  • 0:00 - 0:03
    PROFESSOR: If you want to
    finally understand statistics,
  • 0:03 - 0:05
    this is the place to be.
  • 0:05 - 0:09
    After this video, you will
    know what statistics is,
  • 0:09 - 0:11
    what descriptive
    statistics is, and what
  • 0:11 - 0:13
    inferential statistics is.
  • 0:13 - 0:16
    So let's start with
    the first question.
  • 0:16 - 0:17
    What is statistics?
  • 0:17 - 0:21
    Statistics deals with
    the collection, analysis,
  • 0:21 - 0:23
    and presentation of data.
  • 0:23 - 0:24
    An example.
  • 0:24 - 0:28
    We would like to investigate
    whether gender has an influence
  • 0:28 - 0:30
    on the preferred newspaper.
  • 0:30 - 0:35
    Then gender and newspaper are
    our so-called variables that we
  • 0:35 - 0:36
    want to analyse.
  • 0:36 - 0:39
    In order to analyse whether
    gender has an influence
  • 0:39 - 0:44
    on the preferred newspaper,
    we first need to collect data.
  • 0:44 - 0:46
    To do this, we create
    a questionnaire
  • 0:46 - 0:50
    that asks about gender
    and preferred newspaper.
  • 0:50 - 0:54
    We will then send out the
    survey and wait two weeks.
  • 0:54 - 0:59
    Afterwards, we can display the
    received answers in a table.
  • 0:59 - 1:03
    In this table, we have one
    column for each variable,
  • 1:03 - 1:06
    one for gender and
    one for newspaper.
  • 1:06 - 1:09
    On the other hand, each
    row is the response
  • 1:09 - 1:11
    of one surveyed person.
  • 1:11 - 1:16
    The first respondent is male
    and stated New York Post,
  • 1:16 - 1:19
    the second is female
    and stated USA Today,
  • 1:19 - 1:21
    and so on and so forth.
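As a minimal sketch (Python is our choice here; the video itself uses no code), the survey table can be held as a list of rows, one dictionary per respondent, with one key per variable. Only the first two responses are named in the video, so the list stops there.

```python
# One dict per respondent (row); one key per variable (column).
# Only the two responses named in the video are included.
survey = [
    {"gender": "male", "newspaper": "New York Post"},
    {"gender": "female", "newspaper": "USA Today"},
]

# The variables we want to analyse are the column names.
variables = list(survey[0])

for number, row in enumerate(survey, start=1):
    print(number, row["gender"], row["newspaper"])
```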
  • 1:21 - 1:24
    Of course, the data does not
    have to be from a survey.
  • 1:24 - 1:28
    The data can also come from
    an experiment in which you,
  • 1:28 - 1:32
    for example, want to study the
    effect of two drugs on blood
  • 1:32 - 1:33
    pressure.
  • 1:33 - 1:34
    Now, the first step is done.
  • 1:34 - 1:39
    We have collected data and we
    can start analyzing the data.
  • 1:39 - 1:41
    But what do we actually
    want to analyse?
  • 1:41 - 1:44
    We did not survey the
    entire population,
  • 1:44 - 1:46
    but we took a sample.
  • 1:46 - 1:49
    Now the big question
    is, do we just
  • 1:49 - 1:51
    want to describe
    the sample data,
  • 1:51 - 1:53
    or do we want to
    make a statement
  • 1:53 - 1:55
    about the whole population?
  • 1:55 - 1:59
    If our aim is limited to the
    sample itself, i.e. we only
  • 1:59 - 2:01
    want to describe
    the collected data,
  • 2:01 - 2:04
    we will use
    descriptive statistics.
  • 2:04 - 2:08
    Descriptive statistics will
    provide a detailed summary
  • 2:08 - 2:09
    of the sample.
  • 2:09 - 2:11
    However, if we want
    to draw conclusions
  • 2:11 - 2:16
    about the population as a whole,
    inferential statistics are used.
  • 2:16 - 2:19
    This approach allows us
    to make educated guesses
  • 2:19 - 2:22
    about the population
    based on the sample data.
  • 2:22 - 2:26
    Let us take a closer
    look at both methods,
  • 2:26 - 2:28
    starting with
    descriptive statistics.
  • 2:28 - 2:31
    Why is descriptive
    statistics so important?
  • 2:31 - 2:34
    Let's say a company
    wants to know how
  • 2:34 - 2:36
    its employees travel to work.
  • 2:36 - 2:40
    So the company creates a
    survey to answer this question.
  • 2:40 - 2:42
    Once enough data
    has been collected,
  • 2:42 - 2:46
    this data can be analyzed
    using descriptive statistics.
  • 2:46 - 2:49
    But what is
    descriptive statistics?
  • 2:49 - 2:53
    Descriptive statistics aims
    to describe and summarize
  • 2:53 - 2:55
    a data set in a meaningful way.
  • 2:55 - 2:59
    But it is important to note
    that descriptive statistics only
  • 2:59 - 3:02
    describe the collected data
    without drawing conclusions
  • 3:02 - 3:05
    about a larger population.
  • 3:05 - 3:09
    Put simply, just because we know
    how some employees of a company
  • 3:09 - 3:13
    get to work, we cannot
    say how all employees
  • 3:13 - 3:15
    of the company get to work.
  • 3:15 - 3:17
    This is the task of
    inferential statistics,
  • 3:17 - 3:19
    which we will discuss later.
  • 3:19 - 3:22
    To describe a data set,
    we now
  • 3:22 - 3:25
    look at four
    key components:
  • 3:25 - 3:28
    measures of central tendency,
    measures of dispersion,
  • 3:28 - 3:30
    frequency tables and charts.
  • 3:30 - 3:34
    Let's start with the first one,
    measures of central tendency.
  • 3:34 - 3:37
    Measures of central
    tendency are, for example,
  • 3:37 - 3:40
    the mean, the
    median, and the mode.
  • 3:40 - 3:42
    Let's first have a
    look at the mean.
  • 3:42 - 3:47
    The arithmetic mean is the sum
    of all observations divided
  • 3:47 - 3:49
    by the number of observations.
  • 3:49 - 3:50
    An example.
  • 3:50 - 3:53
    Imagine we have the test
    scores of five students.
  • 3:53 - 3:57
    To find the mean score,
    we sum up all the scores
  • 3:57 - 3:59
    and divide by the
    number of scores.
  • 3:59 - 4:04
    The mean test score of these
    five students is therefore 86.6.
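The individual scores behind 86.6 are only shown on screen, so this sketch uses five hypothetical scores to show the same calculation: sum the observations, then divide by how many there are.

```python
# Hypothetical test scores for five students (not the video's data).
scores = [70, 80, 90, 85, 95]

# Arithmetic mean: sum of all observations divided by the number of observations.
mean_score = sum(scores) / len(scores)
print(mean_score)  # 84.0
```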
  • 4:04 - 4:06
    What about the median?
  • 4:06 - 4:11
    When the values in a data set
    are arranged in ascending order,
  • 4:11 - 4:13
    the median is the middle value.
  • 4:13 - 4:16
    If there is an odd
    number of data points,
  • 4:16 - 4:19
    the median is simply
    the middle value.
  • 4:19 - 4:22
    If there is an even
    number of data points,
  • 4:22 - 4:26
    the median is the average
    of the two middle values.
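A small sketch of the two cases just described (the function name `median` is our own; Python's standard library also provides `statistics.median`, which follows the same rule):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the two middle values

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```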
  • 4:26 - 4:28
    It is important to
    note that the median is
  • 4:28 - 4:32
    resistant to extreme
    values or outliers.
  • 4:32 - 4:34
    Let's look at this example.
  • 4:34 - 4:37
    No matter how tall
    the last person is,
  • 4:37 - 4:40
    the person in the middle remains
    the person in the middle.
  • 4:40 - 4:43
    So the median does not change.
  • 4:43 - 4:46
    But if we look at the mean,
    how tall the last person is
  • 4:46 - 4:49
    does have an effect on it.
  • 4:49 - 4:52
    The mean is therefore
    not robust to outliers.
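To make the height example concrete, here is a sketch with hypothetical heights in centimeters: making only the tallest person much taller moves the mean but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical heights in centimeters, sorted from shortest to tallest.
heights = [160, 165, 170, 175, 180]
print(mean(heights), median(heights))                  # 170 170

# Same people, but the last person is now much taller (an outlier).
heights_outlier = [160, 165, 170, 175, 230]
print(mean(heights_outlier), median(heights_outlier))  # 180 170
```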
  • 4:52 - 4:54
    Let's continue with the mode.
  • 4:54 - 4:57
    The mode refers to
    the value or values
  • 4:57 - 5:01
    that appear most frequently
    in a set of data.
  • 5:01 - 5:05
    For example, if 14 people
    travel to work by car,
  • 5:05 - 5:10
    six by bike, five walk, and
    five take public transport,
  • 5:10 - 5:15
    then car occurs most often
    and is therefore the mode.
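Using the counts from this example, a short sketch of finding the mode. It collects every category that reaches the highest count, because a data set can have more than one mode; for raw, uncounted data, `statistics.multimode` does the same job.

```python
# Counts from the example: 14 car, 6 bike, 5 walk, 5 public transport.
counts = {"car": 14, "bike": 6, "walk": 5, "public transport": 5}

# The mode is every category that reaches the highest count (ties are possible).
highest = max(counts.values())
modes = [category for category, count in counts.items() if count == highest]
print(modes)  # ['car']
```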
  • 5:15 - 5:16
    Great.
  • 5:16 - 5:19
    Let's continue with the
    measures of dispersion.
  • 5:19 - 5:22
    Measures of dispersion
    describe how spread out
  • 5:22 - 5:24
    the values in a data set are.
  • 5:24 - 5:27
    Measures of dispersion
    are, for example,
  • 5:27 - 5:30
    the variance and standard
    deviation, the range,
  • 5:30 - 5:32
    and the interquartile range.
  • 5:32 - 5:35
    Let's start with the
    standard deviation.
  • 5:35 - 5:38
    The standard deviation
    indicates the average distance
  • 5:38 - 5:41
    between each data
    point and the mean.
  • 5:41 - 5:43
    But what does that mean?
  • 5:43 - 5:46
    Each person has some
    deviation from the mean.
  • 5:46 - 5:49
    Now we want to know
    how much the people
  • 5:49 - 5:52
    deviate from the mean
    value on average.
  • 5:52 - 5:56
    In this example, the average
    deviation from the mean value
  • 5:56 - 5:58
    is 11.5 centimeters.
  • 5:58 - 6:01
    To calculate the
    standard deviation,
  • 6:01 - 6:03
    we can use this equation.
  • 6:03 - 6:08
    Sigma is the standard deviation,
    n is the number of persons,
  • 6:08 - 6:12
    x_i is the height of
    each person, and x bar
  • 6:12 - 6:15
    is the mean value
    of all persons.
  • 6:15 - 6:18
    But attention, there are two
    slightly different equations
  • 6:18 - 6:20
    for the standard deviation.
  • 6:20 - 6:24
    The difference is that one
    has 1 divided by n,
  • 6:24 - 6:28
    and the other has 1 divided
    by n minus 1.
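Written out with the symbols defined above, the two versions look like this. Calling the n minus 1 version s is a common textbook convention and an assumption here, since the video uses sigma for both.

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```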
  • 6:28 - 6:31
    To keep it simple,
    if our survey doesn't
  • 6:31 - 6:33
    cover the whole
    population, we always
  • 6:33 - 6:37
    use the equation with n minus 1 to estimate
    the standard deviation.
  • 6:37 - 6:41
    Likewise, if we have
    conducted a clinical study,
  • 6:41 - 6:44
    then we also use the n minus 1
    equation to estimate
  • 6:44 - 6:45
    the standard deviation.
  • 6:45 - 6:48
    But what is the difference
    between the standard deviation
  • 6:48 - 6:49
    and the variance?
  • 6:49 - 6:52
    As we now know, the
    standard deviation
  • 6:52 - 6:56
    is the quadratic mean of
    the distance from the mean.
  • 6:56 - 6:59
    The variance now is the
    squared standard deviation.
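A minimal sketch of both standard deviation formulas and the variance, on hypothetical heights (not the video's data). The helper `std_dev` is our own name; it is checked against Python's `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n minus 1).

```python
import math
from statistics import pstdev, stdev, pvariance, variance

def std_dev(values, sample=True):
    """Standard deviation: divide by n - 1 for a sample, by n for a whole population."""
    n = len(values)
    x_bar = sum(values) / n                          # mean of all values
    squared = sum((x - x_bar) ** 2 for x in values)  # sum of squared deviations from the mean
    return math.sqrt(squared / (n - 1 if sample else n))

# Hypothetical heights in centimeters.
heights = [160, 170, 175, 180, 190]

print(std_dev(heights, sample=False))   # 10.0  (1/n version, same as pstdev)
print(round(std_dev(heights), 2))       # 11.18 (1/(n-1) version, same as stdev)

# The variance is the squared standard deviation.
print(round(std_dev(heights) ** 2, 6))  # 125.0 (same as variance(heights))
```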
  • 6:59 - 7:02
    If you want to know more details
    about the standard deviation
  • 7:02 - 7:05
    and the variance,
    please watch our video.
  • 7:05 - 7:08
    Let's move on to range
    and interquartile range.
  • 7:08 - 7:10
    Both are easy to understand.
  • 7:10 - 7:12
    The range is simply
    the difference
  • 7:12 - 7:16
    between the maximum
    and minimum value.
  • 7:16 - 7:20
    The interquartile range represents
    the middle 50% of the data.
  • 7:20 - 7:24
    It is the difference between
    the first quartile, Q1,
  • 7:24 - 7:27
    and the third quartile, Q3.
  • 7:27 - 7:31
    Therefore, 25% of the
    values are smaller
  • 7:31 - 7:36
    than the first quartile and
    25% are larger than the third quartile.
  • 7:36 - 7:39
    The interquartile
    range contains exactly
  • 7:39 - 7:42
    the middle 50% of the values.
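A sketch of range and interquartile range on a small hypothetical data set. Note that there are several conventions for computing quartiles; this uses the default ("exclusive") method of Python's `statistics.quantiles`, and other software may give slightly different Q1 and Q3.

```python
from statistics import quantiles

# Hypothetical data set with seven values.
data = [7, 2, 10, 4, 5, 9, 4]

data_range = max(data) - min(data)   # maximum minus minimum

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median) and Q3.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                        # spread of the middle 50% of the values

print(data_range, q1, q3, iqr)       # 8 4.0 9.0 5.0
```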
  • 7:42 - 7:44
    Before we get to
    the last two points,
  • 7:44 - 7:47
    let's briefly compare
    measures of central tendency
  • 7:47 - 7:49
    and measures of dispersion.
  • 7:49 - 7:53
    Let's say we measure the
    blood pressure of patients.
  • 7:53 - 7:56
    Measures of central tendency
    provide a single value
  • 7:56 - 7:59
    that represents the
    entire data set,
  • 7:59 - 8:03
    helping to identify a central
    value around which data
  • 8:03 - 8:05
    points tend to cluster.
  • 8:05 - 8:09
    Measures of dispersion, like the
    standard deviation, the range,
  • 8:09 - 8:12
    and the interquartile
    range, indicate
  • 8:12 - 8:15
    how spread out the
    data points are,
  • 8:15 - 8:18
    whether they are closely
    packed around the center
  • 8:18 - 8:19
    or spread far from it.
  • 8:19 - 8:22
    In summary, while measures
    of central tendency
  • 8:22 - 8:27
    provide a central point of the
    data set, measures of dispersion
  • 8:27 - 8:30
    describe how the data is
    spread around the center.
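To see why both kinds of measure are needed, here is a sketch with two hypothetical sets of blood pressure readings (mmHg) that share the same mean but differ strongly in spread.

```python
from statistics import mean, stdev

# Two hypothetical sets of blood pressure readings (mmHg).
clustered = [118, 120, 122, 119, 121]
spread_out = [100, 140, 110, 130, 120]

for name, values in [("clustered", clustered), ("spread_out", spread_out)]:
    # central tendency (mean) vs. dispersion (standard deviation, range)
    print(name, mean(values), round(stdev(values), 2), max(values) - min(values))
# clustered 120 1.58 4
# spread_out 120 15.81 40
```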
  • 8:30 - 8:32
    Let's move on to tables.
  • 8:32 - 8:36
    Here we will have a look at the
    most important ones, frequency
  • 8:36 - 8:38
    tables and contingency tables.
  • 8:38 - 8:43
    A frequency table displays
    how often each distinct value
  • 8:43 - 8:45
    appears in a data set.
  • 8:45 - 8:48
    Let's have a closer look at
    the example from the beginning.
  • 8:48 - 8:52
    A company surveyed its
    employees to find out
  • 8:52 - 8:53
    how they get to work.
  • 8:53 - 8:57
    The options given were
    car, bicycle, walk,
  • 8:57 - 8:58
    and public transport.
  • 8:58 - 9:01
    Here are the results
    from 30 employees.
  • 9:01 - 9:06
    The first answered car, the next
    walk, and so on and so forth.
  • 9:06 - 9:10
    Now we can create a frequency
    table to summarize this data.
  • 9:10 - 9:15
    To do this, we simply enter
    the four possible options, car,
  • 9:15 - 9:19
    bicycle, walk, and public
    transport in the first column,
  • 9:19 - 9:22
    and then count how
    often they occurred.
  • 9:22 - 9:26
    From the table, it is evident
    that the most common mode
  • 9:26 - 9:30
    of transport among the employees
    is by car, with 14 employees
  • 9:30 - 9:32
    preferring it.
  • 9:32 - 9:35
    The frequency table thus
    provides a clear and concise
  • 9:35 - 9:37
    summary of the data.
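A minimal sketch of building this frequency table in Python. The order of the 30 individual answers is not given, so the list below is rebuilt from the counts (14, 6, 5, 5); the relative frequencies are added as percentages.

```python
from collections import Counter

# The 30 answers rebuilt from the counts in the example (original order unknown).
responses = ["car"] * 14 + ["bicycle"] * 6 + ["walk"] * 5 + ["public transport"] * 5

table = Counter(responses)        # option -> how often it was answered
total = sum(table.values())

for option in ["car", "bicycle", "walk", "public transport"]:
    count = table[option]
    # absolute frequency, then relative frequency as a percentage of all answers
    print(f"{option:<18}{count:>4}{count / total:>8.1%}")
```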
  • 9:37 - 9:39
    But what if we
    have not only one,
  • 9:39 - 9:42
    but two categorical variables?
  • 9:42 - 9:45
    This is where the contingency
    table, also called crosstab,
  • 9:45 - 9:46
    comes in.
  • 9:46 - 9:50
    Imagine the company doesn't
    have one factory, but two.
  • 9:50 - 9:53
    One in Detroit and
    one in Cleveland.
  • 9:53 - 9:57
    So we also asked the employees
    at which location they work.
  • 9:57 - 10:00
    If we want to display
    both variables,
  • 10:00 - 10:03
    we can use a contingency table.
  • 10:03 - 10:07
    A contingency table provides
    a way to analyse and compare
  • 10:07 - 10:10
    the relationship between
    two categorical variables.
  • 10:10 - 10:14
    The rows of a contingency
    table represent the categories
  • 10:14 - 10:18
    of one variable, while the
    columns represent the categories
  • 10:18 - 10:20
    of another variable.
  • 10:20 - 10:23
    Each cell in the
    table shows the number
  • 10:23 - 10:26
    of observations that fall into
    the corresponding category
  • 10:26 - 10:27
    combination.
  • 10:27 - 10:31
    For example, the first cell
    shows that six employees
  • 10:31 - 10:33
    answered both car and Detroit.
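A sketch of a contingency table built by counting (mode of transport, site) pairs. Only the car/Detroit count of 6 and the per-mode totals come from the video; how the remaining answers split between Detroit and Cleveland is invented purely for illustration.

```python
from collections import Counter

# Hypothetical (mode of transport, site) answers. Car/Detroit = 6 and the
# per-mode totals (14, 6, 5, 5) follow the video; the rest of the split is made up.
answers = (
    [("car", "Detroit")] * 6 + [("car", "Cleveland")] * 8
    + [("bicycle", "Detroit")] * 4 + [("bicycle", "Cleveland")] * 2
    + [("walk", "Detroit")] * 2 + [("walk", "Cleveland")] * 3
    + [("public transport", "Detroit")] * 3 + [("public transport", "Cleveland")] * 2
)

cells = Counter(answers)  # each cell: number of answers with that category combination
modes = ["car", "bicycle", "walk", "public transport"]  # rows
sites = ["Detroit", "Cleveland"]                        # columns

print(f"{'':<18}" + "".join(f"{site:>11}" for site in sites))
for mode in modes:
    print(f"{mode:<18}" + "".join(f"{cells[(mode, site)]:>11}" for site in sites))
```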
  • 10:33 - 10:35
    And what about the charts?
  • 10:35 - 10:38
    Let's take a look at
    the most important ones.
  • 10:38 - 10:41
    To do this, let's
    simply use datatab.net.
  • 10:41 - 10:44
    If you like, you can
    load this sample data
  • 10:44 - 10:47
    set with the link in
    the video description.
  • 10:47 - 10:50
    Or you can simply copy your
    own data into this table.
  • 10:50 - 10:54
    Here below you can see the
    variables-- distance to work,
  • 10:54 - 10:56
    mode of transport, and site.
  • 10:56 - 10:59
    Datatab gives you a hint about
    the level of measurement,
  • 10:59 - 11:02
    but you can also change it here.
  • 11:02 - 11:05
    Now, if we only click
    on Mode of Transport,
  • 11:05 - 11:08
    we get a frequency
    table and we can also
  • 11:08 - 11:11
    display the percentage values.
  • 11:11 - 11:16
    If we scroll down, we get a
    bar chart and a pie chart.
  • 11:16 - 11:19
    Here on the left, we can
    adjust the settings.
  • 11:19 - 11:22
    For example, we can
    specify whether we
  • 11:22 - 11:26
    want to display the frequencies
    or the percentage values,
  • 11:26 - 11:31
    or whether the bars should
    be vertical or horizontal.
  • 11:31 - 11:35
    If you also select Site,
    we get a cross-table here
  • 11:35 - 11:39
    and a grouped bar
    chart for the diagrams.
  • 11:39 - 11:42
    Here we can specify
    whether we want the chart
  • 11:42 - 11:45
    to be grouped or stacked.
  • 11:45 - 11:48
    If we click on Distance to
    Work and Mode of Transport,
  • 11:48 - 11:52
    we get a bar chart where
    the height of the bar
  • 11:52 - 11:55
    shows the mean value of
    the individual groups.
  • 11:55 - 11:59
    Here we can also
    display the dispersion.
  • 11:59 - 12:03
    We also get a histogram,
    a box plot, a violin plot,
  • 12:03 - 12:05
    and a raincloud plot.
  • 12:05 - 12:09
    If you would like to know more
    about what a box plot, a violin
  • 12:09 - 12:13
    plot, and a raincloud plot are,
    take a look at my videos.
  • 12:13 - 12:16
    Let's continue with
    inferential statistics.
  • 12:16 - 12:18
    First, we'll
    briefly go through what
  • 12:18 - 12:21
    inferential statistics
    is, and then I'll
  • 12:21 - 12:24
    explain the six key
    components to you.
  • 12:24 - 12:27
    So what is inferential
    statistics?
  • 12:27 - 12:31
    Inferential statistics allows
    us to draw a conclusion
  • 12:31 - 12:36
    or inference about a population
    based on data from a sample.
  • 12:36 - 12:39
    What is the population,
    and what is the sample?
  • 12:39 - 12:43
    The population is the whole
    group we are interested in.
  • 12:43 - 12:45
    If you want to study
    the average height
  • 12:45 - 12:48
    of all adults in
    the United States,
  • 12:48 - 12:52
    then the population would be all
    adults in the United States.
  • 12:52 - 12:55
    The sample is the smaller
    group, chosen from the population,
  • 12:55 - 12:57
    that we actually study.
  • 12:57 - 13:02
    For example, 150 adults were
    selected from the United States.
  • 13:02 - 13:04
    And now we want
    to use the sample
  • 13:04 - 13:07
    to make a statement
    about the population.
  • 13:07 - 13:10
    And here are the six
    steps for doing that.
  • 13:10 - 13:12
    Number one, hypothesis.
  • 13:12 - 13:16
    First, we need a statement, a
    hypothesis that we want to test.
  • 13:16 - 13:19
    For example, we want to
    know whether a drug will
  • 13:19 - 13:22
    have a positive effect
    on blood pressure
  • 13:22 - 13:25
    in people with high
    blood pressure.
  • 13:25 - 13:26
    But what's next?
  • 13:26 - 13:28
    In our hypothesis,
    we stated that we
  • 13:28 - 13:31
    would like to study people
    with high blood pressure.
  • 13:31 - 13:35
    So our population is all
    people with high blood pressure
  • 13:35 - 13:37
    in, for example, the US.
  • 13:37 - 13:42
    Obviously, we cannot collect
    data from the whole population,
  • 13:42 - 13:44
    so we take a sample
    from the population.
  • 13:44 - 13:48
    Now we use this sample to make a
    statement about the population.
  • 13:48 - 13:50
    But how do we do that?
  • 13:50 - 13:53
    For this we need
    a hypothesis test.
  • 13:53 - 13:57
    Hypothesis testing is a
    method for testing a claim
  • 13:57 - 14:00
    about a parameter in
    a population using
  • 14:00 - 14:01
    data measured in a sample.
  • 14:01 - 14:02
    Great.
  • 14:02 - 14:04
    That's exactly what we need.
  • 14:04 - 14:06
    There are many different
    hypothesis tests,
  • 14:06 - 14:09
    and at the end of this
    video, I will give you
  • 14:09 - 14:11
    a guide on how to
    find the right test.
  • 14:11 - 14:13
    And of course, you
    can find videos
  • 14:13 - 14:17
    about many more hypothesis
    tests on our channel.
  • 14:17 - 14:19
    But how does a
    hypothesis test work?
  • 14:19 - 14:22
    When we conduct a
    hypothesis test,
  • 14:22 - 14:25
    we start with the research
    hypothesis, also called
  • 14:25 - 14:27
    alternative hypothesis.
  • 14:27 - 14:31
    This is the hypothesis we are
    trying to find evidence for.
  • 14:31 - 14:33
    In our case, the
    research hypothesis
  • 14:33 - 14:36
    is that the drug has an
    effect on blood pressure.
  • 14:36 - 14:40
    But we cannot test this
    hypothesis directly with
  • 14:40 - 14:42
    the classical hypothesis test.
  • 14:42 - 14:44
    So we test the
    opposite hypothesis
  • 14:44 - 14:47
    that the drug has no
    effect on blood pressure.
  • 14:47 - 14:49
    But what does that mean?
  • 14:49 - 14:54
    First, we assume that the drug
    has no effect in the population.
  • 14:54 - 14:56
    We therefore assume
    that, in general,
  • 14:56 - 15:00
    people who take the drug and
    people who don't take the drug
  • 15:00 - 15:03
    have the same blood
    pressure on average.
  • 15:03 - 15:06
    If we now take a random
    sample and it turns out
  • 15:06 - 15:09
    that the drug has a large
    effect in the sample,
  • 15:09 - 15:15
    then we can ask how likely it
    is to draw such a sample, or one
  • 15:15 - 15:20
    that deviates even more, if the
    drug actually has no effect,
  • 15:20 - 15:23
    that is, if in reality, on average,
    there is no difference
  • 15:23 - 15:24
    in the population.
  • 15:24 - 15:29
    If this probability is very
    low, we can suspect that maybe
  • 15:29 - 15:32
    the drug does have an effect
    in the population,
  • 15:32 - 15:36
    and we may have enough evidence
    to reject the null hypothesis
  • 15:36 - 15:38
    that the drug has no effect.
  • 15:38 - 15:42
    And it is this probability
    that is called the p-value.
  • 15:42 - 15:45
    Let's summarize this
    in three simple steps.
  • 15:45 - 15:48
    Number one, the null
    hypothesis states
  • 15:48 - 15:51
    that there is no difference
    in the population.
  • 15:51 - 15:54
    Number two, the
    hypothesis test calculates
  • 15:54 - 15:58
    how much the sample deviates
    from the null hypothesis.
  • 15:58 - 16:02
    Number three, the p-value
    indicates the probability
  • 16:02 - 16:07
    of getting a sample that
    deviates as much as our sample,
  • 16:07 - 16:10
    or one that even deviates
    more than our sample,
  • 16:10 - 16:13
    assuming the null
    hypothesis is true.
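The three steps can be sketched with a permutation test, a technique we swap in here because it mirrors the idea directly; it is not necessarily what the video or DATAtab uses. If the null hypothesis is true, the "drug" and "placebo" labels are interchangeable, so we shuffle them many times and count how often the shuffled difference in means is at least as large as the observed one. The blood pressure values and the function name are hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation-test p-value for a difference in group means."""
    def mean(values):
        return sum(values) / len(values)

    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))  # how far our sample is from "no difference"
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)                        # relabel as if the null hypothesis were true
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-9:                # small tolerance for floating-point ties
            at_least_as_extreme += 1
    # +1 in numerator and denominator counts the observed sample itself.
    return (at_least_as_extreme + 1) / (n_resamples + 1)

# Hypothetical systolic blood pressure (mmHg) after treatment.
drug = [128, 131, 125, 122, 130, 127, 124, 126]
placebo = [140, 138, 135, 142, 137, 139, 136, 141]

print(permutation_p_value(drug, placebo))
```

With these made-up numbers every drug value is below every placebo value, so the p-value comes out very small.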
  • 16:13 - 16:17
    But at what point is the
    p-value small enough for us
  • 16:17 - 16:19
    to reject the null hypothesis?
  • 16:19 - 16:23
    This brings us to the next
    point, statistical significance.
  • 16:23 - 16:27
    If the p-value is less than
    a predetermined threshold,
  • 16:27 - 16:30
    the result is considered
    statistically significant.
  • 16:30 - 16:34
    This means that the result
    is unlikely to have occurred
  • 16:34 - 16:37
    by chance alone, and that
    we have enough evidence
  • 16:37 - 16:39
    to reject the null hypothesis.
  • 16:39 - 16:43
    This threshold, called the significance level, is often 0.05.
  • 16:43 - 16:45
    Therefore, a small
    p-value suggests
  • 16:45 - 16:48
    that the observed
    data, our sample,
  • 16:48 - 16:51
    is inconsistent with
    the null hypothesis.
  • 16:51 - 16:54
    This leads us to reject the
    null hypothesis in favor
  • 16:54 - 16:56
    of the alternative hypothesis.
  • 16:56 - 16:59
    A large p-value suggests
    that the observed data
  • 16:59 - 17:02
    is consistent with
    the null hypothesis,
  • 17:02 - 17:04
    and we will not reject it.
  • 17:04 - 17:07
    But note, there is always
    a risk of making an error.
  • 17:07 - 17:11
    A small p-value does not prove
    that the alternative hypothesis
  • 17:11 - 17:12
    is true.
  • 17:12 - 17:16
    It only says that it is
    unlikely to get such a result,
  • 17:16 - 17:21
    or a more extreme one, if the
    null hypothesis is true.
  • 17:21 - 17:23
    And again, if the null
    hypothesis is true,
  • 17:23 - 17:26
    there is no difference
    in the population.
  • 17:26 - 17:29
    And the other way
    around, a large p-value
  • 17:29 - 17:32
    does not prove that the
    null hypothesis is true.
  • 17:32 - 17:36
    It only says that it is
    likely to get such a result,
  • 17:36 - 17:40
    or a more extreme one, if the
    null hypothesis is true.
  • 17:40 - 17:42
    So there are two
    types of errors,
  • 17:42 - 17:45
    which are called type
    I and type II error.
  • 17:45 - 17:47
    Let's start with
    the type I error.
  • 17:47 - 17:50
    In hypothesis testing,
    a type I error
  • 17:50 - 17:54
    occurs when a true null
    hypothesis is rejected.
  • 17:54 - 17:58
    So in reality, the null
    hypothesis is true,
  • 17:58 - 18:01
    but we make the decision to
    reject the null hypothesis.
  • 18:01 - 18:06
    In our example, it means that
    the drug actually had no effect.
  • 18:06 - 18:10
    So in reality, there is no
    difference in blood pressure.
  • 18:10 - 18:12
    Whether the drug
    is taken or not,
  • 18:12 - 18:15
    the blood pressure remains
    the same in both cases.
  • 18:15 - 18:19
    But our sample happened to
    be so far off the true value
  • 18:19 - 18:23
    that we mistakenly thought
    the drug was working.
  • 18:23 - 18:28
    A type II error occurs when
    a false null hypothesis is not
  • 18:28 - 18:29
    rejected.
  • 18:29 - 18:32
    So in reality, the null
    hypothesis is false,
  • 18:32 - 18:36
    but we make the decision not
    to reject the null hypothesis.
  • 18:36 - 18:40
    In our example, this means
    the drug actually did work.
  • 18:40 - 18:42
    There is a difference
    between those
  • 18:42 - 18:45
    who have taken the drug
    and those who have not.
  • 18:45 - 18:50
    But it was just a coincidence
    that the sample taken did not
  • 18:50 - 18:53
    show much difference,
    and we mistakenly
  • 18:53 - 18:56
    thought the drug
    was not working.
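Both error types can be made tangible with a small simulation, a sketch under stated assumptions: normally distributed blood pressure, a known population standard deviation of 10 mmHg, 20 people per group, a significance level of 0.05, and a two-sample z-test (chosen for simplicity; it is not the test the video prescribes). When the drug truly has no effect, every rejection is a type I error; when it truly lowers blood pressure by 5 mmHg, every non-rejection is a type II error. All numbers are hypothetical.

```python
import math
import random

def z_test_p_value(a, b, sigma):
    """Two-sided p-value of a two-sample z-test, assuming a known population SD sigma."""
    diff = sum(a) / len(a) - sum(b) / len(b)
    standard_error = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) for a standard normal Z

def rejection_rate(true_effect, n=20, sigma=10.0, alpha=0.05, trials=5_000, seed=1):
    """Share of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        placebo = [rng.gauss(140, sigma) for _ in range(n)]
        drug = [rng.gauss(140 - true_effect, sigma) for _ in range(n)]
        if z_test_p_value(drug, placebo, sigma) < alpha:
            rejections += 1
    return rejections / trials

# Null hypothesis true (no effect): every rejection is a type I error.
type_i_rate = rejection_rate(true_effect=0)
# Null hypothesis false (drug lowers BP by 5 mmHg): every non-rejection is a type II error.
type_ii_rate = 1 - rejection_rate(true_effect=5)

print(round(type_i_rate, 3), round(type_ii_rate, 3))
```

The type I rate should land close to the chosen significance level of 0.05, while the type II rate depends on the true effect size, the spread and the sample size.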
  • 18:56 - 18:58
    And now I'll show
    you how Datatab
  • 18:58 - 19:01
    helps you to find a
    suitable hypothesis test,
  • 19:01 - 19:05
    and of course, calculates it and
    interprets the results for you.
  • 19:05 - 19:10
    Let's go to datatab.net. You can
    copy your own data in here.
  • 19:10 - 19:13
    We will just use this
    example data set.
  • 19:13 - 19:15
    After copying your
    data into the table,
  • 19:15 - 19:18
    the variables appear down here.
  • 19:18 - 19:21
    Datatab automatically
    tries to determine
  • 19:21 - 19:24
    the correct level
    of measurement,
  • 19:24 - 19:27
    but you can also
    change it up here.
  • 19:27 - 19:32
    Now we just click on hypothesis
    testing and select the variables
  • 19:32 - 19:34
    we want to use for
    the calculation
  • 19:34 - 19:36
    of a hypothesis test.
  • 19:36 - 19:40
    Datatab will then suggest a
    suitable test, for example,
  • 19:40 - 19:44
    in this case, a chi-square
    test, or in that case,
  • 19:44 - 19:47
    an analysis of variance.
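For two categorical variables, the suggested chi-square test can be sketched by hand as a Pearson chi-square test of independence. This is our own illustration, not DATAtab's implementation. The p-value uses the chi-square survival function, built from the closed forms for 1 and 2 degrees of freedom plus a standard recurrence. The table counts are hypothetical, and several expected counts fall below 5, where the chi-square approximation becomes questionable.

```python
import math

def chi2_sf(x, df):
    """P(X >= x) for a chi-square distribution with integer df >= 1.

    Uses Q(x; 1) = erfc(sqrt(x/2)), Q(x; 2) = exp(-x/2) and the recurrence
    Q(x; k + 2) = Q(x; k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    """
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom and p-value for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total  # count expected under independence
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts: rows = car, bicycle, walk, public transport; columns = Detroit, Cleveland.
observed = [[6, 8], [4, 2], [2, 3], [3, 2]]
stat, df, p = chi_square_independence(observed)
print(round(stat, 3), df, round(p, 3))
```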
  • 19:47 - 19:51
    Then you will see the
    hypotheses and the results.
  • 19:51 - 19:54
    If you're not sure how
    to interpret the results,
  • 19:54 - 19:56
    click on Summary in words.
  • 19:56 - 19:59
    Further, you can
    check the assumptions
  • 19:59 - 20:03
    and decide whether you want
    to calculate a parametric
  • 20:03 - 20:05
    or a nonparametric test.
  • 20:05 - 20:07
    You can find out the
    difference between
  • 20:07 - 20:12
    parametric and nonparametric
    tests in my next video.
  • 20:12 - 20:16
    Thanks for watching, and I
    hope you enjoyed the video.
Video Language:
English
Duration:
20:21
