Wednesday, January 2, 2019

Type I error and type II error simplified


Creative Commons License
PSM / COMMUNITY MEDICINE by Dr Abhishek Jaiswal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at learnpsm@blogspot.com.
Permissions beyond the scope of this license may be available at jaiswal.fph@gmail.com.

Post hoc tests



Post hoc is Latin for "after this". Post-hoc tests are used to analyse the results of experimental data, typically after an overall test such as ANOVA has found a significant effect. They are based on the family-wise error rate.

Family-wise error rate: the probability of making at least one Type I error when we are performing multiple simultaneous tests. It is also known as alpha inflation or cumulative Type I error.

The most common post-hoc tests are: 

  • Bonferroni procedure 
  • Duncan's new multiple range test 
  • Dunn's multiple comparison test 
  • Fisher's least significant difference 
  • Holm-Bonferroni procedure 
  • Newman-Keuls 
  • Rodger's method 
  • Scheffe's Method 
  • Tukey's test 
  • Dunnett's correction 
  • Benjamini-Hochberg procedure


Bonferroni Procedure
This multiple-comparison post-hoc correction is used when you are performing many independent or dependent statistical tests at the same time. The problem with running many simultaneous tests is that the probability of at least one significant result increases with each test run. This post-hoc test sets the significance cutoff at α/n.
Imagine looking for the Ace of Clubs in a deck of cards: if you pull one card from the deck, the odds are pretty low (1/52) that you'll get the Ace of Clubs. But if you try perhaps 50 times, you'll probably end up getting the Ace. The same principle works with hypothesis testing: the more simultaneous tests you run, the more likely you are to get a "significant" result. Let's say you were running 50 tests simultaneously with an alpha level of 0.05. The probability of observing at least one significant result due to chance alone is:
P(at least one significant result) = 1 - P(no significant result)
= 1 - (1 - 0.05)^50 ≈ 0.92
So it is almost certain (92%) that you'll get at least one significant result.
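This arithmetic is easy to verify in a short script; the function names below are mine, not from the post:

```python
# Family-wise error rate for m simultaneous tests at level alpha,
# assuming the tests are independent.
def familywise_error_rate(alpha, m):
    return 1 - (1 - alpha) ** m

# Bonferroni: test each hypothesis at alpha / m instead of alpha.
def bonferroni_cutoff(alpha, m):
    return alpha / m

print(round(familywise_error_rate(0.05, 50), 2))  # 0.92, as computed above
print(bonferroni_cutoff(0.05, 50))                # 0.001
```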

Holm-Bonferroni Method
The ordinary Bonferroni method is sometimes viewed as too conservative. Holm’s sequential Bonferroni post-hoc test is a less strict correction for multiple comparisons. 
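Holm's procedure tests the ordered p-values against successively less strict cutoffs α/m, α/(m-1), ..., stopping at the first failure. A minimal sketch (my own implementation, not code from the post):

```python
# Holm's step-down procedure: sort the p-values, compare the k-th smallest
# against alpha / (m - k + 1), and stop at the first one that fails.
def holm_bonferroni(p_values, alpha=0.05):
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = []
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha / (m - rank + 1):
            rejected.append(i)  # index of a rejected null hypothesis
        else:
            break
    return sorted(rejected)

print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # → [0, 3]
```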


Duncan’s new multiple range test (MRT)
When you run Analysis of Variance (ANOVA), the results will tell you if there is a difference in means. However, it won’t pinpoint the pairs of means that are different. Duncan’s Multiple Range Test will identify the pairs of means (from at least three) that differ. The MRT is similar to the LSD, but instead of a t-value, a Q Value is used.
Fisher’s Least Significant Difference (LSD)
A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values. 
Newman-Keuls
Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Newman-Keuls uses different critical values for comparing pairs of means. Therefore, it is more likely to find significant differences.
Rodger’s Method
Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.
Scheffé’s Method
Used when you want to look at post-hoc comparisons in general (as opposed to just pairwise comparisons). Scheffé's method controls the overall confidence level and is customarily used with unequal sample sizes.
Tukey’s Test
The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.
Dunnett’s correction
Like Tukey’s this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean. 
Benjamini-Hochberg (BH) procedure
If you perform a very large amount of tests, one or more of the tests will have a significant result purely by chance alone.
The Benjamini-Hochberg Procedure is a powerful tool that decreases the false discovery rate.
The false discovery rate (FDR) is the expected proportion of significant results that are actually Type I errors (false discoveries).
Adjusting the rate helps to control for the fact that sometimes small p-values (less than 5%) happen by chance, which could lead you to incorrectly reject the true null hypotheses. In other words, the B-H Procedure helps you to avoid Type I errors (false positives).
A p-value of 5% means that there's only a 5% chance that you would get your observed result (or a more extreme one) if the null hypothesis were true. So when a test returns a small p-value, the null hypothesis is judged unlikely and is thrown out. But it's only a probability: many times, true null hypotheses are thrown out just because of the randomness of results.
example: Let’s say you have a group of 100 patients who you know are free of a certain disease. Your null hypothesis is that the patients are free of disease and your alternate is that they do have the disease. If you ran 100 statistical tests at the 5% alpha level, roughly 5% of results would report as false positives.
There’s not a lot you can do to avoid this: when you run statistical tests, a fraction will always be false positives. However, running the B-H procedure will decrease the number of false positives.
  1. Put the individual p-values in ascending order.
  2. Assign ranks to the p-values. For example, the smallest has a rank of 1, the second smallest has a rank of 2.
  3. Calculate each individual p-value’s Benjamini-Hochberg critical value, using the formula (i/m)Q, where:
    • i = the individual p-value’s rank,
    • m = total number of tests,
    • Q = the false discovery rate (a percentage, chosen by you).
  4. Compare your original p-values to the critical B-H values from Step 3; find the largest p-value that is smaller than its critical value. That p-value, and all p-values smaller than it, are declared significant.
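The four steps above can be sketched in Python; this is my own minimal implementation of the procedure, not code from the post:

```python
# Benjamini-Hochberg: rank the p-values in ascending order, compute each
# critical value (i/m)*Q, and reject every hypothesis up to (and including)
# the largest p-value that falls below its critical value.
def benjamini_hochberg(p_values, q=0.05):
    m = len(p_values)
    indexed = sorted(enumerate(p_values), key=lambda kv: kv[1])
    cutoff_rank = 0
    for rank, (_, p) in enumerate(indexed, start=1):
        if p <= (rank / m) * q:      # Step 3: critical value (i/m)*Q
            cutoff_rank = rank       # Step 4: remember the largest such rank
    rejected = {idx for idx, _ in indexed[:cutoff_rank]}
    return sorted(rejected)

# Hypothetical p-values from five tests:
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005, 0.20], q=0.05))  # → [0, 1, 2, 3]
```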

Statistics

Statistics is a field of study concerned with (1) the collection, organization, summarization, and analysis of data; and (2) the drawing of inferences about a body of data when only a part of the data is observed.

A descriptive measure computed from the data of a sample is called a statistic.

A descriptive measure computed from the data of a population is called a parameter.

Variable: a characteristic that takes different values in different persons, places or things.

Quantitative variable: one that can be measured in the usual sense. Measurements convey information about amount.

Qualitative variable: measurement consists of categorization. Measurements convey information about an attribute.

Random variable: one whose values arise as a result of chance factors, so they cannot be predicted in advance.

Discrete variable: characterized by gaps or interruptions in the values that it can assume.

Continuous variable: does not possess the gaps or interruptions characteristic of a discrete variable.

Population: the largest collection of entities in which we have an interest at a particular time.

Sample: a part of the population selected for study.

Measurement: the assignment of numbers to objects or events according to a set of rules. Measurement may be carried out under different sets of rules.

Measurement Scale: 

Nominal scale: naming the observations or classifying them into various mutually exclusive and collectively exhaustive categories.

Ordinal scale: observations can not only be placed in different categories but also be ranked according to some criterion.

Interval scale: in addition to ordering the measurements, we can also know the distance between two measurements.
Unlike the nominal and ordinal scales, the interval scale is a truly quantitative scale.

Ratio scale: the highest level of measurement. Equality of ratios as well as equality of intervals may be determined.

Fundamental to the ratio scale is a true zero point.


Simple random sample: if a sample of size "n" is drawn from a population of size "N" in such a way that every possible sample of size "n" has the same chance of being selected, the sample is called a simple random sample.
As a rule, in practice, sampling is always done without replacement.

Systematic sampling: first we calculate the total number required for the sample. A random number table is then used to give a starting number (x). The sampling interval (k) is determined by the sample size (k = N/n). We then select individuals in this way:
x, x+k, x+2k, x+3k, ...
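A minimal sketch of this scheme, assuming the interval k is taken as N/n and the random start is drawn with a generator rather than from a random number table:

```python
import random

# Systematic sampling: pick a random start x in [0, k), then take
# every k-th unit: x, x+k, x+2k, ...
def systematic_sample(population, n):
    k = len(population) // n          # sampling interval
    x = random.randrange(k)           # random starting position
    return [population[x + i * k] for i in range(n)]

random.seed(1)
print(systematic_sample(list(range(1, 101)), 10))  # 10 of 100 units, evenly spaced
```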


Stratified random sampling: the population is divided into strata, and a random sample is taken within each stratum.

Box and whisker plots (Boxplot)



Represent the variable of interest on the horizontal axis.
Draw a box in such a way that the left end of the box aligns with Q1 and the right end aligns with Q3.
Divide the box into two parts by a vertical line that aligns with the median (Q2).
Draw a horizontal line, called a whisker, from the left end of the box to the point that aligns with the smallest measurement in the data set.
Draw another whisker from the right end of the box to the point that aligns with the largest measurement in the data set.






Outliers: an outlier is an observation whose value, "x", either exceeds the third quartile by more than 1.5(IQR) or falls below the first quartile by more than 1.5(IQR).

That is,

x < Q1 - 1.5(IQR)   or   x > Q3 + 1.5(IQR)
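The 1.5(IQR) rule can be checked with a short script. Note that textbooks differ slightly on how Q1 and Q3 are computed; this sketch uses the median-of-halves convention:

```python
# Quartiles via the median-of-halves convention (one of several in use).
def quartiles(data):
    s = sorted(data)
    n = len(s)
    def median(vals):
        mid = len(vals) // 2
        return vals[mid] if len(vals) % 2 else (vals[mid - 1] + vals[mid]) / 2
    q1 = median(s[: n // 2])          # lower half
    q3 = median(s[(n + 1) // 2 :])    # upper half
    return q1, median(s), q3

# Flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
def outliers(data):
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

print(outliers([2, 3, 4, 5, 5, 6, 7, 8, 30]))  # → [30]
```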


Stem and leaf


Stem and leaf: we partition each measurement into two parts. The first part is called the stem; the second is called the leaf. The stem consists of one or more of the initial digits of the measurement, and the leaf is composed of one or more of the remaining digits. All the partitioned numbers are shown together in a single display: the stems form an ordered column, with the smallest stem at the top and the largest at the bottom. We include in the stem column all stems within the range of the data, even when no measurement with that stem occurs in the data set. The rows of the display contain the leaves, ordered and listed to the right of their respective stems. When leaves consist of more than one digit, all digits after the first may be deleted. A decimal point, when present in the data, is omitted in the stem-and-leaf display. The stems are separated from their leaves by a vertical line.

An advantage of the stem-and-leaf display over the histogram is that it preserves the information contained in the individual measurements.
Stem-and-leaf displays are most effective with relatively small data sets.
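For two-digit data the display can be generated mechanically. This sketch uses stem = tens digit, leaf = units digit, and includes empty stems within the range, as described above:

```python
# Build a stem-and-leaf display: stem = all but the last digit,
# leaf = the last digit; empty stems within the range are kept.
def stem_and_leaf(data):
    buckets = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        buckets.setdefault(stem, []).append(leaf)
    return [
        f"{stem} | {''.join(str(l) for l in buckets.get(stem, []))}"
        for stem in range(min(buckets), max(buckets) + 1)
    ]

for row in stem_and_leaf([12, 15, 21, 21, 34, 38, 39, 51]):
    print(row)
# 1 | 25
# 2 | 11
# 3 | 489
# 4 |
# 5 | 1
```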


Meta-analysis



Effect size: a value that reflects the magnitude of the treatment effect or the strength of the relationship between two variables; it is the unit of currency in meta-analysis. (Shown as a black square on the forest plot.)
Precision: the confidence interval (C.I.) of the effect size.
Study weight: the weight assigned to each study, which depends on its precision. (The larger the square, the larger the study weight.)
Summary effect: the weighted mean of the individual effects. The mechanism used to assign the weights depends on our assumptions about the distribution of effect sizes from which the studies were sampled. Under the fixed-effect model, the assumption is that all the studies in the analysis share the same true effect size; the summary effect is then the estimate of this common effect size. Under the random-effects model, the assumption is that the true effect size varies from study to study; the summary effect is then the mean of the distribution of effect sizes. (Shown as a diamond: the location of the diamond represents the summary effect size, and its width reflects the precision of the estimate.)

In the fixed-effect model: the weight of an individual study is the reciprocal of that study’s variance.




In the random-effects model: we assume that the true effects are normally distributed.
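Under the fixed-effect model, the inverse-variance weighting described above can be sketched as follows; the effect sizes and variances in the example are made up for illustration:

```python
# Fixed-effect summary: each study's weight is the reciprocal of its
# variance, and the summary effect is the weighted mean of the effects.
def fixed_effect_summary(effects, variances):
    weights = [1 / v for v in variances]
    summary = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    var_summary = 1 / sum(weights)   # variance of the summary effect
    return summary, var_summary

# Hypothetical effect sizes (e.g. log odds ratios) and their variances:
effect, var = fixed_effect_summary([0.4, 0.2, 0.3], [0.04, 0.01, 0.02])
print(round(effect, 3))  # 0.257 — pulled toward the most precise study
```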


To be continued...


Sensitivity and Specificity


Validity (accuracy): Extent to which a test measures what it is supposed to measure.

Sensitivity:      1. Ability of test to correctly classify an individual as diseased.
                        2. Probability of being test positive when disease is present.


                 D+        D-
      T+          A         B
      T-          C         D

Sensitivity = A / (A + C)

SnNOUT: a highly Sensitive test, if Negative, rules OUT the disease.

Specificity:      1. Ability of test to correctly classify an individual as disease free.
                        2. Probability of being test negative when disease is absent.

Specificity = D / (B + D)

SpPIN: a highly Specific test, if Positive, rules IN the disease.

PPV: Positive predictive value:
1.     % of patients with a positive test who actually have the disease
2.     Probability of the patient having the disease when the test is positive

PPV = A / (A + B)


NPV: Negative predictive value:
1.     % of patients with a negative test who are actually free of the disease
2.     Probability of the patient not having the disease when the test is negative

NPV = D / (C + D)

Bayes Theorem:

PPV = (Sensitivity × Prevalence) ÷ (Sensitivity × Prevalence + (1 − Specificity) × (1 − Prevalence))

PPV is highly dependent on the prevalence of the disease.






Parallel testing (positive if either test is positive):

Let A and B denote the sensitivities (or specificities) of the two tests.
Combined sensitivity: Sn = A + B − AB
Combined specificity: Sp = A × B
Sensitivity increases and specificity decreases.

Series testing (positive only if both tests are positive):

Let A and B denote the sensitivities (or specificities) of the two tests.
Combined sensitivity: Sn = A × B
Combined specificity: Sp = A + B − AB
Sensitivity decreases and specificity increases.


Mantel Haenszel

The Mantel-Haenszel method is one of the methods used to control for confounders. It gives a single summary measure of association: a weighted average of the RR or OR across the different strata of the confounding factor.
To calculate it, we first split the original two-by-two table into separate tables for each stratum of the confounding variable, and then compute the weighted average of the RR or OR.
Formula:

                              Outcome (O)
                               +         -
    Risk factor (E)   +        a         b        a+b
                      -        c         d        c+d

                              a+c       b+d        n

RR = (a/(a+b)) ÷ (c/(c+d)) = a(c+d) ÷ c(a+b)
OR = (a/b) ÷ (c/d) = ad/bc
RR(mh) = Σ[a(c+d)/n] ÷ Σ[c(a+b)/n]
OR(mh) = Σ(ad/n) ÷ Σ(bc/n)
where Σ (sigma) denotes the sum over all the stratum-specific two-by-two tables, and n = a+b+c+d is the total of each table.
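A sketch of the OR(mh) formula above, with two made-up strata:

```python
# Mantel-Haenszel summary odds ratio across strata.
# Each stratum is a tuple (a, b, c, d); n = a + b + c + d.
def mh_odds_ratio(strata):
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)  # Σ(ad/n)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)  # Σ(bc/n)
    return num / den

# Two hypothetical strata of a confounding variable:
print(round(mh_odds_ratio([(10, 20, 5, 40), (30, 10, 15, 20)]), 2))  # → 4.0
```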

Skewness and Kurtosis

Skewness: If a histogram/frequency polygon of a distribution is asymmetric, the distribution is said to be skewed. 

If the distribution is not symmetric because its graph extends further to the right than to the left, that is, if it has a long tail to the right, we say that the distribution is skewed to the right or it is positively skewed. 
A distribution will be skewed to the right, or positively skewed, if its mean is greater than its mode.

If the distribution is not symmetric because its graph extends further to the left than to the right, that is, if it has a long tail to the left, we say that the distribution is skewed to the left or it is negatively skewed. 
A distribution will be skewed to the left, or negatively skewed, if its mean is less than its mode.






Skewness > 0 indicates positive skewness.
Skewness < 0 indicates negative skewness.



Kurtosis: it is a measure of the degree to which the distribution is peaked or flat in comparison to a normal distribution whose graph is characterized by a bell shaped appearance.

Platykurtic: the graph exhibits a flattened appearance 
Mesokurtic: normal, bell shaped graph
Leptokurtic: the graph exhibits a more peaked appearance 





Platykurtic: kurtosis < 0
Mesokurtic: kurtosis = 0
Leptokurtic: kurtosis > 0
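Moment-based skewness and excess kurtosis, scaled so that a normal distribution scores 0 on both (matching the cutoffs above), can be computed as:

```python
import statistics

# Third standardized moment: > 0 for a long right tail, < 0 for a long left tail.
def skewness(data):
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    n = len(data)
    return sum((x - mu) ** 3 for x in data) / (n * sd ** 3)

# Fourth standardized moment minus 3, so the normal distribution gives 0.
def excess_kurtosis(data):
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    n = len(data)
    return sum((x - mu) ** 4 for x in data) / (n * sd ** 4) - 3

right_skewed = [1, 1, 2, 2, 3, 10]
print(skewness(right_skewed) > 0)  # True: long tail to the right
```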



Measures of Central tendency


Mean: the arithmetic average of the data set.


Properties:
1.     Uniqueness: for a given set of data, there is only one arithmetic mean 
2.     Simplicity: easy to compute 

3.     Extreme values have a drastic influence on the mean

Median: the middle value of the data set when it is arranged in ascending order.
It is the single middle value if the number of observations is odd, and the average of the two middle values if it is even.

it is the value that divides the dataset into two equal parts such that the no. of values equal to or greater than the median is equal to the number of values equal to or less than the median, when the data set is arranged in order of magnitude

In an odd-sized data set: it is the ((n+1)/2)th value.
In an even-sized data set: it is the average of the (n/2)th and (n/2 + 1)th values.

Properties:
1.     Uniqueness: for a given set of data, there is only one median 
2.     Simplicity: easy to compute 

3.     Not as drastically affected by extreme values as the mean


Mean and median are special cases of a family of parameters known as location parameters, because they can be used to designate certain positions on the horizontal axis when the distribution of a variable is graphed (“locate” the distribution on the horizontal axis)

Mode: most frequent value in the dataset


It may be used also for describing qualitative data.

Suppose the heights of the trees (in metres) in a garden are represented by the following data set:
1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2

First, let us arrange this in ascending order:

1, 1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 9

The most frequent value in the data set is "2", so Mode = 2.

The total number of values in the data set is 13.

The median is the middle value, i.e. the 7th one:
Median = 3

Mean = (1+1+2+2+2+3+3+4+4+6+7+8+9)/13 = 52/13
      = 4
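Python's statistics module reproduces the worked example:

```python
import statistics

# Tree heights (metres) from the example above:
heights = [1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2]

print(statistics.mode(heights))    # 2 (most frequent value)
print(statistics.median(heights))  # 3 (7th of 13 sorted values)
print(statistics.mean(heights))    # 4 (52/13)
```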
