Wednesday, January 2, 2019

Statistics

Statistics is a field of study concerned with (1) the collection, organization, summarization, and analysis of data; and (2) the drawing of inferences about a body of data when only a part of the data is observed.

A descriptive measure computed from the data of a sample is called a statistic.

A descriptive measure computed from the data of a population is called a parameter.

Variable: a characteristic that takes on different values in different persons, places, or things.

Quantitative variable: one that can be measured in the usual sense. Measurements convey information about amount.

Qualitative variable: measurement consists of categorization. Measurements convey information regarding an attribute.

Random variable: a variable whose values arise as a result of chance factors, so that they cannot be exactly predicted in advance.

Discrete variable: is characterized by gaps or interruptions in the values that it can assume 

Continuous variable: doesn’t possess the gaps or interruptions characteristic of a discrete variable

Population: largest collection of entities for which we have an interest at a particular time

Sample: a part of the population selected for study.

Measurement: the assignment of numbers to objects or events according to a set of rules. Measurement may be carried out under different sets of rules.

Measurement Scale: 

Nominal scale: naming the observations or classifying them into various mutually exclusive and collectively exhaustive categories.

Ordinal scale: observations not only fall into different categories but can also be ranked according to some criterion.

Interval scale: in addition to ordering the measurements, we also know the distance between any two measurements.
Unlike the nominal and ordinal scales, the interval scale is a truly quantitative scale.

Ratio scale: highest level of measurement. Equality of ratios as well as equality of the intervals may be determined. 

Fundamental to the ratio scale is a true zero point.


Simple random sample: if a sample of size "n" is drawn from a population of size "N" in such a way that every possible sample of size "n" has the same chance of being selected, the sample is called a simple random sample.
As a rule, in practice, sampling is always done without replacement.
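A minimal Python sketch of drawing a simple random sample without replacement. It relies on the standard-library `random.sample`, which gives every subset of size n the same chance of selection; the sampling frame of 100 numbered units and the seed value are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw a simple random sample of size n, without replacement, from frame."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    # random.sample never picks the same unit twice (sampling without replacement)
    return rng.sample(frame, n)

frame = list(range(1, 101))  # hypothetical population of N = 100 numbered units
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```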

Systematic sampling: first, the total number of units required for the sample is determined. A random number table is then used to give a starting number (x), and a second number, determined by the sample size, defines the sampling interval (k). Individuals are then selected as follows:
x, x+k, x+2k, x+3k, …….
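The selection rule x, x+k, x+2k, ... can be sketched in Python as below. The notes do not give a formula for k, so this sketch assumes the common choice k = N // n (integer division) and draws the random start x between 1 and k; the frame of 100 units and the seed are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Systematic sample: random start x, then every k-th unit (x, x+k, x+2k, ...)."""
    N = len(frame)
    k = N // n                      # sampling interval (assumed here to be N // n)
    rng = random.Random(seed)
    x = rng.randint(1, k)           # random starting number between 1 and k
    positions = [x + i * k for i in range(n)]   # 1-based positions in the frame
    return [frame[p - 1] for p in positions]    # convert to 0-based list indices

frame = list(range(1, 101))         # hypothetical frame of N = 100 units
sample = systematic_sample(frame, 10, seed=7)
print(sample)
```

Because x is at most k, the last position x + (n-1)k never exceeds nk, which is at most N, so the sketch never runs past the end of the frame.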


Stratified random sampling: the population is divided into non-overlapping strata, and a simple random sample is drawn from each stratum.
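A short Python sketch of stratified random sampling, drawing an independent simple random sample from each stratum. The two strata ("urban", "rural") and the per-stratum sample sizes are hypothetical; how sample sizes are allocated across strata is a separate design decision not covered here.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw an independent simple random sample from each stratum."""
    rng = random.Random(seed)
    # strata maps stratum name -> list of units; sizes maps name -> sample size
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical population split into two strata (60 urban, 40 rural units)
strata = {"urban": list(range(1, 61)), "rural": list(range(61, 101))}
sample = stratified_sample(strata, {"urban": 6, "rural": 4}, seed=1)
print(sample)
```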
Creative Commons License
PSM / COMMUNITY MEDICINE by Dr Abhishek Jaiswal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at learnpsm@blogspot.com.
Permissions beyond the scope of this license may be available at jaiswal.fph@gmail.com.

Box and whisker plots (Boxplot)



Represent the variable of interest on the horizontal axis.
Draw a box so that its left end aligns with the first quartile, Q1, and its right end aligns with the third quartile, Q3.
Divide the box into two parts by a vertical line that aligns with the median, Q2.
Draw a horizontal line, called a whisker, from the left end of the box to the point that aligns with the smallest measurement in the data set.
Draw another horizontal line, or whisker, from the right end of the box to the point that aligns with the largest measurement in the data set.
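The five values a box-and-whisker plot is drawn from (smallest value, Q1, median, Q3, largest value) can be computed with Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software; the "inclusive" method used here is one assumption among several, and the example data are the tree heights used later in these notes.

```python
import statistics

def five_number_summary(data):
    """Return (smallest, Q1, median Q2, Q3, largest): the points a boxplot is drawn from."""
    # "inclusive" is one of several quartile conventions; results can differ slightly
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

heights = [1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2]  # tree heights in metres
print(five_number_summary(heights))  # (1, 2.0, 3.0, 6.0, 9)
```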






Outliers: an outlier is an observation whose value, "x", either exceeds the third quartile by more than 1.5(IQR) or is less than the first quartile by more than 1.5(IQR), where IQR = Q3 - Q1 is the interquartile range.

That is, x is an outlier if

x < Q1 - 1.5(IQR)   or   x > Q3 + 1.5(IQR)
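The 1.5(IQR) rule can be sketched in Python as follows. The quartiles come from `statistics.quantiles` with the "inclusive" method, which is an assumption (other quartile conventions can move the fences slightly); the data set, with one extreme value of 30 added, is hypothetical.

```python
import statistics

def iqr_outliers(data):
    """Return observations below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the two outlier fences
    return [x for x in data if x < lower or x > upper]

data = [1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2, 30]  # hypothetical data, one extreme value
print(iqr_outliers(data))  # [30]
```

Here Q1 = 2.0 and Q3 = 6.75, so IQR = 4.75 and the upper fence is 13.875; only 30 lies beyond it.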


Stem and leaf


Stem and leaf: we partition each measurement into two parts. The first part is called the stem and the second is called the leaf. The stem consists of one or more of the initial digits of the measurement, and the leaf is composed of one or more of the remaining digits. All the partitioned numbers are shown together in a single display; the stems form an ordered column with the smallest stem at the top and the largest at the bottom. We include in the stem column all stems within the range of the data, even when a measurement with that stem is not in the data set. The rows of the display contain the leaves, ordered and listed to the right of their respective stems. When leaves consist of more than one digit, all digits after the first may be deleted. Decimal points, when present in the data, are omitted in the stem-and-leaf display. The stems are separated from their leaves by a vertical line.

An advantage of the stem-and-leaf display over the histogram is that it preserves the information contained in the individual measurements.
Stem-and-leaf displays are most effective with relatively small data sets.
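The construction described above can be sketched in Python for non-negative whole numbers, taking the tens digit(s) as the stem and the units digit as the leaf. That stem/leaf split and the ages used as data are assumptions for illustration; note that stem 5 is listed even though no value falls in the 50s.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Stem-and-leaf display for non-negative integers: stem = tens, leaf = units digit."""
    leaves = defaultdict(list)
    for x in sorted(data):                    # sorting orders the leaves within each stem
        leaves[x // 10].append(str(x % 10))
    rows = []
    # every stem within the range of the data is listed, even if it has no leaves
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        rows.append(f"{stem} | {''.join(leaves[stem])}".rstrip())
    return "\n".join(rows)

ages = [23, 35, 31, 47, 28, 33, 29, 61, 38]  # hypothetical ages in years
print(stem_and_leaf(ages))
```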


Meta-analysis



Effect size: a value that reflects the magnitude of the treatment effect or the strength of the relationship between two variables; it is the unit of currency in a meta-analysis. (Black square)
Precision: reflected by the confidence interval (C.I.) of the effect size.
Study weight: the weight assigned to each study, which depends on the study's precision. (The larger the square, the larger the study weight.)
Summary effect: the weighted mean of the individual effects. The mechanism used to assign the weights depends on our assumptions about the distribution of effect sizes from which the studies were sampled. Under the fixed-effect model, the assumption is that all the studies in the analysis share the same true effect size, and the summary effect is the estimate of this common effect size. Under the random-effects model, the assumption is that the true effect size varies from study to study, and the summary effect is the estimate of the mean of the distribution of effect sizes. (Diamond)
Precision of the summary effect: the location of the diamond represents the summary effect size, and its width reflects the precision of the estimate.

In the fixed-effect model, the weight of each individual study is the reciprocal of that study's variance: W_i = 1/V_i.
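A minimal Python sketch of the fixed-effect (inverse-variance) summary: each study gets weight W_i = 1/V_i, and the summary effect is the weighted mean of the study effects. The standard error sqrt(1/ΣW_i) and the approximate 95% C.I. (summary ± 1.96 SE) are standard inverse-variance results added here beyond the notes; the three studies' effect sizes (assumed to be on an additive scale such as log odds ratios) and variances are hypothetical.

```python
import math

def fixed_effect_summary(effects, variances):
    """Fixed-effect summary: inverse-variance weighted mean of the study effect sizes."""
    weights = [1 / v for v in variances]                  # W_i = 1 / V_i
    total = sum(weights)
    summary = sum(w * y for w, y in zip(weights, effects)) / total
    se = math.sqrt(1 / total)                             # standard error of the summary
    ci = (summary - 1.96 * se, summary + 1.96 * se)       # approximate 95% C.I.
    return weights, summary, ci

# Hypothetical studies: effect sizes (e.g. log odds ratios) and their variances
weights, summary, ci = fixed_effect_summary([0.5, 0.3, 0.4], [0.04, 0.01, 0.02])
print([round(w, 1) for w in weights], round(summary, 3), ci)
```

The most precise study (variance 0.01) receives the largest weight, which is why it would be drawn with the largest square.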




In the random-effects model, we assume that the true effects are normally distributed across studies.


To be continued...


Sensitivity and Specificity


Validity (accuracy): Extent to which a test measures what it is supposed to measure.

Sensitivity:      1. Ability of a test to correctly classify an individual as diseased.
                        2. Probability of a positive test result when the disease is present.


          D+     D-
T+        A      B
T-        C      D



SnNOut: a highly Sensitive test, when Negative, rules OUT the disease.

Specificity:      1. Ability of a test to correctly classify an individual as disease free.
                        2. Probability of a negative test result when the disease is absent.

SpPIn: a highly Specific test, when Positive, rules IN the disease.

PPV: Positive predictive value:
1.     Percentage of patients with a positive test who actually have the disease
2.     Probability that a patient has the disease when the test is positive








NPV: Negative predictive value:
1.     Percentage of patients with a negative test who actually do not have the disease
2.     Probability that a patient does not have the disease when the test is negative
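Using the cells of the 2 x 2 table (A = T+ D+, B = T+ D-, C = T- D+, D = T- D-), the four measures work out to Sn = A/(A+C), Sp = D/(B+D), PPV = A/(A+B) and NPV = D/(C+D). A small Python sketch, with hypothetical counts:

```python
def diagnostic_measures(a, b, c, d):
    """Validity and predictive values from the 2 x 2 table.

    a = T+ D+ (true positives), b = T+ D- (false positives),
    c = T- D+ (false negatives), d = T- D- (true negatives).
    """
    return {
        "sensitivity": a / (a + c),  # test positives among the diseased
        "specificity": d / (b + d),  # test negatives among the disease free
        "ppv": a / (a + b),          # diseased among the test positives
        "npv": d / (c + d),          # disease free among the test negatives
    }

m = diagnostic_measures(a=90, b=40, c=10, d=860)  # hypothetical counts
print({k: round(v, 3) for k, v in m.items()})
```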






Bayes Theorem:





PPV: highly dependent on the prevalence of the disease (for a given sensitivity and specificity, PPV rises as prevalence rises).
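By Bayes' theorem, PPV can be written in terms of sensitivity (Sn), specificity (Sp) and prevalence (P) as PPV = Sn·P / [Sn·P + (1 - Sp)(1 - P)]. This is the standard form of the result, not necessarily the author's original notation. The Python sketch below applies one hypothetical test (Sn = Sp = 0.9) at three prevalences to show how strongly PPV depends on prevalence.

```python
def ppv_from_prevalence(sn, sp, prevalence):
    """PPV by Bayes' theorem: P(D+ | T+) = Sn*P / (Sn*P + (1 - Sp)*(1 - P))."""
    true_pos = sn * prevalence                # P(T+ and D+)
    false_pos = (1 - sp) * (1 - prevalence)   # P(T+ and D-)
    return true_pos / (true_pos + false_pos)

# Same hypothetical test (Sn = Sp = 0.9) applied at different prevalences
for p in (0.01, 0.1, 0.5):
    print(p, round(ppv_from_prevalence(0.9, 0.9, p), 3))
```

With these numbers, PPV climbs from about 0.08 at 1% prevalence to 0.5 at 10% and 0.9 at 50%, even though the test itself is unchanged.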






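Parallel testing (a person is classed as positive if either test A or test B is positive):

A, B: sensitivity (or specificity) of test A and test B, assuming the two tests give independent results
Combined sensitivity: Sn = A + B - AB
Combined specificity: Sp = A*B
Sensitivity increases and specificity decreases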
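The parallel-testing formulas can be checked numerically with a short Python sketch. It assumes, as the formulas do, that the two tests give independent results; the sensitivities and specificities of tests A and B are hypothetical.

```python
def parallel_testing(sn_a, sp_a, sn_b, sp_b):
    """Net Sn and Sp when a positive result on EITHER test counts as positive."""
    # assumes the two tests give independent results
    sn = sn_a + sn_b - sn_a * sn_b   # Sn = A + B - AB
    sp = sp_a * sp_b                  # Sp = A * B
    return sn, sp

sn, sp = parallel_testing(sn_a=0.8, sp_a=0.9, sn_b=0.7, sp_b=0.95)  # hypothetical tests
print(round(sn, 3), round(sp, 3))
```

The combined sensitivity (0.94) is higher than either test alone, while the combined specificity (0.855) is lower than either.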

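Series testing (a person is classed as positive only if both test A and test B are positive):

A, B: sensitivity (or specificity) of test A and test B, assuming the two tests give independent results
Combined sensitivity: Sn = A*B
Combined specificity: Sp = A + B - AB
Sensitivity decreases and specificity increases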
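A matching Python sketch for series testing, under the same assumption of independent test results and with the same hypothetical tests A and B:

```python
def series_testing(sn_a, sp_a, sn_b, sp_b):
    """Net Sn and Sp when only a positive result on BOTH tests counts as positive."""
    # assumes the two tests give independent results
    sn = sn_a * sn_b                  # Sn = A * B
    sp = sp_a + sp_b - sp_a * sp_b    # Sp = A + B - AB
    return sn, sp

sn, sp = series_testing(sn_a=0.8, sp_a=0.9, sn_b=0.7, sp_b=0.95)  # hypothetical tests
print(round(sn, 3), round(sp, 3))
```

Here sensitivity falls to 0.56 while specificity rises to 0.995, the mirror image of parallel testing.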


Mantel Haenszel

The Mantel-Haenszel method is one of the methods used to control for confounding. It gives a single summary measure of association: a weighted average of the RR or OR across the different strata of the confounding factor.
To apply this method, we first divide the original two-by-two table into separate tables for each stratum of the confounding variable, and then calculate the weighted average of the RR or OR.
Formula:

                        Outcome (O)
                        +        -
Risk factor (E)   +     a        b        a+b
                  -     c        d        c+d
                        a+c      b+d      n

RR = [a/(a+b)] ÷ [c/(c+d)] = a(c+d) ÷ c(a+b)
OR = (a/b) ÷ (c/d) = ad/bc
RR(MH) = Σ[a(c+d)/n] ÷ Σ[c(a+b)/n]
OR(MH) = Σ(ad/n) ÷ Σ(bc/n)

where n = a+b+c+d is the total of each stratum's table, and Σ (sigma) denotes the sum of the values over the two-by-two tables of all the strata.
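The summary formulas translate directly into Python. In this sketch each stratum is a tuple (a, b, c, d) laid out as in the table above, and n is that stratum's total; the two strata of counts are hypothetical.

```python
def mantel_haenszel(strata):
    """Return (RR_MH, OR_MH) from a list of per-stratum 2 x 2 tables (a, b, c, d)."""
    rr_num = rr_den = or_num = or_den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d              # total of this stratum's table
        rr_num += a * (c + d) / n      # sum of a(c+d)/n
        rr_den += c * (a + b) / n      # sum of c(a+b)/n
        or_num += a * d / n            # sum of ad/n
        or_den += b * c / n            # sum of bc/n
    return rr_num / rr_den, or_num / or_den

# Hypothetical strata of a confounder (e.g. two age groups)
strata = [(10, 90, 5, 95), (40, 60, 20, 80)]
rr_mh, or_mh = mantel_haenszel(strata)
print(round(rr_mh, 3), round(or_mh, 3))
```

Both strata here have a stratum-specific RR of 2, so RR(MH) is also 2; the stratum ORs (about 2.11 and 2.67) are pooled to about 2.52.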

Skewness and Kurtosis

Skewness: If a histogram/frequency polygon of a distribution is asymmetric, the distribution is said to be skewed. 

If the distribution is not symmetric because its graph extends further to the right than to the left, that is, if it has a long tail to the right, we say that the distribution is skewed to the right, or positively skewed.
In a distribution skewed to the right (positively skewed), the mean is usually greater than the mode.

If the distribution is not symmetric because its graph extends further to the left than to the right, that is, if it has a long tail to the left, we say that the distribution is skewed to the left, or negatively skewed.
In a distribution skewed to the left (negatively skewed), the mean is usually less than the mode.






Skewness > 0 indicates positive skewness
Skewness < 0 indicates negative skewness
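The notes do not say which skewness formula is meant; several exist. This Python sketch uses the moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments, without any small-sample correction. The example reuses the tree-height data from the measures-of-central-tendency section.

```python
def skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population form)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

heights = [1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2]
print(round(skewness(heights), 3))
```

The result is positive, consistent with this data set having a longer right tail and a mean (4) greater than its mode (2).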



Kurtosis: a measure of the degree to which a distribution is peaked or flat in comparison with a normal distribution, whose graph has a bell-shaped appearance.

Platykurtic: the graph exhibits a flattened appearance 
Mesokurtic: normal, bell shaped graph
Leptokurtic: the graph exhibits a more peaked appearance 





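Platykurtic: excess kurtosis < 0
Mesokurtic: excess kurtosis = 0
Leptokurtic: excess kurtosis > 0
(Excess kurtosis = kurtosis - 3, so that the normal distribution has an excess kurtosis of 0.)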
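As with skewness, several kurtosis formulas exist. This Python sketch computes the moment-based excess kurtosis g2 = m4 / m2^2 - 3 (no small-sample correction), which is 0 for a normal distribution; the two small data sets are hypothetical and chosen to give a flat and a peaked shape.

```python
def excess_kurtosis(data):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (population form; 0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                         # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]     # most values at the centre, long tails
print(round(excess_kurtosis(flat), 3), round(excess_kurtosis(peaked), 3))
```

The evenly spread values give about -1.3 (platykurtic), and the peaked set gives 2.0 (leptokurtic).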



Measures of Central tendency


Mean: the arithmetic average of the data set, i.e. the sum of all values divided by the number of values


Properties:
1.     Uniqueness: for a given set of data, there is only one arithmetic mean 
2.     Simplicity: easy to compute 

3.     Extreme values have a drastic influence on the mean

Median: the middle value of the data set when it is arranged in ascending order.
It is the middle value when the number of observations is odd, and the average of the two middle values when the number of observations is even.

It is the value that divides the data set into two equal parts, such that the number of values equal to or greater than the median equals the number of values equal to or less than the median, when the data set is arranged in order of magnitude.

Odd number of observations (n): the median is the ((n+1)/2)th value
Even number of observations (n): the median is the average of the (n/2)th and (n/2 + 1)th values

Properties:
1.     Uniqueness: for a given set of data, there is only one median 
2.     Simplicity: easy to compute 

3.     Not drastically affected by extreme values, unlike the mean


Mean and median are special cases of a family of parameters known as location parameters, because they can be used to designate certain positions on the horizontal axis when the distribution of a variable is graphed (“locate” the distribution on the horizontal axis)

Mode: most frequent value in the dataset


It may also be used to describe qualitative data.

Suppose the heights of the trees (in metres) in a garden are represented by the following data set:
1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2

First, let us arrange this in ascending order:

1, 1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 8, 9

The most frequent value in this data set is 2, so Mode = 2.

The total number of values in the data set is 13 (odd), so the median is the (13+1)/2 = 7th value:
Median = 3

Mean = (1+1+2+2+2+3+3+4+4+6+7+8+9)/13
     = 52/13
     = 4
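The worked example can be checked with Python's standard `statistics` module, which sorts the data internally for the median:

```python
import statistics

heights = [1, 2, 3, 4, 6, 7, 4, 2, 3, 8, 9, 1, 2]  # tree heights in metres, from the example

print(statistics.mode(heights))    # 2 (most frequent value)
print(statistics.median(heights))  # 3 (7th of 13 sorted values)
print(statistics.mean(heights))    # 4 (52 / 13)
```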


Vaccine efficacy and effectiveness

Vaccine efficacy:
Vaccine efficacy: % reduction in disease incidence in a vaccinated group compared with an unvaccinated group under optimal conditions.
Reduction in the chance or odds of developing clinical disease after vaccination relative to the chance or odds when unvaccinated. Vaccine efficacy measures direct protection (i.e. protection induced by vaccination in the vaccinated population sample). (WHO)
Reduction in the chance of developing the disease after vaccination relative to the chance when unvaccinated, as determined in a prospective randomised controlled study. (EMA)
The ability of a vaccine to provide protection against disease under ideal circumstances (e.g. during a clinical trial). (CDC)
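"% reduction in disease incidence" is commonly computed as VE = (ARU - ARV) / ARU × 100, where ARU and ARV are the attack rates in the unvaccinated and vaccinated groups; this is equivalent to (1 - RR) × 100. The Python sketch below uses that standard formula with hypothetical trial counts.

```python
def vaccine_efficacy(cases_vacc, n_vacc, cases_unvacc, n_unvacc):
    """VE (%) = (ARU - ARV) / ARU * 100, with attack rate AR = cases / persons at risk."""
    aru = cases_unvacc / n_unvacc   # attack rate in the unvaccinated
    arv = cases_vacc / n_vacc       # attack rate in the vaccinated
    return (aru - arv) / aru * 100  # equivalently (1 - ARV / ARU) * 100

# Hypothetical trial: 10/1000 vaccinated and 50/1000 unvaccinated develop disease
ve = vaccine_efficacy(cases_vacc=10, n_vacc=1000, cases_unvacc=50, n_unvacc=1000)
print(round(ve, 1))
```

An attack rate of 1% among the vaccinated against 5% among the unvaccinated corresponds to 80% efficacy.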

Vaccine effectiveness:
Vaccine effectiveness: the ability of a vaccine to prevent outcomes of interest in the "real world".
The protection conferred by vaccination in a certain population. Measures direct and indirect protection (indirect protection being protection of non-vaccinated persons). (WHO)
Direct (vaccine-induced) and indirect (population-related) protection during routine use, estimated from observational cohort studies. (EMA)
The ability of a vaccine to provide protection against disease when used under field conditions (routine practice). (CDC)


Vaccine impact: 
Compares the burden of disease caused by the pathogen targeted by the vaccine in a population that has received the vaccine with the burden of disease in a population that has not received the vaccine.

Vaccine effects
 Direct effect:
    Protection in vaccinated persons only
    Induced by individual vaccination
 Indirect effect:
    Effect of a vaccination programme
    At population level, including non-vaccinated


Direct effect
 Depends on vaccine and host characteristics
 Compares disease in vaccinated and unvaccinated persons in one population
 Measured in clinical trials or in real life

Efficacy: protection measured in clinical trials
  Ideal conditions of administration
  Selected subjects (e.g. persons with underlying diseases often excluded)

Effectiveness: protection measured in real-life situations
  Routine vaccination, including incomplete schedules and delayed administration
  Any person of the target group

Herd (indirect) effects:

Effect of widespread vaccination: protection through reduced transmission in the population when large proportions are vaccinated

Two vaccine exposures
 Individual vaccination: direct effect only
 Vaccination programme: direct + indirect effect

If there is an indirect effect, the effect of the programme is greater than the sum of the effects of vaccination on the vaccinated.

How to measure vaccine effects?

*Halloran et al


Direct effect: direct effect of vaccination on those vaccinated
  •   Exposure = individual vaccination
  •   Vaccinated vs. non-vaccinated, same population

    Methods: the study design must cancel out the indirect effect of the programme:
      cohort studies (from the same population)
      case-control studies #
      screening methods #
      Broome method #

    Controls have the same exposure/coverage as the population giving rise to the cases


    Indirect, total and overall effect

    Comparing two separate but similar populations, one with vaccination, the other without:
     Vaccinated persons: total effect
     Non-vaccinated: indirect effect
     All persons: overall effect
    Design:
    •   Population separated by time or place
        Pre and post-vaccine comparison (time)
    •   Cluster randomized trials

        Statistical or mathematical modelling
           Exposure here is the vaccination programme


Impact of vaccination programme

WHO: corresponds to the overall effect of a vaccination programme
The total population is being compared



Major confusion: direct and overall effects

Direct effect                                  | Overall effect
of individual vaccination                      | of a vaccination programme
on vaccinated persons                          | in a population in which only a fraction is vaccinated
Pre- or post-licensure                         | Post-licensure only
Does not include indirect effect               | Direct + indirect effects; potentially replacement disease
Compares groups from the same population;      | Compares 2 populations;
need to know vaccine status                    | no need to know vaccine status

The incremental cost-effectiveness ratio (ICER)

The incremental cost-effectiveness ratio (ICER) is used to summarise the cost-effectiveness of a health care intervention.
It is defined as the difference in cost between two possible interventions divided by the difference in their effect.
It represents the average incremental cost associated with one additional unit of the measure of effect.
The ICER can be estimated as:
ICER = (C1 - C2) / (E1 - E2)
where C1 and E1 are the cost and effect of the intervention being evaluated, and C2 and E2 are the cost and effect of the comparator.
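A minimal Python sketch of the ICER as the difference in cost divided by the difference in effect. The costs, and the use of QALYs as the measure of effect, are hypothetical; the function refuses to divide when the two effects are equal, since the ratio is then undefined.

```python
def icer(cost_new, effect_new, cost_old, effect_old):
    """ICER = (C_new - C_old) / (E_new - E_old): extra cost per extra unit of effect."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the two interventions have equal effect")
    return (cost_new - cost_old) / delta_effect

# Hypothetical: new intervention costs 15000 for 6.0 QALYs; comparator costs 10000 for 5.5 QALYs
print(icer(15000, 6.0, 10000, 5.5))  # 10000.0 (cost per additional QALY)
```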