Wednesday, January 2, 2019

Biostatistics MCQ (AIIMS)

A physician, after examining a group of patients of a certain disease, classifies the condition of each one as ‘Normal’, ‘Mild’, ‘Moderate’ or ‘Severe’. Which one of the following is the scale of measurement that is being adopted for classification of the disease condition?
[AIIMS Nov 92 Dec 98, May 94]
(a) Normal
(b) Interval
(c) Ratio
(d) Ordinal

Ordinal data have an intrinsic order, e.g. Mild, Moderate, Severe.


In the WHO recommended EPI cluster sampling for assessing primary immunization coverage, the age group of children to be surveyed is
(a) 0-12 months [AIIMS Nov1992, & 2008]
(b) 6-12 months
(c) 9-12 months
(d) 12-23 months

Children aged 12–23 months are surveyed when the final primary vaccination is given at 9 months of age; this is the most commonly chosen target population (Ref: WHO EPI cluster sampling).


If a biochemical test gives the same reading for a sample on repeated testing, it is inferred that the measurement is [AIIMS June 1992]
(a) Precise
(b) Accurate
(c) Specific
(d) Sensitive

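Precision means repeatability: repeated measurements of the same sample agree with each other, whether or not they are close to the true value.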

Mean, Median and Mode are [AIIMS Dec 94, & Nov 2007]
(a) Measures of dispersion
(b) Measures association between two variables
(c) Test of significance
(d) Measures of central tendency

Following are the sampling techniques used to conduct community health surveys, except
(a) Simple random [AIIMS May 1994]
(b) Systematic random
(c) Stratified random
(d) Cluster testing

Median weight of 100 children was 12 kgs. The standard deviation was 3. Calculate the percent coefficient of variance [AIIMS May 1994]
(a) 25%
(b) 35%
(c) 45%
(d) 55%

In statistical literature data are broadly classified as interval scale data, ordinal scale data & categorical data. Blood groups will be an example for: [AIIMS Dec 1994]
(a) Interval scale data
(b) Ordinal scale data
(c) Categorical data
(d) None of the above

Chance of passing a Genetic disease “y” trait by the affected parents to children is 0.16. They plan to have two children. Probability of both the children having “y” trait is [AIIMS Dec 1994]
(a) Zero
(b) 0.16
(c) 0.32
(d) 0.0256

A population study showed a mean glucose of 86 mg/ dL. In a sample of 100 showing normal curve distribution, what percentage of people have glucose above 86mg/ dL [AIIMS Dec 94]
(a) 34
(b) 50
(c) NIL
(d) 68

How much of the sample is included in 1.95 SD? [AIIMS May 1995]
(a) 99%
(b) 95%
(c) 68%
(d) 65%

Square root of p1q1/n1 + p2q2/n2 is a measure of [AIIMS Dec 1995]
(a) Mean
(b) Standard error of difference between two means
(c) Standard error of difference between two proportions
(d) Normal deviate

Histogram is used to describe: [AIIMS Dec 1995]
(a) Quantitative data of a group of patients
(b) Qualitative data of a group of patients
(c) Data collected on nominal scale
(d) Data collected on ordinal scale

If 60 values are arranged in ascending order, middle value is [AIIMS Dec 1995]
(a) Arithmetic Mean
(b) Median
(c) 30th percentile
(d) 31st percentile

50th percentile is equivalent to [AIIMS Sep 1996]
(a) Mean
(b) Median
(c) Mode
(d) Range

A normal distribution curve depends on [AIIMS Feb 1997]
(a) Mean and sample size
(b) Range and sample size
(c) Mean and standard deviation
(d) Mean and median

In a drug trial, a 50-year-old patient with CAD is being interviewed about his dietary and smoking habits. The bias that might be introduced is: [AIIMS Feb 1997]
(a) Selection bias
(b) Berksonian bias
(c) Recall bias
(d) No possibility of bias

The Correlation Coefficient between Smoking & Lung Cancer was found to be 1.4. This indicates
(a) Weak correlation [AIIMS Feb 1997]
(b) Moderate correlation
(c) Strong correlation
(d) Mistake in calculation

A Scatter diagram is drawn to study: [AIIMS June 1997]
(a) Trend of a variable over a period of time
(b) Frequency of occurrence of events
(c) Mean & median values of the given data
(d) Relationship between two given variables

Which of the following is not true about ‘correlation’? [AIIMS June 97]
(a) It indicates degree of association between two characteristics
(b) Correlation coefficient of 1 means that the two variables exhibit linear relationship
(c) Correlation can measure risk
(d) Causation implies correlation


If we know the value of one variable in an individual & wish to know the value of another variable, we calculate - [AIIMS June 1997]
(a) Coefficient of correlation
(b) Coefficient of regression
(c) SE of mean
(d) Geometric mean


A cardiologist wants to study the effect of an antihypertensive drug. He notes down the initial systolic blood pressure (mmHg) of 50 patients and then administers the drug to them. After a week’s treatment, he measures the systolic blood pressure again. Which of the following is the most appropriate statistical test of significance to test the statistical significance of the change in blood pressure?
[AIIMS June 1997, AIIMS May 1995, AIIMS Nov 2004]
(a) Paired t-test
(b) Unpaired or independent t-test
(c) Analysis of variance
(d) Chi-square test

Not required for Chi-square test is [AIIMS Dec 1997]
(a) Mean & SD of the groups
(b) Each expected cell frequency > 5
(c) Large sample
(d) Contingency Table

The mean B.P. of a group of persons was determined and after an interventional trial, the mean BP was estimated again. The best test to be applied to determine the significance of intervention is
(a) Chi-square [AIIMS Dec 1997]
(b) Paired ‘t’ test
(c) Correlation coefficient
(d) t-test

Study finds a correlation coefficient of + 0.7 between self reported work satisfaction & expectancy of life in a random sample of 5000 corporate workers. (p = 0.01). This means that [AIIMS Dec 1997]
(a) Work satisfaction improves life expectancy
(b) Strong statistically significant (+) association between work satisfaction and life expectancy
(c) 70% people who enjoy work shall live longer
(d) 70% association between work satisfaction & life expectancy

Not true about Chi-square test is [AIIMS June 99]
(a) Tests the significance of difference between two proportions
(b) Tells about presence or absence of an association between two variables
(c) Directly measures the strength of association
(d) Can be used when more than two groups are to be compared

In a bimodal series, if mean is 2 and median is 3, what is the mode? [AIIMS June 99]
(a) 5
(b) 2.5
(c) 4
(d) 3

The standard normal distribution [AIIMS Nov 99]
(a) Is skewed to the left
(b) Has mean = 1.0
(c) Has standard deviation = 0.0
(d) Has variance = 1.0

An investigator into the life expectancy of IV drug abusers divides a sample of patients into HIV- positive and HIV-negative groups. What type of data does this division constitute?
[AIIMS June 2000]
(a) Nominal
(b) Ordinal
(c) Interval
(d) Ratio

P-value is the probability of [AIIMS June 2000]
(a) Not rejecting a null hypothesis when true
(b) Rejecting a null hypothesis when true
(c) Not rejecting a null hypothesis when false
(d) Rejecting a null hypothesis when false

A lecturer states that the correlation coefficient between prefrontal blood flow under cognitive load and the severity of psychotic symptoms in schizophrenic patients is – 1.24. You can therefore conclude that [AIIMS June 2000]
(a) Pre-frontal blood flow under cognitive load is a good predictor of the severity of psychotic symptoms in schizophrenic patients
(b) Prefrontal blood flow under cognitive load accounts for a large proportion of the variance in psychotic symptoms in schizophrenic patients
(c) Psychosis or schizophrenia is in some way a cause or partial cause of low prefrontal blood flow under cognitive load
(d) The lecturer has reported the correlation coefficient incorrectly

Central value of a set of 180 values can be obtained by [AIIMS Nov 2000]
(a) 2nd tertile
(b) 90th percentile
(c) 9th decile
(d) 2nd quartile

The number of malaria cases reported during the last 10 years in a town is given below, 250, 320, 190, 300, 5000, 100, 260, 350, 320, and 160 The epidemiologist wants to find out the average number of malaria cases reported in that town during the last 10 years. The most appropriate measure of average for this data will be [AIIMS May 2001, AIIMS Nov 2004]
(a) Arithmetic mean
(b) Mode
(c) Median
(d) Geometric mean

In a particular trial, the association of lung cancer with smoking is found to be 40% in one sample and 60% in another. What is the best test to compare the results? [AIIMS May 2001]
(a) Chi Square Test
(b) Fisher Test
(c) Paired t Test
(d) ANOVA Test

What can be true regarding the coefficient of correlation between IMR and economic status?
(a) r = + 1 [AIIMS May 2001]
(b) r = – 1
(c) r = + 0.22
(d) r = – 0.8

Standard deviation of means measures  [AIIMS May 01]
(a) Non-sampling errors
(b) Sampling errors
(c) Random errors
(d) Conceptual errors

Among a 100 women with average Hb of 10 gm%, the standard deviation was 1, what is the standard error? [AIIMS May 01, 04, 07]
(a) 0.01
(b) 0.1
(c) 1
(d) 10

A study was undertaken to assess the effect of a drug in lowering serum cholesterol levels. 15 obese women and 10 non-obese women formed the 2 limbs of the study. Which test would be useful to correlate the results obtained?
(a) ANOVA test [AIIMS Nov 01]
(b) Student’s t-test
(c) Chi square test
(d) Fisher test

The incidence of malaria in an area is 20, 20, 50, 56, 60, 5000, 678, 898, 345, 456. Which of these methods is the best to calculate the average incidence?  [AIIMS Nov 01]
(a) Arithmetic mean
(b) Geometric mean
(c) Median
(d) Mode

A randomised trial comparing the efficacy of two drugs showed a difference between the two with a p  value of <0.005. In reality, however the two drugs do not differ. This therefore is an example of
(a) Type I error (alpha error) [AIIMS Nov 02]
(b) Type II error (beta error)
(c) 1 – α
(d) 1 – β


A test which produces similar results when repeated, but values obtained are not close to actual/true value, is [AIIMS Nov 02]
(a) Precise but inaccurate
(b) Precise and accurate
(c) Imprecise and accurate
(d) Imprecise and inaccurate

When a diagnostic test is used in “series” mode, then [AIIMS Nov 02]
(a) Sensitivity increases but specificity decreases
(b) Specificity increases but sensitivity decreases
(c) Both sensitivity and specificity increase
(d) Both sensitivity and specificity decrease

The number of patients required in a clinical trial to treat a specific disease increases as
[AIIMS Nov 02]
(a) The incidence of the disease decreases
(b) The significance level increases
(c) The size of the expected treatment effect increases
(d) The drop-out rate increases

The usefulness of a screening test depends upon its- [AIIMS May 03]
(a) Sensitivity
(b) Specificity
(c) Reliability
(d) Predictive value

An investigator wants to study the association between maternal intake of iron supplements (Yes/ No)  and birth weights (in grams) of newborn babies. He collects relevant data from 100 pregnant women and their newborns. What statistical test of hypothesis would you advise for the investigator in this situation? [AIIMS May 03]
(a) Chi-Square test
(b) Unpaired or independent t-test
(c) Analysis of Variance
(d) Paired t-test

For testing the statistical significance of the difference in heights of school children
[AIIMS May 2003]
(a) Student’s ‘t’ test
(b) Chi-squared test
(c) Paired ‘t’ test
(d) One way analysis of variance (one way ANOVA)

The fasting blood levels of glucose for a group of diabetics are found to be normally distributed with a mean of 105 mg per 100 ml of blood and a standard deviation of 10 mg per 100 ml of blood. From this data it can be inferred that approximately 95% of diabetics will have their fasting blood glucose levels within the limits of: [AIIMS Nov 2003]
(a) 75 and 135 mgs
(b) 85 and 125 mgs
(c) 95 and 115 mgs
(d) 65 and 145 mgs

An investigator wants to study the association between maternal intake of iron supplements (Yes or No) and incidence of low birth weight (< 2500 or > 2500 grams). He collects relevant data from 100 pregnant women as to the status of usage of iron supplements and the status of low birth weight in their newborns. The appropriate statistical test of hypothesis advised in this situation is
[AIIMS Nov 03]
(a) Paired – t-test
(b) Unpaired or independent t-test
(c) Analysis of variance
(d) Chi – Square test

Mean and standard deviation can be worked out only if data is on [AIIMS Nov 03, AIIMS May 05]
(a) Interval/Ratio scale
(b) Dichotomous scale
(c) Nominal scale
(d) Ordinal scale

After applying a statistical test, an investigator gets the ‘P value’ as 0.01. It means that [AIIMS Nov 2003, AIIMS May 05, 08]
(a) The probability of finding a significant difference is 1%
(b) The probability of declaring a significant difference is 1%
(c) The difference is not significant 1% times and significant 99% times
(d) The power of the test used is 99%

Sampling method used in assessing immunization status of children under immunization program is
(a) Systematic sampling [AIIMS May 2004]
(b) Stratified sampling
(c) Group sampling
(d) Cluster sampling

All are true Except - [AIIMS May 04]
(a) Alpha is the maximum tolerable probability of type-I error
(b) Beta is the probability of type-II error
(c) When Null Hypothesis is true but is rejected, it is Type-II error
(d) P-value can be more or less than alpha

Statistical Power of a trial is equal to  [AIIMS Nov 04]
(a) 1 + α
(b) 1 – β
(c) α + β
(d) α / β

In a 3 x 4 contingency tables, the number of degrees of freedom equals to [AIIMS Nov 2004]
(a) 1
(b) 5
(c) 6
(d) 12

In assessing the association between maternal nutritional status and the birth weight of the newborns, two investigators A and B studied separately and found significant results with p values 0.02 and 0.04 respectively. From this information, what can you infer about the magnitudes of association found by the two investigations? [AIIMS Nov 2004]
(a) The magnitude of association found by investigator A is more than that found by B
(b) The magnitude of association found by investigator B is more than that found by A
(c) The estimates of association obtained by A and B will be equal, since both are significant
(d) Nothing can be concluded as the information given is inadequate

Pearson or spearman coefficient is used for evaluation of: [AIIMS Nov 04]
(a) Differences in proportion
(b) Comparison of more than 2 means
(c) Comparison of variance
(d) Correlation

Sensitivity for a test ‘X’ is 0.90 and Specificity is 0.50. Prevalence of disease ‘Y’ in a population is 10%. Post-test probability of test ‘X’ when applied to population ‘Y’ is - [AIIMS May 05]
(a) 0.90
(b) 0.84
(c) 0.16
(d) 0.10

A bacterium can divide every 20 minutes. Beginning with a single individual, how many bacteria will  be there in the population if there is exponential growth for 3 hours? [AIIMS May 05]
(a) 18
(b) 440
(c) 512
(d) 1024

The distribution of random blood glucose measurements from 50 first year medical students was found to have a mean of 3.0 mmol/litre with a standard deviation of 3.0 mmol/litre. Which of the following is a correct statement about the shape of the distribution of random blood glucose in these first year medical students? [AIIMS Nov 2005]
(a) Since both mean and standard deviation are equal, it should be a symmetric distribution
(b) The distribution is likely to be positively skewed
(c) The distribution is likely to be negatively skewed
(d) Nothing can be said conclusively

A chest physician observed that the distribution of forced expiratory volume (FEV) in 300 smokers had a median value of 2.5 litres with the first and third quartiles being 1.5 and 4.5 litres respectively. Based on this data how many persons in the sample are expected to have a FEV between 1.5 and 4.5 litres? [AIIMS Nov 05]
(a) 7.5
(b) 150
(c) 225
(d) 300

If the distribution of intra-ocular pressure (IOP) seen in 100 glaucoma patients has an average 30 mm with a SD of 1.0, what is the lower limit of the average IOP that can be expected 95% of times? [AIIMS Nov 05]
(a) 28
(b) 26
(c) 32
(d) 259

In the WHO recommended EPI Cluster sampling for assessing primary immunization coverage, the age group of children to be surveyed is
(a) 0-12 months [AIIMS Nov 2005]
(b) 6-12 months
(c) 9-12 months
(d) 12-23 months

The height of a group of 20 boys aged 10 years was 140 ± 13 cm and that of 20 girls of the same age was 135 ± 7 cm. To test the statistical significance of the difference in height, the applicable test is [AIIMS Nov 05]
(a) χ²
(b) Z
(c) t
(d) F

Histogram is used to present which kind of the data: [AIIMS May 2006]
(a) Nominal
(b) Continuous
(c) Discrete
(d) Any of above

A randomised trial comparing efficacy of two regimens showed that difference is statistically significant with p<0.001 but in reality the two drugs do not differ in their efficacy. This is an example of- [AIIMS May 2006]
(a) Type-I error (α error)
(b) Type-II error (β error)
(c) 1 – α
(d) 1 – β

You have diagnosed a patient clinically as having SLE and ordered 6 tests. Out of which 4 tests have come positive and 2 are negative. To determine the probability of SLE at this point, you need to know- [AIIMS May 2006]
(a) Prior probability of SLE; sensitivity and specificity of each test
(b) Incidence of SLE and predictive value of each test
(c) Incidence and prevalence of SLE
(d) Relative risk of SLE in this patient

A diagnostic test for a particular disease has a sensitivity of 0.90 and a specificity of 0.80. A single test is applied to each subject in the population in which the diseased population is 30%. What is the probability that a person, negative to this test, has no disease? [AIIMS May 2006]
(a) Less than 50%
(b) 70%
(c) 95%
(d) 72%

For the data given below, the degrees of freedom will be
Duration of developing AIDS    Blood group:    A     B    AB     O
0 – 5 years                                   20    30    48     7
5 – 10 years                                 110    12    37    12
10 – 15 years                                 12     9     8     3
[AIIMS May 06]
(a) 12
(b) 6
(c) 9
(d) 20

If the birth weight of each of the 10 babies born in a hospital in a day is found to be 2.8 kg, then the standard deviation of this sample will be [AIIMS May 2006, Dec 97]
(a) 2.8
(b) 0
(c) 1
(d) 0.28

LJ chart is used for: [AIIMS May 07]
(a) Accuracy
(b) Precision
(c) Odds
(d) Likelihood ratio

Which is the best method to compare the results obtained by a new test and a gold standard test?
(a) Correlation study [AIIMS May 07]
(b) Regression study
(c) Bland and Altman analysis
(d) Kolmogorov-Smirnov test

Sensitivity of a screening test ‘X’ is 90 % while its specificity is 10 %. Likelihood ratio for a positive test is - [AIIMS May 07]
(a) 9.0
(b) 8.0
(c) 1.0
(d) 0.1

If a 95% Confidence Interval for prevalence of Cancer in Smokers aged >65 years is 56% to 76%, the chance that the prevalence could be less than 56% is [AIIMS May 07]
(a) Practically NIL
(b) 44%
(c) 2.5%
(d) 5%


In a group of 100 children, the mean weight of children is 15 kg. The standard deviation is 1.5 kg. Which one of the following is true? [AIIMS May 2007]
(a) 95% of all children weigh between 12 and 18 kg
(b) 95% of all children weigh between 13.5 and 16.5 kg
(c) 99% of all children weigh between 12 and 18 kg
(d) 99% of all children weigh between 13.5 and 16.5 kg

Which is the best distribution to study the daily admission of head injury patients in a trauma care centre? [AIIMS May 2008]
(a) Normal distribution
(b) Binomial distribution
(c) Uniform distribution
(d) Poisson distribution

Mean bone density amongst 2 groups of 50 people each is compared. Which would be the best test?
(a) Chi square [AIIMS May 2008]
(b) Student t test
(c) Mcnemar chi square test
(d) Fisher test

Association can be measured by all except
(a) Correlation coefficient [AIIMS May 2009]
(b) Cronbach’s alpha
(c) P value
(d) Odds ratio

The risk factor association of smoking with pancreatic cancer was studied in a case control study. The values are
Group    Odds ratio    95% Confidence limits
A        2.5           1.0 – 3.1
B        1.4           1.1 – 1.7
C        1.6           0.9 – 1.7
Which of the following is correct [AIIMS Nov 09]
(a) Risk is more associated with Group A
(b) Risk is more associated with Group B
(c) Risk is more associated with Group C
(d) Risk is equally associated with all three groups

All of the following are true about Standard error except? [AIIMS Nov- 09]
(a) As the sample size increases, Standard error will also increase
(b) Based on Normal distribution
(c) It depends on Standard deviation of mean
(d) Is used to estimate confidence limit

In a study, the following responses are obtained: Satisfied, Very satisfied, Dissatisfied. Which type of scale is this? [AIIMS May 2010]
(a) Nominal
(b) Ordinal
(c) Interval
(d) Ratio

Which of the following is used to denote a continuous variable? [AIIMS May 2010]
(a) Simple bar
(b) Histogram
(c) Pie diagram
(d) Multiple bar

True about cluster sampling all except [AIIMS May 2011]
(a) Sample size same as simple random
(b) It is two stage sampling
(c) Cheaper than other methods
(d) It is a method for rapid assessment

An investigator finds out that 5 independent factors influence the occurrence of a disease. Comparison of multiple factors that are responsible for the disease can be assessed by:
[AIIMS May 2011]
(a) ANOVA
(b) Multiple linear regression
(c) Chi-square test
(d) Multiple logistic regression

Method used for comparison of a new test with an available gold-standard test is
 [AIIMS November 2011]
(a) Regression analysis/Likelihood test
(b) Correlation analysis/Bland and Altmann test
(c) Baltin and Altimore method
(d) Kimorov and Samletor technique

In a study first schools are sampled, then sections, and finally students. This type of sampling is known as: [AIIMS November 2012]
(a) Stratified sampling
(b) Simple random sampling
(c) Cluster sampling
(d) Multistage sampling

The expected prevalence of a disease in a population is 50%. To estimate it within 45-55% with 95% confidence, the minimum sample size required is:
(a) 100 [AIIMS May 2013]
(b) 200
(c) 300
(d) 400

If confidence limit is increased, then: [AIIMS May 2013]
(a) Previously insignificant data becomes significant
(b) Previously significant data becomes insignificant
(c) No effect on significance
(d) Any change can happen

In a population of 100, the prevalence of Candida glabrata was found to be 80%. If the investigator repeats the study, what will the prevalence be with 95% confidence? [AIIMS May 2013]
(a) 78-82%
(b) 76-84%
(c) 72-88%
(d) 74-86%

How much population falls between median and median plus one standard deviation in a normal distribution? [AIIMS Nov 2013]
(a) 0.34
(b) 0.68
(c) 0.17
(d) 0.47

There is a population of 20000 people with a mean haemoglobin of 13.5 gm%, following a normal distribution. What proportion of the population has haemoglobin above 13.5 gm%?
[AIIMS Nov 2013]
(a) 0.25
(b) 0.50
(c) 1
(d) 0.34

Q-test is used for detecting: [AIIMS Nov 2013]
(a) Outliers
(b) Interquartile range
(c) Difference of means
(d) Difference of proportions

ANSWERS ARE IN RED!
Creative Commons License
PSM / COMMUNITY MEDICINE by Dr Abhishek Jaiswal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at learnpsm@blogspot.com.
Permissions beyond the scope of this license may be available at jaiswal.fph@gmail.com.

Type I error and type II error simplified



Post hoc tests



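Post hoc is Latin for "after this". Post-hoc tests are run after the experimental data have been analysed, typically once an overall test such as ANOVA has shown a difference, to find out which groups differ. Most of them are built around controlling the family-wise error rate.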

Family-wise error rate: It is the probability of making at least one Type I Error, when we are performing multiple simultaneous tests. It is also known as alpha inflation or Cumulative Type I error. 

The most common post-hoc tests are: 

  • Bonferroni procedure 
  • Duncan's new multiple range test 
  • Dunn's multiple comparison test 
  • Fisher's least significant difference 
  • Holm-Bonferroni procedure 
  • Newman-Keuls 
  • Rodger's method 
  • Scheffe's Method 
  • Tukey's test 
  • Dunnett's correction 
  • Benjamini-Hochberg procedure


Bonferroni Procedure
This multiple-comparison post-hoc correction is used when you are performing many independent or dependent statistical tests at the same time. The problem with running many simultaneous tests is that the probability of getting at least one significant result by chance alone increases with each test run. This post-hoc test sets the significance cut-off at α/n, where n is the number of tests.
Imagine looking for the Ace of Clubs in a deck of cards: if you pull one card from the deck, the chance is pretty low (1/52) that you’ll get the Ace of Clubs. Try again (and try perhaps 50 times), and you’ll probably end up getting the Ace. The same principle works with hypothesis testing: the more simultaneous tests you run, the more likely you’ll get a “significant” result. Let’s say you were running 50 independent tests simultaneously with an alpha level of 0.05. The probability of observing at least one significant event due to chance alone is:
P(at least one significant event) = 1 – P(no significant event)
= 1 – (1 – 0.05)^50 ≈ 0.92.
That means it is almost certain (92%) that you’ll get at least one significant result.
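The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the family-wise error formula assumes the 50 tests are independent.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance cut-off under the Bonferroni correction (alpha / n)."""
    return alpha / n_tests


alpha, n_tests = 0.05, 50
uncorrected = familywise_error_rate(alpha, n_tests)   # each test judged at 0.05
cutoff = bonferroni_cutoff(alpha, n_tests)            # each test judged at 0.05 / 50
corrected = familywise_error_rate(cutoff, n_tests)    # family-wise rate after correction

print(f"Family-wise error rate without correction: {uncorrected:.2f}")
print(f"Bonferroni cut-off per test: {cutoff:.3f}")
print(f"Family-wise error rate with correction: {corrected:.3f}")
```

With the corrected cut-off, the chance of at least one false positive across all 50 tests stays just under the original 0.05.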

Holm-Bonferroni Method
The ordinary Bonferroni method is sometimes viewed as too conservative. Holm’s sequential Bonferroni post-hoc test is a less strict correction for multiple comparisons. 


Duncan’s new multiple range test (MRT)
When you run Analysis of Variance (ANOVA), the results will tell you whether there is a difference in means. However, they won’t pinpoint which pairs of means are different. Duncan’s Multiple Range Test identifies the pairs of means (from at least three) that differ. The MRT is similar to the LSD, but it uses a Q value instead of a t-value.

Fisher’s Least Significant Difference (LSD)
A tool to identify which pairs of means are statistically different. Essentially the same as Duncan’s MRT, but with t-values instead of Q values.

Newman-Keuls
Like Tukey’s, this post-hoc test identifies sample means that are different from each other. Newman-Keuls uses different critical values for comparing different pairs of means, depending on how far apart the means are in rank order. It is therefore more likely than Tukey’s test to find significant differences.

Rodger’s Method
Considered by some to be the most powerful post-hoc test for detecting differences among groups. This test protects against loss of statistical power as the degrees of freedom increase.

Scheffé’s Method
Used when you want to look at post-hoc comparisons in general (as opposed to just pairwise comparisons). Scheffé’s method controls the overall confidence level. It is customarily used with unequal sample sizes.

Tukey’s Test
The purpose of Tukey’s test is to figure out which groups in your sample differ. It uses the “Honest Significant Difference,” a number that represents the distance between groups, to compare every mean with every other mean.

Dunnett’s correction
Like Tukey’s, this post-hoc test is used to compare means. Unlike Tukey’s, it compares every mean to a control mean.

Benjamini-Hochberg (BH) procedure
If you perform a very large number of tests, one or more of them will probably give a significant result purely by chance.
The Benjamini-Hochberg Procedure is a powerful tool that controls the false discovery rate.
The false discovery rate (FDR) is the expected proportion of significant results ("discoveries") that are actually type I errors.
Adjusting the rate helps to control for the fact that sometimes small p-values (less than 5%) happen by chance, which could lead you to incorrectly reject true null hypotheses. In other words, the B-H Procedure helps you to avoid Type I errors (false positives).
A p-value of 5% means that there’s only a 5% chance that you would get a result at least as extreme as the one observed if the null hypothesis were true. In other words, if you get a p-value of 5% or less, the data are hard to reconcile with the null hypothesis, which is therefore rejected. But it’s only a probability: many times, true null hypotheses are thrown out just because of the randomness of results.
Example: Let’s say you have a group of 100 patients who you know are free of a certain disease. Your null hypothesis is that the patients are free of disease and your alternate is that they do have the disease. If you ran 100 statistical tests at the 5% alpha level, roughly 5% of results would report as false positives.
There’s not a lot you can do to avoid this: when you run statistical tests, a fraction will always be false positives. However, running the B-H procedure will decrease the number of false positives. The steps are:
  1. Put the individual p-values in ascending order.
  2. Assign ranks to the p-values. For example, the smallest has a rank of 1, the second smallest has a rank of 2.
  3. Calculate each individual p-value’s Benjamini-Hochberg critical value, using the formula (i/m)Q, where:
    • i = the individual p-value’s rank,
    • m = total number of tests,
    • Q = the false discovery rate (a percentage, chosen by you).
  4. Compare your original p-values to the B-H critical values from Step 3; find the largest p-value that is smaller than its critical value. That p-value and all smaller p-values are declared significant.
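The four steps can be sketched in Python as follows. The function name and the p-values are hypothetical. The comparison uses "less than or equal to", which is the usual convention and an assumption here, since the text says "smaller than".

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where a test is declared significant."""
    m = len(p_values)
    # Steps 1-2: order the p-values (ascending); position in `order` + 1 is the rank.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    # Steps 3-4: find the largest rank i whose p-value is within (i / m) * Q.
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= (rank / m) * q:
            largest_rank = rank
    # Every p-value ranked at or below that rank is declared significant.
    significant = [False] * m
    for idx in order[:largest_rank]:
        significant[idx] = True
    return significant


p_values = [0.012, 0.045, 0.025, 0.004, 0.200]   # hypothetical results of 5 tests
decisions = benjamini_hochberg(p_values, q=0.05)
for p, keep in zip(p_values, decisions):
    print(f"p = {p:.3f} -> {'significant' if keep else 'not significant'}")
```

Here 0.045 would pass an uncorrected 5% cut-off, but its B-H critical value is (4/5) × 0.05 = 0.04, so it is not declared significant.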

Statistics

Statistics is a field of study concerned with (1) the collection, organization, summarization, and analysis of data; and (2) the drawing of inferences about a body of data when only a part of the data is observed.

A descriptive measure computed from the data of a sample is called a statistic.

A descriptive measure computed from the data of a population is called a parameter.

Variable: a characteristic that takes different values in different persons, places or things.

Quantitative variable: one that can be measured in the usual sense. Measurements convey information about amount.

Qualitative variable: one whose measurement consists of categorization. Measurements convey information about an attribute.

Random variable: a variable whose values arise as a result of chance factors, so they cannot be predicted exactly in advance.

Discrete variable: characterized by gaps or interruptions in the values that it can assume.

Continuous variable: does not possess the gaps or interruptions characteristic of a discrete variable.

Population: the largest collection of entities in which we have an interest at a particular time.

Sample: a part of the population that we take for study.

Measurement: the assignment of numbers to objects or events according to a set of rules. Measurement may be carried out under different sets of rules.

Measurement Scale: 

Nominal scale: naming the observations or classifying them into various mutually exclusive and collectively exhaustive categories.

Ordinal scale: the observations not only fall into different categories but can also be ranked according to some criterion.

Interval scale: in addition to ordering the measurements, we also know the distance between any two measurements.
The interval scale, unlike the nominal and ordinal scales, is a truly quantitative scale.

Ratio scale: the highest level of measurement. Equality of ratios as well as equality of intervals may be determined.

Fundamental to the ratio scale is a true zero point.


Simple random sample: If a sample of size “n” is drawn from a population of size “N” in such a way that every possible sample of size “n” has the same chance of being selected, the sample is called a simple random sample.
In practice, sampling is, as a rule, done without replacement.
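
A minimal sketch of simple random sampling without replacement, using Python's standard random.sample. The sampling frame of 100 unit IDs and the function name simple_random_sample are assumptions made for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: unit IDs 1..100 (N = 100)
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Because the draw is without replacement, no unit can appear twice in the sample.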

Systematic sampling: first we decide the total number required for the sample (n). A random number table is then used to give a starting number (x). A second number, the sampling interval (k), is determined by the sample size (typically k = N/n, where N is the population size). We then select individuals in this way:
x, x+k, x+2k, x+3k, ...
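
The selection rule x, x+k, x+2k, ... can be sketched as below. Assumptions for this example: the interval is taken as k = N // n (integer division), the random start comes from Python's random module rather than a random number table, and the function name systematic_sample is made up.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick a random start in the first interval, then every k-th unit after it."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("sample size must be between 1 and the population size")
    k = N // n                      # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)        # 0-based position of the first selected unit
    return [population[start + i * k] for i in range(n)]

# Hypothetical sampling frame: unit IDs 1..100, so k = 10 for n = 10
population = list(range(1, 101))
sample = systematic_sample(population, 10, seed=7)
print(sample)
```

Every selected ID differs from the previous one by exactly k.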


Stratified random sampling: the population is divided into strata, and a random sample is taken from each stratum.
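
A rough sketch of stratified random sampling. The urban/rural strata, the equal allocation of 3 units per stratum and the function name stratified_sample are all assumptions for illustration; real surveys often use proportional allocation instead.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(n_per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical frame: 20 households labelled urban or rural
units = [(i, "urban" if i % 2 else "rural") for i in range(1, 21)]
sample = stratified_sample(units, lambda u: u[1], 3, seed=1)
for name, chosen in sample.items():
    print(name, chosen)
```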
Creative Commons License
PSM / COMMUNITY MEDICINE by Dr Abhishek Jaiswal is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at learnpsm@blogspot.com.
Permissions beyond the scope of this license may be available at jaiswal.fph@gmail.com.

Box and whisker plots (Boxplot)


Box and whisker plots (Boxplot):

Represent the variable of interest on the horizontal axis.
Draw a box in such a way that the left end of the box aligns with Q1 and the right end aligns with Q3.
Divide the box into two parts by a vertical line that aligns with the median, Q2.
Draw a horizontal line, called a whisker, from the left end of the box to the point that aligns with the smallest measurement in the data set.
Draw another horizontal line, or whisker, from the right end of the box to the point that aligns with the largest measurement in the data set.
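
The five values needed to draw the box and whiskers (smallest value, Q1, Q2, Q3, largest value) can be computed as in the sketch below. Note that textbooks and software use slightly different rules for quartiles; this example assumes the "inclusive" method of Python's statistics.quantiles, and the data are made up.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical measurements
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```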






Outliers: an outlier is an observation whose value, “x”, either exceeds the value of the third quartile by a magnitude greater than 1.5(IQR) or is less than the value of the first quartile by a magnitude greater than 1.5(IQR), where IQR = Q3 - Q1.

That is

x < {Q1 - 1.5(IQR)} or x > {Q3 + 1.5(IQR)}
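
The 1.5(IQR) rule can be checked with a few lines of code. This is a sketch with made-up data; the quartiles are computed with the "inclusive" method of statistics.quantiles, which is one of several conventions, and the function name find_outliers is an assumption.

```python
from statistics import quantiles

def find_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    flagged = [x for x in values if x < lower or x > upper]
    return flagged, lower, upper

# Hypothetical data: Q1 = 4, Q3 = 8, IQR = 4, so the limits are -2 and 14
outliers, lower, upper = find_outliers([2, 3, 4, 5, 6, 7, 8, 9, 30])
print(outliers, lower, upper)  # [30] -2.0 14.0
```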


Stem and leaf


Stem and leaf: we partition each measurement into two parts. The first part is called the stem, the second is called the leaf. The stem consists of one or more of the initial digits of the measurement; the leaf is composed of one or more of the remaining digits. All the partitioned numbers are shown together in a single display; the stems form an ordered column with the smallest stem at the top and the largest at the bottom. We include in the stem column all stems within the range of the data, even when a measurement with that stem is not in the data set. The rows of the display contain the leaves, ordered and listed to the right of their respective stems. When leaves consist of more than one digit, all digits after the first may be deleted. Decimal points, when present in the data, are omitted in the stem and leaf display. The stems are separated from their leaves by a vertical line.

An advantage of the stem and leaf display over the histogram is that it preserves the information contained in the individual measurements.
Stem and leaf displays are most effective with relatively small data sets.
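
A small sketch of how such a display can be built. It assumes non-negative whole numbers, with the last digit as the leaf and the remaining leading digits as the stem; the data and the function name stem_and_leaf are made up for the example.

```python
def stem_and_leaf(values):
    """Build a stem and leaf display: stem = leading digits, leaf = last digit."""
    data = sorted(values)
    # Include every stem in the range of the data, even if it has no leaves
    leaves = {stem: [] for stem in range(data[0] // 10, data[-1] // 10 + 1)}
    for x in data:
        leaves[x // 10].append(x % 10)
    return [
        f"{stem} | {''.join(str(leaf) for leaf in row)}"
        for stem, row in leaves.items()
    ]

# Hypothetical measurements
lines = stem_and_leaf([23, 12, 48, 15, 21, 41, 23])
for line in lines:
    print(line)
# 1 | 25
# 2 | 133
# 3 | 
# 4 | 18
```

Stem 3 appears with no leaves because it lies within the range of the data.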


Meta-analysis


Meta-analysis:

Effect size: The effect size is a value which reflects the magnitude of the treatment effect or the strength of a relationship between two variables. It is the unit of currency in meta-analysis. (Black square on the forest plot)
Precision: It is reflected by the confidence interval (C.I.) of the effect size; the narrower the interval, the more precise the estimate.
Study weight: It is the weight assigned to each study. The weight assigned depends on the precision. (The larger the square, the larger the study weight)
Summary effect: It is the weighted mean of the individual effects. The mechanism used to assign the weights depends on our assumptions about the distribution of effect sizes from which the studies were sampled. Under the fixed-effect model, the assumption is that all the studies in the analysis share the same true effect size. The summary effect then is the estimate of this common effect size. Under the random-effects model, the assumption is that the true effect size varies from study to study. The summary effect here is the estimate of the mean of the distribution of effect sizes. (Diamond)
Precision of the summary effect: The location of the diamond represents the summary effect size. Its width reflects the precision of the estimate.

In the fixed-effect model, the weight of an individual study is the reciprocal of that study’s variance.
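
The inverse-variance weighting can be sketched as follows. The two effect sizes and variances are invented numbers, not results from real studies, and the function name fixed_effect_summary is an assumption for this example.

```python
import math

def fixed_effect_summary(effects, variances):
    """Inverse-variance weighted mean of study effects (fixed-effect model)."""
    weights = [1.0 / v for v in variances]       # weight = 1 / variance
    total = sum(weights)
    summary = sum(w * y for w, y in zip(weights, effects)) / total
    summary_variance = 1.0 / total               # variance of the summary effect
    return summary, summary_variance, weights

# Hypothetical studies: effect sizes 0.2 and 0.4 with variances 0.01 and 0.04
summary, summary_variance, weights = fixed_effect_summary([0.2, 0.4], [0.01, 0.04])
half_width = 1.96 * math.sqrt(summary_variance)  # approximate 95% C.I. half-width
print(f"summary effect = {summary:.3f}")
print(f"95% C.I. = ({summary - half_width:.3f}, {summary + half_width:.3f})")
```

The more precise study (variance 0.01) gets four times the weight of the other, so the summary effect (0.24) lies closer to 0.2 than to 0.4.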




In the random-effects model, we assume that the true effect sizes are normally distributed across studies.
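
One common way to put this assumption into practice is to estimate the between-study variance (tau squared) and add it to each study's own variance before weighting. The sketch below uses the DerSimonian-Laird estimator, which is my choice for illustration and is not named in the notes above; the data and the function name random_effects_summary are also made up.

```python
def random_effects_summary(effects, variances):
    """Random-effects summary using the DerSimonian-Laird estimate of tau^2."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Q: weighted squared deviations of study effects from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights: 1 / (within-study variance + between-study variance)
    w_star = [1.0 / (v + tau2) for v in variances]
    summary = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return summary, tau2

# Hypothetical studies with very different effects but equal variances
summary, tau2 = random_effects_summary([0.0, 1.0], [0.01, 0.01])
print(f"summary effect = {summary:.2f}, tau^2 = {tau2:.2f}")
```

When the studies agree closely, tau squared is estimated as zero and the result is the same as under the fixed-effect model.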


To be continued...
