Fundamentals of Research
Overview
All research revolves around the null hypothesis (H0) and the alternative hypothesis (Ha). A research hypothesis develops from the research question(s), together with supporting hypotheses and objectives set by the researchers. The alternative hypothesis (Ha) is the hypothesis the researcher believes to be true; however, the data may or may not support it.
The null hypothesis (H0) is retained (strictly, we fail to reject it) when NO statistically significant difference is detected between the groups (p-value > 0.05 (or 5%)). On the other hand, when H0 is rejected (p-value < 0.05 (or 5%)), the alternative hypothesis (Ha) is accepted, as Ha is the opposite of H0. For example, a drug manufacturer is testing a new calcium channel blocker and comparing its blood pressure lowering ability against placebo. The null hypothesis (H0) would be that there is no difference between the two groups. The alternative hypothesis (Ha) would be that there is a difference in blood pressure. Suppose the researchers found that the new calcium channel blocker lowers blood pressure significantly more than placebo, with a p-value of 0.002. In this case we would reject the null hypothesis and accept the alternative hypothesis (Ha).
State the type or direction of the effect. A two-tailed test looks for a difference in either direction. In other words, it tests for the possibility of a positive or a negative difference (e.g., an increase or a decrease). In contrast, a one-tailed test is appropriate if you only want to determine whether there is a difference between groups in a specific direction (e.g., we test for a reduction). Note that a two-tailed test requires a larger sample size than a one-tailed test. For example, if we study drug X for blood sugar level (BSL), and the alternative hypothesis is that drug X changes BSL, then we use a two-tailed test, as the difference could be positive or negative (an increase or a decrease). On the other hand, if the alternative hypothesis is that drug X reduces BSL, then we use a one-tailed test, as we are studying only one possibility (reduction only).
Always remember that every research study has limitations, regardless of how carefully it was conducted. Minimize the limitations as far as possible. You should also understand a study's limitation(s) before applying its results to medical practice.
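The difference between one-tailed and two-tailed testing can be sketched in a few lines of Python. This is only an illustration: it assumes a test statistic that follows a standard normal distribution, and the z value and the function name p_values_from_z are hypothetical, not taken from any study.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic.

    The one-tailed value tests for a reduction only (negative z supports Ha).
    Assumes the statistic is standard normal under H0.
    """
    std_normal = NormalDist()  # mean 0, SD 1
    one_tailed = std_normal.cdf(z)            # P(Z <= z): reduction only
    two_tailed = 2 * std_normal.cdf(-abs(z))  # difference in either direction
    return one_tailed, two_tailed

# Hypothetical z statistic for the change in BSL with drug X (illustrative only)
z = -1.80
one_tailed, two_tailed = p_values_from_z(z)
print(f"one-tailed p = {one_tailed:.3f}, two-tailed p = {two_tailed:.3f}")
```

With this hypothetical z, the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is one way to see why a two-tailed test needs a larger sample to reach significance.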
To assess a research question, the researcher should ask, “is it feasible, interesting, novel, ethical, relevant”. These are known as the FINER criteria.
The PICOT format is a useful way to refer to the five components of a good research question, and it facilitates the identification of relevant information.
(P) Population of interest
(I) Intervention being studied.
(C) Comparison group (or to what is the intervention being compared).
(O) Outcome (dependent variable) of interest.
(T) Time for follow-up
Type of Errors
Two potential errors are commonly recognized when testing a hypothesis:
Type 1 error, represented by the α-value, is the error of detecting a difference when none exists (a false positive). The p-value is the likelihood of obtaining a result at least as extreme as the one observed by chance, assuming there is no difference between the treatments being investigated. Alpha represents the acceptable probability of a Type 1 error in a statistical test, and the p-value is compared against it. Because alpha (α-value) corresponds to a probability, it can range from 0 to 1. In practice, 0.01 and 0.05 are the most commonly used values for alpha, representing a 1% and a 5% chance of a Type 1 error occurring. The lower the p-value, the less likely the result happened by chance, and the more likely the result can be attributed to the drug being tested. Therefore, a p-value < 0.05 (or 5%) means that a result this extreme would occur less than 5% of the time if there were truly no difference. However, p-values have limitations, as they depend upon both the magnitude of the association and the precision of the estimate (the sample size). Confidence intervals are more informative than p-values because they provide insight into the adequacy of the sample size to detect the association.
Type 2 error, represented by the β-value, is the counterpart of the Type 1 error: it is the chance of NOT detecting a difference when one exists (a false negative). This type of error can be reduced by increasing the sample size or the duration of the study. The power of a study should be ≥ 80%, where “Power = 1 - β-value”. Therefore, the β-value should be no more than 20% (0.2), as a higher β-value indicates a higher probability of a Type 2 error. “The closer the β-value is to 1, the worse the power of the study”.
Example:
A drug manufacturer is studying a new calcium channel blocker to detect a difference in blood pressure lowering ability versus placebo. The study is only powered to 60%. The study did not show a statistically significant difference in blood pressure versus placebo, with a p-value of 0.16. Deriving beta from the power, β-value = 1 - 0.6 = 0.4. In other words, if the new calcium channel blocker truly had the effect the study was designed to detect, there was a 40% chance that we would fail to prove it (a Type 2 error). This type of error can be reduced by increasing the sample size or the duration of the study.
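The relationship between power and the β-value used in this example can be written as a short Python sketch. The function names and the 80% threshold argument are illustrative choices, not part of any standard library.

```python
def beta_from_power(power):
    """Type 2 error probability implied by a study's power (Power = 1 - beta)."""
    if not 0 <= power <= 1:
        raise ValueError("power must be between 0 and 1")
    return 1 - power

def is_adequately_powered(power, minimum=0.80):
    """True when the power meets the conventional 80% minimum."""
    return power >= minimum

# The example above: a study powered to only 60%
beta = beta_from_power(0.60)
print(f"beta = {beta:.2f}, adequately powered: {is_adequately_powered(0.60)}")
```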
Internal and external validity in research
Validity of a study is a general issue of whether or not there are imperfections in the study design, data collection, or methods of data analyses that might distort the conclusions about an exposure-disease relationship. Validity of the study can be classified as internal and external validity.
Internal validity: Internal validity is the degree of certainty that the causal association being tested is trustworthy and unaffected by other factors or variables. A study is considered internally valid when three alternative explanations have been eliminated: bias (systematic error; e.g., selection bias, information bias), confounding, and random error (a precision issue).
External validity: External validity refers to the extent to which results from a study can be applied (generalized) to different situations, groups, events or times.
Note that when evaluating bias in a study, it is essential to assess its source, strength, and direction. Bias can pull an estimate towards or away from the null. The most common types of bias are selection and information bias. Misclassification (measurement error) is the most common type of information bias and occurs during data collection.
Random error is a chance difference between the observed and true values of something (e.g., a researcher misreads a weighing scale and records an incorrect measurement). Random error mainly affects precision, which is how reproducible the same measurement is under equivalent circumstances. In contrast, systematic error affects the accuracy of a measurement, or how close the observed value is to the true value. Factors that contribute to random error include observer variability, imprecise definition, instrument variability, lack of instrument sensitivity, and sampling error.
Bias
The outcome of a study can be knowingly or unknowingly influenced by bias. In statistical terms, bias can cause an under or overestimation of an effect. The most common types of bias include selection, observation, recall, and misclassification bias. Remember always that there is little that can be done to correct bias after it has occurred; researchers must avoid it by properly designing and conducting the study.
Selection bias: “Selecting” or choosing your sample in a way that makes it unrepresentative or disproportionate. For example, in a study of a medication for blood pressure reduction, the investigators put all patients with blood pressure greater than 160/100 into the treatment group and anyone with blood pressure less than 160/100 into the placebo group. There is a strong possibility that blood pressure would drop substantially in the higher BP group simply because the study participants were not randomized.
Observation bias: This type of bias involves the people doing the investigating. Observation bias is when the investigator sees an event that is not actually occurring or exaggerates an event that has occurred. Therefore, in studies we “blind” the investigators to try to control for this type of bias.
Recall bias: Recall bias happens when a study participant tries to remember or “recall” an event that is being studied. This type of bias is usually seen in surveys.
Misclassification bias: When study participants are put into groups they should not have been in. For example, a study participant who has a given disease state is put into the non-disease state category.
Confounder (s)
Confounding is the mixing of effects between an exposure, an outcome, and a third variable, known as a “confounder”. A confounder is defined as a risk factor for the outcome of interest that is associated with the exposure but is not part of the causal pathway. It can also be described as something outside what you are studying that has an impact on your result. Confounding is one of the most important factors that affect the accuracy of the results.
Confounding can either exaggerate (“positive confounding”) or minimize (“negative confounding”) the true association.
Confounders can be determined through the literature review or prior knowledge. We can control the confounders through study design (e.g., inclusion/exclusion criteria, matching, randomization), analysis (adjusting for the confounders), or both approaches.
Note that confounding is NOT an error in the study but an actual phenomenon. However, NOT controlling for known confounder(s) is an error. It can be very difficult to account for every possible confounder when doing research with people, but researchers must try to account for anything that could influence their results when planning their research and analysing their data. Confounders have the potential to change the results of research because they can influence the outcomes that the researchers are measuring. Failure to identify or account for confounders may lead to erroneous results.
Common confounding variables encountered in critical illness include demographic variables (e.g., age), severity of illness (e.g., APACHE II score), admission diagnosis (e.g., sepsis vs. septic shock), therapies provided (e.g., vasopressors, mechanical ventilation), and concomitant disease states.
Randomized controlled trials (RCTs) are generally the only type of study that can adequately control for unmeasured confounders and are generally the best evidence for proving causality.
Residual confounding: Confounding that remains even after many confounding variables have been controlled. Reasons for residual confounders: Uncollected data, mismeasurement of confounder, and/or persistent differences in risk within a category of a confounder.
Confounder (s) criteria:
Associated with the exposure.
An independent cause of the disease (outcome).
It can’t be an intermediate step in the causal pathway between exposure and disease.
Assessment of confounding:
Magnitude or extent of confounding.
The direction of confounding (can be either positive (overestimate) or negative (underestimate) confounding).
Magnitude of confounding= [(crude estimate-adjusted estimate)/Adjusted estimate ]x100
Example: If we are studying the association between coffee and lung cancer.
Estimate Coffee-Lung cancer= 5 (not adjusted for smoking).
Estimate Coffee-Lung cancer= 1.2 (adjusted for smoking).
Applying the equation above, Magnitude of confounding= [(5-1.2)/1.2]x100 ≈ 317%. Therefore, smoking is considered a significant confounder, and we should adjust for it.
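The coffee and lung cancer calculation can be reproduced with a small Python sketch. The function name confounding_magnitude is hypothetical; the formula is the one given in this section.

```python
def confounding_magnitude(crude, adjusted):
    """Percent difference between crude and adjusted estimates, relative to adjusted."""
    if adjusted == 0:
        raise ValueError("adjusted estimate must be non-zero")
    return (crude - adjusted) / adjusted * 100

# Coffee and lung cancer example: crude estimate 5, smoking-adjusted estimate 1.2
magnitude = confounding_magnitude(crude=5, adjusted=1.2)
if magnitude > 0:
    direction = "positive (overestimate)"
elif magnitude < 0:
    direction = "negative (underestimate)"
else:
    direction = "none"
print(f"Magnitude of confounding: {magnitude:.1f}% ({direction})")
```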
Categorical versus Continuous Variables
Data comes in several different types. There are four measurement scales (or types of data): nominal, ordinal, interval, and ratio that can be classified as categorical (nominal, ordinal) or continuous variables (interval and ratio).
Knowing these types of variables (nominal, ordinal, interval, and ratio) is critical, as it helps identify which statistical test is appropriate. Categorical variables are reported using numbers and percentages, whereas continuous variables are reported using means with standard deviation (SD) or medians with interquartile range (IQR), as appropriate. For example, the Chi-square or Fisher exact test is used for categorical variables, the t-test (Student's t-test) for normally distributed numerical variables, and the Mann–Whitney U test for numerical variables that are not normally distributed.
A categorical variable cannot have any value between two points. On the other hand, continuous variables have a distinct, measurable distance between each value. They are called continuous because you can have an infinite number of values between two points.
When dealing with continuous variables, the statistics that describe how data are distributed, which include measures of central tendency (i.e., mean, median, mode) and measures of dispersion (i.e., standard deviation (SD), range), are all important to know and to calculate. In a normal distribution (the classic “bell-shaped curve”), the mean, median, and mode are all equal. However, in skewed data they are not. For skewed data, the median is therefore preferred over the mean as the measure of central tendency, because a few very large or very small values will pull the mean toward them.
In a negatively skewed distribution, the long tail points toward the lower values (to the left); in a positively skewed distribution, it points toward the higher values (to the right).
Categorical Variables
Categorical variables can be reported as frequency, percentage, and ratio/Proportion.
Frequency is the count of a given outcome, or the count in each category. The percentage is the count of a given outcome per hundred, showing the proportion of each category out of the total.
Graphical presentation for categorical variables: Bar chart or Pie chart.
Categorical variables can be divided into nominal or ordinal:
Nominal variables: Nominal scales are used for naming (or labeling) variables, without any quantitative value. Ask: can you “name” them as a group or category, or answer “yes or no”? There is no order to the categories. Examples of nominal variables include gender (male or female), marital status (married, single, divorced), alcoholic versus non-alcoholic, smokers versus non-smokers, diabetic versus non-diabetic, blood type of a person (A, B, AB, or O), and hair color (black, brown, …). The test used for ≥ 2 variables → Chi-squared (χ²); while for correlation between two paired samples → contingency coefficient.
Ordinal variables: Ordinal, think “order”. There is a specific order, but the difference between values is not clear or easily interpreted. There isn’t a uniform/objective distance between the values for every patient. Examples of ordinal variables: a pain scale of 1-10, or satisfaction assessed using very poor, poor, fair, good, and very good as survey responses. The test used for independent samples → Mann-Whitney U-test; dependent (paired) samples → Wilcoxon signed rank test; while for correlation between two paired samples → Spearman correlation coefficient.
Continuous (Parametric) Variables
Continuous variables can be divided into ratio or interval scale:
Ratio scale: Has an absolute ZERO, meaning that a value of ZERO indicates the absence of the quantity being measured. Ratios can be calculated since a “true zero” can be defined. Examples include weight, length, and temperature in Kelvin.
Interval scale: The interval scale, much like the ratio scale, is a type of continuous variable and has set differences between each value. However, it doesn’t have an absolute ZERO: a value of ZERO doesn’t mean the absence of the quantity being measured. With interval data, we can add and subtract, but cannot multiply or divide. An example is temperature in Celsius.
The test used for comparing 2 groups → t-test (Student’s t-test); > 2 groups → analysis of variance (ANOVA); while for correlation between two paired samples → Pearson correlation coefficient.
Correlation & Causation
Correlation is a term used in statistics to indicate that two variables are associated with one another. This association can vary from no correlation to perfect correlation. When two variables correlate with one another, it is assumed that they have a linear relationship. That association is NOT causation is perhaps the most important lesson one learns in a statistics class. "Correlation does not imply causation" is a phrase commonly used in statistics and in studies. It means that a cause-and-effect relationship between two variables cannot be deduced solely on the basis of an observed association or correlation between them. A common example illustrates the phrase: ice cream and sunburn in summer. As the sun is hot in summer, ice cream consumption increases, and so does the number of sunburns. A researcher might conclude that increased ice cream consumption increases the risk of sunburn, but this is correlation, not causation. Therefore, it's important to identify confounders (summer, sun) and select the best study method.
Comparison is a crucial requirement to determine association. Ideal “counterfactual” comparison does not exist in human research. The extent to which the baseline characteristics are similar between comparative groups determines the degree of confidence about the causal association.
It is crucial to identify the direction of causality (association); if it is not assessed carefully, the results will be misinterpreted. In addition, one should differentiate between causal and non-causal associations. For example, if we studied coffee consumption (A) and its association with lung cancer (B), we may find an association. However, it's well known that most smokers (C) drink coffee; thus, this is considered a non-causal association, as A is not causally associated with B.
The correlation coefficient (denoted as “r”) ranges from -1 to 1. A correlation coefficient of -1 is a perfect negative correlation, also called an inverse relationship. On the other hand, 1 is a perfect positive correlation. A correlation coefficient (r) close to 0 essentially means there is no linear relationship between the two variables.
R-squared (R2) is a value between 0 and 1 that tells us the proportion of the variation in the outcome that our model explains, and so how well the model predicts where values will lie. An R2 value close to 0 = no correlation. On the other hand, an R2 value close to 1 equates to strong correlation.
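As a rough illustration of r and R2, the Pearson correlation coefficient can be computed by hand in Python. The dose and response values below are made up for the sketch, and R2 is taken as r squared, which holds for a simple linear relationship with a single predictor.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either variable is constant
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements (illustrative only)
dose = [1, 2, 3, 4, 5]
response = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(dose, response)
r_squared = r ** 2  # with one predictor, R2 equals r squared
print(f"r = {r:.3f}, R2 = {r_squared:.3f}")
```

An r close to 1 here reflects a strong positive linear relationship; reversing the order of one list would give an r close to -1.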
Hill’s Criteria of Causation:
1. Strength of association.
2. Consistency.
3. Specificity.
4. Temporality.
5. Dose-response relationship (gradient).
6. Plausibility (agrees with currently accepted understanding of pathological processes).
7. Coherence (compatible with existing theory and knowledge).
8. Experimental evidence.
9. Consideration of alternate Explanations.
Central tendency
To assess the “middle” or “central” portion of the data.
The three measures of central tendency most frequently used to describe data are the mean, median, and mode:
Mean: Equals the sum of observations divided by the number of observations.
Median: Equals the observation in the middle when all observations are ordered from smallest to largest, if the number of observations is odd. If the number is even, it is the mean of the middle two data points. The median is the value that has 50% of the data above it and 50% of the data below it.
Mode: Equals the observation that occurs most frequently.
Choosing mean, median, or mode is mainly based on the scale of measurement (ordinal or numerical) and on the shape of the distribution of observations (normal, positively skewed, or negatively skewed). The mean is used when data are numerical and have a symmetric distribution, while the median is used when data are ordinal, or numerical with a skewed distribution. Lastly, the mode is used when data have a bimodal distribution.
For example, the mean, median, and mode for the following data: 5, 6, 9, 5, 6, 7, 2, 3 are as follows. Mean = (5+6+9+5+6+7+2+3)/8 = 43/8 ≈ 5.4. For the median, order the data from smallest to largest: 2, 3, 5, 5, 6, 6, 7, 9. As the number of observations is even, the median is (5+6)/2 = 5.5. Lastly, the data are bimodal, with modes 5 and 6.
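The same worked example can be checked with Python's standard statistics module. This is a minimal sketch; multimode is used because the data have more than one mode.

```python
import statistics

data = [5, 6, 9, 5, 6, 7, 2, 3]

mean = statistics.mean(data)         # sum of observations / number of observations
median = statistics.median(data)     # mean of the two middle values for an even count
modes = statistics.multimode(data)   # every value tied for the highest frequency

print(f"mean = {mean}, median = {median}, modes = {modes}")
```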
Dispersion
Dispersion: Degree to which data are spread around a specific value (i.e., mean).
Range: Equals the difference between the largest and smallest observation. The range is used to emphasize extreme values.
Percentile: Equals the percentage of a distribution that is below a specific value (the median is the 50th percentile). Percentiles are used when the median or mean is reported but one wants to compare against norms.
Interquartile range (IQR): This is the difference between the 25th percentile (1st quartile) and the 75th percentile (3rd quartile). The IQR is used when reporting the median.
Standard deviation (SD): Standard deviation is a measure of the dispersion of the data. It measures the variability of data around the mean, and it is an important measure in the normal distribution. Standard deviation (SD) is used when reporting the mean. Note that the SD does not shrink systematically as the sample size grows; it is the standard error of the mean (SD/√n) that becomes smaller with a larger sample size. Data with a larger standard deviation will have a flatter bell curve, whereas a smaller standard deviation gives a sharper curve. In a normal distribution, one standard deviation away from the mean in each direction contains about 68% of the data, and two standard deviations contain about 95% of the data.
Sample Variance is the square of the standard deviation (SD).
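The measures of dispersion above can be computed with Python's standard statistics module. This sketch reuses the data set 5, 6, 9, 5, 6, 7, 2, 3 from the central tendency example; note that quartile values depend on the interpolation method, and statistics.quantiles uses its default ("exclusive") method here, so other software may report slightly different quartiles.

```python
import statistics

data = [5, 6, 9, 5, 6, 7, 2, 3]

# Range: largest minus smallest observation
data_range = max(data) - min(data)

# Quartiles: cut points at the 25th, 50th and 75th percentiles
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation and sample variance (n - 1 in the denominator)
sd = statistics.stdev(data)
variance = statistics.variance(data)

print(f"range = {data_range}, IQR = {iqr}")
print(f"SD = {sd:.2f}, variance = {variance:.2f}")
```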
Dependent versus Independent Variables
Variables can be either quantitative (Numerical) or qualitative (Categorical). Numerical variables can be classified into continuous, or discrete variables. On the other hand, categorical variables can be classified as either ordinal, nominal, or binary.
Knowing the difference between an independent and a dependent variable is very important, as it helps you determine the appropriate statistical test to use.
Independent variables are variables that are set by the researcher; they are the ones that researchers can control. On the other hand, dependent variables (outcomes) are variables that DEPEND upon the independent variables and can NOT be controlled by researchers.
For example, researchers are going to study the effect of various proton pump inhibitors (PPIs) on the development of osteoporosis in women. They will use the bone density test score and place patients in 2 groups. Group 1 will be patients on 40 mg esomeprazole, and group 2 will be patients on 40 mg pantoprazole. The independent variables are esomeprazole and pantoprazole use. The dependent variable is what DEPENDS on the independent variables. We do not control the bone density test score, so that will be the dependent variable.
Prevalence versus incidence
The terms prevalence and incidence are commonly used interchangeably. However, there are principal differences between the two terms. The proportion of people who have a condition at or during a specific time period is referred to as prevalence, whereas the proportion or rate of people who acquire a condition during the same time period is referred to as incidence.
Prevalence differs from incidence proportion as prevalence includes all cases (new and pre-existing cases) in the population at the specified time, whereas incidence is limited to new cases only.
Prevalence: Prevalence refers to the actual number of individuals with a given disease at a given point in time divided by the population at risk at that point in time. Prevalence can be either during a period of time (period prevalence) or at a particular date in time (point prevalence).
Prevalence is usually obtained from cross-sectional studies or disease registers. For prevalence, we need a numerator (number of existing cases, old or new), a denominator (total sample size), and a time period of interest. The time period should be specified as precisely as possible (e.g., over 3 years). Prevalence is an appropriate measure of the burden of a relatively stable chronic condition (e.g., DM, HTN). Limitations of prevalence include that it cannot determine when the disease developed or the duration of the disease. Prevalence is affected by incidence and duration: if a disease has a long duration (e.g., DM), prevalence is generally greater than incidence.
Incidence, also known as incidence rate, is often stated per 100,000 population per year. The incidence rate measures the rate of new disease occurrence over time (speed), that is, the rapidity with which newly diagnosed cases of the disease of interest develop. For incidence, we need a numerator (number of NEW cases ONLY), a denominator (population at risk of becoming a new case), and a specified time period. Incidence rates are commonly used in prospective studies.
Rate versus proportion
A proportion can range from 0 to 1 (0% to 100%), and the numerator is contained in the denominator.
A rate can range from 0 to infinity, and the numerator is the number of cases, whereas the denominator is the person-time at risk.
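To make the contrast between a proportion (prevalence) and a rate (incidence rate) concrete, here is a small Python sketch. All counts are hypothetical, and the function names are illustrative.

```python
def prevalence(existing_cases, population):
    """Proportion of the population with the condition (new and pre-existing cases)."""
    return existing_cases / population

def incidence_rate(new_cases, person_years_at_risk, per=100_000):
    """New cases per `per` person-years at risk."""
    return new_cases / person_years_at_risk * per

# Hypothetical numbers (illustrative only)
prev = prevalence(existing_cases=500, population=10_000)
rate = incidence_rate(new_cases=30, person_years_at_risk=15_000)

print(f"prevalence = {prev:.1%}")
print(f"incidence rate = {rate:.1f} per 100,000 person-years")
```

The prevalence is bounded between 0 and 1 because its numerator is part of its denominator, while the incidence rate has person-time in the denominator and has no upper bound.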
Relative risk (RR)/Relative risk reduction (RRR)
Relative Risk (RR) is the incidence in exposed individuals divided by the incidence in unexposed individuals. Relative Risk (RR)= [A/(A+B)] / [C/(C+D)], where, in a 2x2 table, A and B are the exposed individuals with and without the outcome, and C and D are the unexposed individuals with and without the outcome. When the exposure reduces the risk, the result can be expressed as the Relative Risk Reduction (RRR), which is 1 - RR. Relative risk (RR) is the risk over the entire study, while the hazard ratio (HR) can represent the risk at a given moment in time.
In retrospective (case-control) studies, where the total number of exposed people is not available, RR cannot be calculated, and OR is used as a measure of the strength of association between exposure and outcome. By contrast, either RR or OR can be calculated in prospective studies (cohort studies), where the number at risk (number exposed) is available.
Absolute Risk (AR) is the difference in the rate of a disease between an exposed and a non-exposed population, also known as the risk difference. Absolute Risk (AR)= A/(A+B) - C/(C+D). When the exposure reduces the risk, the size of this difference is called the Absolute Risk Reduction (ARR), calculated as the risk in the control group minus the risk in the treated group.
For example, in a study where 10% of patients treated with drug A progressed vs. 15% of patients treated with drug B, there is a 5% ARR in disease progression with drug A compared with drug B: Absolute risk reduction (ARR) = 15% - 10% = 5%. Using the same example, the risk of progression is reduced by 33% in relative terms with drug A compared with drug B: RRR = (15-10)/15 = 5/15 = 33.3%.
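The drug A versus drug B example can be reproduced with a short Python sketch. The function name risk_measures is hypothetical; drug B is treated as the comparator (control) arm, as in the example.

```python
def risk_measures(risk_treated, risk_control):
    """Return (RR, ARR, RRR) from the event risk in each group."""
    rr = risk_treated / risk_control    # relative risk
    arr = risk_control - risk_treated   # absolute risk reduction
    rrr = arr / risk_control            # relative risk reduction, same as 1 - RR
    return rr, arr, rrr

# 10% progressed on drug A, 15% progressed on drug B
rr, arr, rrr = risk_measures(risk_treated=0.10, risk_control=0.15)
print(f"RR = {rr:.2f}, ARR = {arr:.1%}, RRR = {rrr:.1%}")
```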
Odds ratio (OR)
Odds ratio (OR) is a measure of the association between treatment with a medication (or exposure to something) and an outcome: it is the odds of the outcome in the exposed group divided by the odds in the unexposed group. Odds Ratio (OR)= (A × D)/(B × C):
OR=1, means there is no difference in effect between the groups (NOT significant).
OR>1, means the odds of the outcome are higher in the exposed (treated) group.
OR<1, means the odds of the outcome are lower in the exposed (treated) group.
Relative risk (RR) differs from odds ratio (OR) in what is compared between the groups. For example, if we are studying drug X on mortality, the risk of death = ([number of deaths]/[all outcomes (all deaths + survivors)]), and the RR is the ratio of this risk between the two groups. The odds of death = ([number of deaths]/[number of non-deaths, i.e., survivors]), and the OR is the ratio of these odds between the two groups.
For interpretation of the odds ratio: an OR of 0.5 suggests that patients exposed to a variable of interest had 50% lower odds of developing a specific outcome compared to the control group. Similarly, an OR of 1.7 suggests that the odds were increased by 70%.
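The difference between odds and risk is easiest to see on a 2x2 table. The following Python sketch uses hypothetical counts (10 of 100 exposed and 15 of 100 unexposed patients with the outcome); the cell labels a, b, c, d are assumed to follow the usual layout of exposed with/without the outcome, then unexposed with/without the outcome.

```python
def odds_ratio(a, b, c, d):
    """OR = (A x D) / (B x C) for a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    if b == 0 or c == 0:
        raise ValueError("OR is undefined when B or C is zero")
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)] for the same 2x2 table."""
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts (illustrative only)
a, b, c, d = 10, 90, 15, 85

or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)
print(f"OR = {or_value:.2f}, RR = {rr_value:.2f}")
```

Both measures fall below 1 for these counts, but the OR lies further from 1 than the RR, which is why an OR should be read as a change in odds rather than in risk.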
Hazard ratio (HR)
Hazard ratio (HR) is interpreted analogously to an odds ratio (OR). Thus, a hazard ratio of 5 means that the group exposed to a specific risk factor has 5 times the chance of developing the outcome compared with the unexposed group. As further examples, an HR of 2 means that there is double the risk, while an HR of 0.5 means that there is half the risk (protective effect):
HR= 1, means there is no difference between the groups (NOT significant).
HR>1, means there is increase in the risk of event.
HR<1, means there is reduction in risk of event.
Number needed to treat (NNT)
Number needed to treat (NNT) is a statistic that tells the actual number of patients who would need to be treated with a given therapy (or combination of therapies) for one patient to get a particular endpoint benefit.
Number needed to treat (NNT)=1/ARR.
The lower the NNT the more effective the treatment. When the outcome is a harm rather than a benefit, a number needed to harm (NNH) can be calculated similarly.
For example, if the NNT is nine, its interpretation can be illustrated by the following sentence: "This study suggests that we need to treat 9 patients to get the desired outcome for 1 patient." On the other hand, if the NNH for AKI is 9, its interpretation will be as follows: "This study suggests that if we treated 9 patients, 1 patient will get the adverse effect (AKI)."
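The NNT formula can be sketched in Python as follows. Rounding up to the next whole patient is a common convention that is assumed here rather than stated in the text, and the ARR values are illustrative (0.05 corresponds to the 5% ARR in the drug A versus drug B example).

```python
from math import ceil

def number_needed_to_treat(arr):
    """NNT = 1 / ARR, rounded up to a whole patient (assumed convention)."""
    if arr <= 0:
        raise ValueError("ARR must be positive to compute an NNT")
    # round() guards against tiny floating-point overshoot before ceil()
    return ceil(round(1 / arr, 9))

print(number_needed_to_treat(0.05))  # ARR of 5%
print(number_needed_to_treat(0.03))  # ARR of 3%, rounded up
```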
Confidence interval (CI)
The 95% confidence interval (CI) is the range of values that you can be 95% confident contains the true mean of the population. Put another way, if the data collection were hypothetically repeated indefinitely, in 95% of the samples the interval estimate would contain the true population parameter.
Confidence intervals are usually expressed with 95% confidence, but this is just a tradition. Confidence intervals can be computed for any desired degree of confidence, and the level can be chosen by the investigator (e.g., 80%, 90%, 98%, etc.).
The 95% CI is used to estimate the precision. A wide CI indicates a low level of precision, whereas a narrow CI indicates a higher precision: Wide CI= Small sample size, while narrow CI=Large sample size.
If standard deviation (SD) is known, CI can be calculated.
CIs are used as an indication of how a study result would be reflected in the general patient population outside of the investigation.
Factors affecting the width of the CI include the size of the sample, the confidence level, and the variability in the sample.
A study that reports the relative risk (RR) or the odds ratio (OR) without reporting the confidence interval cannot be adequately interpreted. Therefore, CIs should always be presented for the relative risk and odds ratio.
For a specific variable, a narrower CI (e.g., 90-110) suggests a more precise estimate of the population value compared with a wide CI (e.g., 90-500).
Unlike the p-value, the 95% CI does not report a measure’s statistical significance. In practice, the 95% CI is often used as a proxy for the presence of statistical significance if it does not overlap the null value (e.g. OR=1).
For differences (e.g., AR, ARR, beta coefficient), CI should NOT include zero to be significant. However, if CI contains zero it will be considered not significant.
For ratios (e.g., OR, HR, RR), CI should NOT include one to be significant. However, if CI contains one it will be considered not significant.
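The points above about CI width and the null value can be sketched in Python. This is a simplified illustration: it uses the normal approximation (mean ± z × SD/√n) rather than a t-distribution, and the means, SDs, sample sizes, and intervals are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(mean, sd, n, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * SD / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

def excludes_null(ci, null_value):
    """True when the CI does not contain the null value (0 for differences, 1 for ratios)."""
    low, high = ci
    return not (low <= null_value <= high)

# Same mean and SD, different sample sizes: the larger sample gives a narrower CI
small_sample_ci = mean_confidence_interval(mean=100, sd=15, n=36)
large_sample_ci = mean_confidence_interval(mean=100, sd=15, n=400)
print(small_sample_ci, large_sample_ci)

# Hypothetical odds ratio CIs checked against the null value of 1
print(excludes_null((1.2, 2.5), null_value=1))  # CI does not include 1
print(excludes_null((0.8, 1.4), null_value=1))  # CI includes 1
```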