CCO/PEBC METHODS HANDBOOK

Introduction

The intended users of this handbook are guideline development groups working with the CCO/PEBC to develop PEBC guidelines, recommendation reports, and evidence summaries. This document outlines the principles used by the PEBC in developing guideline objectives and research questions, assessing and reporting study quality, and in developing recommendations. It has been developed by a working group of PEBC health research methodologists (HRMs) and is based on a number of sources which are referenced both throughout the document and within the Sources and Acknowledgements section.

Guideline Objectives and Research Questions

PEBC documents have both guideline objectives and research questions. Both are generated by the Working Group with the direction and assistance of the PEBC HRMs and included in the Project Plan for the project.

Objectives vs. Questions

Figure 1 depicts the relationship between guideline objectives, research questions, data, evidence, and recommendations. Guideline objectives are the reason for the guideline’s existence, a description of the problems that the guideline should address. The “answer” to a guideline objective is one or more recommendations. Research questions are derived from the objectives, and are a summary of the evidence needs that must be met to achieve the objectives. The answer to a research question is evidence. Evidence is data that has been synthesized and analyzed in an appropriate fashion.

Figure 1: Guideline Objectives to Recommendations

Writing Objectives

An objective should be phrased as an infinitive verb phrase, as in…

  • To determine the optimal treatment for patients with locally advanced pancreatic cancer
  • To make recommendations with respect to the role of combined modality radiotherapy and chemotherapy in the treatment of localized carcinoma of the esophagus
  • To provide guidance with respect to the appropriate mapping techniques, operative techniques, and techniques for pathological processing, handling, and reporting in sentinel lymph node biopsy of the breast
  • To determine the appropriate screening frequency, age of initiation, and screening protocol for an organized cervical screening program

When the intended users of the guideline read the guideline objectives, they should have an immediate sense of what sort of recommendations they can expect to see in the guideline. Therefore, there should be at least one objective statement for each major topic area of the guideline.

Objectives should NOT refer directly to evidence or data. For example, “To determine whether aromatase inhibitors provide superior overall survival compared to no aromatase inhibitors” is not a good guideline objective. “To determine the appropriate indications for the use of aromatase inhibitors” would be.

Writing Questions

Questions should be phrased, somewhat obviously, as questions. However, in the Project Plan, additional detail for each question should be provided with respect to the four elements of population, intervention, comparisons, and outcomes when those elements are appropriate. The question itself need not touch on all of those elements, but should touch on as many of them as possible while still remaining readable. Some examples…

  • Does radiation after surgery increase survival in patients with locally advanced pancreatic cancer compared to surgery alone?
  • Does treatment with chemotherapy decrease important toxicity with equivalent survival in patients with localized carcinoma of the esophagus compared to combined modality radiotherapy?
  • Do very sharp scalpels provide benefits in terms of ease of surgical incision compared to not-so-sharp scalpels?
  • What is the correlation between increased frequency of screening and incidence of cervical cancer in asymptomatic patients undergoing cervical cancer screening?

When the intended users read a research question, they should have an immediate sense of what sort of study would have to be conducted and the outcomes that would have to be measured to answer it.

There may be multiple research questions associated with a single guideline objective, and a single research question may be sufficient for several objectives. That is, the research questions are related to and derived from the objectives, but do not have to match up with them on a one-to-one basis.

Defining Importance of Outcomes

The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) methodology suggests that Working Groups should determine the relative importance of the outcomes under consideration a priori. For simpler projects this might not be necessary, but for more complicated projects with many questions and/or outcomes, this is a useful step. Working Groups should follow the GRADE methods to determine the importance of outcomes and should classify them as “Critical”, “Important”, or “Limited Importance”. In certain cases, the ranking of outcomes could change over time (e.g. if an important outcome was inadvertently left out of the project plan).

Identification and Review of Existing Guidelines

As there are a number of national and international groups that develop high-quality guidelines, the first stage in the creation of PEBC documents is a systematic review of existing guidelines on that subject. A systematic search of the available electronic databases (MEDLINE, EMBASE, National Guideline Clearinghouse, CMA Infobase, etc.) is conducted. The websites of recognized guideline development groups, for example, the National Institute for Health and Care Excellence (NICE), the Scottish Intercollegiate Guidelines Network (SIGN), the National Comprehensive Cancer Network (NCCN), and the American Society of Clinical Oncology (ASCO), are also reviewed for relevant guidelines. The intent of this search is to create a comprehensive list of all existing, recent guidelines that are relevant to the project. For most topics, only guidelines published in the past three years need be considered.

All relevant guidelines are evaluated using the AGREE II instrument, and the Working Group then considers the available guidelines and decides whether endorsement or adaptation of one, or in rare instances more than one, of these guidelines would be sufficient to address the clinical and/or organizational questions outlined in the project plan. The Working Group looks at the currency and quality of the identified guidelines and the relevance of their recommendations to the populations of interest and the Ontario context when considering whether the identified guidelines might be endorsed (the recommendations should be used without modification) or adapted (the recommendations should be modified) for use in Ontario. If one or more existing guidelines are deemed worthy of adaptation, the Working Group proceeds, using the ADAPTE methods where feasible and relevant.

The Working Group may be aware during project planning of highly relevant existing guidelines that could be adapted. In fact, the existing guideline(s) might have been the impetus for the development of an Ontario guideline. Based on their knowledge of an existing guideline, the Working Group might choose not to conduct a complete search for other guidelines but instead, move forward with adapting the known guideline. However, the Working Group should have a strong justification for assuming that the guideline they have chosen is the one most worthy of adaptation.

In some cases, existing guidelines may address some of the questions of interest but not address all questions completely. In such cases, the Working Group may choose to adapt existing guidelines for those questions and use other methods, as described below, to address the remaining questions.

In the experience of the PEBC, adaptation of recommendation(s) from existing guidelines is a pragmatic method to use when:

  • There are a limited number of existing guidelines, and
  • The existing guidelines address all, or nearly all, of the topics and questions the Working Group wants to address, and
  • The evidence base used in those guidelines is fairly recent, usually not more than three years out-of-date.

When these criteria are not met, the PEBC has found that the most efficient way to move forward is to use the evidentiary base of the existing guideline(s) to formulate new recommendations, rather than using the existing recommendations, and to update the evidentiary base as necessary.

From Questions to Study Selection Criteria and Literature Search Strategy

Based on the research questions, the HRM and Working Group can develop an appropriate search strategy and study selection criteria. The study selection criteria should be chosen such that the eligible studies will provide the data to answer, in whole or in part, the research questions. The search strategy is crafted based on the study selection criteria, balancing the need to identify all relevant studies with the need to reduce the number of irrelevant studies. Search strategies are peer reviewed by two PEBC HRMs. The studies identified through the literature search and study selection form the evidence base and provide answers to the research questions.

Special issues regarding selection criteria for non-randomized studies

When it is necessary to systematically review non-randomized studies, some additional considerations arise that may not apply to randomized trials. In general, only prospective studies with more than 30 patients per group should be included. Below 30 patients per group, the uncertainty around any measured outcomes may be very large, and any seeming trends in the evidence may be as likely to mislead as to inform. In addition, case reports and case series, as defined according to Part 3, Table 13.2a of the Cochrane Handbook, should be excluded, as there is rarely any compelling justification to include them. Priority should be given to studies with prospective data collection and/or that are driven by a study protocol. Completely retrospective studies should only be included with a compelling justification of their value to making recommendations. For example, the Working Group may be aware of large (e.g. hundreds of patients) case series reported in the literature. It may be reasonable to include such studies if the Working Group can provide a justification for how they will inform potential recommendations.

In the case of rare conditions, it may be that the only available literature will be small studies of lower-quality designs. The Working Group should have a frank discussion about what sorts of studies would allow the development of recommendations, and search for only those studies, accepting that the evidence base may be very small and recommendations may be very difficult to make. As noted above, small studies may be just as likely to misinform as to inform the recommendations.

In all cases, the HRM assigned to the project should consult with PEBC's management team on the selection criteria for non-randomized treatment studies before the project plan is approved by the Working Group.

Searching for Existing Systematic Reviews

The literature review for a PEBC guideline is usually conducted in two planned stages: a search for existing systematic reviews followed by a search for primary literature. There are three potential uses for existing reviews. First, they can replace all or part of any needed primary literature search; the PEBC can simply incorporate the reference list from such a review as its own evidence base for the time period and topics covered. Second, they can reduce the amount of data extraction and quality assessment necessary, assuming the existing review has extracted and summarized the data in a useful fashion. Third, they may report on meta-analyses that would be difficult or impossible for the PEBC to replicate. If a review provides none of these benefits, then it need not be discussed in any detail in the PEBC document, even if it covers a relevant topic area. Its reference can be mentioned, the reasons why it was not useful described, and then the Working Group can move on to the primary literature.

Systematic reviews are identified through searches of electronic databases such as MEDLINE, EMBASE, and the Cochrane Library, using validated search strategies that can be found in the PEBC Toolkit. Also, the PROSPERO registry of systematic reviews can be useful in determining whether systematic reviews on the topic of interest are currently being conducted.

AMSTAR is a useful tool for evaluating the quality of systematic reviews. However, there are two caveats when using AMSTAR. First, only those reviews that have the potential to provide one of the three benefits mentioned above should be evaluated; it is a waste of time to evaluate reviews that one knows will not be used. Second, AMSTAR only evaluates reporting of the methods of the systematic review, not the review itself. A systematic review can score highly on AMSTAR but still have important flaws relevant to the PEBC project.

Primary Literature Search, Assessment and Reporting

Once the utility of existing systematic reviews has been assessed, any necessary search for primary literature will be conducted using accepted search methods. The PEBC has a set of standardized search filters that form the basis of our searches, and search strategies are peer reviewed by other PEBC HRMs before being used. The detailed methods of how the search is conducted are beyond the scope of this Methods Handbook.

This section describes specific methods of quality assessment and reporting that are dependent on study design, as well as some common assessment steps regardless of study design.

General Concepts - Quality and Potential for Bias Assessment

There are two important reasons for assessing study quality and the potential for bias: 1) to evaluate internal validity and 2) to evaluate external validity.

Internal validity is the extent to which a study’s estimate of effect represents the true effect of an intervention in the study sample. Are the study results correct or were they influenced by biases and lack of methodological rigour? For example, if a study reports a cancer recurrence risk ratio of 0.5 (95% confidence interval of 0.4 to 0.7) for an experimental therapy compared with standard therapy, is there really a 50% reduction in the risk of recurrence or were there flaws in the study design that affected the results?
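For illustration, the arithmetic behind such an estimate can be reconstructed from a two-by-two table. The sketch below uses purely hypothetical recurrence counts, chosen to roughly match the example above, and computes the risk ratio with a 95% confidence interval on the log scale.

```python
import math

# Hypothetical counts, chosen to roughly match the example in the text.
a, n1 = 60, 300    # recurrences / patients, experimental therapy
c, n2 = 120, 300   # recurrences / patients, standard therapy

rr = (a / n1) / (c / n2)                         # risk ratio
se_log_rr = math.sqrt(1/a - 1/n1 + 1/c - 1/n2)   # standard error of log(RR)
lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# RR = 0.50 (95% CI 0.38 to 0.65)
```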

External validity is the extent to which a study’s results can be applied to situations outside the exact context of the study. For example, assuming the results of the study above were internally valid but only patients less than 65 years old were included, are the results applicable to the typical population seen in clinical practice that includes many patients over the age of 65?

We talk about quality and potential for bias as separate concepts, because while they are related to each other, they are not the same thing. There are some aspects of study quality (e.g., ethics approval, obtaining informed consent from all participants) that are not likely to bias the study results. In addition, a study that is conducted to the highest quality standard for a particular setting may still have important potential biases that limit its validity. For example, a well-designed randomized controlled trial (RCT) may, through no fault of the investigators conducting it, have problems with internal validity due to poor recruitment or an inability to blind participants to the intervention that they receive.

A detailed explanation of the types of phenomena that introduce bias in studies can be found in the Agency for Healthcare Research and Quality (AHRQ) Methods Guide.

Common Assessment Elements - All Study Types

When you assess a study, you are assessing what steps the study investigators took to mitigate the potential threats to the validity of the study, and whether those steps were adequate. In addition, you are assessing whether there are any potential threats to the validity of the study, particularly external validity, that were outside the control of the study investigators but still important. The specific quality-related features one will consider vary depending on the types of studies being evaluated and the nature of the interventions being studied. However, there are some common elements that can and should be assessed regardless of study design. These common elements are discussed below, followed by some specific instructions and tables.

Regardless of the specific instructions provided elsewhere in this handbook, these common elements should always be assessed and reported on for any studies included in PEBC systematic reviews.

Blinding

Blinding is a useful method that a study’s investigators can use to increase validity. Blinding involves procedures to prevent study participants, personnel, and outcome assessors from knowing which study arm a given trial participant belongs to, and can occur at many points in the implementation of a study protocol. Blinding may be possible in any study design, from the simplest case-control study to the most complicated randomized trial. Blinding of outcome assessment is particularly important for outcomes that are subjective and open to interpretation.

Prospective Design

The more a study is conducted based on an a priori design plan with prospective data collection and assessment of outcomes, the more opportunities the study investigators have to mitigate threats to validity. All randomized trials are prospective, but non-randomized studies can include a combination of both prospective and retrospective elements.

Power and Sample Size Determination

A lack of statistical power is an important threat to the internal validity of any study. Therefore, regardless of other study design features, a priori assessment and calculation of the needed sample size to achieve a desired level of statistical power is an important consideration. If a study does not include power calculations and there is a failure to achieve statistical significance for the outcome of interest, it can mean that either the intervention is not significantly different from the standard treatment or that the sample size simply was not sufficient to detect the difference. Unfortunately, this cannot be determined without knowing if a suitable number of participants were recruited into the study.
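As a concrete illustration, the sketch below shows one common form of a priori sample-size calculation (a comparison of two proportions) using the statsmodels library. The survival proportions, alpha, and power are hypothetical choices for illustration, not PEBC standards.

```python
import math
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical design: detect an improvement in 2-year survival
# from 50% (standard arm) to 60% (experimental arm).
effect_size = proportion_effectsize(0.60, 0.50)  # Cohen's h

n_per_arm = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,   # two-sided type I error rate
    power=0.80,   # desired statistical power
    ratio=1.0,    # equal allocation between arms
)
print(f"Approximately {math.ceil(n_per_arm)} patients per arm are required")
# Approximately 194 patients per arm are required
```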

Appropriate and Complete Outcome Assessment

For any study, the outcomes under study must have been assessed in an appropriate fashion, and this assessment must have been completed on all relevant participants. If outcome assessment is not complete (e.g. there is significant loss to follow-up), then the internal validity of the study may be questioned. Reasons for loss to follow-up should be included in study reports. For studies designed to assess the superiority of one intervention over another, all participants should be analyzed in the groups to which they were randomized using the intention-to-treat principle.

Source of Funding

The source of funding for a study should be identified in any abstract or full publication. Even when there is no overt and troublesome participation of the study funder in the implementation, analysis or interpretation of a study, the source of funding is still very relevant when considering the potential for bias. If the study authors are editorially independent from the funder, this should be explicitly noted in the publication.

Appropriate Analysis

Regardless of the design of the study, the use of inappropriate methods to analyze the relationship between the interventions and outcomes casts the study’s internal validity into doubt.

Intervention Studies: Randomized Trials

Description of Trials

For each randomized trial included in a systematic review, the following basic descriptive elements should be reported in table or text form:

  • The number of patients allocated to each arm
  • The eligibility criteria and other key characteristics of the patients in the trial
  • A description of the interventions provided in each arm
  • The planned primary outcomes of the trial
  • The planned sample size and statistical power of the trial
  • The length of follow-up when time-to-event outcomes are involved, preferably by arm
  • The source of funding for the trial

The Working Group should comment on the suitability and generalizability of the items above, especially the eligibility criteria and the chosen interventions. It is particularly important to comment on the interventions chosen for the control arm in each trial, as an unsuitable control arm is an important threat to external validity.

Randomization Method

The method of randomization should be considered and reported. Assignment of participants to different interventions using a randomly generated sequence minimizes selection bias by balancing the differences among participants across treatment arms. Examples of adequate randomization methods include use of a random number table, a computer random number generator, or coin tossing; allocation by birth date, day of admission, or clinical record number is not true randomization. Allocation using minimization is considered an appropriate alternative to randomization.
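Permuted-block randomization is one common, adequate computer-based way to generate such a sequence while keeping arm sizes balanced. It is not prescribed by this handbook; the sketch below is only an illustration of the idea.

```python
import random

def permuted_block_sequence(n_blocks, block, seed=2021):
    """Concatenate independently shuffled copies of a block, e.g. ['A', 'A', 'B', 'B']."""
    rng = random.Random(seed)  # fixed seed makes the sequence reproducible/auditable
    sequence = []
    for _ in range(n_blocks):
        b = list(block)        # copy so each block is shuffled independently
        rng.shuffle(b)
        sequence.extend(b)
    return sequence

# 20 allocations in blocks of 4, balanced between arms A and B.
print(permuted_block_sequence(5, ["A", "A", "B", "B"]))
```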

Allocation Concealment

The method of allocation concealment should be considered and reported. Researchers should not be able to predict the intervention to which participants will be allocated and consciously or unconsciously influence which participants are given which intervention. Adequate methods to conceal allocation prior to group assignment include central telephone, web-based or pharmacy-controlled allocation, sequentially numbered identical drug containers, and sequentially numbered, opaque, sealed envelopes.

Risk of Bias Assessment

The PEBC endorses the use of the Cochrane Collaboration’s risk of bias assessment tool for randomized trials. This tool can be found in Chapter 8 of the Cochrane Handbook. The results of the risk of bias assessment for each trial, with a summary assessment (e.g. Low risk of bias, Unclear risk of bias, High risk of bias), should be reported in a table in the systematic review.

Reporting of Results

The critical and important outcomes, identified by the Working Group, should be reported by trial in one or more tables in the systematic review. Where possible, outcomes should be reported both as relative and absolute differences. For example, for overall survival, both the hazard ratio and its confidence interval, as well as the difference in proportion surviving at one or more appropriate time points, should be reported. The Working Group should comment not only on the results themselves, but also the appropriateness of the statistical methods used. The Working Group should consult with the PEBC management team if there is a question as to whether the analytical methods used in the study were appropriate and implemented correctly.
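As a worked illustration of pairing relative and absolute measures: under a proportional hazards assumption, the control-arm survival proportion and the hazard ratio imply the experimental-arm proportion, since S_exp(t) = S_ctrl(t)^HR. The numbers below are hypothetical.

```python
# Hypothetical values: hazard ratio for overall survival (experimental vs.
# control) and control-arm survival at 5 years.
hr = 0.70
s_ctrl_5y = 0.60

# Under proportional hazards, S_exp(t) = S_ctrl(t) ** HR.
s_exp_5y = s_ctrl_5y ** hr
print(f"5-year OS: control {s_ctrl_5y:.0%}, experimental {s_exp_5y:.0%}, "
      f"absolute difference {s_exp_5y - s_ctrl_5y:.1%}")
# 5-year OS: control 60%, experimental 70%, absolute difference 9.9%
```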

Meta-analysis of RCTs

Meta-analysis should always be considered when there are clinically homogeneous randomized trials that report the same outcome. However, deciding whether trials are clinically homogeneous is more art than science, and reasonable people will differ in their opinions. In general, the PEBC takes a conservative stance towards clinical homogeneity. That is, the burden of proof is on those who believe that the trials are homogeneous enough to justify the analysis. In every case where meta-analysis is being considered, the PEBC management should be consulted. The PEBC uses random effects models unless a compelling justification exists to do otherwise. When time-to-event outcomes are of interest, the PEBC prioritizes the meta-analysis of hazard ratios as opposed to risk ratios or other measures made at a single point in time.
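For illustration only, the sketch below pools hypothetical hazard ratios on the log scale using a DerSimonian-Laird random-effects model; in practice, any such analysis should be planned in consultation with the PEBC management, as noted above.

```python
import numpy as np

# Hypothetical per-trial hazard ratios and standard errors of log(HR).
hrs = np.array([0.60, 0.95, 0.70])
ses = np.array([0.10, 0.12, 0.15])

y = np.log(hrs)    # pool on the log scale
w = 1 / ses**2     # inverse-variance (fixed-effect) weights

# DerSimonian-Laird estimate of between-trial heterogeneity (tau^2)
y_fixed = np.sum(w * y) / np.sum(w)
q = np.sum(w * (y - y_fixed) ** 2)
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (len(y) - 1)) / c)

# Random-effects pooled estimate and 95% confidence interval
w_re = 1 / (ses**2 + tau2)
y_re = np.sum(w_re * y) / np.sum(w_re)
se_re = np.sqrt(1 / np.sum(w_re))
lo, hi = np.exp(y_re - 1.96 * se_re), np.exp(y_re + 1.96 * se_re)
print(f"Pooled HR = {np.exp(y_re):.2f} (95% CI {lo:.2f} to {hi:.2f}), tau^2 = {tau2:.3f}")
# Pooled HR = 0.73 (95% CI 0.55 to 0.98), tau^2 = 0.049
```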

Intervention Studies: Non-Randomized Comparative Studies

When non-randomized studies are considered, the PEBC endorses the use of the Cochrane Collaboration’s schema for defining the designs of these studies, as found in Table 13.2a of the Cochrane Handbook.

Description of Studies

Due to the wide variety of different study designs that are possible, there must be considerable flexibility in describing these studies. However, at a minimum, the following should be reported for each study:

  • Its design, as defined by Cochrane Table 13.2a
  • The number of patients studied, broken down by intervention received
  • The eligibility criteria and other key characteristics of the patients in the study
  • A description of the interventions
  • The study’s source of funding

Other relevant descriptive features should be included as necessary.

Risk of Bias Assessment

Again, as there are a wide variety of studies that may be included, there can be no one schema for assessing their risk of bias. In addition, because of their very nature, the Working Group should assume that such studies have, at best, an unclear risk of bias when considered conceptually against the Cochrane risk of bias criteria referenced above. Therefore, assessing such studies is less about assessing the risk of bias and more about assessing the actual bias present, its direction, and its magnitude.

The Working Group should consider the general threats to internal and external validity described above, as well as the common elements. While a specific assessment for each study is not required, studies that are particularly biased, or studies that are particularly free of bias, should be named and commented upon. A general assessment of the potential for bias across studies that address similar situations should be made, acknowledging the fact that there is a known risk of bias due to the lack of randomization.

The ROBINS-I tool (Risk Of Bias In Non-randomized Studies - of Interventions) may be used to evaluate the risk of bias (RoB) of non-randomized studies of interventions (NRSI) that compare the health effects of two or more interventions. The types of NRSI that can be evaluated using this tool include observational studies where allocation occurs during the course of usual treatment decisions or through patients' choices; these include cohort studies, case-control studies, controlled before-and-after studies, interrupted-time-series studies and quasi-randomized studies. The complete ROBINS-I tool can be found at https://sites.google.com/site/riskofbiastool/. Note that if ROBINS-I is to be used, an a priori assessment of potential confounding variables should be conducted during project planning.

Reporting of Results

The critical and important outcomes, identified by the Working Group, should be reported. It is reasonable, with appropriate justification, to report no results from studies if, in a post-hoc assessment, they are found to be of no value for developing recommendations. While consistency is a virtue in reporting, brevity and focus are also virtues that should be weighed against consistency. For example, if you find in a table of results that there are many blank/“not reported” cells, reducing the number of columns in the table and/or reporting the results in text form may be worthwhile.

Meta-analysis

Meta-analysis of non-randomized treatment studies is often not possible due to clinical and outcome heterogeneity across the studies. When the clinical homogeneity of the studies (in terms of patient population, intervention, outcome, etc.) is not seriously questioned, a meta-analysis may be considered. The methods of meta-analysis must be customized to the particular study designs and outcomes in the studies; consult with the PEBC management on this issue.

Diagnostic Studies

Studies of diagnostic methods and modalities are very different from other types of studies in both their design and execution. The PEBC endorses, in general terms, the methods and concepts described in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy, as modified and elaborated upon below.

Description of Studies

For each study included in the review, the following descriptive elements should be reported in table or text form in addition to the common elements described above:

  • The number of patients included in the study
  • The eligibility criteria and other key characteristics of the patients in the study
  • The method of recruitment for the study. This will be either a “cohort” method (e.g. all patients at a particular institution) or a “case-control” method (e.g. patients with the condition are recruited via a different mechanism than those without the condition)
  • The method of comparison, assuming more than one intervention is under consideration. This will be either “fully paired” (e.g. all patients receive all interventions), “randomized” (e.g. patients are randomized to the interventions), or “indirect” (e.g. patients receive the interventions via some other method). In general, PEBC reviews should exclude “indirect” studies unless there is a good justification to include them. It is important to note that for studies of this nature, while both fully paired and randomized studies control for selection bias, the fully paired design is more efficient (e.g. provides more statistical power for the same number of patients) than the randomized design.
  • Description of the interventions the patients underwent, especially if it varies from study to study. For example, if some studies compared ultrasound and MRI with mammography, and some only compared MRI with mammography, this should be clear.
  • Description of the reference standard used in each study.
  • Criteria for determining the result of the intervention. This will usually be the criteria for determining whether the test was positive for the condition or not.

Risk of Bias Assessment

The PEBC endorses the use of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool to assess the risk of bias for diagnostic studies, as suggested by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy. The Working Group should also make a summary statement of the potential risk of bias for each study and across all studies that address similar situations (low/unclear/high), as described above for randomized trials.

Reporting of Results

The outcomes from a diagnostic study can be classified into three levels, listed in order of their typical utility in making recommendations: patient-related outcomes, changes in patient management outcomes, and diagnostic accuracy outcomes.

Patient-related outcomes can include mortality, recurrence rate, health-related quality of life, side effects, and so on. They are the most useful outcomes with respect to recommendation development, and should be prioritized where possible.

Changes in patient management outcomes document how clinical decision making was affected by the information provided by the diagnostic intervention under consideration. They could include such things as avoidance of unnecessary surgery, upstaging/downstaging leading to change in therapy, alterations of treatment plan, etc. These outcomes are also important, because they can show how a diagnostic intervention generates not just accurate, but also clinically important information.

Diagnostic accuracy outcomes include sensitivity, specificity, positive predictive value, negative predictive value, likelihood ratio, post-test probability, ROC (receiver operating characteristic) curve, etc. A detailed explanation of these outcomes can be found in Leeflang et al 2009. These outcomes may be of the least value in making recommendations, as more accurate information may only be of value when that information leads to some change in the outcomes already mentioned (patient-related, patient management). When diagnostic accuracy outcomes are reported, the complete two-by-two table of true positives, true negatives, false positives, and false negatives should be reported for each study. If necessary, this table should be reconstructed from the available data for each study, and if any of the cells in the table cannot be found, this should be noted. In addition, the prevalence of the condition, sensitivity, specificity, and positive and negative predictive values should be reported for each study. Positive and negative likelihood ratios and diagnostic odds ratios may also be useful outcomes to report.
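The sketch below shows how these diagnostic accuracy outcomes follow from a reconstructed two-by-two table; the counts are purely hypothetical.

```python
# Hypothetical two-by-two table: true/false positives and negatives.
tp, fp, fn, tn = 90, 30, 10, 170

n = tp + fp + fn + tn
prevalence = (tp + fn) / n
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
ppv = tp / (tp + fp)                      # positive predictive value
npv = tn / (tn + fn)                      # negative predictive value
lr_pos = sensitivity / (1 - specificity)  # positive likelihood ratio
lr_neg = (1 - sensitivity) / specificity  # negative likelihood ratio
dor = lr_pos / lr_neg                     # diagnostic odds ratio

print(f"prevalence={prevalence:.2f}, sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
print(f"PPV={ppv:.2f}, NPV={npv:.2f}, LR+={lr_pos:.2f}, LR-={lr_neg:.2f}, DOR={dor:.0f}")
# prevalence=0.33, sensitivity=0.90, specificity=0.85
# PPV=0.75, NPV=0.94, LR+=6.00, LR-=0.12, DOR=51
```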

Meta-analysis

Meta-analysis of the results of these studies is possible, but it is a methodologically sophisticated procedure with many potential pitfalls. Therefore, all meta-analyses of such data should be planned in consultation with the PEBC management.

Screening Studies

Screening studies can fall under intervention and/or diagnostic studies. Please follow the above specific instructions for intervention and/or diagnostic studies to assess risk of bias and summarize the data.

Prognostic Studies

There are four types of prognostic research questions as outlined in Hemingway et al 2013:

  1. Fundamental prognostic research, like variations in five-year survival among countries, after adjusting for age, in patients with breast cancer
  2. Prognostic factor research, like specific factors (such as biomarkers) that are associated with prognosis
  3. Prognostic model research to develop and validate a model to predict individual risk of a future outcome
  4. Stratified medicine research to help tailor treatment decisions to an individual or group with similar characteristics

Description of Studies

Due to the different study types, there must be considerable flexibility in describing these studies. However, at a minimum, the following should be reported for each study:

  • Its design, as defined by Cochrane Table 13.2a (including whether an initial cohort was used)
  • The number of patients studied
  • Follow-up time
  • Multivariable model for types 1, 2, and 4 prognostic studies
  • Development model and validation model for type 3 prognostic studies
  • The study’s source of funding

Other relevant descriptive features should be included as necessary.

Risk of Bias Assessment

At present, the PEBC suggests the use of the Quality in Prognosis Studies (QUIPS) tool (Hayden et al 2013) to assess the risk of bias for types 1, 2, and 4 prognostic studies, and the Prediction model Risk Of Bias ASsessment Tool (PROBAST) to assess the risk of bias for type 3 prognostic studies.

Reporting of Results

As prognostic studies may report a wide variety of outcomes, no specific guidance can be provided. Consult with the PEBC management if there is a question as to how to proceed.

Meta-analysis

Meta-analysis should always be considered when eligible studies are clinically homogeneous and report the same outcome. Meta-analysis of prognostic studies may be conceptually complex, so the PEBC management should be consulted before proceeding.

Assessment of the Quality of the Aggregate Evidence

After assessing the risk of bias for each study per outcome, the quality of the aggregate evidence per comparison (or research question) needs to be assessed. Risk of bias, inconsistency, indirectness, imprecision, and publication bias are among the domains taken into account. The PEBC uses two approaches as options for this:

  1. Using the AGREE (Item 9) (Appraisal of Guidelines, REsearch and Evaluation) and AGREE REX (Item 1) (AGREE Recommendation Excellence) Reporting Checklists (more details are available on the AGREE Trust website);
  2. Using the GRADE’s (Grading of Recommendations, Assessment, Development, and Evaluation) Summary of Findings Table (more details are available on the GRADEpro website).

Recommendations Development

The PEBC draws on GUIDE-M and GRADE to inform its strategy for interpreting the evidence, developing recommendations, and considering issues relevant to their implementability.

Figure 2: The Process of Recommendations Development

Figure 2 depicts the overall process of developing recommendations at the PEBC. The Working Group identifies the key evidence about the benefits and harms (i.e., their magnitude and uncertainty) associated with the interventions or objects of study being considered. These are weighed against each other, taking into account the values and preferences of patients and the Working Group, and the certainty or quality of the aggregate evidence. Each potential recommendation (or logical recommendation cluster or domain of the evidence) will include information about the key evidence and the interpretation of the evidence. These are described below.

Key Evidence

Based on the results found in the systematic review, and the quality appraisal as described above, the Working Group and HRM should craft succinct statements about the key pieces of evidence that address each research question. The available evidence should be summarized in a way that presents the salient data regarding benefits and harms numerically (e.g. relative and/or absolute measurements, with uncertainty) in as brief a fashion as possible. Refer the reader to the data tables and discussion in the systematic review of the document if necessary.

Interpretation of Evidence

Each Working Group needs to arrive at a common interpretation of the available evidence as part of developing the recommendations. The PEBC has developed a set of criteria and questions to consider while interpreting the evidence, based on the GRADE methods and past experience. These criteria form an agenda for a discussion guided by the PEBC HRM. They are applied for each potential recommendation (or logical recommendation cluster or domain of the evidence).

Criteria for Interpretation of the Evidence

PATIENT VALUES

Question: Is there important uncertainty about how much patients value the outcomes? How much do those affected by the option value each of the outcomes in relation to the other outcomes (i.e. what is the relative importance of the outcomes)?

Judgements/Options:
♦ Important uncertainty or variability
♦ Possibly important uncertainty or variability
♦ Probably no important uncertainty or variability
♦ No important uncertainty or variability
♦ No known uncertainty or variability

Explanation: The more likely it is that differences in values would lead to different decisions, the less likely it is that there will be a consensus that an option is a priority. Values in this context refer to the relative importance of the outcomes of interest (how much people value each of those outcomes). These values are sometimes called ‘utility values’. This should be viewed from the target population’s perspective.

CERTAINTY OF EVIDENCE

Question: What is the overall certainty of this evidence of effects, across all of the outcomes that are critical to making a decision?

Judgements/Options:
♦ Very low
♦ Low
♦ Moderate
♦ High
♦ No included studies

Explanation: The less certain the evidence is for critical outcomes (those that are driving a recommendation), the less likely it is that an option should be recommended.

DESIRABLE EFFECTS

Question: How substantial are the desirable anticipated effects (including health and other benefits) of the option (taking into account the severity or importance of the desirable consequences and the number of people affected)?

Judgements/Options:
♦ Trivial
♦ Small
♦ Moderate
♦ Large
♦ Varies
♦ Don't know

Explanation: The larger the benefit, the more likely it is that an option should be recommended.

UNDESIRABLE EFFECTS

Question: How substantial are the undesirable anticipated effects (including harms to health and other harms) of the option (taking into account the severity or importance of the adverse effects and the number of people affected)?

Judgements/Options:
♦ Large
♦ Moderate
♦ Small
♦ Trivial
♦ Varies
♦ Don't know

Explanation: The greater the harm, the less likely it is that an option should be recommended.

BALANCE OF EFFECTS

Question: What is the balance between the harms and the benefits?

Judgements/Options:
♦ Benefits < Harms
♦ Benefits ≤ Harms
♦ Benefits = Harms
♦ Benefits ≥ Harms
♦ Benefits > Harms
♦ Don't know

Explanation: The larger the desirable effects in relation to the undesirable effects, taking into account the values of those affected (i.e. the relative value they attach to the desirable and undesirable outcomes), the more likely it is that an option should be recommended.

ACCEPTABILITY

Question: Is the option acceptable to patients and providers?

Judgements/Options:
♦ No
♦ Probably no
♦ Probably yes
♦ Yes
♦ Varies
♦ Don't know

Explanation: The less acceptable an option is to patients and providers, the less likely it is that it should be recommended, or, if it is recommended, the more likely it is that an implementation strategy should address concerns about acceptability (these concerns can be mentioned under implementation considerations). Unacceptability may be due to some patients or providers: 1) not accepting the means of administration of a treatment (e.g., monthly injections versus a daily pill); 2) not accepting the change in practice patterns (e.g., changes in training); or 3) morally disapproving of the option (i.e. in relation to ethical principles such as autonomy, nonmaleficence, beneficence, or justice).

GENERALIZABILITY

Question: Is this evidence generalizable to the entire target population?

Judgements/Options:
♦ No
♦ Probably no
♦ Probably yes
♦ Yes
♦ Don't know

Explanation: If the evidence applies to a specific subset of the target population, this subset should be clearly defined in the recommendation. List situations or circumstances where the action statement should not be applied.

Drafting the Interpretation of the Evidence

The Working Group, based on the criteria above, will draft a summary of their interpretation of the evidence. Not all criteria will be equally important in every case, but those that are important should be explicitly named in this summary, either as headings or within the text, for example…

  • “The primary desirable effect of the treatment, an increase in overall survival, is very large, especially when compared with the principal undesirable effects such as increased febrile neutropenia.”
  • “The available evidence is difficult to generalize to the target population in Ontario, because it did not cover all the relevant age groups and risk profiles that would be met in common practice.”

While drafting the interpretation of the evidence, the Working Group should also consider the following.

Examine the Values of the Working Group

In many guidelines, the balance between desirable and undesirable effects is not clear-cut, and reasonable people may differ with respect to how they weigh these effects based on the values they bring to the assessment. This is touched upon in the “Patient Values” criterion above, but warrants further consideration. The Working Group should explicitly consider the values they are using to weigh the desirable and undesirable effects against each other, and should specify this in their interpretation of the evidence. For example…

  • “The Working Group considered the increase in overall survival apparent with Therapy A, while small, was important, and that this increase was more important than the real but clinically manageable increase in toxicity. The Working Group recognizes that there were important deficiencies in the blinding of the arms in RCT 1, but do not believe these deficiencies challenge the overall conclusion of the study or the magnitude of the benefit from Therapy A. In patients in this target population, where there are few alternative therapies, any therapy that demonstrates an overall survival advantage should be strongly considered for use.”

This statement directly compares the benefits with the harms, and expresses the Working Group’s values used to weigh these against each other. It also states how the Working Group evaluated the uncertainty in the evidence.

Alternate Viewpoints and Considerations

When crafting these statements, the Working Group should consider the conclusions others might reach when interpreting the same data. How might reasonable people differ in weighing the benefits and harms, or in assessing the importance of a particular flaw in quality or potential bias? The Working Group should outline alternative conclusions that may be reasonable from the same evidence and quality assessment, and then state why they do not agree with that conclusion. For example…

  • “The Working Group recognizes that some medical oncologists may feel that in the elderly population included in the study of Therapy A the toxicity associated with therapy poses too great a burden. The Working Group believes that the potential for increased survival in these patients still warrants offering Therapy A to these patients, as always with a discussion of the relative benefits and risks.”

The ultimate goal of this subsection is that, for each guideline objective, a reader will be able to understand the reasoning that the working group used to come to their considered judgment. If that reader does not agree with the working group’s conclusions, the reader should be able to find in this subsection exactly where the point of disagreement lies.

Draft the Recommendations

Based on the interpretation of the evidence, the Working Group should draft one or more recommendations for each guideline objective.

When recommendations can be made, the PEBC favours the use of active language. That is, the recommendation should make it clear what action is being encouraged or discouraged.

  • Active verbs: The preferred recommendation wording involves the use of an active verb, such as “advise”, “offer”, “perform”.
  • Level of obligation: “May”/“Should”/“Must” - These words convey a sense of the level of obligation on the part of the reader to take the action described. “Must” is an imperative, while “may” indicates that other courses of action are reasonable. “Should” implies that the action is desirable, but not necessarily in all contexts and situations. Alternatively, “is a reasonable option”/“is recommended”/“is strongly recommended”: again, these phrases convey an increasing level of obligation on the reader.

In the case where the evidence simply does not support any active recommendation, the Working Group should state this fact, as in…

  • No recommendation can be made for or against the use of Therapy A.

In such cases, a qualifying statement should present the available options and discuss their potential benefits and harms.

Recommendations should address the objectives of the guideline, and each objective should have at least one recommendation, even if it is “No recommendation can be made…” Recommendations do not address the research questions directly, but are rather built upon the evidence identified in answering those questions.

Writing the Recommendation Statements

In order to generate clear, transparent, and implementable recommendations, the PEBC ensures that recommendation statements include the following elements.

WHAT TO RECOMMEND

Question: WHAT is being recommended?
Judgements/Options: Most likely the intervention under investigation.

ACTION WORD

Question: What is the ACTION?
Judgements/Options: Active verbs: order, test, prescribe, offer, use, etc.

WHO IS RESPONSIBLE

Question: WHO is responsible for ensuring the ACTION is undertaken?
Judgements/Options: A specific healthcare provider? A team?

TYPE OF RECOMMENDATION AND LEVEL OF OBLIGATION

Question: At what level of obligation should the reader feel the recommended action should be followed?
Judgements/Options:
♦ Must (strong recommendation)
♦ Should
♦ May (weak recommendation or consensus statement)

Implementation Considerations

The Working Group should consider how the recommendations that have been drafted might be implemented. The following criteria should be considered, both for the overall effect of all of the recommendations together and for individual recommendations.

Implementation Considerations Criteria

FEASIBILITY

Questions:
♦ Is the option(s) feasible to implement?
♦ Is this the standard of care or is this a change to the standard of care?
♦ Are there any barriers or enablers to the implementation of these recommendations?

Explanation: This includes system-level considerations, including human resources, organizational considerations, and cost. Resource needs are not considered by the Working Group when determining whether or not to make a recommendation or a recommendation’s strength; however, these considerations are recorded to provide context to the target audience.

EQUITY

Questions:
♦ What would be the impact on health inequities?
♦ Would the option reduce or increase health inequities?

Explanation: Provide context regarding whether or not, in the opinion of the Working Group, the recommendations would reduce or create further inequities. Examples include gender, socio-economic status, race, culture, co-morbid disease, and remote location.

PATIENT CONSIDERATIONS

Questions:
♦ Is it anticipated that most, some, or few patients will view the recommended action as acceptable?
♦ Is it anticipated that the outcomes valued by clinicians will align with the outcomes valued by patients?

Explanation: Any concerns about patient acceptability can be mentioned here to help inform any future implementation strategies.

PROVIDER CONSIDERATIONS

Questions:
♦ Would the interpretation of the evidence align with the interpretation of most members of the clinical community?
♦ Would the recommendations be accepted by most, many, or some providers for their implementation?
♦ Would they align with current practice?
♦ Would they align with norms within the clinical community?
♦ Would they require providers to have additional training?

Explanation: External review results can also help to answer these questions. Any concerns about provider acceptability can be mentioned here to help inform any future implementation strategies.

SYSTEM CONSIDERATIONS

Questions:
♦ Would these recommendations require a significant change to the current system?
♦ Would these recommendations require a significant change in how the system is organized?
♦ Is it anticipated that implementation of these recommendations would be costly?

Explanation: The stakeholders could include the intended users, the sponsors, or individuals responsible for the implementation of the recommendations.

The issues identified in the discussion of the criteria above should be presented in the “Implementation Considerations” section of the Guideline. This may occur once for all recommendations combined or there may need to be a separate section for each recommendation or for groups of recommendations if there are significant differences in implementation considerations.

Sources and Acknowledgements

This handbook and the associated templates and checklists were developed using information from a number of sources, which are cited throughout the document.

Many PEBC staff have worked on this handbook and its predecessors over the history of the PEBC. However, the following individuals are acknowledged for their more extensive contributions to the current version:

  • Fulvia Baldassarre
  • Judy Brown
  • Sheila McNair
  • Hans Messersmith
  • Lesley Souter
  • Emily Vella
  • Xiaomei Yao
  • Caroline Zwaal

Melissa Brouwers is acknowledged for her feedback and overall direction of the project.
