NKF KDOQI GUIDELINES

KDOQI Clinical Practice Guidelines and Clinical Practice Recommendations for Anemia in Chronic Kidney Disease

V. APPENDIX 1: METHODS OF EVIDENCE REVIEW AND SYNTHESIS

Aim
The overall aim of the project is to update the 2000 KDOQI CPGs for Anemia of CKD². The Work Group sought to update the guidelines by using an evidence-based approach. After topics and relevant clinical questions were identified, the available scientific literature on those topics was systematically searched and summarized. High-quality or moderately high-quality evidence formed the basis for the development of evidence-based CPGs. When evidence was of low or very low quality or was entirely lacking, the Work Group could develop CPRs based on consensus of expert opinion.

Overview of Process
Update of the guidelines required many concurrent steps to:

Form the Work Group and ERT that were to be responsible for different aspects of the process
Confer to discuss process, methods, and results
Develop and refine topics
Create draft guideline statements and rationales
Define exact populations, interventions, predictors, comparison groups, and outcomes of interest and study design and minimum follow-up time criteria (PICOD)
Create and standardize quality assessment and applicability metrics
Create data extraction forms
Develop literature search strategies and run searches
Screen abstracts and retrieve full articles
Review articles
Extract data and perform critical appraisal of the literature
Tabulate data from articles into summary tables
Grade quality and applicability of each study
Grade the quality of evidence for each outcome and assess the overall quality of the evidence across all outcomes with the aid of evidence profiles
Write guideline recommendations and supporting rationale statements and grade the strength of the recommendations
Write CPRs based on consensus of the expert Work Group in the absence of sufficient evidence.

Creation of Groups
The KDOQI Co-Chairs appointed the Co-Chairs of the Work Group, who then assembled groups to be responsible for the development of the guidelines. The Work Group consisted of domain experts, including individuals with expertise in adult and pediatric nephrology, hematology, nursing and nutrition, cognitive function, QOL, and CVD outcomes in patients with CKD. Support in evidence review and methods expertise was provided by an ERT contracted by the NKF at the NKF Center for CPG Development and Implementation. The Work Group and the ERT collaborated closely throughout the project.

The first task of the Work Group members was to define the overall topics and goals for the update. Smaller groups of 2 to 4 individuals were formed and assigned to each topic. The Work Group and ERT then further developed and refined each topic and specified screening criteria for PICOD, literature search strategies, and data extraction forms (described next). Work Group members were the principal reviewers of the literature, and from their reviews and detailed data extractions, they summarized the available evidence and took the primary roles of writing the guidelines and rationale statements.

The ERT consisted of individuals (staff, fellows, and research assistants) from Tufts–New England Medical Center with expertise in nephrology and development of evidence-based CPGs. It instructed the Work Group members in all steps of systematic review and critical literature appraisal. The ERT also coordinated the methodological and analytical process of the report; it defined and standardized the method of performing literature searches, data extraction, and summarizing the evidence in summary tables and evidence profiles. It performed literature searches, organized abstract and article screening, created forms to extract relevant data from articles, organized Work Group member data extraction, checked data, and tabulated results. Throughout the project, the ERT conducted seminars and provided instruction on systematic review, literature searches, data extraction, assessment of quality and applicability of articles, evidence synthesis, and grading of the quality of evidence and strength of guideline recommendations.

Refinement of Topics and Development of Materials
The Work Group reviewed the 2000 KDOQI CPGs for Anemia of CKD² and determined which of the guideline recommendations required updates and which could remain unchanged. These assessments were based primarily on expert opinion regarding the likelihood of new evidence being available. When experts were uncertain about the current evidence basis of a topic, a “first look” of the topic was undertaken to inform this process (Fig 19). After literature review of potentially relevant abstracts and studies, members of the Work Group focused the specific questions deemed clinically relevant and amenable to systematic review or decided to produce a narrative summary of the literature.

Fig 19. Process of triaging a topic to a systematic review or a narrative review. *First Look Topic: Topics for which the substance of the evidence base was unclear were first explored to determine their suitability for systematic review. A sensitive Ovid search was performed for each first look topic by the ERT. Abstracts were reviewed by the Work Group members to determine whether there was an adequately defined and sufficient base of scientific information from which to answer the clinically relevant question or resolve controversies. Topics that qualified were submitted to systematic review, while topics lacking a sufficient evidence base for systematic review were summarized by Work Group members in narrative reviews. †Narrative Review: Work Group members had wide latitude in summarizing reviews and original articles for topics that were determined not to be amenable to a systematic review of the literature. Under special circumstances, a focused literature search was performed by the ERT for a specific subtopic. ‡Systematic Review: A systematic review entailed systematic screening, data abstraction, appraisal, and synthesis of studies in summary tables and evidence profiles.

The Work Groups and ERT developed: (1) draft guideline statements, (2) draft rationale statements that summarized the expected pertinent evidence, and (3) data extraction forms requesting the data elements to be retrieved from the primary articles. The topic refinement process began before literature retrieval and continued through the process of reviewing individual articles.

Literature Search
A master reference list was compiled from references used in previous evidence-based guidelines on Anemia and CKD:

EBPG II, 2004
EBPG I, 2000
KDOQI Anemia Guideline Update, 2000
DOQI Anemia Guideline, 1997
Caring for Australasians with Renal Impairment (CARI) Anemia Guideline, 2003

For the topics addressed in EBPG II, update searches of MEDLINE were performed. For Hb Targets, a module for (Anemia and ESA and Kidney) was run on articles from January 2003 through March 2004. Selective updates of literature searches were performed through November 2004. A pre-MEDLINE search also was performed to capture more recent trials not yet indexed in MEDLINE. For the topic of Iron Targets, the (Anemia and ESA and Kidney) module was modified by adding iron terms and was run to include publications between January 2003 and November 2004. For the topics of adjuvants to ESA treatment, a MEDLINE search was conducted for [(Anemia and Kidney) and (Androgens, Statins, Carnitine, Vitamin E, or Ascorbic Acid)] for all articles published from January 1989 through September 2004. A separate search for studies on prevalence of anemia by eGFR was conducted from January 1999 through February 2005. The searches also were supplemented by articles identified by Work Group members through September 2005.

Only full journal articles of original data were included. Editorials, letters, abstracts, and unpublished reports were not included. Selected review articles identified in the searches were provided to the Work Group for background material.

MEDLINE search results were screened by members of the ERT for relevance by using predefined eligibility criteria, described in Table 44. Retrieved articles were screened by the ERT. Potentially relevant studies were sent to Work Group members for rescreening and data extraction. Domain experts, along with the ERT, made the final decision for inclusion or exclusion of all articles.

Generation of Data Extraction Forms
Data extraction forms were designed to capture information on various aspects of the primary studies. Data fields for all topics included study setting, demographics, eligibility criteria, causes of kidney disease, numbers of subjects, study design, study funding source, dialysis characteristics, comorbid conditions, descriptions of relevant risk factors or interventions, description of outcomes, statistical methods, results, study quality (discussed later), study applicability (discussed later), and a free-text field for comments and assessment of biases. Training of the Work Group members to extract data from primary articles occurred during Work Group meetings and by e-mail and teleconference calls. Work Group members then were assigned the task of data extraction of articles.

Generation of Evidence Tables
The ERT condensed the information from the data extraction forms into evidence tables, which summarized individual studies. These tables were created for the Work Group members to assist them with review of the evidence and are not included in this document. All extracted articles and all evidence tables were made available to all Work Group members. During the development of the evidence tables, the ERT rescreened the accepted articles to verify that each of them met the initial screening criteria and checked the data extraction for accuracy. If the criteria were not met, the article was rejected, in consultation with the Work Group.

Format for Summary Tables
Summary tables describe the studies according to the following dimensions: study size and follow-up duration, applicability or generalizability, results, and methodological quality (see Table 43). The ERT generated summary tables by using data from extraction forms, evidence tables, and/or the articles. All summary tables were reviewed by the Work Group members.

In the summary tables, studies were ordered first by method quality (best to worst), then by applicability (most to least), and then by study size (largest to smallest). Results are presented in their appropriate metric or in summary symbols, as defined in the table footnotes.
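The ordering rule above can be expressed as a three-part sort key. The following is a minimal sketch only: the `StudyRow` type, the numeric coding of the 3-level quality and applicability scales (1 = best or most applicable), and the example studies are all hypothetical and are not taken from the guideline tables.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRow:
    """One row of a summary table (hypothetical structure)."""
    label: str
    quality: int        # assumed coding: 1 = best, 3 = worst
    applicability: int  # assumed coding: 1 = most applicable, 3 = least
    n: int              # study (sample) size


def order_for_summary_table(rows):
    """Sort by quality (best first), then applicability (most first),
    then study size (largest first)."""
    return sorted(rows, key=lambda r: (r.quality, r.applicability, -r.n))


# Illustrative rows only; not real studies.
rows = [
    StudyRow("Study A", quality=2, applicability=1, n=120),
    StudyRow("Study B", quality=1, applicability=2, n=80),
    StudyRow("Study C", quality=1, applicability=1, n=60),
    StudyRow("Study D", quality=1, applicability=1, n=400),
]
ordered_labels = [r.label for r in order_for_summary_table(rows)]
```

Negating the sample size lets a single ascending sort place larger studies first within each quality and applicability stratum.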

To provide consistency throughout summary tables, data sometimes were converted or estimated. Follow-up times were converted to months by estimating 1 month as 4 weeks. In general, data provided as percent Hct was converted into grams per deciliter of Hb by dividing by 3. Additionally, results sometimes were estimated from graphs. All estimated values have been annotated as such.
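The two conversions described above are simple enough to state as code. This is an illustrative sketch; the function and constant names are hypothetical, and the factors are the approximations stated in the text rather than exact clinical equivalences.

```python
WEEKS_PER_MONTH = 4      # approximation stated in the text: 1 month = 4 weeks
HCT_TO_HB_DIVISOR = 3    # approximation stated in the text: Hb = Hct / 3


def weeks_to_months(weeks: float) -> float:
    """Convert follow-up time in weeks to months (1 month taken as 4 weeks)."""
    return weeks / WEEKS_PER_MONTH


def hct_percent_to_hb_g_dl(hct_percent: float) -> float:
    """Estimate Hb in g/dL from Hct in percent by dividing by 3."""
    return hct_percent / HCT_TO_HB_DIVISOR


follow_up_months = weeks_to_months(24)        # 24 weeks of follow-up
estimated_hb = hct_percent_to_hb_g_dl(33)     # Hct of 33%
```

Under these rules, 24 weeks of follow-up is reported as 6 months and an Hct of 33% is reported as an estimated Hb of 11 g/dL.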

Systematic Review Topics, Study Eligibility Criteria
The topics covered by systematic review are listed in Table 44. The screening criteria were defined by the Work Group members in conjunction with the ERT.

Literature Yield
For systematic review topics, the literature searches yielded 2,756 citations. Of these, 137 articles were reviewed in full. An additional 19 were added by Work Group members. A total of 83 were extracted and of these, 51 studies are included in Summary Tables. Details of the yield by topic can be found in Table 45.
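The counts above can be tallied as follows. This sketch only restates the reported numbers; the variable names are hypothetical, and treating the 19 Work Group additions as part of the same full-text pool as the 137 reviewed articles is an assumption, because the text does not say how the 83 extracted articles were drawn.

```python
# Counts as reported in the text.
citations = 2756
reviewed_in_full = 137
added_by_work_group = 19
extracted = 83
in_summary_tables = 51

# Assumption: Work Group additions joined the same full-text pool.
full_text_pool = reviewed_in_full + added_by_work_group

# Share of search citations that went on to full-text review, in percent.
pct_reviewed_in_full = round(100 * reviewed_in_full / citations, 1)

# Share of extracted articles that appear in Summary Tables, in percent.
pct_extracted_in_tables = round(100 * in_summary_tables / extracted, 1)
```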

The literature search yields for first-look topics can be found in Table 46. Upon reviewing the resultant abstracts, only the topics of noniron adjuvants (carnitine, ascorbic acid, and androgens) proceeded to systematic review.

Assessment of Individual Studies

Study Size and Duration
The study (sample) size is used as a measure of the weight of a study. In general, large studies provide more precise estimates of prevalence and associations. In addition, large studies are more likely to be generalizable; however, large size alone does not guarantee applicability. A study that enrolled a large number of selected patients may be less generalizable than several smaller studies that included a broad spectrum of patient populations. Similarly, longer duration studies may be of better quality and more applicable, depending on other factors.

Applicability
Applicability (also known as generalizability or external validity) addresses the issue of whether the study population is sufficiently broad so that the results can be generalized to the population of interest at large. The study population typically is defined primarily by the inclusion and exclusion criteria. The target population was defined to include patients with anemia and kidney disease and subdivided into those with CKD stages 3 to 5 not on dialysis therapy and those with CKD stage 5 on HD or PD therapy. Furthermore, topic 3.6 includes such special patient populations as kidney transplant recipients and patients with nonrenal anemias. Applicability was specified for each study according to a 3-level scale (Table 45A). In making this assessment, sociodemographic characteristics were considered, as well as comorbid conditions and prior treatments. Applicability is graded in reference to the population of interest as defined in the clinical question. Target populations are specified in the titles of each summary table (discussed later).

Study Quality
Method quality (or internal validity) refers to the design, conduct, and reporting of the clinical study. Because studies with a variety of types of design were evaluated, a 3-level classification of study quality was devised (Table 46A).

Quality of studies of interventions. The evaluation of questions of interventions was limited to RCTs. The grading of these studies included consideration of the methods (eg, duration, degree of blinding, and number of and reasons for dropouts), population (ie, does the population studied introduce bias?), outcomes (ie, are the outcomes clearly defined and properly measured?), thoroughness/precision of reporting, statistical methods (ie, was the study sufficiently powered and were the statistical methods valid?), and the funding source.

Quality of studies of prevalence. The ideal study design to assess prevalence of anemia and its association with eGFR is a cross-sectional study of a population representative of the general population. Criteria for evaluation of cross-sectional studies to assess prevalence are listed in Table 47.

Results
The type of results used from a study was determined by the study design, the purpose of the study, and the question(s) being asked for which the results were used. Decisions were based on the screening criteria and prespecified outcomes of interest (Table 47).

Summarizing Reviews and Selected Original Articles
Work Group members had wide latitude in summarizing reviews and citing original articles.

Guideline Format
The format for each section containing an evidence-based guideline or a CPR is outlined in Table 48. Each guideline contains 1 or more specific “guideline statements,” which are presented as “bullets” that represent recommendations to the target audience. Implementation issues and research recommendations were formulated after this guideline document had been completed and will be presented in another supplement.

Rating the Quality of Evidence and the Strength of Guideline Recommendations
A structured approach, facilitated by the use of evidence profiles and modeled after the Grades of Recommendation, Assessment, Development and Evaluation (GRADE) approach,⁴⁶¹ was used to grade the quality of the overall evidence and the strength of recommendations. For each topic, the discussion on grading of the quality of the overall evidence and the strength of the recommendations was led by the primary expert reviewers of each topic, with participation by the Work Group chairs, all other Work Group members, and the ERT members.

Grading the Quality of Evidence
The quality of a body of evidence pertaining to a particular outcome of interest initially was categorized based on study design. For questions of interventions, the initial quality grade was high if the evidence consisted of RCTs, low if it consisted of observational studies, or very low if it consisted of studies of other study designs. Work Group members decided a priori to include only RCTs for questions of interventions other than harm. The quality rating for each intervention/outcome pair then was decreased if there were some or serious limitations to the quality of the aggregate of studies, there were important inconsistencies in the results across studies, the applicability of the studies to the population of interest was limited or there was uncertainty about the directness of evidence, the data were imprecise or sparse, or there was a high likelihood of bias. The final grade for the quality of the evidence for an intervention/outcome pair could be 1 of the following 4 grades: high, moderately high, low, or very low.
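The grading logic for a single intervention/outcome pair can be sketched as a starting grade set by study design followed by downgrading along a 4-level ladder. This is a simplified illustration, not the Work Group's procedure: the text does not state how many levels each limitation removes, so the `downgrade_steps` parameter (one step per level) is an assumption, and all names are hypothetical. In practice these were structured judgments rather than a mechanical count.

```python
# The 4 final grades named in the text, ordered from lowest to highest.
GRADES = ["very low", "low", "moderately high", "high"]

# Initial grade by study design, as described in the text.
INITIAL_GRADE = {"rct": "high", "observational": "low"}


def initial_grade(design: str) -> str:
    """Return the starting grade; other study designs start at 'very low'."""
    return INITIAL_GRADE.get(design, "very low")


def grade_outcome(design: str, downgrade_steps: int = 0) -> str:
    """Apply downgrading to the initial grade, never going below 'very low'.

    downgrade_steps is an assumed summary of the Work Group's judgments
    (limitations, inconsistency, indirectness, imprecision, bias).
    """
    if downgrade_steps < 0:
        raise ValueError("downgrade_steps must be non-negative")
    start = GRADES.index(initial_grade(design))
    return GRADES[max(0, start - downgrade_steps)]


rct_one_limitation = grade_outcome("rct", downgrade_steps=1)
```

For example, a body of RCT evidence downgraded by one level for inconsistent results would end at moderately high under this sketch.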

The quality of the overall body of evidence then was determined based on the quality grades for all outcomes of interest, taking into account explicit judgments about the relative importance of each of the outcomes (eg, death and access thrombosis having greater weight than change in ESA dose or Hb level). The actual results were reviewed for each outcome to judge the balance between benefits and harm (Table 49). Four final categories for the quality of overall evidence were used, as shown in Table 50.

Evidence profiles were constructed by the ERT to record decisions about grades and interpretation of summary effects by the Work Group members. These profiles serve to make transparent to the reader the thinking process of the Work Group in systematically combining evidence and judgments. Each evidence profile was filled in by Work Group experts with ERT guidance. Decisions were based on facts and findings from the primary studies listed in corresponding summary tables; additional information related to AEs in nontarget populations, when applicable; and judgments of the Work Group. Judgments about the quality, consistency, and directness of evidence often were complex, as were judgments about the importance of an outcome or the net medical benefit across all outcomes.

The evidence profiles provided a structured approach to grading, rather than a rigorous method of quantitatively summing up grades. In an effort to balance simplicity with full and transparent consideration of the important issues, footnotes were placed to provide the rationale for grading (Table 51).

Grading the Strength of the Recommendations
The quality of evidence for each outcome and across all outcomes was graded in the evidence profile. The guideline recommendation was graded based on the quality of the overall evidence, as well as additional considerations. Additional considerations, such as feasibility, availability of a service, and regional and population differences were implicitly considered. Costs also were considered implicitly, but, in most cases, it was believed that the grading of the evidence and formulation of a guideline and its strength should rest primarily on the evidence for medical benefit to a patient.

The strength of each guideline recommendation was rated as either “strong” or “moderately strong.” A “strong” rating indicates “it is strongly recommended that clinicians routinely follow the guideline for eligible patients. There is high-quality evidence that the practice results in net medical benefit to the patient.” The “moderately strong” rating indicates “it is recommended that clinicians routinely follow the guideline for eligible patients. There is at least moderately high-quality evidence that the practice results in net medical benefit to the patient.” Overall, the strength of the guideline recommendation was based on the extent to which the Work Group could be confident that adherence will do more good than harm. Strong guidelines require support by evidence of high quality. Moderately strong guidelines require support by evidence of at least moderately high quality. Incorporation of additional considerations modified the linkage between quality of evidence and strength of guidelines, usually resulting in a lower strength of the recommendation than would be supportable based on the quality of evidence alone.

After grading the quality of the overall evidence for a topic, the Work Group triaged the recommendations to either an evidence-based guideline recommendation when the quality of evidence was high or moderately high or otherwise to an opinion-based CPR.

In the absence of high-quality or moderately high-quality evidence or when additional considerations did not support strong or moderately strong evidence-based guideline recommendations, the Work Group could elect to issue CPRs based on consensus of expert opinions. These recommendations are prefaced by “In the opinion of the Work Group,” and are based on the consensus of the Work Group that following the recommendations might improve health outcomes. As such, the Work Group recommends that clinicians consider following the recommendation for eligible patients. Issues considered in the grading of the quality of the evidence and the strength of the recommendation are discussed in the Rationale section of each recommendation.
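The linkage between the quality of the overall evidence and the type of recommendation can be summarized in a short decision function. This is a minimal sketch with hypothetical names. It collapses the Work Group's "additional considerations" into a single yes/no flag, which is a simplification: the text notes that such considerations usually lowered the strength of a recommendation but does not give a fixed rule.

```python
VALID_QUALITY = {"high", "moderately high", "low", "very low"}


def rate_recommendation(overall_quality: str,
                        considerations_support: bool = True) -> str:
    """Map overall evidence quality to 'strong', 'moderately strong', or 'CPR'.

    considerations_support is an assumed flag standing in for feasibility,
    availability, regional and population differences, and cost.
    """
    if overall_quality not in VALID_QUALITY:
        raise ValueError(f"unknown quality grade: {overall_quality!r}")
    if not considerations_support:
        # Evidence-based grading not supported: fall back to expert opinion.
        return "CPR"
    if overall_quality == "high":
        return "strong"
    if overall_quality == "moderately high":
        return "moderately strong"
    # Low or very low quality evidence leads to an opinion-based CPR.
    return "CPR"


example_rating = rate_recommendation("moderately high")
```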

Limitations of Approach
While the literature searches were intended to be comprehensive, they were not exhaustive. MEDLINE was the only database searched, and searches were limited to English-language publications. Hand searches of journals were not performed, and review articles and textbook chapters were not systematically searched. However, important studies known to the domain experts that were missed by the electronic literature searches were added for consideration.