Journal of Knowledge Management Practice, March 2001

Data Quality Assessment Methods in Healthcare Information Systems

Robert Jameson, CDS Research; Professor Daniel P. Lorence, Pennsylvania State University;
and Rick Churchill, Virtual Management Institute

ABSTRACT:

Historical attempts to impose rigid data-based practice and performance standards in the delivery of healthcare have met with resistance from providers, and the personalized nature of healthcare practice created little need for rigorous data quality assessment in daily practice. The rapid adoption of evidence-based decision support systems at the provider level, however, now suggests that data quality improvement is less likely to be found objectionable when standards are established for information application and management in health care. Such an information-intensive environment requires a system of formal, continuous data quality assessment in service delivery and quality management.


Data Quality Assessment Methods in Healthcare Information Systems 

An analysis of the methods favored by early adopters of formal data quality assessment in healthcare organizations shows that, historically, a reactive, problem-based approach has been the norm for assessing and measuring information accuracy. With the emergence of evidence-based, data-driven medicine in the U.S., however, the patient, and ultimately the organization’s measured quality of care, is represented primarily by data rather than by intensive, personal interaction with the patient (1994). As such, the quality of data maintained by the organization becomes a critical factor in the ultimate delivery of care, and the need for more rigorous quality assessment methodologies becomes apparent. Such methodologies require assessment of data comprehensiveness, consistency, currency, relevance, and timeliness (Agmon, 1987).

Past attempts to impose rigid data-based standards on the application of clinical judgement have met resistance from healthcare providers, and thus the need for rigorous data quality assessment was of little consequence. Today, with the increasing deployment of decision support systems at the provider level, data quality improvement is less likely to be found objectionable when standards are established for information application and management in health care, both in the care and treatment of patients and in the management of the system where they are treated. Such an information-based scheme now requires a system of continuous data collection, evaluation, feedback, and adjustment of the health care process (Ballou, D. 1985).

A number of quality improvement measures have been attempted in response to this development, including the use in clinical practice of health outcomes data, which are objective measures of post-treatment health based on defined criteria. Commonly, large outcomes databases of similar procedures are maintained and analyzed statistically to provide the patient with an objective prognosis for success or failure (Fennell, 1988). A patient contemplating heart surgery, for example, can compare himself to other patients with similar characteristics who have undergone the surgery, and obtain a statistical prediction of success with the surgery, as well as chances for survival without it. Outcomes databases are being compiled by surveying patients after surgery on their quality of life, using indicators other than medical symptoms, such as mobility or social activity. This information is becoming more accessible to office-based providers for use as reference points or data quality benchmarks in providing day-to-day treatment.

The availability of rich data sets makes it likely that U.S. medicine will continue to move toward data-driven, evidence-based medicine. Sophisticated Electronic Data Interchange (EDI) networks, as well, are being more widely adopted as both automated claims processing systems and financial analysis resources. Data quality indicators have been developed related to the standardization of patient information and the formatting and submission of data, along with payment standards through electronic funds transfer and communication with providers through electronic mail. Likewise, standards for uniformity of clinical records are actively being pursued. Such systems are designed to create a paperless claims system and a comprehensive database of healthcare practices and outcomes involving Medicare recipients. The American National Standards Institute (ANSI), among others, is actively developing system standards for this process (Huh, 1990).

The growing adoption of evidence-based medicine further promotes the use of data-driven decision-making, or the use of current best evidence in making decisions about the care of individual patients. The practice of evidence-based medicine relies on integrating individual clinical experience with the best available external clinical research evidence.

In day-to-day practice, reliable data quality can enhance the judgement that individual clinicians acquire through training and practice. Increased clinical performance is thus achieved in many ways, but especially in better-informed diagnosis, in more complete identification of treatment options, and in the application of an expanded knowledge base in day-to-day practice.

High-quality external treatment data, which applies clinical research in routine practice, is integrated into patient-centered clinical research, incorporating such concepts as accuracy and precision of diagnostic tests (including the clinical examination), the power of outcomes-based diagnostic indicators, and the efficacy and safety of therapeutic, rehabilitative, and preventive regimens.

All types of evidence, ultimately, are derived from raw data, and the quality assurance of such data is a prerequisite for the adoption of evidence in clinical practice.

Data and Methodology

Data from a nationwide survey of healthcare information managers, the AHA Annual Survey, the U.S. census, Interstudy publications, state and regional health service departments, and The Market Statistics Report were used to examine the organizational and environmental characteristics of data quality in a variety of healthcare settings. A comparison of selected data quality characteristics was made across practice settings, geographic areas, and selected healthcare demographic characteristics.

The survey data include measures of the quality of data obtained via automated encoders, the impact of organizational mergers and acquisitions on data quality, and the existence of an organizational data quality committee or team. The survey further identifies the existence of a data quality manager, the most common methods used for identifying data quality problems, and the use of edit checks in automated data fields. Finally, the data provide measures of the existence of organizational master data dictionaries, the existence of organizational policies and procedures related to the timeliness of data capture, and the existence of an organizational process to identify and prioritize data acquisition. Preliminary findings from the survey were summarized for analysis and are reported elsewhere (1999).

Design and Methodology

The survey, completed in May of 1999, was a response to the need of the healthcare industry for more timely and frequent practice information in the field of health information management. No comprehensive study of this kind has been available in the past.

The goal of this undertaking was to establish a source of practical, comparative information that can be used in both strategic planning and day-to-day practice. This project was designed to allow ready access to ideas and innovations that other professionals have used in solving management and technology problems common to many organizations.

This study was designed to provide periodic information on a range of information management topics, such as clinical coding, reimbursement, compliance, health record computerization, and data quality. In addition, a portion of the survey was devoted to collecting information about the professional issues facing practicing managers.

Sample Design

The survey was initially fielded in June 1998, with follow-up assessments accomplished through May 1999. It was designed to provide representative information on the population of health information managers. The sample included respondents from a variety of practice settings and job titles and excluded students. The survey obtained data from 16,591 health information managers, for a 50.4 percent gross response rate.
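The two figures reported above imply the approximate size of the mailing. A minimal sketch of that arithmetic follows. It assumes that the gross response rate is simply returned questionnaires divided by questionnaires mailed, which the text does not state explicitly, and the function name is illustrative only.

```python
def implied_mailing_size(respondents: int, gross_response_rate: float) -> float:
    """Estimate questionnaires mailed, assuming rate = respondents / mailed."""
    if not 0 < gross_response_rate <= 1:
        raise ValueError("gross_response_rate must be in (0, 1]")
    return respondents / gross_response_rate


# Figures reported in the text: 16,591 respondents, 50.4 percent gross response rate.
mailed = implied_mailing_size(16_591, 0.504)
print(f"Implied questionnaires mailed: roughly {mailed:,.0f}")
```

Because the 50.4 percent figure is rounded, the result is only approximate, on the order of 33,000 questionnaires.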

Samples for surveys were selected from a database of certified health information managers provided by the Foundation for Record Education, which contained current and historical information on RRA/ART credentialed information professionals in the United States. The data on the population surveyed were obtained primarily from membership renewal forms and from an annual member profile mailed to all active members. Preferred mailing address data were obtained from those members with changes in address or professional status. These changes may be signaled by input from periodic mailings or by other correspondence.

Questions for the main questionnaire were developed from a review of past surveys and focus group results. Questions in the survey were designed so that managers from a variety of practice settings and work roles were asked questions that were generally relevant to the profession overall.

Work settings captured in this study included the following:

·        Hospitals and medical centers

·        Group practices

·        Ambulatory care clinics

·        Managed care offices

·        Long-term care and rehabilitation facilities

·        Colleges and universities

·        Consulting firms

·        Government agencies

·        Software product companies

·        Pharmaceutical companies

·        Self-employed HIM professionals

·        Other work settings

Topics were formulated and pre-tested by convened groups of practicing information managers to represent a broad range of activity areas. Questions that had not been used previously in any known surveys of health information professionals were pre-tested prior to the survey fielding to evaluate the wording and ordering of questions and to determine the ability of respondents to provide the desired information.

Field Procedures

Prior to the survey mailing, announcements were made through professional publications and meetings to inform members of the upcoming survey. A preprinted questionnaire was mailed to all credentialed health information managers who had identifiable mailing addresses. An instruction letter accompanied the form and explained the purpose of the survey and instructions for completion. At the six-week point following the last wave of the initial mailing, a second mailing was sent to those who had not responded.  Follow-up on specific issues identified after the second mailing was accomplished as a series of more focused studies, reported elsewhere.

The questionnaires were processed by an independent testing and research firm, National Computer Systems of Minneapolis, MN, which processed forms weekly over the length of the study. Region-specific response rates were tracked to ensure that the mailings were received in a timely manner.

Strict adherence to confidentiality standards was maintained in this study. Data were entered via a computerized scanning system and released only in aggregate form, without individual respondents identified within the reported results.

In addition, a number of data quality control measures were employed to provide the cleanest possible data. A detailed review was made of all sample response dispositions, both pending and final, on a weekly basis. From this evaluation, the time schedule was reviewed and necessary recommendations were made to the contractor to enable the survey to be completed during the allotted field period.

As part of the post-survey program review, the design and methodology were examined to identify areas needing improvement. After data entry was complete, an evaluation was made of the impact of survey and item non-response rates and various potential methods for adjusting results to correct for non-response.
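One common family of non-response adjustments is weighting-class adjustment, in which each respondent is weighted by the inverse of the response rate in his or her class. The sketch below illustrates the idea only; the text does not say which adjustment methods were evaluated, and the region names, counts, and percentages are hypothetical.

```python
def nonresponse_weights(mailed, returned):
    """Weight for each class = questionnaires mailed / questionnaires returned."""
    return {
        cell: mailed[cell] / returned[cell]
        for cell in mailed
        if returned.get(cell, 0) > 0
    }


def weighted_percentage(cell_percentages, returned, weights):
    """Combine class-level percentages using non-response-adjusted weights."""
    total = sum(returned[c] * weights[c] for c in weights)
    return sum(cell_percentages[c] * returned[c] * weights[c] for c in weights) / total


# Hypothetical example: two regions with different response rates.
mailed = {"Region 1": 1000, "Region 2": 2000}
returned = {"Region 1": 600, "Region 2": 800}
pct_internal_audit = {"Region 1": 30.0, "Region 2": 36.0}

weights = nonresponse_weights(mailed, returned)
adjusted = weighted_percentage(pct_internal_audit, returned, weights)
print(f"Adjusted percentage: {adjusted:.1f}")
```

In this hypothetical case the adjustment gives more influence to the region with the lower response rate, so the adjusted percentage sits closer to that region's value than the unweighted figure would.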

Respondents in this study were asked:

1.      What is the most common way you identify data quality problems?

·        Errors reported from analysis of aggregate data

·        Data completeness edits when data is transferred from one system to another

·        Errors reported on individual records

·        User complaints caused by unavailable information

·        Data quality edits when data is transferred from one system to another

·        External data audits

·        Internal data quality audits

2.      What percentage of automated data fields captured employ edit checks to improve data quality?

·        0-5%

·        5-10%

·        10-20%

·        20-30%

·        30-40%

·        OVER 40%
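For readers unfamiliar with the term, an edit check is a rule applied to a data field at the point of capture or transfer. The sketch below shows what such checks might look like and how the percentage asked about in question 2 could be computed. The field names, the rules, and the function names are all hypothetical and are not drawn from the survey.

```python
from datetime import date

# Hypothetical edit checks keyed by field name; each returns True when the value passes.
EDIT_CHECKS = {
    "birth_date": lambda v: isinstance(v, date) and v <= date.today(),
    "sex": lambda v: v in {"F", "M", "U"},
    "length_of_stay": lambda v: isinstance(v, int) and 0 <= v <= 365,
}


def failed_edits(record):
    """Return the names of fields whose values fail their edit check."""
    return [
        name
        for name, check in EDIT_CHECKS.items()
        if name in record and not check(record[name])
    ]


def edit_check_coverage(field_names):
    """Percentage of captured fields that have an edit check defined."""
    if not field_names:
        return 0.0
    covered = sum(1 for name in field_names if name in EDIT_CHECKS)
    return 100.0 * covered / len(field_names)


# Hypothetical record with one invalid value and one field that has no edit check.
record = {
    "birth_date": date(1950, 1, 1),
    "sex": "X",
    "length_of_stay": 4,
    "diagnosis_code": "example",
}
print(failed_edits(record))
print(edit_check_coverage(list(record)))
```

An organization with the four fields shown here, three of which carry edit checks, would fall in the "OVER 40%" response category.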

Results

The most popular method of identifying data quality problems is the internal audit. Thirty-two percent of respondents report that this is their preferred quality check, followed by examination of errors reported on individual records (22 percent) and errors reported from analysis of aggregate data (21 percent). The least preferred method is the use of auditors from outside the organization (5 percent).

Fig. 1   PREFERRED METHODS FOR IDENTIFYING DATA QUALITY PROBLEMS

                                                   TOTAL
                                                   (A)
BASE:  THOSE RESPONDING                            100%
INTERNAL DATA QUALITY AUDITS                       32.1
ERRORS REPORTED ON INDIVIDUAL RECORDS              21.9
ERRORS REPORTED FROM ANALYSIS OF AGGREGATE DATA    21.3
OTHER                                              24.7

Automated edit checks appear to be a seldom-used data quality technique. About 60 percent of respondents indicated that 10 percent or fewer of their automated data fields employ edit checks to improve data quality. About 15 percent of respondents reported that more than 40 percent of their automated data fields are checked this way.

Internal data quality audits are the most common way “other” practice setting respondents (37 percent) identify data quality problems. This finding is significant compared to hospital (31 percent) and clinic (23 percent) setting respondents. Clinic practice setting respondents (38 percent) reported a significant percentage compared to hospital (19 percent) and “other” (23 percent) practice setting respondents in regard to errors reported on individual records as the most common way they identify data quality problems.

Significant variation exists across several key demographic and organizational settings. 

Non-metro area respondents with a <25K population density (32%) are more likely to identify data quality problems by errors reported on individual records than respondents with a 25K-49.9K population density (25%).  Non-metro area respondents in the 25K-49.9K population density category (26%) are more likely to use an “Other” method as the most common way to identify data quality problems than respondents with a <25K population density (21%).

Fig. 2   POPULATION VARIATION IN PREFERRED METHODS FOR IDENTIFYING DATA QUALITY PROBLEMS

                                                               NON-METRO               METRO
                                                   TOTAL       < 25K       25K-49.9K   <250K       250K-1MIL.  > 1MIL.
                                                   (A)         (B)         (C)         (D)         (E)         (F)
BASE:  THOSE RESPONDING                            100%        100%        100%        100%        100%        100%
INTERNAL DATA QUALITY AUDITS                       32.1        27.2        30.1        31.2        32.4        33.8
ERRORS REPORTED ON INDIVIDUAL RECORDS              21.9        32.3 C      25.4        24.5 EF     20.5 F      17.9
ERRORS REPORTED FROM ANALYSIS OF AGGREGATE DATA    21.3        19.4        18.6        20.2        22.4 D      22.4
OTHER                                              24.7        21.0        25.8 B      24.1        24.7        25.9

* letters within cells indicate t-test significance at (p<.05)
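The kind of column-to-column comparison behind the letter codes can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the footnote cites a t-test, whereas the sketch uses a large-sample two-proportion z-test as a stand-in, and the column base sizes, which are not reported here, are replaced by hypothetical values.

```python
import math


def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided z-test for the difference between two independent proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Percentages from Fig. 2, "errors reported on individual records":
# column B (< 25K) = 32.3, column C (25K-49.9K) = 25.4.
# The base sizes below are hypothetical; the true column bases are not given.
z, p_value = two_proportion_z_test(0.323, 1000, 0.254, 1000)
print(f"z = {z:.2f}, significant at p<.05: {p_value < 0.05}")
```

With real data the hypothetical base sizes would be replaced by the number of respondents in each column, since the same percentage gap can be significant or not depending on those counts.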

 

Respondents in high-population areas tend to favor aggregated data review for identifying data quality problems. Metro area respondents with a population density of <250K (24%) are significantly more likely than respondents in areas of 250K or more to cite errors reported on individual records as the most common way to identify data quality problems. Metro area respondents in the 250K-1 MIL. population category (22%) report a significantly higher percentage than respondents with a <250K population density (20%) for errors reported from analysis of aggregate data as the most common way to identify data quality problems.

Areas with low managed care enrollment favor individual record error tracking for identifying data quality problems. Respondents with 1-10% HMO enrollees (27%) report a significantly higher percentage than respondents with 11% or more HMO enrollees for errors reported on individual records as the most common way to identify data quality problems. Respondents with 21% or more HMO enrollees are significantly more likely than respondents with 1-20% HMO enrollees to use an “Other” method as the most common way to identify data quality problems.

Fig. 3   HMO ENROLLMENT VARIATION IN PREFERRED METHODS FOR IDENTIFYING DATA QUALITY PROBLEMS

                                                   PERCENT HMO ENROLLEE
                                                   1%-10%      11%-20%     21%-30%     OVER 30%
                                                   (G)         (H)         (I)         (J)
BASE:  THOSE RESPONDING                            100%        100%        100%        100%
INTERNAL DATA QUALITY AUDITS                       29.4        32.8 G      32.6 G      33.3 G
ERRORS REPORTED ON INDIVIDUAL RECORDS              26.8 HIJ    22.0 J      19.6        18.2
ERRORS REPORTED FROM ANALYSIS OF AGGREGATE DATA    21.1        21.9        20.5        21.6
OTHER                                              22.8        23.3        27.3 GH     26.9 GH

Internal data quality audits are significantly more common as the way to identify data quality problems among respondents with over 98K hospital inpatient visits (33%) than among respondents with 1-10.5K hospital inpatient visits (30%). Respondents with 1-10.5K hospital inpatient visits (27%) report a significantly higher percentage than respondents with 10,501+ hospital inpatient visits for errors reported on individual records as the way to identify data quality problems.

Organizations with high outpatient volume favor internal audits for quality assessment. Respondents with 10,501+ hospital outpatient visits are more likely to use internal data quality audits as the way to identify data quality problems than respondents with 1-10.5K hospital outpatient visits (29%). Errors reported on individual records as the most common way to identify data quality problems is significantly different for respondents with 1-10.5K hospital outpatient visits (29%) compared to respondents with 10,501+ hospital outpatient visits.

Fig. 4   PATIENT VISIT VARIATION IN PREFERRED METHODS FOR IDENTIFYING DATA QUALITY PROBLEMS

 

 

 

HOSPITAL INPATIENT VISITS

HOSPITAL OUTPATIENT VISITS

 

 

 

 

 

 

1-

10,5K

(K)

10,501

-30K

(L)

30.1K

-98K

(M)

OVER

98K

(N)

1-

10,5K

(O)

10,501

-30K

(P)

30.1K

-98K

(Q)

OVER

98K

(R)

 

BASE:  THOSE RESPONDING

 

100%

 

100%

 

100%

 

100%

 

100%

 

100%

 

100%

 

100%

 

 

INTERNAL DATA QUALITY AUDITS

 

30.2

 

32.1

 

32.8

 

33.4

K

 

28.