Journal of Knowledge Management Practice, May 2004
 

Design Error Classification and Knowledge Management

Lawrence P. Chao, Kosuke Ishii, Stanford University

ABSTRACT:

This paper discusses the development of a failure knowledge management system for error-proofing the design process.  Benchmarking several historical design failures in order to devise a simple and useful classification system revealed that design errors can be effectively analyzed from a development-task standpoint.  Design process errors can be decomposed into knowledge, analysis, communication, execution, change, and organizational factors.  An industry-based survey of failure knowledge information systems complements this research with recommendations for implementing failure databases.  This paper discusses the failure events, error analysis and classifications, and the authors’ proposed work to address the existing lack of a systematic approach to design process error-proofing, including FMEA methodologies.


1.         Introduction

The Titanic, Pentium processor, and Firestone tires are very different products in terms of size, purpose, and materials.  However, they share the dubious distinction of being very costly, public failures.  The organizations that designed these products encountered design errors, as most organizations do.  Presumably, each performed some risk mitigation to try to prevent further incidents from occurring.  Some corrective actions were sufficient, but some were not.  The key is finding a systematic approach to dealing with errors. 

Because organizations often lack the time to fully analyze their systems to prevent errors, they often make lower-level, reactive, and highly specific (low-leverage) corrections.  The goal of design process error-proofing (Chao et al., 2001) is to develop methods and strategies to predict and prevent design errors that result in quality losses.  This research aids organizations in reaching a higher level of problem solving by developing a design error classification system based upon historical and empirical data, and discusses applications for implementing a failure knowledge system.  A design process error taxonomy can aid identification, prediction, and prevention of errors at a higher level.  This can save time and money in the long run and prevent repeating the same corrective actions over and over.  With decreasing costs of computer memory and storage and increasing speed, ease, and interconnectivity of file access, storing data is easy.  Though knowledge management and data mining techniques exist, the difficulty is putting information in a context where engineers are excited about using it and benefit from it.

This research aims to develop a classification system that can aid the documentation of failures and their root causes and help identify suitable error-proofs against future incidents.  A common classification system makes it easier for engineers, designers, and managers to identify, store, and share information about vulnerable areas in their design process. 

Once a classification is in place, one can also transition the system to methodologies that allow design teams to quickly and easily identify or check problematic areas.  The long-term goal of this research is to develop an array of tools and methodologies to aid the prediction and prevention of design process errors.  The error-classification system can impact design for manufacturability tools like Failure Modes and Effects Analysis for the design process and the development of both failure and error-proofing solution knowledge databases.
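As a concrete illustration of how a common classification could support failure knowledge databases, the sketch below stores failure records tagged with classification categories and retrieves all records sharing a category, the kind of lookup a design-process FMEA or error-proofing database might perform.  This is a hypothetical sketch only; the paper does not prescribe an implementation, and all names here are illustrative.

```python
from collections import defaultdict

class FailureKnowledgeBase:
    """Toy store of failure records indexed by error classification category."""

    def __init__(self):
        self._by_category = defaultdict(list)  # category -> list of records

    def add(self, name, categories, summary):
        record = {"name": name, "categories": set(categories), "summary": summary}
        for category in categories:
            self._by_category[category].append(record)
        return record

    def lookup(self, category):
        """Return the names of all failures tagged with the given category."""
        return [r["name"] for r in self._by_category[category]]


kb = FailureKnowledgeBase()
kb.add("Mars Climate Orbiter", ["communication", "organizational"],
       "English/metric unit mismatch went undetected")
kb.add("Hyatt Regency walkways", ["change", "analysis"],
       "approved hanger-rod change doubled the load on the box beams")
kb.add("Teton Dam", ["communication", "analysis"],
       "warnings about fissured rock ignored; leaks dismissed as normal")

print(kb.lookup("communication"))  # → ['Mars Climate Orbiter', 'Teton Dam']
```

Indexing records by classification category rather than free text is what lets engineers retrieve analogous failures from other projects or even other domains.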

2.         Error Classifications

Researchers have taken different approaches to cataloging and classifying errors.  Most have focused on the operator errors that can result in catastrophic accidents or on the manufacturing errors that can affect product quality.  This research takes the viewpoint that errors are the starting points of defects and failures.

Dr. Martin Hinckley, a renowned expert on quality and reliability in design, has gathered methods for reducing complexity, variation, confusion, and other root causes of defects (1997).  He explored various classifications of mistakes and mistake-proofs, summarized in Table 1, and also proposed a taxonomy for mistake-proofing, shown in Table 2.

Table 1.                     Error classification systems (Hinckley, 1997)

 

Name or Type | Authors | Example classifications
Human Reliability Assessments / Human Error Probabilities Classifications | Swain | Error of omission; error of commission (selection errors, sequence errors, time errors, and qualitative errors)
Performance Shaping Factors (PSF) | Swain, Meister | Inadequate lighting in the work area; inadequate training or skill; poor verbal communication
Ergonomic Method | Alexander | Errors during the perception stage; errors during the decision-making process; errors during the action process
Psychological Classifications | Norman, Salvendy | Slips in the formation of intention; from faulty activation of schemas; from faulty triggering of active schemas
Stress Based Classifications | Altman | Work load; occupational change; problems of occupational frustration; occupational stress such as noise and lighting
Task Based Classification | Dhillon | Design error; operator error; fabrication error; maintenance error
Classification of Human Performance in Industry | Harris and Chaney | Planning; designing and developing; producing; distributing
Behavior Based Classification | Berliner | Perceptual processes; mediational processes; communication processes; motor processes
Mistake-Proofing Classification | Nikkan Kogyo Shimbun | Forgetfulness; errors due to misunderstanding; errors in identification; errors made by amateurs

 

Hinckley’s criticisms of some of the above classifications include:

- Too simplistic
- Not easily understood by individuals in manufacturing and design
- The classified errors cannot be detected
- Elimination of the factor does not eliminate mistakes
- Does not lead to direct identification of appropriate control methods

Hinckley (2001) developed a classification system based on the description of the outcome of the mistake rather than the root cause.  His motivation for classifying this way is that it can be more easily used in “poka-yoke” devices that detect or prevent undesired outcomes, rather than controlling root causes.

Table 2.                     Outcome-based classification (Hinckley, 2001)

 

Defective Material: defective or inadequate material entering
Information Errors: ambiguous information; incorrect information; misread, mismeasure, misinterpret; omitted information; inadequate warning
Misalignments, Misadjustments, Mistiming: misaligned parts; misadjustment; mistimed or rushed
Omission or Commission Errors: added material or part; commit prohibited actions; omitted operations; omitted parts and counting errors
Selection Errors: wrong concept or material; wrong destination; wrong location; wrong operation; wrong part; wrong orientation

 

James Reason researched errors from a psychological standpoint (1997).  In addition to looking at different types of errors, such as variable versus constant, he focuses on human errors from the standpoint of their intentions, actions and consequences.  Figure 2 shows how mistakes, slips, and lapses are human errors that occur at different times in a task.

 

Figure 2.                   Types of human error (Reason, 1997)

 

Similarly, Donald Norman (1988) uses his background in cognitive science to discuss errors resulting from similarity in actions.  The six types of slips are shown in Table 3.

Table 3.                     Types of slips (Norman, 1988)

 

Type of Slip | Definition
Capture errors | Appears whenever two different action sequences have their initial stages in common, with one sequence being unfamiliar and the other well-practiced
Description errors | Intended action has much in common with others that are possible; usually results in performing the correct action on the wrong object
Data-driven errors | Automatic actions triggered by the arrival of sensory data
Associative activation errors | Internal thoughts and associations that trigger automatic actions
Loss-of-activation errors | Forgetting to do something because the presumed activation mechanism has decayed
Mode errors | Devices have different modes of operation, and the action appropriate for one mode has different meanings in other modes

 

Yet another way of classifying errors is by the end state, the failure resulting from the error, or the immediate cause of the failure.  For example, in civil engineering, Blockley looks at causes of structural failures, shown in Table 4 (Reason, 1990).

 

Table 4.                     Causes of structural failure (Reason, 1990)

 

Limit states:
  Overload: geophysical (dead, wind, earthquake); manmade (imposed)
  Understrength: structural; materials instability
  Movement: foundation settlement, creep, shrinkage, etc.
  Deterioration: cracking, fatigue, corrosion, erosion, etc.
Random hazards:
  Fires
  Floods
  Explosions: accidental, sabotage
  Earthquake
  Vehicle impact
Human-based errors:
  Design error: mistake, understanding of structural behavior
  Construction error: mistake, bad practice, poor communications

 

4.         Goals

What are the attributes that we wish to establish in our error classification scheme?  In his paper on an error-proofing taxonomy (1997), Hinckley writes that an “ideal mistake-proofing taxonomy” should have the following attributes:

- Collectively exhaustive
- Mutually exclusive
- Simple
- Useful
- Intuitive
- Easily understood
- Common attributes

A classification scheme for design process errors must also be collectively exhaustive.  Ease of use and understanding are certainly important, but unfortunately they can be negatively correlated, and trade-offs must be made.  There are also several differences between the design process and other processes, like manufacturing and operation, that must be considered.  Design process errors lie near the front of the cause-effect chain.  Though numerous factors likely contribute to an erring action, design errors are often fundamental system-level errors as well as simple human errors. 

It is difficult to fully comprehend and anticipate all the outcomes that may result from making a particular error.  One error can lead to any number of outcomes, and one outcome can be the result of various errors.  There is no one-to-one mapping of error and outcome.  Though this overlap can be beneficial in that information can be reused and repeated, mapping the relationships can become messy and cluttered.  For this reason, it is difficult to build a mutually exclusive classification.  From an ease-of-use standpoint, it would be preferable if every failure had only one cause, but often this is not the case: multiple root causes frequently contribute to a failure.   

Not all errors escape to the field, nor do they always have disastrous consequences.  Though design process errors are often associated with defects that escape and affect end-users directly as failures, as in the historical cases examined here, some defects stay internal as rework or redesign loops.  Design errors occur early in the cause-effect chain, and to apply effective error-proofing principles to them, devices must prevent the errors from occurring or detect them immediately afterward.  For that reason, we want a design error classification system that is based not on outcomes but on causes.

Table 5.                     Design defects

 

Defect in | Examples | Consequences
Feature | Error left in system; product does not operate as specified, is not as robust or reliable | Project does not meet specifications; product does not behave as expected
Cost | Error is found, but costs money to remove, redesign, or repair it | Money diverted from other necessary areas; project goes over budget
Time | Error is found, but takes time to correct it | Project delays; product does not make it to market in time

 

In the Stanford Design for Manufacturability course ME317, the project priority matrix is a tool used to aid product definition (Ishii, 2004).  Design process errors can hinder any of the three project priorities of feature, cost, and time-to-market.  Even if these design defects seem minor, they should be eliminated: they can harm an organization through monetary losses, wasted time, or delivery of a substandard product.  Rework and redesign efforts are often accepted as part of the design process, but they should be treated as process defects to be removed.  Large design-related failures can be seen as a subset of these defects.

5.         Historical Error Benchmarking

5.1.      Approach

Stanford University has engaged in much work with organizations in various industries, gathering data on observed and predicted errors in the design process.  But because of the competitive environment of industry, often both the incidents and their associated costs cannot be shared publicly.  For these reasons, exploring famous historical errors is preferred, as it allows discussion and demonstration of the impact of design process errors.  From researching texts and the Internet as well as talking to experts, a brief explanation of each design failure, its associated losses, and the underlying design errors was compiled.

5.2.      Historical Errors

Mars Climate Orbiter

The Mars Climate Orbiter was launched December 11, 1998 to relay signals from the Mars Polar Lander.  The craft vanished after it fired a rocket on September 23, 1999 to enter orbit.  The engine fired properly, but the spacecraft was on a course that brought it too close to the planet's surface and deep into the atmosphere.  The probe came within 60 km (36 miles) of the planet, about 100 km closer than planned and about 25 km below the level at which the spacecraft could survive.  Lockheed Martin, the primary contractor for the project, had measured the thruster firings in pounds of force even though NASA had specified metric units.  This unit error sent the Climate Orbiter in too quickly and too low, causing the $125-million spacecraft to burn up or break apart in Mars' atmosphere.  The root cause review board found that the error went undetected in ground-based computers.  The mission's navigation team also had an imperfect understanding of how the craft was pointed in space, was overworked, and was not closely supervised by independent experts.

Ford Explorer/Firestone Tires

The National Highway Traffic Safety Administration has documented 203 deaths and more than 700 injuries linked to tread-separation rollover accidents involving Firestone tires and Ford Explorers during the 1990s.  Each company blamed the other, and the fallout led them to sever their century-old business relationship.  A "root cause" investigation by Bridgestone/Firestone found four major contributing factors: problems with the tread design that made tires prone to cracking, problems with one plant's unique process for adhering rubber to the belts, Ford's low recommended inflation pressure of only 26 psi, and improper maintenance and inflation of the tires by drivers, which can cause overheating.

Intel Pentium

The Pentium floating-point flaw was discovered by a mathematician in October 1994.  This bug, affecting the 6 million chips sold in 1994, involved the Pentium incorrectly and inconsistently performing floating-point calculations with certain number combinations.  The bug appeared to have been caused by five missing entries in a look-up table implemented in a programmable logic array, resulting in a loss of precision in certain floating-point calculations.  Intel initially denied the problem, saying the Pentium used the same FPU design as the 486, which works correctly.  Intel had discovered the error but declined to recall the chip, saying it would cause a rounding error only once every nine billion calculations and that the average PC user would be affected only once every 27,000 years.  IBM, however, published its own estimates, saying that some users might encounter the flaw every 24 days.  As an aside, several companies came forth with software patches to cure the problem.  Microsoft posted a patch for its Excel spreadsheet, only to withdraw it within the week because the patch contained its own errors.

Hyatt Regency Kansas City

The Hyatt Regency in Kansas City, Missouri opened in July of 1980 after two years of design and two years of construction.  About one year later, on July 17, 1981, the grandiose atrium was filled with more than 1,600 people, most dancing to music.  Two groups of people stood on the second- and fourth-floor walkways, observing the festivities and stomping in rhythm with the music.  The two walkways fell together, killing 114 and injuring over 200.  Many assumed that the accident was caused by resonance from the people stomping in rhythm.  In the end, it turned out that a design change by the contractor was responsible.  The walkway system was not only under-designed but also lacked redundancy.  In the shop drawings, the box beam hangers had two holes through both flanges, whereas the original design had only one hole with the hanger rod extending through the box beam.  In the changed design, the load of both walkways was transmitted to the roof trusses by the shorter upper rods, so the fourth-floor transverse box beams supported the loads of two walkways rather than one, as in the original design.  This change was approved by the architects and reviewed by the structural engineers.

American Airlines Flight 191

The crash of Flight 191 at O'Hare Airport on May 25, 1979 was the worst aviation disaster in U.S. history.  Two hundred seventy one people were aboard the plane, which carried a full load of fuel.  Immediately after takeoff, the number one engine on the port side of the plane fell off.  About 30 seconds after takeoff, at an altitude of about 400 feet, the DC-10 rolled and hit the ground a half mile from the end of the runway and exploded, killing all on board and two people on the ground.  The official NTSB report on the crash stated that one of the main causes of the accident was the vulnerability of the design of the attach points to maintenance damage.  In addition, there were deficiencies in FAA surveillance and reporting systems to prevent improper maintenance procedures as well as miscommunications among operators, manufacturers, and the FAA.

Kemper Arena

The Kemper Memorial Arena was a 17,000-seat covered arena built in 1973 for the Kansas City Kings, the local basketball team.  The arena was very popular and was honored in 1976 with a prestigious award by the American Institute of Architects.  However, on June 4, 1979, a downpour of over 4 inches of water per hour began falling in the area, accompanied by a north wind.  The central portion of the hanging roof, about one acre in area, rapidly collapsed and, acting like a piston, raised the interior pressure in the hall so much that some of the arena's walls were blown out.  The collapse was caused by a multitude of factors, including the intensity of the downpour, drain deficiencies, wind effects, fatigue of bolts, and lack of redundancy.  During the six years preceding the failure, the hanger assembly connections were subjected to at least twenty-four thousand oscillations, which introduced oscillating variations in the tension of the bolts.  Steel codes warned against the use of these bolts under variable loads.  The variable loads were probably not taken into account, and the safety of the bolts appeared sufficiently high under the unrealistic constant design loads.

Hartford Civic Center

The Hartford Civic Center was constructed in 1972 for the Hartford Whalers hockey team and the University of Connecticut.  In order to save money, the engineers proposed an innovative design for the 300-by-360-ft space-frame roof over the arena.  The completed center was considered an engineering marvel.  However, on January 17, 1978, the roof of the Hartford Arena collapsed after several hours of rain and snow, mere hours after five thousand fans watched a UConn basketball game.  The collapse resulted from the buckling of the roof truss: the progressive failure of load bars transferred loads to new bars until the roof collapsed.  The Hartford Arena contract was divided into five subcontracts, and this fragmentation allowed mistakes to slip through.  The excessive deflections apparent during construction were brought to the engineer's attention multiple times by inspection agencies and concerned citizens but were ignored and not checked again.  The observed deflections were twice those predicted by the computer analysis, but the engineers expressed no concern, saying that discrepancies were expected due to simplifying assumptions made in theoretical calculations.

Teton Dam

The Teton Dam was a $4.5 million dam built to hold back river water in the Teton River Canyon of Idaho for the purpose of irrigation.  As its reservoir was first being filled, on June 5, 1976, the western third of the dam disintegrated.  An estimated 80 billion gallons of water rushed out and headed for the Upper Snake River Valley.  The failure resulted in the loss of 11 lives, 100,000 acres of farmland, 16,000 head of livestock, and up to a billion dollars in property damage.  Fortunately, there was enough time to evacuate some areas, or casualties might have been even higher.  It was the first failure of a dam built by a federal agency.  The dam's construction was filled with ignored warnings.  As construction began in 1972, the U.S. Geological Survey became concerned about the dam because of recent earthquakes.  The Bureau of Reclamation warned that the dam might not be safe because the rocks in the area were full of fissures.  In addition, there were miscommunications about filling the reservoir, and the fill rate was increased to four times normal.  Runoff from the winter snows was higher than expected.  Leaks were discovered but dismissed as normal until they ultimately led to the breach of the dam.

Ford Pinto

The Ford Pinto was designed to compete directly with the influx of Japanese cars satisfying the rising American demand for subcompact cars.  Because of the strong competition, Ford rushed the Pinto into production, compressing the development time from three-and-a-half years to two.  Between 1971 and 1978, the Ford Pinto was responsible for a number of fire-related deaths; Ford put the figure at 23, while critics say it is closer to 500.  In 1978, Ford recalled all 1971-1976 Pintos for fuel-tank modifications.  In addition, many lawsuits resulted from the accidents, including a record-breaking $128 million award by a California jury in February 1978.  Before production, Ford engineers had discovered a major flaw in the car's design.  Because the machinery was already tooled when the defect was found and a cost-benefit analysis found repairing the defect to be more expensive, Ford officials decided to manufacture the car regardless of the flaw.  According to sworn testimony of Ford engineers, 95 percent of the fatalities would have survived if Ford had located the fuel tank over the axle, as it had done on its Capri automobiles.

Chevrolet Corvair

The Chevrolet Corvair came out of Detroit in the 1960’s with a very innovative design and style.  Initially popular, things changed after 1965 when Ralph Nader’s book, Unsafe At Any Speed, appeared, attacking safety practices of the auto industry.  He singled out the Corvair for a particularly scathing attack, saying, among other things, that its swing-axle suspension used on '60 to '64 models caused the rear wheels to "tuck under." Nader alleged that this caused the car to flip over during even relatively low-speed cornering.  Others have concurred and said that it was a case of cost-cutting winning out over intelligent engineering.  The designers knew that anti-sway bars would be needed to support the added weight of the rear-mounted engine.  But to save $4 per car, those bars were not included in the final product.  By the time that the '65 redesign fixed the problems, it was too late for the Corvair’s reputation.  On top of that, a study by the National Highway Traffic Safety Administration — the agency Nader's book helped create — concluded, in July 1972, that the '60-'63 Chevrolet Corvair models were at least as safe as comparable models of the time.

de Havilland Comet

The de Havilland Comet was one of the first commercial aircraft driven by jet propulsion.  The four engine, 550 mph jet arrived in 1952 and was a big step into unknown territory.  Never before had a commercial aircraft been designed for such a high cruise speed at such altitudes.  Nearly every component of the aircraft was designed within de Havilland.  Though initially successful, the fortunes of the Comet turned after two fatal crashes.  The Comet was grounded and during this time, Boeing and Douglas were able to design and manufacture the 707 and DC-8.  Though it was three years until a jet aircraft from another company took off, the confidence of airlines and passengers in the Comet had disappeared.  The Comet designers had not properly taken into account the possibility of stress concentration at the window corners and the consequent danger of fatigue, exacerbated by the unavoidable microscopic imperfections of the metal, such as pinholes or cracks.  de Havilland had tested samples of the corner panel, but their tests did not accurately represent actual cabin conditions since the complete windows were not fitted and the smooth operation of a testing machine did not compare to the turbulent behavior of a plane.

Tacoma Narrows Bridge

The Tacoma Narrows Bridge gained notoriety as “Galloping Gertie” for its severe vertical motions in moderate winds.  It collapsed on November 7, 1940 when oscillations became large enough to snap a support cable at mid-span, producing an unbalanced loading condition.  The bridge was designed to accept some horizontal displacement under static pressure of much larger winds but was not designed to handle the dynamic instability caused by an interaction of the winds and the flexibility of the light and narrow bridge.  Modeling this type of fluid/structure interaction was not difficult and was within the technical capability of engineers at the time but was evidently not considered.  The possibility of failure of the Tacoma Narrows Bridge in a crosswind of forty or so miles per hour was completely unforeseen by its designers.

Titanic

At 882.5 feet long, with a gross tonnage of over 46,000 tons and a construction cost of $7.5 million, the R.M.S. Titanic was thought to be the fastest ship afloat and almost unsinkable.  On her maiden voyage from Southampton to New York, on the night of April 14, 1912, the liner struck an iceberg, which caused the plates to buckle and burst at the seams, producing several small ruptures in up to six of the forward compartments.  Two hours later, the ship sank with the loss of 1,490 lives; 712 people were rescued alive.  The Titanic's design included a double-bottomed hull divided into 16 watertight compartments.  Because at least four could be flooded without endangering the liner's buoyancy, it was considered unsinkable.  Unfortunately, these compartments were not sealed off at the top, so water could fill each compartment, tilt the ship, and then spill over the top into the next one.  The designers failed to correctly model the interaction of uncertainty in the environment with the dynamics of the system: the purely static view ignored the interaction with the iceberg and the water flow between the compartments and could not predict the actual disaster.  Other questions about the design and operation of the ship include why there were not more lifeboats and why the lifeboats were not full.

6.         Analysis And Applications

6.1.      Key Factors

The design process can be interpreted as a number of tasks in series and parallel.  Depicted in Figure 3, each task requires certain inputs and produces certain outputs.  The task is a transfer function of sorts which requires certain agents, possibly people, machines, or other resources, and can be subject to certain noises, such as environmental variations, uncertainty in information, and changes from other agents. 

 

Figure 3.                   Generic task

 

Figure 4 shows trends in the causes of design errors.  From these case studies, errors in the design process can be viewed by where in a task they are committed.  The cases show failures at different times and in different types of tasks, with various root causes.  With these ideals and the identified key factors in mind, a design error classification was developed, based on the key factors of tasks identified.  The case studies reviewed represent a wide range, not only of industries but also of error types; categorizing them this way allows the scope of the scheme to be evaluated.  For greater detail and differentiation between errors, subcategories are identified for each key factor. 

Figure 4.                   Trends in the cases analyzed

 

The results of the case studies showed errors in several key areas of the design process.  As depicted in Figure 5, in any task the agents involved must perform an analysis of the situation to determine what must be done.  The agents must communicate the requirements to begin the task and the completed work once the task has been executed.  Design tasks require knowledge by the agents to perform the task.  At all times, however, the agents and information are subject to change from other areas of the organization, as well as to noises or uncertainties.

 

Figure 5.                   Key factors mapped on to a design task

 

6.2.      Knowledge

An assumption some might make about design errors is that they occur primarily in “new” projects, where the designers have not gained the requisite experience or familiarity to anticipate or understand what may go wrong.  For this reason, knowledge management has become prevalent in industry today.  However, the case studies show that design errors occur in products of all maturity levels, ranging from standard and familiar designs, to newer versions of existing designs, to new designs using innovative ideas.  None of the cases involved disruptive technologies used in unfamiliar applications.

6.3.      Analysis

Often, failures are a result of not fully or properly appreciating the situation.  The designers failed to predict or under-predicted the extent to which external inputs would affect the system.  There were also often errors in the analysis of the internal system; the designers did not understand the behavior of the subcomponents or the interactions between different parts on a system-level.

6.4.      Communication

Most communication errors involve the incomplete or inaccurate delivery of information to the agents of a task so that they can complete it, or the incomplete delivery of information by the agents once the task is completed.  Communication errors can also include ignored warnings and incomplete context.

6.5.      Execution

Execution errors include those where all the information and instructions received were accurate and appropriate, yet for some reason the agents did not execute the task properly or completely.  Often, these are simple human mistakes.  But they can be made in different ways: sometimes a task is not finished; other times, it is done wrong.

6.6.      Change

Changes can result in errors where the agents perform the work correctly, but changes from other parties such as management or manufacturing alter the design from the designers’ intent.  Sometimes the designers had full control and even approved the design change without fully appreciating its consequences.

6.7.      Organizational

Though a task-based analysis can be very effective in identifying areas where errors can be made, the data shows it is insufficient to simply compartmentalize the design process and analyze each task individually.  Analyzing each task separately cannot capture large-scale, system-level problems.  In certain cases, the lack of a strong central managing agent resulted in careless work, poor communication, and other errors.  In others, the lack of organization-wide standards allowed or even caused mistakes.

6.8.      Indexing

Our study of manufacturing and assembly poka-yoke identified attributes that are similar in design and can be leveraged.  Errors, whether they occur in the manufacturing, design, or operational domains, occur for similar reasons.  Mental, perception, and communication errors, for example, can occur in any domain.  The error commonality index (ECI) quantifies the decomposition of errors into fundamental causes (error factors) to more easily compare and relate errors from different domains and even find analogous error-proofs.  Table 6 summarizes and illustrates this breakdown for famous historical errors.

Table 6.                     Classification of historical failures

 

 

Though the idea of design process error-proofing is far from universal, many tools already aid the effort.  Often, tools originally designed to simplify or speed up the process benefit error-proofing.  Through this system, one can leverage experience from other domains, particularly manufacturing and assembly error-proofing, into the design domain.  Current search parameters are often limited to text or keywords; this development instead relies on commonalities in the nature of errors and their categorization.

To relate the error factors of the elements quantitatively, category score values (sc) of 0, 1, 3, or 9 are assigned as in QFD, where 9 indicates high agreement with the category.  Matches are found by calculating the average difference in each of the n categories and matching the two errors with the highest ECI, shown in Equation (1).

ECI = 1 − (1/n) · Σc |sc,A − sc,B| / 9                      (1)

The ECI can vary between 0 and 1; an ECI of 1 would indicate an exact match in every category.  Though using a quantitative index like this would benefit the differentiation and search of the catalog, assigning the numerical values to each error and category can seem subjective or arbitrary.
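To make the index concrete, the scoring and matching just described can be sketched in a few lines of Python.  The formula below is a reconstruction from the description (one minus the average per-category difference on the nine-point scale), and the factor scores assigned to the two historical errors are purely hypothetical:

```python
# Sketch of the error commonality index (ECI): each error is scored
# 0, 1, 3, or 9 in every factor category, QFD-style; the index is
# one minus the average per-category difference normalized by the
# maximum score (9), so that 1.0 indicates an exact match.

CATEGORIES = ["knowledge", "analysis", "communication",
              "execution", "change", "organizational"]

def eci(scores_a, scores_b):
    """Error commonality index between two scored errors, in [0, 1]."""
    n = len(CATEGORIES)
    diff = sum(abs(scores_a[c] - scores_b[c]) for c in CATEGORIES)
    return 1.0 - diff / (9.0 * n)

# Hypothetical factor scores for two historical failures.
pentium = {"knowledge": 1, "analysis": 9, "communication": 3,
           "execution": 9, "change": 0, "organizational": 1}
titanic = {"knowledge": 3, "analysis": 9, "communication": 9,
           "execution": 3, "change": 1, "organizational": 9}

print(round(eci(pentium, pentium), 3))  # 1.0 (identical errors)
print(round(eci(pentium, titanic), 3))  # 0.574
```

In a catalog search, one would compute the ECI of a newly entered error against every stored error and return the highest-scoring matches.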

The search can be further refined by using customer-weighted indexing rather than equal weights, shown next.  Represented in Figure 6, Chao and Ishii (2003a) have shown how Design Error QFD correlates design errors to the customer through engineering metrics.

Figure 6.                   Using QFD to rank design errors

 

Not only can QFD be used to prioritize design errors, this weighting of design errors’ impact on the organization can also customize the error commonality index based on the utility of the different factors.  For the customer-weighted error commonality index, shown in Equation (2), the error factors are mapped against the customer requirements to determine the weight, or utility, of each factor.

ECIw = 1 − Σc wc · |sc,A − sc,B| / (9 · Σc wc)              (2)
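Under the same assumptions as the unweighted sketch, the customer-weighted variant simply scales each category's difference by its QFD-derived weight before normalizing.  The weights and scores below are illustrative:

```python
# Customer-weighted ECI sketch: category differences are scaled by
# weights derived from Design Error QFD, so factors the customer
# cares about most dominate the match.

def weighted_eci(scores_a, scores_b, weights):
    """Weighted error commonality index, in [0, 1]."""
    total_w = sum(weights.values())
    diff = sum(w * abs(scores_a[c] - scores_b[c])
               for c, w in weights.items())
    return 1.0 - diff / (9.0 * total_w)

# Hypothetical QFD weights and factor scores for two errors.
weights = {"knowledge": 5, "analysis": 9, "communication": 3,
           "execution": 1, "change": 1, "organizational": 3}
a = {"knowledge": 9, "analysis": 3, "communication": 1,
     "execution": 0, "change": 0, "organizational": 3}
b = {"knowledge": 9, "analysis": 9, "communication": 1,
     "execution": 1, "change": 0, "organizational": 3}
print(round(weighted_eci(a, b, weights), 3))  # 0.722
```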

To extend a failure knowledge management system to a design process error-proofing solution system, where an organization can track different tools and solution elements to prevent errors or enhance projects, we recommend adding two dimensions to the design error classification system.  Figure 7 shows the definitions of solution levels on the left and robustness levels on the right.

Figure 7.                   Error-proofing levels: solution and robustness

 

Current error-proofs and error-mitigation techniques have also been catalogued using this taxonomy to help identify the best strategies for various design errors.  Using the classification scheme and index as proposed facilitates the connection between domains.  In addition to the error factors, each solution should be tracked for the solution level at which it attacks the problem and for its robustness level.  Higher levels are generally preferred for both.
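As a sketch of such an extended catalog entry, each error-proof can carry its factor scores plus the two added dimensions.  The level scales, field names, and example entries below are placeholders for the definitions in Figure 7:

```python
# Sketch of an error-proofing catalog entry with the two added
# dimensions (solution level and robustness level); higher values
# are assumed to be preferred on both, as noted in the text.

from dataclasses import dataclass

@dataclass
class ErrorProof:
    name: str
    factor_scores: dict      # QFD-style 0/1/3/9 score per error factor
    solution_level: int      # higher = attacks the error earlier
    robustness_level: int    # higher = more robust solution

catalog = [
    ErrorProof("peer review checklist", {"communication": 9}, 2, 1),
    ErrorProof("design-rule checker", {"analysis": 9}, 3, 2),
]

# Rank candidate solutions by the two level dimensions.
best = max(catalog, key=lambda p: (p.solution_level, p.robustness_level))
print(best.name)  # design-rule checker
```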

This system can be implemented as the framework for an error/failure or an error-proofing database.  Some preliminary work has already been done in creating searchable, online databases.  In addition to cataloguing design errors, the system has been used to not only catalog and track design process error-proofing solutions but also to relate manufacturing poka-yoke to the design domain.

6.9.      Other Applications

As mentioned earlier, these classifications can build sets of questions to help users identify possible errors in their design process.  Shown in Table 7, Design Process FMEA (Chao and Ishii, 2003b) is a task-based method that uses the failure modes and effects analysis approach to systematically understand the risks inherent in each step of the product development process by analyzing each task for design errors in the six error factors identified.

 

Table 7.                     Design Process FMEA table headings

 

 

Performing the analysis can help design teams improve both individual projects and identify higher-level system issues at the organization.  Categories like these would be useful not only in performing a design process Failure Modes and Effects Analysis but also for creating checklists to verify the process. 
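As an illustration of the task-based analysis, one row of such a worksheet can be represented as follows.  The severity/occurrence/detection ratings and the RPN product are the conventional FMEA scheme; the actual column headings of Table 7 may differ, and the example row is invented:

```python
# Minimal sketch of a Design Process FMEA row: each development
# task is examined for possible design errors in the six factor
# categories and scored with conventional FMEA ratings.

from dataclasses import dataclass

@dataclass
class DPFmeaRow:
    task: str
    error_factor: str    # one of the six error factors
    failure_mode: str
    severity: int        # 1..10
    occurrence: int      # 1..10
    detection: int       # 1..10 (10 = hardest to detect)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

row = DPFmeaRow(task="review of interface requirements",
                error_factor="communication",
                failure_mode="requirements delivered in mixed units",
                severity=9, occurrence=4, detection=7)
print(row.rpn)  # 252
```

Rows with the highest RPN would be addressed first, mirroring how conventional FMEA prioritizes corrective action.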

7.         Survey Of Failure Databases

Given this error classification system, and to better learn how failure information currently is and should be stored in an information system for design errors, we used a simple questionnaire to survey the design process, common errors, and error-proofing techniques in industry.  The survey results indicate a clear divide between organizations that had and used failure databases and those with no database at all.  Of the twenty-some companies contacted, about two-thirds responded.  Only half of those who responded were familiar with any sort of tracking of engineering failures.  Figure 8 shows a breakdown of the common methods; computerized failure databases were the most common way of tracking failures.  The other respondents reported no familiarity with any failure tracking at their organization, though some reported work towards that goal.

Figure 8.                   Failure tracking methods

 

The organizations rated themselves in five different areas related to the storage and use of failure knowledge, the sharing of knowledge, the benefit observed, and so on.  Each organization rated its degree of agreement with each statement from “strongly agree” to “strongly disagree,” or “not applicable.”  The quantified level of agreement ranges from a maximum of 2 to a minimum of –2.  Figure 9 shows the overall averages and standard deviations.
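The quantification just described can be sketched as a simple scoring helper.  The mapping of agreement levels to the +2 through –2 scale follows the text; the function itself and the sample answers are illustrative:

```python
# Sketch of the survey quantification: agreement levels map to
# scores from +2 ("strongly agree") to -2 ("strongly disagree");
# "not applicable" answers are excluded before averaging.

SCALE = {"strongly agree": 2, "agree": 1, "neutral": 0,
         "disagree": -1, "strongly disagree": -2}

def mean_agreement(responses):
    """Average score over applicable responses, or None if none apply."""
    scores = [SCALE[r] for r in responses if r != "not applicable"]
    return sum(scores) / len(scores) if scores else None

answers = ["agree", "strongly agree", "agree",
           "not applicable", "disagree"]
print(mean_agreement(answers))  # (1 + 2 + 1 - 1) / 4 = 0.75
```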

 

Figure 9.          Comprehensive survey results plotted

 

These results should not be taken as a snapshot of industry, given the small sample size and the fact that most organizations without such a database in use did not respond.  As Table 8 shows, most respondents agreed with the statements.  The companies reviewed show an overall average of 0.9, slightly less than “agree,” across the five statements.

Table 8.                     Reports of statistics for the surveys

Some organizations reported no systematic activity in transferring information, which was usually done face-to-face.  The average scores for all categories are positive, even though there are some negative scores in every category except “The failure tool benefited the organization.”  Failure database usage was on average greater than knowledge management usage.  The highest scores are on “tracking failure” and “benefited the organization,” with the lowest marks on storage of knowledge.

At most organizations, all or most of the work in creating the database was done internally.  At some, the databases were created and maintained by a combination of internal and contracted personnel.  For example, at one company, internal personnel were usually responsible for the design and layout of the database while contractors were responsible for its programming and implementation.  Usually, higher-level support was done internally while lower-level support was handled by contractors.  Current systems are largely Intranet-based, using a web interface.  The databases are usually open to every employee within the company who is part of the process.  Often, suppliers and partners involved in the process also have access.

There is usually limited oversight of the validity of the data entered and little discipline in filling out the data consistently and accurately, which limits its usefulness.  One way to make the data more consistent is to use question-based, binary selections rather than unregulated blanks that can accept any form of information: numerical, text, or other.
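A minimal sketch of that recommendation: constrain each failure-record field to a fixed set of choices and reject anything else at entry time.  The field names and allowed values here are illustrative:

```python
# Sketch of constrained (question-based) entry fields for a failure
# database: each field accepts only a fixed set of choices instead
# of free text, keeping records consistent and searchable.

FIELDS = {
    "error_factor": {"knowledge", "analysis", "communication",
                     "execution", "change", "organizational"},
    "caught_before_release": {"yes", "no"},
}

def validate(record):
    """Return a list of problems with a candidate database entry."""
    problems = []
    for field, allowed in FIELDS.items():
        value = record.get(field)
        if value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    return problems

print(validate({"error_factor": "analysis",
                "caught_before_release": "yes"}))  # []
print(validate({"error_factor": "bad luck"}))      # two problems reported
```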

These databases are usually works in progress.  In addition to the constant stream of new information, changes to improve the usability and format of the database are continually made.  Updates and requests are done every few months to improve and correct the database.

The respondents indicated the failure databases were open to members throughout the organization.  They viewed the database as beneficial but saw many difficulties in implementation and made recommendations for future systems, including:

      After determining the useful fields, consider the behavior of the people entering the data and ask whether you will really get what you want

      Keep it simple – identify the critical few pieces of information needed

      Have routines or guidelines for use of the data

      If you are going to create a field to capture data, make sure you know how it will be used in the analysis

Every group that used failure databases believed that the system benefited the organization.  The key is to not be mesmerized by the availability of cheap storage or fast access and thereby attempt to store anything and everything.  Guards to verify the validity of the information are important.  If the database is useful and accessible, then the useful knowledge that can be shared can grow.

8.         Conclusions

This paper explored historical design error case studies and identified key attributes of design process errors.  These benchmarked cases are not just interesting stories; as Table 9 shows, they illustrate that simple mistakes can result in huge losses in both dollars and lives.  The actions necessary to prevent such catastrophic failures would have cost only a fraction of the loss.

Table 9.                     Costs of several historical failures

 

Product | Date | Costs | Dead
NASA Mars Climate Orbiter | 1999 | $134.5 million for launch and operations, $125 million orbiter | 0
Ford Explorer / Firestone Tires | 1991-2000 | 6.5 million recalled tires; lawsuits including a $7.5 million injury settlement and a $41.5 million settlement with states | >203
Intel Pentium | 1994 | ~6 million chips sold in 1994 | ?
Hyatt Regency Kansas City | 1981 | $140 million in judgments and settlements, $3 billion in lawsuits filed; more than 90% of claims were settled | 114
AA Flight 191 DC-10 | 1979 | $100 million in lawsuits estimated; worst aviation disaster in U.S. history | 273
Kemper Arena | 1979 | $23.2 million for construction | 0
Hartford Civic Center | 1978 | 10,000-seat stadium unavailable for two years | 0
Grand Teton Dam | 1976 | $4,575,000 in design and construction, $1 billion in property damage; 100,000 acres of farmland and 16,000 head of livestock lost | 11
Ford Pinto | 1971-1976 | Lawsuits including a $128 million settlement; 1.5 million cars recalled in 1978 | >500
Chevrolet Corvair | 1960-1963 | Lawsuits including a $2.4 million settlement; production ended in 1969; sales fell from 235,000 in 1965 to only 6,000 in 1969 | ?
de Havilland Comet | 1953-1954 | 21 planes built between 1952-1954, 7 crashed, 2 with passengers | >56
Tacoma Narrows Bridge | 1940 | $6.6 million for construction | 0
Titanic | 1912 | $7.5 million for construction | 1491

 

Generally, design errors fall into six areas, shown in Table 10.  This first-level decomposition of design errors enables error-proofing efforts in the design process.  Framing classifications and questions around these areas can aid the identification of design process errors, and the case studies demonstrate and confirm the scope of these categories.

Table 10.                  Categories of key design error factors

 

Category | Description
Knowledge | Inexperience or misunderstanding of the system
Analysis | Inaccurate assessment of the system
Communication | Mistransfer or misinterpretation of information
Execution | Improper or inaccurate implementation
Change | Unanticipated variation or modification
Organizational | System or managerial deficiencies

 

The design process FMEA questions can help organizations discover problematic areas in their development processes, and the categories help organize problems in order to mitigate them.

With these tools, organizations analyzing their design process have guidance on what to look for and improve.  This research will continue to be refined, but work must also be done by each organization to customize their analysis appropriately.  The most crucial step in error-proofing design is taking an active approach in improving on a system-level. 

A problem with using any categorization is that there is no guarantee that users will correctly identify the category.  They may not know or refer to the definitions of the various types.  Users can easily misidentify the level, leaving any user-populated database filled with inaccurate data and therefore useless.

The design process contains a range of different tasks and operations, many of them iterative; thus, applying quality strategies to design is much more challenging than applying them to manufacturing processes.  By establishing an error-proofing culture and implementing an effective information system to support the practice, an organization can leverage its know-how to eliminate serious field failures and quality problems.  One must collect, store, and utilize both “positive” knowledge such as best practices and “negative” experiences such as failure modes.  Because of the vast range of errors and error-proofs, it is important to have different dimensions and factors on which to compare and prioritize the solution elements available for error-proofing.

9.         References

Chao, L.P., Beiter, K., and Ishii, K., 2001, "Design Process Error-Proofing: International Industry Survey and Research Roadmap," Proceedings of the ASME Design Engineering Technical Conference: DFM, September 2001, Pittsburgh, PA.

Chao, L.P., and Ishii, K., 2003a, “Design Process Error-Proofing: Development of Automated Error-Proofing Information Systems,” Proceedings of the ASME Design Engineering Technical Conference:  DAC, Chicago, IL.

Chao, L.P., and Ishii, K., 2003b, “Design Process Error-Proofing: Failure Modes and Effects Analysis of the Design Process,” Proceedings of the ASME Design Engineering Technical Conference:  DFM, Chicago, IL.

Hinckley, C.M., 1997, “A Mistake Proofing Taxonomy.”  Assured Quality memorandum, Manteca, CA.

Hinckley, C.M., 2001, Make No Mistake, Productivity Press, Portland, OR.

Ishii, K. (ed.), 2004, ME317 Course Reader, Stanford University, Stanford, CA.

Norman, D.A., 1988, The Design of Everyday Things, Doubleday/Currency, New York.

Reason, J., 1990, Human Error, Cambridge University Press, Cambridge, England.

Reason, J., 1997, Managing the Risks of Organizational Accidents, Ashgate Publishing Ltd., Aldershot, England.


Acknowledgements

The authors would like to thank General Electric for providing motivation and funding for this research.  Thanks to Gene Wiggs, Mike Cimini, Kevin Otto, Kenji Iino, and others for input on the cases.


Contact the Authors:

Lawrence P. Chao, Research Assistant / Ph.D. Candidate, Stanford University, Department of Mechanical Engineering
Design Division, Terman 551; Phone: +001 (650) 723-7340;
Fax: +001 (650) 723-7349; E-Mail: lpchao@stanford.edu

Kosuke Ishii, Professor, Stanford University, Department of Mechanical Engineering Design Division, Terman 509; Phone: +001 (650) 725-1840; Fax: +001 (650) 723-7349; E-Mail: ishii@stanford.edu