A tale of three labs
Labs offer similar services, but they differ in how they serve their customers.
Jon Titus, Contributing Technical Editor -- Test & Measurement World, 3/1/2003
The investigation of failures involves taking apart failed devices, tracking down a problem, and establishing the root cause of the failure. At the simplest level, failure-analysis (FA) labs answer two questions: "What went wrong?" and "How do we keep it from happening again?"

Accurel concentrates on failure analysis of semiconductor devices both before and after production. If a chip company decides to offer a new device, it may first modify an older chip to prove the feasibility and operation of the new circuits.
Photo: Lapping, or polishing, a device embedded in plastic (the small hockey puck) using finer and finer abrasives on a flat surface (the red area) lets analysts remove layers of material so they can observe the overall structure while keeping all component parts in place.
By running tests and making modifications on a known-good test device at an FA lab, the chip vendor should end up with good parts in the first production run. Taking advantage of this circuit-modification capability lets the chip vendor avoid the cost of making iterations of test parts on a fab line.
Sandra Delgado, director of failure analysis at Accurel, says companies often rely on Accurel's lab to find out why devices from a third-party fab appear dead on arrival. Fabs make mistakes, she says, and every so often they might not properly deposit a metal layer or they might forget a processing step. Thus, when first-run parts fail, the lab can determine whether the fab produced the devices improperly or the devices had design problems to start with.
Delgado recommends that when it comes to testing failed production devices, engineers should supply good parts along with the failed unit. Labs can measure the differences between the operation of a good device and one that failed. (Even failed parts may function partially.) By comparing good and bad parts, a lab can establish a test strategy that lets it get the most test data from a dead part before destroying it in a test.
Delgado notes that one of the lab's biggest challenges is dealing with people who have only a limited understanding of device physics. These people have a difficult time relating a chip's high-level functions to the actual failed circuits and components. "Unfortunately," says Delgado, "many colleges and universities don't have the capability to show students the relation between the circuits they produce using CAD systems and actual devices on a wafer."
Photo: Embedding, or "potting," a connector in plastic lets an analyst hold it firmly in place and slowly grind down layer by layer through the connector body, while still maintaining the integrity of the overall structure. Observations take place after each removal of material.
Benchmark Electronics, which offers its lab services only to its own customers, takes a different approach to failure analysis. Dan Gibbs, the failure-analysis lab manager at Benchmark's facility in Winona, MN, sees failure analysis as a valuable adjunct to the company's contract manufacturing business. Gibbs says customers like having an in-house lab that will analyze manufacturing and reliability problems and recommend ways to overcome them. Because Benchmark buys large quantities of components, the company can get attention from suppliers when quality problems arise—often the kind of attention a small buyer could only dream of.
In addition to offering advice to customers, the Benchmark lab gives advice to suppliers, too. Recently, the company discovered a solderability problem with a module it purchased for use in a customer's product. The lab contacted the module supplier, who in turn contacted its component vendors to get the problem solved.
Gibbs says planning takes center stage in failure analysis because customers often can supply only one failed part. Analysts must carefully plan how they'll process the part. Some steps destructively test parts, so once these tests are completed, it's too late to say, "I wish we also had run tests x and y."
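The planning constraint Gibbs describes can be sketched in a few lines of Python: with only one failed part on hand, every nondestructive step should run before any destructive one. This is a minimal illustration, not Benchmark's actual procedure. The `AnalysisStep` class, the `plan_sequence` function, and the step names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisStep:
    name: str          # hypothetical step name, for illustration only
    destructive: bool  # True if the step alters or consumes the part


def plan_sequence(steps):
    """Order steps so nondestructive ones run before destructive ones.

    sorted() is stable, so steps keep their requested order within each group.
    """
    return sorted(steps, key=lambda step: step.destructive)


# Steps in the order someone happened to request them (assumed examples).
requested = [
    AnalysisStep("cross-section", destructive=True),
    AnalysisStep("visual inspection", destructive=False),
    AnalysisStep("decapsulation", destructive=True),
    AnalysisStep("x-ray imaging", destructive=False),
]

for step in plan_sequence(requested):
    label = "destructive" if step.destructive else "nondestructive"
    print(f"{step.name}: {label}")
```

A real plan would also have to decide the order among the destructive steps themselves, since each one can rule out the others. That judgment stays with the analyst.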
Testing follows logical steps. First, a customer must describe the failure mode, which should include the history of the PCB or part and a description of when it failed, such as after power-up at a customer's site, after thermal cycling, or in an unknown environment.
If a customer knows the site of a failure on a product, all the better, but if not, the lab will locate it. During analysis and testing, the lab determines the failure mechanism and, if possible, the root cause of the failure.
When the lab finds a root cause, it may recommend ways to avoid future problems, such as using a different component, placing a component elsewhere on a PCB, selecting a part with a higher working voltage, or using a different material.
A final report from the Benchmark FA lab includes background, conclusions, and observations, and Gibbs stresses that his reports include many images that clearly label materials, components, surfaces, and other features identified during the analysis. The report also provides complete procedures so an analyst can run the same tests and analyses should a similar problem occur later on.
Some labs both perform in-house failure analysis and provide services for outside customers. The Reliability Analysis Laboratory (RAL) at Raytheon spends most of its time working with internal customers but spends as much as 20% of its time working with outside customers. (T&MW reported in detail on Raytheon's lab in 1998; see Ref. 1.)
RAL Manager Nicki Girouard explains that the lab tends to do a mix of mechanical and electrical analyses, and it treats a failure as a system problem, not just a "spot problem" with one component. If a motor fails, for example, the lab will try to determine not only what mechanism caused the motor to fail, but also the root cause of the failure and its impact on the user. Recommendations could include a risk/safety assessment and a plan for corrective action to prevent future problems.
Because Raytheon takes a system approach to failure analysis, analysts may have to travel to equipment and manufacturing locations to examine failures and the environments the failed devices were exposed to. Often, they can solve problems, particularly those that involve nondestructive testing, on site. Other types of analysis—usually those that involve chemical tests or large instruments—require transporting a sample back to the lab for testing.
Girouard says getting the lab involved early in a design helps identify potential problems that can affect a production schedule or the cost of a product. She adds that the more information the designers can provide about a failed device and the circumstances surrounding the failure, the better.
Standard information from a customer includes electrical parameters, pin descriptions, and so on. Information about the conditions under which a device failed, including environmental conditions such as temperature and humidity, also helps identify or locate a failure. A customer should determine when a device failed and the conditions under which it failed and should also note whether the device failed during production testing, in the field, or somewhere else. (See "Failure-analysis reports.")
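The information listed above lends itself to a simple intake checklist. The Python sketch below shows one way a customer might check a submission for gaps before sending a part to a lab. The `FailureSubmission` record and its field names are assumptions made for illustration and do not describe any lab's actual form.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FailureSubmission:
    # All field names are hypothetical; they mirror the items listed in the text.
    electrical_parameters: Optional[str] = None
    pin_descriptions: Optional[str] = None
    failure_time: Optional[str] = None       # when the device failed
    failure_setting: Optional[str] = None    # production test, field, or elsewhere
    temperature_c: Optional[float] = None
    relative_humidity_pct: Optional[float] = None


def missing_items(submission):
    """Return the names of the fields the customer left blank."""
    return [f.name for f in fields(submission)
            if getattr(submission, f.name) is None]


submission = FailureSubmission(
    electrical_parameters="see attached data sheet",
    failure_setting="field",
    temperature_c=40.0,
)
print(missing_items(submission))
# prints ['pin_descriptions', 'failure_time', 'relative_humidity_pct']
```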
Raytheon's lab recently revised its business systems to give customers faster service over the Internet. The lab's Web-based services manage the entire flow of information to and from customers, including quotations for customers and delivery of test results, photos, and recommendations. Customers can track the lab's work through a Web browser.
Initial communications between a customer and the lab can involve many people. At a larger company, the RAL staff may initially work with engineers and technical managers as well as with a purchasing manager. Often, nontechnical people aren't familiar with failure analysis or the need for a lab's services, so a sale may require more communications to "educate" the person who signs a purchase order. At a small company, though, the engineer who discovered a problem probably serves as the primary contact and may authorize the purchase of lab services. Whenever possible, the communication of results takes place directly between a failure analyst and the engineer who requested testing and analysis.
The military and aerospace customers the RAL works with tend to use proven technology and suppliers, and they adopt new technologies more cautiously than consumer-product companies do. Thus, the lab learns about new technologies long before they get used in designs. This "lag time" gives the lab opportunities to teach its customers about coming technologies, and the lab may involve its staff in the design steps that lead to a new product. The RAL also has experience qualifying vendors and can help customers develop qualification strategies of their own.
When a customer approaches the lab as a product design gets underway, the lab can help determine what parts of the design might cause problems. By acting on the lab's suggestions early in a design, a manufacturer can avoid problems when a product gets into use. Girouard says people must understand that just as design is a process, so is failure analysis.
Companies mentioned in this article
ASM International, Materials Park, OH, 877-983-3327, www.asm-intl.org
Accurel, Sunnyvale, CA, 408-737-3892, www.accurel.com
Benchmark Electronics, Winona, MN, www.bench.com
Electronic Device Failure Analysis Society, ASM International, Materials Park, OH, 877-983-3327, www.edfas.org
Raytheon Reliability Analysis Laboratory, Lexington, MA, 800-725-4787, www.reliabilityanalysislab.com
Author Information
Jon Titus, formerly the editorial director at T&MW, has written real-time software and designed embedded systems and computer/instrument interfaces. He worked in electronics for 10 years and spent nine years at EDN magazine prior to joining T&MW in 1993. He has a BS from WPI, an MS from RPI, and a PhD from VPI.
Reference