Detective work finds board failures
Analysis of board failures requires standard lab tools as well as engineering insight and intuition.
Thomas Paquette, Insight Analytical Labs -- Test & Measurement World, 8/1/2006
At our lab, the analysis of one typical printed-circuit assembly (PCA) failure followed this sequence: A customer sent us a PCA with a burned area (Figure 1). Some ICs and discrete components had fallen off the board, and the customer sent those, too. We also received the unit's schematic diagram and printed-circuit board (PCB) layouts, along with a known-good PCA for comparison.
Figure 1. A burned PCA shows charring near the J2 label. Note the damage to the component area. The high temperature in this area melted solder and let several components fall off the PCB.
The analyst also determined where the loose components had been attached to the burned PCA. In this case, several components showed severe damage and only a few were missing. By comparing the state and location of the damaged components with the equivalent parts on the known-good PCA, the analyst began to suspect that excess current had caused the failure.
The PCA's burned section provided the overall system with only one direct signal—an output that passed through a connector. But neither the PCB trace nor the connector appeared damaged. So, perhaps a damaging current did not flow on this conductor, although excess current might have led to the failure.
While a technician tested discrete components removed from the burned area (they all passed), the analyst examined possible current paths, one of which included the output signal's CMOS driver. This IC might have experienced a voltage spike that triggered a latch-up event. The chip did not appear damaged, so the analyst removed it from the PCA and checked its DC electrical characteristics, which met specifications. Finally, she decapsulated the IC and examined its die, which showed no damage.
Of the missing components, only one—a tantalum capacitor—seemed like a good suspect. The missing capacitor could have provided a low-resistance path between the PCA's power-supply buses and ground, which would generate excessive heat.
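A rough calculation shows why a low-resistance path across a supply rail can generate enough heat to char a board. The sketch below is illustrative only: the 5-V rail, the fault resistances, and the function name `fault_power_w` are assumptions, not measurements from the failed PCA.

```python
def fault_power_w(supply_voltage_v: float,
                  fault_resistance_ohm: float,
                  source_resistance_ohm: float = 0.0) -> float:
    """Power (in watts) dissipated in a resistive fault placed across a supply rail.

    The fault is modeled as a plain resistor in series with an optional
    source resistance. Real supplies may current-limit, which this ignores.
    """
    if fault_resistance_ohm <= 0:
        raise ValueError("fault resistance must be positive")
    if source_resistance_ohm < 0:
        raise ValueError("source resistance cannot be negative")
    current_a = supply_voltage_v / (fault_resistance_ohm + source_resistance_ohm)
    return current_a ** 2 * fault_resistance_ohm


if __name__ == "__main__":
    # Hypothetical values: a healthy capacitor leaks almost nothing,
    # while a shorted one may drop to ohms or less.
    for resistance_ohm in (1000.0, 10.0, 0.5):
        power = fault_power_w(5.0, resistance_ohm)
        print(f"{resistance_ohm:7.1f} ohm fault on a 5 V rail: {power:.3f} W")
```

Under these assumptions, a 0.5-ohm fault on a 5-V rail would dissipate 50 W in a part the size of a grain of rice, provided the supply could deliver the 10 A involved.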
Figure 2. This infrared image shows two hot areas (red and pink) and cooler areas (purple) on a PCA. Analysts use such images to locate abnormally hot areas that may indicate or cause failures.
In this example, detective work focused on a defective passive component. Other problems also lead to board failures whose source takes careful analysis to track down: the increasing numbers of counterfeit components, the use of fine-pitch ball-grid arrays (BGAs) and buried vias on PCBs, and damage caused by electrostatic discharge (ESD) or electrical overstress (EOS) events.
Components wear disguises
Board failures such as the one described above may arise from a problem of increasing proportions: the inclusion of counterfeit parts from distant vendors in the supply chain. Engineers and managers often cite the use of counterfeit ICs as a growing problem that affects the quality and reliability of electronic equipment. But counterfeiters also produce more mundane discrete parts such as resistors and capacitors. The unwitting use of such parts, which can cost only a few tenths of a cent each, can cause manufacturers to spend hundreds of thousands of dollars on recalls and corrective actions.
Detection of counterfeit parts may require that analysts involve component manufacturers in their investigations. Recently, after we performed an analysis of a capacitor failure for a customer, the customer returned the defective capacitor to its supposed manufacturer, only to learn he had purchased a counterfeit device. From the exterior, the fake capacitor appeared identical to a legitimate component. An x-ray inspection of the fake's interior showed major differences between its construction and that of legitimate products.
These days, failure analysts who work on PCAs must consider the possibility that one or more counterfeit components exist on a failed board. And although a counterfeit part might not fail outright, it could start a chain of events that causes a catastrophic failure.
Figure 3. The red area in this acoustic-microscopy image shows delamination (separation) of the die and the packaging material. The large black dot indicates carbonized plastic caused by excessive heat.
Figure 4. An x-ray view taken at an angle through a BGA shows individual solder balls and wire bonds but reveals little about micro-cracks and corrosion on PCB solder pads.
Individual components do not have to fail to cause a PCA failure. Often, electrical contacts are the culprits. Complex surface-mount ICs often demand the use of BGA packages that offer a dense matrix of solder-bump or solder-ball contacts. We frequently hear of a circuit problem that engineers have temporarily "solved" by pressing down on a BGA or exposing the BGA to a temperature change. To a failure analyst, this type of stress-related defect usually indicates poor solder-ball adhesion between BGA and PCB solder pads.
Additional BGA-related problems include bump-to-bump shorts, micro-cracks in solder, and poor contact coplanarity. To find such defects, PCA analysts need a way to view the small solder balls sandwiched between the BGA's underside and the PCB. Relying upon their experience with PCAs, analysts may be tempted to use acoustic microscopy, x-ray inspection, or visual inspection to reveal defects, yet each technique has limitations that make it generally unsuitable for examining BGA problems.
Analysts can use acoustic microscopy to "see" delaminations, voids, and damage within a packaged device (Figure 3), but components inside a BGA usually disrupt the sound waves and prevent good imaging of the solder balls. Likewise, acoustic-microscopy images of the solder balls taken through the board from beneath a BGA yield poor results. A PCB's woven fiberglass and the layers of metal and vias, not to mention the components on the PCB's bottom side, attenuate and diffuse the sound waves.
X-ray techniques can produce images of BGA solder balls, but analysts can determine only the shape of the balls, their opacity, the presence of voids in the solder, and signs of solder "wetting" (Figure 4). Analysts will not see problems caused by micro-cracks in the solder or by "black pad," a form of corrosion. And analysts can only infer the presence of cold-solder joints from x-ray images.
Visual inspection of BGAs with a fiber-optic camera system lets an inspector see many of the solder balls. But this type of inspection proves ineffective when analysis requires inspection of hundreds of solder balls to find one or two defects.
Generally, electrical testing still provides the best way to identify open solder joints or shorted solder balls as well as opens and shorts within a BGA. Testing also has drawbacks, though, because it requires electrical access to the BGA connections, either through test pads on the PCB's component side or test pads placed on the back of the PCB to duplicate the BGA's solder-ball pattern. This type of access comes at the cost of extra "real estate" on the PCB. So, although analysts may request extra test points, the test points may cost too much to include.
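When test pads do give access to the BGA connections, sorting the measurements into opens and shorts is mechanical. The following is a minimal sketch, assuming resistance readings keyed by ball name; the thresholds, the ball names, and both function names are hypothetical and would need to be set for a real board and meter.

```python
from typing import Dict, List, Tuple

# Assumed limits, not from any standard: tune them for the actual fixture.
OPEN_THRESHOLD_OHM = 10.0     # pad-to-test-point reading above this: open
SHORT_THRESHOLD_OHM = 100.0   # ball-to-ball reading below this: short


def classify_joints(continuity_ohm: Dict[str, float]) -> Dict[str, str]:
    """Label each ball 'ok' or 'open' from its pad-to-test-point resistance."""
    return {
        ball: "open" if reading > OPEN_THRESHOLD_OHM else "ok"
        for ball, reading in continuity_ohm.items()
    }


def find_shorts(isolation_ohm: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Return ball pairs whose mutual resistance is suspiciously low.

    Only pairs that should be isolated by design belong in the input.
    """
    return sorted(
        pair for pair, reading in isolation_ohm.items()
        if reading < SHORT_THRESHOLD_OHM
    )


if __name__ == "__main__":
    # float("inf") stands for an over-range (open-circuit) meter reading.
    continuity = {"A1": 0.2, "A2": float("inf"), "B1": 0.3}
    isolation = {("A1", "A2"): 1e9, ("B1", "B2"): 0.4}
    print(classify_joints(continuity))
    print(find_shorts(isolation))
```

A sketch like this only flags hard opens and shorts. An intermittent joint of the kind that responds to pressure or temperature may read as good on the bench.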
As a last resort, when all other techniques have failed, analysts may use the "dye and pry" technique, which relies on a liquid dye that penetrates existing micro-cracks or flows under open solder balls. After letting the dye dry, they pry the BGA off its PCB, inspect the solder balls for the presence of dye, and investigate the problems the dye reveals (Figure 5). Unfortunately, this method destroys a PCA its owner might have hoped to salvage.
Figure 5. Liquid dye placed between this BGA and the PCB it was soldered to penetrated a space that existed between the center solder ball and its solder pad. The other balls were soldered properly to their respective pads, so no dye penetrated these junctions.
Figure 6. This cross-section image of a PCB via shows a crack that caused an open circuit. Stress, contamination, or insufficient plating might have caused the crack.
Contact failures also may originate within a PCB. Complex designs may include 20 or more metal layers, sandwiched deep inside a PCB. That type of construction makes it difficult to connect test probes to some internal conductors. Designs that employ densely packed components with small lead pitches require the use of smaller vias, the contacts created through a layer to connect conductors. PCBs may include buried vias (those that do not come to a PCB's surface), blind vias (those that reach only one surface), and high-aspect-ratio vias (those with a narrow diameter for a given layer thickness). The use of smaller and buried vias can cause failures attributed to incomplete plating, cracks, and contamination from materials used in manufacturing (Figure 6).
The inclusion of an increasing number of vias and metal layers seems to make tracing a short circuit exponentially more difficult. To start an analysis of short circuits, analysts require PCB layout files and a schematic diagram. With this information in hand, they can follow a "brute force" approach.
First, analysts identify possible areas in which a short may occur, and they then probe surface points. When possible, they isolate specific conductors. Some analyses require cutting into the PCB to expose internal conductors for probing. Or, analysts may need to cut into a board to sever an internal conductor so they can isolate a circuit. As a last resort, they can saw through a PCB to gain access to conductors along the side of the cut.
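The first step, identifying areas in which a short may occur, can be partly automated when layout data is available. The sketch below assumes the layout has been reduced to axis-aligned conductor rectangles tagged with net name and layer number. The `Segment` type, the 0.5-mm gap limit, and the one-layer offset are all hypothetical choices for illustration, not features of any particular layout tool.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """A rectangular piece of copper on one layer (dimensions in mm)."""
    net: str
    layer: int
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def gap_mm(a: Segment, b: Segment) -> float:
    """In-plane separation between two rectangles (0.0 if they overlap)."""
    dx = max(a.x_min - b.x_max, b.x_min - a.x_max, 0.0)
    dy = max(a.y_min - b.y_max, b.y_min - a.y_max, 0.0)
    return math.hypot(dx, dy)


def candidate_short_sites(segments: List[Segment], net_a: str, net_b: str,
                          max_gap_mm: float = 0.5,
                          max_layer_offset: int = 1
                          ) -> List[Tuple[float, Segment, Segment]]:
    """List places where the two shorted nets run close together.

    Results are sorted so the tightest spacings, the likeliest suspects
    under this simple model, come first.
    """
    sites = []
    for a in (s for s in segments if s.net == net_a):
        for b in (s for s in segments if s.net == net_b):
            if abs(a.layer - b.layer) > max_layer_offset:
                continue
            gap = gap_mm(a, b)
            if gap <= max_gap_mm:
                sites.append((gap, a, b))
    return sorted(sites, key=lambda site: site[0])
```

A ranked list like this only tells the analyst where to probe or cut first. Confirming the short still takes the isolation steps described above.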
ESD events trigger problems
If the cause of a board failure remains elusive, analysts look for possible damage within ICs. ESD and EOS events still cause the most IC failures. Analysts can determine why individual devices fail, but analysis of an IC as it relates to the circuits and conductors on a "host" PCA may lead them to the origin of the ESD or EOS event. An analysis of the IC itself can show which pins an overstress event affected. Then, the circuit diagram and PCB layout can help analysts identify the path and source of a damaging voltage.
Such analyses can lead manufacturers to redesign their PCAs and circuits to improve reliability. Analysis of overall IC failures in conjunction with PCA failures also may point to production equipment or to a handling procedure as a cause of ESD or EOS problems.
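Tracing a path from a damaged pin back to a possible entry point is essentially a graph search over the schematic. Here is a minimal sketch using breadth-first search, assuming the circuit has been reduced to a dictionary of pin-to-pin connections. The pin names and the `trace_paths` function are hypothetical, and treating every component as a simple pass-through is a deliberate simplification.

```python
from collections import deque
from typing import Dict, Iterable, List


def trace_paths(connections: Dict[str, List[str]],
                damaged_pin: str,
                entry_points: Iterable[str]) -> Dict[str, List[str]]:
    """Find the shortest connection path from a damaged pin to each
    reachable entry point (for example, a connector pin).

    Returns a dict mapping each reachable entry point to the list of
    nodes visited, starting at the damaged pin.
    """
    # Breadth-first search, remembering how each node was reached.
    parents = {damaged_pin: None}
    queue = deque([damaged_pin])
    while queue:
        node = queue.popleft()
        for neighbor in connections.get(node, ()):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)

    paths = {}
    for entry in entry_points:
        if entry not in parents:
            continue  # no electrical route in this model
        path = []
        node = entry
        while node is not None:
            path.append(node)
            node = parents[node]
        paths[entry] = path[::-1]
    return paths


if __name__ == "__main__":
    # Hypothetical fragment: IC pin U3.7 reaches connector J2 through R12.
    circuit = {
        "U3.7": ["R12.1"],
        "R12.1": ["U3.7", "R12.2"],
        "R12.2": ["R12.1", "J2.4"],
        "J2.4": ["R12.2"],
        "J1.1": [],
    }
    for entry, path in trace_paths(circuit, "U3.7", ["J2.4", "J1.1"]).items():
        print(entry, "<-", " -> ".join(path))
```

The shortest path in a connection graph is not necessarily the path the damaging current took, so the output is a list of candidates for the analyst to weigh against the physical evidence.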
In some cases, even a known-good IC can cause a failure. Subtle differences in processing can make a component from one IC manufacturer more susceptible to problems in a given environment than a pin-for-pin replacement from a different manufacturer. In other cases, a specific production lot of ICs may cause higher numbers of failures.
Visual inspection, infrared imaging, acoustic microscopy, and other tools can help failure analysts investigate circuit failures and their causes. But failure-analysis detective work also requires good engineering skills and intuition about how circuits and components fail and the damage these failures can cause.
Acknowledgement
Thanks go to my colleague Chris White for his help with this article.
Author Information
Thomas Paquette is president of Insight Analytical Labs, a company he founded. He has a BSEE from Clarkson University and has worked in the electronics field for over 30 years. He specializes in analysis of PCAs, discrete semiconductors, and IC failures.