Quality by negotiation
Engineers at Mercury Computer Systems work together to find the optimal balance among test, signal integrity, thermal analysis, mechanical fit, and component selection.
By Martin Rowe, Senior Technical Editor -- Test & Measurement World, 11/1/2008 2:00:00 AM
CHELMSFORD, MA—For the past 25 years, engineers at Mercury Computer Systems have designed processor boards and systems for military applications, but yesterday’s design and test methods no longer work. Higher functionality, increased heat, more sensitive signals, shorter design cycles, higher test coverage, and tighter budgets have all changed how the engineers perform their jobs.
To produce high-reliability products that must run for years in harsh environments, the various engineers at Mercury push for design details that often compete with each other. Test engineers, signal-integrity engineers, and mechanical engineers often negotiate with each other and with circuit and PCB (printed-circuit board) designers early in a design. The result: fewer board iterations, shorter design cycles, and more reliable products. By implementing new procedures, engineers have reduced the number of engineering change orders by a factor of 10.
Figure 1. Mercury Computer Systems has implemented a design-for-reliability program in steps. Today, most of these steps occur in parallel early in product design.
In 2005, Mercury embarked on a DfR (design-for-reliability) project for gathering input from all engineering departments and giving it to designers early in a design cycle. The program stemmed from the fact that many component suppliers were moving to lead-free processes, so Mercury needed to overhaul all of its design, analysis, and test practices. In addition, devices such as microprocessors and FPGAs (field-programmable gate arrays) were running faster while their size was shrinking, which increased heat density and threatened reliability.
Darryl McKenney, director of engineering services, was put in charge of the program. McKenney set out not only to give information to designers earlier, but also to automate the process. He started with component selection, the first part of the DfR project staircase that the company implemented. Since then, the company has implemented programs for all the functions and departments listed in Figure 1.
“The component turn from lead to lead-free has been incredible,” McKenney exclaimed. “It’s the biggest change in electronics manufacturing since the move from through-hole to surface-mount components.” At Mercury, the change to lead-free products affected more than 37,000 components that the company uses in its boards.
Evaluating new components
Because Mercury’s boards are often used in military systems, which have long lives, many of the company’s customers want designs that will remain unchanged for as long as 20 years. The move to lead-free components, however, has made it impossible for Mercury to fulfill this request. The company must now develop new designs with new components.
Darryl McKenney, director of engineering services at Mercury Computer Systems. Photo by Mark Wilson.
When developing a design, Mercury engineers needed a method for calculating each component’s effect on a PCB’s reliability so they could eliminate any unacceptable lead-free parts from manufacturing. Gene Bridgers, principal reliability engineer, developed software that lets him calculate the MTBF (mean time between failures) of a board based on its components.
“We calculate MTBF using MIL-HDBK-217F1 and Telcordia SR-332 standards,” Bridgers said. “We make that data available to design engineers so they can calculate how a new component will affect MTBF. We advise designers about risky parts and provide ratings on those parts.”
McKenney added, “If an engineer tries to select a component that lowers MTBF, we’ll catch it. When we have a reliable part, we try to reuse it.”
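To make the arithmetic behind such a check concrete, here is a minimal Python sketch of a parts-count estimate. It is not Bridgers's software. It assumes every part is in series (any single failure fails the board), sums per-part failure rates, and takes the reciprocal. The part list and FIT values are made-up placeholders, and the stress, quality, and environment factors that handbook methods apply are left out.

```python
# Parts-count sketch: failure rates of series components add, and MTBF is
# the reciprocal of the total failure rate.
# All part names, quantities, and FIT values are hypothetical placeholders.

HOURS_PER_FIT = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def board_mtbf_hours(parts):
    """Return MTBF in hours for a list of (name, quantity, fit_per_part)."""
    total_fit = sum(qty * fit for _name, qty, fit in parts)
    if total_fit <= 0:
        raise ValueError("total failure rate must be positive")
    return HOURS_PER_FIT / total_fit


baseline = [
    ("resistor", 400, 0.5),
    ("capacitor", 600, 1.0),
    ("fpga", 2, 150.0),
    ("processor", 4, 200.0),
]
# Candidate design: same board plus two high-risk parts.
candidate = baseline + [("fiber-optic transceiver", 2, 250.0)]

mtbf_before = board_mtbf_hours(baseline)
mtbf_after = board_mtbf_hours(candidate)
print(f"baseline MTBF:  {mtbf_before:,.0f} h")
print(f"candidate MTBF: {mtbf_after:,.0f} h")
```

With these placeholder numbers, the two added transceivers lower the estimate from about 526,000 h to about 417,000 h, which is the kind of drop a reliability review would flag.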
Mercury has also implemented a risk-rating system for components that all engineers can use to maximize MTBF. Table 1 lists examples of low-, medium-, and high-risk parts. Components such as processors pose a high reliability risk because their designs are more likely to change than are the designs of resistors. When a semiconductor manufacturer announces a revision or discontinuation of a part, that decision can have a profound effect on Mercury’s products.
For example, a processor or memory IC manufacturer may move to a smaller die size to reduce costs while at the same time making the part run faster. Both of those factors can increase the heat a device produces, making it less reliable unless the system provides additional cooling. A faster part can also result in timing errors.
Design changes that result from company buyouts can also affect a part’s reliability. When a producer is bought by another company, the new owners may try to cut costs, often by moving manufacturing overseas. “Products made overseas may not be as reliable as those made in the US,” noted Bridgers.
Get the heat out
Managing a board’s thermal characteristics is perhaps the most challenging part of producing reliable PCBs. Finding the best placement for high-heat components such as processors and FPGAs is critical for keeping the devices cool.
As signal speed increases and die sizes shrink, ICs produce more heat per unit area (W/cm²). “The thermal energy produced by a board has more than doubled in the last five years,” said McKenney. He pointed out that the thermal specification for a PMC (PCI Mezzanine Card) that mounts on a processor board is 7.5 W. (PMC modules add processors or I/O to a motherboard.) “We’re designing PMCs that produce from 25 W to 30 W today.” A full-height (6U) processor card produces 190 W of heat.
To combat the heat, mechanical engineering services manager Mike Gust and a team of mechanical engineers simulate the thermal characteristics of a board as soon as they know which components it will hold. They perform their analysis (Figure 2) while electronic designers develop schematics. Using Flotherm software from Flomerics, engineers including mechanical engineer Don Blanchet look for the best placement of components from a thermal perspective. In addition to performing heat simulations, Blanchet simulates air speed across a board. He can relay his findings to circuit designers and PCB designers before board layout begins.
Figure 2. Simulation software lets engineers analyze a PCB for thermal characteristics and thus influence a PCB layout. Courtesy of Mercury Computer Systems.
To test the thermal simulations, Blanchet and others will create mockups of boards using the components slated for the final design. They can then measure temperature using thermocouples and Fluke dataloggers. They also use infrared cameras to create a heat profile of the board. “We place thermocouples on processors, FPGAs, and memory devices,” said Gust. “Anything with a heat sink gets a measurement.”
When testing a design for thermal characteristics, Mercury engineers need powered devices. They also need software to exercise the parts. Software engineers create diagnostic routines that run the parts during a test. “Software routines should be a realistic representation of the worst-case heat that a device will likely produce in the field,” commented Gust. “But there’s no point in programming an FPGA to toggle all of its gates at once. That will never happen in the field.”
Clean signals
Mercury’s mechanical engineers can find the best component placement from a heat perspective, but they don’t always get the placements they want. Signal integrity is also a factor in where a component will reside on a board. Signal-integrity engineer Paul Wade sees to that. Under the DfR program, Wade can influence where PCB designers place parts. Mercury’s boards carry high-speed serial-bus signals such as RapidIO and PCIe (PCI Express).
Manufacturing engineering manager Tom Orser uses optical and x-ray inspection to catch manufacturing defects before test engineers run board tests. Photo by Mark Wilson.
Just as mechanical engineers simulate a board from a heat perspective, Wade simulates signals, looking for the best design. Using SIWave from Ansoft along with Spice simulations, Wade has automated the process. “Signal-integrity analysis used to occur when a board was 90% to 95% complete,” he said. Now, Wade makes his case to the PCB designers at the start. He no longer spends most of his time fighting fires as he once did.
Because of the DfR program, tradeoffs between Wade and the thermal engineers happen early in a design. Wade tries to keep PCB traces as short as possible to minimize signal loss and jitter, but he often negotiates with the mechanical engineers. “People always ask if a signal trace can be 6 in. longer than I like,” he said. “It’s a constant tradeoff.”
Wade has developed design rules to help PCB designers minimize signal-integrity problems. He has to explain that whenever data rates increase, the techniques that used to work will now cause problems. For example, he must tell PCB designers about the effects that vias through a board will have on signal integrity. At high frequencies, vias add to signal loss and add reflections to signals.
In 2009, Mercury’s engineers will be using serial buses that run at 5 Gbps to 6 Gbps, about double the current speeds. That will have a profound effect on signal integrity and trace length.
As data rates increase, PCB materials absorb more energy. A PCB trace acts like a low-pass filter, limiting how much energy reaches a signal’s destination. “We used to use PCB materials with loss-tangent specifications of 0.01 to 0.03,” said McKenney. “Now, we need loss tangents of 0.008 to 0.009 because of the higher frequencies.”
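The link between loss tangent and signal loss can be sketched with the standard approximation for dielectric attenuation on a transmission line, alpha = pi * f * tan(delta) * sqrt(Dk) / c, in nepers per meter. The Python sketch below evaluates it at 3 GHz, the fundamental frequency of a 6-Gbps NRZ stream. The dielectric constant of 4.0 is an assumed typical laminate value, not a Mercury figure, and conductor loss is ignored.

```python
import math

C0 = 299_792_458.0               # speed of light in vacuum, m/s
NEPER_TO_DB = 20 / math.log(10)  # about 8.686 dB per neper
METERS_PER_INCH = 0.0254


def dielectric_loss_db_per_inch(freq_hz, loss_tangent, dk):
    """Approximate dielectric attenuation of a PCB trace in dB per inch.

    Assumes a homogeneous dielectric (stripline-like) and ignores
    conductor loss, so it understates total insertion loss.
    """
    alpha_np_per_m = math.pi * freq_hz * loss_tangent * math.sqrt(dk) / C0
    return alpha_np_per_m * NEPER_TO_DB * METERS_PER_INCH


FREQ_HZ = 3e9     # fundamental of a 6-Gbps NRZ signal
ASSUMED_DK = 4.0  # hypothetical dielectric constant

older = dielectric_loss_db_per_inch(FREQ_HZ, 0.02, ASSUMED_DK)
newer = dielectric_loss_db_per_inch(FREQ_HZ, 0.008, ASSUMED_DK)
print(f"tan(delta) = 0.020: {older:.3f} dB/in.")
print(f"tan(delta) = 0.008: {newer:.3f} dB/in.")
```

Under these assumptions the older material loses roughly 0.28 dB/in. to the dielectric and the newer one roughly 0.11 dB/in. Because the loss scales linearly with both frequency and loss tangent, doubling the data rate on an unchanged material doubles the dielectric loss per inch.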
Like the mechanical engineers, Wade uses a mockup of a board design to verify his signal analysis. At the current data rates, he can analyze signals with a 6-GHz oscilloscope, but the higher data rates will require more bandwidth—at least 12 GHz. He also needs an oscilloscope with a 20-Gsamples/s acquisition rate to sufficiently capture the signals.
Mechanical engineering services manager Mike Gust and his team perform thermal simulations on boards and systems. They then measure temperature on processors, FPGAs, and memory devices. Photo by Mark Wilson.
Wade uses a vector network analyzer to characterize signal traces on PCBs, measuring signal loss versus trace length. “The schematic doesn’t tell the story,” he said. “Schematics don’t indicate trace lengths, nor do they show vias on a board.” Both of these can cause voltage drops in high-speed signals. Furthermore, Wade noted that the spacing between traces has shrunk. At one time, all traces had 5-mil (0.005-in.) spacing, but that has dropped to 3.5 mil. The smaller spacing breeds more crosstalk between signals, which can increase bit errors.
His simulations also account for changes in temperature, not just because of device heating, but because some boards will operate in hot and cold environments. A signal’s eye diagram will change depending on temperature. As temperature drops, signal edges get faster, and overshoot and undershoot become more pronounced. Resistors increase their values with temperature, so Wade may need to ask for changes to the resistor values in a design.
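The resistor drift mentioned above is usually modeled with a linear temperature coefficient, R(T) = R0 * (1 + TCR * (T - T0)). The short Python sketch below applies that model to a hypothetical 50-ohm termination resistor. The 100 ppm/°C coefficient and the -40°C to +85°C range are illustrative assumptions, not values from Mercury's designs.

```python
def resistance_at(temp_c, r_nominal_ohm, tcr_ppm_per_c, ref_temp_c=25.0):
    """Linear temperature-coefficient model for a resistor."""
    return r_nominal_ohm * (1.0 + tcr_ppm_per_c * 1e-6 * (temp_c - ref_temp_c))


R_NOMINAL = 50.0  # hypothetical termination resistor, ohms
TCR_PPM = 100.0   # hypothetical temperature coefficient, ppm per deg C

r_cold = resistance_at(-40.0, R_NOMINAL, TCR_PPM)
r_hot = resistance_at(85.0, R_NOMINAL, TCR_PPM)
print(f"-40 C: {r_cold:.3f} ohm")
print(f"+85 C: {r_hot:.3f} ohm")
```

A swing of this size moves a termination away from the trace impedance it was chosen to match, which is one reason a nominal value might be adjusted for the expected operating range.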
Wade’s work includes analyzing a board’s power integrity. In the past, Mercury’s boards used 12-V and 5-V power supplies, but today those boards have eight voltages ranging from 12 V to 0.9 V. At the low voltages, there’s little margin for voltage dips without causing system failures. Thus, Wade must ensure that a board has enough properly placed bypass capacitors to minimize dips.
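A quick calculation shows why the low rails leave so little room. The Python sketch below assumes a ±5% supply tolerance on every rail, then uses the charge relation C = I * dt / dV to estimate how much local capacitance would hold a dip within that budget. The tolerance, the 2-A load step, and the 10-ns response gap are all hypothetical, and capacitor ESR and ESL are ignored.

```python
def dip_budget_v(rail_v, tolerance=0.05):
    """Allowed voltage dip for a rail, given a fractional tolerance."""
    return rail_v * tolerance


def bypass_capacitance_f(step_current_a, hold_time_s, allowed_dip_v):
    """Capacitance that supplies a current step for hold_time_s while
    drooping no more than allowed_dip_v (ideal capacitor, C = I*dt/dV)."""
    return step_current_a * hold_time_s / allowed_dip_v


STEP_CURRENT_A = 2.0  # hypothetical load step
HOLD_TIME_S = 10e-9   # hypothetical time before the regulator responds

for rail in (12.0, 5.0, 0.9):  # rails named in the article
    budget = dip_budget_v(rail)
    cap = bypass_capacitance_f(STEP_CURRENT_A, HOLD_TIME_S, budget)
    print(f"{rail:>4} V rail: {budget * 1e3:5.0f} mV budget, "
          f"{cap * 1e9:6.1f} nF needed")
```

Under the same assumed tolerance, the 0.9-V rail has a 45-mV budget against 600 mV on the 12-V rail, so the same load step calls for more than 13 times the capacitance.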
Having multiple switching power supplies and high-speed serial buses on a board opens the opportunity for switching signals to create bit errors. Wade must look at the interaction between power supplies and jitter that can cause bit errors.
The power supplies switch at around 600 kHz, yet they can cause errors on 3-Gbps serial data streams. To see those errors, Wade needs a deep-memory oscilloscope to capture a few milliseconds of data at a 20-Gsamples/s rate.
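The memory requirement follows directly from those figures. The Python sketch below takes "a few milliseconds" to mean 2 ms, which is an assumption, and works out the record length along with how many supply switching cycles and serial bits fall inside the capture.

```python
SAMPLE_RATE_HZ = 20e9      # oscilloscope acquisition rate from the article
SWITCHING_FREQ_HZ = 600e3  # power-supply switching frequency
BIT_RATE_BPS = 3e9         # serial data rate
CAPTURE_S = 2e-3           # assumed value for "a few milliseconds"


def record_length_samples(sample_rate_hz, duration_s):
    """Number of samples the scope must store for one capture."""
    return round(sample_rate_hz * duration_s)


def switching_cycles_captured(switching_freq_hz, duration_s):
    """Supply switching periods contained in the capture window."""
    return switching_freq_hz * duration_s


def bits_per_switching_cycle(bit_rate_bps, switching_freq_hz):
    """Serial bits transmitted during one supply switching period."""
    return bit_rate_bps / switching_freq_hz


samples = record_length_samples(SAMPLE_RATE_HZ, CAPTURE_S)
cycles = switching_cycles_captured(SWITCHING_FREQ_HZ, CAPTURE_S)
bits = bits_per_switching_cycle(BIT_RATE_BPS, SWITCHING_FREQ_HZ)
print(f"record length: {samples / 1e6:.0f} Msamples")
print(f"switching cycles in capture: {cycles:.0f}")
print(f"bits per switching cycle: {bits:.0f}")
```

A 2-ms capture needs 40 Msamples of memory and spans about 1200 switching cycles, each of which overlaps about 5000 bits. Seeing jitter that repeats at the switching rate therefore takes a record far longer than a single bit or packet.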
McKenney said that design-automation tools such as those Mercury uses for thermal and signal simulation have improved dramatically over the last several years. Computer speed helps too. Simulations that used to take days now take hours. The simulations let engineers run “what if” scenarios before committing a board to a schematic.
Test’s voice
At Mercury, test engineers also have a say in a PCB design because they need test points. Serial-port access to a board is also crucial for board configuration and diagnostics. Test engineering supervisor Jim Ternullo explained that test engineers work with circuit designers and PCB designers regarding the placement of test points. They review CAD files and schematics as well. They also work with diagnostic software engineers to get software to run functional tests. “We make sure that the PCB designers provide easy access to a board’s serial ports,” said Ternullo.
Figure 3 shows the test flow that the engineers use when testing the Mercury boards. The boards first go through automated optical inspection and automated x-ray inspection, which, according to manufacturing engineering manager Tom Orser, catch most manufacturing errors before any tests.
Figure 3. Test engineers have developed a test flow that produces structural tests (orange boxes) and functional tests on processor boards.
Because of the company’s low volume and high mix of boards, test engineers such as Roy Thompson use boundary scan to test as much of a board as possible. Thompson uses tools from Asset Intertech to develop and run boundary-scan tests. “All I need is a netlist from the PCB designer to start developing a boundary-scan test,” he noted.
Thompson reuses test code wherever possible. The code starts as a software model of the board, which typically takes about two weeks to develop. But he noted that a model for DDR2 (double data rate) memory has taken as long as two months to develop, because the tests will operate the memory device outside of its operating specifications. Thompson runs boundary-scan tests at 8 MHz, far slower than normal operating speeds.
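A rough timing estimate shows how slow boundary-scan pin updates are relative to a device's normal clock. Each test-clock cycle shifts one bit through the scan chain, so a new pin state needs at least one full pass through the chain. The Python sketch below uses the 8-MHz test clock from the article with a hypothetical chain of 4000 boundary cells, and it ignores the few extra clock cycles spent stepping through TAP controller states.

```python
TCK_HZ = 8e6       # boundary-scan test clock from the article
CHAIN_BITS = 4000  # hypothetical total boundary-register length


def shift_time_s(chain_bits, tck_hz=TCK_HZ):
    """Minimum time to shift one full vector through the scan chain."""
    return chain_bits / tck_hz


def max_pin_update_rate_hz(chain_bits, tck_hz=TCK_HZ):
    """Upper bound on how often the driven pin states can change."""
    return tck_hz / chain_bits


vector_time = shift_time_s(CHAIN_BITS)
update_rate = max_pin_update_rate_hz(CHAIN_BITS)
print(f"one vector: {vector_time * 1e6:.0f} us")
print(f"pin update rate: at most {update_rate:.0f} per second")
```

With this assumed chain length, the pins driving a memory device change at most about 2000 times per second, many orders of magnitude below the memory's rated clock. That gap is consistent with Thompson's remark that the tests operate the memory outside its operating specifications.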
Test engineering supervisor Jim Ternullo and his team perform boundary-scan, flying probe, in-circuit, and functional tests on all new designs. They also provide feedback to circuit and PCB designers. Photo by Mark Wilson.
In places boundary scan won’t cover, Mercury uses flying-probe testers and, in a few instances, in-circuit testers. The engineers use flying probes to test for shorts and opens. A flying-probe tester also measures component values such as capacitors and resistors with no power applied to the assembled board.
Although few boards get ICT (in-circuit test), Mercury engineers design them for that test anyway. “By the time we find out about a board’s production volume, it’s too late to design for in-circuit test so we have to design it in up front,” said Ternullo. For boards that follow the ICT path, boundary-scan is performed as part of the test.
Flying-probe tests are easier to develop than in-circuit tests and they don’t require an expensive test fixture, but they are much slower. A flying-probe test must check every node on a board, and a board can have as many as 2000 nodes. Each board can take as long as 45 min to test. But, as Ternullo pointed out, “Unless you have a $150,000 board and you build 20 or more a year, it doesn’t pay to use in-circuit test.” The cost of a test fixture often makes ICT cost prohibitive.
Either or both of these test methods are required because boundary scan can’t test for everything on a board. Thompson said that 90% of Mercury’s boards have at least 50% coverage with boundary scan. Under the DfR program, any new board with less than 90% coverage must get McKenney’s approval to go into production.
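The approval rule amounts to a simple gate on a coverage figure. The Python sketch below is one way to express it, assuming coverage is measured as the fraction of board nets that the tests exercise. That definition, the function names, and the board data are hypothetical illustrations, not Mercury's tooling.

```python
APPROVAL_THRESHOLD = 0.90  # coverage below this needs sign-off (from the article)


def coverage(nets_tested, nets_total):
    """Fraction of nets covered; assumes coverage is counted per net."""
    if nets_total <= 0:
        raise ValueError("nets_total must be positive")
    return nets_tested / nets_total


def needs_approval(board_coverage, threshold=APPROVAL_THRESHOLD):
    """True when a new board falls under the approval threshold."""
    return board_coverage < threshold


# Hypothetical boards: (name, nets tested, total nets)
boards = [("board_a", 1900, 2000), ("board_b", 1500, 2000)]
for name, tested, total in boards:
    cov = coverage(tested, total)
    status = "needs approval" if needs_approval(cov) else "ok"
    print(f"{name}: {cov:.0%} coverage, {status}")
```

In this example the first board, at 95%, passes the gate, while the second, at 75%, would need McKenney's sign-off before production.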
Test engineers, signal-integrity engineers, mechanical engineers, and reliability engineers all have the means to influence circuit and PCB designers at Mercury Computer Systems. The company’s DfR process provides a communication path and a system of checks and balances for making sure that each board is optimized for reliability, which encompasses thermal performance, signal integrity, and ease of test.
Table 1. Risk classifications for typical components.

| Risk classification | Types of components |
|---|---|
| Low | Capacitors, inductors, LEDs, resistors, standard hardware |
| Medium | ADCs, DRAMs and SRAMs, oscillators, PLD ICs, standard cables |
| High | ASICs, fiber-optic transceivers, large memory devices, microprocessors, unique hardware |