“We have cracked insulators in an alternator bridge.  We thought we detected them all at the end of the line but the small number we fail to catch can result in a thermal incident in a vehicle.”

This wasn’t the first time I had heard this annoying distortion of language. A thermal incident is a fire, but “fire” is on the list of words the legal department banned. And there were…fires. Engineers need to think and express themselves with precision. Distorted language is confusing and lacks urgency. When we talk about defining a problem, let’s do it properly.

It is not uncommon for people to think, or perhaps hope, they can catch defects at the end of a line. Maybe that is because they see no alternative; we have one at The New Science of Fixing Things. If you are using inspection to find products which are at risk of failure, then you are selling enough of them to make your life, and your customers’ lives, miserable and perhaps dangerous.

At The New Science of Fixing Things, we know that when a small number of the products you sell suffer catastrophic failure, we must assume each one you make has been exposed to the same set of physics and each is at risk! The ones you sell are not just OK!

The New Science of Fixing Things has developed and used Functional Determinism for fifteen years to rapidly solve tough performance and reliability problems for our clients around the world. Functional Determinism is our way of decomposing the behavior of machines based on how energy and power flow to create the functions you design, and the unwanted functions that damage and destroy your products when energy and power “leak.”

A factory is not a casino where the chance of a win is based on probability. Everything that happens is deterministic (based on physics) and best discovered through the principles of Functional Determinism (How Stuff Really Works). Speed and precision matter when your products are catching fire!

Every assembly has been exposed to the physics of failure. Whether or not the assembly fails is not probabilistic, but rather based on the distribution of power which created the destructive event versus the desired effect. This makes it easier to figure out What’s Happening. The evidence is available, if you know how to look.

It was easy to see what was wrong if you found a part that had been identified as a failure before it burned up. At the end of the line, the assemblies were checked with a multi-meter for continuity between the heat sink and the mounting frame (frame not shown). If the insulator was cracked and the heat sink contacted the frame, the resistance was low, and there was a short circuit. A short required that the insulator was cracked AND the heat sink was contacting the base. If the insulator was cracked but the heat sink was not in contact with the base, the multi-meter would not find it, but it certainly was not safe to sell!
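The gap in the end-of-line check can be written out as a small truth table. This is only a sketch of the logic described above; the function names are mine and purely illustrative.

```python
from itertools import product


def meter_flags(cracked: bool, contacting: bool) -> bool:
    """The continuity check only sees a short: a crack AND metal-to-metal contact."""
    return cracked and contacting


def at_risk(cracked: bool) -> bool:
    """Any cracked insulator is unsafe to ship, whether or not contact exists yet."""
    return cracked


# Enumerate every combination and keep the ones that are at risk but not flagged.
escapes = [
    (cracked, contacting)
    for cracked, contacting in product([False, True], repeat=2)
    if at_risk(cracked) and not meter_flags(cracked, contacting)
]

print("at risk but passed by the meter (cracked, contacting):", escapes)
```

The only escaping combination is a cracked insulator with no contact yet, which is exactly the assembly that passes inspection and ships.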

When I arrived, the team thought they had the answer to, “What’s wrong?” They proposed a fix, which required a redesigned insulator. They had learned a statistical test in a seminar that provided for ranking a number of different insulator designs from best to worst, based on the percentage of failures. The test required a large sample size since percentage of failure was the response, a dead giveaway that the physics of failure were not being considered. This was not the first time I had seen problem solvers reach into a quality toolbox that was being taught and promoted as a panacea. Without figuring out “What’s Happening?”, probabilistic tools are not the answer. The thing is to figure out what response will let us see “What’s Happening.”

The insulator is supposed to prevent current flow, while passing heat. The proposed new insulators had to be balanced with the fact that the bridge would run hotter.  The multi-meter finds some of the cracked bridges, but not all. The bolt-screw assembly provides the torque to crack the bridge.  All of this is known. Why can’t these smart folks find the last piece?  I know why. The model they were using was wrong.  They did not know how to change the strategy from Symptomatic to Functional Determinism, the path for tough problems.

[1] Diagnosing Performance and Reliability The New Science of Fixing Things, David J. Hartshorne, 2019

The engineering manager was a good guy and went on to run the entire division.  He was smart and calm. We were of like minds.  Our nature was to think in terms of a causal explanation, not a root cause.

He took less than an hour explaining the problem (it rarely takes more). David Hartshorne and I were developing a better model for solving such problems, but it was in the discussion and whiteboard stage at that point. We had not yet constrained it to the fundamental principles which we teach and demonstrate in our workshops, but even then we knew we needed to force a product or process to reveal its nature; to see what’s happening.

An automotive ignition coil, when charged from a 12-volt supply, releases about 40,000 volts. It takes about 30,000 volts to jump a 10mm gap in air. A spark plug gap is about 1mm. The insulator was a few millimeters thick. I had a functional system of measurement in mind that should do quite nicely!

Think, for a moment, about the function of the multi-meter: to test for continuity of a circuit with about 3 volts, providing feedback in the form of a beep or a resistance reading in ohms. It will only find the most severe of failures. Often a team finds a measurement system that might figure out “What’s Wrong,” (though not in this case) but won’t help to see “What’s Happening.”
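The contrast between the two measurement systems can be put in rough numbers. The sketch below assumes a rule-of-thumb breakdown strength for dry air of about 3 kV per millimeter and an insulator 3mm thick; both figures are my assumptions for illustration, not measurements from this project, and real breakdown depends on electrode shape, humidity and pressure.

```python
# Assumed rule of thumb for dry air at atmospheric pressure (not a measured value).
AIR_BREAKDOWN_V_PER_MM = 3000.0

# Assumed thickness; the article only says "a few millimeters".
INSULATOR_THICKNESS_MM = 3.0


def max_gap_mm(voltage_v: float) -> float:
    """Largest air gap the given voltage could be expected to jump (crude linear estimate)."""
    return voltage_v / AIR_BREAKDOWN_V_PER_MM


for name, volts in [("multi-meter", 3.0), ("ignition coil", 40_000.0)]:
    gap = max_gap_mm(volts)
    can_bridge = gap >= INSULATOR_THICKNESS_MM
    print(f"{name}: ~{gap:.3f} mm of air, can arc through a crack: {can_bridge}")
```

Under these assumptions the multi-meter needs metal touching metal, while the coil has enough voltage to arc through a crack the moment one opens.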

The idea was not just to find cracked insulators, but to create them and know as soon as they crack. Given that 40,000 volts should jump the gap as soon as it cracks, all we needed was to create cracks in a meaningful way.

I was off to the auto parts store for a coil, battery and a Fluke high-voltage, low-current test probe, leaving my teammates to find a torque wrench and a bit of wire. With 40,000 volts we should be able to figure this out in a hurry!

The test was simple.  The idea was to tighten the nut with the torque wrench, monitoring torque and angular displacement in 45° increments.

We took a couple of assemblies from the end of the line, and torqued the nuts down. I was surprised when the bolts twisted off on the first, second and third ones, but there was no indication that the coil had fired. Was something wrong with the test? I got that feeling in my stomach that I had missed something. When we took the assemblies apart, the insulators were intact. I had a think about what we had seen. Twisting the bolts off without damaging the insulator is just what should have happened. Twisting off the bolt protected the insulator. Whoever designed this thing had done a good job.

We went back out to the line and found a couple of assemblies the multi-meter found as failures.  I checked the torque, and found them to be close to the spec. We then took each apart, replaced the insulator and torqued them up using the new bench test. As we torqued the nuts, I saw that the angular displacement was more than the previous three.  Then…zzzzzap!  We really didn’t need the test probe!  You could hear the spark as it jumped the gap.  We took it apart and the insulator was cracked!  How fun!  It reminded me of the excitement I felt when I was 14 years old making pipe bombs with Ed.  (We only blew up safe things…mostly.)

All that was left was the nut-bolt assembly. Since it was easy, we built two more and swapped the nuts.

Everything we needed to know lived in the difference between the nuts! What was the difference?

The rest was simple.

The nuts were supplied in boxes of hundreds each, with a lot number on the box. The nuts were poured into a vibrating bowl feeder which fed and aligned the nuts with the bolts as the fixtured bridge assembly passed by. Then an air-operated torque wrench ran down the nuts to the specified torque.[2]

We took a few nuts from the bowl, tested them in our new rig, and the bolts twisted off, the proper failure, keeping the insulator safe.

The causal explanation was forming. When the nut was torqued, the effort was split: some was lost to friction, and the rest went to store potential energy in the bolt, which seats and holds the heat sink for the bridge against the insulator while allowing heat transfer from the heat sink, through the insulator, to the base and out to the atmosphere.

[2] Next article will be about the risks of such air operated machines.

The friction in the nuts that cracked the insulator was low. With low friction, less of the torque was dissipated through frictional losses during tightening, and more was stored as excess potential energy in the bolt, cracking the insulator. Many of these cracked insulators were not detected by the end-of-line test.

The figure above shows the results. We always plot an Effort variable (torque) against Flow (angular velocity) or Displacement. As stated earlier, the function of the bolt is to store potential energy, so we plot torque vs. displacement. The low friction nuts reach a level of displacement that is dangerous to the insulator, causing a crack. This shows the danger of a torque spec without a corresponding displacement specification.
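One way to see why a torque spec alone is risky is the common short-form relation between tightening torque and bolt preload, T = K·d·F, where the “nut factor” K lumps together thread and bearing-face friction. The sketch below is not the calculation used on this project. The torque, bolt diameter, nut factors, joint stiffness and thread pitch are all illustrative assumptions, chosen only to show the direction of the effect, and the angle estimate assumes a simple linear-elastic joint measured from snug.

```python
def preload_n(torque_nm: float, nut_factor: float, diameter_mm: float) -> float:
    """Bolt preload from the short-form torque relation T = K * d * F."""
    return torque_nm / (nut_factor * diameter_mm / 1000.0)


def turn_angle_deg(preload: float, stiffness_n_per_mm: float, pitch_mm: float) -> float:
    """Nut rotation past snug needed to reach a preload, for a linear-elastic joint."""
    take_up_mm = preload / stiffness_n_per_mm
    return take_up_mm / pitch_mm * 360.0


# All values below are hypothetical, for illustration only.
TORQUE_SPEC_NM = 10.0
BOLT_DIAMETER_MM = 6.0
THREAD_PITCH_MM = 1.0
JOINT_STIFFNESS_N_PER_MM = 50_000.0
NUT_FACTORS = {"higher-friction nut": 0.20, "lower-friction nut": 0.12}

for label, k in NUT_FACTORS.items():
    force = preload_n(TORQUE_SPEC_NM, k, BOLT_DIAMETER_MM)
    angle = turn_angle_deg(force, JOINT_STIFFNESS_N_PER_MM, THREAD_PITCH_MM)
    print(f"{label}: preload ~{force:.0f} N, rotation past snug ~{angle:.0f} deg")
```

At the same torque, the lower-friction nut ends up with more preload and more rotation, which is why displacement exposes a difference that the torque reading hides.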

We were able to discover the physics of failure with a proper knowledge of how energy and power flow, and to use it to decompose the behavior, discovering “What’s Happening.”

All that was left was to find out why the friction was different in the nuts.

I went to the warehouse, pulled a few nuts from each of the several boxes, each marked with a lot number, and tested them on the rig. Nuts from two of the several boxes cracked the insulator. The next clue made the job easy. If one nut from the box cracked the insulator, so did all the rest. If one nut from the box caused the bolt to twist off, so did all the rest. The lot number was key to figuring this out. Who put the lot number on the box?
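Because the behavior was all-or-nothing within a box, sorting the stock comes down to grouping rig results by lot number. Here is a minimal sketch of that bookkeeping; the lot labels and results are invented placeholders, not the project’s data.

```python
from collections import defaultdict

# Placeholder rig results as (lot label, outcome); invented for illustration only.
rig_results = [
    ("LOT-A", "bolt_twisted_off"),
    ("LOT-A", "bolt_twisted_off"),
    ("LOT-B", "insulator_cracked"),
    ("LOT-B", "insulator_cracked"),
    ("LOT-C", "bolt_twisted_off"),
]


def disposition_by_lot(records):
    """Group outcomes by lot and decide what to do with each box."""
    outcomes = defaultdict(set)
    for lot, outcome in records:
        outcomes[lot].add(outcome)

    verdict = {}
    for lot, seen in outcomes.items():
        if seen == {"insulator_cracked"}:
            verdict[lot] = "quarantine"
        elif seen == {"bolt_twisted_off"}:
            verdict[lot] = "release"
        else:
            # A mixed box would contradict the all-or-nothing pattern.
            verdict[lot] = "mixed: test more nuts"
    return verdict


print(disposition_by_lot(rig_results))
```

A “mixed” verdict would be worth chasing, since it would mean the lot number no longer separates good nuts from bad ones.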

Well, I was excited and pleased. We removed all the nuts that would create failures, cleaned out the nuts from the hopper on the vibrating bowl (once poured, nuts were mixed and traceability was lost) and refilled the bowl. I confidently told the plant manager he could remove the inspection from the end of the line. He decided to leave it in place until it had run for a few days with no failures. I shrugged.

When I came in the next day, there was a single failure.  I had no explanation why, and could not leave without one!  I was more than disappointed.  I had to figure it out. I knew the Causal Explanation was right, but doubt was running rampant through the plant!  What could have happened?

I went out to the line and watched.  After a bit of time, an assembly came from under the vibrating bowl feeder without a nut. The line attendant at the next station reached into a drawer, pulled out a nut and manually torqued it down.  Was that it?

I took a couple of nuts from the drawer where she kept them and went back to the test rig. Lo and behold, the attendant had a small box full of the bad nuts! We removed and replaced the nuts in the drawer, got out a vacuum cleaner and a broom, and gave the entire area a good going over.

The plant manager left the end of line test in place for a couple of days without a single failure, then suspended the test.

Once we understood the physics of failure, we really didn’t need to test the nuts with the test rig. You could place a nut on a threaded rod and give it a spin. If it whirred down the threaded rod, it would crack the insulator. If you needed two fingers with a light touch to turn the nut down the threaded rod, it was a good one, and would twist off the bolt before the insulator cracked. As for the nut, the cause was either the threading operation or the plating. With this measurement system, a supplier quality engineer went to the plater, because the plater stamped the boxes with the lot number, and found the plating was a bit thicker in the good batches because the bad batches were pulled from the plating tank earlier than the good ones.

This project took 3-4 days from start to finish. Not all are that fast, but with poor strategy, large sample sizes and data sets, and probabilistic decomposition, projects become studies, and are never fast.

Lessons Learned

  • When a small percentage of products fail in the field, more are at risk. There were certainly assemblies in the field with at-risk nuts.
  • There are always assemblies and parts that were exposed to the same level of energy as the failures, so discovering “What is Happening” is not as difficult as you might think, as long as you can see the energy distribution with the Source-Load-Impedance Model.
  • The measurement system needs to be designed with the Source-Load-Impedance Model in mind.

John Allen
January, 2021
Naples, Florida
John.allen@tnsft.com

The New Science of Fixing Things is located in North America (+1 603 969 0563) and Europe (+44 797 072 0437) and available to help you solve your chronic quality, product and process performance and reliability problems.

Diagnosing Performance and Reliability written by David J. Hartshorne, The New Science of Fixing Things, is the most powerful description of effective problem solving ever written with an excellent section on Small Multiples and multi-vari with examples and graphics, available at www.tnsft-bookstore.com

