In the photo, machine learning is used to improve the efficiency and reliability of train axle inspections on site at the Helsinki subway train depot.

Revolutions happen in two ways: gradually, then suddenly. So it has been with artificial intelligence (AI) and machine learning (ML). For years the media reported that AI would change the world, but we saw little of this firsthand. Now, suddenly, we all use AI every day. It powers our phones and recommends the movies we watch. It shapes the digital world we see around us, although not always for the better.

However, AI is not yet helping inspectors in their daily work. So, when will AI and ML be ready to help us in the NDE community to do our jobs? And how can we make sure that the change is for the better?

Opportunities for AI

Automation has been part of NDE for a long time, and various mechanized systems are used across the industry. For simpler inspections with clear signals above the noise threshold, the data analysis can be automated by traditional means. However, for many inspections of interest, the data is complex and the evaluation criteria do not lend themselves to direct automation. Human inspectors can weigh cues such as signal dynamics and separate a flaw signal from other spurious indications, but encoding this judgment into a set of rules for a traditional automation system is not feasible. Thus, most of the data is still analyzed by human inspectors.

A lot of an inspector’s time is spent on manual operations to better visualize the data and find the flaws that might hide in the noise. In many inspections, flaws are rare, and most of this time is spent looking at data where there’s nothing to be found. This is tedious work. At the same time, missing a single flaw could be devastating, and thus the work requires constant focus and vigilance. Paradoxically, the work is both tedious and stressful. If the defect-free data were screened automatically by an ML program, so that inspectors needed to review only the subset of data with a higher probability of containing defects, the inspectors would be more efficient and the inspections more repeatable and reliable. However, missing a single flaw is still devastating, and we need to make sure the ML is as good as the human inspector before it can really help us get the job done.

In many inspections, flaws are rare, and most of this time is spent on looking at data where there’s nothing to be found. This is tedious work.

In principle, ML excels at automating exactly such tasks: analyzing inspection data where the decision criteria are unavailable or too complex to be effectively encoded for traditional automation. Given enough data, a sufficiently sophisticated ML model can “learn” the internal model necessary to reach the correct conclusion. So naturally, people tried using ML for NDE quite early.

The first success stories for ML in NDE date back more than 30 years: a handful of important features was hand-picked from the raw NDE data. These were chosen to carry the most significant information for the evaluation. The features were optimized for each application and included things like signal amplitude, envelope width, and frequency content for ultrasonic A-scans. This set of features was then fed into an ML model, such as a neural network. Multiple authors reported good results with such systems across various NDT techniques; see Lee and Chen (1993) or Aldrin et al. (2001) for notable examples. This approach is now called “shallow learning.”
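As an illustration, such hand-picked feature extraction might look like the following sketch. This is not any specific published system: the feature choices (peak amplitude, a crude rectified envelope width instead of a proper Hilbert-transform envelope, and spectral centroid as the frequency-content measure) are assumptions for illustration only.

```python
import numpy as np

def extract_features(ascan, fs):
    """Extract a handful of hand-picked features from a raw ultrasonic
    A-scan, in the spirit of early 'shallow learning' NDE systems.
    ascan: 1D array of samples; fs: sampling frequency in Hz."""
    amplitude = np.max(np.abs(ascan))                       # peak signal amplitude
    envelope = np.abs(ascan)                                # crude rectified envelope
    above_half = envelope > 0.5 * amplitude
    envelope_width = np.count_nonzero(above_half) / fs      # time above half amplitude, in seconds
    spectrum = np.abs(np.fft.rfft(ascan))
    freqs = np.fft.rfftfreq(len(ascan), d=1.0 / fs)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)  # spectral centroid (frequency content)
    return np.array([amplitude, envelope_width, centroid])
```

A feature vector like this would then be fed to a small classifier, such as a shallow neural network, trained per application.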

Despite these early successes, this approach did not revolutionize the industry. As it turns out, extracting the significant features from the raw NDE data is not always straightforward or even possible. They also need to be hand-tuned for each inspection procedure or even developed with the procedure. Thus, this approach does not transfer easily even to quite similar inspections. Finally, using just a handful of extracted features makes it difficult to benefit from the ever-richer data sets available with modern NDE equipment. In this way the approach goes against the tide of NDE development.

The problem of extracting features to take advantage of ML is not limited to NDE. The same issues arose in other fields of AI, such as image classification and facial recognition, so the problem was under active development across the field. Starting in the 2010s, researchers trained deeper and deeper networks with the intent of avoiding hand-picked features altogether. Combining multiple advances in ML techniques and computational capacity, it became possible to feed raw data (such as pictures) to neural networks and have these networks learn to extract the features as well as the final decision criteria. This is known as “deep learning,” and it powers many of the applications we use today.

The development of deep learning is also significant from an NDE point of view. The deep learning models can take full advantage of the rich data sets NDE instruments provide and work better with more data. In the last three years, deep learning models have been applied to ever richer NDE data. In ultrasonics, they were first applied to raw A-scans, then to B-scans, and now to multichannel phased array data for complex noisy inspections, which are very difficult even for the human inspectors. Most methods with digital data can benefit—digital radiography, visual testing with digital cameras, eddy current, and so on. The current state-of-the-art models have shown validated human-level performance consistently across different techniques. The current ML models are finally good enough to be used for NDE.

But there is a catch. Richer and more complex models give better performance, but they also require extensive real training data to get good results. It takes on the order of hundreds of thousands of samples to train these models. And the data needs to be balanced: we need roughly equal numbers of flawed and flaw-free examples. For most NDE procedures, this is simply infeasible. The simplified EDM notches or simulated results we use to train humans are not sufficient to teach the ML what a real crack might look like. Thus, one more breakthrough was needed to make these models generally available for a wide range of inspections: the eFlaw.

Developing and Utilizing eFlaw

The key insight behind the eFlaw is that flaw detection and characterization is not just about the flaw; it’s also about the background. The same flaw signal is readily detected in a noise-free environment but easily missed when located in a noisy weld, for example. Before machine learning, this was already a problem in qualifying nuclear inspectors: to demonstrate performance, massive, expensive mock-ups were needed to provide a sufficient distribution of flaws in various challenging locations. This is where the eFlaw was first developed and used.

In simple terms, the eFlaw works as follows: first, a real mock-up with real flaws is needed. The flaws may be harvested from the field or produced artificially, but they need to be representative; true flaws are the only true source of information about the signal response of flaws. The mock-up is then scanned with the inspection method in question to produce a data file that contains the flaw signals of interest. The flaws are then extracted from the data to capture the pure flaw signals, the eFlaws. These eFlaws are the flaw part of the flaw–background challenge, and we need sufficiently many different real flaws to represent the different flaw types. The eFlaws can then be re-introduced into flawless background data (the canvas) to form new data files with different flaw populations. To the inspectors, these look just like data files from real inspections, except that the component does not really contain the flaws they see in the data. The eFlaws are used in training, performance demonstration, and virtual round robins. They finally allow us to have an unlimited number of flaws in performance demonstration, training, and probability of detection (POD) evaluation.
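Conceptually, the extraction and re-introduction steps can be sketched as below. This is a deliberately simplified additive model on a 2D scan (e.g., a B-scan); real eFlaw tooling must handle alignment, coherent noise, gating, and method-specific physics, none of which is shown here.

```python
import numpy as np

def extract_eflaw(flaw_scan, canvas_scan, region):
    """Estimate the pure flaw signal by subtracting flawless background
    from a scan containing the flaw, within a given region.
    region: (row_slice, col_slice) into a 2D scan array."""
    r, c = region
    return flaw_scan[r, c] - canvas_scan[r, c]

def implant_eflaw(canvas_scan, eflaw, position):
    """Re-introduce an extracted eFlaw onto flawless background data
    at a new position, producing a synthetic-but-realistic data file."""
    out = canvas_scan.copy()
    r0, c0 = position
    h, w = eflaw.shape
    out[r0:r0 + h, c0:c0 + w] += eflaw
    return out
```

The same extracted eFlaw can be implanted at many positions and on many different canvases, which is what makes the flaw population effectively unlimited.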

Unlimited flaws are also exactly what we need to train the modern ML networks. In addition to varying the flaws and the background separately, the extracted pure eFlaws allow us to introduce all kinds of variations to the flaw signals that might represent deficiencies in practical inspections or just introduce variation to cover very unlikely cases and to make the model robust against small changes in the inspection data. In short, the eFlaws allow us to take the data commonly available with a feasible number of flaws and explode it to the hundreds of thousands of examples needed to train the modern ML networks. In addition, we can exploit the more generic techniques, like transfer learning, generative adversarial networks, and so on.
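A minimal sketch of this data explosion is shown below. The specific transforms (amplitude scaling, mirroring the scan direction) and the 50/50 class balance are illustrative assumptions; a production pipeline would use method-specific, physically justified variations.

```python
import numpy as np

def augment_eflaw(eflaw, rng):
    """Generate a randomized variant of an extracted eFlaw signal.
    Illustrative transforms only: scale the amplitude to vary the
    reflector response, and optionally mirror the scan direction."""
    variant = eflaw * rng.uniform(0.7, 1.3)
    if rng.random() < 0.5:
        variant = variant[::-1, :]
    return variant

def make_training_set(eflaws, canvases, n, rng):
    """Explode a few real eFlaws and flawless canvases into n labeled
    training examples by implanting randomized flaw variants."""
    examples = []
    for _ in range(n):
        canvas = canvases[rng.integers(len(canvases))].copy()
        if rng.random() < 0.5:  # keep flaw / no-flaw classes balanced
            ef = augment_eflaw(eflaws[rng.integers(len(eflaws))], rng)
            h, w = ef.shape
            r0 = rng.integers(canvas.shape[0] - h + 1)
            c0 = rng.integers(canvas.shape[1] - w + 1)
            canvas[r0:r0 + h, c0:c0 + w] += ef
            examples.append((canvas, 1))
        else:
            examples.append((canvas, 0))
    return examples
```

From a handful of real eFlaws and canvases, this loop can generate the hundreds of thousands of labeled, balanced examples that deep models require.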

Moving Forward with AI and ML

While we critically need eFlaws, simulations, or other tools to make do with the limited data we have, these tools are not magic. We still need sufficient real data as raw material. We still need to make sure that the training data covers and represents the application area. We still need to train and validate the models for a defined scope and make sure they are not used outside their validated scope. But to reach the human-level performance needed for critical inspections, we need a robust way to represent the flaws we want to detect. The eFlaw provides this. With this final piece of the puzzle, ML is finally ready for real use in NDE.

AI and ML are ready for NDE today. They are commercially available. The first applications are already in active use in the field, and many more are in the process of being adopted. Applications are found across industries: nuclear, heavy manufacturing, aerospace, and so on. ML takes the tedious and taxing part of the inspector’s work and lets the inspector focus on the important parts. ML helps get the job done, quickly and reliably.

ML takes the tedious and taxing part of the inspector’s work and lets the inspector focus on the important parts.

And then there’s the other question: how can we make sure this change is for the better? That the ML actually can help the inspectors? Speed and reliability are big promises in this industry. How can we make sure that the ML actually delivers?

History is full of examples of the perils of adopting new technology carelessly. In our industry, we should be cautious in introducing ML for NDE data analysis. ML is notorious for providing great demos and then failing in unexpected ways when faced with messy real-world applications.

Luckily, in NDE we have had high reliability requirements in place for a long time. We have guidelines like MIL-HDBK-1823A and standards like ASTM E2862 for hit/miss probability of detection (POD) to measure and demonstrate the required performance. We have existing requirements in place in various industries, like nuclear and oil and gas. These same standards and practices are, in general, applicable to the ML models we develop. In many ways, they are even more applicable, since automated systems are oblivious to whether they are in a test environment or in actual use, whereas human inspectors may react to the test setting. For human inspectors, POD has been a difficult standard to apply, since it requires a statistically significant amount of data, and data is costly. For machine learning, however, we need far more data for training anyway, so the data requirements of POD are not as problematic. (Conversely, the techniques we use to get data for ML can also be used to make POD easier for human inspectors.) Thus, we need not be discouraged; we just need to apply the tools we already have and know. This has the added benefit that the results are comparable and compatible with the existing requirements.
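To make the hit/miss POD idea concrete, the sketch below fits the classic model POD(a) = logistic(b0 + b1·ln a) to hit/miss data by maximum likelihood, in the spirit of MIL-HDBK-1823A. The plain gradient-ascent fit and the absence of confidence bounds (a real analysis reports a90/95, not just a90) are simplifications for illustration.

```python
import numpy as np

def fit_pod(sizes, hits, iters=5000, lr=0.1):
    """Fit a hit/miss POD curve POD(a) = 1 / (1 + exp(-(b0 + b1*ln a)))
    by maximizing the likelihood with plain gradient ascent.
    sizes: flaw sizes; hits: 1 = detected, 0 = missed."""
    x = np.log(np.asarray(sizes, dtype=float))
    y = np.asarray(hits, dtype=float)
    b0, b1 = 0.0, 1.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
        b0 += lr * np.mean(y - p)        # gradient of the log-likelihood w.r.t. b0
        b1 += lr * np.mean((y - p) * x)  # gradient w.r.t. b1
    return b0, b1

def a90(b0, b1):
    """Flaw size detected with 90% probability under the fitted model."""
    return np.exp((np.log(0.9 / 0.1) - b0) / b1)
```

The same fit applies whether the hit/miss record comes from a human inspector or an ML model, which is what makes the results directly comparable.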

Demonstrating performance with tools like POD is necessary, but it is not enough. It’s not enough to have a working system; we also need a path to adopt it into our existing structures and procedures. We need to learn to take advantage of the new tool.

The inspectors that we’ve worked with start out skeptical. This is as it should be. Many people are skeptical about new technology and machine learning, but for inspectors it’s practically a job requirement. After all, they are responsible for the inspection. It is their duty to critically evaluate anything that might affect inspection reliability.

It is crucial that the inspectors get to build trust in the ML systems. They need to be able to monitor and evaluate the ML’s results in parallel, as they would evaluate a new, inexperienced colleague. This is not just about the ML performing well in a POD test. It is also about understanding how the inspectors can work with this tool. There might be areas where the ML output can be trusted to work independently, and areas where the results have to be corrected and fed back into training periodically until confidence in the results has been established.

Conclusion

To get good results, it’s best to start with a well-defined scope provided by an existing procedure, one that has digital data available and where the benefit of ML would be significant. The benefit could be improved repeatability and reliability, and the scope could later be extended to more challenging flaw types, such as smaller or hidden flaws.

With experience, the inspector’s healthy skepticism quickly turns to earned appreciation. They find that in ML they get a tireless helper that takes care of the most burdensome aspects of their work. They get more done with less time and stress. This is the change we want the ML to provide.

__________

Author

Iikka Virkkunen is the managing director of Trueflaw Ltd. and an adjunct professor at Aalto University in Espoo, Finland. (iikka.virkkunen@trueflaw.com)

Virkkunen presented an ASNT Learn webinar titled “Automated Flaw Detection Using Machine Learning,” now available for free on demand. For more information on this webinar, or other live and on-demand ASNT webinars, go to ASNT Learn.

Acknowledgments

Train axle inspection case provided by DEKRA Industrial, Finland.

References

Aldrin, J., J.D. Achenbach, G. Andrew, C. P’an, B. Grills, R.T. Mullis, F.W. Spencer, and M. Golis, 2001, “Case Study for the Implementation of an Automated Ultrasonic Technique to Detect Fatigue Cracks in Aircraft Weep Holes,” Materials Evaluation, Vol. 59, No. 11, pp. 1313–1319

Lee, G.G., and C.H. Chen, 1993, “Neural Networks for Ultrasonic NDE Signal Classification Using Time-Frequency Analysis,” 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 493–496, DOI: 10.1109/ICASSP.1993.319163

Mentioned in the Text

MIL-HDBK-1823A, Department of Defense Handbook: Nondestructive Evaluation (NDE) System, Reliability Assessment (07 APR 2009)

ASTM E2862, Standard Practice for Probability of Detection Analysis for Hit/Miss Data

Additional Information on Machine Learning

Additional papers on ML can be found on the ASNT NDT Digital Library and ASNT Pulse. For example:

Cao, B., E. Cai, and M. Fan, 2021, “NDE of Discontinuities in Thermal Barrier Coatings with Terahertz Time-Domain Spectroscopy and Machine Learning Classifiers,” Materials Evaluation, Vol. 79, No. 11, pp. 125–135, DOI: 10.32548/2021.me-04189

Imani, A., S. Saadat, and N. Gucunski, 2018, “Full-Depth Assessment of Concrete Bridge Decks in A GPR Survey: A Machine Learning Approach,” 2018 NDE/NDT for Highways & Bridges, pp. 179–187

Volker, C., S. Kruschwitz, C. Boller, H. Wiggenhauser, 2016, “Feasibility Study on Adapting a Machine Learning Based Multi-Sensor Data Fusion Approach for Honeycomb Detection in Concrete,” 2016 NDE/NDT for Highways & Bridges, pp. 144–148.
