Semiconductor Engineering sat down to discuss the impact of advanced node chips and advanced packaging on automotive reliability with Jay Rathert, senior director of strategic collaborations at KLA; Dennis Ciplickas, vice president of advanced solutions at PDF Solutions; Uzi Baruch, vice president and general manager of the automotive business unit at OptimalPlus; Gal Carmel, general manager of proteanTecs' Automotive Division; Andre van de Geijn, business development manager at yieldHUB; and Jeff Phillips, go-to-market lead for transportation at National Instruments. What follows are excerpts of that conversation. To view part one of this discussion, click here. Part two is here.
SE: At 5nm, which is where some of the automotive AI chips are being developed, we’ve got process variation, electromigration, electromagnetic interference, power delivery issues and inspection challenges, among other things. And we’ve never put an advanced-node chip into an extreme environment in the past. Do we really understand what’s ahead and how to deal with it?
Phillips: We know there's going to be a lot of change in the requirements, the use cases, the expectations, and the standards around autonomous driving, including how vehicles react and the types of decisions they can and cannot make when a human life is on the line. Ultimately, it comes down to figuring out how to consolidate and tie those things together. That will be necessary even to adapt the production and verification of the chip's behavior. On top of that, we need to put in place the appropriate behavior and autonomy, with algorithms on board so the car can make the right decisions. Data is the key to that.
SE: We also have software coming into this picture. If you update one part of a complex system, you’re potentially affecting everything in that system. And if you add lots of software, performance degrades, and that can impact every car on the road.
Ciplickas: Software is challenging because it doesn’t follow any rules of physics. Hardware sounds hard, but it actually follows some boundary conditions. With software, you can change one thing that can have massive unintended consequences.
Carmel: Using deep data, we virtualize the hardware to better sense the impact of software operations. With these virtualizations, you can shift to an adaptive software model that is tailored to the vehicle's ECU performance and in-field degradation. As the software's demands grow, the AI application increases the share of the chip devoted to AI to meet them. This feedback will help to reduce redundancies and ensure that functions are optimized. In addition, in-field inferencing and training will continuously improve how hardware and software interact with one another.
Ciplickas: We’ve talked about 5nm chips as a whole new world where we’ve never been before, and the challenges of taking all that data and assimilating and connecting all of it. The key in the advanced technologies is actually to understand what data you’re missing. For example, the Middle-of-Line (MOL) in 5nm has three-dimensional electrical interactions that you simply can’t see with a physical inspection. This is a major reason we’ve been pursuing inline ‘Design-for-Inspection’ — to get a sensitive measure of leakage, which in turn indicates latent defects that risk turning into real defects. To properly respond, you have to know that the defects are there in the first place, which means you have to create new data. Simply taking the data that’s presented as an artifact of the manufacturing process is not sufficient. Differentiated data is required.
SE: What has to change in both inspection and metrology to be able to identify these problems? And what has to change from a test perspective to understand what’s going on here?
Rathert: The biggest challenge is not seeing the defects per se, but understanding which ones are going to be relevant — which ones might become an activated latent defect. What I would love to see, and it doesn’t exist today, is some connection to the designer’s mind that says, ‘These are my critical areas for reliability,’ and some connection to the test engineer’s mind that says, ‘These portions are difficult to test.’ Then I would improve my value proposition by being able to focus inspection there and report data that is isolated to those regions, and feed that back to harden designs and improve test vectors. There’s a whole unharvested opportunity in that.
Phillips: There's value in connecting those two. You talk about a vector we need for the design team and a vector we need for the test team. The more we can bring those two together and have an iterative, or collaborative, aligned data set, the better the understanding of what inputs and outputs need to happen on the chip. That's one of the keys to accelerating through this process. We need to bridge design to test and eliminate the proverbial wall that exists in product development lifecycles.
SE: So basically the feedback loop has to go much further left and much further right?
Carmel: It needs to go much further right before it can go further left. We need to go through the tool chain and use that data to circle back and improve the chips.
SE: One of the other challenges that we have looking forward, in addition to safety and design, is security. That can affect safety and the value of this whole system. How do we build security into these systems?
Ciplickas: There is definitely a linkage between reliability and safety and security. There are a lot of angles to security, but one thing I'm finding is that some of the techniques and measurements that you would use to optimize reliability can give you tools to increase security. Debug monitors or drift and shift monitors, for example, could detect certain kinds of attacks, whether they're detected at t = 0 or detected as abnormal behavior or drift in the field. But those same monitors are already being used for system operation and optimization. There's correlated infrastructure between the two, although they are applied in very different ways.
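The drift and shift monitoring Ciplickas describes can be approximated with standard change-detection statistics. Below is a minimal sketch using a one-sided CUSUM check over a stream of in-field parameter readings; the function name, thresholds, and data are invented for illustration and do not represent any vendor's actual monitors:

```python
def cusum_drift(readings, target, slack=0.01, limit=0.05):
    """One-sided CUSUM: return the index where cumulative upward
    drift away from the expected value exceeds the decision limit."""
    s = 0.0
    for i, r in enumerate(readings):
        # Accumulate only excursions beyond the allowed slack.
        s = max(0.0, s + (r - target - slack))
        if s > limit:
            return i
    return None  # no drift detected

# Illustrative readings: 50 nominal samples, then a slow upward drift
# of the kind a field monitor might see as a part ages or misbehaves.
readings = [1.0] * 50 + [1.0 + 0.005 * k for k in range(1, 51)]
print(cusum_drift(readings, target=1.0))
```

The same accumulated-deviation signal serves both purposes mentioned above: crossing the limit slowly suggests aging or degradation, while crossing it abruptly after t = 0 could indicate tampering or abnormal operation.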
Carmel: We need to look at it as an opportunity for using data, since the more valuable the data you produce, the better the chip’s signature becomes. Eventually, that data helps you to understand if something is abnormal. This can be even more pressing in shutdown-sensitive vehicles. Using deep data, you create 24/7 fleet visibility and identify problems as soon as they occur.
SE: Given the amount of data that’s moving through those systems, are you actually going to be able to pick up a very slight anomaly, or is it just going to be noise in the midst of all the other noise?
Carmel: What we provide is deep data, based on Universal Chip Telemetry measurements. We are delivering insight into the actual chip and system operation, performance, reliability margins and performance degradation. This real-world data does not rely on shifting touch points, but on in-field operational outputs.
Ciplickas: To your point about the signal and the noise, I’m optimistic the industry will be able to develop techniques to find that signal. If you look at the sensor data that comes off a tool while it’s processing a wafer or wire bond, the variety of good signals you can get is huge. And the anomalies you find in those signals are sometimes tiny little blips. We’ve developed machine learning techniques to find those tiny little blips in the sea of otherwise ‘good’ noise. Instead of thinking about it as a tool making a wafer, if you think about it as a system operating in the field, understanding those tiny blips is within the realm of possibility. But it’s going to take a lot of work.
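The "tiny blips in a sea of good noise" Ciplickas mentions can be illustrated even without machine learning. Here is a minimal sketch that flags anomalies in a synthetic sensor trace using a rolling z-score; the trace, window size, and threshold are all assumptions for illustration, not PDF Solutions' actual technique:

```python
import random
import statistics

def rolling_zscore_anomalies(signal, window=50, threshold=4.0):
    """Flag samples that deviate sharply from the recent baseline."""
    anomalies = []
    for i in range(window, len(signal)):
        baseline = signal[i - window:i]
        mean = statistics.fmean(baseline)
        stdev = statistics.pstdev(baseline) or 1e-9
        z = abs(signal[i] - mean) / stdev
        if z > threshold:
            anomalies.append(i)
    return anomalies

# Synthetic sensor trace: steady process noise plus one tiny injected blip.
random.seed(0)
trace = [random.gauss(1.0, 0.01) for _ in range(500)]
trace[300] += 0.1  # a small excursion buried in otherwise 'good' noise

print(rolling_zscore_anomalies(trace))  # the injected blip at index 300 is flagged
```

In practice the machine-learning versions of this idea learn the baseline and threshold per signal and per tool state rather than using fixed constants, which is what makes them workable across the huge variety of sensor signals a fab tool produces.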
SE: Going back through the manufacturing cycle, are you finding any glitches in your data where you’re going to say, ‘Okay, this is a potential security risk that we didn’t understand before?’
Ciplickas: Understanding downstream signals using the upstream data is a very powerful technique.
Baruch: People often treat predictive models as if the prediction itself were the whole point. But they miss the fact that the feature set — the thing that is actually contributing to your ability to predict something — is the most important part for filtering noise, seeing what's important and what's not, and finding the root cause of any issues. We often use a shift-left model, but that has to be done in an educated way. You don't want to go back to look for a needle in a haystack. Good models can help you find what's important and what's not, as long as you choose the right angle to look at them from. When you build those models, you want to predict something. But you also need people who can go back and fix the attributes in those models when they're wrong.
Ciplickas: Excellent point.
SE: A big gap seems to exist between ADAS versus autonomous vehicles. Moving into full autonomy, you have to start thinking of systems of systems working together. What happens when you have cars and devices on the road that use different generations of chips and different generations of software because they were produced 10 years earlier?
Carmel: The fundamental step in moving from ADAS to AV is to understand what kinds of failures are experienced in the field. Eventually, it's a matter of defining the performance envelope. Every car has its own performance envelope, because it has different hardware, different software, different layers. When you know exactly how to define this performance envelope and create the balance between safety, reliability and security, then you have control over the fleet. Using deep data, we can define each model's and each unit's standalone capabilities and outline an autonomy hierarchy.
SE: Will we start seeing AVs on the road in anything other than in geofenced areas, such as a single lane on a highway set up for autonomous vehicles, where you have to take over when you get off the highway?
Carmel: The key to allowing vehicles to incrementally go out of a geofenced area is coverage and scalability. When operating outside of a geofenced area, reliability and predictability will ensure that fail safe protocols can be followed, and that requires absolute certainty about the ECU’s operational capabilities and safety profiles. This can only be achieved with continuous monitoring and non-intrusive in-field system integrity verification.
Ciplickas: It sounds like a very natural evolution. You start with an area in which you can perform well, and then expand based on that learning. I like that you said the geofenced area will grow. Those areas would give us a ton of learning, which would then enable the next levels of autonomy.
van de Geijn: It’s not only the cost. It takes time to improve the products and components and to learn from them. Autonomous driving is not something you just switch on one day and it exists. It will improve over the next 10 years until you really have something you feel comfortable with, and which can do 80% or 90% of the things a human can do.
SE: We seem to be a long way from removing the steering wheels in cars.
Baruch: If you look at the regulations associated with that, on one side you have China, which is quite loose on what carmakers can do and what it controls from a regulation standpoint. On the other side, European countries are quite far from approving it. But this also overlaps with a second trend, which is electrification for emissions control, and there is only so much carmakers can do in parallel when introducing a new vehicle to the market that needs to be both fully autonomous and fully electrified. Given all the fines and regulations driving electrification, we're seeing a much bigger movement in that direction versus the need to make fully autonomous cars quickly.
SE: Advanced packaging in cars is new, as well. We’ve had multi-chip modules for decades, but not like the kinds of packages we’re seeing with sensor fusion or some 7/5nm chips. What impact does that have on reliability? Is it just another layer of complexity and data that we have to deal with? And do we have to make sure that all the chips are not just within the margin of acceptability in terms of known good die?
van de Geijn: It depends on which part of the car they're going to be used in. If they are for the entertainment system and those kinds of things, and you can use the same components that go into millions of mobile phones, you can trust those parts. If you have high failures in the mobile phones, you will not use them anymore. Many companies say, 'That goes into the entertainment system, and it's a component that I can replace by taking out the module and putting in a new module.' That's completely different from those packages being used, for example, in your motor management system. Companies that make the buttons to move a seat back and forth may develop completely new technologies to replace those buttons when they don't work anymore. But if it's a motor management unit, that's a completely different story. It's also where you put them and how you use those parts.
Carmel: Advanced packaging adds another layer of complexity because it lacks visibility and relies on a high-density architecture that limits redundancy fallbacks. In addition, the AI portion of the chips is growing. It's not only about packaging and advanced nodes, but the fact that the chip architecture is AI-driven and uses in-field inferencing and training to continuously improve the hardware architecture. Using that feedback loop, you can reduce hardware redundancies and optimize complexities.
Baruch: In addition to that, the packaging does add complexity to the notion of hierarchy and assembling components. If you have one on top of the other, you need to cross-correlate in three dimensions. That by itself introduces the semantic notion of the data. It has multiple vectors, and one of them is also a hierarchy element. It does add complexity, because when you look at a component you’re not seeing it as a single unit by itself. You’re also looking at the hierarchy of the components that are part of it. If you don’t do that, you’re very much limited in what you get out of that analysis. However, if you do this right, it can be super valuable for pinpointing where a problem is.
Ciplickas: That comes back to the E142 spec, representing that hierarchy and knowing all the relationships of all the parts that have been placed into this three-dimensional stacked package. The system-in-package, or 3D integration, is going to bring new fail modes due to the interaction between the components. The chip-to-chip communication is different than chip-to-board communication, and the electrothermal/mechanical interactions are different. One carmaker showed that under stress, the SRAM failed in ways that were very predictable. They actually measured those on a bench. That led to design rules in the PCB itself for how to build mount points in an ECU enclosure. That’s a macro version of the challenges that are going to happen in the 3D or 2.5D integration in these packages, which will be put into harsh environments. So it’s not just the chip-to-chip communication. Now, imagine these things have different thermal profiles than you expect. That’s going to change the expansion and the stress on these things, which is then going to change the performance, because we know the stress changes the device behavior. Knowing the behavior of the individual chips at wafer sort test, and then knowing what was put together in a package — and having the package-level evaluation, and putting all that together — is a huge challenge. It’s a whole new frontier to use advanced 2.5D integration in a car, and especially in a safety-critical system.
[Uzi Baruch has since left OptimalPlus and joined proteanTecs as chief strategy officer.]
Chips Good Enough To Bet Your Life On (Part 1 of roundtable)
Experts at the Table: Strategies for improving automotive semiconductors.
Predicting And Avoiding Failures In Automotive Chips (Part 2 of above roundtable)
Experts at the Table: New approaches rely on more and better data, but it also requires sharing of data across the supply chain.