Getting Design Wins for AI Accelerators
by Andrew Richards, Founder and CEO of Codeplay Software
 

Committing to an AI chip is a big deal. When a social network, online shop or search engine company decides to commit to a particular AI chip, billions of dollars are at stake. When a car company decides to buy an AI chip, that chip will end up in millions of cars on the road, with lives at stake. These companies operate at enormous scale. Huge numbers of software developers will work on the AI software, across different departments, companies, suppliers and partners, all using continuous development to deliver updates and the next generation of applications. They all need to work together.

To make that work at scale, you need solid, well-defined standards so different teams can work together, documentation so you can maintain the code in the long term, specifications so you can define quality, and training so you can hire engineers for the project. These companies can’t risk unleashing chaos on their software development. It is not credible to simply say “this compiler converts all AI into optimized code for our processor and we can’t tell you how”. Can they take that risk? This blog is not about technology; it’s about our experience of getting chip companies through the design process, getting design wins for accelerator processors and bringing the latest AI accelerators to market.

This blog will discuss the process used by large customers when deciding to adopt a new AI processor:

  • How a customer will benchmark a new processor
  • How they will calculate the total cost of adopting an AI accelerator
  • How a customer understands the risks in switching from their current AI hardware platform

How do we find the market for these AI processors?

There is a huge market for accelerators, but it’s currently massively dominated by Graphics Processing Units (GPUs), largely from one vendor, and most new processors are targeting that market. So the first thing we need to do is understand the market and the customers, and then define a product that customers are going to buy.

But where is it?

Many people in this field will have heard of ResNet-50. It’s an image classification network from 2015: you give it an image of, say, a dog or a cat, and it classifies the image as one or the other, quite accurately. But how big is the market for systems that run it? In 2013, Facebook users uploaded 350 million photos a day, and I’m sure this has continued to grow since then. If we then look at the latest MLPerf benchmark results, we can see that a single Intel Xeon CPU can do over 345 million ResNet-50 classifications per day. So realistically, even with some growth, the market for accelerators running ResNet-50 in hyperscale data centres is fewer than ten processors. My conclusion is that we must ignore ResNet-50 if we’re going to find a market for these new processors, since it can already be executed with acceptable performance on existing CPUs.
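The arithmetic behind this argument can be checked in a few lines. This is a minimal sketch that uses only the two figures quoted above; the growth multiplier is a hypothetical assumption, not a reported number.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def processors_needed(images_per_day: float, per_processor_per_day: float) -> int:
    """Whole processors needed to classify a given daily image volume."""
    # Round up: a volume just above one processor's capacity needs a second one.
    return math.ceil(images_per_day / per_processor_per_day)


# Figures quoted in the text above.
uploads_per_day_2013 = 350e6     # Facebook photo uploads per day in 2013
xeon_resnet50_per_day = 345e6    # ResNet-50 classifications per day, single Xeon CPU

# Hypothetical growth multiplier since 2013: an assumption, not a reported figure.
growth = 5

print(f"One CPU: ~{xeon_resnet50_per_day / SECONDS_PER_DAY:,.0f} images/s")   # ~3,993
print(processors_needed(uploads_per_day_2013, xeon_resnet50_per_day))           # 2
print(processors_needed(uploads_per_day_2013 * growth, xeon_resnet50_per_day))  # 6
```

Even with an assumed fivefold increase in uploads, the count stays in single digits, which is consistent with the estimate above.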

We talk a lot in business about vertical versus horizontal integration

In the vertical integration approach to designing an accelerator, we take an AI model like ResNet-50 and analyze the algorithm for its performance requirements. Then we design and combine a software and hardware solution that runs that AI model very fast. This is called vertical integration because it runs from the software all the way down to the hardware: a very narrow implementation that only works with one thing, in our example ResNet-50. The advantage of vertical integration is very high efficiency, because we’ve designed the hardware and software specifically for our use case. The disadvantages are a long time to market, and the assumption that ResNet-50, or whichever algorithm you optimized for, is still relevant by the time the product ships.

In the horizontal approach, we instead look at a lot of different models and algorithms. We define an interface specification and we provide AI devkits. Today those AI devkits are GPUs and the interface specification is generally CUDA. Over time, new AI models are developed from the old ones: research produces improvements in the field, new AI models emerge, and their performance analysis is fed back into the design of new accelerators. But you need to be able to run that CUDA software to do this. The advantage of horizontal integration is very high performance on the latest AI algorithms. The disadvantage is that it requires a more complex organization.

So we need to be able to define an alternative to CUDA, and we also need to define the organizational structure that goes with it.

 

[Figure: vertical chart 1]

What fits this market?

With the video game market, we find that vertical versus horizontal integration isn’t an excuse. It’s worth about $3.7bn a year to Nvidia and growing rapidly. That’s a pretty good market, and it’s horizontal: the GPU runs lots of different algorithms, and the CUDA programming model comes with a large ecosystem of CUDA software, including libraries and frameworks. This is a market-leading horizontal integration.

For vertical solutions, companies like Mobileye make driver assistance systems that go into a lot of cars, providing the full software and hardware solution; this has a market value of around $800m per year. Google, with its TPU, delivers data centre software and data centre hardware as a full top-to-bottom integrated stack, but it’s difficult to find an estimate of the value of its TPU work. Tesla is in many ways the ultimate vertical integration, building the whole car along with the AI software and SoC. Tesla values the AI SoC and software at hundreds of millions of dollars a year. So these bring in less annual revenue than Nvidia’s horizontal market, but they are still pretty significant.

We also see a lot of very well-funded companies doing a mixture of horizontal and vertical solutions. They don’t have any applications, and they don’t have an open software platform, so they are only part vertical and part horizontal. If you estimate the size of this market from the revenues of some of these companies, you can see that it is measured in tens of millions of dollars today. It is also divided amongst hundreds of competitors sharing this much smaller marketplace. It’s an unhappy intermediate position: not a horizontal integration, and not quite vertical either.

 

[Figure: vertical chart 2]

 

So vertical integration provides high efficiency, but it takes a long time from algorithm to market. It is also limited either to niche markets that are possible to hit, or to very small markets where you are unlikely to succeed as a business.

Horizontal integration can achieve high performance on the latest algorithms, in a large, fast-growing market, but the organization it needs is more complicated.

What do we have to do to make these models work?

For vertical integration, we need to find an AI algorithm with a large market where fast time to market is not required. Automotive is a good example: it takes quite a long time for an algorithm to go from design to being in a car. You need to co-design the accelerator and the software to achieve maximum performance per watt of power and per dollar of cost. Then you release a combined product as a whole integrated system, not just the chip.

If we want to make horizontal integration work, we have to focus on building an organization that can support a broad horizontal platform, and on creating partnerships. Together, these form the basis of the horizontal approach. Build an organization capable of creating partnerships, and you are better set up to deliver performance to a wide range of customers.

Let’s turn it around and look at it from a customer’s perspective: how do they see the market?

From a customer’s point of view, there are big potential benefits in adopting your processor: some performance or cost improvement. But they’re also going to see costs in moving to your processor. They’re going to have to invest in porting their software and adapting it for your platform, so they’re going to want to understand what that investment will cost and what the benefits are. They are also going to think about the long term, including maintenance and updates, especially if the chip is going into a car where safety is at stake. A customer is going to want to understand this for their whole application, not just for one neural network like the ResNet-50 example we used earlier.

What does this look like as a sales process?

You’ve got your demonstration, and you have your chip with your fast neural networks running on it. Your customer has an application, which is unlikely to use only the neural networks you have optimized for, and which probably contains some of their secret IP. They generally want to create representative workloads that reflect what they will actually run in their data centre or car.

Facebook, for example, has DLRM, a representative workload for the recommender networks used in its data centres. It is not the production recommender network itself, but a benchmark that lets them understand how well a processor will run their production recommender networks. They will want to work with you to port workloads like this onto your chip. They’re looking for two things:

1) The performance of their benchmark workloads

2) How much effort it takes to get their workloads onto your chip

Most likely they’re considering a larger project: they will evaluate a few representative workloads and use them to estimate the effort of porting the whole system. This is important for the overall cost-benefit analysis of buying your chip.
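The first of the customer’s two questions, the performance of their benchmark workloads, usually comes down to a timing harness along these lines. This is a hedged sketch: `representative_workload` is a hypothetical stand-in (a small pure-Python matrix multiply) for something like a DLRM run, and the warm-up and iteration counts are arbitrary assumptions.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(workload: Callable[[], None], items_per_call: int,
              warmup: int = 3, iterations: int = 20) -> Dict[str, float]:
    """Time repeated calls of `workload` and summarise latency and throughput."""
    # Warm-up runs let caches, JIT compilers or device initialisation settle.
    for _ in range(warmup):
        workload()
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter()
        workload()
        latencies.append(time.perf_counter() - start)
    median = statistics.median(latencies)
    return {
        "median_latency_s": median,
        # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
        "p90_latency_s": statistics.quantiles(latencies, n=10)[-1],
        # Guard against a zero median on an extremely coarse timer.
        "throughput_items_per_s": items_per_call / max(median, 1e-12),
    }


def representative_workload(n: int = 32) -> None:
    """Hypothetical stand-in for a customer benchmark: an n x n matrix multiply."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    _ = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    result = benchmark(representative_workload, items_per_call=1)
    for name, value in result.items():
        print(f"{name}: {value:.6g}")
```

Reporting a tail percentile next to the median is a design choice: a single average can hide the occasional slow run that matters in a data centre or a car.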

You also need to provide a roadmap, because they’re investing in the future of your chip and porting their software to it. If it takes a year or two to port their software across, they’re not going to buy your first-generation chip except as a devkit. It’s the second-generation chip that they’re going to be interested in.

That’s why it’s so important to have a clear roadmap that shows what the future looks like. It allows customers to understand the total cost of adoption, including future maintenance.

If you get through all of this, you get a design win.
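The total-cost-of-adoption reasoning in this section can be made concrete with a deliberately simple cost model. This is a sketch under stated assumptions: the `AdoptionEstimate` fields and all the numbers are hypothetical, benefits and maintenance only start once porting is finished, and there is no discounting. It shows why the roadmap matters: the same chip with the same porting cost only pays off if the customer can see enough productive years beyond the port.

```python
from dataclasses import dataclass


@dataclass
class AdoptionEstimate:
    """Hypothetical inputs to a customer's adoption decision, in one currency unit."""
    annual_benefit: float       # yearly saving once the ported software is in production
    porting_cost: float         # one-off engineering cost of porting and adapting workloads
    annual_maintenance: float   # yearly cost of keeping the port up to date
    porting_years: int          # years spent porting before the benefit starts
    horizon_years: int          # planning horizon, e.g. how far the vendor roadmap reaches

    def productive_years(self) -> int:
        # In this simple model, benefit and maintenance only accrue after porting ends.
        return max(0, self.horizon_years - self.porting_years)

    def net_benefit(self) -> float:
        yearly_net = self.annual_benefit - self.annual_maintenance
        return self.productive_years() * yearly_net - self.porting_cost


# Same chip economics, two different roadmaps (all numbers are made up).
one_generation = AdoptionEstimate(10.0, 8.0, 2.0, porting_years=2, horizon_years=3)
with_roadmap = AdoptionEstimate(10.0, 8.0, 2.0, porting_years=2, horizon_years=5)
print(one_generation.net_benefit())  # 1 * (10 - 2) - 8 = 0.0
print(with_roadmap.net_benefit())    # 3 * (10 - 2) - 8 = 16.0
```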

How do I compete in a horizontal AI market?

The good news is that there are lots of opportunities. GPUs are designed for a wide range of high-value AI workloads, but they are also balanced for graphics processing. They have to strike a midpoint between graphics and compute to appeal to both sets of developers and customers. To compete with GPUs, you can choose workloads that GPUs are not well tuned for:

  • Big-data workloads, like recommender networks, where compute is less important. Many of these networks need bandwidth more than teraflops, so you can tune your processor for bandwidth over compute, or give it access to very, very large amounts of memory
  • Sparse operations, e.g. language networks. GPUs are good at dense operations and OK at sparse operations, but it’s possible to do better for language networks and recommender networks
  • Lower-precision but still high-value networks, e.g. semantic segmentation, where lower precision can give efficiency savings

Then you can design hardware that’s tuned for that workload, or that range of workloads. Interestingly, even when the workload varies, the software building blocks look very similar. The building blocks don’t tell you where the performance goes, whether a workload is compute-bound or bandwidth-limited; they look pretty much the same either way, so you can invest in one set of building blocks for all of these kinds of workloads.
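One standard way to tell whether a building block is compute-bound or bandwidth-limited on a particular chip is the roofline model: attainable throughput is the lower of the chip’s peak compute and its memory bandwidth multiplied by the operation’s arithmetic intensity (FLOPs per byte moved). The sketch below uses that model as an illustration; the hardware figures (100 TFLOP/s, 1 TB/s) are hypothetical, and the byte counts assume each operand crosses the memory interface exactly once.

```python
from typing import Tuple

# Hypothetical accelerator, chosen only for illustration.
PEAK_FLOPS = 100e12        # 100 TFLOP/s
BANDWIDTH = 1e12           # 1 TB/s of memory bandwidth
BYTES_PER_ELEMENT = 4      # fp32


def roofline(flops: float, bytes_moved: float) -> Tuple[float, float, str]:
    """Return (arithmetic intensity, attainable FLOP/s, limiting factor)."""
    intensity = flops / bytes_moved
    attainable = min(PEAK_FLOPS, BANDWIDTH * intensity)
    # The "ridge point" PEAK_FLOPS / BANDWIDTH is where the two limits meet.
    limit = "compute-bound" if intensity >= PEAK_FLOPS / BANDWIDTH else "bandwidth-limited"
    return intensity, attainable, limit


def dense_matmul(n: int) -> Tuple[float, float]:
    """n x n by n x n matrix multiply: 2n^3 FLOPs, read A and B, write C."""
    return 2.0 * n ** 3, 3.0 * n * n * BYTES_PER_ELEMENT


def embedding_sum(rows: int, dim: int) -> Tuple[float, float]:
    """Sum `rows` embedding vectors of width `dim` (about one add per element read)."""
    return float(rows * dim), float(rows * dim * BYTES_PER_ELEMENT)


for name, (flops, nbytes) in {
    "dense matmul, n=4096": dense_matmul(4096),
    "embedding sum, 1000 x 128": embedding_sum(1000, 128),
}.items():
    intensity, attainable, limit = roofline(flops, nbytes)
    print(f"{name}: {intensity:.2f} FLOP/byte, {attainable / 1e12:.2f} TFLOP/s, {limit}")
```

Both building blocks are ordinary library calls, yet on the same hypothetical chip one hits the compute ceiling and the other is capped at a fraction of a TFLOP/s by bandwidth, which is the kind of gap a recommender-focused design can target.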

What does a horizontal AI hardware organization look like?

You’re not just selling a chip; you’re selling an organization that your customers can invest in. They’re not investing in your equity like a VC would; they’re investing in your platform with their software development effort. You’ve got to come across as investable from an organization point of view.

The first thing is to be clear about your IP barrier: what is your secret sauce, and what is out in the open? Fundamentally, this is your product definition. Your product definition for your customers is NOT “runs ResNet-50 really fast”. You are defining a platform that your customer is going to build their software on top of. The software capability of that platform is absolutely crucial to the product definition, and customers need clarity on it. Your own engineers need clarity too, and they need to know who owns what, because they’re going to be working on both sides of this IP barrier.

That’s one of the big challenges right now. You’re going to have teams working on secret-sauce hardware and secret-sauce software, and you’re going to have to work out how to expose that to software developers in some way. The specifications must be based on things like industry standards (to be discussed later). However, a lot of the other software is out in the open; TensorFlow is a good example of this. Interestingly, AI graph compilers generally need to be out in the open and modifiable by the customer, because when you add new classes of AI, you need to modify the graph compiler. You cannot do everything yourself: the graph compiler will be limited to certain kinds of operations, so developers need to be able to add more operations themselves. As all of this ecosystem work progresses, you’ve got to make clear to your engineers and your customers which development is going on outside the IP barrier and which code is within it.

Get this right, or it will be very difficult to do business with the software companies buying your chips.
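The point about open graph compilers can be illustrated with a toy extension point: a compiler that only lowers operations it has kernels for, plus a registration hook that lets a customer add a missing one. `ToyGraphCompiler`, `register_op` and the scalar “tensors” are hypothetical simplifications for this sketch, not the API of TensorFlow or of any real graph compiler.

```python
import math
from typing import Callable, Dict, List, Tuple

# A node is (op name, input names, output name); graphs are in topological order.
Node = Tuple[str, List[str], str]
Kernel = Callable[..., float]


class ToyGraphCompiler:
    """Hypothetical graph compiler that maps each op to a registered kernel."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Kernel] = {}

    def register_op(self, name: str, kernel: Kernel) -> None:
        self._kernels[name] = kernel

    def compile(self, graph: List[Node]) -> Callable[[Dict[str, float]], Dict[str, float]]:
        # Refuse to compile graphs that use operations with no kernel.
        missing = sorted({op for op, _, _ in graph if op not in self._kernels})
        if missing:
            raise NotImplementedError(f"no kernel for: {', '.join(missing)}")
        plan = [(self._kernels[op], ins, out) for op, ins, out in graph]

        def run(inputs: Dict[str, float]) -> Dict[str, float]:
            values = dict(inputs)
            for kernel, ins, out in plan:
                values[out] = kernel(*(values[i] for i in ins))
            return values

        return run


# Operations the vendor ships (behind the IP barrier, in a real product).
compiler = ToyGraphCompiler()
compiler.register_op("mul", lambda a, b: a * b)
compiler.register_op("add", lambda a, b: a + b)

# A customer's new model needs GELU, which the vendor did not provide.
graph: List[Node] = [("mul", ["x", "w"], "t"), ("gelu", ["t"], "y")]
try:
    compiler.compile(graph)
except NotImplementedError as err:
    print(err)  # no kernel for: gelu

# Because the compiler is open, the customer can add the operation themselves.
compiler.register_op("gelu", lambda v: 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))))
run = compiler.compile(graph)
print(run({"x": 2.0, "w": 0.0})["y"])  # gelu(0.0) = 0.0
```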

Conclusion

There is a genuine opportunity for chip vendors to take advantage of a fast-growing market, and developers are demanding solutions that can process their increasingly complex AI software with better performance. These same developers are looking to avoid being locked into a proprietary environment and are embracing open standards.

It’s crucial that chip vendors offer an environment that gives developers both performance and the flexibility to optimize their software, without having to build a vertical solution. Horizontal integration is essential to getting your next design win.

Want more tech news? Subscribe to ComputingEdge Newsletter today!