Wednesday, March 23, 2011

Redeploying Itanium Chip Designers -- Part One

The Itanium is Doomed

The Itanium is doomed; of that I have no doubt. That means that soon an entire ecosystem of architects and engineers will be looking for something to do. I suggest that we move them to a 'skunk-works' where they can recharge their batteries and reflect on lessons learned from their time working with the Itanium. [As an aside, I was informally asked if I wanted to work on the Itanium in the very early days of the project. I demurred, but mostly because it could only be as an employee and I worked for myself. But for the Grace of God ...]

Problems Itanium Had to Solve

The Itanium was intended to replace the x86 by addressing a variety of problems with the IA32 x86 architecture. It failed entirely in that mission.

In no particular order, here is an incomplete list of the x86 deficiencies as they occur to me:

It was 32 bits. Without getting into what 'bitness' might mean for a processor, it was not big enough.

Addressing more than 2^32 bytes was difficult or impossible (depending on circumstance). No 'flat address' space larger than 2^32 bytes wide could be addressed by one pointer.

It had a crufty old instruction set with a variety of 'hacks' to keep it backward compatible over a number of generations, reaching back to the 8008 (released shortly after the 4-bit 4004 and worked on by some of the same people). The nature of the instruction set made the chip awkward to program.

It did not have enough registers.

It had (at least for me) an odd addressing scheme and odd protection modes.

It was not well suited to multi-processor configurations.

It had bandwidth problems. Feeding the CPU with instructions from memory was about to become a bottleneck.

It made some compiler optimizations difficult or impossible.

It was not fast enough.

Problems Solved by the competition (x86-64)

In the years since IA32 was the latest generation of the x86 architecture, some of these problems have been addressed in chips implementing the x86-64 architecture (such as Opterons, Xeons, etc.).

The chip is now 64 bits rather than 32 -- that is better.

It is now possible to address 'flat' address spaces much larger than 2^32 bytes.

There are more registers. That's also better.

I am not entirely sure where we stand with the addressing scheme, but for practical purposes, you can use a 64-bit flat space for most things that concern me. Messing around with segment registers is long behind me now.

The protection schemes, in my opinion, are still a mess.

The chips are all now much better suited to multi-processor configurations, and they all support multiple cores. Both AMD and Intel support 4-socket designs; AMD offers a 4-socket design with 12 cores per socket, for a total of 48 cores. As of this writing, even a consumer-grade machine can have 6 real cores, or 4 cores that appear as 8 with hyperthreading.

Bandwidth problems will always be with us, but the bandwidth situation has been very greatly improved through a variety of techniques.

Certainly EPIC-style optimizations are still not possible. However, as in past CPU upgrades, modern x86-64 chips have a variety of built-in extended instructions that can be used to further optimize code.

Chips will never be fast enough, but the current generation is significantly faster than the chips of 1989, when HP first contemplated building the EPIC-based system it began developing with Intel in 1994. In 1989, the performance problem they were solving was an upper clock limit of around 25MHz with a nominal limit of one instruction per clock cycle. Even by 1994, when they began in earnest, clock rates were hovering around 100MHz. Today, as this is being written, consumer systems deliver 3.46GHz across 6 cores, effectively more than 20,000MHz in aggregate. That is more than 200 times the power of the chips the Itanium was supposed to best and (nominally) 25 times the power of the first Itanium, released a decade ago. None of the top ten supercomputer sites use Itaniums; most use its x86-64 rivals.

Business case is an EPIC Fail

In fairness to the Itanium, I think there may be lots of ways that a given off-the-shelf Itanium box can best an off-the-shelf x86-64 box. There *is* some merit in the EPIC notion, and the chip accomplishes at least some of its goals for scaling well with certain types of workloads. However, for any workload I am likely to have, it is no contest. When making a business case for a purchase like this, you are not comparing 'box to box'; you are comparing 'cost per usage' to 'cost per usage'. On that basis, there is no ordinary scenario where the x86-64 does not flatten the Itanium. If you factor in the cost of porting your software to the Itanium before you can run it, it is hopeless. That last may seem extreme, but the Itanium has a very small ecosystem, and that ecosystem has to somehow absorb the cost of the port.

The Good News

The Itanium, as I say, is doomed. That's the bad news. The good news is that the people behind those systems are going to come free one way or another. That means we will have a team of experienced, battle-hardened veterans with (at least many of) the skills needed to solve the remaining problems with the x86 architecture. Just because they got it wrong with the Itanium does not mean that all of their ideas were bad or that their skills are lacking. Likewise, the fact that x86-64 improved on x86-32 more successfully than the Itanium did does not mean it has actually done a good job, or that no problems remain.

In another posting, I will make some notes on the x86 landscape as I see it and how I think it can be improved.

--- Copyright (C) Bob Trower. You may use this freely as you wish, but please attribute the work if you can ---
