Just yesterday, AMD officially unveiled its flagship second-generation Threadripper processor, the Threadripper 2990WX, and AMD's official website also revealed some of its performance figures. With 32 cores and 64 threads, the Threadripper 2990WX easily took the title of king of consumer CPUs, effortlessly dethroning the previous holder, the Core i9-7980XE. You could call it a moment of unrivaled glory.
But as the saying goes, the brighter the glory today, the harder the fall that came before it, and AMD is no exception. Just two years ago, AMD was still a rival Intel could afford to ignore. Its products lagged far behind Intel's in performance, its stock price fell to just over $2 at its lowest point, and the company was on the verge of bankruptcy. One of the main reasons AMD sank so low was the Bulldozer processor microarchitecture it launched back then.
Just two days ago, the foreign tech outlet ExtremeTech compiled a list of the 10 worst CPUs of all time, and Bulldozer made the top three, which shows how widely its failure is recognized. Today, as the Threadripper 2990WX basks in the spotlight, this PConline editor will take you back through AMD's failed architecture and follow its story as a loser from birth to end.
1 The birth of Bulldozer: a mission of revenge
AMD is an adventurous company. Although its revenue is only about one-tenth of Intel's, it has dared to innovate with technologies such as the HyperTransport (HT) bus, DDR memory, and multi-core processors. At the start of this century, its K8-architecture processors were clearly ahead of the Pentium 4 in IPC and efficiency, and for a while Intel was "taught a lesson" by AMD. Still, Intel remained the big brother of the CPU industry. Its enormous financial resources, deep talent pool, huge industry influence, and close relationships with core OEM partners gave it the means to turn the tables.
Finally, in 2006, the Core 2 processors became a blockbuster: the first Core 2 Extreme X6800 and Core 2 Duo E6300 delivered unmatched performance. They not only outperformed the previous-generation products by 40% but were also remarkably good in power consumption and heat output, completing Intel's spectacular comeback. If the first Core models sounded the bugle for Intel's counterattack, the Core i processors based on Nehalem laid the foundation for Intel to dominate the CPU market for the next 13 years.
[Figure: Powerful Intel]
In November 2008, Intel released the Core i7-965 Extreme Edition and Core i7-920, native quad-core processors that integrated the memory controller on-die, used the QPI bus architecture to break the bandwidth bottleneck, supported Hyper-Threading (HT), and pushed energy efficiency to the limit with the ingenious Turbo Boost frequency-scaling technology. The launch of Core i7 was an epoch-making change. It convincingly knocked out AMD's K10-architecture processors of the time, leaving AMD able to hold only a small share of the market by competing on value or on "core unlocking."
Having tasted success before, AMD was naturally unwilling to accept defeat so easily. To change the market situation, AMD poured everything into developing a revolutionary next-generation architecture and launched the first Bulldozer processors in 2011. No one expected that Bulldozer's launch would mark the beginning of AMD's complete collapse in the CPU market.
2 Bulldozer architecture: an embarrassing "innovator"
A CPU's microarchitecture and manufacturing process directly determine its performance and efficiency, so optimizing the microarchitecture and moving to newer processes are the most important ways CPU makers improve their chips. Bulldozer was the CPU microarchitecture AMD painstakingly built for this purpose at the time.
When promoting Bulldozer early on, AMD touted many innovations. In summary:
1. A new modular design that is more efficient and makes it easier to scale up the core count.
2. A 32nm SOI manufacturing process with better power control.
3. A new multithreaded architecture with stronger multithreaded performance.
4. 4-issue instruction dispatch (K10 was 3-issue) and AVX instruction support, for stronger integer and floating-point throughput and improved single-core performance.
5. Second-generation Turbo Core technology that adapts better to different workloads.
The foundation and soul of Bulldozer is its modular design. Traditionally, the more physical cores a CPU has, the stronger its performance, but also the higher its cost. That is why Intel applied SMT (simultaneous multithreading), branded Hyper-Threading, to its Core processors. SMT lets multiple threads on one CPU core share that core's resources and execute simultaneously. It adds almost no hardware cost, but its gains are certainly smaller than those of additional physical cores.
[Figure: AMD "Bulldozer" microarchitecture]
To balance cost against multithreaded performance, AMD introduced its own approach, CMT. In Bulldozer, AMD packages two cores and their related units into one module. The two cores share a floating-point unit, but each has a complete integer unit. The FX-8150 consists of four modules, or eight cores, yet it has only four floating-point units, whereas earlier CPUs had one floating-point unit per core. Likewise, a quad-core part is made of two modules and a six-core part of three. The advantage of the modular design is that it cuts redundant circuitry and makes it easier to stack more cores, which was a genuinely bold idea at the time. AMD called CMT "physical multi-core." Even so, rather than calling Bulldozer's 8-core CPU an 8-core processor, I prefer to call it a 4-module, 8-thread processor, because each "core" is really an incomplete, crippled core without its own floating-point unit. The upside is that, at relatively low cost, the CPU gets eight complete integer units and eight threads, which at the time only high-end processors offered.
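To make the resource trade-off concrete, here is a minimal Python sketch that simply counts execution resources per chip. It assumes, as described above, one FPU and one integer cluster per conventional SMT core running two threads, and one shared FPU plus two integer cores per Bulldozer module. The class and function names are hypothetical, and the model deliberately ignores caches, decoders, and everything else a real core contains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreLayout:
    """Execution resources of a desktop CPU, counted per chip (simplified model)."""
    name: str
    integer_clusters: int  # independent integer execution cores
    fp_units: int          # floating-point units (FPUs)
    hw_threads: int        # hardware threads visible to the OS

    def threads_per_fpu(self) -> float:
        return self.hw_threads / self.fp_units

    def integer_clusters_per_thread(self) -> float:
        return self.integer_clusters / self.hw_threads


def smt_chip(name: str, cores: int, threads_per_core: int = 2) -> CoreLayout:
    # SMT (Hyper-Threading): each full core has one integer cluster and one FPU,
    # and all of that core's hardware threads share both.
    return CoreLayout(name, cores, cores, cores * threads_per_core)


def cmt_chip(name: str, modules: int) -> CoreLayout:
    # CMT (Bulldozer): each module holds two integer cores sharing one FPU,
    # and each integer core runs one thread.
    return CoreLayout(name, modules * 2, modules, modules * 2)


if __name__ == "__main__":
    for chip in (smt_chip("4-core SMT chip", 4), cmt_chip("FX-8150 (4 modules)", 4)):
        print(f"{chip.name}: {chip.integer_clusters} integer clusters, "
              f"{chip.fp_units} FPUs, {chip.hw_threads} threads, "
              f"{chip.threads_per_fpu():.0f} threads per FPU, "
              f"{chip.integer_clusters_per_thread():.1f} integer clusters per thread")
```

Both layouts end up with two threads per FPU; what CMT buys is a full integer cluster for every thread instead of half of one, which is exactly the integer-first bet described next.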
AMD designed Bulldozer this way for three reasons. First, it believed more than 80% of CPU operations are integer operations, so adding integer units pays off clearly: for a small cost of about 5% more core area, integer performance could rise by 80%, while floating-point work could in future be handed to the GPU, which handles it more efficiently. Second, it believed general-purpose computing would keep moving toward multithreading, with effectively unlimited demand for threads. Third, it expected CPU frequencies to rise substantially in future, which would make up for weak single-thread performance.
AMD's gambler-like corporate style showed vividly in Bulldozer's design. It placed three bets on the future direction of processors. Unfortunately, none of them paid off at the time.
[Figure: 8-core Bulldozer architecture]
Although a high-end desktop processor with four modules and eight threads has plenty of integer threads, most users' workloads cannot be spread evenly across eight threads (put simply, most programs were poorly optimized for multithreading), and single-threaded work still dominated most usage scenarios. Meanwhile, the shared floating-point units meant that applications heavy in floating-point math did not get enough execution resources. GPU computing matters for certain specialized work, such as scientific supercomputing, but mainstream applications still relied mostly on the CPU for floating-point operations.
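The article does not name it, but Amdahl's law is the standard way to quantify why extra integer cores help little when a program is poorly threaded. The sketch below applies it; the parallel fractions are hypothetical illustration values, not measurements of any real program.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup over one thread when only part of the work scales.

    parallel_fraction is the share of runtime that can be spread across threads;
    the remaining (1 - parallel_fraction) always runs on a single thread.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


if __name__ == "__main__":
    # Hypothetical workloads: a poorly threaded desktop program vs. a well-threaded one.
    for p in (0.5, 0.95):
        for n in (4, 8):
            print(f"parallel share {p:.0%}, {n} threads -> {amdahl_speedup(p, n):.2f}x")
```

For a program that is only half parallelizable, going from four to eight threads lifts the speedup from 1.60x to just 1.78x, while a 95% parallel program goes from about 3.48x to 5.93x. Bulldozer's eight integer cores only paid off in the second kind of software, which was rare on the desktop at the time.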
Bulldozer's architecture caused single-core performance to regress: it could not even match AMD's own previous six-core flagship, the Phenom II X6 1090T, let alone Intel's Sandy Bridge (SNB) processors of the time. In AMD's vision, the single-core regression could be offset by greatly raising clock speeds, but blindly pushing frequencies on Bulldozer's less advanced 32nm process turned the chips into "big stoves" in power consumption and heat (this became even more obvious with Piledriver).
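Why does chasing frequency turn a chip into a "big stove"? A common first-order approximation, not taken from the article, is that CMOS dynamic power scales with switched capacitance × voltage² × frequency, and higher clocks usually need higher voltage. The sketch below uses that approximation with hypothetical ratios; it ignores leakage power and only shows the shape of the trade-off.

```python
def relative_dynamic_power(freq_ratio: float, voltage_ratio: float) -> float:
    """Dynamic power relative to a baseline, using the approximation P ~ C * V**2 * f.

    Switched capacitance is assumed unchanged, so it cancels out of the ratio.
    Leakage (static) power is ignored.
    """
    if freq_ratio <= 0 or voltage_ratio <= 0:
        raise ValueError("ratios must be positive")
    return freq_ratio * voltage_ratio ** 2


if __name__ == "__main__":
    # Hypothetical example: +20% clock speed that needs +10% core voltage.
    print(f"{relative_dynamic_power(1.2, 1.1):.3f}x baseline power")  # 1.452x
```

In this hypothetical case a 20% clock gain costs about 45% more dynamic power, which is why a frequency-first design on an older process runs hot.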
[Figure: Bulldozer flagship FX-8150]
Another consequence of this design was that in most games, which lean heavily on floating-point and single-core performance, Bulldozer did even worse than the Phenom II ("Fat Dragon 2") and could only eat the dust of Sandy Bridge processors. That was fatal in the DIY market, since most DIY enthusiasts build their machines for gaming. Consumers did not buy it, its reputation gradually collapsed, and the fate of the Bulldozer architecture was all but sealed.
3 The evolution of Bulldozer: the "farm machinery" processor series
Piledriver: an unyielding challenger that ended up a laughingstock
[Figure: AMD's CPU roadmap at the time]
After AMD launched its Bulldozer processors, market feedback was poor, but this was only the first generation, and perhaps the market and software ecosystem had not yet adapted to this new breed of processor. So the following year, AMD launched a lightly revised version of Bulldozer: the Piledriver architecture. Compared with Bulldozer, the key changes in the modular Piledriver were:
1. New instruction sets such as FMA3, AVX 1.1, and F16C;
2. Stronger power management and lower power consumption;
3. Optimized L1 and L2 caches;
4. Higher core clocks at an unchanged TDP.
The most important change was the power optimization, which gave the processors roughly 10% more overclocking headroom than the previous generation at the same voltage. The Piledriver chips of the time, represented by the FX-8350, were built on GlobalFoundries' dated 32nm process, but thanks to their long pipelines they could easily be overclocked past 4.5 GHz, so much so that fans joked, "Not enough performance? Overclock it," and "Once overclocked, it's not unusable." To squeeze the most out of Piledriver, AMD even launched the "nuclear bomb" FX-9590, with a TDP of up to 220 W and a boost clock of up to 5.0 GHz. However, its terrible power consumption, heat output, and power-supply demands, together with the fact that its single-core performance at 5.0 GHz only matched an i7-4770K core running at about 3.8 GHz, kept it out of the mainstream market and off consumers' shopping lists.
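The FX-9590 versus i7-4770K comparison above can be turned into a per-clock estimate using the first-order model single-thread performance ≈ IPC × frequency. That model is a simplification assumed here (it ignores turbo behavior, memory latency, and workload differences); the 5.0 GHz and 3.8 GHz figures are the ones quoted in this article, and the function names are hypothetical.

```python
def single_thread_perf(ipc: float, freq_ghz: float) -> float:
    """First-order model: work per second = instructions per clock x clock rate."""
    return ipc * freq_ghz


def implied_ipc_ratio(freq_a_ghz: float, freq_b_ghz: float) -> float:
    """If chip A at freq_a matches chip B at freq_b, then IPC_B / IPC_A = freq_a / freq_b."""
    return freq_a_ghz / freq_b_ghz


if __name__ == "__main__":
    # Figures quoted in the article: FX-9590 at 5.0 GHz ~ i7-4770K core at ~3.8 GHz.
    ratio = implied_ipc_ratio(5.0, 3.8)
    print(f"Implied i7-4770K / FX-9590 IPC ratio: {ratio:.2f}")  # 1.32
    # Both chips then deliver the same modeled single-thread performance.
    print(f"{single_thread_perf(1.0, 5.0):.2f} vs {single_thread_perf(ratio, 3.8):.2f}")
```

Under this model, the Haswell core does roughly 1.3 times as much work per clock, so Piledriver needed about 1.2 GHz of extra frequency just to break even on a single thread.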
The FX-9590 was merely AMD's defiant roar. The high-clocked Piledriver only earned AMD the title of "big stove" and could not change the fate of the failed Bulldozer family, which eventually earned the humiliating meme "the i3 quietly beats them all."
Piledriver's failure made AMD face reality. From then on, AMD abandoned the high-end CPU market and devoted itself to another bottomless pit: heterogeneous computing. The Steamroller and Excavator cores AMD launched afterwards appeared only in APUs and low-end processor models, focusing on lower CPU power consumption and better energy efficiency. Although these low-end APUs were popular with many consumers, most bought them for their high-performance integrated graphics. Many netizens even joked that buying an APU means buying a GPU with a free CPU thrown in, while buying an Intel CPU means buying a CPU with a free GPU thrown in.
AMD's CPUs also lost their reputation in the market. From both a market and a performance standpoint, the Bulldozer architecture was a thorough failure; it could even be said to be nailed to the pillar of shame in CPU history. To this day, AMD does not dare to revive the FX name that once marked its highest-performance processors, since FX is too easily associated with the Bulldozer series.
4 The impact of Bulldozer: failure is the mother of success
Modular design: one generation plants the trees, the next enjoys the shade
Sayings that spread widely usually hold some truth, such as "failure is the mother of success," or "for digital hardware news, come to PConline." After Bulldozer's failure, AMD endured hardship and bided its time, and in 2017 it finally burst out with surprising force, pulling off a comeback nobody expected by releasing the Ryzen processors, which could rival Intel's Core i series. By then, 13 years had passed since AMD's last peak.
[Figure: Zen architecture]
The Ryzen processors became a blockbuster largely thanks to the clever use of a modular design called the CCX (CPU Complex). Does the word "modular" sound familiar? You're right. Although the modular design concept was the root cause of Bulldozer's failure, modularity itself is not a curse, and Zen is a more mature product of that concept.
While Zen also adopts a modular design, it learns from Bulldozer's mistakes: every core is a complete core, and floating-point performance is greatly strengthened. On the integer side, each Zen core has four arithmetic logic units (ALUs) and two address generation units (AGUs). On the floating-point side, the shared floating-point unit is gone: each core now has its own pair of 128-bit fused multiply-add units (FMAs). The floating-point unit also has separate add and multiply pipelines for handling mixed instruction streams that are not multiply-accumulate operations. However, 256-bit AVX instructions still have to be split across the two FMA units, occupying all of the floating-point units. The end result is that Ryzen processors improve IPC by more than 40% over the Bulldozer family.
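The point about 256-bit AVX can be pictured with a toy model: a 256-bit vector holds eight single-precision lanes, while each 128-bit FMA unit described above handles four, so one 256-bit fused multiply-add is carried out as two 128-bit halves. This Python sketch only models that lane split; the function names are hypothetical, and because Python floats are double precision and the multiply-add is not fused, rounding behavior is not modeled.

```python
from typing import List

LANES_PER_128 = 4  # four 32-bit floats fit in one 128-bit register


def fma128(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Model of one 128-bit FMA unit: a * b + c across four lanes."""
    if not len(a) == len(b) == len(c) == LANES_PER_128:
        raise ValueError("a 128-bit operation takes exactly four lanes")
    return [x * y + z for x, y, z in zip(a, b, c)]


def fma256_split(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Model of a 256-bit (eight-lane) FMA executed as two 128-bit halves."""
    if not len(a) == len(b) == len(c) == 2 * LANES_PER_128:
        raise ValueError("a 256-bit operation takes exactly eight lanes")
    low = fma128(a[:LANES_PER_128], b[:LANES_PER_128], c[:LANES_PER_128])
    high = fma128(a[LANES_PER_128:], b[LANES_PER_128:], c[LANES_PER_128:])
    return low + high  # the two halves correspond to the two 128-bit units


if __name__ == "__main__":
    a = [float(i) for i in range(8)]
    print(fma256_split(a, [2.0] * 8, [1.0] * 8))
    # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```

The result is the same as a native 256-bit operation; the cost is that one such instruction ties up both 128-bit units at once.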
[Figure: Zen architecture]
In the Zen architecture, each CCX contains four cores; each core has its own L1 and L2 caches, and the four cores of a CCX share 8 MB of L3 cache. SMT can be switched on or off, and some cores can be selectively disabled. Compared with Intel's ring bus, the benefits of Zen's approach are obvious. In AMD's current lineup, everything from the top EPYC server chips down to entry-level Ryzen 3 processors needs only one die design, which is then replicated, saving a great deal of design and tape-out cost compared with Intel. That is also why a Ryzen processor on the market costs much less than a Core processor with the same core count. In addition, the modular design makes adding cores as simple as stacking building blocks, so the birth of the Threadripper 2990WX came naturally.
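The building-block arithmetic can be sketched in a few lines. The four cores and 8 MB of L3 per CCX come from the paragraph above; the assumption that one first-generation Zen die carries two CCXs is ours, added to connect CCXs to dies, and the class name is hypothetical. With four dies the model gives 32 cores and 64 threads, which matches the Threadripper 2990WX's advertised configuration.

```python
from dataclasses import dataclass

CORES_PER_CCX = 4   # from the article: four cores per CCX
L3_MB_PER_CCX = 8   # from the article: each CCX shares 8 MB of L3
CCX_PER_DIE = 2     # assumption: one first-generation Zen die holds two CCXs


@dataclass(frozen=True)
class ZenPackage:
    """Hypothetical model of a processor built by replicating one Zen die."""
    dies: int
    cores_enabled_per_ccx: int = CORES_PER_CCX  # some cores can be disabled
    smt: bool = True                            # SMT can be switched off

    def __post_init__(self) -> None:
        if self.dies < 1:
            raise ValueError("need at least one die")
        if not 1 <= self.cores_enabled_per_ccx <= CORES_PER_CCX:
            raise ValueError("a CCX has between 1 and 4 enabled cores")

    @property
    def ccx_count(self) -> int:
        return self.dies * CCX_PER_DIE

    @property
    def cores(self) -> int:
        return self.ccx_count * self.cores_enabled_per_ccx

    @property
    def threads(self) -> int:
        return self.cores * (2 if self.smt else 1)

    @property
    def l3_mb(self) -> int:
        # Counts every CCX's full L3 slice; real products may ship with some L3 disabled.
        return self.ccx_count * L3_MB_PER_CCX


if __name__ == "__main__":
    for pkg in (ZenPackage(dies=1), ZenPackage(dies=4),
                ZenPackage(dies=1, cores_enabled_per_ccx=2, smt=False)):
        print(f"{pkg.dies} die(s): {pkg.cores} cores, {pkg.threads} threads, "
              f"{pkg.l3_mb} MB L3")
```

Scaling from an 8-core part to a 32-core part changes only the `dies` argument, which is the design-cost advantage the paragraph describes.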
Zen fixes Bulldozer's weak single-thread performance and its dependence on high clocks, while keeping the modular design concept and the belief that general-purpose computing will keep demanding more threads. After six years, AMD once again sounded the charge into the high-end CPU market on the strength of Zen.
Behind AMD's seizure of the consumer CPU throne with the Threadripper 2990WX stands history's most failed processor, which served as its stepping stone.
Summary
Even heroes meet their twilight, and Bulldozer was more of a "bear" (a weakling) in the traditional sense than a hero. Still, if the Bulldozer elders could see how powerful the Threadripper 2990WX has become, I believe they would leave without regret and disappear into the torrent of history. And this editor can only take you this far.