In recent years, domestic IC design vendors have emerged one after another. There are established design houses such as Loongson, Feiteng and Shenwei, newcomers such as Zhaoxin and Hongxin, and ARM-camp vendors such as HiSilicon and Spreadtrum. In performance, however, Intel has always held a large lead. So how big is the gap between domestic chips and Intel's?
How should each vendor's CPU performance be evaluated?
As a consumer, one naturally wants a CPU whose price is as low as possible and whose performance is as high as possible. So what counts as high CPU performance? From an architectural perspective there is the MIPS metric: how many millions of instructions are executed per second. The more instructions executed, the better the performance. The problem is that when instruction sets differ, comparing MIPS means little: one instruction on CPU A might perform a single addition, while one instruction on CPU B might perform a 1024-point FFT. So when instruction sets differ, how should each vendor's CPU performance be evaluated?
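The pitfall can be made concrete with a small sketch. The two CPUs, their instruction rates and their instruction counts below are invented purely for illustration; none of the figures describe a real chip.

```python
def mips(instructions, seconds):
    """Millions of instructions executed per second."""
    return instructions / seconds / 1e6


def run_time(cpu):
    """Seconds needed to finish the workload on the given CPU."""
    return cpu["instructions"] / cpu["rate"]


# Hypothetical workload: one 1024-point FFT.
# CPU A (assumed): only simple instructions, so it needs 50,000 of them,
# and it retires 2e9 instructions per second.
# CPU B (assumed): a single complex FFT instruction, but it retires
# only 1e8 instructions per second.
cpu_a = {"instructions": 50_000, "rate": 2e9}
cpu_b = {"instructions": 1, "rate": 1e8}

time_a = run_time(cpu_a)
time_b = run_time(cpu_b)
mips_a = mips(cpu_a["instructions"], time_a)
mips_b = mips(cpu_b["instructions"], time_b)

print(f"CPU A: {mips_a:.0f} MIPS, FFT takes {time_a:.2e} s")
print(f"CPU B: {mips_b:.0f} MIPS, FFT takes {time_b:.2e} s")
```

Under these assumed numbers CPU A reports the higher MIPS figure, yet CPU B finishes the same FFT sooner, which is why MIPS alone cannot rank CPUs with different instruction sets.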
Evaluating CPU performance must take the diversity of applications into account. Scientific computing cares about double-precision floating-point performance, but if data cannot be supplied fast enough, that compute power sits idle; everyday PC use depends more on integer performance; and multitasking environments care more about throughput. It is therefore unscientific to measure CPU performance with any single metric; a comprehensive assessment is needed.
The industry has produced many benchmark suites, such as SPEC for CPUs and EEMBC for embedded applications. SPEC is a relatively authoritative benchmark. Unlike some black-box test programs, the programs in SPEC are open and transparent, and their coverage is broad: SPEC2000 has 12 integer programs and 14 floating-point programs, all fairly representative, such as gzip, vpr, gcc, mcf and eon.
What is the SPEC test?
SPEC evaluates overall performance with a normalized geometric mean of scores: the execution time of each CPU is compared with that of a reference machine to obtain a relative value.
The reference machine for SPEC2000 is a workstation with a 300 MHz UltraSPARC II CPU, which is assigned 100 points. If a CPU runs test program 1 in one-tenth of the reference time, it scores 1000 points on that test; if it runs test program 2 in one-eighth of the reference time, it scores 800 points; and so on. The final result is the geometric mean: with 12 integer tests, the SPEC2000 integer score is the 12th root of the product of the 12 test scores. This rewards balanced performance, because any weak test pulls down the final score. In the most extreme case, if one test scores 0, the total is 0 no matter how high the other scores are.
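The scoring rule described above can be sketched as follows. This is a minimal illustration, not SPEC's own tooling: the function names and the 1000-second reference time are assumptions chosen to reproduce the 1000-point and 800-point examples.

```python
import math

REFERENCE_SCORE = 100  # points assigned to the reference machine


def spec_ratio(reference_seconds, measured_seconds):
    """Score for one test: faster than the reference means above 100."""
    return REFERENCE_SCORE * reference_seconds / measured_seconds


def geometric_mean(scores):
    """n-th root of the product of n scores (0 if any score is 0)."""
    if any(s == 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Assumed reference time of 1000 s for both tests.
score_1 = spec_ratio(1000, 100)  # one-tenth of the reference time
score_2 = spec_ratio(1000, 125)  # one-eighth of the reference time

print(score_1, score_2)
print(geometric_mean([score_1, score_2]))
print(geometric_mean([score_1, score_2, 0]))  # one failed test zeroes the total
```

A geometric mean is used rather than an arithmetic mean so that a single very high score cannot mask a weak one.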
However, SPEC is not perfect: it does not account for I/O bandwidth, and scores are easily influenced by the compiler.
Take Loongson as an example. Because its previous-generation microarchitecture had a memory-access bottleneck, it did reasonably well in SPEC2000 but scored low in SPEC2006; the GS464E has fixed the memory-access problem and no longer shows this behavior. SPEC2000 simply places lower demands on I/O bandwidth. As for compilers, Sun once raised its SPEC score by 50% through compiler optimization, and the LCC compiler raised the integer score of the previous-generation Loongson by 60%. Even when GCC is used throughout, differences in version or optimization make a fully accurate evaluation difficult (part of GCC's code is contributed by Intel, so x86 is well optimized; ARM has a large market share and is also well optimized; optimization for MIPS and Alpha is mediocre).
The SPEC test is much like the college entrance examination: it has various flaws, but it is broad in coverage, open and transparent, and relatively fair, which makes it a fairly reasonable benchmark for evaluating CPUs.
Comparing CPUs with different instruction sets
The author has compiled a table of CPUs using the x86, ARM, MIPS and Alpha instruction sets, as follows.
As for compilers, apart from Loongson, which is known to have used GCC 4.8, the rest are unknown: VIA's white paper does not state the GCC version used in its test, and the author can only speculate about the others. Shenwei may have used SWCC; the i3 550 and i5 4460 may have used GCC 5.1. Since these are only the author's guesses, the compiler column is left blank. (ICC is Intel's compiler and can be used with x86 chips; LCC is Loongson's compiler; SWCC is Shenwei's compiler. The Intel and AMD chips are included for reference.)
(Because the compilers are not uniform, the table is for reference only.)
As the table shows, when GCC is used, the SPEC2000 scores of Loongson, Shenwei and Feiteng still trail Intel Haswell by a considerable margin. Only with its own LCC compiler does Loongson's SPEC2000 score improve, and even then the integer score is 50 points behind Nehalem and the floating-point score still differs from Haswell's. In terms of frequency, domestic IC design companies reach only about 2 GHz, while Intel and AMD run at 3 GHz or more, an obvious gap.
The gap between domestic CPUs and Intel, however, is not just a matter of frequency. Even though the Zhaoxin ZX-C can exceed 3 GHz, the gap in microarchitecture leaves it at only about 40% of the performance of the i5 4460. The microarchitecture is very important: it is fair to say that a CPU's security, performance and power consumption all depend on it. AMD's CPUs trail Intel's at the same frequency largely because of the microarchitecture gap.
When buying a CPU, consumers often look only at parameters such as frequency, core count and process node, and overlook the microarchitecture. In addition, since Sandy Bridge (SNB) Intel has been "squeezing toothpaste", making only very small microarchitecture updates, so the importance of the microarchitecture is neglected even more.
Causes of the microarchitecture gap
Because Hongxin, Zhaoxin, HiSilicon and Spreadtrum currently have no independently designed microarchitecture, the comparison here is between Intel and the two latest products from Loongson and Feiteng. For the gap between the GS464E and Ivy Bridge, the cause can be found by comparing the following parameters.
(Data collected from the internet; for entertainment only.)
Comparing the GS464E with Ivy Bridge shows that the biggest weakness limiting GS464E performance lies in the fixed-point and floating-point issue queues: against Ivy Bridge's 54-entry fixed-point and floating-point issue queue, the GS464E has only a 16-entry fixed-point issue queue and a 24-entry floating-point issue queue.
Loongson is aware of this too. The 3A3000, which is about to tape out, addresses the GS464E's bottleneck: the fixed-point issue queue grows to 32 entries, the floating-point issue queue grows from 24 to 32 entries, and the cache and frequency are also improved. Clearly, although Loongson describes its approach as Tick-Tock, the 3A3000 is not a simple upgrade of the 3A2000: on top of the higher clock frequency, the larger fixed-point and floating-point issue queues will inevitably raise IPC.
According to the SPEC2006 simulator results released by Feiteng, the integer score is 9.6 per GHz.
What level is 9.6 per GHz? With Auto Parallel turned off, Intel's Haswell compiled with GCC 5.1 scores 32 points in SPEC2006 at 3.2 GHz, which works out to 10 per GHz. Does that mean "Xiaomi" comes close to Haswell?
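The per-GHz comparison here is simple arithmetic, sketched below. The helper name is an illustrative assumption; the input figures are the ones quoted in the text.

```python
def score_per_ghz(score, frequency_ghz):
    """Normalize a benchmark score by clock frequency."""
    return score / frequency_ghz


# Figures quoted in the text: Haswell scores 32 points at 3.2 GHz,
# and the simulated "Xiaomi" core is reported at 9.6 points per GHz.
haswell_per_ghz = score_per_ghz(32, 3.2)
xiaomi_per_ghz = 9.6

ratio = xiaomi_per_ghz / haswell_per_ghz
print(f"Haswell: {haswell_per_ghz:.1f} per GHz")
print(f"Xiaomi:  {xiaomi_per_ghz:.1f} per GHz ({ratio:.0%} of Haswell)")
```

Normalizing by frequency only compares work done per clock; it says nothing about whether the two results were obtained under the same compiler settings.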
That would be truly "terrifying"; if it were achievable, it would be a great leap in technology. So what explains a SPEC2006 integer score of 9.6 per GHz? The root cause is whether Auto Parallel is turned on or off.
Turning on Auto Parallel boosts the SPEC2006 integer score because it distributes a program that would otherwise run on a single thread across multiple processors; the size of the gain depends on factors such as the compiler and the number of CPU cores. A great deal of everyday code does not support auto-parallelization, so at present Auto Parallel matters mostly for SPEC scores. The SPEC2006 integer score of 9.6 per GHz for "Xiaomi" is therefore very likely the result of Auto Parallel being turned on in the test.
(Data collected from the internet; for entertainment only.)
The comparison of "Xiaomi" and Ivy Bridge in the table above shows that "Xiaomi" still lags Ivy Bridge and, like the GS464E, falls short of Ivy Bridge in its fixed-point and floating-point issue queues. With such limited resources, the probability of reaching Haswell's level is very small.
Comparing "Xiaomi" with the GS464E and assuming their pipeline efficiency is comparable, the author believes "Xiaomi" may be a microarchitecture in the same class as the GS464E, and stronger than the ARM Cortex-A57. Of course, if its pipeline efficiency is poor, "Xiaomi" may be inferior to the GS464E. The 32 MB L2 cache of "Xiaomi" is very likely there because the product targets servers, or even high-performance computing.
At present, Feiteng's "Earth" and the Loongson 3A3000 are both in the process of taping out; the author looks forward to seeing how "Earth" and the 3A3000 perform once tape-out is complete.