On December 12 (Beijing time), Intel held its Architecture Day event in Santa Clara. Over five hours of presentations, Intel unveiled its CPU architecture roadmap through 2021, its next-generation integrated graphics, the future of its graphics business, a new 3D packaging technology, and even part of the new processor architecture due in 2019.
A dedicated consumer CPU roadmap
For some time the industry has been looking forward to seeing Intel's future architecture roadmap, but everything after Skylake has been kept under wraps. In recent months Intel published part of its data center product roadmap, including Cascade Lake, Cooper Lake, Ice Lake, and later generations, but its consumer plans remained hard to pin down.
At this Architecture Day event, Intel finally presented roadmaps for both its consumer PC processor architectures and its Atom architectures.
On the high-performance Core line, Intel listed three new codenames for the next three years: Sunny Cove, Willow Cove, and Golden Cove, which are due to launch one per year starting in 2019 (PS: take a guess whether Intel will slip the schedule ^_^).
Sunny Cove is reportedly designed to improve performance and reduce power consumption on general computing tasks. It will include AVX-512 units and new features that accelerate artificial intelligence and cryptography, and it will become the foundation of Intel's next generation of PC and server processors.
Willow Cove follows in 2020 on the roadmap and is likely to be built on 10nm. Intel lists its key points as a cache redesign (possibly L1/L2 adjustments), new transistor optimizations (manufacturing), and additional security features, which may refer to further hardening against new classes of side-channel attacks.
Golden Cove appears on the chart in 2021. Its process node is still a question mark and may be 10nm or 7nm. Intel says it will further improve single-threaded and artificial intelligence performance and may add networking and AI features to the core design; its security features also appear to be improved.
The roadmap for the low-power Atom architectures moves at a slower pace than the Core series, which is not surprising given Atom's history. Because Atom must fit into a wide variety of devices, the industry expects it to offer a broader range of features, especially as an SoC.
The architecture due in 2019 is Tremont, which focuses on single-threaded performance, network server performance, and battery life. Tremont will be followed by Gracemont, which Intel lists as a 2021 product and which may have wider vector units or support new vector instructions.
According to the roadmap, a further "xxxmont" architecture will follow Gracemont. Intel is working on the performance, frequency, and features of this new core, which is slated for 2023.
These are architecture names; actual products may carry an additional codename, which in the Core series follows the "xxx Lake" pattern. For example, the processor codenamed Ice Lake combines Sunny Cove CPU cores with Gen11 integrated graphics.
Another message from the event is that Intel's architectures are likely to be decoupled from its process nodes. Raja Koduri and Dr. Murthy Renduchintala said that, to give the product line some flexibility, the latest products based on these architectures will come to market on the best process available at the time.
Although it was not stated outright, this should mean the once-famous "Tick-Tock" strategy has been consigned to the dustbin of history, and that building a given core design on different process nodes may become normal in the future.
Sunny Cove architecture
Whenever a new processor architecture is announced, what everyone most looks forward to is an analysis of the new design and how it differs from the previous generation.
Since Skylake first launched in 2015, Intel has released Kaby Lake, Coffee Lake, and Coffee Lake Refresh, three generations of minor revisions. Because each generational step was so small, enthusiasts nicknamed the practice "squeezing toothpaste". Although Intel showed the new Sunny Cove architecture this time, the information is unfortunately incomplete and focuses mainly on the back end of the design.
Intel divides microarchitecture improvements into two categories: general-purpose performance improvements and special-purpose performance improvements. General-purpose improvements raise raw IPC (instructions per clock) throughput or frequency. An IPC increase can come from making the core wider (more instructions executed per clock), deeper (more parallelism extracted per clock), or smarter (better data delivery through the front end), while frequency is usually a function of implementation and process. Special-purpose improvements speed up particular workloads in specific scenarios through other acceleration methods, such as dedicated IP or dedicated instructions.
Sunny Cove reportedly improves on both fronts. In the back end, Intel's changes include larger caches, a wider execution core, and higher L1 store bandwidth.
Sunny Cove's L1 data cache grows from 32KB to 48KB. As cache capacity increases, the miss rate typically falls in proportion to the square root of the capacity ratio, so L1-D misses on Sunny Cove should fall by a factor of about 1.22 (the square root of 1.5), which is where the 22% figure comes from. Sunny Cove based Core and Xeon processors will also get L2 caches larger than the current 256KB and 1MB respectively, though the exact capacities are not yet known.
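The square-root rule of thumb mentioned above can be checked with a few lines of arithmetic. This is only a sketch of that heuristic, not a model of the real cache; the function name `relative_miss_rate` is made up for illustration.

```python
import math

def relative_miss_rate(old_kb: float, new_kb: float) -> float:
    """Miss rate of the new cache relative to the old one, assuming the
    rule of thumb that misses scale with 1 / sqrt(capacity)."""
    return math.sqrt(old_kb / new_kb)

# L1-D growing from 32KB to 48KB, as described for Sunny Cove.
ratio = relative_miss_rate(32, 48)
improvement_factor = 1 / ratio

print(f"relative miss rate: {ratio:.3f}")
print(f"misses fall by a factor of: {improvement_factor:.3f}")
print(f"share of misses removed: {(1 - ratio) * 100:.1f}%")
```

Under this heuristic the improvement factor is about 1.22, while the share of misses actually removed is somewhat under 20%, so the two ways of quoting the gain should not be confused.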
In addition, the micro-op (uOP) cache and the second-level TLB are not strictly part of the back end, but both are larger than before, and the larger TLB helps with address translation. Other changes are visible in the diagram: the execution ports increase from 8 to 10, allowing more instructions to issue from the schedulers at once; allocation from the reorder buffer rises from 4 instructions per cycle to 5; ports 4 and 9 both link to the store data unit, doubling its bandwidth, and the store AGU capability is doubled as well, which helps feed the larger L1-D.
Sunny Cove's execution ports have also changed significantly. For details, see the following figure:
In the integer section, Intel has added more LEA units to help with memory address calculations, possibly to reduce the performance lost to frequent address calculations, or to help high-performance array code that uses constant offsets. Port 1 gets the MUL (multiply) unit from Skylake's port 5, probably for rebalancing, and there is now an integer divider unit here as well. This is a subtle change: Cannon Lake also has a 64-bit IDIV (signed integer division) unit in its design, which cuts 64-bit integer division from 97 clocks (a mix of instructions) to 18 clocks, and Sunny Cove is probably similar.
Also on the integer side, port 5's multiply unit has become a "MulHi" unit, which in other architectures leaves the most significant nibble of the result in a register for further use. It is not yet clear exactly what role it plays in the Sunny Cove core.
On the floating-point side, Intel has added shuffle resources to help eliminate bottlenecks in code. Intel did not describe the capabilities of the FMA (fused multiply-add) units in the floating-point section, but since the core has an AVX-512 unit, at least one of those FMAs should serve it. Cannon Lake has only one 512-bit FMA, which most likely carries over here, and the Xeon Scalable version may get two.
Other updates listed by Intel include an improved branch predictor and lower effective load latency thanks to the TLB and L1-D changes. Some have pointed out that these improvements will not help every user, and that some may require new algorithms to take advantage of these specific parts of the core.
Beyond the microarchitectural differences, Sunny Cove also adds new instructions to speed up specialized computing tasks. With its AVX-512 units, the new architecture supports the IFMA (integer fused multiply-add) instructions for big-number arithmetic, which is very useful in cryptography. Sunny Cove also supports Vector-AES, Vector Carryless Multiply, SHA, SHA-NI, and Galois Field instructions, which are fundamental building blocks in parts of cryptography.
Sunny Cove also supports larger memory capacities. Its page tables grow from 4 levels to 5, supporting up to a 57-bit linear address space and up to a 52-bit physical address space, which means server processors can support 4TB of memory per socket.
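The jump from 4 to 5 page-table levels and the 57-bit figure follow from simple arithmetic, sketched below. It assumes standard x86-64 paging with 4 KiB pages (a 12-bit page offset) and 512-entry tables (9 index bits per level); the helper names are illustrative only.

```python
PAGE_OFFSET_BITS = 12      # 4 KiB pages
INDEX_BITS_PER_LEVEL = 9   # 512 entries per page table

def linear_address_bits(levels: int) -> int:
    """Width of the linear address covered by a page-table walk."""
    return levels * INDEX_BITS_PER_LEVEL + PAGE_OFFSET_BITS

def human_size(bits: int) -> str:
    """Render 2**bits bytes using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    return f"{1 << (bits % 10)} {units[bits // 10]}"

for levels in (4, 5):
    bits = linear_address_bits(levels)
    print(f"{levels}-level paging: {bits}-bit linear addresses, {human_size(bits)}")

print(f"52-bit physical addresses: {human_size(52)}")
```

These are architectural limits; the 4TB per socket quoted above is a much smaller, platform-level figure.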
According to Intel's earlier Xeon roadmap, Sunny Cove will reach the server market in 2020 with Ice Lake-SP. For security, Sunny Cove adds Multi-Key Total Memory Encryption and User-Mode Instruction Prevention.
Gen11 integrated graphics
In 2015 Intel launched Skylake processors with Gen9 integrated graphics, but the integrated graphics in Kaby Lake and Coffee Lake were only Gen9.5 rather than Gen10. The 10nm Cannon Lake processor was in fact supposed to carry Gen10, but Intel never released a PC Cannon Lake processor with working integrated graphics.
At the event, Raja Koduri, Intel's chief architect, senior vice president of the Core and Visual Computing Group, and general manager of edge computing solutions, unveiled the new Gen11 integrated graphics and reaffirmed the plan to launch a discrete graphics processor in 2020.
According to the roadmap, Gen11 integrated graphics will debut with 10nm processors in 2019. It has 64 EUs (execution units), more than double the previous Gen9 integrated graphics, and its floating-point performance exceeds 1 TFLOPS. The 64 EUs are divided into four slices, each made up of 2 sub-slices of 8 EUs. Each sub-slice has an instruction cache and a 3D sampler, while each of the four larger slices has 2 media samplers, 1 PixelFE, and additional load/store hardware.
Intel did not disclose much about how EU performance has improved, but the floating-point unit interfaces inside the EU have been redesigned and support fast (2x) FP16 performance. Each EU supports 7 threads as before, and the GPU as a whole is described as having 512 concurrent pipelines. Intel says the memory interface has been redesigned and the GPU's L3 cache grows to 3MB, 4 times that of Gen9.5.
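The EU counts above can be tied together with some back-of-the-envelope arithmetic. The slice layout and threads per EU come from the text; the per-EU ALU layout (two SIMD-4 FP32 units, each lane doing one fused multiply-add per clock) is an assumption carried over from Intel's earlier Gen9 documentation, and no GPU clock is given in the article, so the sketch only derives the clock that 1 TFLOPS would require.

```python
# Figures stated in the article.
SLICES = 4
SUBSLICES_PER_SLICE = 2
EUS_PER_SUBSLICE = 8
THREADS_PER_EU = 7

# Assumptions, not confirmed for Gen11 in the article.
FPUS_PER_EU = 2        # assumed: two SIMD-4 floating-point units per EU
LANES_PER_FPU = 4      # assumed: SIMD-4
FLOPS_PER_LANE = 2     # assumed: one FMA (multiply + add) per lane per clock

total_eus = SLICES * SUBSLICES_PER_SLICE * EUS_PER_SUBSLICE
hardware_threads = total_eus * THREADS_PER_EU
fp32_lanes = total_eus * FPUS_PER_EU * LANES_PER_FPU
flops_per_clock = fp32_lanes * FLOPS_PER_LANE

def peak_tflops(clock_ghz: float) -> float:
    """Theoretical FP32 peak at a given (hypothetical) GPU clock."""
    return flops_per_clock * clock_ghz / 1000

min_clock_ghz_for_1_tflops = 1000 / flops_per_clock

print(f"EUs: {total_eus}, hardware threads: {hardware_threads}")
print(f"FP32 lanes: {fp32_lanes}, FLOPs per clock: {flops_per_clock}")
print(f"clock needed for 1 TFLOPS: {min_clock_ghz_for_1_tflops:.3f} GHz")
```

Under these assumptions, 64 EUs with 7 threads each give 448 hardware threads, while the 512 figure matches the number of FP32 lanes, which may be what the "concurrent pipelines" count refers to.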
A major improvement in Gen11 is that it finally supports tile-based rendering, making Intel the third PC GPU vendor to implement the feature, after NVIDIA and (in 2017) AMD. Although tile-based rendering is no panacea for GPU performance, a well-optimized implementation suits the limited memory bandwidth of integrated graphics.
Intel's lossless memory compression has also improved, raising performance by up to 10% in the best case and by 4% on average. The GTI interface now supports reading and writing 64 bytes per clock, increasing throughput to match the redesigned memory interface.
Gen11 also supports Coarse Pixel Shading (CPS), Intel's new variable-rate shading technology, which is similar to NVIDIA's variable-rate shading and lets the GPU reduce the work spent shading some pixels. Intel showed two CPS demos, in which the shading rate was a function of distance from the camera and of distance from the screen center: less shading work is done when an object is far from the camera or far from the center of the screen. The design is intended to help with features such as foveated rendering for VR, and Intel says games that support the technique can gain about 30% in frame rate.
Raja Koduri also announced the brand for Intel's discrete graphics business: Xe, previously referred to as the "Gen12" series. Starting in 2020 it will cover everything from client to data center, including integrated graphics. Intel hopes Xe will deliver products that compete with its rivals' best, from entry level through mid-range to enthusiast and AI.
Xe will start on the 10nm node and lay the foundation for the next generations of graphics. It will follow Intel's single-stack software philosophy, in which software developers can target CPU, GPU, FPGA, and AI hardware through the same set of APIs, a sign that Intel is ready to build around a single brand.
As part of Architecture Day, Intel ran a number of live chip demos, said to be based on the new Sunny Cove cores and Gen11 integrated graphics. The demos covered two things: the 7-Zip application and the game Tekken 7.
The 7-Zip demo was relatively straightforward: at the same frequency, the demo machine was 75% faster than a Skylake platform, showing the special-purpose gains from Sunny Cove's new instructions. In Tekken 7, the Sunny Cove + Gen11 demo machine ran more smoothly than Skylake + Gen9 and comfortably exceeded the 30fps minimum.
Foveros 3D packaging: changing how chips are made
Anyone who follows semiconductors will know that most CPUs and SoCs produced today are based on a monolithic die, meaning that a single piece of silicon contains everything needed before it is packaged and put into a system. There are also multi-chip packages with shared connections, as well as interposers or embedded bridges that link different chips through high-speed interconnects.
One of the biggest challenges in modern chip design is minimizing die area, which reduces cost and power consumption and makes the chip easier to integrate into a system. However, one drawback of large monolithic or multi-chip packages is that memory ends up too far away, so Intel is ready to bring 3D stacking to the mass market.
Raja explained that Intel has focused on high-performance process nodes for decades, trying to extract as much performance from its cores as possible. Intel also runs IO-optimized process nodes on a similar cadence, which are better suited to PCH or SoC-type chips.
126x and 127x are Intel's internal numbering for its process node technologies, which does not distinguish node variants marked with a "+" suffix. Raja showed the existing 2019 process technologies, with the 1274 process for compute and the 14nm 1273 process for IO, while the Foveros 3D stacking process introduced here is coded P1222. Looking ahead, Intel will expand its node portfolio to cover more power and performance points.
One way to achieve this is to choose the best transistor for each function, whether CPU, GPU, IO, FPGA, RF, or anything else. With the right packaging, they can then be combined for the best overall result.
This is where Foveros comes in. Foveros is Intel's newly introduced active interposer technology, a step beyond its EMIB (Embedded Multi-die Interconnect Bridge) 2D packaging technology, and it is better suited to small form factor products or products with high memory bandwidth requirements. In these designs the power per bit transferred is very low, and the packaging technology has to handle reduced bump pitch, increased bump density, and chip stacking. Intel says Foveros is ready for volume production.
The first iteration of the technology is not as complex as the slide above: it is simply a set of CPU cores connected to a PCH beneath it. Still, it lets Intel use different transistor types on different dies, for example placing a set of 10nm CPU cores on top of a base die built on the 22FFL process.
Intel showed a Foveros chip at Architecture Day. It uses a 22FFL IO die as the active interposer, connected through TSVs (through-silicon vias) to a 10nm die containing 1 Sunny Cove core and 4 Atom cores (possibly Tremont). The tiny chip measures 12 × 12 mm and its standby power is only 2mW, so it appears to be aimed at mobile devices.
As Intel's slide shows, the Sunny Cove "Big CPU" has 0.5MB of private L2 cache, the 4 small Atom cores share 1.5MB of L2 cache, and the two sets of cores share a 4MB L3 cache. The chip also integrates 64-EU Gen11 integrated graphics, a quad-channel LPDDR4 memory controller (4 × 16-bit), and support for MIPI (Mobile Industry Processor Interface) and DisplayPort 1.4.
Jim Keller said Intel is using Foveros to try out many new things to see which might make good products, so more Foveros products should appear in 2019 and 2020.
Other announcements
The least exciting part of Architecture Day was probably the discussion of data center products. Intel had already announced its next two products for the enterprise market: Cascade Lake and Cooper Lake are both built on 14nm and focus on stronger security and on instructions that accelerate AI, followed by the 10nm Ice Lake Scalable, and that is all.
However, Intel did confirm at the event that Ice Lake will be built on the Sunny Cove architecture, and it showed the package of a 10nm Ice Lake Xeon processor, both of which are new information.
In addition, Intel introduced Optane technology, the One API software project, and a deep learning reference stack.
One API software: Intel announced the "One API" project to simplify programming across diverse compute engines, including CPUs, GPUs, FPGAs, AI accelerators, and other accelerators. The project includes a comprehensive, unified portfolio of developer tools for mapping software to the hardware that best accelerates the code. A public release is expected in 2019.
Optane technology: Intel Optane DC persistent memory is a new product that combines memory-like performance with data persistence and large capacity. By keeping more data close to the CPU, it allows faster processing of the larger data sets found in artificial intelligence and large databases. Its large capacity and persistence reduce the latency penalty of going out to storage, improving workload performance.
Intel Optane DC persistent memory delivers cache-line (64B) reads to the CPU. In general, when an application directs a read to Optane persistent memory, or the requested data is not cached in DRAM, the average idle read latency is about 350ns. For scale, an Optane DC SSD has an average idle read latency of about 10,000ns (10 μs), so this is a significant improvement. When the requested data is in DRAM, whether cached by the CPU's memory controller or directed there by the application, the memory subsystem is expected to respond as fast as DRAM (under 100ns).
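To put the quoted latencies side by side, the sketch below compares them and estimates a blended read latency when DRAM serves part of the traffic. The 100ns DRAM value is the upper bound quoted above, used here as a stand-in, and the hit rate is a purely hypothetical input, not an Intel figure.

```python
OPTANE_PMEM_NS = 350     # average idle read latency quoted for persistent memory
OPTANE_SSD_NS = 10_000   # average idle read latency quoted for the data center SSD
DRAM_NS = 100            # assumption: the "under 100ns" upper bound

def average_read_latency_ns(dram_hit_rate: float) -> float:
    """Blended latency when a fraction of reads is served from DRAM and the
    rest from persistent memory (simple weighted average, no queueing)."""
    if not 0.0 <= dram_hit_rate <= 1.0:
        raise ValueError("dram_hit_rate must be between 0 and 1")
    return dram_hit_rate * DRAM_NS + (1.0 - dram_hit_rate) * OPTANE_PMEM_NS

ssd_to_pmem_ratio = OPTANE_SSD_NS / OPTANE_PMEM_NS
print(f"SSD vs persistent memory idle read latency: {ssd_to_pmem_ratio:.1f}x")

for hit_rate in (0.0, 0.5, 0.9):   # hypothetical hit rates
    print(f"DRAM hit rate {hit_rate:.0%}: {average_read_latency_ns(hit_rate):.0f} ns")
```

The weighted average ignores bandwidth limits and loaded-latency effects, so it is only a rough way to read the idle figures.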
Original address: https://www.eeboard.com/news/intel-2018-cpu-gpu/