Studying Apple's hardware is always painful: Apple has always been extremely closed about it and rarely discloses low-level details, so most of the time one can only guess.
The A7 is the industry's first 64-bit mobile processor, but when the iPhone 5S launched we knew almost nothing about it. At the time, it was assumed to be a lightly improved version of the previous-generation A6 Swift architecture that fixed its memory latency problem. That turned out to be badly wrong and seriously underestimated Apple.
More information emerged when the iPad Air was released. The architecture's code name, Cyclone, became known for the first time, along with some details of the design:
As far as is known, the peak issue width is six instructions. That is twice the width of the A6 and Krait, and with a mix of different instruction types the advantage can reach three times or more.
The usual restrictions on issuing floating-point and integer instructions together are essentially absent: up to four integer additions and two floating-point additions can be issued in parallel. Up to two loads or stores can also be executed per clock cycle.
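The per-cycle limits quoted above (six instructions in total, four integer additions, two floating-point additions, two loads or stores) can be turned into a rough back-of-the-envelope bound. The following is only a minimal sketch, not a model of the real core: it assumes every instruction is independent and ignores latencies, cache misses and branch mispredictions, and the names `PEAK_WIDTH`, `PER_CLASS_LIMIT`, `min_cycles` and `peak_ipc` are made up for illustration.

```python
from math import ceil

# Per-cycle limits taken from the figures quoted in the text above.
PEAK_WIDTH = 6  # instructions issued per cycle, all classes combined
PER_CLASS_LIMIT = {
    "int_add": 4,     # integer additions per cycle
    "fp_add": 2,      # floating-point additions per cycle
    "load_store": 2,  # loads or stores per cycle
}


def min_cycles(mix):
    """Lower bound on the cycles needed to issue the instruction counts in `mix`.

    Assumes (hypothetically) that all instructions are independent, so only
    the issue limits matter. `mix` maps a class name to an instruction count.
    """
    unknown = set(mix) - set(PER_CLASS_LIMIT)
    if unknown:
        raise ValueError(f"unknown instruction classes: {sorted(unknown)}")
    total = sum(mix.values())
    # The overall width bounds the total; each class is bounded by its own limit.
    bounds = [ceil(total / PEAK_WIDTH)]
    bounds += [ceil(count / PER_CLASS_LIMIT[name]) for name, count in mix.items()]
    return max(bounds)


def peak_ipc(mix):
    """Best-case instructions per cycle for `mix` under the same assumptions."""
    cycles = min_cycles(mix)
    return sum(mix.values()) / cycles if cycles else 0.0


if __name__ == "__main__":
    for mix in ({"int_add": 400, "fp_add": 200}, {"int_add": 600}):
        print(mix, min_cycles(mix), peak_ipc(mix))
```

Under these assumptions, a stream of 400 integer and 200 floating-point additions fills all six slots every cycle, while a stream made up only of integer additions tops out at four per cycle. That is why the achievable width depends so heavily on the instruction mix.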
Recently, AnandTech finally tracked down Apple's official LLVM documentation, which reveals and confirms many details. It is still vague in places, but in Apple's world it is hard to dig any deeper.
According to this document, the architectural specifications of A6 and A7 are as follows:
Many of the earlier guesses turn out to be correct. The A7 Cyclone is indeed a very wide architecture that can decode, issue, execute, and retire up to six instructions/micro-ops per clock cycle, up from a maximum of three in the A6 Swift.
The A7's reorder buffer has grown to an astonishing 192 entries, more than four times that of the previous generation and, coincidentally, the same as Intel's Haswell architecture. The branch misprediction penalty has also increased, though only slightly, and it falls in the same range as Intel's Sandy Bridge and later architectures.
In other words, in some respects Apple's architecture is already on the same level as Intel's desktop architectures.
The doubling of the level 1 cache capacity is to be expected. On the execution side, the integer ALUs, load/store units, and branch units have also doubled. An indirect branch unit has been added for the first time, along with an additional floating-point pipeline, so three floating-point operations can be performed in parallel. Note, however, that the third floating-point/NEON pipeline is used for division and square root, so at most two multiplications can run in parallel.
The buffer size for each unit is now also largely known, and presumably counts the micro-ops each unit can hold. There appears to be no unified scheduler in front of all the execution units. Instead, a statically partitioned buffer sits in front of each port.