Reductions of up to 30% in power and 25% in area are claimed over previous generations of Imagination GPUs. “B-Series is a further evolution delivering the highest performance per mm2 for GPU IP and offering new configurations for lower power and up to 35% lower bandwidth for a given performance target,” it said.
The flagship part is the quad-core BXT 32-1024 MC4 (right), which can offer 6Tflop of 32bit floating point calculation, 192Gpixel/s of image processing or an artificial intelligence processing peak of 24Tops – in 1.5GHz form on a PCI Express card.
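The headline figures can be reproduced with simple arithmetic. The sketch below assumes that the ‘32’ and ‘1024’ in the part name are per-core, per-clock rates for pixels and 32bit floating point operations, and that ‘MC4’ means four cores; that reading is an assumption rather than something stated above, and the class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuConfig:
    """Hypothetical model of a multi-core GPU configuration."""
    pixels_per_clock: int    # per core; assumed meaning of the "32" in the name
    fp32_ops_per_clock: int  # per core; assumed meaning of the "1024" in the name
    cores: int               # "MC4" read as four cores
    clock_ghz: float

    def fill_rate_gpixel_s(self) -> float:
        # pixels/clock x cores x Gclock/s = Gpixel/s
        return self.pixels_per_clock * self.cores * self.clock_ghz

    def fp32_tflops(self) -> float:
        # ops/clock x cores x Gclock/s = Gflop; divide by 1000 for Tflop
        return self.fp32_ops_per_clock * self.cores * self.clock_ghz / 1000


bxt = GpuConfig(pixels_per_clock=32, fp32_ops_per_clock=1024, cores=4, clock_ghz=1.5)
print(bxt.fill_rate_gpixel_s())  # 192.0
print(bxt.fp32_tflops())         # 6.144
```

Under these assumptions the quoted 192Gpixel/s and 6Tflop both fall out of the part name at 1.5GHz, and the 24Tops peak would correspond to roughly four times the 32bit floating point rate.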
“Our Series 5 GPUs had multi-core options. Since then we have designed single high-performance cores,” Imagination product manager Andrew Girdler told Electronics Weekly. “Customers liked single high-performance cores, but wanted to go beyond for automotive and data centre applications, and also dual cores for phones.”
In the B-Series multi-core implementation, individual cores are essentially self-contained and can operate independently, but are also designed to work together on the same image if needed, according to Girdler. He added that the architecture allows a quad core to get almost 4x single-core performance. “That is why we focused on a maximum of four cores.”
Cores can always work together, but a core has to have a ‘firmware processor’ (See ‘FW’ in diagrams) to run independently.
Imagination’s existing ‘IMGIC’ on-chip data compression hardware has been overhauled for B-Series. It is implemented to compress data leaving the GPU and then decompress it when it arrives at the display controller – reducing internal system traffic and therefore power dissipation.
“It is significantly smaller – an order of magnitude – so it can be put on smaller chips – even the simplest core,” said Girdler. “The choice used to be lossless or 50% compression. We have added 75% compression for noisy images that would lose too much with 50%, and 25% compression to use if the system suddenly loses bandwidth, for example if something else is starting up for a few frames.” Compression rates can be shifted dynamically or fixed.
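To give a feel for what the extra modes mean for display traffic, here is a minimal sketch. It assumes each percentage is the compressed size relative to the uncompressed frame (so 25% is the most aggressive mode), and the selection policy shown is purely hypothetical; it is not Imagination’s algorithm. Lossless mode is left out because its ratio depends on image content.

```python
# Assumed lossy modes, expressed as compressed size / original size.
LOSSY_RATIOS = (0.75, 0.50, 0.25)


def raw_bandwidth_mb_s(width, height, fps, bytes_per_pixel=4):
    """Uncompressed frame traffic in MB/s (1 MB = 1e6 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e6


def pick_ratio(raw_mb_s, budget_mb_s):
    """Return the least aggressive lossy ratio that fits the budget.

    Returns None if even the 25% mode exceeds the budget.
    """
    for ratio in LOSSY_RATIOS:  # ordered from mildest to most aggressive
        if raw_mb_s * ratio <= budget_mb_s:
            return ratio
    return None


raw = raw_bandwidth_mb_s(1920, 1080, 60)  # 1080p at 60 frames/s, 32bit pixels
print(raw)                   # 497.664
print(pick_ratio(raw, 400))  # 0.75
print(pick_ratio(raw, 300))  # 0.5
print(pick_ratio(raw, 150))  # 0.25
```

A dynamic scheme of the kind Girdler describes could re-run a choice like this whenever the available bandwidth changes, dropping to the 25% mode for a few frames and returning to a milder one afterwards.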
There are four branches to the series:
- BXE – Optimised for low area and low cost (an earlier xXE is to be found in the Amazon FireStick). Fill-rate focused, with 25% area saving over Imagination’s 9XE or AXE cores. Scaling is from 1 to 16pixel/clock and 720p to 8K resolution. There is only one FW in any design, so multi-cores always work together. Applications are foreseen in user interface rendering, low-end gaming, digital TV and high-end microcontrollers transitioning from 2D graphics engines.
- BXM – Performance balanced between fill-rate and computation for mid-range mobile gaming and complex user interfaces, in digital TV and mid-range mobile phones.
- BXT – Optimised for performance/mm2. High performance for high-end phones, Chromebooks and data centres for cloud gaming. Includes the flagship part above. Up to two cores in phones, four on a PCIe card in data centres. Intended to compete in data centres on power consumption.
- BXS – Optimised for automotive use. Functional safety throughout, aimed at ISO26262. The firmware processor is new in XS cores: Risc-V-based instead of MIPS-based as in the rest of the B-Series. BXS is “the most advanced automotive GPU IP cores ever created”, according to the company, being up to 60% faster than previous Imagination automotive cores. Range covers entry-level to premium, for user interfaces, infotainment, digital cockpit, surround view and, through multi-Tflop computation configurations, autonomy and automated driving (ADAS).
BXS 32-1024 is the highest performance automotive core, with 2Tflop for automated driving. Multiple neural network accelerators are included.
The BXS automotive cores include what Girdler describes as a tool box of safety mechanisms that customers can pick from to meet industry-standard coverage requirements, including the company’s ‘tile regional protection’ (TRP; Imagination processes images as a series of square tiles), which gives tiles tagged as safety-related higher processing priority than non-critical tiles when an image is displayed.
Million-polygon meshes can be handled in some BXS cores, and multi-sample anti-aliasing can be used to smooth edges for better images. Additional tolerance has been built in against varying memory access latency, said Girdler, for situations where the GPU loses priority to safety-related activities. Every BXS core gets ISO26262-compatible eight-way hardware virtualisation, he added.
At the same time as announcing B-Series GPUs, Imagination said that it has high-end ray-tracing hardware in development, scheduled to appear in its otherwise un-announced C-Series products.
“Level two ray tracing is found in consoles and mobiles,” said Girdler, “level three is seen in desktops. Customers want level four, but there is nothing for them in the industry today. Level four ray tracing will be in Series C GPUs.”
USC – unified shading cluster = computation core
TPU – texture processing unit
FW – ‘firmware’ processor – allows a core to operate independently