EDA
Optimising algorithmic performance in processors
By Alexey Shchekin & Roddy Urquhart, Codasip

Today's SoC applications often use computationally demanding algorithms such as cryptography, DSP and artificial intelligence. In the past, rapid advances in silicon processing technology enabled general-purpose architectures to tackle many computational challenges by simply moving to the newest technology node. A limited choice of processor architectures was used – mainly Arm – and it was rare and expensive for SoC designers to create custom processors through architectural licenses. The alternative was to create application-specific instruction set processors (ASIPs), but these suffered from very limited ecosystems and the difficulty of assembling the broad set of skills required. However, in recent years three enablers have combined to make Custom Compute accessible:

Fig 1: Three enablers of Custom Compute
RISC-V ISA
The RISC-V ISA has been a game-changer for many applications due to its open, modular nature. Its openness means that the instruction set can be used without an expensive architecture license and royalties. Its modularity means that there is a small base instruction set, a wide choice of standard extensions, and a framework for creating custom instructions. Because RISC-V is an open standard, there is a low threshold for companies to contribute to its ecosystem.
Processor design automation
Processor design differs from other digital hardware because it requires the development of a software toolchain as well as an RTL description. This has led to specialised EDA tools, such as Codasip Studio, which support the description of a processor using an architectural language and the automatic generation of the software toolchain, simulators, RTL, and verification environment.
An example of an architectural language is Codasip’s CodAL which describes:
1. Architectural resources, for example, registers and a program counter.
2. Instruction set.
3. Semantics of each instruction and exception.
4. Micro-architectural implementation.

Although it is possible to describe processor hardware with a hardware description language (HDL), any software toolchain development would be disconnected. An architectural language allows a more compact description of the processor and ensures that the generated hardware and software outputs are consistent. Using design automation saves development effort and reduces errors.
Processor IP source code
Designing a processor from scratch requires a broad set of skills in the design team, limiting it to a small number of organisations. The main IP licensing model for processors has been to license the RTL with no rights to change the microarchitecture or ISA. This has meant that processor cores tend to be used "as is". If a processor is designed in an architectural language, the IP vendor can license the core as architectural source code. This means that if an existing core approximates to the functionality needed, it can be used as a starting point. It can be modified by adding custom instructions, additional registers, or extra ports to the existing design. This enables performance goals to be achieved through an incremental design and verification effort rather than designing from scratch.
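As an illustration of the kind of modification described above, the sketch below shows one generic way a custom instruction could be invoked from C, using the RISC-V "custom-0" opcode space and the GNU assembler's .insn directive. The wrapper name and the funct3/funct7 encodings are hypothetical choices for this sketch, not details from the article; in a flow where the software toolchain is generated alongside the core, such hand-written wrappers may not be needed.

#include <stdint.h>

/* Hypothetical wrapper for a custom R-type instruction added to a RISC-V
 * core. Opcode 0x0B is the standard "custom-0" encoding space; the funct3
 * and funct7 values (0x0, 0x00) are arbitrary choices for illustration. */
static inline int32_t custom_op(int32_t a, int32_t b)
{
    int32_t result;
    __asm__ volatile (
        ".insn r 0x0B, 0x0, 0x00, %0, %1, %2"
        : "=r"(result)
        : "r"(a), "r"(b));
    return result;
}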
Accelerating algorithms
Embedded processors have limited resources and yet may need to run computationally intensive algorithms such as an FIR filter. FIR filters are widely used for audio signal processing, for mitigating distortions in radio systems (including 5G), and in medical monitoring systems.

Unless the data to be filtered has a very low sampling rate, real-time FIR filtering is beyond the capability of a small microcontroller core. In the past, an SoC designer had the option to:

Use a much more complex, expensive microcontroller core which might be over-specified.
Add a dedicated programmable DSP core in parallel.
Add a hardware accelerator with the FIR computation hard-wired.

Today, with the Custom Compute approach, an efficient alternative is to modify an existing RISC-V core to accelerate the FIR filter computation. The FIR filter is defined as:

y_n = f_0·s_n + f_1·s_{n−1} + … + f_{N−1}·s_{n−N+1}

where {s_i} are the last N input signal samples (the "filtering window"), y_n is the output, and {f_i} are the FIR filter coefficients. With the standard RISC-V ISA, the filtering operation uses 2N−1 arithmetic instructions (multiplications and additions), 2N loads from memory, and roughly N comparisons and jumps, since it is most likely coded with a for-loop construct. A 32-bit Codasip L31 RISC-V embedded core, with a 3-stage pipeline, was extended (continues on page 32).
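To make those instruction counts concrete, the following plain-C sketch computes a single FIR output using only base-ISA operations. The tap count N and the oldest-first layout of the sample window are illustrative assumptions, not details from the article.

#include <stdint.h>

#define N 16  /* number of filter taps; illustrative value */

/* One FIR output on the base ISA: the loop performs N multiplications and
 * N-1 useful additions (2N-1 arithmetic instructions), 2N loads (samples
 * plus coefficients), and roughly N loop-counter compares and branches. */
int32_t fir_output(const int32_t s[N], const int32_t f[N])
{
    int32_t y = 0;
    for (int i = 0; i < N; i++) {
        /* f_i * s_{n-i}; s[N-1] is the newest sample s_n, s[0] the oldest */
        y += f[i] * s[N - 1 - i];
    }
    return y;
}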
Fig 2: Principle of FIR accelerator