School of Science and Technology 科技學院
Electronic and Computer Engineering 電子工程學系

RISC-V SIMD Simulation

Student Law Tsz Chung
Programme Bachelor of Science with Honours in Computer Engineering
Supervisor Mr. Bruce Tong
Year 2021/22

Abstract

This project creates a simulator for the RISC-V V extension. RISC-V is an emerging Instruction Set Architecture for microcontrollers, accelerators, and processors. The V extension adds vector-based SIMD capabilities to RISC-V. SIMD capabilities accelerate a wide array of tasks such as media processing, graphics processing, scientific calculation and machine learning. Support for the V extension is therefore critical to RISC-V's adoption in personal computer processors and fixed-circuit accelerators.

RISC-V has seen widespread interest as an alternative to proprietary Instruction Set Architectures like x86 and ARM, for reasons including lower cost, flexibility, its royalty-free nature and geopolitical restrictions. As the V extension is new, this project aims to create an open-source simulator as a tool for designing RISC-V based processors with SIMD capabilities, thereby supporting the rising interest in and adoption of RISC-V.

Objectives

The first objective of this project is to build a Java-based Simulator with the hardware components described for the “V” extension. These include the vector registers that store data for operations, context registers that record the state of the vector registers, parameter registers that indicate properties such as the type and length of the data stored within the vector registers, state registers that hold operation parameters like rounding mode and saturation, and the Vector Control and Status Register.

The second objective is the implementation of the instructions described in the “V” extension specification on the Simulator, which executes incoming instructions by manipulating its components to perform the calculation. This includes defining the states of the registers during the execution of instructions, mapping vector elements from software to the vector registers according to their states, implementing a decoder that follows the defined instruction formats, implementing each instruction on the Simulator, and implementing memory operations. It also includes the handling of errors and exceptions.

The third objective is a logging engine that extracts the states of the Simulator and traces instructions. It will trace the Simulator as it performs operations and log the manipulation of its components, documenting information such as the cycles required to complete an execution and the intermediate states of registers, so that any state the Simulator was in at any cycle can be recreated.
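
One possible shape for such a logging engine is sketched below. It is only an illustration under stated assumptions: the class TraceLog and its methods are hypothetical and not taken from the Quantr simulator, the register state is reduced to a plain long array, and one snapshot is recorded per executed instruction, so "cycle" here simply means the position in the log.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a trace log; not the project's actual logging engine. */
final class TraceLog {
    /** Immutable record of the register state after one step. */
    static final class Snapshot {
        final int cycle;
        final String instruction;
        final long[] registers;

        Snapshot(int cycle, String instruction, long[] registers) {
            this.cycle = cycle;
            this.instruction = instruction;
            this.registers = registers.clone(); // defensive copy, later writes must not alter the log
        }
    }

    private final List<Snapshot> snapshots = new ArrayList<>();

    /** Stores a copy of the register state reached after executing the instruction. */
    void record(String instruction, long[] registers) {
        snapshots.add(new Snapshot(snapshots.size(), instruction, registers));
    }

    /** Returns a copy of the register state as it was at the given cycle. */
    long[] stateAt(int cycle) {
        return snapshots.get(cycle).registers.clone();
    }

    /** Number of steps recorded so far. */
    int cycles() {
        return snapshots.size();
    }

    public static void main(String[] args) {
        TraceLog log = new TraceLog();
        long[] regs = {0, 0};
        regs[0] = 5;
        log.record("step 0", regs);
        regs[1] = 7;
        log.record("step 1", regs);
        System.out.println(log.stateAt(0)[1]); // prints 0, the earlier state is preserved
    }
}
```

Copying the registers on every step keeps the sketch simple; a real engine would more likely log only the changes made by each instruction.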

The fourth objective is the validation of this Simulator against the reference simulator and documentation released by RISC-V International. This includes the creation of testing steps and conditions that probe the operation of the developed Simulator to confirm that it adheres to the specification in edge cases, and the application of those steps and conditions to each instruction's execution on the Simulator.

Methodologies and Technologies used

This project will involve extensive study of the RISC-V ISA via its specification. As a Simulator must faithfully recreate each component of a RISC-V processor's hardware, it requires extensive knowledge of the defined hardware, the states of the hardware during operation, the way each instruction manipulates the hardware, and more.

Analysis of The Specification

This project will use a recently released specification. As it is very recent, the specification will be analyzed in order to understand its working principles. To simulate it faithfully and to conduct validation, each instruction will be analyzed. Expected results will be established for each instruction and each type of operand, and edge cases will be identified for validation purposes.

Adoption of Quantr Foundation CPU Simulator

This project will use components of the Quantr Foundation CPU Simulator. In this step, components of this CPU Simulator, including the virtual memory system, the instruction decoder and the general purpose registers, will be adopted. This project will fork and maintain a branch of the Quantr Foundation CPU Simulator, and expand upon it with new virtual hardware and virtual instructions to build a V extension simulator.

Implementation of Instructions

In this stage, each instruction will be implemented to the specification's requirement and to the expected outcome found from the analysis stage. This includes creating a parser rule set to parse instructions into opcodes and programming the simulator to carry out the correct operation according to the opcode.

Validation of Instructions

In this stage, each instruction will be validated against a reference RISC-V simulator, Spike. Tests covering the cases and edge cases identified in the analysis stage will be used. The project may also adopt a validation test suite if one becomes available during the project, as the specification has only recently been considered stable enough for such work to begin.

Implementation

The RISC-V V Extension
The RISC-V V Extension adds SIMD capabilities to the RISC-V ISA. This project had the goal of analyzing the specification of the Extension between 18th Dec 2021 and 20th Jan 2022. The goal was met, with the task finishing on 14th Dec 2021 with Progress Report #2.

The Extension describes 32 vector registers in the hardware processor for storage of vector values. A Vector Register Group groups vector registers together to form a single operand for use in a vector instruction. Each vector register holds VLEN bits, where VLEN is a power of 2 between 8 and 65536 chosen by the processor's design. The vector length multiplier LMUL indicates the number of vector registers grouped together for a single operand, and ranges from the fractional value 1/8 to 8. The selected element width SEW indicates the width in bits of each element within a vector register group, ranging from 8 to 64 in powers of 2.

Each vector operand is a vector of elements organized with an effective element width (EEW) and an effective LMUL (EMUL). SEW and LMUL of a vector register group are configured to fit those requirements, and by default EEW equals SEW and EMUL equals LMUL. When an instruction has a widening or narrowing property, the destination group has a different EEW and EMUL.
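
To illustrate how VLEN, SEW and LMUL interact, the V specification defines the maximum number of elements in a register group as VLMAX = LMUL × VLEN / SEW. The following is a minimal sketch of that calculation. The class and method names are hypothetical and not taken from the Quantr simulator, and LMUL is passed as a numerator and denominator (an assumption made here so that fractional values such as 1/8 stay exact).

```java
/** Hypothetical helper, not part of the Quantr simulator: VLMAX = LMUL * VLEN / SEW. */
final class VectorConfig {
    private VectorConfig() {}

    /** True when x is a positive power of two. */
    static boolean isPowerOfTwo(int x) {
        return x > 0 && (x & (x - 1)) == 0;
    }

    /**
     * @param vlen    bits per vector register (power of 2, 8 to 65536)
     * @param sew     selected element width in bits (8, 16, 32 or 64)
     * @param lmulNum numerator of LMUL (for example 1 for LMUL = 1/4)
     * @param lmulDen denominator of LMUL (for example 4 for LMUL = 1/4)
     * @return number of elements that one vector register group can hold
     */
    static int vlmax(int vlen, int sew, int lmulNum, int lmulDen) {
        if (!isPowerOfTwo(vlen) || vlen < 8 || vlen > 65536) {
            throw new IllegalArgumentException("invalid VLEN: " + vlen);
        }
        if (!isPowerOfTwo(sew) || sew < 8 || sew > 64) {
            throw new IllegalArgumentException("invalid SEW: " + sew);
        }
        boolean validLmul = isPowerOfTwo(lmulNum) && isPowerOfTwo(lmulDen)
                && lmulNum <= 8 && lmulDen <= 8
                && (lmulNum == 1 || lmulDen == 1);
        if (!validLmul) {
            throw new IllegalArgumentException("invalid LMUL: " + lmulNum + "/" + lmulDen);
        }
        // Multiply before dividing so that fractional LMUL and small VLEN do not truncate early.
        return (vlen * lmulNum) / (sew * lmulDen);
    }

    public static void main(String[] args) {
        System.out.println(vlmax(128, 32, 1, 1)); // prints 4
        System.out.println(vlmax(128, 8, 8, 1));  // prints 128
        System.out.println(vlmax(128, 32, 1, 2)); // prints 2
    }
}
```

A result of 0 (for example VLEN = 256, SEW = 64, LMUL = 1/8) means the register group cannot hold a single element with that setting.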

Major Opcode    Base address         Destination address    Opcode
LOAD-FP         rs1                  vd                     0000111
STORE-FP        rs1                  vs3                    0100111
OP-V            vs1/rs1/immediate    vd/rd                  1010111

When an instruction is executed, the vstart register specifies the index of the first element that updates the destination vector register, and the vl register specifies the current vector length. The instruction operates on the source group and, depending on the instruction, updates the destination group.

The elements before the element specified by vstart are the prestart elements. The elements from the vstart index up to, but not including, index vl are the body elements. The elements at or past vl are the tail elements. The prestart and tail elements do not update the destination vector group upon instruction execution, and do not raise exceptions.

Within the body elements, there are active and inactive elements, as dictated by masking. An active element has its mask bit enabled at its position; an inactive element has it disabled. The inactive elements do not update the destination group. This behaviour can be overridden when the mask agnostic setting is on, in which case inactive elements can be overwritten with 1s.
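
The classification described above can be sketched as follows. This is an illustration only: the class, enum and method names are hypothetical, elements are modelled as plain int values, and the sketch assumes that prestart, tail and inactive elements are always left undisturbed (the mask agnostic case is not modelled).

```java
import java.util.Arrays;

/** Hypothetical sketch: classifies each element index for one vector instruction. */
final class ElementClassifier {
    enum Kind { PRESTART, ACTIVE, INACTIVE, TAIL }

    /**
     * @param i      element index
     * @param vstart index of the first body element
     * @param vl     current vector length
     * @param masked true when the instruction executes under a mask
     * @param mask   one mask bit per element (ignored when masked is false)
     */
    static Kind classify(int i, int vstart, int vl, boolean masked, boolean[] mask) {
        if (i < vstart) return Kind.PRESTART;
        if (i >= vl) return Kind.TAIL;
        if (masked && !mask[i]) return Kind.INACTIVE;
        return Kind.ACTIVE;
    }

    /** Element-wise add that writes only the active elements of dest. */
    static void maskedAdd(int[] dest, int[] src1, int[] src2,
                          int vstart, int vl, boolean masked, boolean[] mask) {
        for (int i = 0; i < dest.length; i++) {
            if (classify(i, vstart, vl, masked, mask) == Kind.ACTIVE) {
                dest[i] = src1[i] + src2[i];
            }
        }
    }

    public static void main(String[] args) {
        int[] dest = {9, 9, 9, 9, 9, 9};
        int[] src1 = {1, 2, 3, 4, 5, 6};
        int[] src2 = {10, 10, 10, 10, 10, 10};
        boolean[] mask = {true, true, false, true, true, true};
        maskedAdd(dest, src1, src2, 1, 4, true, mask);
        System.out.println(Arrays.toString(dest)); // prints [9, 12, 9, 14, 9, 9]
    }
}
```

In the usage example, element 0 is a prestart element, element 2 is inactive, and elements 4 and 5 are tail elements, so only elements 1 and 3 are written.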

Vector instructions are encoded under three major opcodes: LOAD-FP, STORE-FP, and a new OP-V. For instructions under LOAD-FP, rs1 is the base address and vd is the destination of the load. For instructions under STORE-FP, rs1 is the base address and vs3 is the data to be stored. For instructions under OP-V, the field listed under base address holds a source operand instead: vs1 for vector operands, rs1 for scalar operands, or an immediate, depending on the operand type. The destination is vd or rd.
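
As a small illustration of the three major opcodes, the sketch below extracts the opcode and the two register fields from a 32-bit instruction word. The bit positions follow the standard RISC-V instruction layout (opcode in bits 6:0, destination field in bits 11:7, first source field in bits 19:15). The class and method names are hypothetical and are not the decoder used in the Quantr simulator.

```java
/** Hypothetical sketch of major opcode and field extraction for vector instructions. */
final class VectorOpcodeDecoder {
    enum MajorOpcode { LOAD_FP, STORE_FP, OP_V, OTHER }

    /** Classifies the instruction by its 7-bit major opcode (bits 6:0). */
    static MajorOpcode majorOpcode(int instruction) {
        switch (instruction & 0x7F) {
            case 0b0000111: return MajorOpcode.LOAD_FP;
            case 0b0100111: return MajorOpcode.STORE_FP;
            case 0b1010111: return MajorOpcode.OP_V;
            default:        return MajorOpcode.OTHER;
        }
    }

    /** Bits 11:7: vd or rd for LOAD-FP and OP-V, vs3 (store data) for STORE-FP. */
    static int destinationField(int instruction) {
        return (instruction >>> 7) & 0x1F;
    }

    /** Bits 19:15: rs1 base address for loads and stores; vs1, rs1 or a 5-bit immediate for OP-V. */
    static int sourceField(int instruction) {
        return (instruction >>> 15) & 0x1F;
    }

    public static void main(String[] args) {
        int word = 0x022180D7; // an OP-V word with destination field 1 and source field 3
        System.out.println(majorOpcode(word));      // prints OP_V
        System.out.println(destinationField(word)); // prints 1
        System.out.println(sourceField(word));      // prints 3
    }
}
```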

The Simulator

This project builds on the open source Quantr Foundation RISC-V CPU simulator. This segment discusses the adoption of the simulator and the development toolchain for this project.

Quantr Simulator Projects

The Quantr Simulator is divided into two parts, the Assembler and the Simulator. The Assembler assembles RISC-V assembly code into binary machine code and disassembles binary machine code back into assembly code. The Simulator simulates a RISC-V processor, including its registers and memory, and executes instructions from machine code by manipulating the registers and memory.

This project had the goal of completing the adoption between 20th Jan 2022 and 25th Feb 2022. The completion date has been amended to 1st April 2022 due to a change in workflow. To make sure the implementation of instructions goes smoothly, this project is setting up Spike, the simulator this project will use as a reference. This work was previously planned for the validation phase and has been moved forward, pushing back the expected completion date of the adoption.

Integrated Development Environment

The simulator is Java-based and runs in the NetBeans IDE. It is hosted on GitLab. This project forks the simulator project in order to build V extension support into it.

The Simulator expects a Windows installation of the xPack GNU toolchain in order to function fully. As xPack did not support the V extension at the time, it was replaced with Imperas' Linux-based GNU toolchain, operated via Windows Subsystem for Linux on Windows.

Tools

The simulator uses ANTLR4 for parsing instructions into opcodes; xPack is a prerequisite for the tool. ANTLR4 is a parser generator that builds parsers from grammar rules, and those parsers recognise text that matches the rules. It will be used to identify input instructions and parse them into binary instructions.

Quantr has created a custom plugin that facilitates bulk testing with validation packages without needing to manually check the output. It also offers features for creating ANTLR4 rules, such as syntax highlighting and a parse tree view, that will be useful in the instruction implementation.

Without the plugin, each validation package that is run needs manual validation against the output terminal.

With the plugin, multiple validation packages can be run and the results are shown together without needing to check the output terminals individually, automating the process. This integrates flexibly with the IDE interface and reduces the chance of manual error.

The Imperas RISC-V GNU toolchain is used for validation purposes in this project. A RISC-V toolchain is used for testing and development, so it is required for instruction implementation and validation. However, the official toolchain was not compatible, and had no explicit documentation of V support. Furthermore, several draft versions of V had conflicting information on support, and the same applied to the toolchain.

Suggestions the author found on forums and issue discussions included using a different branch of the toolchain and using LLVM in place of GCC; these were attempted to no avail. The author was unable to compile a compatible toolchain, and prebuilt toolchains were mostly not up to date, including the xPack toolchain used by Quantr and the public toolchain provided by chipmaker SiFive.

This project has elected to use Imperas' prebuilt toolchain hosted on https://github.com/Imperas/riscv-toolchains/tree/rvv-1.0.0. It is sparsely referenced and was only discovered by the author in early April 2022, and work on implementation and validation was significantly inhibited before then. The implementation phase and validation phase were therefore combined.

Results and Discussion

This project has produced a partial simulator for RISC-V's V Extension. RISC-V is an open-source instruction set that has gained strong traction in recent years in education, research and production owing to its open source nature, flexibility and simplicity. Due to time constraints, as well as the difficulty of working with an extension that was only ratified during this project, the implementation is not complete. An emulated CPU compatible with the V extension has been implemented, with the vector registers and the vector control and status registers fully implemented; however, not all instructions are implemented.

The first deliverable is the assembler. It parses RISC-V V assembly instructions correctly into machine code. The second deliverable is the disassembler. It does the reverse of the assembler and parses RISC-V V machine code correctly into assembly code. The last deliverable is the simulator, which can correctly emulate a RISC-V V compatible CPU with its registers to spec, and manipulate those registers and memory according to the instruction.
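
The round trip between assembly and machine code can be illustrated with a single instruction. The sketch below hand-encodes and decodes vadd.vv, which the V specification places under OP-V with funct6 = 000000 and funct3 = 000, with the vm bit set to 1 when the instruction is unmasked. It is not the project's ANTLR4-based assembler; the class and method names are hypothetical.

```java
/** Hypothetical sketch of an assemble/disassemble round trip for vadd.vv only. */
final class VaddVvCodec {
    static final int OP_V = 0b1010111;

    private static void checkRegister(int r) {
        if (r < 0 || r > 31) throw new IllegalArgumentException("vector register out of range: " + r);
    }

    /** Encodes "vadd.vv vd, vs2, vs1"; funct6 and funct3 are both zero for this instruction. */
    static int encode(int vd, int vs2, int vs1, boolean unmasked) {
        checkRegister(vd);
        checkRegister(vs2);
        checkRegister(vs1);
        int vm = unmasked ? 1 : 0;
        return (vm << 25) | (vs2 << 20) | (vs1 << 15) | (vd << 7) | OP_V;
    }

    /** Decodes a machine word back into assembly text, rejecting anything that is not vadd.vv. */
    static String decode(int word) {
        boolean isVaddVv = (word & 0x7F) == OP_V
                && ((word >>> 12) & 0x7) == 0   // funct3 = 000 (vector-vector integer form)
                && (word >>> 26) == 0;          // funct6 = 000000 (vadd)
        if (!isVaddVv) throw new IllegalArgumentException("not a vadd.vv encoding");
        int vd = (word >>> 7) & 0x1F;
        int vs1 = (word >>> 15) & 0x1F;
        int vs2 = (word >>> 20) & 0x1F;
        boolean unmasked = ((word >>> 25) & 1) == 1;
        return "vadd.vv v" + vd + ", v" + vs2 + ", v" + vs1 + (unmasked ? "" : ", v0.t");
    }

    public static void main(String[] args) {
        int word = encode(1, 2, 3, true);
        System.out.println(Integer.toHexString(word)); // prints 22180d7
        System.out.println(decode(word));              // prints vadd.vv v1, v2, v3
    }
}
```

Comparing such hand-derived words against the output of a reference toolchain is one way the assembler and disassembler can be checked against each other.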

This assembler and disassembler will be useful as a basis for future development, both for meeting a future V specification update and for custom instruction sets that wish to share components of their instructions with V. There are other candidate instruction extensions that may be desirable for a custom processor, such as the P Packed SIMD instruction set that this project has evaluated, and the previously published and widely distributed 0.7.1 draft of the V extension, which has been implemented in numerous processors like the Allwinner D1 SoC using the Alibaba-made C906 core [10]. Custom instructions have also been widely discussed, as RISC-V is extensible. This illustrates the flexibility of the RISC-V ISA for specific and custom processor designs. The V instruction specification itself also discusses the potential need for future expansions numerous times. This parser can serve as a basis for future custom extensions, as it will include all instructions under the major opcode OP-V, which may be expanded upon with more instructions.

This project is limited in scope compared to full simulator products like Spike and full system emulators like QEMU, both of which offer simulation functionality similar to Quantr's simulator. This project does not produce a standalone functional product; it contributes a component to an existing open source project. While the completeness of the project was impeded by tool availability and time constraints, the author plans to finish the implementation of all instructions after this project as a personal project.

Conclusion

This project created one component in a toolchain that supports development of RISC-V Vector hardware. It adds support for the V extension to a portable and accessible open source simulator, which is a tool for designing RISC-V based processors. Therefore, this project has largely met its goals.

While the work is incomplete due to time and tool limitations, it largely completed its objectives. Therefore, the goal of creating a more approachable simulator is achieved. This project's implementation is Windows-based and FOSS, making it much more approachable for local education and development purposes. This project's work has real potential to aid real hardware development work: Quantr Foundation has the goal of designing and fabricating a RISC-V CPU, and this work will most likely be used for that purpose.

This project aids the development of RISC-V hardware by adding a tool for developing SIMD capabilities with the RISC-V ISA. SIMD is an essential part of signal and information processing, and is used to accelerate key workloads such as video and image processing and machine learning. This project's work would enable the tool to be used to design both general processors with SIMD capabilities for personal computers targeting media processing, and accelerators specializing in vector acceleration that may lack support for general purpose extensions like the Hypervisor extension. By supporting SIMD development on RISC-V, it facilitates the design of specific accelerators as well as general purpose CPUs, aiding the development of RISC-V based hardware that solves relevant problems.

As the Vector extension is very recent, having been ratified from a draft within the duration of this project, it sorely lacks support across the ecosystem of tools, which leaves a plentiful pool of future work. Other parts of the hardware design toolchain, such as hardware description language based tools, the physical implementation of a processor, and the components surrounding the core like the memory system and the IO interface, are all topics for future work on RISC-V hardware development. The RISC-V Foundation has recently formed a strong alliance with Intel's Intel Foundry Services initiative, with its IP offered as part of the new foundry's offering, demonstrating the real world demand for these designs.

Another category of future work is software support. Support for mainstream computing languages and APIs like CUDA or OpenCL could be a future effort in expanding support for the V extension. Both of those APIs are popular tools for programs accelerated with SIMD capabilities, and supporting existing software tools is a crucial step towards becoming a viable alternative to existing processors in practical applications. AMD's HIP is a similar initiative to add CUDA support to AMD graphics processors.
