Best GPUs for Rust: Full Potential of Your Gaming Experience

by All About Healthy World • November 07, 2023


A few days back, I tweeted that I had just run code written in Rust on the GPU. It's about time I provided some more details. This is a project I worked on with Milinda Pathirage, a fellow student at IU. It is very much at the proof-of-concept stage. I doubt it works well enough to do anything useful, but it does work well enough to do something, and it would certainly be possible to extend it.

That said, I will include links to our code so that valiant hackers can try it. For posterity's sake, here is, to my knowledge, the first fragment of Rust code ever to execute on a GPU:

There are two main parts to this project. The first is compiling Rust code into something suitable for running on the GPU. We do this using the PTX backend that is part of LLVM. The second part is loading and executing the kernel. For this, we use OpenCL and its clCreateProgramWithBinary API. In this post, I'll focus on the issues we encountered while generating PTX code.

Work potential: manual process

The bulk of the work to generate PTX code was already done by the NVPTX backend, which was recently contributed to LLVM by NVIDIA. We started with a very manual process. First, we passed the --emit-llvm flag to the Rust compiler to save the generated LLVM bitcode. From there, we attempted to compile it to PTX using llc:

I wasn't surprised to see this fail with one of LLVM's typically opaque error messages. You can see it here if you wish. Rust was generating code that the NVPTX backend didn't know how to handle. This makes sense; I expect NVIDIA to primarily test the backend on code generated by CUDA, which looks different from the code Rust generates. The next step was to pare down the generated LLVM to something a little more manageable.


The Best Graphics Cards for Playing Rust at Peak Performance

After another minor fix or two, it became clear that we would have to modify the way Rust generates code as well. For example, the PTX code I linked above does not include a .entry line, which is required to indicate where a kernel function begins. One option is to add a new PTX target for Rust and set it up as a cross-compiler.

That isn't quite what we want, though. We want to run only a few portions of a Rust program on the GPU, not all of it, and apart from the code generator, we want the PTX code to agree with the host system on architectural details. So instead, I added a -Zptx flag to rustc and made minor changes to the translation pass. Functions that have the #[kernel] attribute get compiled with the ptx_kernel calling convention, which tells NVPTX to add the .entry line. According to Patrick, we should use a new ABI setting instead, since arbitrary attributes aren't part of the function's type.
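The attribute-to-calling-convention decision can be modeled in a few lines. This is only a toy sketch, not the actual rustc change: `CallConv`, `call_conv_for`, and `ptx_directive` are hypothetical names, and the attribute list is represented as plain strings. The only facts it leans on are that PTX introduces kernels with `.entry` and ordinary device functions with `.func`.

```rust
/// Calling conventions relevant to this sketch. `PtxKernel` stands in for the
/// ptx_kernel convention; `Default` covers every other function.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CallConv {
    Default,
    PtxKernel,
}

/// Hypothetical stand-in for the translation pass: pick a calling convention
/// from a function's attribute names.
fn call_conv_for(attrs: &[&str]) -> CallConv {
    if attrs.contains(&"kernel") {
        CallConv::PtxKernel
    } else {
        CallConv::Default
    }
}

/// The PTX directive that introduces a function with the given convention:
/// kernels are declared with `.entry`, other device functions with `.func`.
fn ptx_directive(cc: CallConv) -> &'static str {
    match cc {
        CallConv::PtxKernel => ".entry",
        CallConv::Default => ".func",
    }
}
```

For example, `ptx_directive(call_conv_for(&["kernel"]))` yields `".entry"`, while a function without the attribute falls back to `".func"`.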

Graphics Cards for Optimal Rust Gaming

At any rate, we could now go from Rust to PTX without any manual intervention. The next challenge was to execute the kernel. When we first tried to load the PTX file, OpenCL complained about an "invalid binary." We had previously been able to load a PTX file generated with OpenCL and extracted using clGetProgramInfo, so we decided to compare the Rust-generated code with the OpenCL-generated code. The difference was that the parameters to the kernel were not being annotated with an address space. Once we manually added .global to the parameters in the Rust-generated code, we could load and execute the kernel. We could also get the same behavior by manually annotating the LLVM code with addrspace(1).

For some types, Rust would emit the addrspace(1) annotation, but for others, it wouldn't. Rust was already using address spaces for something related to garbage collection. Unfortunately, Rust and NVPTX disagree on what these address spaces mean. To work around this, I had Rust generate different address spaces when the -Zptx flag is given. At the moment, these changes only take effect for & pointers. Others, such as @ pointers, will need more work.
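To make the address-space numbering concrete, here is a small sketch of the mapping the NVPTX backend uses and of how a pointer type is spelled in LLVM's textual form. The enum and both helper functions are hypothetical illustrations, not code from our rustc changes, and the assumption that a & pointer would otherwise live in the generic space is ours.

```rust
/// Address-space numbers as defined by LLVM's NVPTX backend.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum NvptxAddrSpace {
    Generic = 0,
    Global = 1,
    Shared = 3,
    Const = 4,
    Local = 5,
}

/// Hypothetical helper: spell a pointer type in LLVM's textual IR,
/// e.g. `float addrspace(1)*` for a pointer into global memory.
fn llvm_ptr_type(pointee: &str, space: NvptxAddrSpace) -> String {
    match space {
        NvptxAddrSpace::Generic => format!("{}*", pointee),
        other => format!("{} addrspace({})*", pointee, other as u32),
    }
}

/// Hypothetical stand-in for the -Zptx switch: choose the address space for
/// a & pointer. Assumption: without the flag the pointer is left generic.
fn addr_space_for_borrowed_ptr(ptx_mode: bool) -> NvptxAddrSpace {
    if ptx_mode {
        NvptxAddrSpace::Global
    } else {
        NvptxAddrSpace::Generic
    }
}
```

With the flag on, `llvm_ptr_type("float", addr_space_for_borrowed_ptr(true))` produces the `addrspace(1)` form that let the kernel load.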

Generation side of things for gaming

The final missing piece on the code generation side of things is to have threads be able to do different things. This means providing equivalents of the blockIdx, blockDim, and threadIdx variables. These show up in LLVM as intrinsic functions, so all we need to do is expose those as new Rust intrinsics. We expect to have this part working soon.
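As a rough picture of what those intrinsics buy us, the sketch below simulates a one-dimensional launch on the CPU. Nothing here runs on a GPU: `ThreadCtx`, `add_kernel`, and `launch_add` are hypothetical names, and the sequential loops merely stand in for the hardware scheduling one thread per (block, thread) pair.

```rust
/// Per-thread coordinates, mirroring the x components of the
/// blockIdx, blockDim, and threadIdx variables.
#[derive(Debug, Clone, Copy)]
struct ThreadCtx {
    block_idx: usize,
    block_dim: usize,
    thread_idx: usize,
}

impl ThreadCtx {
    /// The usual flat index of this thread across the whole grid.
    fn global_id(&self) -> usize {
        self.block_idx * self.block_dim + self.thread_idx
    }
}

/// Example kernel body: each thread adds one pair of elements.
/// Threads whose index falls past the end of any buffer do nothing.
fn add_kernel(ctx: ThreadCtx, a: &[f32], b: &[f32], out: &mut [f32]) {
    let i = ctx.global_id();
    if i < out.len() && i < a.len() && i < b.len() {
        out[i] = a[i] + b[i];
    }
}

/// Sequential CPU stand-in for a kernel launch over `grid_dim` blocks
/// of `block_dim` threads each.
fn launch_add(grid_dim: usize, block_dim: usize, a: &[f32], b: &[f32], out: &mut [f32]) {
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let ctx = ThreadCtx { block_idx, block_dim, thread_idx };
            add_kernel(ctx, a, b, out);
        }
    }
}
```

Launching two blocks of four threads over five-element inputs exercises the guard: the three surplus threads compute an index past the end and simply return.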

Our work here shows it's possible to compile Rust to run on the GPU. We support an extremely limited subset of Rust at the moment. Most of the remaining challenges have to do with the way data is arranged in memory and how Rust provides safety at runtime. Rust uses a lot of pointer structures, and moving these between host and device memory can be difficult.
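One common way around pointer-heavy data is to flatten it before it leaves the host. The sketch below is an illustration of that idea rather than anything our prototype does: `FlatRows` and `flatten` are hypothetical names, and the layout (one contiguous data buffer plus an offsets array) is an assumption chosen because both pieces can be copied to device memory as plain arrays.

```rust
/// Pointer-free layout for a list of rows: all elements in one buffer,
/// with `offsets[i]..offsets[i + 1]` marking where row `i` lives.
#[derive(Debug, PartialEq)]
struct FlatRows {
    data: Vec<f32>,
    offsets: Vec<usize>, // always rows + 1 entries
}

/// Flatten nested vectors (each inner Vec is a separate heap allocation)
/// into two contiguous arrays.
fn flatten(rows: &[Vec<f32>]) -> FlatRows {
    let mut data = Vec::new();
    let mut offsets = vec![0];
    for row in rows {
        data.extend_from_slice(row);
        offsets.push(data.len());
    }
    FlatRows { data, offsets }
}

impl FlatRows {
    /// View row `i` without following any per-row pointer.
    fn row(&self, i: usize) -> &[f32] {
        &self.data[self.offsets[i]..self.offsets[i + 1]]
    }
}
```

The trade-off is that rows can no longer grow independently, which is usually acceptable for data that a kernel only reads.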

Perhaps the best thing to do for now is to be careful about what data types we use in GPU code. Even if we use relatively flat types, however, we will still need to handle a few more things. For example, Rust does array bounds checks at runtime. If we want to allow arbitrary array indexing safely in GPU code, we'll need a way to do bounds checks and report failures from kernel code. There are a lot of design issues left, but the initial results for compiling Rust to run on the GPU seem very promising.
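One possible design for reporting bounds failures, sketched here on the CPU, is to have the kernel write to a status word that the host inspects after the launch. This is an assumption about how it could work, not something we have built: the constants and function names are hypothetical, and on a real device the status word would sit in global memory and be updated atomically.

```rust
const STATUS_OK: u32 = 0;
const STATUS_OUT_OF_BOUNDS: u32 = 1;

/// Checked read that records a failure in `status` instead of panicking,
/// and yields 0.0 so the thread can keep going.
fn checked_load(data: &[f32], index: usize, status: &mut u32) -> f32 {
    match data.get(index) {
        Some(&value) => value,
        None => {
            *status = STATUS_OUT_OF_BOUNDS;
            0.0
        }
    }
}

/// Example kernel body for one thread: gather through an index array.
fn gather_kernel(tid: usize, data: &[f32], indices: &[usize], out: &mut [f32], status: &mut u32) {
    if tid < out.len() && tid < indices.len() {
        out[tid] = checked_load(data, indices[tid], status);
    }
}

/// Sequential CPU stand-in for a launch. Returns the status word the host
/// would read back once the kernel finishes.
fn run_gather(data: &[f32], indices: &[usize], out: &mut [f32]) -> u32 {
    let mut status = STATUS_OK;
    for tid in 0..out.len() {
        gather_kernel(tid, data, indices, out, &mut status);
    }
    status
}
```

The host would call `run_gather` and treat any result other than `STATUS_OK` as a failed launch, which trades precise error locations for a check that is cheap inside the kernel.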
