A few days back, I tweeted that I had just run code written in Rust on a GPU. It's about time I provided some more details. This is a project I worked on with Milinda Pathirage, a fellow student at IU. It is very much at the proof-of-concept stage. I doubt it works well enough to do anything useful, but it does work well enough to do something, and it would certainly be possible to extend it.
That said, I will
include links to our code so the valiant hackers can try it. For posterity's
sake, here is, to my knowledge, the first fragment of Rust code to ever execute
on a GPU:
There are two main parts to this project. The first is
compiling Rust code into something suitable for running on the
GPU. We do this using the PTX backend that is part of LLVM. The second part is
loading and executing the kernel. For this, we use OpenCL and its
clCreateProgramWithBinary API. In this post, I'll focus on the issues we
encountered while generating PTX code.
Generating PTX: a manual process
The bulk of the work to generate PTX code was already done
by the NVPTX backend, which was recently contributed to LLVM by NVIDIA. We
started with a very manual process. First, we used rustc's --emit-llvm flag
to save the generated LLVM bitcode. From there, we attempted
to compile it to PTX using llc:
I wasn't surprised to see this fail with one of LLVM's
typically opaque error messages. You can see it here if you wish. Rust was
generating code that the NVPTX backend didn't know how to handle. This makes
sense; I expect NVIDIA to primarily test the backend on code generated by CUDA,
which looks different from the code Rust generates. The next step was to pare
down the generated LLVM to something a little more manageable.
After another minor fix or two, it became clear that we
would have to modify the way Rust generates code as well. For example, the PTX
code I linked above does not include a .entry line, which is required to
indicate where a kernel function begins. One option is to add a new PTX target
for Rust and set it up as a cross-compiler.
That isn't quite what we want, though. We want to run only some of the Rust
code on the GPU, just a few portions of the program. Other than the code
generator, we want the PTX code to agree with the architectural details of the
host system. So instead, I added a -Zptx flag to rustc and made minor changes to the
translation pass. Functions that have the #[kernel] attribute get compiled to
use the ptx_kernel calling convention, which tells NVPTX to add the .entry
line. According to Patrick, we should use a new ABI setting, as arbitrary
attributes aren't part of the function's type.
At any rate, we could now go from Rust to PTX without any
manual intervention. The next challenge was to execute the kernel. When we
first tried to load the PTX file, OpenCL complained about an "invalid
binary." We had previously been able to load a PTX file generated with OpenCL
and extracted using clGetProgramInfo, so we decided to compare the
Rust-generated code with the OpenCL-generated code. The parameters to the
kernel were not being annotated with an address space. We manually added
.global to the parameters in the Rust-generated code, and we could load and
execute the kernel. Furthermore, we could manually annotate the LLVM code with
addrspace(1) to get the same behavior.
For some types, Rust would generate the addrspace(1)
annotation, but for others, it wouldn't. Rust was already using address
spaces for something related to garbage collection. Unfortunately, Rust and
NVPTX disagree on what these address spaces mean. To work around this, I had
Rust generate different address spaces when the -Zptx flag is given. At the moment,
these changes only take effect for & pointers. Others, such as @ pointers,
will need more work.
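To make the numbering concrete, here is a small sketch of how the NVPTX backend's address space numbers line up with PTX qualifiers. The numbers follow LLVM's NVPTX documentation as I understand it. The AddrSpace type and its helper methods are names I made up for illustration and are not part of rustc or LLVM. The pointer spelling uses the typed-pointer LLVM IR syntax of that era.

```rust
/// NVPTX address spaces, numbered as in LLVM's NVPTX backend documentation.
/// The enum itself is a hypothetical helper for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AddrSpace {
    Generic = 0,
    Global = 1,
    Shared = 3,
    Constant = 4,
    Local = 5,
}

impl AddrSpace {
    /// PTX state-space qualifier for this address space, if it has one.
    fn ptx_qualifier(self) -> Option<&'static str> {
        match self {
            AddrSpace::Generic => None,
            AddrSpace::Global => Some(".global"),
            AddrSpace::Shared => Some(".shared"),
            AddrSpace::Constant => Some(".const"),
            AddrSpace::Local => Some(".local"),
        }
    }

    /// Typed-pointer LLVM IR spelling of a pointer into this address space.
    fn llvm_ptr_type(self, pointee: &str) -> String {
        match self as u32 {
            0 => format!("{}*", pointee),
            n => format!("{} addrspace({})*", pointee, n),
        }
    }
}

fn main() {
    let all = [
        AddrSpace::Generic,
        AddrSpace::Global,
        AddrSpace::Shared,
        AddrSpace::Constant,
        AddrSpace::Local,
    ];
    for space in all.iter() {
        println!(
            "{:?}: {} -> {:?}",
            space,
            space.llvm_ptr_type("float"),
            space.ptx_qualifier()
        );
    }
}
```

The case that mattered for us is Global: a kernel parameter whose pointer type carries address space 1 is what ends up with the .global annotation in the PTX.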
Remaining work on code generation
The final missing piece on the code generation side of
things is to have threads be able to do different things. This means providing
equivalents of the blockIdx, blockDim, and threadIdx variables. These show up
in LLVM as intrinsic functions, so all we need to do is expose those as new
Rust intrinsics. We expect to have this part working soon.
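As a rough illustration of what those variables are for, here is a CPU-only sketch that emulates a one-dimensional kernel launch in plain Rust. Nothing here touches the GPU or the real intrinsics. ThreadCtx, add_kernel, launch, and vector_add are all hypothetical names, and the loop simply runs the "threads" one after another.

```rust
/// Stand-in for the values a GPU thread would read through the intrinsics.
#[derive(Clone, Copy)]
struct ThreadCtx {
    block_idx: usize,
    block_dim: usize,
    thread_idx: usize,
}

impl ThreadCtx {
    /// The usual 1-D global index: blockIdx * blockDim + threadIdx.
    fn global_id(self) -> usize {
        self.block_idx * self.block_dim + self.thread_idx
    }
}

/// Kernel body: each thread adds one pair of elements.
fn add_kernel(ctx: ThreadCtx, a: &[f32], b: &[f32], out: &mut [f32]) {
    let i = ctx.global_id();
    // The last block may contain threads past the end of the data.
    if i < out.len() {
        out[i] = a[i] + b[i];
    }
}

/// Sequential emulation of launching grid_dim blocks of block_dim threads.
fn launch(grid_dim: usize, block_dim: usize, a: &[f32], b: &[f32], out: &mut [f32]) {
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            let ctx = ThreadCtx { block_idx, block_dim, thread_idx };
            add_kernel(ctx, a, b, out);
        }
    }
}

fn vector_add(a: &[f32], b: &[f32]) -> Vec<f32> {
    assert_eq!(a.len(), b.len());
    let block_dim = 4;
    // Round up so every element is covered by some thread.
    let grid_dim = (a.len() + block_dim - 1) / block_dim;
    let mut out = vec![0.0; a.len()];
    launch(grid_dim, block_dim, a, b, &mut out);
    out
}

fn main() {
    let sum = vector_add(&[1.0, 2.0, 3.0, 4.0, 5.0], &[10.0, 20.0, 30.0, 40.0, 50.0]);
    println!("{:?}", sum);
}
```

Every thread runs the same function, and the index is the only thing that lets each one pick a different element. That is why the kernel cannot do much until these values are exposed.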
Our work here shows it's possible to compile Rust
to run on the GPU. We support an extremely limited subset of Rust at
the moment. Most of the remaining challenges have to do with the way data is
arranged in memory and how Rust provides safety at runtime. Rust
uses a lot of pointer-based structures, and moving these between host and device
memory can be difficult.
Perhaps the best thing to do for now is to be careful about
what data types we use in GPU code. Even if we use relatively flat types,
however, we will still need to handle a few more things. For example, Rust
does array bounds checks at runtime. If we want to allow arbitrary
array indexing safely in GPU code, we'll need a way to do bounds checks and
report failures from kernel code. There are a lot of design issues left, but
the initial results for compiling Rust to run on the GPU seem very promising.
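One possible shape for reporting bounds failures, sketched on the CPU only: instead of aborting, a failed check records the offending thread id in an error word that the host inspects after the kernel finishes. This is a design idea and not something we have implemented. NO_ERROR, checked_get, gather_kernel, and run_gather are hypothetical names, and a real GPU version would need an atomic write to the error word.

```rust
/// Sentinel meaning "no thread reported a failure".
const NO_ERROR: u32 = u32::MAX;

/// Bounds-checked read. On failure, record the first failing thread id
/// and return a default value so the kernel can keep running.
fn checked_get(data: &[i32], index: usize, tid: u32, error_flag: &mut u32) -> i32 {
    match data.get(index) {
        Some(&value) => value,
        None => {
            // On a real GPU this update would have to be atomic.
            if *error_flag == NO_ERROR {
                *error_flag = tid;
            }
            0
        }
    }
}

/// Kernel body: out[tid] = data[indices[tid]], with arbitrary indices.
fn gather_kernel(
    tid: usize,
    data: &[i32],
    indices: &[usize],
    out: &mut [i32],
    error_flag: &mut u32,
) {
    out[tid] = checked_get(data, indices[tid], tid as u32, error_flag);
}

/// Host side: run one emulated thread per index, then check the error word.
fn run_gather(data: &[i32], indices: &[usize]) -> Result<Vec<i32>, u32> {
    let mut out = vec![0; indices.len()];
    let mut error_flag = NO_ERROR;
    for tid in 0..indices.len() {
        gather_kernel(tid, data, indices, &mut out, &mut error_flag);
    }
    if error_flag == NO_ERROR {
        Ok(out)
    } else {
        Err(error_flag)
    }
}

fn main() {
    println!("{:?}", run_gather(&[10, 20, 30], &[2, 0, 1]));
    println!("{:?}", run_gather(&[10, 20, 30], &[0, 7, 1, 9]));
}
```

The trade-off is that the failure is only noticed once the whole kernel has run, and the kernel's output has to be thrown away when the error word is set.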
