How do I get JAX to use the GPU on my M1 Mac - arm

I am trying to run jax on my M1 machine and would really like it to use my GPU. I haven't figured out how to do this.
I think I pretty much followed every step of this link. In particular, what I did was:
Uninstall anaconda and remove every instance of it on my computer
Reinstall anaconda with the arm version in particular
Create a conda environment I called serotonin-gpu. I install everything into this conda env from now on.
Install PyTorch nightly (to get M1 GPU support)
Install MoltenVK for Mac support
Clone HEAD of the jax repo
Build and install a wheel of jaxlib
Install jax
Install the latest snapshot of IREE
Modify jax/jax/_src/iree.py as in the above link to pass extra flags.
I then tried to run the following code in serotonin-gpu, but it appears not to have utilized the gpu.
I don't really even know how to debug from here.
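(I assume a basic first check is to print which backend and devices JAX reports, e.g.

import jax
print(jax.default_backend())  # e.g. 'cpu' or 'gpu'
print(jax.devices())

but I'm not sure what these are supposed to show when the IREE/Vulkan path is actually being used.)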
Some thoughts
Could it matter that I originally authored the notebook shown above in an environment that was using regular jax (i.e. without trying to make it use the M1 GPU)?
There are some additional commands in the link I was following that appeared to be more related to dalle-playground.
They advised using the following code:
# if you want to try the known-good CPU-only mode, then remove these env vars
JAX_PLATFORMS=iree JAX_IREE_BACKEND=vulkan python app.py 8080
I don't even know where I would type these commands in, though (see my guess after this list).
Are there better/more general resources for getting jax to use the GPU on M1s? I haven't found any yet though.
Is there any way to contact someone either at Jax or Apple and beg them to make these two great tools more compatible?
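(My best guess on that last snippet: it is meant to be typed in a Terminal shell, with the two environment variables set just for that one command. I suppose the equivalent inside a notebook would be to set them before importing jax, e.g.

import os
os.environ["JAX_PLATFORMS"] = "iree"        # same variables as in the shell command above
os.environ["JAX_IREE_BACKEND"] = "vulkan"
import jax

but I haven't confirmed that the IREE build actually picks them up this way.)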

Some thoughts:
JAX's build is heavily tied to the TensorFlow codebase: XLA lives in the TensorFlow repository, so building jaxlib from source means building parts of TensorFlow from source as well. It is not clear why PyTorch is relevant here, except for getting DALL-E to work.
The core of JAX is the XLA compiler, which does not seem to support the Apple M1 GPU as a hardware target; it is not clear whether an LLVM backend exists for the Apple M1 GPU.
You can switch your code to use the CPU only; example below:
import jax
import jax.numpy as jnp
# Global flag to set a specific platform, must be used at startup.
jax.config.update('jax_platform_name', 'cpu')
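To confirm the flag took effect, you can print which backend and devices JAX reports (standard JAX calls; the exact device names vary by version):
print(jax.default_backend())  # should print 'cpu'
print(jax.devices())          # should list only CPU devices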

Related

Using Tensorflow-Lite GPU delegate in Android's Native environment with C-API

Info
I'm using Tensorflow-Lite in Android's Native environment via the C-API (following these instructions) but runtime is significantly longer compared to the GPU delegate via the Java API (on ART).
The JNI AAR file (2.2) offers C headers and a shared library, but it seems that the shared library doesn't contain the GPU delegate, only a framework for configuring delegates (the TfLiteDelegate object and TfLiteDelegateCreate()).
It doesn't provide TfLiteGpuDelegateV2Create() or access to the tflite namespace, for example.
Trials
I've tried to include a libtensorflowlite_gpu_delegate.so in the project with CMake, but although it seems to build and link OK, the library isn't accessible from native code.
I tried following c_api.h's example of delegate usage, but I can't seem to configure a GPU delegate.
The Docker container doesn't include the toolchain: trying to build the shared library in the tensorflow/tensorflow:latest-devel-gpu Docker image with bazel build -c opt --config android_arm64 tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so fails with cc_toolchain_suite '@local_config_cc//:toolchain' does not contain a toolchain for cpu 'arm64-v8a'.
Question
How can I run an inference with the GPU delegate in Android's Native environment using the C-API?
I managed to do it as follows:
1. Clone and configure tensorflow
Clone the tensorflow repo from GitHub, cd into it, and run ./configure. There it is important to answer
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]
with y and correctly specify the Android NDK and SDK directories.
2. Build libtensorflow-lite_gpu_delegate with bazel
I succeeded building the GPU delegate shared library with
bazel build -c opt --cxxopt=--std=c++11 --config android_arm64 tensorflow/lite/delegates/gpu:libtensorflowlite_gpu_delegate.so
I built against Android NDK 18.1.5063045 with minimum API level 27. Note that I only tested for the android_arm64 architecture, I cannot give guarantees for other architectures.
(At the time I compiled TensorFlow, HEAD pointed at commit 0f8a27183657972c8ba2bce150e1364179ded6f9.)
3. Update CMakeLists.txt
The relevant lines are the following:
include_directories(
        /Users/<name>/tensorflow/tensorflow/lite/delegates/gpu # for Mac
)
add_library(tensorflow-lite_gpu_delegate SHARED IMPORTED)
set_target_properties(tensorflow-lite_gpu_delegate PROPERTIES IMPORTED_LOCATION
        /private/var/tmp/_bazel_<name>/fe60511640322ef6962b77bab4b291e3/execroot/org_tensorflow/bazel-out/arm64-v8a-opt/bin/tensorflow/lite/delegates/gpu/libtensorflowlite_gpu_delegate.so) # I obtained this path by pressing Cmd+Option+C on the libtensorflowlite_gpu_delegate.so file on Mac, might be different on your OS
target_link_libraries( # list the imported target in the target_link_libraries() call for your own native library target
        tensorflow-lite_gpu_delegate
)
4. Use GPU delegate in code
The relevant lines are the following:
// delegate.h comes from tensorflow/lite/delegates/gpu (on the include path set above); the TFLite C API headers from the AAR are assumed to be available too.
#include <delegate.h>
// Create the GPU delegate with default options.
auto *delegate = TfLiteGpuDelegateV2Create(/*default options=*/nullptr);
// Create the model and interpreter options.
TfLiteModel *model = TfLiteModelCreate(/* create as usual */);
TfLiteInterpreterOptions *options = TfLiteInterpreterOptionsCreate();
TfLiteInterpreterOptionsAddDelegate(options, delegate);
// Create the interpreter.
TfLiteInterpreter *interpreter = TfLiteInterpreterCreate(model, options);
// ... run inference as usual, then clean up (delegate last):
TfLiteInterpreterDelete(interpreter);
TfLiteInterpreterOptionsDelete(options);
TfLiteModelDelete(model);
TfLiteGpuDelegateV2Delete(delegate);
Note: For me, the GPU delegate did not yield a great improvement in inference speed. This might be due to my model using operations that are not supported by the GPU delegate (the set of supported ops seems to be quite small right now) and therefore have to be computed on CPU.

How to iterate effectively in Linux kernel development

I'm fairly new to Linux kernel development. It is certainly quite a bit different than the Windows kernel (I am a recovering Microsoft engineer). Can you provide advice on how to iterate effectively on updating modules that come with the Linux kernel?
Specifically, I am updating hid and bcm5974 to support the latest Macbook Pro (early 2015), and am using Ubuntu 15.04 (kernel 3.19). Would you recommend I test it out in a Virtual Machine? Are there ways to incrementally build instead of clean + build the whole tree? I'd love to be able to build just the affected modules but I can't find a good way to do that. The Makefiles are rather complicated.
Time to answer my own question. After doing a full build, incrementals are pretty straightforward given you're not editing headers that are consumed by other modules.
make modules SUBDIRS=drivers/input/mouse
Once I've installed the kernel from the full build, iterating on new module builds is a breeze: sudo rmmod bcm5974, scp the freshly built module from the build desktop to the MacBook Pro, then sudo insmod bcm5974.ko (sketch below).
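A minimal sketch of that loop, assuming the module ends up at drivers/input/mouse/bcm5974.ko in the build tree and the MacBook is reachable as user@macbook (both names are illustrative):
# rebuild only the affected directory (3.19-era syntax; newer kernels use M= instead of SUBDIRS=)
make modules SUBDIRS=drivers/input/mouse
# copy the fresh module to the test machine
scp drivers/input/mouse/bcm5974.ko user@macbook:/tmp/
# on the MacBook: swap the module out and back in
sudo rmmod bcm5974
sudo insmod /tmp/bcm5974.ko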

CUDA samples run but no nvcc found - Mint 15 64 bit

I have downloaded and run the CUDA 5.0 installer on my Mint 15 64-bit distro. After hours of agony adjusting / removing / installing packages, it was able to finish installation, at least that's what it said.
I can run the CUDA samples, so I thought "hey, it's working". However, I just made a new .cu file and wanted to compile it, but it said "nvcc command not found".
I have looked at a similar topic here and they are talking about the /opt/bin/ directory; however, on mine there is no such directory. Does that mean it actually did not install? It tells me to install the nvidia cuda toolkit with apt-get, but I am not sure if I should do that.
Also, I did say I ran the CUDA samples fine, but I have to run ldconfig /usr/local/cuda/lib64 before I can get them working. Is there a way to automate that?
Thanks
You need to add the bin directory of the nvcc compiler driver to your PATH (environment variable), and you need to add the appropriate lib directories to your LD_LIBRARY_PATH environment variable.
For an immediate test, this should be as simple as:
export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/lib
These instructions should be presented to you at the completion of a successful cuda toolkit install, but it seems your install method may have been roundabout.
To make this "automatic" you may want to investigate one of the methods to add these statements to a script run at login. For example, if you have a .bashrc file in your user's home directory, try editing that with the above commands. It should probably be sufficient to put the above commands at the very end of your ~/.bashrc file if you have one.
Note that Linux Mint is not one of the officially supported CUDA distros, so your mileage may vary.

grep or find on Android

These are not installed on Android 4.2.1 by default, so is it possible to cross-compile the source for e.g. GNU grep or find and have it run on Android? (Preferably without having to root the device or installing some app off Play, e.g. busybox.) Are there any missing dependencies that will prevent this? I am developing on Ubuntu 10.04.
Strange. I have them in /system/xbin/*. Maybe you'll have more luck with busybox: busybox find and busybox grep. Not sure if busybox is installed by default on Android 4.2, though, but it's a pretty common binary.
This is not a complete answer because I haven't tried building grep or find. However, in general it is quite possible to build GNU utilities for Android. To do this, the best option is:
Download the Android native development kit
Build an Android standalone toolchain by referring to docs/STANDALONE-TOOLCHAIN.html in the NDK
Simply build the relevant GNU utility using the normal ./configure && make mechanism.
You'll then need to copy the resulting binaries onto your Android device, which you can do using adb push. You may need to arrange to put them into /data/ somewhere because /mnt/sdcard is often marked non-executable.
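As a rough sketch of those steps (the standalone-toolchain script name and flags vary by NDK version, and grep is just an example; treat all paths as illustrative):
# 1. build a standalone toolchain from the NDK (see docs/STANDALONE-TOOLCHAIN.html for your version's exact options)
$NDK/build/tools/make-standalone-toolchain.sh --arch=arm --install-dir=$HOME/android-toolchain
export PATH=$HOME/android-toolchain/bin:$PATH
# 2. cross-compile the GNU utility with the usual mechanism
cd grep-2.x
./configure --host=arm-linux-androideabi
make
# 3. push the binary somewhere executable (the sdcard is often mounted non-executable)
adb push src/grep /data/local/tmp/grep
adb shell chmod 755 /data/local/tmp/grep
adb shell /data/local/tmp/grep --version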
Missing dependencies
The main problem you'll find during the actual builds is that Android does not use the standard GNU libc (glibc). Instead, it uses its own, called Bionic. This does miss certain important APIs - for example, wide character string support.
I've found for some GNU utilities this is OK and they can be compiled with minimal source code changes.
However, if you run into trouble, you're probably better off using other versions of these utilities which are typically designed for more flexibility in terms of the underlying libc. Specifically, the previous advice about using busybox is excellent. If you don't wish to install it from the Android market, you can find the source code here.

What is better: downloading libraries from repositories or installing from *.tar.gz?

gcc 4.4.4 c89 Fedora 13
I am wondering what is better. To give you a couple of examples: Apache Portable Runtime (APR) and log4c.
The apr version in my fedora repository is 1.3.9. The latest stable version on the apr website is 1.4.2.
Questions
Would it be better to download from the website and install, or install using yum?
When you install from yum sometimes it can put things in many directories. When installing from the tarball you can put the includes and libraries where you want.
For log4c, the versions are the same, as this is an old project.
I downloaded log4c using yum. I copied all the includes and libraries to my development project directory.
i.e.
project_name/tools/log4c/inc
project_name/tools/log4c/libs
However, I noticed that I had to look for some headers in the /usr/include directory.
Many thanks for any suggestions,
If the version in your distribution's package repository is recent enough, just use that.
Advantages are automatic updates via your distribution, easy and fast installs (including the automatic fetching and installing of dependencies!) and easy removals of packages.
If you install stuff from .tar.gz by yourself, you have to play distributor yourself: keep track of security issues and bugs.
Using distribution packages you keep an eye on security problems as well, but the distributor does a lot of the work for you (developing patches, repackaging, testing, and catching serious issues). Of course each distributor has a policy for how to deal with different classes of issues in its different package repositories. But with your own .tar.gz installs you get none of this.
It's an age-old question I think. And it's the same on all Linux distributions.
The package is created by someone - that person has an opinion as to where stuff should go. You may not agree - but by using a package you are spared chasing down all the dependencies needed to compile and install the software.
So for full control, roll your own, but be prepared for the extra work; otherwise, use the package.
My view:
Use packages until it's impossible to do so (conflicts, compile parameters needed, ...). I'd much rather spend time getting the software to work for me than spend time compiling.
I usually use the packages provided by my distribution, if they are of a new enough version. There are two reasons for that:
1) Someone will make sure that I get new packages if security vulnerabilities in the old ones are uncovered.
2) It saves me time.
When I set up a development project, I never create my own include/lib directories unless the project itself is the authoritative source for the relevant files I put there.
I use pkg-config to provide the location of the necessary libraries and include files to my compiler. pkg-config uses .pc files as a source of information about where things are supposed to be, and these are maintained by the same people who create the packages for your distribution. Some libraries do not provide such a file but instead ship an alternative '-config' script. I'll provide two examples:
I'm not running Fedora 13, but an example on Ubuntu 10.04 would be:
*) Install liblog4c-dev
*) The command "log4c-config --libs" returns "-L/usr/lib -llog4c" ...
*) The command "log4c-config --cflags" returns "-I/usr/include"
And for an example using pkg-config (I'll use SDL for the example):
*) Install libsdl1.2-dev
*) The command "pkg-config sdl --libs" returns "-lSDL"
*) The command "pkg-config sdl --cflags" returns "-D_GNU_SOURCE=1 -D_REENTRANT -I/usr/include/SDL"
... So even if another distribution decides to put things in different paths, there are scripts that are supposed to give you a reliable answer to where things are, so things can be built on most distributions. Autotools (automake, autoconf, and the like) and CMake are quite helpful to make sure that you don't have to deal with these problems.
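To make that concrete, this is roughly how the output of those scripts is consumed when compiling (demo.c is just a placeholder source file):
# link a program against SDL using pkg-config
gcc demo.c $(pkg-config --cflags --libs sdl) -o demo
# same idea with log4c's own -config script
gcc demo.c $(log4c-config --cflags) $(log4c-config --libs) -o demo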
If you want to build something that has to work with the Apache that's included with Fedora, then it's probably best to use the apr version in Fedora. That way you get automatic security updates etc. If you want to develop something new yourself, it might be useful to track upstream instead.
Also, normally the headers that your distro provides should be found by gcc & co. without you needing to copy them, so it doesn't matter where they are stored by yum/rpm.
