Some_Endian_FP17


Thanks for this. I've only worked with RAG (retrieval-augmented generation) on OpenAI models, and it takes a lot of prompt tuning to get decent results. For RAG, a knowledge graph (KG) helps define the semantic elements involved and the relationships between document fragments and the user query.

That said, I'm still relying on the vector database to do most of the heavy lifting: filtering for relevant results before they're fed into an LLM. Having an LLM clean up or summarize the user query, and then build a KG from the results the vector database returns, could lead to more accurate answers.

 

I finally managed to build llama.cpp natively on Windows on ARM, on a Surface Pro X with the Qualcomm 8cx chip. Why bother instead of running it under WSL? A native build lets you run the largest models that fit in system RAM, without the overhead of the Hyper-V virtual machine that WSL runs in.

I didn't notice any speed difference, but the extra available RAM means I can now use 7B Q5_K_M GGUF models instead of Q3 quants. Typical output speed is 4 to 5 tokens/s.
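Why the extra RAM matters: a quantized model's weights take roughly parameter count × bits per weight / 8 bytes. The sketch below applies that rule of thumb. The bits-per-weight figures (about 5.5 for a Q5_K_M-like quant, about 3.9 for a Q3_K_M-like quant) are my own approximations, not exact values for those formats, and real GGUF files are somewhat larger because of metadata and tensors kept at higher precision. The KV cache also needs RAM on top of this at runtime. The function name `est_gguf_gb` is hypothetical.

```bash
#!/usr/bin/env bash
# Rough estimate of model weight size in GB (10^9 bytes):
#   size ≈ parameter_count * bits_per_weight / 8
# The bits-per-weight values passed in are assumed approximations.
est_gguf_gb() {
  local params=$1 bpw=$2
  awk -v n="$params" -v b="$bpw" 'BEGIN { printf "%.1f\n", n * b / 8 / 1e9 }'
}

# A 7B model at an assumed ~5.5 bpw (Q5_K_M-like) vs ~3.9 bpw (Q3_K_M-like).
echo "Q5_K_M-ish: $(est_gguf_gb 7000000000 5.5) GB"   # prints "Q5_K_M-ish: 4.8 GB"
echo "Q3_K_M-ish: $(est_gguf_gb 7000000000 3.9) GB"   # prints "Q3_K_M-ish: 3.4 GB"
```

At these assumed rates, the step up from Q3 to Q5_K_M costs about 1.4 GB more RAM for the weights alone.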

Steps:

  1. Install MSYS2. The installer includes both x64 and ARM64 binaries.

  2. Open the MSYS2 CLANGARM64 shell. In that shell, run these commands to install the required build packages:

  • pacman -Suy
  • pacman -S mingw-w64-clang-aarch64-clang
  • pacman -S cmake
  • pacman -S make
  • pacman -S git
  3. Clone the git repo and set up the build environment. Point the build at the ARM64 clang compilers by setting the CC and CXX environment variables below.
  • git clone
  • cd llama.cpp
  • mkdir build
  • cd build
  • export CC=/clangarm64/bin/cc
  • export CXX=/clangarm64/bin/c++
  4. Build llama.cpp.
  • cmake ..
  • cmake --build . --config Release
  5. Run main.
  • bin/main.exe
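The steps above can be collected into a small helper script. This is a sketch, not a tested recipe: it assumes the llama.cpp repo is already cloned and that you run it from the top-level checkout inside the MSYS2 CLANGARM64 shell, and it relies on MSYS2 setting the `MSYSTEM` environment variable to `CLANGARM64` in that shell. The function names `require_clangarm64` and `build_llama` are my own.

```bash
#!/usr/bin/env bash
# Sketch of steps 2-4. Assumes the repo is already cloned and this is run
# from its top-level directory. Function names are hypothetical.

# Fail early if we are not in the CLANGARM64 environment, since the
# /clangarm64 compiler paths below only exist there.
require_clangarm64() {
  if [ "${MSYSTEM:-}" != "CLANGARM64" ]; then
    echo "error: run this from the MSYS2 CLANGARM64 shell (MSYSTEM='${MSYSTEM:-unset}')" >&2
    return 1
  fi
}

# Configure and build in ./build with the ARM64 clang compilers.
# The subshell keeps the cd and the CC/CXX exports from leaking into the caller.
build_llama() {
  require_clangarm64 || return 1
  mkdir -p build
  (
    export CC=/clangarm64/bin/cc
    export CXX=/clangarm64/bin/c++
    cd build || exit 1
    cmake .. && cmake --build . --config Release
  )
}
```

After sourcing the file (for example a hypothetical `build-llama.sh`) and calling `build_llama`, main still needs a model to load, e.g. `build/bin/main.exe -m path/to/model.Q5_K_M.gguf -p "Hello" -n 64`, where the model path is a placeholder and `-m`, `-p` and `-n` are main's model, prompt and number-of-tokens options.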

If you're lucky, most of the project will build, but on my machine the quantizer executable failed to build. I also tried ARM's own GNU toolchain compiler, but I kept getting build errors.
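Since a quantizer failure doesn't have to stop you from getting a working main.exe, one option is to build the targets you need one at a time with CMake's `--target` option and report which ones failed. A sketch, assuming it's run from the build directory after `cmake ..` has configured the project. The target names `main` and `quantize` match llama.cpp around the time of this post, but newer versions renamed them (e.g. `llama-cli`, `llama-quantize`). The function name `build_targets` is hypothetical.

```bash
#!/usr/bin/env bash
# Build each named CMake target separately so one failing target
# (e.g. the quantizer) doesn't block the others. Run from the build dir.
build_targets() {
  local t
  local failed=()
  for t in "$@"; do
    if ! cmake --build . --config Release --target "$t"; then
      failed+=("$t")
    fi
  done
  if [ "${#failed[@]}" -gt 0 ]; then
    echo "failed targets: ${failed[*]}" >&2
    return 1
  fi
  echo "all targets built"
}
```

For example, `build_targets main quantize` reports `failed targets: quantize` if only the quantizer breaks, and `bin/main.exe` is still usable in that case.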

There should be a way to get NPU-accelerated model runs using the Qualcomm QNN SDK, Microsoft's ONNX Runtime and ONNX models, but I got stuck in dependency hell in Visual Studio 2022. I'm not a Windows developer, and trying to combine x86, x64 and ARM64 compilers and Python binaries is way beyond me.