Some_Endian_FP17

Using Mistral Openorca to create a knowledge graph from a text document in c/localllama@poweruser.forum

Some_Endian_FP17@alien.top, 2 years ago

Thanks for this. I've only worked with RAG on OpenAI models, and it takes a lot of prompt tuning to get decent results. A KG helps define the semantic elements and relationships between document fragments and the user query for RAG.

That said, I'm still relying on the vector database to do most of the heavy lifting of filtering relevant results before feeding them into an LLM. Having an LLM clean up or summarize the user query and create a KG from the vector database's response could lead to more accurate answers.

Building llama.cpp on Windows on ARM (alien.top)

submitted 2 years ago by Some_Endian_FP17@alien.top to c/localllama@poweruser.forum

I finally managed to build llama.cpp on Windows on ARM running on a Surface Pro X with the Qualcomm 8cx chip. Why bother with this instead of running it under WSL? It lets you run the largest models that can fit into system RAM without WSL Hyper-V overhead.

I didn't notice any speed difference but the extra available RAM means I can use 7B Q5_K_M GGUF models now instead of Q3. Typical output speeds are 4 t/s to 5 t/s.

Steps:

Install MSYS2. The installer package has x64 and ARM64 binaries included.
Run clangarm64 from the MSYS2 install to open the ARM64 clang shell. Once you're in the shell, run these commands to update MSYS2 and install the required build packages:

pacman -Suy
pacman -S mingw-w64-clang-aarch64-clang
pacman -S cmake
pacman -S make
pacman -S git
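
Before moving on, it can help to confirm the tools actually landed on PATH in the clangarm64 shell. This is only a rough sketch: check_tools is a helper name I made up, and I'm assuming the packages above expose binaries called clang, cmake, make and git.

```shell
# check_tools is a hypothetical helper, not part of MSYS2 or llama.cpp.
# It reports which of the given commands are visible on PATH and
# returns 1 if any of them is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed binary names for the packages installed above.
check_tools clang cmake make git || echo "install the missing packages with pacman -S"
```

If something shows up as missing, re-running the matching pacman -S line from inside the clangarm64 shell is the first thing to try.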

Clone the git repo and set up the build environment. You need to make ARM64 clang appear as gcc by setting the CC and CXX environment variables below.

git clone
cd llama.cpp
mkdir build
cd build
export CC=/clangarm64/bin/cc
export CXX=/clangarm64/bin/c++
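
A quick sanity check of the two exports before running cmake might look like the sketch below. check_compiler_var is a hypothetical helper of mine. It only confirms that each variable is set and points at something the shell can execute, not that the compiler really targets ARM64.

```shell
# check_compiler_var is a hypothetical helper. It takes the NAME of a
# variable (CC or CXX) and checks that its value resolves to an executable.
check_compiler_var() {
    name=$1
    # Indirectly read the variable called $name (empty if unset).
    eval "compiler=\${$name:-}"
    if [ -z "$compiler" ]; then
        echo "$name is not set"
        return 1
    fi
    if ! command -v "$compiler" >/dev/null 2>&1; then
        echo "$name=$compiler is not an executable"
        return 1
    fi
    echo "$name=$compiler"
}

check_compiler_var CC || true
check_compiler_var CXX || true
```

If either line reports a problem, cmake will most likely fall back to whatever other compiler it finds first, which is probably not what you want here.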

Build llama.cpp.

cmake ..
cmake --build . --config Release

Run main.

bin/main.exe
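
Running main with no arguments won't get far without a model, so here is a hedged sketch of a first test run. The model path is a placeholder I invented, so point MODEL at whatever GGUF file you have. The -m, -p and -n options are the model, prompt and token-count options of main as I know them from the builds I've used; check bin/main.exe --help if your build differs.

```shell
# MODEL is a hypothetical path; replace it with a GGUF file you downloaded.
MODEL="${MODEL:-models/model.gguf}"
# MAIN is the binary produced by the build step above.
MAIN="${MAIN:-bin/main.exe}"

# run_prompt is a hypothetical wrapper: it refuses to start if the model
# file is missing, otherwise it generates up to 64 tokens for the prompt.
run_prompt() {
    if [ ! -f "$MODEL" ]; then
        echo "model not found: $MODEL"
        return 1
    fi
    # -m selects the model file, -p sets the prompt, -n caps generated tokens.
    "$MAIN" -m "$MODEL" -p "$1" -n 64
}

run_prompt "Hello from Windows on ARM" || true
```

Run it from the build directory so the relative bin/main.exe path resolves.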

If you're lucky, most of the package should build fine, but on my machine the quantizer .exe failed to build. I tried using ARM's own GNU toolchain compiler but I kept getting build errors.

There should be a way to get NPU-accelerated model runs using the Qualcomm QNN SDK, Microsoft's ONNX Runtime and ONNX models, but I got stuck in dependency hell in Visual Studio 2022. I'm not a Windows developer, and trying to combine x86, x64 and ARM64 compilers and Python binaries is way beyond me.