overview for Hoblywobblesworth

Converting ctransformers script into single .exe file with pyinstaller in c/localllama@poweruser.forum

[–] Hoblywobblesworth@alien.top 1 points 1 year ago

Yep.

Looks like I might be missing a pyinstaller hook. Someone appears to have made a pull request that addresses the same thing but for llama cpp:

https://github.com/abetlen/llama-cpp-python/pull/709

Will try the same thing and post here if it worked.
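
One possible way to try this without writing a hook file is PyInstaller's --collect-all option, which bundles a package's submodules, data files, and binaries. The sketch below only assembles that command line. The script name my_script.py is a placeholder, and whether this is enough for ctransformers is an assumption, not something confirmed in this thread.

```python
import shlex


def build_pyinstaller_command(script, package="ctransformers", onefile=True):
    """Return a PyInstaller argv list that asks for everything in `package`.

    `script` is a placeholder for your own entry-point file.
    `--collect-all` is a PyInstaller CLI option; that it fixes the
    ctransformers error is an untested assumption.
    """
    cmd = ["pyinstaller"]
    if onefile:
        # Produce a single self-extracting executable.
        cmd.append("--onefile")
    # Collect submodules, data files and shared libraries of the package.
    cmd += ["--collect-all", package]
    cmd.append(script)
    return cmd


if __name__ == "__main__":
    # Print the command so it can be copied into a terminal.
    print(shlex.join(build_pyinstaller_command("my_script.py")))
```

The list form can also be passed straight to subprocess.run if you want to drive the build from a script.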


Converting ctransformers script into single .exe file with pyinstaller (alien.top)

submitted 1 year ago by Hoblywobblesworth@alien.top to c/localllama@poweruser.forum


I have a Python script that takes some input text, processes it with a local 7B model, and spits out the model's completion. When I call the script it runs beautifully (albeit slowly) on CPU only, using the ctransformers library.

I'm now trying to convert my script into a single-click .exe file that any user can run without needing to manually install Python/dependencies or have any familiarity with the command line.

My first attempts at this were with pyinstaller, but when I run the .exe file output by pyinstaller I get this error:
OSError: Precompiled binaries are not available for the current platform. Please reinstall from source using:
pip uninstall ctransformers --yes
pip install ctransformers --no-binary ctransformers

I have tried reinstalling ctransformers with --no-binary but I still get the same error.
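
A quick way to narrow this down is to check whether the shared libraries that ship with the package actually made it into the bundle. The sketch below is a minimal diagnostic under one assumption: that the error appears when ctransformers cannot find its compiled library file on disk. It lists library files under a directory so the installed package folder can be compared with the unpacked bundle. The function names are made up for illustration.

```python
import sys
from pathlib import Path

# File extensions used for shared libraries on Windows, Linux and macOS.
LIB_SUFFIXES = {".dll", ".so", ".dylib"}


def find_shared_libs(root):
    """Return sorted relative paths of shared libraries found under `root`."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in LIB_SUFFIXES
    )


def bundle_root():
    """Return the PyInstaller unpack directory, or None when not frozen.

    PyInstaller sets sys._MEIPASS inside a running bundle.
    """
    meipass = getattr(sys, "_MEIPASS", None)
    return Path(meipass) if meipass else None


if __name__ == "__main__":
    # Inside the .exe this lists what was bundled; in a normal interpreter
    # pass the installed package directory as the first argument instead.
    root = bundle_root() or (Path(sys.argv[1]) if len(sys.argv) > 1 else Path("."))
    for lib in find_shared_libs(root):
        print(lib)
```

If the list from the installed package contains library files that the list from the bundle does not, the build is dropping them, which would point towards a missing hook or missing --add-binary/--collect-all options.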

Various internet searches have not been helpful, and I have found very little about how one might go about converting a Python script that uses one of the main CPU-only libraries (llama cpp, ctransformers etc.) into a more user-friendly one-click .exe file.

Any help or pointers would be much appreciated!

on-demand inference or batch inference? in c/localllama@poweruser.forum

[–] Hoblywobblesworth@alien.top 1 points 1 year ago (1 children)

I think in the longer term the demand for the "do 10,000 generations at once" use case will rise. Chatbots and chat-based interfaces that have fairly spread out/consistent traffic flow are the first widely propagating use case for LLMs, but they are a bit gimmicky. There are and will be plenty of very specific, niche domain use cases where you will want hundreds or thousands of generations at once and then not see traffic again for days/weeks until the next sudden spike.

If your current demand is from chatbots then build that, but once other industries and domains start to figure out how best to use LLMs, I reckon there will be growth in demand for cloud compute that can handle infrequent but very spiky inference requests.