qalis

joined 10 months ago
[–] qalis@alien.top 1 points 9 months ago

r/datasets. Also, I would be very surprised if such a dataset has even been gathered and made available publicly. This sounds like a typical long-term project funded by EU humanities research grants (I have personally worked on a similar one).

[–] qalis@alien.top 1 points 10 months ago (1 children)

You will probably get better answers at r/LearnMachineLearning. Start with some basic neural network courses. Then try e.g. HuggingFace's audio processing course; it's short and high-level, but it will be a nice intro. In general, you will focus on convolutional networks (CNNs) and on processing audio either as 1D signals or as 2D images (spectrograms).
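
As a rough illustration of the spectrogram route, here is a minimal sketch using torchaudio; the file path and parameter values are placeholders, not a recipe:

```python
import torch
import torchaudio

# Load a waveform (1D signal): shape (channels, num_samples)
waveform, sample_rate = torchaudio.load("example.wav")  # placeholder file

# Turn it into a 2D mel-spectrogram "image" that a CNN can process
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=1024,
    hop_length=512,
    n_mels=64,
)(waveform)

# Log scaling is common before feeding spectrograms to a CNN
log_mel = torch.log(mel + 1e-6)
print(log_mel.shape)  # (channels, n_mels, time_frames)
```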

[–] qalis@alien.top 1 points 10 months ago (3 children)

You shouldn't do that, for multiple reasons (I can elaborate if needed). Your model is basically a binary file, a set of weights, no matter what you train. Once you write it to disk, typically with built-in serialization (e.g. pickle for Scikit-learn, or the .pth format for PyTorch), there are lots of frameworks to deploy it.
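
For example, writing the trained weights to disk is only a line or two in either ecosystem (a minimal sketch with toy models just to show the calls):

```python
import pickle

import torch
from sklearn.linear_model import LogisticRegression

# Scikit-learn: a fitted estimator is just a Python object, pickle handles it
sk_model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
with open("model.pkl", "wb") as f:
    pickle.dump(sk_model, f)

# PyTorch: save only the weights (the state dict), not the whole object
net = torch.nn.Linear(4, 2)
torch.save(net.state_dict(), "model.pth")

# Loading later: recreate the architecture, then load the weights
net2 = torch.nn.Linear(4, 2)
net2.load_state_dict(torch.load("model.pth"))
```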

The easiest to use and the most generic one is BentoML, which will package your code into a Docker image and automatically expose REST and gRPC endpoints. It has a lot of integrations, and it is probably the most popular option. There are also more specialized solutions, e.g. TorchServe.
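
A rough sketch of what that looks like with BentoML's 1.x runner API; the exact calls depend on the BentoML version, and the model/service names here are made up, so treat this as an outline rather than a recipe:

```python
import bentoml
import numpy as np
from bentoml.io import NumpyNdarray
from sklearn.linear_model import LogisticRegression

# 1) Train and register the model in the local BentoML model store
model = LogisticRegression().fit([[0.0], [1.0]], [0, 1])
bentoml.sklearn.save_model("demo_clf", model)

# 2) Define a service around it; `bentoml serve service:svc` then exposes
#    a REST endpoint, and `bentoml containerize` builds the Docker image
runner = bentoml.sklearn.get("demo_clf:latest").to_runner()
svc = bentoml.Service("demo_service", runners=[runner])

@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(arr: np.ndarray) -> np.ndarray:
    return runner.predict.run(arr)
```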

However, if you care about inference speed, you should also compile or optimize your model for the target architecture and runtime before packaging it behind the API, e.g. with ONNX, Apache TVM, Treelite or NVIDIA TensorRT.
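
For the PyTorch-to-ONNX step specifically, the export itself is short (a minimal sketch; the model and input shape are placeholders):

```python
import torch

# Toy model and an example input that defines the traced graph's shape
model = torch.nn.Sequential(
    torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
)
model.eval()
dummy_input = torch.randn(1, 16)

# Export to ONNX; the .onnx file can then be run with ONNX Runtime,
# or further compiled with TVM / TensorRT for the target hardware
torch.onnx.export(
    model,
    dummy_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},
)
```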

[–] qalis@alien.top 1 points 10 months ago

Very interesting development, but I'm waiting for a more production-ready version. Having to set up a separate GitHub repo, with manual installation inside, is not exactly nice. However, if this becomes fully compatible with the HuggingFace Hub, it will be huge for simpler cases.

[–] qalis@alien.top 1 points 10 months ago

I have worked (academia & industry) with both GPs and NNs, and I have never even heard of neural processes, so... not really.

For spatio-temporal data, GNNs are the hot topic and the better-known approach.

[–] qalis@alien.top 1 points 10 months ago

Reported. Breaks rule 7; this is not nearly a "quality contribution".

[–] qalis@alien.top 1 points 10 months ago

(I assume you are talking about convolutional models in the context of computer vision)

I had similar constraints (embedded devices in a specific environment), and we didn't use deep learning at all. Instead, we used classical image descriptors from OpenCV, like color histograms, HOG, SIFT, etc., with an SVM as the classifier. This can work surprisingly well for many problems, and it is blazing fast.
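
A minimal sketch of that kind of pipeline with OpenCV and Scikit-learn; the random images and the exact descriptor choices are placeholders, not what we actually shipped:

```python
import cv2
import numpy as np
from sklearn.svm import SVC

def extract_features(img_bgr: np.ndarray) -> np.ndarray:
    """Concatenate a color histogram and a HOG descriptor for one image."""
    # Color histogram over the three BGR channels (8 bins each)
    hist = cv2.calcHist([img_bgr], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256]).flatten()
    hist /= hist.sum() + 1e-6

    # HOG on the grayscale image, resized to the default 64x128 HOG window
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.resize(gray, (64, 128))
    hog = cv2.HOGDescriptor().compute(gray).flatten()

    return np.concatenate([hist, hog])

# Placeholder data: in practice, load your labelled images here
images = [np.random.randint(0, 256, (120, 90, 3), dtype=np.uint8) for _ in range(20)]
labels = np.random.randint(0, 2, size=20)

X = np.stack([extract_features(img) for img in images])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:3]))
```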

Consider how you can make the problem easier. Maybe you can do binary classification instead of multiclass, or use only grayscale images. Anything that will make the task itself easier will be a good improvement.

If your problem absolutely requires neural networks, I would use all tools available:

  1. Skip connections, either residual (like ResNet) or to all layers (like DenseNet)
  2. Sharpness-Aware Minimization (SAM) or one of its variants
  3. Label smoothing
  4. Data augmentation with a few really problem-relevant transformations
  5. Extensive hyperparameter tuning with a Gaussian Process or the multivariate Tree-structured Parzen Estimator (see e.g. Optuna)
  6. You can concatenate classical features like color histograms or HOG to the flattened output of the CNN, before the MLP head (see the sketch after this list). This way you reduce what the CNN needs to learn, so you can get away with fewer parameters
  7. Go for more convolutional layers instead of a large MLP head. Convolutional layers eat up a lot less of the parameter budget than MLPs.
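
For point 6, a minimal sketch of what that concatenation can look like in PyTorch; the layer sizes, the 128-dim classical feature vector and the random batch are all placeholders:

```python
import torch
import torch.nn as nn

class HybridNet(nn.Module):
    """Small CNN whose flattened output is concatenated with precomputed
    classical features (e.g. a color histogram or HOG vector) before the head."""

    def __init__(self, n_classes: int, n_classical_features: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Small head: CNN features (32*4*4) + classical features -> classes
        self.head = nn.Linear(32 * 4 * 4 + n_classical_features, n_classes)

    def forward(self, image: torch.Tensor, classical: torch.Tensor) -> torch.Tensor:
        cnn_features = self.conv(image).flatten(start_dim=1)
        combined = torch.cat([cnn_features, classical], dim=1)
        return self.head(combined)

# Placeholder batch: 8 RGB images plus 128-dim classical feature vectors
model = HybridNet(n_classes=5, n_classical_features=128)
out = model(torch.randn(8, 3, 64, 64), torch.randn(8, 128))
print(out.shape)  # torch.Size([8, 5])
```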

You can also consider training a larger network and then applying compression techniques, such as knowledge distillation, quantization or pruning.
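
For instance, post-training dynamic quantization in PyTorch is essentially a one-liner (a minimal sketch with a toy model; it mainly affects Linear/LSTM layers, and the actual speedup depends on the CPU backend):

```python
import torch

# Stand-in for a trained float32 model
model = torch.nn.Sequential(
    torch.nn.Linear(256, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10)
)
model.eval()

# Convert Linear layers to int8 weights for faster CPU inference
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 256)
print(quantized(x).shape)
```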

[–] qalis@alien.top 1 points 10 months ago

A personalized recommender system? No, not doable. It requires a separate setup, model serving, and regular retraining, since typical collaborative filtering does not support adding new users and items. Besides, it makes no sense if you don't have prior data.

However, implementing a popularity-based system, without personalization, is possible. This is typically just simple Bayesian statistics, doable in 10-15 lines of Python (see the links below and the small sketch after them):

- https://arpitbhayani.me/blogs/bayesian-average/

- https://www.evanmiller.org/how-not-to-sort-by-average-rating.html

- https://www.evanmiller.org/ranking-items-with-star-ratings.html

- https://www.evanmiller.org/bayesian-average-ratings.html
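
A minimal sketch of the Bayesian-average idea from those links; the prior mean and prior strength are assumptions you would tune for your own data:

```python
def bayesian_average(ratings: list[float], prior_mean: float = 3.0,
                     prior_weight: float = 5.0) -> float:
    """Score an item by its ratings, pulled toward a prior mean.

    Items with few ratings stay close to the prior; items with many
    ratings converge to their true average.
    """
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

items = {
    "item_a": [5, 5],                     # two perfect ratings
    "item_b": [4, 4, 5, 4, 5, 4, 5, 4],   # many good ratings
}
ranked = sorted(items, key=lambda k: bayesian_average(items[k]), reverse=True)
print(ranked)  # item_b ranks above item_a despite a lower raw mean
```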

[–] qalis@alien.top 1 points 10 months ago

r/learnmachinelearning

[–] qalis@alien.top 1 points 10 months ago

If you have something "enterprise-level", then pay an actually good ML consulting company, or a high-end ML-focused software house, to do this for you. If you don't have the money for that, you most probably don't have enterprise data, and you just need to learn, not ask general questions on Reddit.

[–] qalis@alien.top 1 points 10 months ago

I encourage you to urgently read the subreddit rules. Reported.

[–] qalis@alien.top 1 points 10 months ago

That's exactly what ConvNeXt V2 does
