Machine Learning


I've used BERTopic and Top2Vec in Python, but I'm wondering if there's something similar in R that can use pre-trained models to generate topics. If not, do you think investing time into building something like this would be useful to the community?
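
For context, the Python workflow I'm hoping to reproduce in R looks roughly like this (just a sketch; the embedding model name is only an example):

    # Rough sketch of the BERTopic workflow with a pre-trained embedding model
    from bertopic import BERTopic
    from sklearn.datasets import fetch_20newsgroups

    docs = fetch_20newsgroups(subset="all", remove=("headers", "footers", "quotes")).data[:2000]

    # Any sentence-transformers checkpoint can be plugged in here
    topic_model = BERTopic(embedding_model="all-MiniLM-L6-v2")
    topics, probs = topic_model.fit_transform(docs)

    print(topic_model.get_topic_info().head())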


Basically what's in the title (I'm female). Thanks for any advice, because so many bad ones get posted on their Discord that it's flooded.


A bit of background: I graduated this year with a BS in Computer Science. I had a job as an MLOps/cloud engineer, but unfortunately I got laid off. My ambition right now is to work as an MLE or a data scientist (not necessarily a researcher, although that'd be dope). My question is, which would be more useful for that: an MS in CS or in Stats? I've heard ML/DS-specific programs can be "cash cows" and a bit shallow, so I'd rather do something more fundamental that gives me more options. I'd appreciate any input you guys have! Thanks in advance.


I have been working as a junior backend developer at a company in the UK for about 1.5 years now. In the beginning everything was great, but as time goes on I increasingly feel that I can't continue here. They assigned me to an ML project that I handle alone, developing it from scratch in an area (computer vision) and with technologies I had never used before. They provided a plan and a goal and told me to work on it. I thought, great, at least I'll learn new things. That was fine for a while, but after about six months I reached a point where, no matter how hard I try, I'm stuck and unable to overcome the problem. I asked for help, and the response was that they can't really assist me because there's no one in the company who understands what I'm working on, so I was told to try to solve it on my own. I continued, and meanwhile they assigned me another project, a full-stack project. In this project too, about 50% of the technologies are ones I've never used.

So, I'm working on two projects, almost entirely alone on one, and on the other, there are unrealistic expectations and very difficult tasks. I work 8-10 hours a day, sometimes I even forget to eat because I'm so nervous and stressed that I want to work non-stop to achieve goals.

The problem is that the senior developer said it's time for me to finish the project I'm working on alone because they want to sell it. And I'm like: what??? This whole situation seems quite absurd. After that, he said that if I don't finish it by the end of the year, I should throw the project in the trash, because if I couldn't do the whole thing in that time, it's better not to do it at all. I couldn't believe my ears. How can they sell something where almost 90% of the product was developed by a junior developer working alone?

What do you think? What should I do? Should I take it to heart? According to several senior friends, it's time for me to find another job because they think I'm being exploited. My salary is also terribly low, and there's no bonus. However, I already feel that if I stay here any longer, I will come to hate programming. Any advice?


As a software engineer transitioning into data science, I'm undertaking the challenging project of incorporating document detection for KYC (Know Your Customer) verification. The aim is to prevent users from submitting non-document images via their phone cameras, especially after disabling the gallery upload option.

I have a pool of 3 to 4 million KYC-verified user IDs. Is it feasible to use these IDs to enhance the document detection model? My dilemma lies in deciding whether to employ OCR (Optical Character Recognition) in part or in whole. Is it necessary to use OCR just to determine whether an image contains a government ID, or should I use a CNN to verify that it is a certain ID type? Or are there alternative approaches that would be simpler?
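
To make the CNN route concrete, the kind of baseline I'm picturing is transfer learning a small document vs. non-document classifier (a rough sketch with a hypothetical folder layout, not a production KYC pipeline):

    # Sketch: fine-tune a pre-trained backbone to classify document vs. non-document images.
    # Hypothetical folder layout: data/train/{document,other}/*.jpg
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    train_ds = datasets.ImageFolder("data/train", transform=transform)
    loader = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))  # e.g. document / other

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()

    model.train()
    for images, labels in loader:  # one epoch shown for brevity
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()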

Additionally, I lack practical experience in deploying machine learning models and haven't ventured beyond working with AI/ML concepts in notebooks.


Hey everyone,

I made a translation Discord bot for around 30,000 servers, but using paid services like Google Translate or DeepL is too expensive. Instead, I've switched to an open-source translation model (currently using m2m100 with 400 million parameters) on a CPU. However, the translations it provides aren't up to par, and I've found that newer models like MADLAD-400 deliver much better results. The catch is that MADLAD-400 is too large to run on a CPU.
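
For reference, the kind of M2M100 usage I mean is roughly the standard transformers pipeline below (a minimal sketch, assuming the 418M checkpoint is the "400 million parameter" model I mentioned; the French-to-English pair is just an example):

    # Minimal M2M100 translation sketch (CPU) using Hugging Face transformers
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
    tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

    tokenizer.src_lang = "fr"  # source language (example)
    encoded = tokenizer("La vie est belle.", return_tensors="pt")
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id("en"),  # target language (example)
    )
    print(tokenizer.batch_decode(generated, skip_special_tokens=True))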

I'm looking to deploy this improved model on a GPU, but my budget is a bit tight—around $100-150 per month. Does anyone know of any services or offers that provide GPU access within this price range? Any suggestions or recommendations would be greatly appreciated!


I'd like to fine-tune a text-to-image model to know certain faces of people I'm working with. I've been experimenting a bit, and I can get some images that are reminiscent of a person but really don't look like them. I also need to provide more in the prompt than I would expect.

For example, there is one person who is a big guy with a mustache and glasses. I fine-tuned using a few images of him with the caption being his actual name in the training dataset.

When I generate images with his name as the subject, none of the faces will have a mustache or glasses. If I prompt it "Mark Smith with mustache and glasses doing xyz" it does look slightly more reminiscent of him, but still not quite right.

What should my strategy be to improve this? Do I need more images of him? Should I hash his name (or similar) into a common caption to make sure other weights in the model are not interfering? Other ideas?
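
To be clearer about the caption idea: what I'm imagining is pairing each person with a unique rare token plus a class noun (DreamBooth-style) instead of their real name, something like this metadata-building sketch (paths, tokens, and the second name are hypothetical):

    # Sketch: build training captions that pair a unique rare token with a class noun,
    # rather than using the person's real name directly (paths and names are hypothetical).
    import json
    from pathlib import Path

    people = {
        "Mark Smith": "sks",  # unique identifier token for this person
        "Jane Doe": "xvq",
    }

    metadata = []
    for person, token in people.items():
        image_dir = Path("data") / person.replace(" ", "_").lower()
        for image_path in sorted(image_dir.glob("*.jpg")):
            metadata.append({
                "file_name": str(image_path),
                "caption": f"a photo of {token} person",  # rare token + class noun
            })

    with open("train_metadata.jsonl", "w") as f:
        for row in metadata:
            f.write(json.dumps(row) + "\n")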

I realize I could experiment, but it's very expensive to keep fine-tuning and I don't want to go the wrong direction too many times.


Ok, so I lost the draft, and I'm going to slim down the post because I'm not retyping all that...

  • BRANCHING WILL be at least moderately important in the model architecture
  • Multiple boards (servers in cluster)
  • NOT using Python (too slow, and I hate that language).
  • PROBABLY not using CUDA (that's up for debate if it fits the purpose); heterogeneous OpenCL is the expected method.
  • MANY models (10s to 100s of primary models; 100s to 1000s of sliding-window meta-models with complex interrelated processing) involved in context-heavy, multi-modal mass cross-training, with only very limited pre-training (the majority of the learning is scheduled in real time under operating conditions).
  • The learning process is intended to be very heavily spike-based (spiking neural network).

Is there anyone out there who has any experience with the best type of expansion cards for this sorta thing?

Unfortunately, I have spent a very long time looking for whatever might be the best modern equivalent of the old Xeon Phi cards, but nothing seems to exist.

Since I don't use Intel boards anyway, that's half a moot point, but I would really like to know if anyone has an actual recommendation.


I'm trying to use the Jais model's embeddings, and I'm curious to know if anyone has used them before.
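
What I have in mind is roughly mean-pooled hidden states from the causal LM via transformers, along these lines (just a sketch; the checkpoint name is an assumption on my part, and Jais checkpoints need trust_remote_code):

    # Sketch: mean-pooled last-layer hidden states as sentence embeddings.
    # The checkpoint name is an assumption; adjust it to the Jais variant you actually use.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    checkpoint = "core42/jais-13b"  # assumed model id
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token

    texts = ["مرحبا بالعالم", "hello world"]
    batch = tokenizer(texts, padding=True, return_tensors="pt")

    with torch.no_grad():
        hidden = model(**batch, output_hidden_states=True).hidden_states[-1]  # (batch, seq, dim)

    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)  # ignore padding positions
    embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    print(embeddings.shape)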


I'm working on a project to generate text from a 1.2B-parameter, full-precision LLM (about 5 GB of weights).

Unfortunately I’m limited in the infrastructure I can use to deploy this model. There is no batch inference supported. The infrastructure I have allows me to deploy a copy of the model on a single A100, 1 per process with up to 9 processes supported (these are called “replicas”). I understand that this makes little sense given my model is memory bound, and each process will fight for memory bandwidth to read in the same weights, but I can’t change that for now.

My average input and output tokens are roughly 1000 each. I estimate the kv cache per token is roughly 400kB using full precision.

I have benchmarks of the latency of the model using various “replicas” as described above. I wanted to compare this to the theoretical performance of the A100. For my use case time to first token is negligible (<200ms), and generation is memory bound.

I find that with 5 or more replicas, the math works out and my model is roughly as fast as I expect. For example, with 1000 output tokens and 6 replicas, it's as if I'm generating with a batch of 6 requests from a 30 GB model plus 5 GB of KV cache. At a memory bandwidth of around 1-1.3 TB/s, that translates to ~30 s per request, which is not far from what I see. The same goes for the other replica counts: 5, 7, 8 and 9.
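
For what it's worth, the back-of-the-envelope estimate I'm using looks like this (a rough sketch of my assumptions, not a measurement):

    # Back-of-the-envelope decode-time estimate for memory-bound generation
    weights_gb = 5.0          # 1.2B params at full precision ~= 5 GB
    replicas = 6              # processes sharing the same A100
    kv_cache_gb = 5.0         # rough total KV-cache traffic across replicas
    bandwidth_gbps = 1100.0   # assumed effective HBM bandwidth (~1.0-1.3 TB/s)
    output_tokens = 1000

    # Every decode step re-reads each replica's weights plus its KV cache.
    gb_per_step = replicas * weights_gb + kv_cache_gb
    seconds_per_step = gb_per_step / bandwidth_gbps
    total_seconds = seconds_per_step * output_tokens
    print(f"{seconds_per_step * 1000:.1f} ms/step, ~{total_seconds:.0f} s per request")
    # -> about 32 ms/step and ~32 s per request with these numbers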

However, when I run with a single replica, I expect generation to hover around the 5-6s mark on average. Instead, I see > 20s. I need to add 4 more replicas before the number starts to make sense. It almost seems like the model takes up too little memory to be allocated the entire memory bandwidth.

Does anyone know where this extra latency could be coming from? Do models have to use a certain amount of memory before they can saturate the A100's available memory bandwidth?


(Reposting since the previous post was removed, I think because non-Arxiv posts are only allowed on weekends and now it's a weekend)

Materials discovery is critical but tough. New materials enable big innovations like batteries or LEDs, but there are ~infinitely many combinations to try, and testing them experimentally is slow and expensive.

So scientists and engineers want to simulate and screen materials on computers first. This can check way more candidates before real-world experiments. However, models historically struggled at accurately predicting if materials are stable.

Researchers at DeepMind made a system called GNoME that uses graph neural networks and active learning to push past these limits.

GNoME models materials' crystal structures as graphs and predicts formation energies. It actively generates and filters candidates, evaluating the most promising with simulations. This expands its knowledge and improves predictions over multiple cycles.

The authors introduced new ways to generate derivative structures that respect symmetries, further diversifying discoveries.

The results:

  1. GNoME found 2.2 million new stable materials - equivalent to 800 years of normal discovery.
  2. Of those, 380k were the most stable and candidates for validation.
  3. 736 were validated in external labs. These include a totally new diamond-like optical material and another that may be a superconductor.

Overall this demonstrates how scaling up deep learning can massively speed up materials innovation. As data and models improve together, it'll accelerate solutions to big problems needing new engineered materials.

TLDR: DeepMind made an AI system that uses graph neural networks to discover possible new materials. It found 2.2 million candidate stable materials, 380k of which are the most stable, and over 700 have already been validated in external labs.

Full summary available here. Paper is here.


After reading this post and doing some research on my own, I haven't found a definitive conclusion as to which version of Whisper is the most efficient. Is there a solid consensus on this topic?


https://youtu.be/KwpeuqT69fw

Researchers were able to get giant amounts of training data out of ChatGPT by simply asking it to repeat a word many times over, which causes the model to diverge and start spitting out memorized text.

Why does this happen? And how much of their training data do such models really memorize verbatim?

OUTLINE:
0:00 - Intro
8:05 - Extractable vs Discoverable Memorization
14:00 - Models leak more data than previously thought
20:25 - Some data is extractable but not discoverable
25:30 - Extracting data from closed models
30:45 - Poem poem poem
37:50 - Quantitative membership testing
40:30 - Exploring the ChatGPT exploit further
47:00 - Conclusion

Paper: https://arxiv.org/abs/2311.17035


Can someone help me understand how to rotate a vector A by some angle from vector B, both of size 512?

I have a feature vector that is compared to a basis vector (whose weights are being updated). To do that, an angle is calculated between the two vectors and compared to the ground truth. I would like to reconstruct the feature vector by moving the basis vector by some angle, but I'm not sure whether rotation in a higher-dimensional space makes sense, or whether it's better to try to learn that reconstruction.
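
To make the rotation part concrete: rotating A toward B is well defined even in 512 dimensions if you do it inside the 2-D plane spanned by the two vectors and leave the orthogonal complement alone. A NumPy sketch of that idea (assuming A and B are not parallel):

    # Rotate vector a by angle theta toward vector b, within the plane spanned by a and b
    import numpy as np

    def rotate_towards(a, b, theta):
        u = a / np.linalg.norm(a)             # unit vector along a
        v = b - np.dot(b, u) * u              # component of b orthogonal to a
        v = v / np.linalg.norm(v)             # second basis vector of the plane (a, b not parallel)
        return np.linalg.norm(a) * (np.cos(theta) * u + np.sin(theta) * v)

    rng = np.random.default_rng(0)
    a, b = rng.normal(size=512), rng.normal(size=512)
    angle = np.arccos(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    a_rot = rotate_towards(a, b, angle)       # rotating by the full angle aligns a with b's direction
    cos_sim = np.dot(a_rot, b) / (np.linalg.norm(a_rot) * np.linalg.norm(b))
    print(cos_sim)                            # ~1.0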


I've noticed a trend recently of authors adding more formalism than needed in some instances (e.g. where a diagram/image would have done the job fine).

Is there such a thing as adding more mathematics than needed to make the paper look better, or is it perhaps just a constraint from the publisher (whatever format the paper must stick to in order to get published)?


Hey friends! Has anyone written any code before where they freeze particular parameters in a layer of a NN?

I don't mean freezing entire layers of a NN, which I can successfully do, e.g.:

for param in resnet18.fc.parameters():   # resnet18 = torchvision.models.resnet18()
    param.requires_grad = False          # freeze every parameter in the fc layer

What I mean is freezing particular parameters within a layer - by applying some sort of mask or otherwise.
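
The kind of thing I'm imagining is keeping requires_grad=True on the whole tensor but zeroing the gradients of the frozen entries with a hook, roughly like this sketch (the mask here arbitrarily freezes half of the weight matrix):

    # Sketch: freeze individual entries of a weight tensor by masking its gradient
    import torch
    import torch.nn as nn

    layer = nn.Linear(4, 3)

    mask = torch.ones_like(layer.weight)  # 1 = trainable entry, 0 = frozen entry
    mask[:, :2] = 0.0                     # arbitrarily freeze the first two input columns

    # The hook scales incoming gradients elementwise, so frozen entries get zero gradient.
    layer.weight.register_hook(lambda grad: grad * mask)

    optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
    before = layer.weight.detach().clone()

    loss = layer(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()

    print(torch.allclose(layer.weight[:, :2], before[:, :2]))  # True: frozen entries unchanged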

Thanks so much! :)


I recently completed the interview process for a machine learning engineer position at H&M Group and cleared all rounds. However, during the offer stage, the company ultimately rejected me due to my status as a college dropout and my current pursuit of a bachelor's degree from the University of London. This experience has been frustrating.

I have worked in the ML domain for 7 years and even worked with Razorpay.

Their JD even says: "Have a degree in computer science, engineering or related field, or equivalent practical experience."


I gotta love how people who go to MILA but are affiliated with UdeM almost never mention UdeM as their university on LinkedIn. Isn't this a bit of lying by omission? I've also found several people doing professional course-based degrees that involve no research who still simply mention MILA and nothing more.


Hey. I got some pretty impressive results for my pet-project that I've been working on for the past 1.5 years.

#MNIST inference performance using one flat layer without convolution on a #Ryzen #7950X3D CPU: 46 million predictions per second, throughput: 25 GB/s, accuracy: 98.05%. #AGI achieved. #ACI (Artificial Collective Intelligence), to be honest.

Modified Tsetlin Machine on MNIST performance
