sshh12

joined 10 months ago
[–] sshh12@alien.top 1 points 9 months ago

The best way is still an open research question, but my understanding is that the current open source SOTA is ShareGPT4V, which uses a high-quality dataset built with GPT4V plus (I believe) a LLaVA-like architecture. This works by essentially encoding the other modality into the LLM's text-embedding space so the model can read it alongside ordinary tokens.

If you are interested, I have a library for more easily training these on custom modalities: https://github.com/sshh12/multi_token (it uses basically the same idea as the LLaVA 1.5 paper)
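
The core trick is just a learned projection from the modality encoder's output space into the LLM's token embedding space. Here's a minimal PyTorch sketch of that idea (the dimensions are illustrative, e.g. CLIP ViT-L patches into a 4096-dim LLM; this is not the exact multi_token implementation):

```python
import torch
import torch.nn as nn

class ModalityProjector(nn.Module):
    """Projects frozen-encoder outputs into the LLM's token embedding space."""

    def __init__(self, encoder_dim: int, llm_dim: int):
        super().__init__()
        # LLaVA 1.5 uses a small MLP rather than a single linear layer.
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, encoder_feats: torch.Tensor) -> torch.Tensor:
        # (batch, num_patches, encoder_dim) -> (batch, num_patches, llm_dim)
        return self.proj(encoder_feats)

# The projected "modality tokens" are concatenated with the prompt's token
# embeddings and fed to the LLM as if they were ordinary text tokens.
projector = ModalityProjector(encoder_dim=1024, llm_dim=4096)
modality_tokens = projector(torch.randn(1, 576, 1024))  # e.g. CLIP ViT-L patches
text_embeds = torch.randn(1, 32, 4096)                  # prompt token embeddings
inputs_embeds = torch.cat([modality_tokens, text_embeds], dim=1)
```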

[–] sshh12@alien.top 1 points 10 months ago

My rule of thumb has been to use LoRA (r between 4 and 16) until I'm unsatisfied with the results. It of course depends on the data/task, but imo most cases don't require a full fine-tune, and the performance-per-compute ROI of one is low.
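
For reference, a minimal sketch of what that looks like with Hugging Face's peft library (the model name and target modules are illustrative and vary by architecture):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    r=8,                                  # middle of the 4-16 range above
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the weights
```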

[–] sshh12@alien.top 1 points 10 months ago

I've helped hire, and am currently helping hire, for senior MLE roles. My personal preference is the more abstract version, but that said, I don't think it would play a strong role (either is fine). I know in some cases recruiters will screen based on keywords from the job description, so it's potentially good to hit those.

In interviews, I think I've seen more cases of over-technical resumes not living up to their claimed depth than of abstract resumes turning out to be not technical enough.

[–] sshh12@alien.top 1 points 10 months ago (1 children)

I've been working on some experimental context window extensions using multimodal models: https://github.com/sshh12/multi_token

Similar to the idea of putting text into an image for GPT4V, I'm directly encoding chunks of text into embeddings and injecting them into the model. This gives you a very lossy 128x extension of your context window, which is pretty massive.
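
A rough sketch of the idea (not the exact multi_token code; the pooling and dimensions are illustrative): each 128-token chunk gets compressed into a single "soft token" in the LLM's embedding space, which is where both the 128x ratio and the lossiness come from.

```python
import torch
import torch.nn as nn

CHUNK_TOKENS = 128  # each 128-token chunk becomes one soft token -> 128x compression

class ChunkCompressor(nn.Module):
    """Compresses a chunk of token embeddings into a single LLM-space embedding."""

    def __init__(self, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(llm_dim, llm_dim)

    def forward(self, chunk_embeds: torch.Tensor) -> torch.Tensor:
        # (batch, CHUNK_TOKENS, llm_dim) -> (batch, 1, llm_dim)
        pooled = chunk_embeds.mean(dim=1, keepdim=True)  # the lossy step
        return self.proj(pooled)

compressor = ChunkCompressor()
document = torch.randn(1, 10 * CHUNK_TOKENS, 4096)  # 1280 "tokens" of context
chunks = document.split(CHUNK_TOKENS, dim=1)
soft_tokens = torch.cat([compressor(c) for c in chunks], dim=1)  # (1, 10, 4096)
# The soft tokens are prepended to the prompt embeddings, so 1280 tokens of
# context now occupy only 10 positions of the LLM's window.
```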

[–] sshh12@alien.top 1 points 10 months ago

+1, when in doubt, LLM it out.

You could also ask for explanations, so that when it gets something wrong you can modify your prompts/examples to get better performance (there's a rough sketch of this after the list below).

Potentially you wouldn't want to do this if:

  • Your classification problem is very unusual/cannot be explained by a prompt
  • You want to be able to run this extremely fast or on a ton of data
  • You want to learn non-LLM deep learning/NLP (in which case I'd suggest basically some form of finetuning BERT)
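
A minimal sketch of the classify-with-explanation pattern (assuming the OpenAI Python SDK; the model name and labels here are placeholders):

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["billing", "bug_report", "feature_request"]  # placeholder labels

def classify(text: str) -> dict:
    """Ask the LLM for a label plus an explanation useful for prompt debugging."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        response_format={"type": "json_object"},
        messages=[
            {
                "role": "system",
                "content": (
                    f"Classify the user's text into one of {LABELS}. "
                    'Reply as JSON: {"label": ..., "explanation": ...}'
                ),
            },
            {"role": "user", "content": text},
        ],
    )
    return json.loads(response.choices[0].message.content)

print(classify("I was charged twice this month."))
```
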
[–] sshh12@alien.top 1 points 10 months ago (2 children)

Definitely one tricky part, as you mentioned, is the dataset. In an ideal world, you'd have a supervised dataset of (document, personality type) pairs and could train a model on those (just like u/Veggies-are-okay mentioned).

Assuming you don't have this data, a couple options:

  • Make the data. Some quick google searches show that many celebrities do have known Big-5s. You could manually curate Big-5s and text written by these celebrities to build these pairs.
  • Use synthetic data. Try asking an LLM (like ChatGPT) to write a text on a random topic as if it had $RANDOM Big-5, then just use these results as your training pairs.
  • Try clustering. Potentially, similar personality types have similar embeddings. Take a dataset of writings, embed them using something like BERT, label/best-effort-guess a few, and then predict personalities based on proximity to known Big-5 text in the embedding space (see the sketch after this list). You could extend this by training a model that asks "do text A and text B display the same Big-5?", which could be an easier problem to get samples for, and then run this model against a set of known Big-5s and your unknown example.
  • Use a proxy. There might be datasets/models out there that predict heuristics which could be combined to estimate Big-5, e.g. maybe a sentiment score correlates with agreeableness. You might also be able to create word/phrase banks such that using certain phrases is indicative of leaning a particular way on the Big-5 ("has_neurotic_phrases" is then a feature in your model).
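
A minimal sketch of the clustering/nearest-neighbor option (assuming the sentence-transformers library; the labeled anchor examples are placeholders):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# A few hand-labeled (text, big5_label) anchors -- placeholder data.
anchors = [
    ("I love meeting new people and trying new things!", "high_extraversion"),
    ("I prefer quiet evenings alone with a detailed plan.", "high_conscientiousness"),
]
anchor_embs = model.encode([t for t, _ in anchors], normalize_embeddings=True)

def predict(text: str) -> str:
    """Label an unknown text with the Big-5 label of its nearest anchor."""
    emb = model.encode([text], normalize_embeddings=True)[0]
    sims = anchor_embs @ emb  # cosine similarity (embeddings are normalized)
    return anchors[int(np.argmax(sims))][1]

print(predict("Had a great time at the party, met so many people!"))
```
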
[–] sshh12@alien.top 1 points 10 months ago

Recently had to make a similar decision, but as a new grad deciding whether to go directly into ML industry vs. the masters/PhD route. After speaking with a bunch of Machine Learning Engineers and Data Scientists in industry, my main conclusions were:

  1. Most people, whether or not they went the PhD route, were happy with what they did, where they ended up, and how much they are being paid (the exception being people who started a PhD and didn't complete it -- that group regretted the time spent on it)
  2. Most MLE/DS roles, especially now in the AI craze, are not research-heavy; they're applied work for which a PhD usually isn't needed

I ended up going directly into an MLE/DS role and have no regrets one year in; I honestly don't think I would've gotten a better/higher-paid position with a PhD.

[–] sshh12@alien.top 1 points 10 months ago

Huge fan of Modal; I've been using them for a couple of serverless LLM and diffusion models. It can definitely be on the costly side, but I like that the cost scales directly with requests and that setup is trivial.

Recent project with Modal: https://github.com/sshh12/llm-chat-web-ui/tree/main/modal
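
For a sense of how trivial the setup is, a minimal sketch of a serverless GPU function (assuming a recent version of the Modal Python SDK; the model, GPU type, and app name are placeholders):

```python
import modal

app = modal.App("example-llm")
image = modal.Image.debian_slim().pip_install("transformers", "torch")

@app.function(gpu="A10G", image=image)
def generate(prompt: str) -> str:
    # In a real deployment you'd cache the pipeline instead of rebuilding it
    # per call; kept inline here for brevity.
    from transformers import pipeline
    pipe = pipeline("text-generation", model="gpt2")
    return pipe(prompt, max_new_tokens=50)[0]["generated_text"]

@app.local_entrypoint()
def main():
    # Runs locally, executes generate() on a serverless GPU container.
    print(generate.remote("Hello,"))
```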

[–] sshh12@alien.top 1 points 10 months ago

Hey! I recently wrote a blog post on how these types of vision LLMs work: https://blog.sshh.io/p/large-multimodal-models-lmms

It focuses specifically on LLaVA, but the high-level idea is generally the same across models.

[–] sshh12@alien.top 1 points 10 months ago

Can't speak to industry standards, but a little while ago I worked on aerial object recognition and we used YOLO + an NVIDIA Jetson. The Jetson seemed like the best GPU-accelerated hardware that was light enough to mount on a smallish drone.
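
A minimal sketch of what the detection side looks like (assuming the ultralytics package, which is newer than what we used back then; the weights file and image path are placeholders):

```python
from ultralytics import YOLO

# Load a small pretrained model -- "n" (nano) variants fit edge hardware
# like a Jetson better than the larger ones.
model = YOLO("yolov8n.pt")

# Run inference on an aerial frame (placeholder path).
results = model("frame_0001.jpg")

for result in results:
    for box in result.boxes:
        cls_name = model.names[int(box.cls)]
        conf = float(box.conf)
        print(f"{cls_name}: {conf:.2f} at {box.xyxy.tolist()}")
```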