saintshing

joined 1 year ago
[–] saintshing@alien.top 1 points 11 months ago

Does this work in real time or your model has access to the entire sequence so you can use context from before and after the current time point?

You have to be careful with leaking when you preprocess the training data if you remove the laughter and leave an silent time interval.

The text based approach may work but it may not give you a precise timing.

[–] saintshing@alien.top 1 points 11 months ago

Or you try to teach someone else

By implementing a model without looking at the paper, you essentially perform autoencoding/masked language modelling and learn a more compact latent representation.

[–] saintshing@alien.top 1 points 1 year ago

I am not sure if I understand your question.

What exactly is your query? Is it "spell the poem in blocks"? Or you really want a "poem" query to always return also the part about spelling in minecraft blocks even though you haven't mentioned anything about minecraft blocks?

These two things are not associated together in human languages. I meant you can create training data to force them to be embedded together if you want. You can also add a layer on top of the vector db, so some metadata is stored together with the embedding which can help you retrieve related documents.

[–] saintshing@alien.top 1 points 1 year ago

How we killed SQL and built a machine learning model in its place

Please consider using a different subtitle

[–] saintshing@alien.top 1 points 1 year ago (1 children)

First of all, do you already have the data? Scraping and maintaining an updated product list is an entirely separate task and it is not easy.

If you have already scrapped the data, what data do you have? Just title, or you have the description, brand, model number, images? There was a kaggle competition on matching two products on Shopee for price match guarantee(based on titles and images). You can look at some of the winning solutions.

https://www.kaggle.com/competitions/shopee-product-matching

I looked at your post history. You are tring to sell your price monitoring tool but it seems like you dont have a working prototype based on this question?

[–] saintshing@alien.top 1 points 1 year ago

additional information about the products than run at a specific time (like product family, color, brand, etc.).

Seems easier to just search for images based on the additional information and then train on those images. You would have to first predict the brand(which again requires you to extract visual features from product images of that brand) and maybe do ocr(slow) and use that to filter region proposal.

[–] saintshing@alien.top 1 points 1 year ago

mlc llm(deploying llms on mobiles and in browsers)

[–] saintshing@alien.top 1 points 1 year ago (1 children)

Instead of using gaussian noise(in the latent space), I wonder if we can introduce noise by randomly inserting/deleting/replacing/swaping words. Cant we train a BERT model to predict the original text from a noise-added text?

view more: next ›