Machine Learning
- Acquire the necessary data: scraping, processing into a useful format, and storing it somewhere accessible (e.g. cloud storage or a local hard drive). Tools: Python, requests, pandas, Google Cloud Storage.
- Prepare the training dataset: feature engineering, usually processed into CSV. Tools: Python and pandas, sometimes random external libraries, sometimes in other languages.
- Design the initial model architecture and get training to actually work, which usually involves a lot of debugging. All in a Jupyter notebook using PyTorch, local if possible, else on GCP.
- Evaluate the model. Loss and accuracy are good for confirming the model is learning and as a sanity check, but mostly this means the eye test in a real-world scenario, so I build a PoC product integration at this stage (usually a Python web API plus some frontend for interacting with it, either a Python desktop app or a JS web app, just running locally).
- Iterate on the previous two steps (model design/training and evaluation) until the model is good enough for prod.
- Implement the prod deployment, usually a Python web API. I often get a frontend specialist involved at this point to handle any potential frontend integrations.
- Maintain the prod integration as bugs come up and new features are requested. Also continuously evaluate model performance (how depends on the domain).
- Come back a year later and work through the whole process again, adding incremental improvements to each step.
That's the usual life cycle of a project for me personally. It could take anywhere from 1 week to 3 months to get to a working prod implementation, depending on the project. Any longer than that and I need to seriously reevaluate the approach.
Basically just regular software engineer duties plus ML stuff.
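For anyone who wants the "train, evaluate, iterate until it's good enough" part of that life cycle in concrete form, here is a rough sketch. Everything in it is an assumption made for illustration: the synthetic data stands in for the prepared CSV, a hand-rolled linear fit stands in for a PyTorch training loop, and the names `make_data`, `train`, `evaluate`, `iterate_until_good`, and `target_mse` are hypothetical. It is not the commenter's actual setup, and a validation metric is only a proxy for the "eye test" described above.

```python
import random


def make_data(n, seed):
    """Synthetic stand-in for the prepared dataset: y = 3x + 2 plus small noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 3 * x + 2 + rng.gauss(0, 0.05)) for x in xs]


def train(data, lr, epochs):
    """Fit y = w*x + b with plain gradient descent (stand-in for a real training loop)."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def evaluate(model, data):
    """Mean squared error on held-out data."""
    w, b = model
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def iterate_until_good(train_set, val_set, configs, target_mse):
    """Try each config in order; stop at the first one that meets the target."""
    history = []
    for cfg in configs:
        model = train(train_set, **cfg)
        mse = evaluate(model, val_set)
        history.append((cfg, mse))
        if mse <= target_mse:
            return model, history
    return None, history  # nothing was good enough: time to rethink the approach


train_set = make_data(200, seed=0)
val_set = make_data(50, seed=1)
configs = [
    {"lr": 0.001, "epochs": 10},   # undertrained first attempt
    {"lr": 0.1, "epochs": 500},    # second attempt after "debugging"
]
model, history = iterate_until_good(train_set, val_set, configs, target_mse=0.01)
for cfg, mse in history:
    print(cfg, f"val MSE = {mse:.4f}")
```

The explicit `None` return mirrors the "reevaluate the approach" rule of thumb: if no configuration clears the bar, the loop says so instead of silently shipping the last attempt.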
"All in jupyter notebook using pytorch"
Jupyter seems like an awful environment to debug anything, no?
An ML engineer is responsible for designing and developing machine learning systems, implementing appropriate ML algorithms, and conducting experiments.
They play a crucial role in the development and implementation of cutting-edge artificial intelligence products.
The role calls for skills in statistics and programming, as well as a deep understanding of data science and software engineering principles.
Responsibilities:
- Study and transform data science prototypes
- Design machine learning systems
- Research and implement appropriate ML algorithms and tools
- Develop machine learning applications according to requirements
- Select appropriate datasets and data representation methods
- Run machine learning tests and experiments
- Perform statistical analysis and fine-tuning using test results
- Train and retrain systems when necessary
- Extend existing ML libraries and frameworks
- Keep abreast of developments in the field
Read a paper, convince the customer to give more data than they want to give (biggest step?), spend 90% of my time integrating the model, present to someone who makes WAY more money than me, repeat.
I make 150k at a defense contractor with a master's + 2.5 and fantastic work-life balance, and I'm happy. Look for better with a PhD; with a master's it's hard.
I’d say most of my work falls on the engineering side of things: building pipelines to move data from A to B, and building applications and APIs to serve a model and deploy it to the cloud using Terraform. The actual machine learning accounts for less than 10% of my time.
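To give a feel for what "an API to serve a model" can look like at its smallest, here is a sketch using only the Python standard library. The `predict` function is a hypothetical stand-in for a real trained model, and the `/predict` route, the JSON shape, and the `call` helper are all assumptions made for illustration, not anything described in the comment above. A real service would more likely use a web framework and load an actual model artifact.

```python
import io
import json


def predict(features):
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, -0.25]
    return sum(w * x for w, x in zip(weights, features))


def app(environ, start_response):
    """Minimal WSGI app exposing POST /predict with a JSON body."""
    headers = [("Content-Type", "application/json")]
    if environ["REQUEST_METHOD"] != "POST" or environ["PATH_INFO"] != "/predict":
        start_response("404 Not Found", headers)
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length))
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, missing key, or non-numeric features.
        start_response("400 Bad Request", headers)
        return [b'{"error": "bad request"}']
    body = json.dumps({"prediction": predict(features)}).encode()
    start_response("200 OK", headers)
    return [body]


def call(wsgi_app, method, path, body=b""):
    """Invoke the app in-process, without a network socket (handy for tests)."""
    captured = {}

    def start_response(status, response_headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    chunks = wsgi_app(environ, start_response)
    return captured["status"], json.loads(b"".join(chunks))


status, data = call(app, "POST", "/predict", b'{"features": [4, 2]}')
print(status, data)

# To serve it locally, something like this would work:
#   from wsgiref.simple_server import make_server
#   make_server("", 8000, app).serve_forever()
```

Keeping the model call behind one small function is the design point: the HTTP plumbing around it is the other 90% of the work, and it can be tested without the model at all.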
so mostly data engineering?
We are a small company so I do a huge range of stuff.
- New product dev from data collection through to model in production.
- Improving our internal annotation tools
- Automating workflows and pipelines.
- Database wrangling, APIs and UI
- Managing the other ML engineers
- Business development
- Stop CEO promising clients that AI can solve everything.
Tools:
- IDE: JetBrains suite / VS 2022 / VS Code
- Languages: C# and Python
- Frameworks: .NET 7, PyTorch, ONNX
- Backend: ASP.NET Core for the heavy stuff or production, else Flask
- Front end: Blazor WASM and occasionally Blazor Server
- Deployment: local server or Azure cloud
- Collaboration: Microsoft Teams and Microsoft Loop
Bro, teach me how you do that last one. My CEO (and alas, CTO is even worse) will "find a way" to solve every damn thing (including world hunger) with our products and then expect us to build it in a week.
🤣
Seriously though, two things have cut through:
- “it’s better to under-promise and over-deliver”
- “maybe it will work, but we absolutely have to do a validation study first”
- Read papers/articles/blogs for new and updated models. (I work mostly on Language systems)
- Build POCs using new tools/models that can solve a problem or generate a new one :)
- Get data for this. Handle databases/data stores for storing data, on the cloud.
- Build and deploy pipelines connecting ML services with the databases. And potentially build something which is presentable.
- Monitor for abnormalities and, if there are any, republish the pipeline once they are resolved.
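The monitoring step in the list above can be as simple as comparing a recent window of some metric against a baseline. The sketch below is one possible heuristic, not what the commenter actually runs: the function name `is_abnormal`, the z-score rule, the threshold of 3.0, and the sample accuracy numbers are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def is_abnormal(baseline, recent, z_threshold=3.0):
    """Flag the recent window if its mean drifts far from the baseline.

    Uses a simple z-score of the recent mean against the baseline's
    mean and standard deviation. A crude heuristic, not a full drift test.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any change at all counts as abnormal.
        return mean(recent) != mu
    z = abs(mean(recent) - mu) / sigma
    return z > z_threshold


# Hypothetical daily accuracy values from a deployed pipeline.
baseline = [0.91, 0.90, 0.92, 0.89, 0.91, 0.90]
healthy = [0.90, 0.91, 0.92]
degraded = [0.80, 0.78, 0.82]

print(is_abnormal(baseline, healthy))
print(is_abnormal(baseline, degraded))
```

A check like this would typically run on a schedule and page someone, or block a republish, when it fires. Which metric to watch depends on the domain.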