Machine Learning Engineer, LLM Fine-tuning and Performance
Job Description
As a Machine Learning Engineer (LLM Fine-tuning & Performance Specialist), you'll play an integral role in improving the accuracy and performance of fine-tuned Large Language Models (LLMs) for real-world applications. In this role you'll work with cutting-edge ML technologies, collaborating closely with partners to drive innovation and ensure smooth integration and deployment of ML solutions. Your expertise in automating ML workflows and optimizing performance will make a lasting impact in the field of AI.
What you'll be doing:
Develop, implement, and own processes for fine-tuning LLMs on domain-specific data, improving the models' ability to complete structured tasks.
Develop and implement techniques to measure LLM performance, defining and monitoring metrics such as recall, F1, perplexity, BLEU, and ROUGE (see the evaluation sketch after this list).
Use tools such as ONNX and TensorRT to optimize model inference on specialized hardware.
Collaborate with ISVs and IHVs to understand their performance requirements and ensure successful model integration.
Use C++ to improve ML model performance in performance-critical systems, and provide technical mentorship to junior engineers.
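To illustrate the kind of evaluation work involved, here is a minimal sketch of measuring perplexity, ROUGE, and BLEU, assuming the Hugging Face `transformers` and `evaluate` libraries; the checkpoint name and the example predictions/references are illustrative placeholders, not part of this posting.

```python
# Minimal evaluation sketch; "gpt2" stands in for a fine-tuned checkpoint.
import math
import torch
import evaluate
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean token-level cross-entropy) on held-out text."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

# Reference-based metrics for generated outputs.
rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")

predictions = ["the model called the weather tool"]   # model outputs (placeholder)
references = ["the model invoked the weather tool"]   # gold outputs (placeholder)

print("perplexity:", perplexity(references[0]))
print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
```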
What we need to see:
8+ years of validated experience in system software or a related field.
M.S. or higher degree in Computer Science, Data Science, Engineering, or a related field, or equivalent experience.
Deep understanding of transformer architectures and large language models such as GPT, BERT, or T5.
Validated hands-on experience fine-tuning LLMs for specific tasks and improving model performance using libraries like PyTorch (see the fine-tuning sketch after this list).
Strong ability to assess and optimize model performance using relevant metrics and evaluation techniques.
Proficiency in crafting and automating ML workflows using tools such as Kubeflow, MLflow, or Airflow.
Excellent problem-solving skills, especially in debugging and improving LLM accuracy for real-world applications.
Proficiency in Python and knowledge of C++ for optimizing performance and developing system-level integrations.
Strong interpersonal skills for effective collaboration with internal teams and external partners.
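As a rough illustration of the fine-tuning experience described above, here is a minimal PyTorch sketch of adapting a small causal LM to structured-output examples, assuming `transformers` is installed; the checkpoint name, the tiny in-memory dataset, and the output directory are hypothetical placeholders.

```python
# Minimal fine-tuning sketch in plain PyTorch; not a production training loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder for the base checkpoint being adapted
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Domain-specific examples, e.g. instruction -> structured output pairs.
texts = [
    'Extract the date: "Meeting on 2024-05-01" -> {"date": "2024-05-01"}',
    'Extract the total: "Invoice total $42" -> {"total": 42}',
]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    optimizer.zero_grad()
    # Labels = input ids; the model shifts them internally for next-token loss.
    # A real pipeline would mask pad positions with -100 before computing loss.
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss={out.loss.item():.4f}")

model.save_pretrained("finetuned-structured-tasks")  # placeholder output dir
```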
Ways to stand out from the crowd:
Experience with LLM-based function and tool calling systems (see the sketch after this list).
Understanding of distributed training for LLM fine-tuning and cloud platforms like NVIDIA NVCF.
Familiarity with hardware acceleration for ML workloads, including GPU and specialized hardware optimizations.
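For context on the function/tool calling item above, here is a minimal sketch of advertising a tool schema and dispatching a model-produced call; the tool name, schema shape, and the model's JSON output are hypothetical placeholders, not a specific vendor API.

```python
# Minimal tool-calling sketch: schema -> model emits a structured call -> dispatch.
import json

def get_weather(city: str) -> dict:
    """Placeholder tool the model is allowed to call."""
    return {"city": city, "forecast": "sunny", "high_c": 24}

TOOLS = {"get_weather": get_weather}

# Schema advertised to the model in the prompt (illustrative shape only).
tool_schema = {
    "name": "get_weather",
    "description": "Get the forecast for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Pretend the fine-tuned model emitted this structured call.
model_output = '{"name": "get_weather", "arguments": {"city": "Santa Clara"}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # fed back to the model as the tool response
```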
You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.