Senior Software Engineer, ML Inference
Cupertino, California, United States
Software and Services
We are seeking a Senior Software Engineer, ML Inference, to join Apple Maps. You will optimize and scale machine learning models, with a focus on large language models, for high-performance, production-scale inference. You will collaborate with data scientists, researchers, and infrastructure teams to ensure efficient, GPU-optimized deployments that handle tens of billions of requests daily.
We value a culture of speed and agility ("iterate quickly, improve continuously") where rapid iteration, learning from failures, and constant refinement are at the core of how we operate. In this environment, you will ship solutions quickly and refine your approach with each iteration to deliver better results.
As a self-driven, results-oriented individual with a strong work ethic, you will play a key role in guiding the technical direction of the ML Platform, solving complex problems, and leading by example. You’ll bring leadership to the team through both mentorship and hands-on contributions, helping drive innovations in model optimization and performance tuning.
Description
* Optimize LLMs for Inference: Implement and enhance large language models for real-time and batch inference, balancing performance and resource efficiency.
* Advanced Inference Optimization: Apply techniques such as quantization and speculative decoding to reduce model size and accelerate inference without sacrificing accuracy. Leverage quantization-aware training (QAT) and post-training quantization (PTQ) to deploy models on resource-constrained hardware.
* Cross-Functional Collaboration: Partner with data scientists, ML researchers, and infrastructure engineering teams to understand model requirements, provide feedback, and ensure smooth deployment of models into production.
* Monitoring & Resource Management: Implement monitoring tools to profile and track the performance of models running on GPUs, including real-time monitoring of GPU utilization, memory usage, and inference throughput. Manage and optimize resource allocation to ensure high availability and minimal downtime.
* Continuous Improvement & R&D: Stay on top of the latest research in LLM inference techniques, GPU optimizations, and distributed systems to bring innovative improvements to the overall system.
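To make the quantization responsibility above concrete, here is a minimal, framework-free sketch of symmetric int8 post-training quantization for a single weight tensor. The function names (`quantize_int8`, `dequantize`) and the toy weights are invented for illustration; production work would use a framework's quantization toolchain rather than hand-rolled code.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

# Toy example: per-weight error after a round trip is bounded by scale / 2.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)   # q == [42, -127, 5, 90]
restored = dequantize(q, scale)
```

Post-training quantization (PTQ) applies this kind of mapping to a trained model's weights directly; quantization-aware training (QAT) instead simulates the rounding during training so the model learns to tolerate it.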
Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
- 5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems.
- Expertise in deploying and optimizing LLMs for high-performance, production-scale inference.
- Proficiency in Python, Java, or C++.
- Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.
- Experience with model-serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM).
- Experience with optimization techniques such as attention fusion, quantization, and speculative decoding.
- Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks.
- Familiarity with cloud and container technologies such as Docker, Kubernetes, and AWS EKS for scalable deployment.
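Speculative decoding, listed among the optimization techniques above, can be sketched in its simplest (greedy) form: a cheap draft model proposes several tokens, the target model verifies them, and the longest matching prefix is accepted, so the output is identical to greedy decoding with the target alone while amortizing its forward passes. The `target_model` and `draft_model` lambdas below are deterministic toy stand-ins, not real LLMs; all names are illustrative.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Greedy speculative decoding with a draft/verify loop."""
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        # Draft phase: the cheap model proposes k tokens autoregressively.
        ctx = list(out)
        proposals = []
        for _ in range(k):
            tok = draft(ctx)
            proposals.append(tok)
            ctx.append(tok)
        # Verify phase: accept proposals while they match the target's
        # greedy choice; on the first mismatch, keep the target's token.
        for tok in proposals:
            expected = target(out)
            if tok == expected:
                out.append(tok)
            else:
                out.append(expected)
                break
            if len(out) >= len(prompt) + max_new:
                break
    return out[len(prompt):]

# Toy models: the next token is a function of the last token. The draft
# agrees with the target only on even tokens, forcing occasional rejects.
target_model = lambda ctx: (ctx[-1] * 3 + 1) % 7
draft_model = lambda ctx: (ctx[-1] * 3 + 1) % 7 if ctx[-1] % 2 == 0 else 0

tokens = speculative_decode(target_model, draft_model, [2], k=3, max_new=5)
```

Because every emitted token is checked (or corrected) against the target, the result always equals what the target would have produced greedily on its own; the speedup comes from batching the verification of several draft tokens per target pass.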
Preferred Qualifications
- Master’s or PhD in Computer Science, Machine Learning, or a related field.
- Understanding of ML Ops practices, continuous integration, and deployment pipelines for machine learning models.
- Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed.
- Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.