Senior Software Engineer, ML Inference
Cupertino, California, United States
Software and Services
We are seeking a Senior Software Engineer, ML Inference, to join Apple Maps. You will optimize and scale machine learning models, with a focus on large language models, for high-performance, production-scale inference. You will collaborate with data scientists, researchers, and infrastructure teams to ensure efficient, GPU-optimized deployments that handle tens of billions of requests daily.
We value a culture of speed and agility ("iterate quickly, improve continuously") where rapid iteration, learning from failures, and constant refinement are at the core of how we operate. In this environment, you will ship solutions quickly and refine your approach with each iteration to deliver better results.
As a self-driven, results-oriented individual with a strong work ethic, you will play a key role in guiding the technical direction of the ML Platform, solving complex problems, and leading by example. You’ll bring leadership to the team through both mentorship and hands-on contributions, helping drive innovations in model optimization and performance tuning.
Description
* Optimize LLMs for Inference: Implement and enhance large language models for real-time and batch inference, balancing performance and resource efficiency.
* Advanced Inference Optimization: Apply techniques such as quantization and speculative decoding to reduce model size and accelerate inference without sacrificing accuracy. Leverage quantization-aware training (QAT) and post-training quantization (PTQ) to deploy models on resource-constrained hardware.
* Cross-Functional Collaboration: Partner with data scientists, ML researchers, and infrastructure engineering teams to understand model requirements, provide feedback, and ensure smooth deployment of models into production.
* Monitoring & Resource Management: Implement monitoring tools to profile and track the performance of models running on GPUs, including real-time monitoring of GPU utilization, memory usage, and inference throughput. Manage and optimize resource allocation to ensure high availability and minimal downtime.
* Continuous Improvement & R&D: Stay on top of the latest research in LLM inference techniques, GPU optimizations, and distributed systems to bring innovative improvements to the overall system.
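To make the quantization responsibility above concrete, here is a minimal, framework-free sketch of symmetric int8 post-training quantization for a single weight tensor. The function names (`quantize_int8`, `dequantize`) and the toy weights are invented for illustration; production work would use a framework's quantization toolchain rather than hand-rolled code.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

# Toy example: per-weight error after a round trip is bounded by scale / 2.
weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)   # q == [42, -127, 5, 90]
restored = dequantize(q, scale)
```

Post-training quantization (PTQ) applies this kind of mapping to a trained model's weights directly; quantization-aware training (QAT) instead simulates the rounding during training so the model learns to tolerate it.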
Minimum Qualifications
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
- 5+ years in software engineering focused on ML inference, GPU acceleration, and large-scale systems.
- Expertise in deploying and optimizing LLMs for high-performance, production-scale inference.
- Proficiency in Python, Java, or C++.
- Experience with deep learning frameworks like PyTorch, TensorFlow, and Hugging Face Transformers.
- Experience with model-serving tools (e.g., NVIDIA Triton, TensorFlow Serving, vLLM).
- Experience with optimization techniques such as attention fusion, quantization, and speculative decoding.
- Skilled in GPU optimization (e.g., CUDA, TensorRT-LLM, cuDNN) to accelerate inference tasks.
- Familiarity with cloud and container technologies such as Docker, Kubernetes, and AWS EKS for scalable deployment.
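Speculative decoding, listed among the optimization techniques above, can be sketched in its simplest (greedy) form: a cheap draft model proposes several tokens, the target model verifies them, and the longest matching prefix is accepted, so the output is identical to greedy decoding with the target alone while amortizing its forward passes. The `target_model` and `draft_model` lambdas below are deterministic toy stand-ins, not real LLMs; all names are illustrative.

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    """Greedy speculative decoding with a draft/verify loop."""
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        # Draft phase: the cheap model proposes k tokens autoregressively.
        ctx = list(out)
        proposals = []
        for _ in range(k):
            tok = draft(ctx)
            proposals.append(tok)
            ctx.append(tok)
        # Verify phase: accept proposals while they match the target's
        # greedy choice; on the first mismatch, keep the target's token.
        for tok in proposals:
            expected = target(out)
            if tok == expected:
                out.append(tok)
            else:
                out.append(expected)
                break
            if len(out) >= len(prompt) + max_new:
                break
    return out[len(prompt):]

# Toy models: the next token is a function of the last token. The draft
# agrees with the target only on even tokens, forcing occasional rejects.
target_model = lambda ctx: (ctx[-1] * 3 + 1) % 7
draft_model = lambda ctx: (ctx[-1] * 3 + 1) % 7 if ctx[-1] % 2 == 0 else 0

tokens = speculative_decode(target_model, draft_model, [2], k=3, max_new=5)
```

Because every emitted token is checked (or corrected) against the target, the result always equals what the target would have produced greedily on its own; the speedup comes from batching the verification of several draft tokens per target pass.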
Preferred Qualifications
- Master’s or PhD in Computer Science, Machine Learning, or a related field.
- Understanding of ML Ops practices, continuous integration, and deployment pipelines for machine learning models.
- Familiarity with model distillation, low-rank approximations, and other model compression techniques for reducing memory footprint and improving inference speed.
- Strong understanding of distributed systems, multi-GPU/multi-node parallelism, and system-level optimization for large-scale inference.
Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.