Senior System Software Engineer - Cloud Infrastructure
Job Description
NVIDIA is looking for a Senior System Software Engineer - Cloud Infrastructure to join the NGN GPU Cloud Infrastructure group, working to design and deliver the platforms that enable deep learning, game streaming, content delivery, and generative AI systems in the cloud. This position will focus on design, development, and implementation of software-defined infrastructure and automation for compute, network, and storage systems. We have crafted a team of extraordinary people stretching around the globe, whose mission is to push the frontiers of what is possible today and define the platform of tomorrow.
What you will be doing:
Design, prototype, implement and help operate the next generation of software to automate global cloud infrastructure for NVIDIA GPU-accelerated applications such as Deep Learning, Game Streaming, Content Delivery, and Generative AI.
Actively participate in systems design, code reviews, test authoring, feature development, bug triage, automation, configuration, documentation, and bug fixes – including open source and NVIDIA internal software projects.
Benchmark, evaluate, and optimize the performance, reliability, and efficiency of network and storage subsystems and applications.
Lead and participate in PoC and development efforts for various application use cases, working with cloud tenants, application owners, and solutions architects to design optimal and performant systems.
What we need to see:
BS or MS in Computer Science or Computer Engineering (or equivalent experience)
8+ years of professional experience in software engineering, DevOps, and/or site reliability engineering.
Excellent problem solving, collaborative, and interpersonal skills. Outstanding communication and soft skills, able to present to senior management in a sensible and persuasive manner. Ability to influence and build relationships with other software teams and functional groups.
Exceptional knowledge and experience designing and writing concurrent code for large-scale and performance-optimized distributed systems.
Experience integrating network, storage, and compute technologies with virtual machine and container orchestration systems.
A security-first approach with a desire to deliver highly reliable, high-quality products.
Ability to root-cause functional and performance issues in distributed systems – and drive issues to closure.
Expert-level Linux systems configuration, administration, automation, debugging, and performance optimization (e.g., RHEL, CentOS, Ubuntu, Rocky Linux).
Ways to stand out from the crowd:
Production experience with GitOps and DevOps workflows and tooling such as FluxCD, ArgoCD, Helm Charts, Terraform, and/or Ansible.
Prior experience running Kubernetes clusters in production.
Proven skills in modern container networking and storage architecture.
Experience working in distributed teams across multiple time zones.
Proficiency with Go (Golang) and Python.
With a competitive salary package and benefits, NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. Are you a creative and autonomous Senior Software Engineer who loves challenges? Do you have a genuine passion for advancing the state of Data Science across a variety of industries? If so, we want to hear from you.
The base salary range is 180,000 USD - 339,250 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.