Senior Manager, Solutions Architecture

Deloitte
Birmingham, AL
We are seeking an accomplished HPC/AI Platform Engineering Manager to lead the design, implementation, and optimization of advanced computing environments that power AI, ML, and LLM workloads. This role is ideal for a hands-on technologist with deep expertise in HPC systems, GPU-accelerated infrastructure, and large-scale AI deployments, combined with the leadership ability to drive fast-paced, innovative initiatives. You will collaborate with engineering, research, and business teams to define infrastructure strategy, assess emerging technologies, and deliver scalable, secure, and high-performance solutions. This role is pivotal in advancing generative AI, analytics, and model training capabilities through robust architecture, automation, and software integration.

Recruiting for this role ends on January 31, 2026.

Key Responsibilities

Architecture & Strategy
+ Design and implement HPC and AI infrastructure leveraging HPE Apollo, ProLiant, Cray, and similar enterprise-class systems.
+ Architect ultra-low-latency, high-throughput interconnect fabrics (InfiniBand NDR/800G, RoCEv2, 100-400 GbE) for large-scale GPU and HPC clusters.
+ Deploy and optimize cutting-edge NVIDIA GPU architectures (e.g., H100, H200, RTX PRO / Blackwell series, NVL-based systems).
+ Develop scalable hybrid HPC and cloud architectures across Azure, AWS, GCP, and on-prem environments.
+ Establish infrastructure blueprints supporting secure, high-throughput AI workloads.

AI/ML & LLM Platform Enablement
+ Build and manage AI/ML infrastructure to maximize performance and productivity of ML research teams.
+ Architect and optimize distributed training, storage, and scheduling systems for large GPU clusters.
+ Implement automation, observability, and operational frameworks to minimize manual intervention.
+ Deploy and manage GPU-accelerated Kubernetes clusters for AI and HPC workloads.
+ Integrate open-source GenAI components, including vector databases and AI/ML frameworks, for model serving and experimentation.
+ Identify and resolve performance and scalability bottlenecks across infrastructure layers.

Software Engineering & Integration
+ Develop and maintain automation tools and utilities in Python, Golang, and Bash.
+ Integrate HPC infrastructure with ML frameworks, container runtimes, and orchestration platforms.
+ Contribute to job scheduling, resource management, and telemetry components.
+ Build APIs and interfaces for workload submission, monitoring, and reporting across heterogeneous environments.

Containerization & Orchestration
+ Design Kubernetes and OpenShift architectures optimized for GPU and AI workloads.
+ Implement GPU scheduling, persistent storage, and high-speed networking configurations.
+ Collaborate with DevOps/MLOps teams to build CI/CD pipelines for containerized research and production environments.

Systems & Automation
+ Oversee Linux system architectures (RHEL, Ubuntu, OpenShift) with automation via Ansible and Terraform.
+ Implement monitoring and observability (e.g., Prometheus, Grafana, DCGM, and NVML).
+ Ensure system scalability, reliability, and security through proactive optimization.

Governance & Leadership
+ Ensure architecture and deployments comply with organizational and regulatory standards.
+ Conduct technical workshops, architecture reviews, and presentations for both technical and executive audiences.
+ Define and drive the infrastructure roadmap in partnership with business stakeholders.
+ Mentor and lead engineering teams, translating business requirements into actionable technical deliverables.
+ Foster innovation and cross-functional collaboration to accelerate AI/ML initiatives.

Required Qualifications
+ 10+ years of experience in HPC architecture, systems engineering, or platform design with a focus on architecting and operating on-premises Kubernetes for large-scale AI/ML workloads.
+ 3+ years of hands-on experience and proficiency with Linux, Python, Golang, and/or Bash.
+ 2+ years leading teams and/or processes.
+ 2+ years of recent experience working with GPU platforms (strong preference for NVIDIA), distributed systems, and performance optimization.
+ Ability to travel 0-10%, on average, based on the work you do and the customers you serve.
+ Must be a US Citizen.

Preferred Qualifications
+ Master's or Ph.D. in Computer Science, Electrical Engineering, or a related discipline, and work experience.
+ Demonstrated success supporting LLM training and inference workloads in both R&D and production environments.
+ Strong knowledge of high-performance networking, storage, and parallel computing frameworks.
+ Exceptional communication and leadership skills, capable of bridging technical depth with executive strategy.

The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions, including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Deloitte, it is not typical for an individual to be hired at or near the top of the range for their role, and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is $130,000 to $241,000.

You may also be eligible to participate in a discretionary annual incentive program, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.
Information for applicants with a need for accommodation: EA_ExpHire

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability or protected veteran status, or any other legally protected basis, in accordance with applicable law.
Posted 2025-11-20
