We are seeking a highly skilled CloudOps Engineer for our cloud operations team within the Cloud Centre of Excellence (CCoE). This role requires deep hands-on technical expertise in cloud operations, specifically AWS operations, troubleshooting, operating system management, and SRE practices. The ideal candidate will bring experience in AWS infrastructure, Linux/Windows operations, container-based deployments, GitLab CI/CD automation, and end-to-end troubleshooting to keep our cloud platforms secure, resilient, and compliant.
Key Responsibilities
Operational Excellence & SRE
• Drive Site Reliability Engineering (SRE) practices, including SLIs, SLOs, SLAs, error budgets, and automation of operational tasks.
• Manage incident response, root cause analysis, and post-incident reviews to strengthen platform resilience.
• Build and optimize observability and monitoring frameworks (CloudWatch, Grafana, Loki, Tempo, Prometheus).
• Implement self-healing systems and automated recovery where possible.
• Oversee OS patching so that no known vulnerabilities remain outstanding, and maintain compliance with security standards.
Hands-on Cloud & Systems Engineering
• Provision, manage, and troubleshoot AWS services such as EC2, ECS, EKS, Lambda, ELB, S3, EFS, RDS, VPC, and IAM.
• Administer Linux and Windows operating systems hands-on, including hardening, patching, and vulnerability remediation.
• Troubleshoot complex issues across infrastructure, applications, networks, and operating systems.
• Deploy and manage container-based workloads (ECS, EKS, Docker).
• Automate operations using Infrastructure-as-Code (CloudFormation, Terraform), configuration management (Ansible), and scripting (Python, Bash, PowerShell).
• Implement and optimize GitLab CI/CD pipelines for operations-driven automation.
• Support cloud security, IAM, encryption, and compliance standards.
Requirements
Basic Qualifications
• 8+ years of experience in cloud operations, engineering, or SRE roles.
• Strong hands-on expertise with AWS services (EC2, ECS, EKS, Lambda, ELB, S3, EFS, VPC, IAM).
• Solid experience with Linux and Windows operating systems, including hardening and patching.
• Proficiency with scripting and automation tools (Python, Bash, PowerShell, Ansible).
• Hands-on experience in container-based deployments (ECS, EKS, Docker).
• Proven ability in infrastructure and application troubleshooting.
• Deep knowledge of SRE principles, including monitoring, incident management, and SLIs/SLOs/SLAs.
• Strong expertise in GitLab CI/CD and Infrastructure-as-Code tools (CloudFormation, Terraform).
• Working knowledge of cloud security, IAM, and encryption practices.
• Excellent problem-solving, debugging, and communication skills.
Preferred Qualifications
• AWS certifications: Solutions Architect – Professional, DevOps Engineer – Professional, or SysOps Administrator.
• Experience with observability and monitoring tools (CloudWatch, Grafana, Loki, Tempo, Prometheus).
• Familiarity with multi-cloud or hybrid-cloud operations (AWS and OCI).
• Experience managing high-scale, high-availability, mission-critical environments.
• Track record of implementing automation, SRE practices, and operational process improvements.

