As a Site Reliability Engineer, you will ensure system availability, automate workflows, manage CI/CD pipelines, and support cloud technologies.
Position: Site Reliability Engineer
Job Summary
As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our customer-facing platform. You will work closely with DevOps, DBA, and Development teams to provision and maintain infrastructure, deploy and monitor our applications, and automate workflows. Your contributions will have a direct impact on customer satisfaction and overall experience.
Responsibilities and Deliverables
- Manage, monitor, and maintain highly available systems (Windows and Linux).
- Analyze metrics and trends to ensure rapid scalability.
- Address routine service requests while identifying ways to automate and simplify.
- Create infrastructure as code using Terraform, ARM Templates, or CloudFormation.
- Maintain data backups and disaster recovery plans.
- Design and deploy CI/CD pipelines using tools such as GitHub Actions, Octopus, Ansible, Jenkins, and Azure DevOps.
- Adhere to security best practices through all stages of the software development lifecycle.
- Follow and champion ITIL best practices and standards.
- Become a resource for emerging and existing cloud technologies with a focus on AWS.
Organizational Alignment
- Reports to the Senior SRE Manager.
- This role involves close collaboration with DevOps, DBA, and security teams.
Technical Proficiencies
- Hands-on experience with AWS is a must-have.
- Proficiency in analyzing application, IIS, system, and security logs, as well as CloudTrail events.
- Practical experience with CI/CD tools such as GitHub Actions, Jenkins, or Octopus.
- Experience with observability tools such as New Relic, Application Insights, AppDynamics, or Datadog.
- Experience maintaining and administering Windows, Linux, and Kubernetes.
- Experience in automation using scripting languages such as Bash, PowerShell, or Python.
- Configuration management experience using Ansible, Terraform, Azure Automation runbooks, or similar.
- Experience with SQL Server database maintenance and administration is preferred.
- Good understanding of networking (VNet, subnets, Private Link, VNet peering).
- Familiarity with cloud concepts including certificates, OAuth, Azure AD, ASE, ASP, AKS, Azure Apps, Load Balancers, Application Gateway, Firewall, API Management, SQL Server, and databases on Azure.
Experience
- 5+ years of experience in an SRE or System Administration role.
- Demonstrated ability building and supporting high-availability Windows/Linux servers, with emphasis on the WISA stack (Windows/IIS/SQL Server/ASP.NET).
- 3+ years of experience with CI/CD tools.
- 3+ years of experience working with cloud technologies including AWS and Azure.
- 1+ years of experience working with container technology including Docker and Kubernetes.
- Comfortable using Scrum, Kanban, or Lean methodologies.
Education
- Bachelor’s Degree or College Diploma in Computer Science, Information Systems, or equivalent experience.
Top Skills
Ansible
AppDynamics
Application Insights
ARM Templates
AWS
Azure DevOps
Bash
CloudFormation
Datadog
GitHub Actions
Jenkins
Kubernetes
Linux
New Relic
Octopus
PowerShell
Python
SQL Server
Terraform
Windows



