Job Description
 Responsibilities
 Deployment & Automation 
 -  Implement and maintain CI/CD pipelines using tools such as GitHub Actions, AWS CodePipeline, and Jenkins.  
  -  Automate infrastructure provisioning and management using Infrastructure-as-Code (IaC) with Terraform, CloudFormation, or AWS CDK.  
  -  Develop robust automation scripts and self-service tooling to minimize toil and enhance operational efficiency.  
  
 Capacity, Performance & Cost Optimization 
 -  Lead and implement operational cost optimization initiatives across cloud infrastructure and data platforms.  
  -  Configure, maintain, and tune auto-scaling policies and performance thresholds.  
  -  Develop and execute resiliency test plans and support performance testing efforts.  
  
 Incident Management & SRE Principles 
 -  Serve as a production on-call responder, employing strong troubleshooting skills to quickly resolve complex incidents.  
  -  Apply ITIL framework concepts and use ITSM tools (e.g., ServiceNow) for incident and change management.  
  -  Write high-quality Root Cause Analysis (RCA) documentation and knowledge articles to prevent recurrence.  
  -  Implement and enforce SRE principles, including the definition and tracking of Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets.  
  
 Observability & Monitoring 
 -  Manage and use advanced observability platforms (Dynatrace preferred; AppDynamics, ELK, etc.).  
  -  Implement distributed tracing with accurate context propagation across data services and applications.  
  -  Optimize monitoring queries and configure actionable dashboards, alerts, and anomaly detectors using tools like Dynatrace and Kibana.  
  
 Data Analytics Platform Reliability 
 -  Ensure the reliability, performance, and access control of Databricks clusters and data pipelines.  
  -  Maintain Informatica workflow orchestration, connector reliability, and error handling for critical data flows.  
  -  Manage Power BI gateway health and access control, and ensure reliable data refresh processes.  
  
 Security & Compliance 
 -  Manage service accounts, access permissions, and roles following the principle of least privilege.  
  -  Create, deploy, and manage digital certificates and TLS/SSL configurations.  
  -  Execute effective remediation tasks and respond to security incidents as part of the operational team.  
  
 Qualifications
 Education & Experience 
 -  Bachelor's degree in Computer Science, Engineering, or a related technical field.  
  -  2 to 4 years of hands-on experience in a DevOps, Site Reliability Engineering (SRE), or Cloud Infrastructure role.  
  -  Practical, working experience with major cloud platforms, specifically AWS and Azure.  
  
 Technical Skills 
 -  Mid-level proficiency in Python or another scripting or programming language (e.g., Bash, Go) for automation tasks.  
  -  Mid-level proficiency with Configuration Management tools, including Ansible.  
  -  Strong knowledge of containerization technologies (Docker, Kubernetes/ECS).  
  -  Solid understanding of Linux systems and networking fundamentals (TCP/IP, DNS, Load Balancing).  
  -  Working knowledge of relational, cloud-native (e.g., AWS RDS), and NoSQL database technologies.  
  -  Direct hands-on experience supporting and maintaining data platforms like Databricks, Informatica, or Power BI is highly desirable.  
  
 Professional Attributes 
 -  Excellent written and verbal communication skills, with a proven ability to document complex systems.  
  -  Demonstrated ability to work independently, manage shifting priorities, and drive initiatives to completion.  
  -  Availability for on-call duties and for work outside standard business hours as required to support a 24/7 production environment.  
 
 				 
 Job Tags
 Work experience placement, Shift work