The offer is finalized

Sorry, the offer is not available,
but you can perform a new search or explore similar offers:

Similar offers

Procurement Engineer

Qualifications: BS in Civil Engineering; a Master's or Doctoral degree in a related field is an added advantage. Licensed/Registered Civil Engineer with experience a...

Filipinas Dravo Corporation - National Capital Region

Published a month ago

Senior Associate Facilities Engineer

Job Description: Make an impact with NTT DATA. Join a company that is pushing the boundaries of what is possible. We are renowned for our technical excellence...

Ntt Data - National Capital Region

Published a month ago

Implementation Engineer (Manage Engine)

Salary: 30K-40K monthly. Location: Hexagon Corporate Center, Quezon City. Work setup: Hybrid | 9AM-6PM | 4 days onsite, 1 day WFH. RESPONSIBILITIES: Works with ...

Ayuda Business Management Solutions Inc. - National Capital Region

Published a month ago

Technical Lead Expert - Bas

At Globe, our goal is to create a wonderful world for our people, business, and nation. By uniting people of passion who believe they can make a difference, ...

Globe Telecom - National Capital Region

Published a month ago

Site Reliability Engineer

Details of the offer

Position Description: As a Site Reliability Engineer (SRE) within our team, you will play a key role in ensuring the reliability, scalability, and performance of our systems and infrastructure. Reporting to the OCC, ITSM & ServiceNow Manager, you will collaborate closely with cross-functional teams to implement best practices, automate processes, and proactively monitor our systems to maintain optimal uptime and a satisfactory user experience.

Your future duties and responsibilities:

SITE SREs will improve operational standards and update documentation:
Evaluate current operational practices and identify areas for improvement.
Develop and implement standardized processes and procedures to enhance efficiency and effectiveness.
Maintain up-to-date documentation in Confluence (KB, FEX, etc.).

SITE SREs will collaborate with DevOps teams to create a robust CI/CD pipeline for fully automated application and platform deployment:
Design and architect a Continuous Integration/Continuous Deployment (CI/CD) pipeline to automate the build, test, and deployment processes.
Implement tools and technologies such as Jenkins, GitLab CI/CD, or similar to streamline the pipeline.
Integrate automated testing frameworks to ensure code quality and reliability throughout the deployment pipeline.
Be the primary point of contact for code deployments.

SITE SREs will take ownership of, manage, and enhance the release process, focusing on scalability, efficiency, and quality:
Lead the planning, coordination, and execution of up-to-date releases across multiple products and environments.
Continuously monitor, improve, and validate release processes based on feedback and metrics.

SITE SREs will support regular production updates and Job AppWorks corrections:
Coordinate with development teams to prioritize and schedule production and maintenance updates.
Execute deployment plans and verify successful updates while minimizing downtime and impact on users.
Troubleshoot and resolve critical issues with job execution, including errors, failures, and unexpected behavior.
Analyze job execution logs and metrics to identify errors, failures, or performance bottlenecks.
Help optimize alerting by reducing redundant, duplicate, and obsolete alerts.

SITE SREs will provide on-call support and handle incidents:
Participate in an on-call rotation to provide 24/7 support for production systems, responding to alerts and incidents in a timely manner.
Document incident response procedures and lessons learned for continuous improvement.
Monitor system health and respond promptly to incidents, escalating as necessary for resolution.

SITE SREs will be responsible for validation and sanity checks:
Perform post-PPR and production deployment sanity checks to ensure system stability and functionality.
Use both manual and automated checks to validate the integrity and coherence of the deployed code and configurations.
Document and report any issues discovered during validation for further investigation and resolution.

SITE SREs will be responsible for ServiceNow ticket handling:
Monitor, prioritize, and manage ServiceNow tickets according to defined SLAs and operational priorities.
Assign tickets to the appropriate teams or individuals and ensure timely follow-up and closure.
Maintain accurate records and documentation within the ServiceNow platform.

SITE SREs will be responsible for capacity planning and security alert prioritization:
Perform capacity testing to validate the scalability of systems and infrastructure under various load conditions.
Prioritize security alerts based on severity and potential impact on system integrity and data confidentiality.
Coordinate with security teams to assess and respond to security alerts promptly, implementing appropriate mitigation measures.

SITE SREs will monitor DevOps platform products:
Monitor the stability, performance, and availability of DevOps platform products such as JFrog, GitLab, Vault, Kong, ELK, Rancher, and Kubernetes (K8s).

SITE SREs will define monitoring objectives:
Collaborate with stakeholders to determine the key objectives and metrics for monitoring latency, traffic, errors, and saturation.
Identify critical service-level indicators (SLIs) and service-level objectives (SLOs) to ensure monitoring aligns with business and user expectations.

Required qualifications to be successful in this role:
Bachelor's or Master's degree in Software Engineering, Computer Science, or equivalent.
2+ years of experience with Kubernetes.
5+ years of expertise in Linux administration.
3+ years of strong coding experience with languages and frameworks such as Java, React.js, etc.
2+ years of experience with infrastructure-related tools (Terraform, Ansible, VS Code, Postman, etc.).
Experience monitoring infrastructure and applications (Splunk, ECK, Grafana, Prometheus, etc.).
A solid understanding of CI/CD concepts, version control systems, and testing (experience with Jenkins, AppWorks, Git, Docker, GitLab, etc.).
Experience with collaboration tools (Jira/Confluence, ServiceNow).
Deep understanding of task automation.
Proficiency in DevOps principles to ensure effective collaboration between IT operations and developers.
Expertise in incident management and application security.
Ability to define Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs).
Excellent communication skills to collaborate with diverse teams.
Analytical mindset to understand and solve complex problems.
Autonomy and a sense of responsibility to manage various aspects of the role.
Ability to work well under pressure and manage multiple priorities.
Must be amenable to working onsite 2 days a week in Taguig.

Skills: Linux, Kubernetes

Nominal Salary: To be agreed

Source: Grabsjobs_Co

Job Function: Engineering

Built at: 2024-12-23T15:14:40.460Z