Site Reliability Engineer (Manufacturing IT Operations)
Full time
About 1 month ago, from Direct to Employer
As a Site Reliability Engineer (SRE) in the Manufacturing IT Operations – Incident Response Team, you will be responsible for leading incident response efforts, ensuring swift and effective resolution of critical system issues. You will also play a critical role in ensuring the reliability, scalability, and performance of our systems and services. SRE combines software engineering and operations to build, maintain, and support highly available and efficient infrastructure. Your expertise in troubleshooting and root cause analysis will be essential in identifying and addressing the underlying causes of incidents. You will work closely with software engineers, DevOps teams, and other stakeholders to implement preventive measures and enhance system resilience. Collaborating with cross-functional teams, you will design, implement, and automate robust systems, monitoring tools, and processes. With a strong focus on stability and uptime, you will proactively identify and resolve performance bottlenecks, optimize system architecture, and drive continuous improvement. Your keen eye for continuous improvement will also drive post-incident reviews and contribute to the creation of incident management best practices. By actively monitoring system health, responding to incidents in a timely manner, and implementing proactive measures, you will play a pivotal role in maintaining the stability and availability of our services, ensuring an exceptional user experience for our customers.
Your team
You will report directly to the Incident Response Engineering Leader within the Manufacturing IT Operations team, who will provide guidance, support, and mentorship as you navigate your role. As a valued member of our dynamic Incident Response Team, you will collaborate closely with technically skilled professionals, including software engineers, DevOps specialists, Subject Matter Experts, and other SREs. In addition, you will have the opportunity to directly collaborate with our site customers and users, ensuring their needs and expectations are met through reliable and high-performing systems. Working within a cross-functional and collaborative environment, you will contribute to the success of our Incident Response team, which is dedicated to ensuring the reliability and availability of our site's systems. Our Incident Response team fosters a culture of technical expertise, continuous learning, and knowledge sharing, where ideas are encouraged, and innovation is embraced.
What success looks like
Success as a Site Reliability Engineer (SRE) spans several areas of the role, including incident response, reliability, monitoring, and effective collaboration with customers and users:
Incident Response: Swiftly respond to and resolve critical incidents, ensuring minimal impact on system availability and user experience while driving continuous improvement in incident management processes.
Reliability: Ensure high system availability and reliability through robust monitoring, optimization of system architecture, and cross-functional collaboration to design and implement resilient systems.
Monitoring: Implement comprehensive monitoring solutions to gain real-time insights into system performance, enabling proactive incident response and continuous improvement of system visibility and resource optimization.
Working with Customers/Users: Collaborate directly with customers and users to understand their needs, proactively address concerns, and provide exceptional customer support to ensure reliable and performant systems that meet their expectations.
Responsibilities of the role
Lead incident response efforts, swiftly resolving critical incidents to minimize downtime and user impact.
Implement effective incident management processes, ensuring clear communication, coordination, and documentation.
Conduct root cause analysis, implementing preventive measures and driving continuous improvement.
Job Qualifications
Role Requirements
Technical Expertise and Experience:
Knowledge of or familiarity with system administration, including Linux/Unix environments and cloud platforms (such as AWS, Azure, or GCP).
Experience with configuration management tools and infrastructure-as-code frameworks (e.g., Terraform).
Proficiency in at least one programming language (e.g., Python, C#) and experience with scripting for automation tasks.
Understanding of networking protocols, network infrastructures, load balancing, and DNS management.
Familiarity with containerization and orchestration technologies (e.g., Docker, Kubernetes).
Familiarity with databases and proficiency in writing SQL queries.
Experience or familiarity with monitoring and observability tools (e.g., Prometheus, Grafana).
Knowledge of incident response methodologies, root cause analysis, and implementing preventive measures.
Understanding of security best practices and experience with implementing secure systems.
Experience with Manufacturing Execution Systems (e.g., Proficy) or Manufacturing Operations is a plus.
Soft Skills:
Strong problem-solving and troubleshooting skills, with an ability to analyze complex issues and devise effective solutions.
Excellent communication and collaboration skills to work effectively with cross-functional teams, stakeholders, and customers.
Ability to thrive in a fast-paced, dynamic environment, managing multiple priorities and adapting to changing circumstances.
Strong attention to detail and a commitment to delivering high-quality work.
Proactive and self-motivated, with a continuous learning mindset and a drive to stay current with industry trends and technologies.
Strong teamwork and interpersonal skills, with an ability to build relationships and work effectively in a collaborative environment.
Ability to thrive under pressure and effectively manage incidents, ensuring timely resolutions and minimizing downtime.
This role requires a commitment to work a standard 5-day workweek, with 4 weekdays and at least one weekend day (Saturday or Sunday).
The nature of the Site Reliability Engineer (SRE) position necessitates coverage and support across the week, ensuring the reliability and availability of our systems. This schedule allows for effective incident response and continuous monitoring of system health, as well as collaboration with cross-functional teams. We value work-life balance and will strive to provide a predictable and manageable schedule within this framework, while still meeting the needs of our customers and maintaining the stability of our services.
About us
We produce globally recognized brands and we grow the best business leaders in the industry. With a portfolio of trusted brands as diverse as ours, it is paramount that our leaders are able to lead, with courage, a vast array of brands, categories, and functions. We serve consumers around the world with one of the strongest portfolios of trusted, quality, leadership brands, including Always, Ariel, Gillette, Head & Shoulders, Herbal Essences, Oral-B, Pampers, Pantene, Tampax, and more. Our community includes operations in approximately 70 countries worldwide.
We are an equal opportunity employer and value diversity at our company. We do not discriminate against individuals on the basis of race, color, gender, age, national origin, religion, sexual orientation, gender identity or expression, marital status, citizenship, disability, HIV/AIDS status, or any other legally protected factor.
Job Schedule: Full time
Job Location: Taguig City