Job Description:
As a Site Reliability Engineer (SRE) at Solutions Exchange Inc, you will play a critical role in
ensuring the reliability, scalability, and performance of our production systems. You will
collaborate closely with software engineering and operations teams to build and maintain
tools for automation, monitoring, and operations. Your expertise will be crucial in designing
resilient and scalable architectures, optimizing application performance, and resolving
complex technical issues to deliver a seamless user experience.
Responsibilities:
• Design, build, and maintain tools and frameworks for deployment, monitoring, and
operations.
• Implement best practices in infrastructure security, scalability, and reliability.
• Collaborate with cross-functional teams to define and achieve Service Level
Objectives (SLOs) and Service Level Indicators (SLIs).
• Perform system and application troubleshooting to resolve issues and ensure
optimal performance.
• Design and implement automation strategies to streamline operations and reduce
manual intervention.
• Participate in on-call rotation and respond to incidents to minimize downtime and
impact on users.
• Conduct post-mortem analyses of incidents and implement measures to prevent
recurrence.
• Continuously evaluate and improve our systems and processes to enhance reliability
and efficiency.
Requirements:
• Bachelor's degree in Computer Science, Engineering, or a related technical field,
or equivalent practical experience.
• Proven experience in a Site Reliability Engineer or similar role, with a focus on
designing and implementing scalable systems.
• Strong proficiency in programming, scripting, and automation (Java,
ReactJS, etc.).
• Experience with cloud platforms such as AWS, Azure, or GCP, and container
orchestration tools like Kubernetes.
• Deep understanding of networking, system administration, Windows, and
Linux/Unix-based environments.
• Excellent problem-solving skills and the ability to troubleshoot complex issues in
distributed systems.
• Strong communication skills and the ability to work effectively in a collaborative
team environment and to communicate clearly with stakeholders.
Preferred Qualifications:
1. Master's degree in Computer Science, Engineering, or a related technical field.
2. Certification in cloud platforms or DevOps methodologies (e.g., AWS Certified
DevOps Engineer, Google Professional Cloud DevOps Engineer).
3. Experience with CI/CD pipelines and configuration management tools (e.g., Ansible).
4. Knowledge of monitoring and logging tools such as Prometheus, Grafana, and the
ELK stack.
5. Experience with Agile/Scrum methodologies and practices.