Senior DevOps/Site Reliability Engineer

DevOps, Terraform, Google Cloud Platform

Company: Bestarion
Salary: Up to $3,500
Location: Ho Chi Minh
Vacancies: 3 person(s)

Benefit

13th month salary
Other benefits
● Fitness & sports activities: football, tennis, table tennis, badminton…
● Commitment to community development: charity every quarter, blood donation, public seminars, career orientation talks…
● Support for personal loans such as home loans, vehicle loans, tuition fees…
Yearly salary review
● Performance appraisal twice a year
Travel/company trips
Performance bonus
Extra health insurance

Job Overview And Responsibility

Working Time:
+ Monday - Friday, 8:00 AM - 5:30 PM (flexible depending on each project)
+ 1-hour daily standup Tuesday-Friday, likely from 9 PM to 10 PM VNT.
+ Expectation to Travel to USA: The expectation is 1 - 4 trips/year, with each trip lasting 1-2 weeks.

About the project:
We're looking for a skilled and motivated DevOps/Site Reliability Engineer (SRE) to join our growing team. In this exciting role, you will be responsible for building and maintaining our cloud infrastructure, automating our CI/CD pipelines, and ensuring the reliability, performance, and scalability of our services. The ideal candidate will have a strong background in both software development and systems engineering, with a focus on GCP and automation tools, and a strong sense of ownership.

JOB DESCRIPTIONS:
- Design and manage infrastructure on Google Cloud Platform (GCP) using Terraform for Infrastructure as Code (IaC).
- Build, configure, and maintain CI/CD pipelines using Jenkins and Groovy scripts to automate software delivery from code commit to production deployment.
- Manage Jenkins plugins, master/agent nodes, and pipeline libraries to ensure the stability and scalability of our CI/CD platform.
- Troubleshoot and debug automation code and interconnected systems to quickly identify and resolve issues, ensuring minimal disruption to services.
- Manage core GCP services including Compute Engine, Managed Instance Groups (MIG), Disk Snapshots, Storage, and Artifact Registry to support our application ecosystem.
- Containerize applications using Docker to ensure consistency across development, testing, and production environments.
- Implement and manage infrastructure as code, monitoring, and logging solutions to ensure high availability and performance of our systems.
- Collaborate with development teams to improve the entire software development lifecycle, from code to production.
- Develop and maintain workflows in Airflow to orchestrate complex data and application tasks.
- Troubleshoot and resolve production incidents, participate in on-call rotation, perform root cause analysis and perform key maintenance activities quarterly.
- Effectively communicate complex technical concepts to both technical and non-technical stakeholders through clear written and verbal communication.
- Strong expertise in managing and repaving Windows and Linux machines, ensuring security compliance through automated processes.
- Skilled in implementing security compliance measures, including repaving infrastructure, key rotation, and periodic updates to meet industry standards.
- Strong knowledge of monitoring and alerting systems, including Prometheus, Cloud Monitoring, and PagerDuty, to ensure system reliability and proactive incident response.

Required Skills and Experience

- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience as a DevOps Engineer, SRE, or in a similar role.
- Excellent verbal and written English communication skills are essential. You must be able to clearly document processes, write concise reports, and articulate technical issues to various audiences.
- Strong proficiency with Terraform for managing cloud resources.
- Hands-on experience with Jenkins, including managing Jenkins masters and agents, and writing Groovy scripts for pipeline automation.
- Proven ability to troubleshoot and resolve issues in complex, interconnected systems quickly and efficiently.
- Expertise in GCP services, including Compute Engine, MIG, Disk Snapshots, Storage, and Artifact Registry.
- Solid experience with Docker and containerization principles.
- Familiarity with Airflow for workflow management and orchestration.
- Strong understanding of Linux/Unix systems, networking, and security principles.
- Excellent problem-solving skills and a collaborative, team-oriented mindset.
- Maintenance Work Hours: You will need to work USA hours for three days every three months to perform maintenance on key production systems.

Why candidates should apply for this position

- The company will fully cover all travel and relocation expenses related to the U.S. assignment.
- Attractive salary and benefits (13th month salary, distinguished employee of the quarter and year, seniority award…)
- Performance appraisal twice a year
- Healthcare and accident insurance
- Various training on best practices and soft skills
- Team-building activities every summer, company trip, big annual year-end party, etc.
- Fitness & sports activities: football, tennis, table tennis, badminton…
- Commitment to community development: charity every quarter, blood donation, public seminars, career orientation talks…
- Support for personal loans such as home loans, vehicle loans, tuition fees…
