Data Engineer
As a Lead/Senior Data Engineer at GFT, you will manage, design, and enhance the data systems and workflows that drive key business decisions. The role is roughly 75% data engineering, building and optimizing data pipelines and architectures, and 25% supporting data science initiatives by collaborating with data science teams on machine learning workflows and advanced analytics. You will leverage technologies such as Python, Airflow, Kubernetes, and AWS to deliver high-quality data solutions.

Key Activities
- Architect, develop, and maintain scalable data infrastructure, including data lakes, pipelines, and metadata repositories, ensuring timely and accurate delivery of data to stakeholders
- Work closely with data scientists to build and maintain data models, integrate data sources, and support machine learning workflows and experimentation environments
- Develop and optimize large-scale batch and real-time data processing systems to enhance operational efficiency and meet business objectives
- Leverage Python, Apache Airflow, and AWS services to automate data workflows and processes, ensuring efficient scheduling and monitoring (see the sketch following this list)
- Utilize AWS services such as S3, Glue, EC2, and Lambda to manage data storage and compute resources, ensuring high performance, scalability, and cost-efficiency
- Implement robust testing and validation procedures to ensure the reliability, accuracy, and security of data processing workflows
- Stay informed of industry best practices and emerging technologies in both data engineering and data science to propose optimizations and innovative solutions
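For illustration only, below is a minimal sketch of the kind of automated workflow this role owns, written against the Airflow 2.x TaskFlow API. The bucket names, object keys, column names, and aggregation are hypothetical, and the snippet assumes boto3, pandas, and pyarrow are installed and AWS credentials are configured.

```python
# A minimal sketch of a scheduled extract-transform-load workflow, assuming
# Apache Airflow 2.x (2.4+), boto3, pandas, and pyarrow. Bucket names, keys,
# and columns are illustrative placeholders, not real resources.
from datetime import datetime

import boto3
import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["example"])
def daily_sales_pipeline():
    @task
    def extract() -> str:
        # Download a raw CSV drop from a hypothetical S3 landing bucket.
        s3 = boto3.client("s3")
        local_path = "/tmp/raw_sales.csv"
        s3.download_file("example-raw-bucket", "sales/latest.csv", local_path)
        return local_path

    @task
    def transform(raw_path: str) -> str:
        # Clean and aggregate with pandas, then write a Parquet file.
        df = pd.read_csv(raw_path, parse_dates=["order_date"])
        daily = df.groupby(df["order_date"].dt.date)["amount"].sum().reset_index()
        out_path = "/tmp/daily_sales.parquet"
        daily.to_parquet(out_path, index=False)
        return out_path

    @task
    def load(curated_path: str) -> None:
        # Publish the curated output for downstream consumers (e.g. Athena).
        s3 = boto3.client("s3")
        s3.upload_file(curated_path, "example-curated-bucket", "sales/daily_sales.parquet")

    load(transform(extract()))


daily_sales_pipeline()
```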
Required Skills
- Core Expertise: Proficiency in Python for data processing and scripting (pandas, PySpark), workflow automation (Apache Airflow), and experience with AWS services (Glue, S3, EC2, Lambda); a brief PySpark sketch follows this list
- Containerization & Orchestration: Experience with Kubernetes and Docker for managing containerized environments in the cloud
- Data Engineering Tools: Hands-on experience with columnar and big data databases (Athena, Redshift, Vertica, Hive/Hadoop), along with version control systems such as Git
- Cloud Services: Strong familiarity with AWS services for cloud-based data processing and management
- CI/CD Pipeline: Experience with CI/CD tools such as Jenkins, CircleCI, or AWS CodePipeline for continuous integration and deployment
- Data Engineering Focus (75%): Expertise in building and managing robust data architectures and pipelines for large-scale data operations
- Data Science Support (25%): Ability to support data science workflows, including collaboration on data preparation, feature engineering, and enabling experimentation environments
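As a rough illustration of the PySpark and columnar-storage work mentioned above, here is a minimal sketch that reads raw JSON events and writes date-partitioned Parquet suitable for registration in Glue and querying with Athena. The S3 paths and column names are hypothetical, and the snippet assumes a Spark environment with S3 access (for example an EMR or Glue job role).

```python
# A minimal sketch, assuming PySpark is available and the cluster can read/write S3.
# Paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events-to-parquet").getOrCreate()

# Read raw JSON events from a hypothetical landing zone.
events = spark.read.json("s3://example-landing/events/")

# Light cleanup: keep well-formed rows and derive a partition column.
cleaned = (
    events
    .filter(F.col("event_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
)

# Write columnar output partitioned by date, ready to be registered in the
# Glue catalog and queried with Athena or Redshift Spectrum.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated/events/")
)
```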
Benefits
- Competitive salary
- Guaranteed 13th-month salary
- Performance bonus
- Professional English courses for employees
- Premium health insurance
- Extensive annual leave
Nice to Have
- Advanced Data Science Tools: Experience with AWS SageMaker or Databricks for enabling machine learning environments
- Big Data & Analytics: Familiarity with both relational (MySQL, PostgreSQL) and NoSQL (DynamoDB, Redis) databases
- BI Tools: Experience with enterprise BI tools such as Tableau, Looker, or Power BI
- Messaging & Event Streaming: Familiarity with distributed messaging systems such as Kafka or RabbitMQ for event streaming
- Monitoring & Logging: Experience with monitoring and log management tools such as the ELK stack or Datadog
- Data Privacy & Security: Knowledge of best practices for ensuring data privacy and security, particularly in large data infrastructures