Kafka and Data Lake Engineer

🌍 Remote, USA 🎯 Full-time 🕐 Posted Recently

Job Description

Responsibilities

- Design data pipelines: Build robust, scalable, and secure data pipelines to ingest, process, and move data from various sources into the data lake using Kafka.
- Administer Kafka clusters: Deploy, configure, and maintain Kafka clusters and related ecosystem tools, such as Kafka Connect and Schema Registry, ensuring high availability and performance.
- Manage the data lake: Oversee the architecture and governance of the data lake, including managing data storage (e.g., in AWS S3 or ADLS), security, and metadata.
- Develop data processing applications: Create producers and consumers to interact with Kafka topics using programming languages like Python, Java, or Scala (see the sketch after this list).
- Perform stream processing: Use tools like Kafka Streams, Apache Flink, or ksqlDB to perform real-time data transformations and analytics.
- Ensure data quality and security: Implement data quality checks, manage data lineage, and enforce security controls such as encryption, access control lists (ACLs), and compliance (e.g., with GDPR).
- Monitor and troubleshoot: Set up monitoring and alerting for Kafka and data lake infrastructure, and respond to incidents to ensure operational reliability.
- Collaborate with teams: Work closely with data scientists, analysts, and other engineering teams to understand data requirements and deliver reliable data solutions.
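To give a feel for the producer/consumer work described above, here is a minimal sketch in Python. It assumes the confluent-kafka client and a broker at localhost:9092; the "events" topic, consumer group name, and payload are illustrative and not part of this posting.

```python
# Minimal Kafka producer/consumer sketch (assumes: pip install confluent-kafka,
# a broker at localhost:9092, and an "events" topic -- all illustrative).
from confluent_kafka import Consumer, Producer

# Produce one JSON-encoded event and block until it is delivered.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("events", key="user-1", value='{"action": "login"}')
producer.flush()

# Consume from the same topic, starting at the earliest offset.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "data-lake-ingest",   # hypothetical consumer group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["events"])
msg = consumer.poll(timeout=10.0)     # returns None if nothing arrives in time
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```

A production version of this would add delivery callbacks, Schema Registry serialization, and error handling, in line with the responsibilities listed above.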

Essential Skills and Qualifications

- Experience: Proven experience designing and managing data platforms with Apache Kafka and big data technologies.
- Programming: Strong proficiency in languages like Python, Java, or Scala.
- Big data technologies: Expertise in big data processing frameworks, such as Apache Spark and Apache Flink (a streaming sketch follows this list).
- Cloud platforms: Hands-on experience with cloud environments (AWS, Azure, or GCP) and relevant services like S3, Glue, or Azure Data Lake Storage.
- Data lake architecture: A solid understanding of data lake design principles, including storage formats (e.g., Delta Lake, Apache Iceberg), data modeling, and governance.
- Databases: Experience with a range of database systems, both SQL and NoSQL.
- Infrastructure management: Familiarity with infrastructure-as-code tools like Terraform or Ansible, and with containerization using Docker and Kubernetes.

Professionals in this field can advance from entry-level data engineering positions to senior roles, and from there to Big Data Architect or Solutions Architect positions overseeing large-scale data infrastructure.
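As a hypothetical illustration of combining Spark, Kafka, and the data lake, the sketch below uses PySpark Structured Streaming to read JSON events from a Kafka topic and append them as Parquet files to S3. The topic name, schema, bucket paths, and sink format are assumptions for illustration, and the job presumes the spark-sql-kafka connector package is on the classpath.

```python
# Hedged sketch: stream events from Kafka into an S3-backed data lake.
# Topic, schema, and s3a:// paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-to-lake").getOrCreate()

# Expected shape of the JSON payload in each Kafka message (assumed).
schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
])

# Read from Kafka, decode the binary value, and parse the JSON columns.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Append the parsed events to the lake; the checkpoint tracks progress.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/lake/events/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .start()
)
query.awaitTermination()
```

In practice the Parquet sink would often be swapped for a table format such as Delta Lake or Apache Iceberg, as noted under the data lake architecture requirement above.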

Ready to Apply?

Don't miss out on this amazing opportunity!

πŸš€ Apply Now
