Join us and enhance data solutions with the latest technologies and tools!
Krakow-based opportunity with the option to work 80% remotely.
As a Machine Learning Data Engineer, you will be working for our client, a leading global financial institution, known for building innovative digital solutions and transforming the banking industry. You will play a key role in supporting their data and digital transformation initiatives by developing and optimizing data engineering processes. Working with cutting-edge technologies, you’ll contribute to the development of robust and scalable data solutions for critical financial services, handling everything from data pipelines to cloud integrations. You’ll be part of a dynamic team working on both greenfield projects and established banking applications.
Your main responsibilities:
- Developing and optimizing data engineering processes
- Building robust, fault-tolerant data solutions for both cloud and on-premise environments
- Automating data pipelines to ensure seamless data flow from ingestion to serving
- Creating well-tested, clean code in line with modern software engineering principles
- Working with cloud technologies (AWS, Azure, GCP) to support large-scale data operations
- Supporting data transformation and migration efforts from on-premise to cloud ecosystems
- Designing and implementing scalable data models and schemas
- Maintaining and enhancing big data platforms such as Hadoop, HDFS, Spark, and Cloudera
- Collaborating with cross-functional teams to solve complex technical problems
- Contributing to the development of CI/CD pipelines and version control practices
You’re ideal for this role if you have:
- Strong experience in the Data Engineering Lifecycle, especially in building data pipelines
- Proficiency in Python, PySpark, and the Python ecosystem
- Experience with cloud platforms such as AWS, Azure, or GCP (preferably GCP)
- Expertise in Hadoop on-premise distributions, particularly Cloudera
- Experience with big data tools such as Spark, HDFS, Hive, and Databricks
- Knowledge of data lake formation, data warehousing, and schema design
- Strong understanding of SQL and NoSQL databases
- Ability to work with data formats like Parquet, ORC, and Avro
- Familiarity with CI/CD pipelines and version control tools like Git
- Strong communication skills to collaborate with diverse teams
It is a strong plus if you have:
- Experience with ML models and MLOps
- Exposure to building real-time event streaming pipelines with tools like Kafka or Apache Flink
- Familiarity with containerization and DevOps practices
- Experience in data modeling and handling semi-structured data
- Knowledge of modern ETL and ELT processes
- Understanding of the trade-offs between different data storage technologies
#GETREADY to meet with us!
We would like to meet you. If you are interested, please apply and attach your CV in English or Polish, including a statement that you agree to our processing and storing of your personal data. You can also apply by sending us an email at recruitment@itds.pl.
Internal number #6506
Address:
SKYLIGHT BUILDING | ZŁOTA 59 | 00-120 WARSZAWA
BUSINESS LINK GREEN2DAY BUILDING | SZCZYTNICKA 11 | 50-382 WROCŁAW
Contact:
INFO@ITDS.PL
+48 883 373 832