Join us and enhance data solutions with the latest technologies and tools!
Krakow-based opportunity with the option to work 80% remotely.
As a Machine Learning Data Engineer, you will be working for our client, a leading global financial institution, known for building innovative digital solutions and transforming the banking industry. You will play a key role in supporting their data and digital transformation initiatives by developing and optimizing data engineering processes. Working with cutting-edge technologies, you’ll contribute to the development of robust and scalable data solutions for critical financial services, handling everything from data pipelines to cloud integrations. You’ll be part of a dynamic team working on both greenfield projects and established banking applications.
Your main responsibilities:
- Developing and optimizing data engineering processes
- Building robust, fault-tolerant data solutions for both cloud and on-premise environments
- Automating data pipelines to ensure seamless data flow from ingestion to serving
- Creating well-tested, clean code in line with modern software engineering principles
- Working with cloud technologies (AWS, Azure, GCP) to support large-scale data operations
- Supporting data transformation and migration efforts from on-premise to cloud ecosystems
- Designing and implementing scalable data models and schemas
- Maintaining and enhancing big data platforms such as Hadoop, HDFS, Spark, and Cloudera
- Collaborating with cross-functional teams to solve complex technical problems
- Contributing to the development of CI/CD pipelines and version control practices
You’re ideal for this role if you have:
- Strong experience in the Data Engineering Lifecycle, especially in building data pipelines
- Proficiency in Python, PySpark, and the Python ecosystem
- Experience with cloud platforms such as AWS, Azure, or GCP (preferably GCP)
- Expertise in Hadoop on-premise distributions, particularly Cloudera
- Experience with big data tools such as Spark, HDFS, Hive, and Databricks
- Knowledge of data lake formation, data warehousing, and schema design
- Strong understanding of SQL and NoSQL databases
- Ability to work with data formats like Parquet, ORC, and Avro
- Familiarity with CI/CD pipelines and version control tools like Git
- Strong communication skills to collaborate with diverse teams
It is a strong plus if you have:
- Experience with ML models and MLOps
- Exposure to building real-time event streaming pipelines with tools like Kafka or Apache Flink
- Familiarity with containerization and DevOps practices
- Experience in data modeling and handling semi-structured data
- Knowledge of modern ETL and ELT processes
- Understanding of the trade-offs between different data storage technologies
#GETREADY to meet with us!
We would like to meet you. If you are interested, please apply and attach your CV in English or Polish, including a statement that you agree to our processing and storing of your personal data. You can also apply by sending us an email at recruitment@itds.pl.
Internal number #6506
Address:
SKYLIGHT BUILDING | ZŁOTA 59 | 00-120 WARSZAWA
BUSINESS LINK GREEN2DAY BUILDING | SZCZYTNICKA 11 | 50-382 WROCŁAW
Contact:
INFO@ITDS.PL
+48 883 373 832