THIS JOB OFFER IS NO LONGER AVAILABLE

Site Reliability Engineer

  • Hybrid
  • English
  • Banking
  • Expert/Senior
  • Agile/Scrum

Join us, and ensure seamless system performance every day!

Krakow-based opportunity with the possibility to work 80% remotely!

As a Site Reliability Engineer, you will be working for our client, a leading financial institution heavily investing in Agile culture, DevOps processes, and Cloud Technologies. The new development team in Krakow, part of a long-term strategy to support a European platform, offers an exciting opportunity to contribute to the foundational stages of a critical project. This role involves ensuring system reliability, availability, and performance while supporting a dynamic, high-impact environment.

Your main responsibilities:

  • Managing application support operations, focusing on resiliency, availability, and monitoring system health and performance
  • Coordinating resolution of production incidents, conducting post-mortem/RCA to identify root causes and improve processes
  • Investigating, triaging, and resolving production incidents with a focus on technical signals and root cause analysis
  • Documenting post-incident recovery steps, contributing to process improvements, identifying deviations, and creating a Knowledge Base
  • Actively participating in the service management community, engaging in Incident Management, Problem Management, and Service Delivery
  • Defining and delivering tactical and strategic service improvements across the technical and process landscape
  • Applying SRE principles to continuously improve platform reliability, capacity, and performance, reducing toil and enhancing observability
  • Developing observability tools and techniques for monitoring, alerting, incident detection, response, capacity management, and release safety

You’re ideal for this role if you have:

  • 4+ years of experience developing and supporting distributed systems written in Java
  • Experience with Disaster Recovery methods and processes
  • A methodical approach to troubleshooting and strong problem-solving skills
  • Experience with application lifecycle management tooling: Jira/Confluence, Ansible, Vulnerability Remediation, CI/CD automation
  • Experience implementing and managing Logging, Monitoring, and Alerting frameworks for hybrid cloud using tools such as Geneos, Grafana, InfluxDB, Splunk, Loki, or similar
  • Understanding of relational databases (RDBMS), cloud technology, Unix/Linux, and job scheduling tools, e.g., Control-M or AutoSys
  • Ability to lead technical conversations with various technical support groups
  • Excellent communication skills and experience working in an Agile methodology

It is a strong plus if you have:

  • Experience with Apache Beam, Apache Flink, GCP, Redis, REST APIs
  • Familiarity with Spring Boot and Spring Cloud
  • Knowledge of Ansible and Jenkins for automation and deployment

#GETREADY to meet with us!

We would like to meet you. If you are interested, please apply and attach your CV in English or Polish, including a statement that you agree to our processing and storing of your personal data. You can also apply by sending us an email at recruitment@itds.pl.

Internal number #5627

Benefits

Access to 100+ projects
Access to Healthcare
Access to Multisport
Training platforms
Access to Pluralsight
Make your CV shine
B2B or Permanent Contract
Flexible hours and remote work