Site Reliability Engineer

  • Hybrid
  • English
  • Banking
  • Expert/Senior
  • Agile/Scrum

Join us, and ensure seamless system performance every day!

Krakow-based opportunity with the possibility to work 80% remotely!

As a Site Reliability Engineer, you will be working for our client, a leading financial institution heavily investing in Agile culture, DevOps processes, and Cloud Technologies. The new development team in Krakow, part of a long-term strategy to support a European platform, offers an exciting opportunity to contribute to the foundational stages of a critical project. This role involves ensuring system reliability, availability, and performance while supporting a dynamic, high-impact environment.

Your main responsibilities:

  • Managing application support operations, focusing on resiliency, availability, and monitoring system health and performance
  • Coordinating resolution of production incidents, conducting post-mortem/RCA to identify root causes and improve processes
  • Investigating, triaging, and resolving production incidents with a focus on technical signals and root cause analysis
  • Documenting post-incident recovery steps, contributing to process improvements, identifying deviations, and creating a Knowledge Base
  • Actively participating in the service management community, engaging in Incident Management, Problem Management, and Service Delivery
  • Defining and delivering tactical and strategic service improvements across the technical and process landscape
  • Applying SRE principles to continuously improve platform reliability, capacity, and performance, reducing toil and enhancing observability
  • Developing observability tools and techniques for monitoring, alerting, incident detection, response, capacity management, and release safety

You’re ideal for this role if you have:

  • 4+ years of experience in developing and supporting distributed systems written in Java
  • Experience with Disaster Recovery methods and processes
  • A methodical approach to troubleshooting and strong problem-solving skills
  • Experience with application lifecycle management tooling: JIRA/Confluence, Ansible, Vulnerability Remediation, CI/CD automation
  • Experience implementing and managing Logging, Monitoring, and Alerting frameworks for hybrid cloud using tools such as Geneos, Grafana, InfluxDB, Splunk, Loki, or similar
  • Understanding of relational databases (RDBMS), cloud technology, Unix/Linux, and job scheduling tools, e.g., Control-M or AutoSys
  • Ability to lead technical conversations with various technical support groups
  • Excellent communication skills and experience working with Agile methodologies

It is a strong plus if you have:

  • Experience with Apache Beam, Apache Flink, GCP, Redis, REST APIs
  • Familiarity with Spring Boot and Spring Cloud
  • Knowledge of Ansible and Jenkins for automation and deployment

#GETREADY to meet with us!

We would like to meet you. If you are interested, please apply and attach your CV in English or Polish, including a statement that you agree to our processing and storing of your personal data. You can also apply by sending us an email at recruitment@itds.pl.

Internal number #5627

Benefits

Access to 100+ projects
Access to Healthcare
Access to Multisport
Training platforms
Access to Pluralsight
B2B or Permanent Contract
Flexible hours and remote work

We need your consent for recruitment processes for selected jobs. Please include a consent for data processing in your CV or send a statement of consent to privacy@itds.pl. You may also grant consent to future recruitment processes for similar jobs.