Join us, and ensure seamless system performance every day!
Krakow-based opportunity with the possibility to work 80% remotely!
As a Site Reliability Engineer, you will be working for our client, a leading financial institution heavily investing in Agile culture, DevOps processes, and Cloud Technologies. The new development team in Krakow, part of a long-term strategy to support a European platform, offers an exciting opportunity to contribute to the foundational stages of a critical project. This role involves ensuring system reliability, availability, and performance while supporting a dynamic, high-impact environment.
Your main responsibilities:
- Managing application support operations, focusing on resiliency, availability, and monitoring system health and performance
- Coordinating resolution of production incidents, conducting post-mortem/RCA to identify root causes and improve processes
- Investigating, triaging, and resolving production incidents with a focus on technical signals and root cause analysis
- Documenting post-incident recovery steps, contributing to process improvements, identifying deviations, and creating a Knowledge Base
- Actively participating in the service management community, engaging in Incident Management, Problem Management, and Service Delivery
- Defining and delivering tactical and strategic service improvements across the technical and process landscape
- Applying SRE principles to continuously improve platform reliability, capacity, and performance, reducing toil and enhancing observability
- Developing observability tools and techniques for monitoring, alerting, incident detection, response, capacity management, and release safety
You’re ideal for this role if you have:
- 4+ years of experience developing and supporting distributed systems written in Java
- Experience with Disaster Recovery methods and processes
- A methodical approach to troubleshooting and strong problem-solving skills
- Experience in application lifecycle management tooling: JIRA/Confluence, Ansible, Vulnerability Remediation, CI/CD automation
- Experience implementing and managing logging, monitoring, and alerting frameworks for hybrid cloud environments, using tools such as Geneos, Grafana, InfluxDB, Splunk, Loki, or similar
- Understanding of RDBMS databases, cloud technologies, Unix/Linux, and job scheduling tools (e.g., Control-M or AutoSys)
- Ability to lead technical conversations with various technical support groups
- Excellent communication skills and experience working with Agile methodologies
It is a strong plus if you have:
- Experience with Apache Beam, Apache Flink, GCP, Redis, REST APIs
- Familiarity with Spring Boot and Spring Cloud
- Knowledge of Ansible and Jenkins for automation and deployment
#GETREADY to meet with us!
We would like to meet you. If you are interested, please apply and attach your CV in English or Polish, including a statement that you agree to our processing and storing of your personal data. You can also apply by sending an email to recruitment@itds.pl.
Internal number #5627
Address:
SKYLIGHT BUILDING | ZŁOTA 59 | 00-120 WARSZAWA
BUSINESS LINK GREEN2DAY BUILDING | SZCZYTNICKA 11 | 50-382 WROCŁAW
Contact:
INFO@ITDS.PL
+48 883 373 832