Platform Site Reliability Engineer


JOB DESCRIPTION

Description

Do you
want to build a career that is truly worthwhile? Working at the World Bank
Group provides a unique opportunity for you to help our clients solve their
greatest development challenges. The World Bank Group is one of the largest
sources of funding and knowledge for developing countries; a unique global
partnership of five institutions dedicated to ending extreme poverty,
increasing shared prosperity and promoting sustainable development. With 189
member countries and more than 120 offices worldwide, we work with public and
private sector partners, investing in groundbreaking projects and using data,
research, and technology to develop solutions to the most urgent global
challenges. For more information, visit www.worldbank.org.

ITS Vice Presidency Context:

Information and Technology Solutions (ITS) enables the WBG to achieve its mission of ending
extreme poverty and promoting shared prosperity in a sustainable way by
delivering transformative information and technologies to its staff working in
over 150 locations.

Our vision is to transform how the Bank Group accomplishes its mission through
information and technology. In this fast-paced, ever-changing world, the
formulation and implementation of the ITS strategy is an ongoing, iterative
process of learning and adaptation developed through extensive consultations
with business partners throughout the World Bank Group.

ITS shapes its strategy in response to changing business priorities and leverages
new technologies to achieve three high-level business outcomes: business
enablement, by providing Bank Group units with innovative digital tools and
technologies to transform how they deliver value for their clients; empowerment
and effectiveness, by ensuring that all Bank Group staff are connected, able
to find information, and productive, to accelerate the delivery of development
solutions globally; and resilience, by equipping the Bank Group to provide
risk-based cybersecurity and robust data protection for a global network and a
growing cloud platform.

Implementation of the strategy is guided by three core principles. The first is to deliver
solutions for business partners that are customer-centric, innovative, and
transformative. The second is to provide the Bank Group with value for money
through selective and standard technologies. The third is to excel at
the basics by providing a high-performing, robust, and resilient IT environment
for the organization.

The Technology Platforms Team (ITSPL) is anchored in the Chief Technology Officer
(ITSTO) division in ITS. The ITS Technology Office (ITSTO) drives
technology-enabled innovation and delivers the digital backbone for the WBG’s
mission. It develops a future-ready technology strategy, modernizes
infrastructure, manages innovation, and fosters agility. The unit collaborates
across the organization to leverage technology as a force multiplier,
accelerating development impact and digital transformation globally.

ITSPL delivers secure, cloud-first IT platforms with automation, self-service,
identity and access management (IAM), and platform engineering (IaC). It manages
databases, integrations, and cloud operations so that they are reliable, scalable,
cost-effective, and aligned with enterprise standards. The primary programs that
the ITSTO unit is responsible for include providing a wide range of technical
infrastructure services to meet the institution’s computing needs, from mid-range
and large-scale servers to the systems, networks, and supporting software on those
platforms. It provides engineering, integration, and system administration services
for Server Administration, Server Security, Backup/Restore, Storage, Virtual
Infrastructure (on-premises and in the cloud), and Data Center Management.

This is a hands-on position in a highly multicultural environment that supports
diversity, continuous learning, skill development, and collaboration. The
candidate must demonstrate excellent communication skills, as the position
requires interaction with other teams. The candidate must also possess a strong
sense of curiosity, adaptability, and the drive to learn and innovate.

We provide a meaningful, open, and collaborative environment. We have many
interesting problems to solve, providing you an opportunity to develop your
skills while contributing to the mission of the Bank. We value teamwork,
openness, curiosity, and persistence.

About the Position:

The rapid shift to hybrid cloud environments calls for a Site Reliability
Engineer (SRE) with expertise in both legacy middleware systems (e.g., JBoss,
Apache, WebSphere, IIS) and modern DevOps/DevSecOps pipelines (Terraform,
Kubernetes, GitOps). This position focuses on ensuring operational continuity for
enterprise platforms by stabilizing traditional systems while driving
modernization through Infrastructure-as-Code (IaC) practices and automation. As
a bridge between legacy and cloud-native technologies, the SRE will implement
robust SRE practices such as observability, error budgets, and incident
response to maintain a highly reliable environment with minimal downtime.
Additionally, the SRE will lead knowledge transfer efforts to prevent single
points of failure, and will optimize platform performance through auto-remediation
playbooks and chaos engineering. Through these practices, the role will build
resilience, automate standards, and contribute to the success of a
future-proof Platform as a Product strategy.

Competencies Required:

Technical Proficiency & Cognitive Skills:

– Experience as a Site Reliability Engineer with hands-on knowledge of Site Reliability Engineering (SRE) practices and principles, including implementing and managing SLOs, error budgets, observability, incident response, and automation in high-availability environments.
– Proven experience with legacy middleware (JBoss, Apache, WebSphere, IIS) and modern stacks (.NET, Java, NodeJS, Angular).
– Strong knowledge of, and working experience with, multi-cloud platforms (AWS, Azure, GCP).
– Strong database skills (PostgreSQL, MySQL, other RDBMS/NoSQL).
– Proficiency in DevOps/DevSecOps tools (Terraform, Kubernetes, GitOps, Chef, CI/CD, GitHub/GitLab/Azure Repos).
– Experience with containerization (Docker, Kubernetes/AKS) and web service management.
– Strong scripting/automation skills (Python, PowerShell, Bash, etc.).
– Experience with monitoring/observability tools (Splunk, Prometheus, Grafana, etc.).
– Experience in setting up and managing PaaS and COTS solutions.
– Experience working in Agile environments, with a strong understanding of Agile principles and practices. Exposure to the Scaled Agile Framework (SAFe) is highly desirable.

Client Understanding and Advising: Advocates for client needs and perspectives.
Learning Orientation: Keeps up with new SRE, cloud, middleware, and automation trends.
Analytical Thinking: Strong diagnostic and troubleshooting skills.
Foundation Architecture Knowledge: Supports standards for hybrid cloud and middleware.
Strategic Technology Planning: Contributes to technological roadmaps, especially for SRE and cloud (Platform as a Product).
Technology Knowledge: Deep understanding of hybrid cloud, containerization, and middleware.
Modernize and Innovate: Develops innovative solutions in automation, observability, and cloud migration.
Deliver Results for Clients: Ensures high reliability and performance.
Collaboration: Works effectively across teams and locations.
Knowledge Sharing: Actively participates in knowledge transfer and documentation.
Decision Making: Makes informed decisions, especially in incident response.
Communication: Excellent written and verbal English; able to explain complex technical concepts.

Roles & Responsibilities:

Infrastructure & Operations Support: Manage and support both legacy and cloud-native middleware across on-premises and hybrid cloud environments.
SRE Implementation: Apply SRE principles (observability, error budgets, incident response, chaos engineering) to ensure reliability and performance. Promote SRE practices and culture across the team, and apply SRE principles to all deliverables.
Automation & DevOps/DevSecOps: Build and maintain CI/CD pipelines, automate manual tasks & toil, develop self-service tools, and implement Infrastructure-as-Code (IaC) using tools like Terraform and GitOps.
Cloud Adoption & Modernization: Guide migration to cloud/container platforms and adoption of cloud-native services.
Knowledge Management & Collaboration: Document and communicate changes, lead knowledge transfer, and promote SRE culture across teams and stakeholders.
Compliance & Business Continuity: Ensure adherence to SLAs and compliance requirements, and contribute to business continuity and disaster recovery planning.
Performance Optimization: Develop auto-remediation playbooks, conduct chaos engineering, and optimize platform performance.
Stakeholder Engagement: Support organizational IT strategy and deliverables, especially during team transitions or critical staffing changes.

Selection Criteria

* Bachelor’s or Master’s degree with at least 4-5 years of relevant experience.
* Experience applying Site Reliability Engineering practices in day-to-day work. An SRE certification is mandatory.
* Experience working in Agile environments. A SAFe Agile certification is mandatory.
* Strong experience configuring and supporting .NET, Java, NodeJS, and Angular applications.
* Good understanding of multiple middleware technologies and of hosting custom and COTS products.
* Experience with Azure DevOps (as both developer and administrator).
* Solid knowledge of modern DevOps practices, including CI/CD, git, Docker, and Kubernetes.
* Familiarity with Artifactory solutions (e.g., JFrog).
* Experience with Infrastructure as Code tools (Terraform, Chef, etc.).
* Knowledge of Azure AD authentication and authorization.
* Proficient with monitoring tools and Splunk.
* Demonstrated experience working in Agile environments.
* Hands-on experience with AWS and Azure cloud services. A cloud certification in Azure or AWS is an added advantage.

WBG Culture Attributes:

1. Sense of Urgency – Anticipating and quickly reacting to the needs of internal and external stakeholders.
2. Thoughtful Risk Taking – Taking informed and thoughtful risks and making courageous decisions to push boundaries for greater impact.
3. Empowerment and Accountability – Engaging with others in an empowered and accountable manner for impactful results.

World Bank Group Core Competencies

The World Bank Group offers comprehensive benefits, including a retirement plan; medical, life, and disability insurance; and paid leave, including parental leave, as well as reasonable accommodations for individuals with disabilities.

We are proud to be an equal opportunity and inclusive employer with a dedicated and committed workforce, and do not discriminate based on gender, gender identity, religion, race, ethnicity, sexual orientation, or disability.

Learn more about working at the World Bank and IFC, including our values and inspiring stories.

Level of Education: Bachelor Degree

Work Hours: 8

Experience in Months: No requirements

Organization: World Bank Group

