Please read announcement

HPC - Broken Omnipath switch

Batch System

We have two Omnipath switches for the HPC nodes, and one of them has failed. These switches carry the message exchange for parallel jobs, that is, jobs in the Slurm compute queue. We are waiting for a replacement to be shipped to us.


Altamira supercomputer: Altamira supercomputer related systems.
Batch System: Slurm batch system for Altamira. Status: Maintenance
Login nodes: Altamira login nodes (login1, login2). Status: Operational
Cloud Infrastructure: OpenStack Cloud infrastructure.
Grid and HTC: General purpose batch system and high throughput compute system.
Web and miscellaneous services: Web services, wiki pages and other services.
AAI: Authentication, Authorization and Identity systems.
Networking: Internal and external networking.
Storage systems: Distributed storage systems.

Incident history


August 1, 2022 at 7:51 AM UTC

OpenStack update

Resolved after 734h 8m of downtime
July 26, 2022 at 7:50 AM UTC

Miscellaneous services not connecting

Resolved after 145h 6m of downtime
June 3, 2022 at 7:40 AM UTC

Confluence instances unavailable

Resolved after 3h 0m of downtime
May 18, 2022 at 7:00 AM UTC

Nextcloud Upgrade

Resolved after 2h 49m of downtime
May 6, 2022 at 12:01 PM UTC

Upgrade Slurm to the latest version to fix security issues

Resolved after 66h 59m of downtime
March 17, 2022 at 1:24 PM UTC

[OpenStack] Problem launching new instances

Resolved after 36m of downtime
January 31, 2022 at 8:00 AM UTC

[Cloud] Failure launching new instances

Resolved after 338h 59m of downtime
January 24, 2022 at 12:30 PM UTC

[OpenStack] Upgrade version of services

Resolved after 120h 0m of downtime
January 12, 2022 at 8:30 AM UTC

IFCA Network IP Movement

Resolved after 4h 30m of downtime
December 14, 2021 at 8:00 AM UTC

Data transfer system upgrade

Resolved after 5h 0m of downtime
