Experiencing disruptions

HPC and GRID - Change switch and ARCCE upgrade

Batch System, Computing Elements

Maintenance work will be carried out on the morning of Thursday 18 June:

  • A faulty switch will be replaced
  • ARCCE will be updated

Best regards


HPC - Broken Omnipath switch

Batch System

We have two Omnipath switches on the HPC nodes, and one of them has failed. These switches carry the message exchange for parallel jobs (the Slurm compute queue). We are waiting for a replacement to be sent to us.


Altamira supercomputer: Altamira supercomputer related systems.
Batch System: Slurm batch system for Altamira. (Disrupted)
Login nodes: Altamira login nodes (login1, login2). (Operational)
Cloud Infrastructure: OpenStack Cloud infrastructure.
Grid and HTC: General purpose batch system and high throughput compute system.
Web and miscellaneous services: Web services, wiki pages and other services.
AAI: Authentication, Authorization and Identity systems.
Networking: Internal and external networking.
Storage systems: Distributed storage systems.

Incident history


HPC and GRID - Change switch and ARCCE upgrade

▲ This issue is not resolved yet
April 28, 2026 at 1:19 PM UTC

[HPC users] Change in the Slurm authentication mechanism

Resolved after 50h 41m of downtime
March 25, 2026 at 9:01 AM UTC

Cloud WN network problem

Resolved after 453h 5m of downtime
March 24, 2026 at 8:30 AM UTC

Cloud network problem

Resolved in under a minute
March 12, 2026 at 8:20 AM UTC

HPC - Broken Omnipath switch

◆ This issue is not resolved yet
March 2, 2026 at 7:00 AM UTC

Power outage in the datacenter (02/03/2025 8:00)

Resolved after 4h 0m of downtime
January 28, 2026 at 7:59 AM UTC

Databases error

Resolved in under a minute
December 2, 2025 at 8:00 AM UTC

auth01 update

Resolved after 3h 0m of downtime
October 27, 2025 at 9:00 AM UTC

Power outage in the datacenter (27-10-2025 10:00)

Resolved in under a minute
August 28, 2025 at 7:45 PM UTC

[IFCA Advanced Computing Service] Main switch replacement

Resolved after 12h 19m of downtime
