Resolved -
The HPC has been recommissioned and is available for use.
The quicktest partition now has: amd01n01, amd01n02, amd01n03, amd01n04
Intel machines (itl02n[01-04]) have been retired.
The rest of the partitions have been restored to their original state before we moved to reduced capacity, i.e.,
gpu partition has gpu[01-03],
longrun has bigtmp[01-02],
bigmem has high[01-04], and
parallel has the AMD and spj nodes.
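The bracket notation above (for example gpu[01-03]) is shorthand for a run of consecutively numbered nodes. As a minimal sketch, assuming each range is zero-padded and inclusive at both ends, it can be expanded like this. The function name expand_nodes is hypothetical and is not part of any cluster tooling.

```python
import re


def expand_nodes(expr: str) -> list[str]:
    """Expand a compact range such as 'gpu[01-03]' into individual node names."""
    match = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", expr)
    if match is None:
        return [expr]  # plain node name, nothing to expand
    prefix, start, end = match.groups()
    width = len(start)  # preserve zero padding, e.g. 01, 02, 03
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


# Ranges taken from the partition list in this update.
for expr in ("gpu[01-03]", "bigtmp[01-02]", "high[01-04]"):
    print(expr, "->", ", ".join(expand_nodes(expr)))
```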
Mar 26, 00:21 UTC
Update -
The DS team has completed the HPC system relocation to the off-campus site.
Testing will be conducted tomorrow morning, and an update will be provided by midday.
Mar 25, 04:03 UTC
Update -
We are continuing to monitor for any further issues.
Mar 25, 01:41 UTC
Update -
The HPC relocation is tomorrow (20 Mar). The DS team will shut down power for the move. Testing will take place on Tuesday (25 Mar). Please contact the support team with any questions.
Mar 18, 22:23 UTC
Update -
The Rāpoi HPC system will be unavailable from Thursday, March 20th, until the week of March 24th due to relocation.
We expect to begin testing on Tuesday, March 25th, and based on the results, we will determine when the system can be reopened for general use.
Please set time limits on any new job submissions so that the jobs finish before March 20.
We will provide updates as necessary.
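As a rough sketch of how to choose a time limit that ends before the shutdown, the snippet below computes the longest limit that still fits, in days-hours:minutes:seconds form. It assumes the scheduler is Slurm, which accepts that format for its --time option. The function name, the one-hour safety margin, the year, and the exact cutoff time are illustrative assumptions and are not stated in this notice.

```python
from datetime import datetime, timezone


def slurm_time_limit(now: datetime, cutoff: datetime, margin_seconds: int = 3600) -> str:
    """Return the longest D-HH:MM:SS limit that ends `margin_seconds` before `cutoff`."""
    remaining = int((cutoff - now).total_seconds()) - margin_seconds
    if remaining <= 0:
        raise ValueError("not enough time left before the cutoff")
    days, rest = divmod(remaining, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical values: the year and the exact shutdown time are assumptions.
now = datetime(2025, 3, 10, 9, 0, tzinfo=timezone.utc)
cutoff = datetime(2025, 3, 20, 0, 0, tzinfo=timezone.utc)
print(f"#SBATCH --time={slurm_time_limit(now, cutoff)}")
```

The limit counts from when a job starts running, not from when it is submitted, so a job that waits in the queue needs a shorter limit than the one computed here.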
Feb 28, 01:51 UTC
Update -
Status - relocation:
The team at Digital Solutions is making progress on the migration. The quote, approvals, and insurance have been completed. They are now waiting for the off-campus facility provider to get the racks ready with power and network cabling.
Status - down nodes:
Because summer humidity levels may exceed 80% RH, we are unable to restart the parallel nodes at this time.
Planned:
Users will receive at least 10 days' notice before the cluster is shut down for relocation. The migration will result in several days of complete system outage while the DS team de-racks, transports, and re-racks all equipment.
Potential delays:
A change freeze is planned for the first week of Trimester 1 (starting 24th Feb), which is typically a high-demand period. Depending on operational workload and incident response, this may further impact migration timelines.
We appreciate your patience and will provide updates as soon as firm migration dates are confirmed. Please reach out with any concerns.
Feb 19, 21:58 UTC
Monitoring -
The compute infrastructure is running with limited capacity. Nodes will be moved to a new off-campus facility, but the schedule hasn’t been announced yet.
Feb 10, 21:34 UTC