SCW Autumn 2021

HIGH PERFORMANCE COMPUTING

everything you need to manage and make changes to your cluster over time. ClusterVisor is highly customisable to ensure you can manage your cluster and organise your data in a way that makes the most sense.

eQUEUE from ACT is a software solution that allows system administrators to create easy-to-use, web-based job submission forms. It is designed to increase cluster utilisation by bringing more users to the cluster who would ordinarily stay away due to the complexity of submitting jobs to a cluster. There is no need to learn Linux or scripting. The end user simply enters their data into predefined fields and the job is placed in the cluster’s queue to run.

The Scalable Cube from HPC Scalable is an enterprise-ready, supported distribution of an open-source workload scheduler that supports a wide variety of HPC and analytic applications. Whether deployed on site, on virtual infrastructure, or in the cloud, customers can take advantage of top-quality support services from HPC Scalable, helping ensure the success of managing their HPC workloads.

Microsoft Azure high-performance computing (HPC) is a complete set of computing, networking and storage resources integrated with workload orchestration services for HPC applications. With purpose-built HPC infrastructure, solutions and optimised application services, Azure offers competitive price/performance compared to on-premises options with additional high-performance computing benefits. Additionally, Azure includes next-generation machine-learning tools to drive smarter simulations and empower intelligent decision making.

Adaptive Computing’s Moab Cluster Suite is a professional cluster workload management solution that integrates the scheduling, managing, monitoring and reporting of cluster workloads. Moab Cluster Suite simplifies and unifies management across one or multiple hardware, operating system, storage, network, licence and resource manager environments. It processes greater workloads in less time to maximise cluster ROI. Its task-oriented management and flexible policy engine ensure service levels are delivered and workload is processed faster. This enables organisations to accomplish more work, resulting in improved cluster ROI.

Omnia is a deployment tool to configure Dell EMC PowerEdge servers running standard RPM-based Linux OS images into clusters capable of supporting HPC, AI and data analytics workloads. It uses Slurm, Kubernetes and other packages to manage jobs and run diverse workloads on the same converged solution. It is a collection of Ansible playbooks, is open source, and is constantly being extended to enable comprehensive workloads.

PBS Professional from Altair is a workload manager designed to improve productivity, optimise utilisation and efficiency, and simplify administration for clusters, clouds and supercomputers, from the biggest HPC workloads to millions of small, high-throughput jobs. PBS Professional automates job scheduling, management, monitoring and reporting, and it’s the trusted solution for complex Top500 systems, as well as smaller clusters. Cloud bursting to and between your favourite providers is easier than ever with an intuitive bursting GUI built right in. PBS Professional delivers a workload simulator that makes it easy to understand job behaviour and the effects of policy changes, plus allocation and budget management capabilities that let you manage budgets across your enterprise.

www.scientific-computing.com | @scwmagazine

OCF FEATURED PRODUCT

The importance of the management stack when building a cluster shouldn’t be underestimated. OCF Steel is a suite of cluster management software that allows organisations to run HPC applications on their clusters, together with the tools that help to manage, maintain and monitor the HPC environment. With OCF’s modular approach, customers can choose OCF Steel’s standard software stack installation, built on tried and tested technologies, combined with the flexibility of choosing bespoke components if required.

With 20 years of experience in HPC cluster management, OCF offers a flexible pathway to its customers utilising open-source technologies, making it easier to upgrade and support the cluster management software stack without any software licensing fees. OCF Steel is deployed using a resilient open-source virtualised management platform, making it simpler to facilitate any necessary upgrades or adaptations to the cluster. Customers using the OCF Steel software stack are supported by OCF’s dedicated HPC support team.

www.ocf.co.uk

Slurm is an open-source, fault-tolerant and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.

The IBM Spectrum LSF software is designed to distribute work across existing heterogeneous IT resources to create a shared, scalable and fault-tolerant infrastructure that delivers faster, more reliable workload performance and reduces cost. LSF provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress. Jobs always run according to host load and site policies.

Univa Grid Engine is a distributed resource management system for optimising workloads and resources in thousands of data centres, improving performance and boosting productivity and efficiency.

Grid Engine helps organisations improve ROI and deliver better results faster by optimising throughput and performance of applications, containers and services while maximising shared compute resources across on-premises, hybrid and cloud infrastructures.
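As an illustration only, the three functions attributed to Slurm above (allocating nodes, starting and monitoring work on them, and arbitrating contention through a queue of pending jobs) can be sketched as a toy Python model. This is not Slurm's code or API. The names `Job`, `ToyScheduler`, `submit` and `step` are hypothetical, nodes are plain strings, time advances in whole steps, and the queue policy is assumed to be strict first-in-first-out, which is far simpler than the policies a production scheduler applies.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    nodes_needed: int
    duration: int  # number of time steps the job runs once started (>= 1)
    allocated: list = field(default_factory=list)
    remaining: int = 0


class ToyScheduler:
    """Hypothetical toy model; not how any real workload manager is built."""

    def __init__(self, node_names):
        self.free = list(node_names)  # idle compute nodes
        self.pending = deque()        # queue of jobs waiting for resources
        self.running = []             # jobs currently holding nodes
        self.finished = []            # names of completed jobs, in order

    def submit(self, job):
        # Function 3: contention is handled by queueing pending work.
        self.pending.append(job)

    def _start_ready_jobs(self):
        # Function 1: allocate nodes to the job at the head of the queue.
        # Strict FIFO: stop at the first job that does not fit. A job that
        # asks for more nodes than exist would block the queue forever.
        while self.pending and self.pending[0].nodes_needed <= len(self.free):
            job = self.pending.popleft()
            job.allocated = [self.free.pop(0) for _ in range(job.nodes_needed)]
            job.remaining = job.duration
            self.running.append(job)

    def step(self):
        # Function 2: start work, advance it one time step, and release
        # the nodes of any job that has completed.
        self._start_ready_jobs()
        still_running = []
        for job in self.running:
            job.remaining -= 1
            if job.remaining == 0:
                self.free.extend(job.allocated)
                self.finished.append(job.name)
            else:
                still_running.append(job)
        self.running = still_running


sched = ToyScheduler(["node1", "node2", "node3"])
sched.submit(Job("a", nodes_needed=2, duration=2))
sched.submit(Job("b", nodes_needed=2, duration=1))
sched.submit(Job("c", nodes_needed=1, duration=1))

while sched.pending or sched.running:
    sched.step()

print(sched.finished)  # prints ['a', 'b', 'c']
```

In this run, job "c" would fit on the one idle node while "a" is running, but it waits behind "b" because of the strict first-in-first-out assumption. That is the kind of trade-off a scheduler's queue policy has to settle.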

Autumn 2021 Scientific Computing World 5

