Building an HPC ecosystem

Jysoo Lee, director of the KAUST Supercomputing Core Lab, speaks during the HPC Saudi event in February of 2017.

The University held the seventh High Performance Computing Saudi Arabia event—the premier regional event in the field—from March 13 to 15. The three-day conference aimed to create a space where researchers and industry representatives could meet, share ideas and experiences and discuss cooperation and collaboration.

The 2017 event focused on coordinated efforts for the advancement of an HPC ecosystem in the Kingdom. The first two days of the event included keynote speeches, invited talks, lightning talks, poster presentations, a vendor exhibition and an open discussion aimed at drafting an action plan for setting up an HPC ecosystem in Saudi Arabia.

Each plenary session commenced with a keynote talk, with speakers including Steven E. Koonin, director, NYU Center for Urban Science and Progress (CUSP); Thomas Schulthess, director, Swiss National Supercomputing Centre (CSCS) at Lugano; and Dr. Robert G. Voigt, a senior member of the technical staff at the Krell Institute.

Collaboration is key

In his welcome address, Dr. Jysoo Lee, director of the KAUST Supercomputing Core Laboratory, praised the people behind the computing research—the people who help create the ecosystems, machinery and technology.

“The research we have and the people we have really makes KAUST special, and the Shaheen system is what we can be proud of,” Lee said. “What we are trying to do is to help and serve both KAUST and the Kingdom. Since you are here in KAUST, I want you to look at the opportunities and what can be done together.”

‘There is a science to be done here’

In his opening keynote entitled "Better Cities through Data Acquisition and Analysis," Koonin highlighted his work and the work of CUSP in the field of urban science and systems. He described how the center uses informatics to study the operations of urban systems, noting how HPC technology enriches the bustling cityscape that is New York City and how it can contribute to broader global issues.

Dr. Robert G. Voigt, a senior member of the technical staff at the Krell Institute, speaks during the seventh HPC Saudi conference on the KAUST campus.

“We need technologies and methodologies to analyze data about cities. There is a science to be done here. Cities have been one of the most complex things that humans have created. Cities are what matter, and by the end of the century, about three-fourths of humanity will be in cities,” Koonin said.

“If you want to change the energy system, technology is great, but the social factor is what you have to work on in the long run. It's not just about energy, it's about everything else that happens in a city. You need to understand infrastructure, environment and people to instrument a city,” he continued.

“Cities are built for people by people. You can't understand a city unless you understand its people. You can try to understand one dimension of a city or you can focus on just one city and try to discover its various dimensions. One of the biggest challenges is fusing different data sources into usable data. If you can take all of this data and analyze it through data-driven models, you can learn many things. We need to 'own' the data by having an intimate familiarity with it,” Koonin added.

How to make HPC mainstream

Merle Giles, formerly of the National Center for Supercomputing Applications (NCSA) and now CEO of Moonshot Research LLC, described how needs differ in research computing. Giles discussed how he has carried the methodologies of his previous workplace into his new company.

“For 20 years or more, enterprise has treated HPC as a hobby—what we do in our new company is similar to what we did in NCSA, which is serve others and help others do what they know how to do better,” he said.

“A 'valley of death' exists in both the academic and industry sectors and nobody funds the middle, which is innovation. We are left to our own practices to move through this middle ground,” he added. “Some differences between research computing and the commercial side are also the differences between macro and micro economics. There is a big difference between high-level macroeconomics and company-level microeconomics. KAUST is an example of a clustering effect of a macroeconomic policy. The microeconomic effect is down to the level of the firm. I don't know any boardroom that talks about HPC. HPC has been in the R&D basement forever.”

On tackling the question of how to take HPC mainstream, Giles said, “Reducing time-to-impact is essential, and HPC plays a big part in this. The key to success is being obsessed with the customer. The customer wins in this game.”

“We have to know what goes on in HPC and we have to know about the companies. The HPC community is where we can solve things, and it may be the only way to peek under the hood and know how it works,” he concluded.

‘Taking charge of change’

Raed Al-Rabeh, manager of EXPEC Network Operations at Saudi Aramco, spoke about the wide array of new technologies, disciplines and modes of operation now available to developers, industry and computing researchers. As a result, he said, possibilities in HPC that were unthinkable a few years ago are now attainable. Al-Rabeh also discussed the need to adapt to these changes in the HPC landscape to avoid the risk of being left behind.

“It's not about change. It's about us taking charge of change and making good use of it,” he said. “In HPC, you have to understand the architecture and go to very low levels of understanding to get the most out of the system. You have to be a scientist with a strong background in computer engineering or an electrical engineer to get the most out of it. The HPC challenges are not that different from the IT challenges, but they go to a different level.”

“We need to spot opportunities to make good use of our systems. Gone are the days when research was funded just for the sake of research. Research is now funded if it drives new opportunities that are close to home: the industry and the society and where we live, not some theoretical question out there in space. Innovation must happen as a regular process, and agility is critical,” he added.

“Our customers aren't interested in becoming computer scientists or experts so they can use products. They expect the products to work. Technology requires resources and the knowledge is not very widespread. We need to spread the knowledge and bring it up-to-speed, and we need to embrace the change and be aware of it to give us the advantage,” he noted.

“We need alignment between business and research, with research doing what business needs. This kind of alignment fuels the research, and then products of the research are deployable and usable. Especially in the Kingdom, very few companies realize the applications of HPC,” Al-Rabeh concluded.

Following on from Al-Rabeh, Sreekanth Pannala from the Saudi Basic Industries Corporation (SABIC) highlighted the role HPC plays in SABIC and how it aids the company's goals and productivity rates for the Kingdom.

“We look towards our capabilities from a computing perspective—we look at novel solutions from an HPC perspective to make things faster,” Pannala said.

‘We must move forward’

In his keynote talk, Schulthess reflected on the goals and baseline for exascale computing and how a capable exascale computing system requires an entire computational ecosystem behind it.

“It's amazing to see so many people engaged with HPC in the Middle East. Globally, we have to figure out what we want to accomplish in particular areas. Today, the fastest supercomputers sustain 20 to 100 petaflops on HPL, and investment in software allows mathematical improvements and changes in architecture,” Schulthess said. “I don't know what that architecture will be in five to 10 years, but we must move forward with it.”

In his presentation, Muhammad El-Rabaa, an associate professor at the Department of Computer Engineering at King Fahd University of Petroleum & Minerals (KFUPM), outlined how new applications have propelled HPC to the forefront of computing.

“New applications have catapulted HPC from narrow scientific applications domain to the mainstream—applications like the cloud, pocket processing, machine learning, searches, analytics, business logic, etc. Computing platforms have continuously evolved with new platforms continuing to emerge,” he said.

He also highlighted the increasing role of field-programmable gate arrays (FPGAs), integrated circuits that can be configured after manufacturing. “Instead of building one chip, you can now have a few chips, as it is more economical. Several hi-tech executives say that FPGAs will constitute 20 percent of data centers by 2020,” he added.

A fast-moving world

Jeff Brooks, director of supercomputing product management at Cray, discussed the upcoming technology shifts in the marketplace and the implications for systems design in the exascale era.

“Systems with millions of cores will become commonplace. We are trying to invest more in data work, make it work better and scale it out. We want to couple analytics with simulation,” Brooks said. “Another thing that is coming is small, fast memories. This is a fast-moving world, but by working together you can solve problems you couldn't do before.”

Delivering scientific solutions

Jeff Nichols, acting director of the National Center for Computational Sciences and the National Leadership Computing Facility at Oak Ridge National Lab (ORNL), discussed several scientific areas that require an integrated approach, as well as the effort to create an exascale ecosystem that enables the successful delivery of important scientific solutions across a broad range of disciplines.

“We need to think about how we're being connected to the data that is generated from the sensors all around us. Our Compute and Data Environment for Science (CADES) provides a shared infrastructure to help solve big science problems. We try to connect our data to our in-silico information from the top down.”

“We have to think about the type of data we are actually deploying on these systems. This is a very complicated workflow scenario we have to come up with. We have four pillars which are: application development, software technology, hardware technology, and exascale systems. The Oak Ridge leadership computing facility is on a well-defined path to exascale. We're interested in our ecosystem delivering important and critical science for the nation and the world,” he said.

Patricia Damkroger, vice president of the Data Center Group at Intel, spoke on the convergence of simulation and data.


“At Intel, we look at the whole ecosystem. There will be new systems and new workloads and we will need to figure out what is the underlying architecture and hardware that makes those systems work. It’s a question of how can we create a common architecture for data and simulation. The world is changing, and without analytics and AI workloads, we will drown in data,” she said.

Educating computational scientists

Voigt opened the final plenary session of the event with his keynote presentation entitled "The Education of Computational Scientists." His talk centered on providing a historical perspective of the challenges of educating future computational scientists based on his career experiences.

“One might argue that scientific computing began in the 1950s, and in 1982, computational science was recognized. Computational science takes on a discipline of its own, and there is an opportunity to learn about aspects of computational science through exploring multidisciplinary searches,” Voigt said.

“Computational science involves the integration of knowledge and methodologies. There is now an explosion of data and new areas of science and engineering. There are also rapidly changing computer architectures,” he added.

A leading role in HPC

The third day of the conference offered eight tutorials on emerging technical topics of interest, such as advanced performance tuning and optimization offered by Allinea, Intel and Cray; the best practices of HPC procurement by RedOak; and SLURM workload management by SchedMD. The most popular were “HPC 101,” which offered a step-by-step guide on how to use Shaheen II, and NVIDIA’s tutorial on the popular topic of deep learning.

A total of 333 people attended the High Performance Computing Saudi Arabia event, making it one of the biggest conferences held at KAUST.

“The conference was a great chance to observe significant HPC interests in the Kingdom. There were lots of discussions on ways to enhance the HPC ecosystem in the Kingdom, and it was clear that KAUST can play a leading role in several of them,” noted Lee.

- By David Murphy, KAUST News