
KAUST and the Big Data age

KAUST researchers and their international peers came together to discuss big data's impact and new challenges and opportunities in the field of computing at the recent KAUST Research Workshop on Optimization and Big Data. Photo by Asharaf Kannearil.

By David Murphy, KAUST News

The age of "Big Data" is here, and its increasingly ubiquitous data sets bring a plethora of new challenges and opportunities to the field of computing. The recently held KAUST Research Workshop on Optimization and Big Data sought to address these big data challenges by bringing together leading researchers from academia and industry. Speakers discussed their work on novel optimization algorithms and distributed systems capable of working in this era of big data.

Machine learning, compressed sensing, imaging, social network science and computational biology are just a few of the many prominent applications where it is increasingly common to formulate and solve optimization problems with billions of variables. The unprecedented size of these complex data sets necessitates the development of new approaches. These approaches utilize novel algorithmic design involving tools such as distributed and parallel computing, randomization, asynchronicity, decomposition, sketching and streaming.
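One of these tools, sketching, can be shown in miniature: a random projection compresses a tall least-squares problem into a much smaller one whose solution is close to the original. The example below is purely illustrative (the problem sizes and the Gaussian sketching matrix are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# A tall least-squares problem: minimize ||A x - b||^2 with n >> d.
n, d = 10_000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

# Sketch: compress the n rows down to k rows with a random Gaussian map S.
k = 400
S = rng.standard_normal((k, n)) / np.sqrt(k)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

# The sketched solution approximates the full solution at a fraction of the cost.
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_full))
```

The compressed system has 400 rows instead of 10,000, yet its minimizer lands very close to the full solution, which is the basic bargain behind sketching-based big data algorithms.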

A global discussion

The on campus research workshop, which was held from February 5 to 7, was organized by Professor Peter Richtárik and Professor Marco Canini from the University's Computer, Electrical and Mathematical Science & Engineering Division. Financial support for the event came from the KAUST Office of Sponsored Research, and the conference was co-sponsored by the Alan Turing Institute with additional support provided by the KAUST Industry Collaboration Program (KICP), Industry Partnerships office.

The conference featured 20 speakers in total from industry, KAUST and other elite international universities. The global universities in attendance included Lehigh University, École des Ponts ParisTech, EPFL, Télécom ParisTech, the University of Oxford, the Georgia Institute of Technology, the University of Cambridge and Politehnica University of Bucharest. Microsoft Research also attended the event.

Tamás Terlaky, endowed chair and professor at Lehigh University, opened conference proceedings with a keynote entitled "60 Years of Interior Point Methods: From Periphery to Glory." Photo by Asharaf Kannearil.

Tamás Terlaky, endowed chair and professor at Lehigh University, opened conference proceedings with a keynote entitled "60 Years of Interior Point Methods: From Periphery to Glory."

During his presentation, Terlaky, founding honorary editor-in-chief of the journal Optimization and Engineering, covered interior point methods (IPMs), a class of algorithms that solve linear and nonlinear convex optimization problems. He also highlighted the scientific and computational advances that make IPMs possible; his research interests, including high-performance optimization algorithms; and optimization modeling and its various applications.

"Thanks to the Interior Point Revolution, we have seen many advances in computing. IPMs forced us to reconsider the theory of sensitivity and parametric analysis. There is no single algorithm to solve all of our problems, but we are gaining a greater understanding of what works and what doesn't work," he added.
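At their core, IPMs replace hard constraints with a barrier function and follow the resulting "central path" toward the optimum. The toy example below (our illustration, not Terlaky's code) applies a basic log-barrier method with damped Newton steps to a two-variable linear program:

```python
import numpy as np

# Minimal log-barrier interior-point sketch for the tiny linear program
#   minimize c^T x   subject to   A x <= b.
# Newton's method minimizes the barrier  t*c^T x - sum(log(b - A x)),
# and increasing t pushes the iterate along the central path to the optimum.
c = np.array([1.0, 2.0])
A = np.array([[-1.0, 0.0],
              [0.0, -1.0],
              [1.0, 1.0]])          # encodes x >= 0 and x1 + x2 <= 1
b = np.array([0.0, 0.0, 1.0])

def barrier(x, t):
    s = b - A @ x                   # slacks; must stay strictly positive
    return t * (c @ x) - np.sum(np.log(s))

x = np.array([0.25, 0.25])          # strictly feasible starting point
t = 1.0
for _ in range(20):                 # outer loop: sharpen the barrier
    for _ in range(30):             # inner loop: damped Newton steps
        s = b - A @ x
        grad = t * c + A.T @ (1.0 / s)
        hess = A.T @ np.diag(1.0 / s**2) @ A
        step = np.linalg.solve(hess, -grad)
        alpha = 1.0                 # backtrack to stay feasible and descend
        while (np.any(b - A @ (x + alpha * step) <= 0)
               or barrier(x + alpha * step, t) > barrier(x, t)):
            alpha *= 0.5
        x = x + alpha * step
    t *= 2.0

print(x)  # moves toward the optimal vertex (0, 0)
```

Production interior-point solvers use primal-dual updates and far more careful numerics, but the mechanism, Newton steps on a barrier that is gradually sharpened, is the same.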

Guillaume Obozinski, a researcher in the computer science department at École des Ponts ParisTech, addresses attendees at the KAUST Research Workshop on Optimization and Big Data. Photo by Asharaf Kannearil.

Guillaume Obozinski, a researcher in the computer science department at École des Ponts ParisTech, talked about his recent work on conditional random fields (CRF) and an efficient dual augmented Lagrangian formulation to learn CRF models.

"In our research, we come across millions of algorithmic problems, and with Stochastic Dual Coordinate Ascent (SDCA), we can achieve global linear convergence in the primal. We have done a lot of work on approximate inference for CRFs. Our experiments show that the proposed algorithm outperforms state-of-the-art baselines in terms of speed of convergence," Obozinski noted.
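The SDCA scheme Obozinski mentions can be demonstrated in a toy setting. The sketch below (our illustration on synthetic ridge regression, not the CRF setting of the talk) uses the standard closed-form SDCA coordinate update for squared loss, keeping the primal iterate in sync with the dual variables:

```python
import numpy as np

# Minimal SDCA for ridge regression:
#   minimize (1/n) sum_i 1/2 (w.x_i - y_i)^2 + (lam/2) ||w||^2
# Each step picks one sample and applies a closed-form dual coordinate update.
rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

lam = 0.1
alpha = np.zeros(n)     # dual variables, one per sample
w = np.zeros(d)         # primal iterate, kept in sync: w = X^T alpha / (lam n)

for _ in range(50 * n):
    i = rng.integers(n)
    # closed-form maximizer of the dual in coordinate i (squared loss)
    delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
    alpha[i] += delta
    w += delta * X[i] / (lam * n)

# compare against the exact ridge solution
w_exact = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
print(np.linalg.norm(w - w_exact))
```

The linear convergence Obozinski refers to is visible here: a few dozen passes over the data drive the iterate essentially all the way to the exact regularized solution.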

'What randomization can do in optimization'

Conference co-organizer and KAUST Associate Professor of Computer Science Peter Richtárik spoke about his team's research at the University and the stochastic reformulations of linear and convex feasibility problems.

"I am trying to get to the basics of what randomization can do in optimization," Richtárik said.

"Our team developed a family of reformulations of an arbitrary consistent linear system into a stochastic problem. We propose and analyze three stochastic algorithms for solving the reformulated problem—basic, parallel and accelerated methods. From our parallel method, you will learn a lot about optimization and machine learning," he added.
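A well-known special case of such stochastic methods for linear systems is the randomized Kaczmarz iteration, which solves a consistent system by repeatedly projecting the iterate onto the solution set of one randomly chosen equation. The sketch below is our own toy illustration, not the team's code:

```python
import numpy as np

# Randomized Kaczmarz: solve a consistent linear system A x = b by
# repeatedly projecting onto the solution space of one random equation.
rng = np.random.default_rng(2)
n, d = 300, 30
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true                   # consistent by construction

x = np.zeros(d)
for _ in range(5000):
    i = rng.integers(n)                   # pick a random row
    a = A[i]
    x += (b[i] - a @ x) / (a @ a) * a     # project onto {z : a.z = b[i]}

print(np.linalg.norm(x - x_true))
```

Each update touches only one row of the system, which is what makes this family of methods attractive when the full matrix is too large to process at once.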

Peter Richtárik, conference co-organizer and KAUST associate professor of computer science, spoke about his team's research at KAUST. Photo by Asharaf Kannearil.

Wolfgang Heidrich, director of the University's Visual Computing Center, discussed optimization and big data in computational imaging during his presentation. He gave an overview of the recent advances and current challenges in rapidly expanding research in the area of computational imaging, with a specific focus on the utilization of optimization and big data methods.

"We can think of the linear inverse problem as an optimization problem. A lot of the techniques in optimization have evolved over the past few years, and a lot of people are trying to bridge the gap between Regularized Optimization and Deep Learning," Heidrich said.

"A lot of work in my group has been focused on diffractive achromats. We are creating designs for specific applications," he noted.
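The "inverse problem as optimization" view Heidrich describes can be made concrete with a toy 1-D deblurring example. The sketch below is entirely illustrative (real computational imaging problems are far larger and use richer priors): it recovers a signal from a blurred, noisy measurement by minimizing a Tikhonov-regularized least-squares objective.

```python
import numpy as np

# Recover a signal x from a blurred, noisy measurement b = A x + noise
# by minimizing   ||A x - b||^2 + lam * ||x||^2.
rng = np.random.default_rng(3)

n = 100
x_true = np.zeros(n)
x_true[40:60] = 1.0                      # a box signal to recover

# A: circular convolution with a small blur kernel, written as a matrix
A = np.zeros((n, n))
for offset, weight in zip((-1, 0, 1), (0.25, 0.5, 0.25)):
    A += weight * np.roll(np.eye(n), offset, axis=1)
b = A @ x_true + 0.005 * rng.standard_normal(n)

# The quadratic objective has the closed-form minimizer
#   x_hat = (A^T A + lam I)^{-1} A^T b
lam = 1e-3
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

The regularizer is what keeps the reconstruction stable: the blur operator is nearly singular, so inverting it without the `lam` term would amplify the measurement noise.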

Redesigning communication

KAUST Assistant Professor and conference co-organizer Marco Canini's presentation focused on in-network computing. Canini and his team concentrate on redesigning communication in distributed machine learning to take advantage of programmable network data planes.

Through their experiments on machine learning workloads, the group identified that aggregation functions raise opportunities to exploit the limited computation power of networking hardware to lessen network congestion and improve the overall application performance.

"Computer networks are changing—there are now a lot more instances of software-defined networking, network-function virtualization and programmable data planes. Datacenter applications scale data and computation across many servers, and huge volumes of data are exchanged," Canini said.

"Programmable networking hardware creates new opportunities for infusing intelligence into the network. Our ongoing work focuses on building a complete prototype and testing it on real hardware. Networks are getting programmable—not just faster—and communication-bound workloads now demand more of the network. Identifying primitives for in-network computing is key," he explained.
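The aggregation primitive at the heart of this work can be mimicked in a few lines. The simulation below (our illustration; all names and sizes are made up) sums workers' gradient packets chunk by chunk using only a small, fixed pool of "switch" memory, so the receiver sees one aggregated stream instead of one stream per worker:

```python
import numpy as np

NUM_WORKERS = 4
CHUNK = 8                     # values per packet: switch memory is limited

def worker_gradients(rng, dim=32):
    """Each worker produces a local gradient of the same dimension."""
    return [rng.standard_normal(dim) for _ in range(NUM_WORKERS)]

def aggregate_in_network(grads):
    """Sum gradients chunk by chunk using only CHUNK slots of 'switch' memory."""
    dim = len(grads[0])
    out = np.empty(dim)
    for start in range(0, dim, CHUNK):
        slots = np.zeros(CHUNK)           # the switch's register pool
        for g in grads:                   # one packet per worker per chunk
            slots += g[start:start + CHUNK]
        out[start:start + CHUNK] = slots  # forward the aggregated chunk
    return out

rng = np.random.default_rng(4)
grads = worker_gradients(rng)
agg = aggregate_in_network(grads)
print(np.allclose(agg, np.sum(grads, axis=0)))
```

The payoff in a real deployment is bandwidth: the link toward the parameter server carries one aggregated gradient rather than one per worker, which is exactly the congestion reduction the group observed.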

Srikanth Kandula, a principal researcher at Microsoft Research, brought some industrial perspective to optimization and big data during his presentation entitled “Approximate Answers for Complex Parallel Queries.” Photo by Asharaf Kannearil.

Srikanth Kandula, a principal researcher at Microsoft Research, brought some industrial perspective to optimization and big data during his presentation entitled "Approximate Answers for Complex Parallel Queries."

In his talk, Kandula discussed data-analytics platforms and argued that an ideal approximate analytics system has to meet at least four goals: coverage of a large class of queries; improved latency and/or throughput; minimal overhead; and accuracy guarantees.

"Even small changes in clusters are meaningful, as data and computing need to grow; approximations may reduce the increase in cost and latency. In computing, we don't have one model that serves everyone," he said.
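The cost/accuracy trade-off behind approximate analytics is easy to demonstrate: answer an aggregate query from a small uniform sample and report an error bar. The sketch below is a toy illustration, not Microsoft's system:

```python
import numpy as np

# Estimate an aggregate over a large table from a 1% uniform sample,
# with a confidence interval derived from the sample itself.
rng = np.random.default_rng(5)

table = rng.exponential(scale=10.0, size=1_000_000)   # e.g. request latencies

# Exact answer scans everything; the approximate answer scans 1% of it.
exact_mean = table.mean()

sample = rng.choice(table, size=10_000, replace=False)
approx_mean = sample.mean()
# 95% confidence interval from the sample's own variance
half_width = 1.96 * sample.std(ddof=1) / np.sqrt(len(sample))

print(exact_mean, approx_mean, half_width)
```

Sampling covers only the simplest queries; the hard part Kandula highlights is extending such guarantees to complex parallel queries with joins and nested aggregates while keeping the overhead minimal.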

Diverse topics highlighted

The conference's other speakers highlighted a range of topics, including analytics and machine learning algorithms; approximate analytics; convex optimization; distributed graph engines; machine learning; modern stochastic methods; stochastic gradient descent (SGD) methods; and more.

The event also featured spotlight talks and a poster session spread over the conference's duration. The conference organizers felt that the KAUST Research Workshop on Optimization and Big Data was a great success.

"Optimization is one of the key technologies behind many machine learning and statistical and data science models. New exciting applications from these and other domains pose great challenges to the prevailing optimization algorithms, and new algorithms are needed to handle the big data involved in these applications and models," Richtárik noted. "Optimization and Big Data, a workshop series I started in 2012, was the first event focused specifically on these challenges. This year's conference, the fourth in the series, attracted unprecedented interest from Saudi industry, and the participants I talked to were enthusiastic about the quality and importance of the workshop."

