Tuesday, June 28, 9:00-10:15 EDT
Keynote Talk (Chair: Lawrence Rauchwerger, University of Illinois Urbana-Champaign)
The Rise of Matrix Processing
Dr. José Moreira
Distinguished Researcher, IBM Research
Tuesday, June 28, 10:30-11:15 EDT
Session 1: Tools and Modeling (I) (Chair: Dimitrios Nikolopoulos, Virginia Tech)
Low Overhead and Context Sensitive Profiling of GPU-accelerated Applications
Keren Zhou (Rice University)
Jonathon Anderson (Rice University)
Xiaozhu Meng (Rice University)
John Mellor-Crummey (Rice University)
Calipers: A Criticality-aware Framework for Modeling Processor Performance
Hossein Golestani (University of Michigan)
Rathijit Sen (Microsoft)
Vinson Young (Microsoft)
Gagan Gupta (Microsoft)
Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models
Larissa Schmid (Karlsruhe Institute of Technology)
Marcin Copik (ETH Zurich)
Alexandru Calotoiu (ETH Zurich)
Dominik Werle (Karlsruhe Institute of Technology)
Andreas Reiter (University of Applied Sciences Karlsruhe)
Michael Selzer (Karlsruhe Institute of Technology)
Anne Koziolek (Karlsruhe Institute of Technology)
Torsten Hoefler (ETH Zurich)
Tuesday, June 28, 11:30-12:15 EDT
Session 2: New Hardware Technologies (Chair: Cristina Silvano, Politecnico di Milano)
ASAP: Automatic Synthesis of Area-Efficient and Precision-Aware CGRAs
Cheng Tan (Microsoft)
Thierry Tambe (Harvard University)
Jeff (Jun) Zhang (Harvard University)
Bo Fang (Pacific Northwest National Laboratory)
Tong Geng (Pacific Northwest National Laboratory)
Gu-Yeon Wei (Harvard University)
David Brooks (Harvard University)
Antonino Tumeo (Pacific Northwest National Laboratory)
Ganesh Gopalakrishnan (University of Utah)
Ang Li (Pacific Northwest National Laboratory)
Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware
Zixuan Ma (Tsinghua University)
Haojie Wang (Tsinghua University)
Guanyu Feng (Tsinghua University)
Chen Zhang (Tsinghua University)
Lei Xie (Tsinghua University)
Jiao He (Tsinghua University)
Shengqi Chen (Tsinghua University)
Jidong Zhai (Tsinghua University)
SnuQS: Scaling Quantum Circuit Simulation using Storage Devices
Daeyoung Park (Seoul National University)
Heehoon Kim (Seoul National University)
Jinpyo Kim (Seoul National University)
Taehyun Kim (Seoul National University)
Jaejin Lee (Seoul National University)
LITE: A Low-Cost Practical Inter-Operable GPU TEE
Ardhi Wiratama Baskara Yudha (University of Central Florida)
Jake Meyer (University of Central Florida)
Shougang Yuan (University of Central Florida)
Huiyang Zhou (North Carolina State University)
Yan Solihin (University of Central Florida)
Tuesday, June 28, 13:00-13:45 EDT
Session 3: Graph Processing (Chair: Osman Unsal, Barcelona Supercomputing Center)
Software-Defined Floating-Point Number Formats and Their Application to Graph Processing
Hans Vandierendonck (Queen's University Belfast)
MASTIFF: Structure-Aware Minimum Spanning Tree/Forest
Mohsen Koohi Esfahani (Queen's University Belfast)
Peter Kilpatrick (Queen's University Belfast)
Hans Vandierendonck (Queen's University Belfast)
Efficient Exact K-Nearest Neighbor Graph Construction for Billion-Scale Datasets using GPUs with Tensor Cores
Zhuoran Ji (The University of Hong Kong)
Cho-Li Wang (The University of Hong Kong)
Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampling on Multi-Accelerator Systems
Heng Zhang (University of Sydney)
Lingda Li (Brookhaven National Laboratory)
Hang Liu (Stevens Institute of Technology)
Donglin Zhuang (University of Sydney)
Rui Liu (University of Chicago)
Chengying Huan (Tsinghua University)
Shuang Song (Meta)
Dingwen Tao (Washington State University)
Yongchao Liu (Ant Financial)
Charles He (Ant Financial)
Yanjun Wu (Chinese Academy of Sciences)
Shuaiwen Leon Song (University of Sydney)
Tuesday, June 28, 14:00-14:45 EDT
Session 4: I/O and Communication (Chair: Xiaoning Ding, New Jersey Institute of Technology)
CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression
Chengming Zhang (Washington State University)
Sian Jin (Washington State University)
Tong Geng (Pacific Northwest National Laboratory)
Jiannan Tian (Washington State University)
Ang Li (Pacific Northwest National Laboratory)
Dingwen Tao (Washington State University)
Towards Low-Latency I/O Services for Mixed Workloads Using Ultra-Low Latency SSDs
Mingzhe Liu (Huazhong University of Science and Technology)
Haikun Liu (Huazhong University of Science and Technology)
Chencheng Ye (Huazhong University of Science and Technology)
Xiaofei Liao (Huazhong University of Science and Technology)
Hai Jin (Huazhong University of Science and Technology)
Yu Zhang (Huazhong University of Science and Technology)
Ran Zheng (Huazhong University of Science and Technology)
Liting Hu (Virginia Tech)
Optimized MPI Collective Algorithms for Dragonfly Topology
Guangnan Feng (Sun Yat-sen University)
Dezun Dong (National University of Defense Technology)
Yutong Lu (Sun Yat-sen University)
Wednesday, June 29, 9:00-10:15 EDT
Keynote Talk (Chair: Kirk Cameron, Virginia Tech)
The Computing and Information Science and Engineering Landscape: A Look Forward
Dr. Margaret Martonosi
Hugh Trumbull Adams '35 Professor of Computer Science, Princeton University
Currently serving as Assistant Director for Computer and Information Science and Engineering (CISE) at NSF.
Wednesday, June 29, 10:30-11:15 EDT
Session 5: Compilers (Chair: Chen Ding, University of Rochester)
SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring (Best Paper Award!)
Adhitha Dias (Purdue University)
Kirshanthan Sundararajah (Purdue University)
Charitha Saumya (Purdue University)
Milind Kulkarni (Purdue University)
VICO: Demand-driven Verification for Improving Compiler Optimizations
Sharjeel Khan (Georgia Institute of Technology)
Bodhisatwa Chatterjee (Georgia Institute of Technology)
Santosh Pande (Georgia Institute of Technology)
Lifting C Semantics for Dataflow Optimization
Alexandru Calotoiu (ETH Zurich)
Tal Ben-Nun (ETH Zurich)
Grzegorz Kwasniewski (ETH Zurich)
Johannes de Fine Licht (ETH Zurich)
Timon Schneider (ETH Zurich)
Philipp Schaad (ETH Zurich)
Torsten Hoefler (ETH Zurich)
Wednesday, June 29, 11:30-12:15 EDT
Session 6: Algorithms on Accelerators (Chair: Ana Lucia Varbanescu, University of Amsterdam)
SnuHPL: High Performance LINPACK for Heterogeneous GPUs
Jinpyo Kim (Seoul National University)
Hyungdal Kwon (Seoul National University)
Jintaek Kang (Samsung Advanced Institute of Technology)
Jihwan Park (Samsung Advanced Institute of Technology)
Seungwook Lee (Samsung Advanced Institute of Technology)
Jaejin Lee (Seoul National University)
High Throughput Multidimensional Tridiagonal System Solvers on FPGAs
Kamalavasan Kamalakkannan (University of Warwick)
Gihan R. Mudalige (University of Warwick)
Istvan Z. Reguly (Pazmany Peter Catholic University)
Suhaib A. Fahmy (King Abdullah University of Science and Technology)
AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs
André Müller (Johannes Gutenberg University)
Bertil Schmidt (Johannes Gutenberg University)
Richard Membarth (Technische Hochschule Ingolstadt)
Roland Leißa (University of Mannheim)
Sebastian Hack (Saarland University)
Parallel K-Clique Counting on GPUs
Mohammad Almasri (University of Illinois Urbana-Champaign)
Izzat El Hajj (American University of Beirut)
Rakesh Nagi (University of Illinois Urbana-Champaign)
Jinjun Xiong (University of Illinois Urbana-Champaign)
Wen-mei Hwu (NVIDIA and University of Illinois Urbana-Champaign)
Wednesday, June 29, 12:15-12:45 EDT
Special Session
Townhall Meeting with Dr. Margaret Martonosi, Assistant Director for Computer and Information Science and Engineering (CISE), US National Science Foundation
Wednesday, June 29, 13:00-13:45 EDT
Session 7: Memory Systems and Memory Management (Chair: Dionisios Pnevmatikatos, National Technical University of Athens)
Cloak: Tolerating Non-Volatile Cache Read Latency
Apostolos Kokolis (University of Illinois Urbana-Champaign)
Namrata Mantri (NVIDIA)
Shrikanth Ganapathy (Rivos Inc.)
Josep Torrellas (University of Illinois Urbana-Champaign)
John Kalamatianos (AMD Inc.)
Fast-Track Cache: A Huge Racetrack Memory L1 Data Cache
Hugo Tárrega (Universitat Politècnica de València)
Alejandro Valero (Universidad de Zaragoza)
Vicente Lorente (Universitat Politècnica de València)
Salvador Petit (Universitat Politècnica de València)
Julio Sahuquillo (Universitat Politècnica de València)
Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs
Minh Pham (University of South Florida)
Hao Li (University of South Florida)
Yongke Yuan (Beijing University of Technology)
Chengcheng Mou (University of South Florida)
Kandethody Ramachandran (University of South Florida)
Zichen Xu (Jiaxing Neofelis Scientific)
Yicheng Tu (University of South Florida)
MegTaiChi: Dynamic Tensor-based Memory Management Optimization for DNN Training
Zhongzhe Hu (Chinese Academy of Sciences)
Junmin Xiao (Chinese Academy of Sciences)
Zheye Deng (Megvii Technology)
Mingyi Li (Chinese Academy of Sciences)
Kewei Zhang (Chinese Academy of Sciences)
Xiaoyang Zhang (Chinese Academy of Sciences)
Ke Meng (Alibaba Group)
Ninghui Sun (Chinese Academy of Sciences)
Guangming Tan (Chinese Academy of Sciences)
Wednesday, June 29, 14:00-14:45 EDT
Session 8: Dense and Sparse Linear Algebra (Chair: Rong Ge, Clemson University)
Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures
Andy Nguyen (University of Oregon)
Ahmed E. Helal (Intel Labs)
Fabio Checconi (Intel Labs)
Jan Laukemann (University of Erlangen-Nürnberg)
Jesmin Jahan Tithi (Intel Labs)
Yongseok Soh (University of Oregon)
Teresa Ranadive (Laboratory of Physical Sciences)
Fabrizio Petrini (Intel Labs)
Jee W. Choi (University of Oregon)
Dense Dynamic Blocks: Optimizing SpMM for Processors with Vector and Matrix Units Using Machine Learning Techniques
Serif Yesil (University of Illinois Urbana-Champaign)
José E. Moreira (IBM Research)
Josep Torrellas (University of Illinois Urbana-Champaign)
Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU
Xiaoyan Liu (Beihang University)
Yi Liu (Beihang University)
Hailong Yang (Beihang University)
Jianjin Liao (Beihang University)
Mingzhen Li (Beihang University)
Zhongzhi Luan (Beihang University)
Depei Qian (Beihang University)
Thursday, June 30, 9:00-10:15 EDT
Keynote Talk (Chair: Dimitrios Nikolopoulos, Virginia Tech)
Large-Scale Visual Analysis in the Age of Data
Dr. Chris R. Johnson
Distinguished Professor of Computer Science
Founding Director, Scientific Computing and Imaging Institute
University of Utah
Thursday, June 30, 10:30-11:15 EDT
Session 9: Applications (Chair: Dionisios Pnevmatikatos, National Technical University of Athens)
KrakenOnMem: A Memristor-Augmented HW/SW Framework for Taxonomic Profiling
Taha Shahroodi (TU Delft)
Mahdi Zahedi (TU Delft)
Abhairaj Singh (TU Delft)
Stephan Wong (TU Delft)
Said Hamdioui (TU Delft)
GAPS: GPU-Acceleration of PDE Solvers for Wave Simulation
Bagus Hanindhito (UT Austin)
Dimitrios Gourounas (UT Austin)
Arash Fathi (ExxonMobil Technology and Engineering)
Dimitar Trenev (ExxonMobil Technology and Engineering)
Andreas Gerstlauer (UT Austin)
Lizy K. John (UT Austin)
Seamless Optimization of the GEMM Kernel for Task-based Programming Models
Arthur F. Lorenzon (Federal University of Pampa)
Sandro M. V. N. Marques (Federal University of Pampa)
Antoni Navarro (Barcelona Supercomputing Center)
Vicenç Beltran (Barcelona Supercomputing Center)
Thursday, June 30, 11:30-12:15 EDT
Session 10: Tools and Modeling (II) (Chair: Dingwen Tao, Washington State University)
Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication
Wesley Smith (University of Rochester)
Aidan Goldfarb (University of Rochester)
Chen Ding (University of Rochester)
uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures
Andreas Abel (Saarland University)
Jan Reineke (Saarland University)
Preparing for Performance Analysis at Exascale
Jonathon Anderson (Rice University)
Yumeng Liu (Rice University)
John Mellor-Crummey (Rice University)
Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems
Khalid Ayedh Alharthi (University of Warwick)
Arshad Jhumka (University of Warwick)
Sheng Di (Argonne National Laboratory)
Franck Cappello (Argonne National Laboratory)
Thursday, June 30, 13:00-13:45 EDT
Session 11: Machine Learning (Chair: Dong Li, University of California Merced)
A Data-Centric Optimization Framework for Machine Learning
Oliver Rausch (ETH Zurich)
Tal Ben-Nun (ETH Zurich)
Nikoli Dryden (ETH Zurich)
Andrei Ivanov (ETH Zurich)
Shigang Li (ETH Zurich)
Torsten Hoefler (ETH Zurich)
PAME: Precision-Aware Multi-Exit DNN Serving for Reducing Latencies of Batched Inferences
Shulai Zhang (Shanghai Jiao Tong University)
Weihao Cui (Shanghai Jiao Tong University)
Quan Chen (Shanghai Jiao Tong University)
Zhengnian Zhang (Shanghai Jiao Tong University)
Yue Guan (Shanghai Jiao Tong University)
Jingwen Leng (Shanghai Jiao Tong University)
Chao Li (Shanghai Jiao Tong University)
Minyi Guo (Shanghai Jiao Tong University)
Handling Heavy-tailed Input of Transformer Inference on GPUs
Jiangsu Du (Sun Yat-sen University)
Jiazhi Jiang (Sun Yat-sen University)
Yang You (National University of Singapore)
Dan Huang (Sun Yat-sen University)
Yutong Lu (Sun Yat-sen University)
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs
Shihui Song (University of Iowa)
Peng Jiang (University of Iowa)