Program

ICS YouTube Channel

Recorded presentations and live session videos.

Workshops and Tutorials

Workshops and tutorials will take place on June 27, 2022.

Monday, June 27, 11:00-17:00 EDT Tutorial

Programming extremely heterogeneous systems with the Minos Computing Library (MCL)
Roberto Gioiosa (PNNL)
Ryan Friese (PNNL)

Format: Lectures + hands-on sessions
Duration: Half Day
Tutorial website

Monday, June 27, 14:00-17:30 EDT Tutorial

Programming of Parallel Distributed Systems Made Easy with the SHAD C++ Library
Vito Giovanni Castellana, Pacific Northwest National Laboratory
Marco Minutoli, Pacific Northwest National Laboratory

Duration: Half Day
Tutorial website

Monday, June 27, 14:00-15:30 EDT Tutorial

MAERI-FPGA: Enabling HW Design Space Exploration on Real FPGA Hardware Platform
Tushar Krishna, Georgia Tech
Jianming Tong, Georgia Tech

Duration: Half Day
Tutorial website

Monday, June 27, 9:00-15:30 EDT Workshop

3rd Workshop on Heterogeneous Memory Systems (HMEM)
Joao Pedro Barreto, Universidade de Lisboa and INESC-ID
Harald Servat, Intel
Antonio J. Peña, Barcelona Supercomputing Center (BSC)

Duration: Half Day
Workshop website

Main Conference and Keynotes

The main conference will be held from June 28 to June 30, 2022.

Tuesday, June 28, 9:00-10:15 EDT
Keynote Talk (Chair: Lawrence Rauchwerger, University of Illinois Urbana-Champaign)

The Rise of Matrix Processing
Dr. José Moreira
Distinguished Researcher, IBM Research

Tuesday, June 28, 10:30-11:15 EDT
Session 1: Tools and Modeling (I) (Chair: Dimitrios Nikolopoulos, Virginia Tech)

Low Overhead and Context Sensitive Profiling of GPU-accelerated Applications
Keren Zhou (Rice University)
Jonathon Anderson (Rice University)
Xiaozhu Meng (Rice University)
John Mellor-Crummey (Rice University)

Calipers: A Criticality-aware Framework for Modeling Processor Performance
Hossein Golestani (University of Michigan)
Rathijit Sen (Microsoft)
Vinson Young (Microsoft)
Gagan Gupta (Microsoft)

Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models
Larissa Schmid (Karlsruhe Institute of Technology)
Marcin Copik (ETH Zurich)
Alexandru Calotoiu (ETH Zurich)
Dominik Werle (Karlsruhe Institute of Technology)
Andreas Reiter (University of Applied Sciences Karlsruhe)
Michael Selzer (Karlsruhe Institute of Technology)
Anne Koziolek (Karlsruhe Institute of Technology)
Torsten Hoefler (ETH Zurich)

Tuesday, June 28, 11:30-12:15 EDT
Session 2: New Hardware Technologies (Chair: Cristina Silvano, Politecnico di Milano)

ASAP: Automatic Synthesis of Area-Efficient and Precision-Aware CGRAs
Cheng Tan (Microsoft)
Thierry Tambe (Harvard University)
Jeff (Jun) Zhang (Harvard University)
Bo Fang (Pacific Northwest National Laboratory)
Tong Geng (Pacific Northwest National Laboratory)
Gu-Yeon Wei (Harvard University)
David Brooks (Harvard University)
Antonino Tumeo (Pacific Northwest National Laboratory)
Ganesh Gopalakrishnan (University of Utah)
Ang Li (Pacific Northwest National Laboratory)

Efficiently Emulating High-Bitwidth Computation with Low-Bitwidth Hardware
Zixuan Ma (Tsinghua University)
Haojie Wang (Tsinghua University)
Guanyu Feng (Tsinghua University)
Chen Zhang (Tsinghua University)
Lei Xie (Tsinghua University)
Jiao He (Tsinghua University)
Shengqi Chen (Tsinghua University)
Jidong Zhai (Tsinghua University)

SnuQS: Scaling Quantum Circuit Simulation using Storage Devices
Daeyoung Park (Seoul National University)
Heehoon Kim (Seoul National University)
Jinpyo Kim (Seoul National University)
Taehyun Kim (Seoul National University)
Jaejin Lee (Seoul National University)

LITE: A Low-Cost Practical Inter-Operable GPU TEE
Ardhi Wiratama Baskara Yudha (University of Central Florida)
Jake Meyer (University of Central Florida)
Shougang Yuan (University of Central Florida)
Huiyang Zhou (North Carolina State University)
Yan Solihin (University of Central Florida)

Tuesday, June 28, 13:00-13:45 EDT
Session 3: Graph Processing (Chair: Osman Unsal, Barcelona Supercomputing Center)

Software-Defined Floating-Point Number Formats and Their Application to Graph Processing
Hans Vandierendonck (Queen's University Belfast)

MASTIFF: Structure-Aware Minimum Spanning Tree/Forest
Mohsen Koohi Esfahani (Queen's University Belfast)
Peter Kilpatrick (Queen's University Belfast)
Hans Vandierendonck (Queen's University Belfast)

Efficient Exact K-Nearest Neighbor Graph Construction for Billion-Scale Datasets using GPUs with Tensor Cores
Zhuoran Ji (The University of Hong Kong)
Cho-Li Wang (The University of Hong Kong)

Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampling on Multi-Accelerator Systems
Heng Zhang (University of Sydney)
Lingda Li (Brookhaven National Laboratory)
Hang Liu (Stevens Institute of Technology)
Donglin Zhuang (University of Sydney)
Rui Liu (University of Chicago)
Chengying Huan (Tsinghua University)
Shuang Song (Meta)
Dingwen Tao (Washington State University)
Yongchao Liu (Ant Financial)
Charles He (Ant Financial)
Yanjun Wu (Chinese Academy of Sciences)
Shuaiwen Leon Song (University of Sydney)

Tuesday, June 28, 14:00-14:45 EDT
Session 4: I/O and Communication (Chair: Xiaoning Ding, New Jersey Institute of Technology)

CEAZ: Accelerating Parallel I/O via Hardware-Algorithm Co-Designed Adaptive Lossy Compression
Chengming Zhang (Washington State University)
Sian Jin (Washington State University)
Tong Geng (Pacific Northwest National Laboratory)
Jiannan Tian (Washington State University)
Ang Li (Pacific Northwest National Laboratory)
Dingwen Tao (Washington State University)

Towards Low-Latency I/O Services for Mixed Workloads Using Ultra-Low Latency SSDs
Mingzhe Liu (Huazhong University of Science and Technology)
Haikun Liu (Huazhong University of Science and Technology)
Chencheng Ye (Huazhong University of Science and Technology)
Xiaofei Liao (Huazhong University of Science and Technology)
Hai Jin (Huazhong University of Science and Technology)
Yu Zhang (Huazhong University of Science and Technology)
Ran Zheng (Huazhong University of Science and Technology)
Liting Hu (Virginia Tech)

Optimized MPI Collective Algorithms for Dragonfly Topology
Guangnan Feng (Sun Yat-sen University)
Dezun Dong (National University of Defense Technology)
Yutong Lu (Sun Yat-sen University)

Wednesday, June 29, 9:00-10:15 EDT
Keynote Talk (Chair: Kirk Cameron, Virginia Tech)

The Computing and Information Science and Engineering Landscape: A Look Forward
Dr. Margaret Martonosi
Hugh Trumbull Adams '35 Professor of Computer Science, Princeton University
Currently serving as Assistant Director for Computer and Information Science and Engineering (CISE) at NSF.

Wednesday, June 29, 10:30-11:15 EDT
Session 5: Compilers (Chair: Chen Ding, University of Rochester)

SparseLNR: Accelerating Sparse Tensor Computations Using Loop Nest Restructuring (Best Paper Award!)
Adhitha Dias (Purdue University)
Kirshanthan Sundararajah (Purdue University)
Charitha Saumya (Purdue University)
Milind Kulkarni (Purdue University)

VICO: Demand-driven Verification for Improving Compiler Optimizations
Sharjeel Khan (Georgia Institute of Technology)
Bodhisatwa Chatterjee (Georgia Institute of Technology)
Santosh Pande (Georgia Institute of Technology)

Lifting C Semantics for Dataflow Optimization
Alexandru Calotoiu (ETH Zurich)
Tal Ben-Nun (ETH Zurich)
Grzegorz Kwaśniewski (ETH Zurich)
Johannes de Fine Licht (ETH Zurich)
Timon Schneider (ETH Zurich)
Philipp Schaad (ETH Zurich)
Torsten Hoefler (ETH Zurich)

Wednesday, June 29, 11:30-12:15 EDT
Session 6: Algorithms on Accelerators (Chair: Ana Lucia Varbanescu, University of Amsterdam)

SnuHPL: High Performance LINPACK for Heterogeneous GPUs
Jinpyo Kim (Seoul National University)
Hyungdal Kwon (Seoul National University)
Jintaek Kang (Samsung Advanced Institute of Technology)
Jihwan Park (Samsung Advanced Institute of Technology)
Seungwook Lee (Samsung Advanced Institute of Technology)
Jaejin Lee (Seoul National University)

High Throughput Multidimensional Tridiagonal System Solvers on FPGAs
Kamalavasan Kamalakkannan (University of Warwick)
Gihan R. Mudalige (University of Warwick)
Istvan Z. Reguly (Pazmany Peter Catholic University)
Suhaib A. Fahmy (King Abdullah University of Science and Technology)

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs
André Müller (Johannes Gutenberg University)
Bertil Schmidt (Johannes Gutenberg University)
Richard Membarth (Technische Hochschule Ingolstadt)
Roland Leißa (University of Mannheim)
Sebastian Hack (Saarland University)

Parallel K-Clique Counting on GPUs
Mohammad Almasri (University of Illinois Urbana-Champaign)
Izzat El Hajj (American University of Beirut)
Rakesh Nagi (University of Illinois Urbana-Champaign)
Jinjun Xiong (University of Illinois Urbana-Champaign)
Wen-mei Hwu (NVIDIA and University of Illinois Urbana-Champaign)

Wednesday, June 29, 12:15-12:45 EDT
Special Session

Townhall Meeting with Dr. Margaret Martonosi, Assistant Director for Computer and Information Science and Engineering (CISE), US National Science Foundation

Wednesday, June 29, 13:00-13:45 EDT
Session 7: Memory Systems and Memory Management (Chair: Dionisios Pnevmatikatos, National Technical University of Athens)

Cloak: Tolerating Non-Volatile Cache Read Latency
Apostolos Kokolis (University of Illinois Urbana-Champaign)
Namrata Mantri (NVIDIA)
Shrikanth Ganapathy (Rivos Inc.)
Josep Torrellas (University of Illinois Urbana-Champaign)
John Kalamatianos (AMD Inc.)

Fast-Track Cache: A Huge Racetrack Memory L1 Data Cache
Hugo Tárrega (Universitat Politècnica de València)
Alejandro Valero (Universidad de Zaragoza)
Vicente Lorente (Universitat Politècnica de València)
Salvador Petit (Universitat Politècnica de València)
Julio Sahuquillo (Universitat Politècnica de València)

Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs
Minh Pham (University of South Florida)
Hao Li (University of South Florida)
Yongke Yuan (Beijing University of Technology)
Chengcheng Mou (University of South Florida)
Kandethody Ramachandran (University of South Florida)
Zichen Xu (Jiaxing Neofelis Scientific)
Yicheng Tu (University of South Florida)

MegTaiChi: Dynamic Tensor-based Memory Management Optimization for DNN Training
Zhongzhe Hu (Chinese Academy of Sciences)
Junmin Xiao (Chinese Academy of Sciences)
Zheye Deng (Megvii Technology)
Mingyi Li (Chinese Academy of Sciences)
Kewei Zhang (Chinese Academy of Sciences)
Xiaoyang Zhang (Chinese Academy of Sciences)
Ke Meng (Alibaba Group)
Ninghui Sun (Chinese Academy of Sciences)
Guangming Tan (Chinese Academy of Sciences)

Wednesday, June 29, 14:00-14:45 EDT
Session 8: Dense and Sparse Linear Algebra (Chair: Rong Ge, Clemson University)

Efficient, Out-of-Memory Sparse MTTKRP on Massively Parallel Architectures
Andy Nguyen (University of Oregon)
Ahmed E. Helal (Intel Labs)
Fabio Checconi (Intel Labs)
Jan Laukemann (University of Erlangen-Nürnberg)
Jesmin Jahan Tithi (Intel Labs)
Yongseok Soh (University of Oregon)
Teresa Ranadive (Laboratory of Physical Sciences)
Fabrizio Petrini (Intel Labs)
Jee W. Choi (University of Oregon)

Dense Dynamic Blocks: Optimizing SpMM for Processors with Vector and Matrix Units Using Machine Learning Techniques
Serif Yesil (University of Illinois Urbana-Champaign)
José E. Moreira (IBM Research)
Josep Torrellas (University of Illinois Urbana-Champaign)

Toward Accelerated Stencil Computation by Adapting Tensor Core Unit on GPU
Xiaoyan Liu (Beihang University)
Yi Liu (Beihang University)
Hailong Yang (Beihang University)
Jianjin Liao (Beihang University)
Mingzhen Li (Beihang University)
Zhongzhi Luan (Beihang University)
Depei Qian (Beihang University)

Thursday, June 30, 9:00-10:15 EDT
Keynote Talk (Chair: Dimitrios Nikolopoulos, Virginia Tech)

Large-Scale Visual Analysis in the Age of Data
Dr. Chris R. Johnson
Distinguished Professor of Computer Science
Founding Director, Scientific Computing and Imaging Institute
University of Utah

Thursday, June 30, 10:30-11:15 EDT
Session 9: Applications (Chair: Dionisios Pnevmatikatos, National Technical University of Athens)

KrakenOnMem: A Memristor-Augmented HW/SW Framework for Taxonomic Profiling
Taha Shahroodi (TU Delft)
Mahdi Zahedi (TU Delft)
Abhairaj Singh (TU Delft)
Stephan Wong (TU Delft)
Said Hamdioui (TU Delft)

GAPS: GPU-Acceleration of PDE Solvers for Wave Simulation
Bagus Hanindhito (UT Austin)
Dimitrios Gourounas (UT Austin)
Arash Fathi (ExxonMobil Technology and Engineering)
Dimitar Trenev (ExxonMobil Technology and Engineering)
Andreas Gerstlauer (UT Austin)
Lizy K. John (UT Austin)

Seamless Optimization of the GEMM Kernel for Task-based Programming Models
Arthur F. Lorenzon (Federal University of Pampa)
Sandro M. V. N. Marques (Federal University of Pampa)
Antoni Navarro (Barcelona Supercomputing Center)
Vicenç Beltran (Barcelona Supercomputing Center)

Thursday, June 30, 11:30-12:15 EDT
Session 10: Tools and Modeling (II) (Chair: Dingwen Tao, Washington State University)

Beyond Time Complexity: Data Movement Complexity Analysis for Matrix Multiplication
Wesley Smith (University of Rochester)
Aidan Goldfarb (University of Rochester)
Chen Ding (University of Rochester)

uiCA: Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures
Andreas Abel (Saarland University)
Jan Reineke (Saarland University)

Preparing for Performance Analysis at Exascale
Jonathon Anderson (Rice University)
Yumeng Liu (Rice University)
John Mellor-Crummey (Rice University)

Clairvoyant: A Log-Based Transformer-Decoder for Failure Prediction in Large-Scale Systems
Khalid Ayedh Alharthi (University of Warwick)
Arshad Jhumka (University of Warwick)
Sheng Di (Argonne National Laboratory)
Franck Cappello (Argonne National Laboratory)

Thursday, June 30, 13:00-13:45 EDT
Session 11: Machine Learning (Chair: Dong Li, University of California Merced)

A Data-Centric Optimization Framework for Machine Learning
Oliver Rausch (ETH Zurich)
Tal Ben-Nun (ETH Zurich)
Nikoli Dryden (ETH Zurich)
Andrei Ivanov (ETH Zurich)
Shigang Li (ETH Zurich)
Torsten Hoefler (ETH Zurich)

PAME: Precision-Aware Multi-Exit DNN Serving for Reducing Latencies of Batched Inferences
Shulai Zhang (Shanghai Jiao Tong University)
Weihao Cui (Shanghai Jiao Tong University)
Quan Chen (Shanghai Jiao Tong University)
Zhengnian Zhang (Shanghai Jiao Tong University)
Yue Guan (Shanghai Jiao Tong University)
Jingwen Leng (Shanghai Jiao Tong University)
Chao Li (Shanghai Jiao Tong University)
Minyi Guo (Shanghai Jiao Tong University)

Handling Heavy-tailed Input of Transformer Inference on GPUs
Jiangsu Du (Sun Yat-sen University)
Jiazhi Jiang (Sun Yat-sen University)
Yang You (National University of Singapore)
Dan Huang (Sun Yat-sen University)
Yutong Lu (Sun Yat-sen University)

Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs
Shihui Song (University of Iowa)
Peng Jiang (University of Iowa)

ACM ICS 2022