Associate Professor Shuaiwen Song
Affiliated Professor, University of Washington
IEEE Mid-Career Award for Scalable Computing
IEEE Early Career Award for High Performance Computing
Australia's Most Innovative Engineers
U.S. DoE Lab Pathway to Excellence Awardee
I am the SOAR associate professor (tenured) at the School of Computer Science at the University of Sydney, and I direct the Future System Architecture Lab (FSA). I am a visiting professor at the University of Washington and Microsoft. I hold an Affiliated Professor position with the University of Washington Electrical Engineering department. Prior to my appointment at the University of Sydney, I worked for a U.S. Department of Energy lab for five years as a senior staff scientist and technical lead. In 2017 and 2022, I received the IEEE HPC early career award and the IEEE mid-career award for scalable computing, respectively. I also received the 2022 Alibaba Global Faculty Award (AIR), the 2022 SOAR Fellowship, the 2022/2021 Google Brain Collaboration Award, the 2021 Facebook faculty award, and the 2020 Australia's Most Innovative Engineer Award, and I am an ACM distinguished speaker. I am also a Lawrence Scholar and a recipient of the Paul E. Torgersen excellent research award, a 2018 DOE pathway to excellence research award, the 2015 and 2017 DOE PNNL lab outstanding research awards, two Supercomputing (IEEE/ACM SC) best paper runner-up awards (2015 and 2017), and a 2017 HiPEAC paper award. I have published in the top HPC and computer architecture conferences, including ISCA, HPCA, ASPLOS, MICRO, and Supercomputing (SC). My past and current research has been supported by Microsoft, Google, NVIDIA, Intel, U.S. government agencies including the DOE Office of Science (ASCR), DoD, DARPA and DoE Lab LDRD, and the Australian Research Council (ARC). During my tenure at PNNL, I led two DOE lab LDRD projects on AI-driven architecture design and large-scale data analytics acceleration. At the University of Sydney, I run the Future System Architecture (FSA) Lab with my wonderful students. In my spare time, I also consult for tech startups.
- High Performance Computing System Design (e.g., System ML)
- Hardware-Software System Co-design
- Emerging architectures (e.g., heterogeneous architectures, emerging many-core accelerators, novel memory technologies and quantum architectures)
- Futuristic system and architecture exploration (e.g., planet-scale XR system design, cross-stack design for quantum systems)
Teaching: Operating System (COMP3520)
Concurrency and Parallelism (SOFT3410): System Hardware Perspectives
Future System Architecture Lab: https://www.fsa-lab.org/
Large-Scale Sparse Model Design: with Google
Tiered memory data center system design: with Google
Core system and compiler optimizer designs for scalable data center: with Microsoft and Alibaba
Planet-scale XR system design and its impact on the society: with Meta
Future heterogeneous computing and storage system exploration: with Australian Research Council (ARC)
Von Neumann Inspired Quantum System Architecture and language interface
University of Washington, ECE
Microsoft
- 2022 IEEE Mid-Career Award for Scalable Computing
- 2022 Alibaba AIR Faculty Award
- 2022 Google Collaboration Award
- 2022 SOAR Fellowship Award
- 2021 Facebook Faculty Award
- 2021 Google GCP Award for Google Brain collaboration.
- ACM distinguished speaker
- 2020 Australia's Most Innovative Engineer Award
- 2020 Google collaboration resource award for Machine Learning System research
- 2018 U.S. DOE Pathway to Excellence Research Award
- 2018 U.S. Department of Energy Research Spotlight
- 2018 Best paper finalist for IISWC
- 2017 IEEE Early Career Award in High Performance Computing (IEEE/ACM outstanding newcomer award in high performance computing; I am the first winner of the award with a Chinese background).
- Invited judge for U.S. DOE R&D 100 award in 2017 and 2018
- 2017 HiPEAC Paper Award
- 2017 Best paper runner-up for IEEE/ACM Supercomputing (SC'17)
- 2015 Best paper runner-up for IEEE/ACM Supercomputing (SC'15)
- 2016 and 2017 U.S. DOE PNNL lab Outstanding Performance Award
- Paul E. Torgersen excellent research award
- Lawrence Scholar
Project title | Research student |
---|---|
Optimizing Large Generative Models: Algorithmic Advancements and System Design Strategies | Bobbie BIE |
Optimize use of non volatile memory | William WU |
Designing Efficient GPU-based Systems for Large Language Model Inference | Haojun XIA |
Comprehensive Optimization of Deep Learning: Efficient and Effective Algorithm and System Design | Zhongzhu Charlie ZHOU |
Compiler and Runtime Optimizations for Complex Workload Patterns on Modern GPUs | Donglin ZHUANG |
Publications
Journals
- Huan, C., Wu, Y., Liu, H., Liu, Y., Zhang, H., Song, S., Pandey, S., Chen, S., Fang, X., Jin, Y., et al. (2024). TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture. ACM Transactions on Architecture and Code Optimization, 21(2), 37.
- Huan, C., Liu, Y., Zhang, H., Liu, H., Chen, S., Song, S., Wu, Y. (2024). TeGraph+: Scalable Temporal Graph Processing Enabling Flexible Edge Modifications. IEEE Transactions on Parallel and Distributed Systems, 35(8), 1469-1487.
- Wang, J., Wang, Z., Yu, B., Tang, J., Song, S., Liu, C., Hu, Y. (2023). Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How? IEEE Internet of Things Journal.
Conferences
- Wen, Y., Xie, C., Song, S., Fu, X. (2023). Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing. Proceedings - International Symposium on High-Performance Computer Architecture: IEEE Computer Society.
- Zheng, Z., Yang, X., Zhao, P., Long, G., Zhu, K., Zhu, F., Zhao, W., Liu, X., Yang, J., Zhai, J., Song, S., et al. (2022). AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures. ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2022.
- Liu, S., Wang, J., Wang, Z., Yu, B., Hu, W., Liu, Y., Tang, J., Song, S., Liu, C., Hu, Y. (2022). Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System. Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS: SPIE.
2024
- Huan, C., Wu, Y., Liu, H., Liu, Y., Zhang, H., Song, S., Pandey, S., Chen, S., Fang, X., Jin, Y., et al. (2024). TEA+: A Novel Temporal Graph Random Walk Engine with Hybrid Storage Architecture. ACM Transactions on Architecture and Code Optimization, 21(2), 37.
- Huan, C., Liu, Y., Zhang, H., Liu, H., Chen, S., Song, S., Wu, Y. (2024). TeGraph+: Scalable Temporal Graph Processing Enabling Flexible Edge Modifications. IEEE Transactions on Parallel and Distributed Systems, 35(8), 1469-1487.
2023
- Wang, J., Wang, Z., Yu, B., Tang, J., Song, S., Liu, C., Hu, Y. (2023). Data Fusion in Infrastructure-Augmented Autonomous Driving System: Why? Where? and How? IEEE Internet of Things Journal.
- Yang, Y., Xie, C., Liu, L., Leong, P., Song, S. (2023). Efficient Radius Search for Adaptive Foveal Sizing Mechanism in Collaborative Foveated Rendering Framework. IEEE Transactions on Mobile Computing.
- Wang, L., Wan, Q., Ma, P., Wang, J., Chen, M., Song, S., Fu, X. (2023). Enabling High-Efficient ReRAM-Based CNN Training Via Exploiting Crossbar-Level Insignificant Writing Elimination. IEEE Transactions on Computers, 72(11), 3218-3230.
2022
- Zheng, Z., Yang, X., Zhao, P., Long, G., Zhu, K., Zhu, F., Zhao, W., Liu, X., Yang, J., Zhai, J., Song, S., et al. (2022). AStitch: Enabling a New Multi-dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures. ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2022.
- Liu, S., Wang, J., Wang, Z., Yu, B., Hu, W., Liu, Y., Tang, J., Song, S., Liu, C., Hu, Y. (2022). Brief Industry Paper: The Necessity of Adaptive Data Fusion in Infrastructure-Augmented Autonomous Driving System. Proceedings of the IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS: SPIE.
- Zhai, J., Zheng, L., Zhang, F., Tang, X., Wang, H., Yu, T., Jin, Y., Song, S., Chen, W. (2022). Detecting Performance Variance for Parallel Applications Without Source Code. IEEE Transactions on Parallel and Distributed Systems, 33(12), 4239-4255.
2021
- Zhang, H., Li, L., Zhuang, D., Liu, R., Song, S., Tao, D., Wu, Y., Song, S. (2021). An efficient uncertain graph processing framework for heterogeneous architectures. 26th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2021, United States: Association for Computing Machinery (ACM).
- Zhang, C., Yuan, G., Niu, W., Tian, J., Jin, S., Zhuang, D., Jiang, Z., Wang, Y., Ren, B., Song, S., et al. (2021). ClickTrain: Efficient and accurate end-to-end deep learning training via fine-grained architecture-preserving pruning. 35th ACM International Conference on Supercomputing, ICS 2021, New York: Association for Computing Machinery (ACM).
- Zhang, X., Fu, X., Zhuang, D., Xie, C., Song, S. (2021). Enabling Highly Efficient Capsule Networks Processing through Software-Hardware Co-Design. IEEE Transactions on Computers, 70(4), 495-510.
2020
- Zhang, X., Song, S., Xie, C., Wang, J., Zhang, W., Fu, X. (2020). Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design. 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), Piscataway: Institute of Electrical and Electronics Engineers (IEEE).
- Tan, J., Yan, K., Song, S., Fu, X. (2020). Energy-Efficient GPU L2 Cache Design Using Instruction-Level Data Locality Similarity. ACM Transactions on Design Automation of Electronic Systems, 25(6), 52:1-52:18.
- Li, A., Song, S., Chen, J., Li, J., Liu, X., Tallent, N., Barker, K. (2020). Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect. IEEE Transactions on Parallel and Distributed Systems, 31(1), 94-110.
2019
- Li, A., Geng, T., Wang, T., Herbordt, M., Song, S., Barker, K. (2019). BSTC: A Novel Binarized-Soft-Tensor-Core Design for Accelerating Bit-Based Approximated Neural Nets. The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'19), New York: Association for Computing Machinery (ACM).
- Tan, J., Yan, K., Song, S., Fu, X. (2019). LoSCache: Leveraging Locality Similarity to Build Energy-Efficient GPU L2 Cache. 22nd Design, Automation and Test in Europe Conference and Exhibition, DATE 2019, USA: Institute of Electrical and Electronics Engineers Inc.
- Geng, T., Wang, T., Wu, C., Yang, C., Song, S., Li, A., Herbordt, M. (2019). LP-BNN: Ultra-low-latency BNN inference with layer parallelism. 30th IEEE International Conference on Application-Specific Systems, Architectures and Processors, ASAP 2019, USA: Institute of Electrical and Electronics Engineers Inc.
2018
- Shen, D., Li, A., Song, S., Liu, X. (2018). CUDAAdvisor: LLVM-Based Runtime Profiling for Modern GPUs. 16th International Symposium on Code Generation and Optimization, CGO 2018, USA: Association for Computing Machinery (ACM).
- Roy, P., Krishnamoorthy, S., Song, S., Liu, X. (2018). Lightweight Detection of Cache Conflicts. 16th International Symposium on Code Generation and Optimization, CGO 2018, USA: Association for Computing Machinery (ACM).
- Roy, P., Song, S., Krishnamoorthy, S., Vishnu, A., Sengupta, D., Liu, X. (2018). NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks. ACM Transactions on Architecture and Code Optimization, 15(2).
2017
- Li, A., Zhao, W., Song, S. (2017). BVF: Enabling significant on-chip power savings via bit-value-favor for throughput processors. 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2017, USA: IEEE Computer Society.
- Qiu, J., Zhao, Z., Wu, B., Vishnu, A., Song, S. (2017). Enabling scalability-sensitive speculative parallelization for FSM computations. 31st ACM International Conference on Supercomputing, ICS 2017, USA: Association for Computing Machinery (ACM).
- Zhang, N., Jiang, C., Sun, X., Song, S. (2017). Evaluating GPGPU memory performance through the C-AMAT Model. 2017 Workshop on Memory Centric Programming for HPC, MCHPC 2017, USA: Association for Computing Machinery (ACM).
2016
- Tan, J., Song, S., Yan, K., Fu, X., Marquez, A., Kerbyson, D. (2016). Combating the Reliability Challenge of GPU Register File at Low Supply Voltage. 25th International Conference on Parallel Architectures and Compilation Techniques, PACT 2016, USA: Institute of Electrical and Electronics Engineers.
- Li, A., Song, S., Kumar, A., Zhang, E., Chavarria-Miranda, D., Corporaal, H. (2016). Critical points based register-concurrency autotuning for GPUs. 19th Design, Automation and Test in Europe Conference and Exhibition, DATE 2016, USA: Institute of Electrical and Electronics Engineers.
- Tao, D., Song, S., Krishnamoorthy, S., Wu, P., Liang, X., Zhang, E., Kerbyson, D., Chen, Z. (2016). New-Sum: A Novel Online ABFT Scheme For General Iterative Methods. 25th ACM International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2016, USA: Association for Computing Machinery (ACM).
2015
- Sengupta, D., Agarwal, K., Song, S., Schwan, K. (2015). GraphReduce: Large-Scale Graph Analytics on Accelerator-Based HPC Systems. 29th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2015, Piscataway: Institute of Electrical and Electronics Engineers (IEEE).
- Sengupta, D., Song, S., Agarwal, K., Schwan, K. (2015). GraphReduce: Processing large-scale graphs on accelerator-based systems. International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2015, USA: IEEE Computer Society.
- Shrestha, S., Manzano, J., Marquez, A., Zuckerman, S., Song, S., Gao, G. (2015). Gregarious Data Re-structuring in a Many Core Architecture. 17th IEEE International Conference on High Performance Computing and Communications, IEEE 7th International Symposium on Cyberspace Safety and Security and IEEE 12th International Conference on Embedded Software and Systems, HPCC-ICESS-CSS 2015, USA: Institute of Electrical and Electronics Engineers.
2014
- Marquez, A., Manzano, J., Song, S., Meister, B., Shrestha, S., St. John, T., Gao, G. (2014). ACDT: Architected Composite Data Types Trading-in Unfettered Data Access for Improved Execution. 20th IEEE International Conference on Parallel and Distributed Systems, ICPADS 2014, USA: IEEE Computer Society.
- You, Y., Song, S., Kerbyson, D. (2014). An adaptive cross-architecture combination method for graph traversal. 28th ACM International Conference on Supercomputing, ICS 2014, USA: Association for Computing Machinery (ACM).
- You, Y., Fu, H., Song, S., Dehnavi, M., Gan, L., Huang, X., Yang, G. (2014). Evaluating multi-core and many-core architectures through accelerating the three-dimensional Lax-Wendroff correction stencil. International Journal of High Performance Computing Applications, 28(3), 301-318.
2013
- Song, S., Su, C., Rountree, B., Cameron, K. (2013). A simplified and accurate model of power-performance efficiency on emergent GPU architectures. 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013, USA: IEEE Computer Society.
- Vishnu, A., Song, S., Marquez, A., Barker, K., Kerbyson, D., Cameron, K., Balaji, P. (2013). Designing energy efficient communication runtime systems: A view from PGAS models. Journal of Supercomputing, 63(3), 691-709.
- Li, B., Song, S., Bezakova, I., Cameron, K. (2013). EDR: An Energy-Aware Runtime Load Distribution System for Data-Intensive Applications in the Cloud. 15th IEEE International Conference on Cluster Computing, CLUSTER 2013, USA: IEEE Computer Society.
2011
- Li, A., Song, S., Brugel, E., Kumar, A., Chavarria-Miranda, D., Corporaal, H. (2011). X: A Comprehensive Analytic Model for Parallel Machines. 30th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2016, USA: Institute of Electrical and Electronics Engineers Inc.
Selected Grants
2022
- A Tale of Two Cities: Exploring Principles and Key Technologies of Multi-Scale Multi-Dimensional Machine Learning Inference System Optimization, Song S, Alibaba DAMO Academy/Alibaba Innovative Research (AIR) Programme
2021
- SOAR Prize, Song S, DVC Research/SOAR Prizes
- How Virtual Reality Disrupts Dignity In The Digital Era, Song S, Facebook Research (USA)/Research Award
- Adaptive Key-value Store for Future Extreme Heterogeneous Systems, Fekete A, Zwaenepoel W, Song S, Gopinath R, Scholz B, Australian Research Council (ARC)/Discovery Projects (DP)
Professional Services and Activities
Organizing Committee
- Area chair: Architecture and Networks, Supercomputing (SC) 2022
- Area chair: Architecture (IPDPS 2021)
- Area chair: Performance (ICPP 2020)
- Best paper selection panel chair: IPDPS 2021
- ACM ASPLOS'18, poster chair and ACM student research competition (SRC)
- Publicity Chair, ACM HPDC, 2016, 2017, 2018 and 2019
- Area chair, 2020 International Conference on Parallel Processing (ICPP)
- Publication chair, 2018 ACM International Conference on Supercomputing (ICS)
- Workshop chair and steering committee, International Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM)
- Workshop chair, International workshop on High-Performance, Power-Aware Computing (HPPAC)
Journal Editorial
- Editor, Elsevier High-Confidence Computing.
- Associate editor: IEEE Transactions on Sustainable Computing.
- IEEE Transactions on Parallel and Distributed Systems (TPDS) review board
- ACM Transactions on Computer Systems (TOCS), expert reviewer
- Journal for Concurrency and Computation, Practice and Experience (CCPE), review board
Program Committee
- PC, the 50th IEEE/ACM International Symposium on Computer Architecture (ISCA), 2023
- PC, Neural Information Processing Systems (NIPS) 2023.
- PC, ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2023
- PC, 54th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2022
- ERC, 49th International Symposium on Computer Architecture (ISCA), 2022
- Session chair, 48th International Symposium on Computer Architecture (ISCA), 2021
- ERC, 54th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2021
- PC, 48th International Symposium on Computer Architecture (ISCA), 2021
- PC, 2021 IEEE/ACM Supercomputing (SC)
- PC, ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2021
- ERC, 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021
- ERC, IEEE/ACM International Symposium on Microarchitecture (MICRO-53), 2020
- ERC, 47th International Symposium on Computer Architecture (ISCA), 2020
- ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP), 2019-2020
- ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), 2016-2018, 2020
- IEEE/ACM International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2015-2017
- IEEE/ACM International Conference on Parallel Architectures and Compilation Techniques (PACT), 2019
- ACM International Conference on Supercomputing (ICS), 2017
- IEEE International Conference on Distributed Computing Systems (ICDCS), 2020
- IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2016-2018
- NSF proposal panelist, 2015
- R&D award judge, 2018-2019
Journal Invited Reviewer
- ACM Transactions on Computer Systems (TOCS), 2020
- ACM Transactions on Architecture and Code Optimization (TACO), 2018
- IEEE Transactions on Parallel and Distributed Systems (TPDS), 2014-2020
- IEEE Transactions on Computers (TC), 2015-2016
- IEEE Transactions on Sustainable Computing, 2019-2020
- Journal of Parallel and Distributed Computing - Elsevier (JPDC)
- The International Journal of High Performance Computing (IJHPCA)
- The Journal of Supercomputing Elsevier (JOS)
- Parallel Computing-Elsevier (ParCo)
In the media
- DOE featured research highlight: Unlocking On-Package Memory’s Effects on High-Performance Computing’s Scientific Kernels
- Digital Trends: Boosting graphics performance through processing in-memory
- Yahoo Tech: ‘Processing in memory’ technique yields 65% boost in 3D rendering speeds
- Bit-Tech: Researchers boost graphics performance through processing in-memory
- PNNL featured research news: Changing the game: Researchers unlock hardware's hidden talent for rendering 3D graphics for science — and video games
- PNNL research highlight: Changing the game
- PNNL research spotlight: Improving computing system performance, March 2016.
- "Powering Down", an article about my work on power management for large-scale systems, published as a featured article in DOE DEIXIS magazine, written by Monte Basgall
- PNNL Science Research Highlight: "Energy Star: Novel models of HPC systems depict the interplay between energy efficiency and resilience", 2015.
- PNNL ACMDD staff award and honors: PNNL HPC Staff Take on Energy Efficiency, Resilience at scale, 2015.
- PNNL ACMDD staff award and honors: PNNL HPC Staff research: Improving Energy, Performance Efficiency for High Performance Computing, 2015.
- Current Magazine: "HPC system modeling: Depicting interplay between energy efficiency and resilience", June 2015 issue.
- InsideHPC: "PNNL looks at undervolting to meet exascale goals", written by Rich Bruecker
- HPC Wire Top Feature Article: "Tackling the Power and Energy wall for Future HPC Systems", Dec, 2013.