publications

publications by category in reverse chronological order. generated by jekyll-scholar.

2025

  1. OSDI’25
    WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training (To Appear)
    Zheng Wang, Anna Cai, Xinfeng Xie, Zaifeng Pan, Yue Guan, Weiwei Chu, Jie Wang, Shikai Li, Jianyu Huang, Chris Cai, Yuchen Hao, and Yufei Ding
    In 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
  2. USENIX ATC’25
    PluS: Highly Efficient and Expandable ML Compiler with Pluggable Graph Schedules (To Appear)
    Ruofan Wu, Zhen Zheng, Feng Zhang, Chuanjie Liu, Zaifeng Pan, Jidong Zhai, and Xiaoyong Du
    In USENIX Annual Technical Conference, 2025
  3. MLSys’25
    FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference
    Zaifeng Pan, Yitong Ding, Yue Guan, Zheng Wang, Zhongkai Yu, Xulong Tang, Yida Wang, and Yufei Ding
    In Proceedings of Machine Learning and Systems, 2025

2024

  1. SC’24
    RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules
    Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, and Yufei Ding
    In International Conference for High Performance Computing, Networking, Storage and Analysis, 2024

2023

  1. SIGMOD’24
    BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach
    Zhen Zheng, Zaifeng Pan, Dalin Wang, Kai Zhu, Wenyi Zhao, Tianyou Guo, Xiafei Qiu, Minmin Sun, Junjie Bai, Feng Zhang, Xiaoyong Du, Jidong Zhai, and Wei Lin
    Proceedings of the ACM on Management of Data, 2023
  2. ASPLOS’23
    RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns
    Zaifeng Pan, Zhen Zheng, Feng Zhang, Ruofan Wu, Hao Liang, Dalin Wang, Xiafei Qiu, Junjie Bai, Wei Lin, and Xiaoyong Du
    In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 2023
    🏆  Distinguished Artifact Award (presented at ASPLOS’24)

2021

  1. TPDS
    G-SLIDE: A GPU-based sub-linear deep learning engine via LSH sparsification
    Zaifeng Pan, Feng Zhang, Hourun Li, Chenyang Zhang, Xiaoyong Du, and Dong Deng
    IEEE Transactions on Parallel and Distributed Systems, 2021
  2. TPDS
    Exploring data analytics without decompression on embedded GPU systems
    Zaifeng Pan, Feng Zhang, Yanliang Zhou, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du
    IEEE Transactions on Parallel and Distributed Systems, 2021
  3. ICDE’21
    G-TADOC: Enabling efficient GPU-based text analytics without decompression
    Feng Zhang, Zaifeng Pan, Yanliang Zhou, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du
    In 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021