publications

publications by categories in reverse chronological order. generated by jekyll-scholar.

2025

  1. OSDI’25
    WLB-LLM: Workload-Balanced 4D Parallelism for Large Language Model Training (To Appear)
    Zheng Wang, Anna Cai, Xinfeng Xie, Zaifeng Pan, Yue Guan, Weiwei Chu, Jie Wang, Shikai Li, Jianyu Huang, Chris Cai, Yuchen Hao, and Yufei Ding
    In 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
  2. USENIX ATC’25
    PluS: Highly Efficient and Expandable ML Compiler with Pluggable Graph Schedules (To Appear)
    Ruofan Wu, Zhen Zheng, Feng Zhang, Chuanjie Liu, Zaifeng Pan, Jidong Zhai, and Xiaoyong Du
    In USENIX Annual Technical Conference, 2025
  3. MLSys’25
    FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference (To Appear)
    Zaifeng Pan, Yitong Ding, Yue Guan, Zheng Wang, Zhongkai Yu, Xulong Tang, Yida Wang, and Yufei Ding
    In Proceedings of Machine Learning and Systems, 2025

2024

  1. SC’24
    RecFlex: Enabling Feature Heterogeneity-Aware Optimization for Deep Recommendation Models with Flexible Schedules
    Zaifeng Pan, Zhen Zheng, Feng Zhang, Bing Xie, Ruofan Wu, Shaden Smith, Chuanjie Liu, Olatunji Ruwase, Xiaoyong Du, and Yufei Ding
    In International Conference for High Performance Computing, Networking, Storage and Analysis, 2024

2023

  1. SIGMOD’24
    BladeDISC: Optimizing Dynamic Shape Machine Learning Workloads via Compiler Approach
    Zhen Zheng, Zaifeng Pan, Dalin Wang, Kai Zhu, Wenyi Zhao, Tianyou Guo, Xiafei Qiu, Minmin Sun, Junjie Bai, Feng Zhang, Xiaoyong Du, Jidong Zhai, and Wei Lin
    Proceedings of the ACM on Management of Data, 2023
  2. ASPLOS’23
    RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns
    Zaifeng Pan, Zhen Zheng, Feng Zhang, Ruofan Wu, Hao Liang, Dalin Wang, Xiafei Qiu, Junjie Bai, Wei Lin, and Xiaoyong Du
    In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 2023
    🏆 Distinguished Artifact Award (presented at ASPLOS’24)

2021

  1. TPDS
    G-SLIDE: A GPU-Based Sub-Linear Deep Learning Engine via LSH Sparsification
    Zaifeng Pan, Feng Zhang, Hourun Li, Chenyang Zhang, Xiaoyong Du, and Dong Deng
    IEEE Transactions on Parallel and Distributed Systems, 2021
  2. TPDS
    Exploring Data Analytics Without Decompression on Embedded GPU Systems
    Zaifeng Pan, Feng Zhang, Yanliang Zhou, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du
    IEEE Transactions on Parallel and Distributed Systems, 2021
  3. ICDE’21
    G-TADOC: Enabling Efficient GPU-Based Text Analytics Without Decompression
    Feng Zhang, Zaifeng Pan, Yanliang Zhou, Jidong Zhai, Xipeng Shen, Onur Mutlu, and Xiaoyong Du
    In 2021 IEEE 37th International Conference on Data Engineering (ICDE), 2021