Publications
Publications by category in reverse chronological order. Generated by jekyll-scholar.
2025
- [Preprint] KVFlow: Efficient Prefix Caching for Accelerating LLM-Based Multi-Agent Workflows. arXiv preprint arXiv:2507.07400, 2025
- [SOSP’25] Mercury: Unlocking Multi-GPU Operator Optimization for Large Language Models via Remote Memory Scheduling. In Proceedings of the 31st Symposium on Operating Systems Principles, 2025
- [SOSP’25] HedraRAG: Co-Optimizing Generation and Retrieval for Heterogeneous Retrieval-Augmented Generation Workflows. In Proceedings of the 31st Symposium on Operating Systems Principles, 2025
2024
2023
- [ASPLOS’23] RECom: A Compiler Approach to Accelerating Recommendation Model Inference with Massive Embedding Columns. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 2023