Model Quantization & Neural Network Compression
Building AI that fits on your phone
I work on making large AI models (LLMs, vision networks, perception pipelines) run fast on mobile and edge hardware without sacrificing quality. Based at Qualcomm AI Research, San Diego.
I am a Staff AI Researcher at Qualcomm AI Research in San Diego, focused on AI efficiency: compressing and quantizing large neural networks so they run fast and accurately on-device, from smartphones to autonomous vehicles. My core work spans post-training quantization, mixed-precision compression, and making LLMs and vision models viable on Qualcomm's Snapdragon hardware.
Before moving to AI Research, I spent four years on Qualcomm's Wireless R&D team applying deep learning to 5G/6G problems: building neural RF-SLAM systems for GPS-free indoor positioning, constructing digital twin networks, and developing AI-native localization pipelines, including a live demo at MWC 2024 in Barcelona.
I hold a Ph.D. in Electrical and Computer Engineering from Boise State University (dissertation: Reinforcement Learning in Self-Organizing Cellular Networks, advised by Prof. Hani Mehrpouyan), and was a visiting scholar at the University of Texas at Austin collaborating with Prof. Jeffrey Andrews.
Research on model quantization and compression for efficient on-device AI inference, with a focus on deploying Mixture-of-Experts (MoE) models across the full hardware spectrum.
Applied deep learning to 5G/6G positioning, mapping, and network intelligence.
Collaboration with Prof. Jeffrey Andrews on large-scale 5G HetNet simulation and reinforcement learning for network optimization.
Dissertation: Reinforcement Learning in Self-Organizing Cellular Networks — Advisor: Prof. Hani Mehrpouyan.