Xiaoquan Kong - AI Agent & Systems Leader

About Me

I am an AI leader specializing in agentic systems, with over a decade of experience engineering AI solutions at tech giants like Baidu and Alibaba. I have built systems that serve more than 100K daily queries and save an estimated $15M+ annually. I am now building on that industry experience at Duke University, where, as a Research Assistant on the QUBIT project, I created a RAG-enhanced, multimodal, multi-agent system that lets faculty analyze student learning patterns through natural-language queries. My goal is to apply my expertise in both large-scale engineering and agentic AI to lead high-impact product initiatives.

Academic & Competitive Highlights

1st Place, Duke AI Hackathon: Won the most competitive Education track by architecting a multimodal RAG and Agent system powered by Google's large language models.

1st Place, Kaggle Competition: Set a record-breaking score in the "Modeling Process & Algorithms" course competition through a meticulously optimized ML pipeline.

1st Place, Business Simulation: Led a team to victory in a 4-week business challenge, excelling in market analysis, financial forecasting, and strategic decision-making.

Education: MEng in AI for Product Innovation at Duke University (GPA: 3.9/4.0).

Professional Experience

Over 10 years of hands-on experience architecting and delivering high-impact AI solutions at scale for industry-leading technology companies.

Machine Learning Algorithm Expert

Baidu Group | 2022 - 2024

Architected and led the development of an end-to-end RAG-based agentic system for Q&A (100K+ daily queries, 92% user satisfaction), utilizing Milvus and Elasticsearch for hybrid search.
Drove the optimization of a core query rewriting system via LLM fine-tuning (SFT with LoRA on Qwen1.5-14B), improving accuracy to 95% while cutting P99 latency to 0.8 s.
Contributed to the AI engineering roadmap by evaluating emerging technologies, including early RLHF experiments with Hugging Face TRL that informed strategic tuning decisions.

Machine Learning Algorithm Expert

Geely Group | 2018 - 2022

Architected the conversational AI platform deployed in over 1 million vehicles across major brands like Geely, Volvo, and BMW.
Recruited, mentored, and led a high-performance, cross-functional team of 10+ NLP and software engineers.
Spearheaded the successful transition of the core NLU service from a legacy rule-based engine to a modern deep learning-based system.

Senior Machine Learning Engineer

Alibaba Group | 2016 - 2018

Engineered a customer service bot for Ele.me that handled over 300,000 daily queries, reducing manual workload by 28% and saving an estimated $15M+ annually.
Pioneered the use of neural networks for semantic matching, creating a core engine that outperformed traditional ML models in production.

Skills & Expertise

A versatile skill set spanning AI research, production-grade software engineering, and strategic leadership.

AI/ML Engineering & Research

LLMs & Agents: Agentic Design, RAG, Tool Use, Fine-tuning (SFT, LoRA)
Core Frameworks: LangGraph, PyTorch, TensorFlow (Codeowner), JAX
Deep Learning: Transformers, NLP, Computer Vision
Tooling: spaCy (Contributor), Rasa (Superhero), Hugging Face, W&B

Software & Systems Engineering

Languages: Python (Expert), Rust, C++
System Design: High-performance APIs, Scalable Architecture
Databases: Vector DBs (Pinecone, Milvus), Elasticsearch, SQL, NoSQL
MLOps & Cloud: Docker, K8s, TFX, CI/CD, GCP, AWS (Bedrock, SageMaker)

Leadership & Product

Strategy: Technical & Product Vision, Market Analysis
Management: Team Leadership, Cross-functional Collaboration
Communication: Technical Writing, Public Speaking, Mentorship
Business Acumen: Duke MEng in Business & Management

Selected Projects

A showcase of end-to-end AI systems, high-performance frameworks, and impactful applications. These projects demonstrate deep technical expertise and a commitment to solving complex problems with production-ready solutions.


OmniRAG

An enterprise-grade, multimodal RAG system that provides evidence-based answers from text, images, and video. Built with Google Vertex AI and Pinecone for auditable, high-stakes decision-making.

Rust LLM Inference

A high-performance LLM engine built in Rust to address the need for efficient on-device AI. Achieves 2.5x memory efficiency and 1.8x speedup over Python baselines.

CampusAgent

LangGraph-powered multi-agent RAG system for a university chatbot, architected for complex query decomposition and showcasing advanced agentic design patterns.

DOTA2 Draft Master

An end-to-end ML application that delivers real-time draft recommendations with <10ms latency. Built on 1M+ matches using PyTorch and ONNX; recognized by professional analytics firms.

Deep Fundamentals

Demonstrates a first-principles understanding of AI by building core technologies from scratch, including a PyTorch-compatible ML framework, compilers, and regex engines.

Publications & Thought Leadership

A recognized voice in the AI community, sharing knowledge through best-selling books, technical speaking engagements, and open-source contributions.

Authored Books

Conversational AI with Rasa

An enterprise-focused guide to building production-grade chatbots. Foreword by Alan Nichol, Co-founder & CTO of Rasa.


Machine Learning Pipelines in Action (CN)

Lead translator for the official Chinese edition of "Building Machine Learning Pipelines", recommended by the TensorFlow team.

Also authored "Practical Rasa" (CN) and co-authored "spaCy for Natural Language Processing" (CN), further establishing expertise in the NLP domain.

Community Leadership & Recognition

Google Developer Expert (GDE) in Machine Learning

Selected by Google since 2018 for exceptional technical expertise and community leadership.

Key Open Source Contributor

Contributor to cutting-edge agent frameworks including Google Agent Development Kit (ADK) and Browser-Use.

TensorFlow Addons Codeowner

Served as a maintainer and core contributor to the official TensorFlow Addons library, a key part of the TF ecosystem.

Google Summer of Code Mentor

Selected by the TensorFlow team to mentor students for Google Summer of Code 2022.

Rasa Superhero & Core Contributor

Recognized as a "Superhero" for significant contributions to the Rasa framework and leadership in the Chinese-speaking community.

Selected Speaking Engagements

Presenter at global conferences including The Level 3 AI Assistant Conference, Google DevFests, and CSDN Creator's Night.

Contact

Feel free to reach out for consulting opportunities, speaking engagements, or technical collaborations.

Email: xiaoquan.kong@duke.edu | LinkedIn | GitHub