Syllabus¶
Overview¶
In recent years, natural language processing (NLP) research has undergone a massive transformation. The emergence of large language models (LLMs) has dramatically improved the ability to generate and understand text, revolutionizing application domains such as translation, question answering, and summarization. In 2024-2025, multimodal LLMs such as GPT-5 and Gemini 2.5 Pro that process text, images, and audio simultaneously have emerged, further expanding the scope of applications. Particularly noteworthy is the arrival of new architectures beyond the Transformer. For example, Mamba, a state space model (SSM), can efficiently process sequences of up to millions of tokens with linear O(n) complexity, while RWKV can handle conversational messages in real time at 10× or lower cost than existing approaches.
This course reflects these latest developments and provides hands-on training in deep learning-based NLP techniques. Students first learn to use core tools such as PyTorch and Hugging Face, then gain direct experience with fine-tuning Transformer-based models and the latest SSM architectures, prompt engineering, retrieval-augmented generation (RAG), reinforcement learning from human feedback (RLHF), and agent framework implementation. We also cover the latest parameter-efficient fine-tuning (PEFT) techniques (WaveFT, DoRA, VB-LoRA, etc.) and advanced RAG architectures (HippoRAG, GraphRAG), and practice cutting-edge concepts such as multimodal LLMs and ultra-long context processing. Finally, through team projects, students integrate what they have learned to implement complete models and applications that solve real problems.
This course is designed for third-year undergraduate students and assumes completion of the prerequisite course Language Models and Natural Language Processing (131107967A). Through team projects, students tackle real problems using Korean corpora, and in the final project phase, as part of industry-academia collaboration, they work with industry datasets and receive feedback from industry experts.
Learning Objectives¶
- Understand the role and limitations of large language models in modern NLP and use related tools such as PyTorch and Hugging Face.
- Understand the principles and trade-offs of state space models (e.g., Mamba, RWKV) alongside other recent architectures.
- Apply full fine-tuning or the latest parameter-efficient fine-tuning methods (WaveFT, DoRA, VB-LoRA) to pre-trained models.
- Learn to systematically optimize prompts using prompt engineering techniques and the DSPy framework.
- Understand the evolution of evaluation metrics (e.g., G-Eval, LiveCodeBench) and the importance of human evaluation, and learn recent alternatives to RLHF such as DPO (Direct Preference Optimization).
- Design and implement advanced retrieval-augmented generation (RAG) architectures such as HippoRAG and GraphRAG, along with hybrid search strategies.
- Understand AI regulatory frameworks such as the EU AI Act and acquire methodologies for implementing responsible AI systems.
- Track the latest research trends and discuss multimodal LLMs, small language models (SLMs), state space models (SSMs), multi-agent systems, mixture of experts (MoE), and other emerging technologies.
- Understand the characteristics and challenges of Korean NLP and develop applied skills through hands-on practice with Korean corpora.
- Strengthen collaboration and practical problem-solving capabilities through team projects and gain project experience connected to industry.
Course Schedule¶
| Week | Main Topics and Keywords | Key Hands-on/Assignments |
|---|---|---|
| 1 | Transformer and Next-Generation Architectures • Self-Attention Mechanism and Limitations • Mamba (SSM), RWKV, Jamba | Transformer Component Implementation • Mamba vs Transformer Performance Comparison • Architecture Complexity Analysis |
| 2 | PyTorch 2.x and Latest Deep Learning Frameworks • torch.compile Compiler Revolution • FlashAttention-3 Hardware Acceleration • AI Agent Frameworks | torch.compile Performance Optimization • FlashAttention-3 Implementation • AI Agent Framework Comparison |
| 3 | Modern PEFT Techniques for Efficient Fine-tuning • LoRA, DoRA, QLoRA • Advanced PEFT Techniques | PEFT Method Comparison Experiment • LoRA/DoRA/QLoRA Performance Evaluation • Memory Efficiency Analysis |
| 4 | Advanced Prompt Techniques and Optimization • Prompt Engineering Fundamentals • Self-Consistency, Tree of Thoughts • DSPy Framework | DSPy-based Automatic Prompt Optimization • Self-Consistency Implementation • Tree of Thoughts Problem Solving |
| 5 | LLM Evaluation Paradigms and Benchmarks • Evaluation Paradigm Evolution • LLM-as-a-Judge (GPTScore, G-Eval, FLASK) • Specialized and Domain-specific Benchmarks | G-Eval Implementation • Benchmark Comparison Experiment • Evaluation Bias Analysis |
| 6 | Multimodal NLP Advancements • Vision-Language Models (LLaVA, MiniGPT-4, Qwen-2.5-Omni) • Visual Reasoning (QVQ-Max) • Speech Integration | Multimodal QA Application Development • Vision-Language Model Comparison • End-to-end Multimodal System |
| 7 | Ultra-Long Context Processing and Efficient Inference • Context Window Revolution (1M+ tokens) • Attention Mechanism Optimization • LongRoPE and RAG Integration | FlashAttention-3 Integration • Long Context Processing Comparison • Performance Analysis |
| 8 | Core Review and Latest Trends • Architecture Review • Latest Model Trends (GPT-5, Gemini 2.5 Pro, Claude 4.1) • Industry Applications | Comprehensive Review • Model Comparison • Industry Case Analysis |
| 9 | Advanced RAG Systems – HippoRAG, GraphRAG, Hybrid Search Strategies | Assignment 3: Building Korean Enterprise Search System based on GraphRAG |
| 10 | Innovation in Alignment Techniques – DPO, Constitutional AI, Process Reward Models | Comparison Practice between DPO and Existing RLHF Techniques |
| 11 | Production Agent Systems – CrewAI, Mirascope, Type-Safety Development | Multi-agent Orchestration Implementation |
| 12 | AI Regulation and Responsible AI – EU AI Act, Differential Privacy, Federated Learning | Assignment for Designing Regulation-Compliant AI Systems |
| 13 | Ontology and AI – Modeling Reality and Operating it with AI • Data Science to Decision Science • Semantic Ontology, GraphRAG • Kinetic Ontology, Closed-Loop Systems | Semantic Ontology Modeling • GraphRAG Implementation • Closed-Loop Simulation |
| 14 | Final Project Development and MLOps | Team Prototype Implementation and Feedback Sessions (Industry Mentor Participation) |
| 15 | Final Project Presentations and Comprehensive Evaluation | Team Presentations, Course Content Summary and Future Prospects Discussion |
Weekly Educational Content¶
Week 1 – Transformer and Next-Generation Architectures¶
Core Topics¶
- Transformer Architecture: Self-attention mechanism, encoder-decoder structure, computational complexity \(O(N^2)\)
- Mamba Architecture: Selective State Space Model (SSM), linear time complexity \(O(N)\), hardware optimization through selective mechanisms
- RWKV Architecture: RNN-Transformer hybrid, parallel training capability, infinite context processing
- Jamba Architecture: Hybrid Transformer-Mamba with Mixture-of-Experts (MoE), long context window support, efficiency optimization
Hands-on/Activities¶
- Core Practice: Implement basic Transformer components (multi-head self-attention, positional encoding) and compare with Mamba's selective state space mechanisms
- Architecture Comparison: Analyze computational complexity and memory usage differences between Transformer (\(O(N^2)\)) and Mamba (\(O(N)\))
- Performance Evaluation: Benchmark different architectures on sequence modeling tasks, focusing on long-range dependency learning
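As a companion to the core practice above, here is a minimal sketch of scaled dot-product self-attention in PyTorch, making the \(O(N^2)\) score matrix that motivates SSM alternatives explicit. The tensor shapes and random toy input are illustrative assumptions, not part of the graded practice.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d_k = q.size(-1)
    # (batch, heads, seq_len, seq_len) score matrix: this is the O(N^2) cost in memory and compute
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Toy example: batch of 2, 4 heads, sequence length 128, head dimension 64
q = k = v = torch.randn(2, 4, 128, 64)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```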
Week 2 – PyTorch 2.x and Latest Deep Learning Frameworks¶
Core Topics¶
- PyTorch 2.x Revolution: `torch.compile` compiler revolution, TorchDynamo, AOTAutograd, PrimTorch, TorchInductor
- FlashAttention-3: Hardware acceleration with tiling, TMA, WGMMA, FP8 support, ~2× speed improvement on H100 GPUs
- Hugging Face Transformers Ecosystem: Model support, quantization, Zero-Build Kernels, `pipeline` API
- AI Agent Frameworks: LangGraph, CrewAI, LlamaIndex, Haystack, DSPy for building intelligent agent systems
Hands-on/Activities¶
- Core Practice: Implement `torch.compile` performance optimization and FlashAttention-3 integration (see the benchmarking sketch below)
- Framework Comparison: Compare different AI agent frameworks (LangGraph vs CrewAI vs DSPy) for specific use cases
- Performance Benchmarking: Measure speed improvements and memory efficiency gains from the latest optimizations
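A minimal `torch.compile` benchmarking sketch for the core practice above. The toy MLP, batch size, and iteration count are illustrative assumptions; it requires a CUDA GPU, and measured speedups depend heavily on the hardware and model.

```python
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024)).cuda()
x = torch.randn(64, 1024, device="cuda")

compiled = torch.compile(model)  # TorchDynamo captures the graph, TorchInductor generates kernels

def bench(fn, iters=50):
    fn(x)                          # warm-up (triggers compilation on the first call)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

print(f"eager:    {bench(model) * 1e3:.2f} ms")
print(f"compiled: {bench(compiled) * 1e3:.2f} ms")
```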
Week 3 – Efficient Fine-tuning with Modern PEFT Techniques¶
Core Topics¶
- PEFT Fundamentals: Parameter-efficient fine-tuning techniques that retain 95%+ of full fine-tuning performance while updating less than 1% of the parameters
- LoRA (Low-Rank Adaptation): Decompose weight matrices into low-rank form, learn only small rank matrices
- DoRA (Weight-Decomposed LoRA): Adaptive fine-tuning through weight decomposition for fine-grained representation learning
- QLoRA: 4-bit quantization + LoRA, enabling 65B model fine-tuning on single 48GB GPU
- Advanced PEFT: NF4 quantization, double quantization, VB-LoRA, QR-Adaptor techniques
Hands-on/Activities¶
- Core Practice: Implement LoRA, DoRA, and QLoRA fine-tuning on Korean sentiment analysis dataset
- Performance Comparison: Compare memory usage, training speed, and final performance across different PEFT methods
- Efficiency Analysis: Measure parameter reduction ratios and performance retention rates
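For the PEFT practice above, a minimal sketch of attaching a LoRA adapter with the Hugging Face `peft` library. The `klue/bert-base` checkpoint, target modules, and hyperparameters are placeholder assumptions standing in for whatever Korean sentiment model and dataset are used in class.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base = "klue/bert-base"  # placeholder Korean encoder; swap in the course's actual model
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(base)

lora_cfg = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                       # rank of the low-rank update matrices A and B
    lora_alpha=16,             # scaling factor applied to the low-rank update
    lora_dropout=0.05,
    target_modules=["query", "value"],  # attention projections to adapt (model-dependent)
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```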
Week 4 – Advanced Prompting Techniques and Optimization¶
Core Topics¶
- Prompt Engineering Fundamentals: Role prompting, structured prompting, few-shot vs zero-shot techniques
- Self-Consistency: Multiple solution path exploration for improved accuracy (+17% improvement on GSM8K)
- Tree of Thoughts: Deliberate problem solving through thought expansion (Game of 24 success rate 9%→74%)
- DSPy Framework: Declarative Self-Improving Python, Signature, Module, Optimizer for automated prompt optimization
- Automated Prompt Engineering: APE, OPRO techniques for algorithmic prompt optimization
Hands-on/Activities¶
- Core Practice: Implement DSPy-based automatic prompt optimization pipeline
- Technique Comparison: Compare Self-Consistency, Tree of Thoughts, and automated prompt engineering approaches
- Performance Evaluation: Measure accuracy improvements across different prompting strategies on reasoning tasks
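A minimal self-consistency sketch for the technique comparison above: sample several chain-of-thought completions at nonzero temperature and take a majority vote over the final answers. The `sample_answer` callable is a hypothetical stand-in for whichever LLM client the course uses.

```python
from collections import Counter
from typing import Callable

def self_consistency(question: str,
                     sample_answer: Callable[[str], str],
                     n_samples: int = 10) -> str:
    """Sample n chain-of-thought answers (temperature > 0) and majority-vote the result."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    # The vote is over the *final answers*, not the reasoning chains themselves
    return Counter(answers).most_common(1)[0][0]

# Usage with any client that returns just the final answer string, e.g.:
# best = self_consistency("If 3 pencils cost 45 cents, how much do 7 cost?", my_llm_call)
```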
Week 5 – LLM Evaluation Paradigms and Benchmarks¶
Core Topics¶
- Evaluation Paradigm Evolution: Traditional metrics (BLEU/ROUGE) vs meaning-based evaluation (BERTScore/BLEURT) vs LLM-as-a-Judge
- LLM-as-a-Judge: GPTScore, G-Eval, FLASK frameworks for automated evaluation using LLMs
- Specialized Purpose Benchmarks: LiveCodeBench, EvalPlus, HELM-Code, MMLU-Pro, GPQA, BBH
- Domain-Specific Benchmarks: FinBen, AgentHarm, LEXam, CSEDB, MATH, GSM8K
- Evaluation Bias and Limitations: Narcissistic bias, verbosity bias, inconsistency, differences from human evaluation
Hands-on/Activities¶
- Core Practice: Implement G-Eval and other LLM-based evaluation techniques
- Benchmark Comparison: Compare traditional metrics (BLEU/ROUGE) with LLM-as-a-Judge approaches on identical responses
- Bias Analysis: Analyze evaluation biases and limitations in different evaluation paradigms
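A minimal LLM-as-a-judge sketch in the spirit of G-Eval for the core practice above: a judge model scores a response against a rubric on a 1-5 scale. The `judge` callable and rubric wording are hypothetical placeholders, not the official G-Eval prompt.

```python
from typing import Callable

RUBRIC = """You are grading a summary for coherence on a 1-5 scale.
1 = incoherent, 5 = perfectly coherent. Reply with the number only."""

def judge_coherence(source: str, summary: str,
                    judge: Callable[[str], str]) -> int:
    """Ask a judge LLM for a rubric-based score; `judge` wraps any chat-completion call."""
    prompt = f"{RUBRIC}\n\nSource document:\n{source}\n\nSummary:\n{summary}\n\nScore:"
    reply = judge(prompt)
    return int(reply.strip()[0])  # naive parse; real code should validate the judge output

# Averaging several sampled judgments (as G-Eval does via token probabilities)
# reduces the variance of any single judgment.
```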
Week 6 – Multimodal NLP Advancements¶
Core Topics¶
- Multimodal Integration: Text, image, audio, and video processing in unified models
- Vision-Language Models: LLaVA, MiniGPT-4, Qwen-2.5-Omni for comprehensive multimodal understanding
- Visual Reasoning: QVQ-Max specialized for visual reasoning and logical context understanding
- Speech Integration: Voxtral for speech recognition, Orpheus for zero-shot speaker synthesis
- Real-time Multimodal Streaming: Streaming input/output capabilities in multimodal LLMs
Hands-on/Activities¶
- Core Practice: Implement multimodal QA application with image, text, and audio input
- Model Comparison: Compare different vision-language models (LLaVA vs MiniGPT-4 vs Qwen-2.5-Omni)
- Integration Challenge: Build end-to-end multimodal system with voice input, image analysis, and text generation
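For the multimodal QA practice above, a minimal visual question answering sketch using the Hugging Face `pipeline` API. The lightweight ViLT checkpoint and the image URL are illustrative assumptions; the in-class practice would swap in a stronger vision-language model such as LLaVA or Qwen-2.5-Omni.

```python
from transformers import pipeline
from PIL import Image
import requests

# Small, classic VQA checkpoint, used here only to keep the sketch lightweight
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

url = "https://example.com/cat.jpg"  # placeholder image URL
image = Image.open(requests.get(url, stream=True).raw)

result = vqa(image=image, question="What animal is in the picture?")
print(result[0]["answer"], result[0]["score"])
```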
Week 7 – Ultra-Long Context Processing and Efficient Inference¶
Core Topics¶
- Context Window Revolution: From kilobytes to megabytes of context – a quantitative leap in context processing capability
- 2025 Flagship Models: GPT-5, Gemini 2.5 Pro (1M tokens), Claude Sonnet 4 (1M tokens), Llama 4 (10M tokens), LTM-2-Mini (100M tokens)
- Attention Mechanism Optimization: FlashAttention I/O bottleneck optimization, Linear Attention approximation, Ring Attention distributed processing
- Positional Encoding Extension: LongRoPE for extending context windows beyond 2M tokens with minimal fine-tuning
- RAG vs Ultra-Long Context: Integration paradigms, HippoRAG as long-term memory system
Hands-on/Activities¶
- Core Practice: Implement FlashAttention-3 integration and LongRoPE context extension
- RAG vs Long Context: Compare RAG-based summarization with ultra-long context LLMs on long documents
- Performance Analysis: Measure cost, latency, and accuracy trade-offs in long context processing
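A back-of-the-envelope sketch of why ultra-long contexts strain inference: the key/value cache grows linearly with context length, while full attention compute grows quadratically. The model shape below (a Llama-2-7B-like configuration with full multi-head KV) and FP16 precision are illustrative assumptions.

```python
def kv_cache_gib(seq_len, n_layers=32, n_kv_heads=32, head_dim=128, bytes_per_elem=2):
    """Memory for keys + values across all layers for one sequence, in GiB."""
    elems = 2 * n_layers * n_kv_heads * head_dim * seq_len  # 2 = keys and values
    return elems * bytes_per_elem / 2**30

for n in (4_096, 128_000, 1_000_000):
    print(f"{n:>9,} tokens -> {kv_cache_gib(n):8.1f} GiB of KV cache")
# At ~1M tokens the naive FP16 cache alone far exceeds single-GPU memory,
# motivating grouped-query attention, quantized caches, and Ring Attention sharding.
```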
Week 8 – Core Review and Latest Trends¶
Core Topics¶
- Architecture Review: Transformer vs SSM architectures, computational complexity analysis, performance trade-offs
- Optimization Techniques: FlashAttention optimization, PEFT methods (LoRA, DoRA, QLoRA), efficiency improvements
- Advanced Techniques: Prompt engineering, LLM evaluation paradigms, multimodal integration, long context processing
- Latest Model Trends: GPT-5, Gemini 2.5 Pro, Claude 4.1, Qwen 2.5 series - comprehensive model comparison
- Industry Applications: Medical, legal, financial field applications, real-world deployment considerations
Hands-on/Activities¶
- Core Practice: Comprehensive review of key concepts through hands-on reinforcement
- Model Comparison: Compare latest models across different dimensions (performance, cost, capabilities)
- Industry Case Analysis: Analyze real-world applications and deployment strategies
Week 9 – Advanced RAG Architectures¶
Core Topics¶
- Next-generation Retrieval-Augmented Generation: Architectures of advanced RAG systems that integrate large-scale knowledge to improve response accuracy
- Main Content:
- HippoRAG: RAG that mimics the operating principles of the human hippocampus, reducing vector DB storage by 25% and strengthening long-term memory (persistent memory consolidation over information networks)
- GraphRAG: Improves query-response precision to 99% by explicitly modeling associations between contexts with knowledge graphs
- Hybrid search: Multi-strategy retrieval that combines the latest dense embedding techniques (e.g., NV-Embed-v2), sparse retrieval (SPLADE), and graph exploration to achieve both accuracy and speed over large knowledge bases (a minimal score-fusion sketch follows below)
- Production Case Studies: Analyze large-scale RAG system architectures that keep P95 response latency within 100 ms while processing tens of millions of tokens daily
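A minimal hybrid-retrieval sketch for the hybrid search strategy above: normalize dense (embedding-similarity) and sparse (BM25/SPLADE-style) scores, then fuse them with a weighted sum. The toy score lists and the weight `alpha` are illustrative assumptions; production systems like those discussed typically add reranking and graph expansion on top.

```python
import numpy as np

def minmax(x):
    x = np.asarray(x, dtype=float)
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def hybrid_rank(dense_scores, sparse_scores, alpha=0.6, top_k=5):
    """Weighted fusion of normalized dense and sparse relevance scores (higher = better)."""
    fused = alpha * minmax(dense_scores) + (1 - alpha) * minmax(sparse_scores)
    return np.argsort(-fused)[:top_k]

# dense_scores could come from cosine similarity of NV-Embed-style embeddings,
# sparse_scores from BM25 or SPLADE term weights, both over the same document list.
dense = [0.82, 0.40, 0.77, 0.15]
sparse = [3.1, 9.4, 1.2, 0.3]
print(hybrid_rank(dense, sparse, top_k=2))  # indices of the two best documents
```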
Hands-on/Assignment¶
- Assignment 3: Build Korean enterprise search system based on GraphRAG. Create Q&A RAG system for given in-house wiki/document database and evaluate search accuracy and response speed
Week 10 – Innovation in Alignment Techniques¶
Core Topics¶
- LLM Output Control Techniques Emerging After RLHF: New techniques for improving usefulness and safety of LLMs
- Various Approaches:
- DPO (Direct Preference Optimization): Method that directly learns user preferences without separate reward models (simplified pipeline compared to RLHF)
- Constitutional AI: Technique that suppresses harmful content generation by AI self-correcting responses according to about 75 constitutional principles (applied to Anthropic Claude models)
- Process Supervision: Reward-modeling technique that gives granular feedback on the problem-solving process (chain of thought) rather than only the final answer, reinforcing correct reasoning
- RLAIF (RL from AI Feedback): Approach that replaces human annotators with AI evaluators, so an AI judges the AI being trained (approximating human-level evaluation)
- Open-source Implementation Trends: Public implementations such as the TRL (Transformer Reinforcement Learning) library and the OpenRLHF project let anyone experiment with the latest alignment techniques (3-4× training speedup compared to DeepSpeed-Chat)
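To make the DPO idea above concrete, here is a minimal sketch of the DPO objective on sequence-level log-probabilities: preferred ("chosen") completions are pushed up relative to a frozen reference model and rejected ones pushed down, with no separate reward model. The tensor values are toy numbers; in practice these log-probabilities come from summing token log-likelihoods under the policy and reference models.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective: -log sigmoid(beta * (chosen margin - rejected margin))."""
    chosen_margin = policy_chosen_logps - ref_chosen_logps
    rejected_margin = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

# Toy batch of 3 preference pairs (sequence-level log-probabilities)
loss = dpo_loss(torch.tensor([-12.0, -15.0, -9.0]),   # policy, chosen
                torch.tensor([-14.0, -13.0, -11.0]),  # policy, rejected
                torch.tensor([-13.0, -15.5, -9.5]),   # reference, chosen
                torch.tensor([-13.5, -13.2, -10.5]))  # reference, rejected
print(loss.item())
```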
Hands-on/Activities¶
- Key Hands-on: Compare and evaluate responses from models fine-tuned with DPO versus conventional RLHF on identical prompts/instructions (compare aspects such as safety and content quality)
Week 11 – Production Agent Systems¶
Core Topics¶
- Agent Frameworks and Multi-agent Systems: Techniques that deploy LLMs as multiple cooperating entities, rather than a single QA bot, to handle complex tasks
- Main Content:
- CrewAI: Role-based multi-agent collaboration framework – assign different specialized roles to multiple LLMs so they solve problems as a team
- Mirascope: Agent development tool ensuring type safety – strictly manages the format and types of prompt I/O through Pydantic data validation
- Haystack Agents: Open-source agent framework specialized for document RAG pipelines – easily compose retrieval-comprehension chains to build domain-specialized agents
- Low-code integration platforms: Environments such as Flowise AI, LangFlow, and n8n where prompt workflows can be designed and various tools integrated visually through a GUI
- Toolformer and LLM-internal Tool Use: Approaches that build external tool-use capability into the LLM itself – the model is trained with inserted API-call markers so that, during a response, it decides on its own when to invoke tools such as a calculator or web search
Hands-on/Activities¶
- Key Hands-on: Implement an automated customer service system prototype using a multi-agent framework. For example, have one agent handle FAQ Q&A while another handles database queries or ticket creation, practicing orchestration in which agents collaborate to serve complex user requests (a minimal sketch follows below)
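A minimal CrewAI-style sketch of the customer-service prototype described above, with one FAQ agent and one ticketing agent. The roles, prompts, and the implicit OpenAI-backed default LLM are assumptions, and the exact constructor arguments should be checked against the CrewAI version used in class.

```python
from crewai import Agent, Task, Crew

faq_agent = Agent(
    role="FAQ specialist",
    goal="Answer common customer questions from the knowledge base",
    backstory="You know the product documentation inside out.",
)
ticket_agent = Agent(
    role="Ticketing assistant",
    goal="Create a support ticket when a question cannot be answered from the FAQ",
    backstory="You summarize unresolved issues for the human support team.",
)

answer = Task(
    description="Answer the customer's question: {question}",
    expected_output="A short, accurate answer or the statement 'needs escalation'",
    agent=faq_agent,
)
escalate = Task(
    description="If the previous answer says 'needs escalation', draft a support ticket.",
    expected_output="A ticket with title, summary, and priority (or 'no ticket needed')",
    agent=ticket_agent,
)

crew = Crew(agents=[faq_agent, ticket_agent], tasks=[answer, escalate])
print(crew.kickoff(inputs={"question": "How do I reset my password?"}))
```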
Week 12 – AI Regulation and Responsible AI¶
Core Topics¶
- AI Governance and Ethical Issues: Study the EU AI Act (in force since August 2024), the world's first comprehensive AI legislation, including its impact on industry and the compliance requirements it places on developers
- Privacy and Safety Enhancement Technologies: Methodologies for deploying responsible, regulation-compliant LLM services:
- Differential privacy: Prevent exposure of personal information by applying differential privacy to text embeddings and similar artifacts (a toy sketch follows below)
- Federated learning: Use frameworks for collaborative local training so that user data is never aggregated on central servers
- Homomorphic encryption: Protect sensitive information by training models on data that remains encrypted
- Industry-specific Regulation Response Cases: Domain-specific NLP solution designs such as HIPAA-compliant chatbots in healthcare, GDPR compliance examples in finance, and FERPA-compliant tutoring AI in education
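A toy Gaussian-mechanism sketch for the differential-privacy bullet above: clip each text embedding's norm to bound sensitivity, then add calibrated Gaussian noise before the embedding leaves the client. The clipping bound, the ε/δ values, and the resulting noise scale are illustrative assumptions, not a vetted privacy accounting.

```python
import numpy as np

def privatize_embedding(emb, clip_norm=1.0, epsilon=2.0, delta=1e-5, rng=None):
    """Clip to bound sensitivity, then add Gaussian noise calibrated to (epsilon, delta)."""
    rng = rng or np.random.default_rng()
    emb = np.asarray(emb, dtype=float)
    norm = np.linalg.norm(emb)
    if norm > clip_norm:                      # bound each user's contribution
        emb = emb * (clip_norm / norm)
    sigma = clip_norm * np.sqrt(2 * np.log(1.25 / delta)) / epsilon  # classic Gaussian mechanism
    return emb + rng.normal(0.0, sigma, size=emb.shape)

noisy = privatize_embedding(np.random.randn(768))
print(noisy[:5])
```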
Hands-on/Assignment¶
- Assignment: Design an LLM service for given scenarios that complies with the EU AI Act and other relevant regulations. Create a checklist of measures to take from model development to deployment and present the team's regulatory-compliance plan
Week 13 – Ontology and AI: Modeling Reality and Operating it with AI¶
Core Topics¶
- Paradigm Shift: From Data Science to Decision Science:
- The "Data-Rich, Decision-Poor" problem and the "last mile" gap
- Limitations of Data Science (DS): Remaining at prediction and insight as "dashboard" builders
- Goal of Decision Science (DSci): "Pilots" who prescribe optimal actions and create business impact
- The need to convert expert "Tacit Knowledge" into "Explicit Models" that AI can understand
- "Ontology-First" strategy: Modeling the semantics and logic of reality before data collection
- Modeling Reality: Semantic Ontology (Semantic Layer):
- Semantic Layer: A "Digital Twin" that reflects an organization's real world
- Three core components of semantic ontology: Object Types, Properties, Link Types
- Semantic Digital Twin: Modeling that integrates "meaning" and "context" beyond simple data replication
- Root cause of LLM hallucinations: Limitations of "flat models" and the need for explicit semantics
- Integrating AI: Grounding and GraphRAG:
- Two faces of AI: Symbolic AI (logical reasoning) vs Statistical AI (LLM, statistical prediction)
- Neuro-Symbolic AI: Complementary combination of both approaches
- Three-step "Grounding" governance: Data grounding (input control), Logic grounding (processing control), Action grounding (output control)
- GraphRAG: Beyond standard RAG to knowledge graph-based multi-hop reasoning, improving precision by up to 35%
- Operating Reality: Kinetic Ontology (Kinetic Layer):
- Kinetic Ontology: Explicitly modeling "Verbs" (Actions) of reality in addition to semantic ontology ("Nouns")
- "Writeback": Mechanism that reflects AI decisions into actual operational systems
- Difference between analytical and operational systems: Automating the "last mile" through Writeback
- "Closed-Loop" decision-making: Complete automation cycle of Read-Decide-Write-Feedback-Learn
- AI Operating System: Enterprise-wide AI platform integrating semantic and kinetic layers
Hands-on/Activities¶
- Core Practice: Semantic ontology modeling exercise - Define object types, properties, and link types for a given business domain (e.g., university hospital, manufacturing) and create an ontology schema
- GraphRAG Implementation: Build a RAG system using knowledge graphs - Implement a hybrid search system combining vector search and graph traversal
- Closed-Loop Simulation: Implement a simple decision-making system prototype connecting semantic layer (read) and kinetic layer (write)
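A minimal sketch for the semantic-ontology modeling exercise above, expressing object types, properties, and link types as Python dataclasses for a hypothetical university-hospital domain. The specific type names and links are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectType:
    name: str
    properties: dict[str, str] = field(default_factory=dict)  # property name -> data type

@dataclass
class LinkType:
    name: str      # the relation ("verb-like" edge) between two object types
    source: str
    target: str

# Hypothetical university-hospital domain
patient = ObjectType("Patient", {"patient_id": "string", "birth_date": "date"})
doctor = ObjectType("Doctor", {"license_no": "string", "specialty": "string"})
encounter = ObjectType("Encounter", {"date": "datetime", "diagnosis_code": "string"})

schema = {
    "objects": [patient, doctor, encounter],
    "links": [
        LinkType("treated_by", source="Encounter", target="Doctor"),
        LinkType("belongs_to", source="Encounter", target="Patient"),
    ],
}
print([o.name for o in schema["objects"]], [l.name for l in schema["links"]])
```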
Week 14 – Final Project Development and MLOps¶
Core Topics¶
- Survey of Latest Research Results: Examine recently published models and techniques and discuss future directions in the rapidly changing NLP field
- Main Topics:
- Development of ultra-large multimodal LLMs: Analyze innovative features of cutting-edge models such as GPT-5, Claude 4.1 Opus, Qwen 2.5 Omni, QVQ-Max. For example, GPT-5 shows performance exceeding GPT-4 in reasoning ability and context expansion, and Claude 4.1 strengthens response consistency and safety by applying constitutional AI principles. Qwen 2.5 Omni and QVQ-Max pioneer new frontiers in multimodal visual-language reasoning, demonstrating ability to simultaneously perform image interpretation and complex reasoning.
- Renaissance of small language models: We also cover advances in lightweight small language models (SLMs). The Gemma 3 series (1B-4B scale) is attracting attention as ultra-lightweight LLMs optimized to run smoothly on consumer devices, and Mistral NeMo 12B offers specialized strengths such as a 128K-token context window through NVIDIA NeMo optimization. Cases such as MathΣtral 7B, specialized for a specific area (mathematics) and achieving results comparable to GPT-4, are also introduced. These small models are being studied as alternatives to large models in terms of specialization and lightweight deployment.
- Evolution of reasoning capabilities: Examine new approaches by which LLMs tackle complex problem solving. Long CoT reasons over very long chains of thought and performs backtracking and error correction when necessary; PAL (Program-Aided LM) improves the accuracy of numerical calculation and logical reasoning by combining code execution; and ReAct generates more accurate, factual answers by using external tools (calculators, web search, etc.) during reasoning. We also introduce the Thinking Mode concept – for example, the Qwen series significantly improves performance on complex math/code problems by enabling internal self-reasoning steps through an `enable_thinking` mode. Cutting-edge approaches such as Meta's Toolformer, which embed tool-use capability in models during pre-training so they call external APIs at the right moments during a response, are covered as well.
- Deployment and optimization frameworks: Tools for efficiently deploying LLMs in real service environments are also advancing. For example, llama.cpp enables running large models on CPUs with a lightweight C/C++ implementation, MLC-LLM supports LLM inference on mobile devices and in browsers using WebGPU, and PowerInfer-2 is a framework that maximizes efficiency of large-model inference, contributing to lower operating costs.
Hands-on/Activities¶
- Student latest-paper presentations: Groups select and present recent NLP papers and discuss the significance, limitations, and potential applications of the research. For example, by choosing papers on new benchmarks (MMMU, HLE, etc.) or the latest model techniques mentioned above, students comprehensively organize current technology trends (industry mentors or invited researchers participate and give feedback)
Week 15 – MLOps and Industry Application Case Analysis¶
Core Topics¶
- NLP Model MLOps Concepts: Introduce model version management strategies, A/B testing techniques, and deployment pipeline design. Also cover building online learning pipelines that continuously incorporate user feedback into training, as well as real-time monitoring and performance-drift detection systems (a minimal drift-check sketch follows below)
- Industry Application Case Analysis: Conclude the course by analyzing industry cases where the latest technologies are applied and by sharing final results of team projects
- Industry-specific NLP Success Cases: Introduce recent applications of LLM and NLP technologies in fields such as healthcare, finance, and education. For example, in healthcare, clinical-record automation NLP reduced doctors' documentation burden from 49% to 27%; in finance, Morgan Stanley's contract-analysis bot saved 360,000 hours annually; and in education, multilingual personalized tutoring AI improved learning efficiency and increased student engagement by 30%. Through these cases, students understand the practical impact of the latest NLP technologies
- Course Comprehensive Discussion: Finally, comprehensively review the content covered in the course and hold an open discussion. Students reflect on weeks 1 through 15 and share the technologies or topics they found most compelling or want to study further. Faculty present future prospects (e.g., expected developments after GPT-5, the direction of AI-human collaboration) and advise students on tracking and using the latest NLP trends going forward (course feedback collected through surveys)
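A minimal performance-drift check for the MLOps topic above: compare a reference window of model confidence scores against a recent window with a two-sample Kolmogorov-Smirnov test. The synthetic score distributions and the 0.05 threshold are illustrative assumptions; real pipelines would track several metrics with dedicated alerting rules.

```python
import numpy as np
from scipy.stats import ks_2samp

def drift_detected(reference_scores, recent_scores, alpha=0.05):
    """Flag drift when the two score distributions differ significantly (KS test)."""
    res = ks_2samp(reference_scores, recent_scores)
    return res.pvalue < alpha, res.statistic, res.pvalue

rng = np.random.default_rng(0)
reference = rng.beta(8, 2, size=2_000)   # confidence scores at deployment time
recent = rng.beta(6, 3, size=2_000)      # this week's scores, slightly degraded
flag, stat, p = drift_detected(reference, recent)
print(f"drift={flag}, KS statistic={stat:.3f}, p={p:.2e}")
```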
Hands-on/Activities¶
- Course comprehensive discussion: Overall summary of course content and Q&A, future prospects brainstorming (student feedback collection and future learning guidance)
References (Selected Latest Papers and Materials)¶
Latest Architectures and Models¶
- Gu & Dao (2023), Mamba: Linear-Time Sequence Modeling with Selective State Spaces.
- Peng et al. (2023), RWKV: Reinventing RNNs for the Transformer Era.
- Lieber et al. (2024), Jamba: A Hybrid Transformer-Mamba Language Model.
- (Multimodal LLM) OpenAI (2025), GPT-4 Technical Report (Augmentations for GPT-5 Preview).
- Anthropic (2025), Claude 4.1 Opus System Card.
Parameter-Efficient Fine-tuning¶
- Zhang et al. (2024), WaveFT: Wavelet-based Parameter-Efficient Fine-Tuning.
- Liu et al. (2024), DoRA: Weight-Decomposed Low-Rank Adaptation.
- Chen et al. (2024), VB-LoRA: Vector Bank for Efficient Multi-Task Adaptation.
- Dettmers et al. (2023), QLoRA: Efficient Finetuning of Quantized LLMs.
Prompt Engineering and Evaluation¶
- Khattab et al. (2023), DSPy: Compiling Declarative Language Model Calls.
- Wang et al. (2023), Self-Consistency Improves Chain-of-Thought Reasoning in Language Models.
- Yao et al. (2023), Tree of Thoughts: Deliberate Problem Solving with Large Language Models.
- Liu et al. (2023), G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment.
- Jain et al. (2024), LiveCodeBench: Holistic and Contamination-Free Code Evaluation.
Knowledge Integration and RAG¶
- Zhang et al. (2024), HippoRAG: Neurobiologically Inspired Long-Term Memory for LLMs.
- Edge et al. (2024), GraphRAG: A Modular Graph-Based RAG Approach.
- Chen et al. (2024), Hybrid Retrieval-Augmented Generation: Best Practices.
Alignment and Responsible AI¶
- Rafailov et al. (2023), Direct Preference Optimization: Your Language Model is Secretly a Reward Model.
- Bai et al. (2022), Constitutional AI: Harmlessness from AI Feedback.
- OpenAI (2024), SWE-bench Verified: Real-world Software Engineering Benchmark.
- Phan et al. (2025), Humanity's Last Exam: The Ultimate Multimodal Benchmark at the Frontier of Knowledge.
- EU Commission (2024), EU AI Act: Implementation Guidelines.
Industry Applications and MLOps¶
- Healthcare NLP Market Report 2024–2028 (Markets&Markets).
- Financial Services AI Applications 2025 (McKinsey Global Institute).
- State of AI in Education 2025 (Stanford HAI).
- Cremer & Liu (2025), PowerInfer-2: Energy-Efficient LLM Inference at Scale.
- Development Tools: CrewAI Documentation – Multi-agent Scenario Implementation Guide
- DSPy Official Guide – Prompt DSL Usage Guide
- OpenRLHF Project – Open-source RLHF Implementation