Datawhale Open-Sources "Building Agents from Scratch": A Systematic Agent Development Tutorial
With the arrival of the "Year of the Agent," the technological focus is shifting from training large models to building agent applications. Datawhale's open-source project, "Building Agents from Scratch," is a systematic, practice-oriented tutorial. It deeply analyzes process-driven and AI-native agent architectures. Having garnered over 48,000 stars on GitHub for its high-quality content, it serves as an essential introductory guide for AI developers.
Published Snapshot
Source: Publish Baseline
Repository: datawhalechina/hello-agents
Stars: 48,335
Forks: 5,814
Open Issues: 116
Snapshot Time: 05/13/2026, 12:00 AM
Project Overview
If 2024 was the inaugural year of the "War of a Hundred Models," then 2025 kicked off the "Year of the Agent." The industry's technical focus is shifting from pursuing foundational Large Language Models (LLMs) with more parameters and stronger capabilities to using existing models to build smarter, more practical agent applications capable of autonomous execution. During this transition, the developer community faces a major pain point: systematic, practice-oriented tutorials on Agents are scarce. Most available materials are either too theoretical or merely fragmented code snippets.
To fill this gap, Datawhale, a well-known open-source learning community in China, launched the hello-agents project, titled "Building Agents from Scratch." The project aims to give developers and AI enthusiasts a comprehensive guide to building agent systems from the ground up, with equal emphasis on theory and practice. Thanks to its systematic course design and up-to-date practical cases, the project quickly gained popularity on GitHub and became a widely followed open-source tutorial in the field of AI application development. It not only outlines the development history of Agents but also clearly distinguishes the two mainstream factions of current Agent construction, giving developers a clear learning path.
Core Capabilities and Applicable Boundaries
Core Capabilities:
- Dual-Track Architecture Analysis: The tutorial systematically breaks down the two major factions of current Agent construction. One faction is the software engineering-based, process-driven Agent (such as Dify, Coze, n8n), which uses LLMs as the backend for data processing; the other faction is the true AI-native Agent, where decision-making and execution are entirely driven by AI.
- Combination of Theory and Practice: It not only provides an in-depth analysis of the underlying principles of Agents but also includes rich Python coding exercises, helping learners write agents from scratch and understand their internal operating mechanisms.
- Convenient Multi-Channel Access: It offers a seamless online reading experience, supporting international access via GitHub Pages and a domestic acceleration mirror (hello-agents.datawhale.cc), allowing users to study anytime, anywhere without downloading.
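The difference between the two factions described above can be illustrated with a minimal sketch. It is not taken from the tutorial: `fake_llm`, `calculator`, and the `CALL`/`FINISH` text protocol are hypothetical stand-ins invented here so the example runs offline. In the process-driven function the developer fixes the order of steps and the model fills one slot; in the AI-native function the model's output decides which step runs next.

```python
from typing import Callable, Dict, List


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; returns canned text.
    if prompt.startswith("SUMMARIZE:"):
        return "summary of: " + prompt[len("SUMMARIZE:"):].strip()
    # Decision mode: finish once the expected observation is in the history.
    if "Observation: 4" in prompt:
        return "FINISH 4"
    return "CALL calculator 2+2"


def calculator(expression: str) -> str:
    # Toy tool: supports only "a+b" with integers.
    left, right = expression.split("+")
    return str(int(left) + int(right))


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator}


def process_driven(text: str) -> str:
    """Fixed pipeline: the code decides the steps, the LLM fills one of them."""
    cleaned = text.strip().lower()
    return fake_llm("SUMMARIZE: " + cleaned)


def ai_native(task: str, max_steps: int = 5) -> str:
    """Agent loop: each model reply chooses the next action."""
    history: List[str] = ["Task: " + task]
    for _ in range(max_steps):
        decision = fake_llm("\n".join(history))
        if decision.startswith("FINISH"):
            return decision[len("FINISH"):].strip()
        _, tool_name, tool_input = decision.split(" ", 2)
        history.append("Observation: " + TOOLS[tool_name](tool_input))
    return "stopped: step limit reached"
```

The `max_steps` cap in `ai_native` is a design choice of this sketch: because the model controls the flow, the loop needs an explicit bound that the fixed pipeline does not.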
Applicable Boundaries:
- Recommended Users: Engineers looking to transition from traditional software development to AI application development; university students and researchers interested in the application layer of large models; product managers who need a systematic understanding of Agent architectures.
- Non-Recommended Users: Enterprise users seeking out-of-the-box, commercialized Agent products; pure business personnel with absolutely no programming background (especially lacking Python basics) who do not intend to write code.
Insights and Inferences
Based on the objective facts above, the following inferences can be drawn:
First, the project amassed over 48,000 Stars in less than a year (from its creation in September 2025 to May 2026). This growth rate suggests strong demand in the developer community for high-quality, structured knowledge on Agent development. As the marginal gains from raw model capability diminish, growth is likely to shift toward the application layer, and the ability to build Agents is likely to become a core skill for the next generation of developers.
Second, the clear distinction between "process-driven" and "AI-native" in the tutorial indicates that Agent engineering is maturing. This classification helps developers choose the appropriate technology stack based on actual business scenarios (whether they need highly deterministic enterprise-level processes or highly flexible exploratory tasks), avoiding blind trend-following.
Finally, Datawhale's driving force as an open-source community should not be underestimated. Through a community co-creation model, the tutorial maintains a very high update frequency (the latest push was on May 11, 2026), which helps it keep pace with the rapid technological iterations in the AI field. However, the project currently lacks a clearly identified open-source license, which may restrict its redistribution and modification in commercial training or internal enterprise use.
30-Minute Getting Started Guide
Developers encountering this project for the first time can get oriented within 30 minutes through the following steps:
- Access the Online Tutorial (0-5 minutes): No need to configure any local environment; directly access the domestic acceleration node via browser: https://hello-agents.datawhale.cc.
- Read Core Concepts (5-15 minutes): Go to the "Project Introduction" and "Quick Start" chapters. Focus on reading the theoretical comparison between the two major Agent construction factions (Software Engineering vs. AI-Native) to establish a macro understanding of agent architectures.
- Clone the Project Locally (15-20 minutes): Open the terminal and execute the command to pull the project source code and practical code to your local machine:
  git clone https://github.com/datawhalechina/hello-agents.git
- Environment Preparation and Initial Exploration (20-30 minutes): Enter the project directory. It is recommended to use Conda or venv to create a new Python virtual environment. Browse the practical directory in the code repository, check the basic Python-based Agent implementation scripts, and understand how their dependency libraries (such as LangChain, OpenAI SDK, etc.) are called, preparing for subsequent in-depth practical coding.
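The virtual-environment step above can also be done from Python using the standard library's `venv` module. The sketch below is one possible approach, not a procedure prescribed by the tutorial; the function name `create_env` and the `.venv` directory name are assumptions made for illustration. It is roughly equivalent to running `python -m venv .venv` in the project directory, except that it skips installing pip.

```python
import venv
from pathlib import Path


def create_env(env_dir: str) -> Path:
    """Create a virtual environment and return the path to its config file."""
    # with_pip=False keeps this sketch fast and offline; pass with_pip=True
    # to have pip bootstrapped into the new environment.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir) / "pyvenv.cfg"
```

Calling `create_env(".venv")` inside the cloned repository would create the environment there; the project's dependencies would then be installed according to whatever instructions the repository itself provides.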
Risks and Limitations
When learning and using this project to build Agents, developers should be aware of the following risks and limitations:
- Data Privacy and Compliance Risks: In the practical phase, building Agents usually requires calling third-party large model APIs (such as OpenAI, Qwen, etc.). Developers must strictly manage their API Keys to avoid hardcoding them in public code repositories. At the same time, be careful not to send sensitive personal data or enterprise confidential data as Prompts to external models to prevent violating data compliance requirements.
- Uncontrollable Cost Risks: When AI-native Agents execute complex tasks, they may go through multiple iterations, reflections, and tool calls. This can multiply Token consumption, especially if each step resends the accumulated context. Failing to set reasonable loop limits and budget alerts may result in exorbitant API call fees.
- Technology Iteration and Maintenance Risks: The Agent technology stack is still in a period of rapid evolution, with underlying frameworks and API interfaces changing frequently. Some code examples in the tutorial may become invalid over time, requiring learners to have a certain ability to troubleshoot independently and consult the latest official documentation.
- Open Source License Limitations: Currently, the project data card shows the License as "NOASSERTION". In GitHub's repository metadata, this value typically means that the license could not be matched to a standard SPDX identifier, so the actual terms are not evident from the metadata alone. Until those terms are verified, the safe assumption is that all rights are reserved. Enterprise users face potential intellectual property risks when using it for commercial training or directly reusing its code to build commercial closed-source products. It is recommended to check the repository for a license file and confirm authorization matters with the project maintainers before use.
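The API-key and cost-control points above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the tutorial: the environment variable name `LLM_API_KEY`, the classes `RunBudget` and `BudgetExceeded`, and the `step` callable (which is assumed to return an answer-or-`None` plus the tokens it consumed) are all hypothetical.

```python
import os
from typing import Callable, Optional, Tuple


class BudgetExceeded(RuntimeError):
    """Raised when a run has used up its step or token allowance."""


class RunBudget:
    def __init__(self, max_steps: int, max_tokens: int) -> None:
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.steps_used = 0
        self.tokens_used = 0

    def check(self) -> None:
        # Called before each model call; refuses to start another step
        # once either limit has been reached.
        if self.steps_used >= self.max_steps:
            raise BudgetExceeded("step limit reached")
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceeded("token limit reached")

    def record(self, tokens: int) -> None:
        self.steps_used += 1
        self.tokens_used += tokens


def load_api_key(var_name: str = "LLM_API_KEY") -> str:
    # LLM_API_KEY is a hypothetical name; use whatever your provider expects.
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"set {var_name} in the environment; do not hardcode it")
    return key


def run_agent(step: Callable[[], Tuple[Optional[str], int]], budget: RunBudget) -> str:
    """Call step() until it returns an answer or the budget is exhausted."""
    while True:
        budget.check()
        answer, tokens = step()
        budget.record(tokens)
        if answer is not None:
            return answer
```

Because the token count is only known after a call returns, the call that crosses the token limit still completes; the guard stops the next one. A stricter design would estimate tokens before each call.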
Evidence Sources
- https://api.github.com/repos/datawhalechina/hello-agents (Retrieved: 2026-05-13)
- https://api.github.com/repos/datawhalechina/hello-agents/releases/latest (Retrieved: 2026-05-13)
- https://github.com/datawhalechina/hello-agents/blob/main/README.md (Retrieved: 2026-05-13)
- https://github.com/datawhalechina/hello-agents (Retrieved: 2026-05-13)