Heretic: A Fully Automated Censorship Removal Tool for Large Language Models Based on Directional Ablation
Heretic is a fully automated censorship removal tool for large language models. By combining directional ablation techniques with an Optuna-based parameter optimizer, it automatically eliminates the safety alignment restrictions of Transformer models without expensive post-training. Highly popular in the open-source community, this project provides researchers and developers with a novel, low-cost engineering solution to obtain unrestricted models.
Published Snapshot
Repository: p-e-w/heretic
Stars: 13,744 · Forks: 1,399 · Open Issues: 82
Snapshot Time: 03/15/2026, 12:00 AM
Project Overview
Heretic (https://github.com/p-e-w/heretic) is a fully automated censorship removal tool for Large Language Models (LLMs). Most current open-source models ship with strict "safety alignment" instilled through techniques such as RLHF or DPO, yet demand among developers and researchers for unrestricted models keeps growing. Traditional "uncensoring" methods usually rely on building dedicated datasets and running expensive post-hoc fine-tuning. Heretic takes a different approach: by combining directional ablation (also known as "abliteration") with an Optuna-based TPE parameter optimizer, it removes a model's refusal mechanisms fully automatically, without any retraining. Since its release, the project has rapidly attracted attention, becoming a controversial yet technically valuable open-source engineering effort in the AI community.
Core Capabilities and Applicability Boundaries
Core Capabilities: The core of Heretic lies in turning academic "abliteration" research (e.g., Arditi et al., 2024) into a usable automated pipeline. Driven by an Optuna-based TPE (Tree-structured Parzen Estimator) optimizer, it automatically searches for and removes the direction vectors in the model's residual stream that encode the "refusal" feature. By jointly minimizing the number of refusals and the loss of the model's core capabilities, the tool finds high-quality ablation parameters without any human intervention.
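The central idea can be sketched in a few lines. The following is a minimal, self-contained illustration of directional ablation on toy activation data, not Heretic's actual implementation: the "refusal direction" is estimated as the difference of mean activations between two prompt sets (following Arditi et al., 2024), then projected out of every hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64

# Toy residual-stream activations: "harmful" prompts carry an extra
# component along a fixed direction, standing in for the refusal feature.
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)
harmless = rng.normal(size=(128, dim))
harmful = rng.normal(size=(128, dim)) + 3.0 * true_dir

# Estimate the refusal direction as the difference of means.
refusal_dir = harmful.mean(axis=0) - harmless.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(h: np.ndarray, d: np.ndarray) -> np.ndarray:
    """Remove the component of each activation vector along unit direction d."""
    return h - np.outer(h @ d, d)

ablated = ablate(harmful, refusal_dir)
# After ablation, activations have (numerically) zero component along the direction.
print(float(np.abs(ablated @ refusal_dir).max()))
```

In practical abliteration implementations, this projection is typically folded directly into the weight matrices that write to the residual stream (e.g., attention output and MLP down-projections), so the exported model needs no runtime hooks.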
Applicability Boundaries: This tool is specifically designed for large language models based on the Transformer architecture.
- Target Audience: AI researchers studying internal model representations, local LLM enthusiasts who want unrestricted model behavior, and developers building specific vertical applications (such as cybersecurity red-team drills or unfiltered creative writing).
- Non-Target Audience: Not recommended for enterprise-level production environments that must strictly adhere to AI safety compliance standards, nor is it suitable for beginners lacking basic knowledge of deep learning model structures.
Perspectives and Inferences
Judging by the project's growth to over 13,700 stars in less than half a year, there is evidently widespread resistance in the open-source community to the excessive "alignment" (the so-called "over-refusal" phenomenon) of current mainstream models. Heretic fills the gap between theoretical research and accessible tooling.
Recasting directional ablation as a hyperparameter optimization problem (via Optuna) is a clever engineering decision. It significantly lowers the computational barrier, and as optimization algorithms improve, ablation quality should keep rising while the damage to the model's general capabilities shrinks.
It is expected that this tool will have a significant disruptive impact on the open-source ecosystem on Hugging Face: the traditional artisanal process of relying on full or LoRA fine-tuning to create "Uncensored" models is highly likely to be replaced by this automated pipeline based on direct weight intervention, thereby spawning a large number of low-cost, uncensored derivative models.
30-Minute Getting Started Guide
- Environment Preparation: Ensure your local environment has a CUDA-supported GPU and install Python 3.10 or higher.
- Get the Code: Clone the repository with `git clone https://github.com/p-e-w/heretic.git` and navigate to the project root directory.
- Install Dependencies: Run `pip install -r requirements.txt` to install core dependencies such as PyTorch, Transformers, and Optuna.
- Prepare the Target Model: Select a Transformer-based target model on Hugging Face (e.g., the Instruct versions of the Llama 3 or Qwen series) and download it locally.
- Execute Automated Ablation: Run Heretic's main script, specify the model path, and start the optimization process. The tool will automatically initiate Optuna's Trial search to evaluate model performance under different ablation parameters.
- Export and Test: After the optimization process ends, the tool will output the modified model weight files. Users can directly load these weights using vLLM or llama.cpp for inference testing to verify whether the safety censorship mechanism has been successfully removed.
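After export, a quick way to smoke-test the result is to generate responses to previously refused prompts and scan them for refusal phrases. The marker list below is an illustrative heuristic, not Heretic's actual refusal detector:

```python
# Illustrative refusal heuristic: flag responses containing common
# refusal phrases. Real evaluations should use more robust detection.
REFUSAL_MARKERS = (
    "i cannot",
    "i can't",
    "i'm sorry",
    "i am sorry",
    "as an ai",
)

def looks_like_refusal(response: str) -> bool:
    """Return True if the response contains a known refusal phrase."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

print(looks_like_refusal("I'm sorry, but I can't help with that."))  # True
print(looks_like_refusal("Sure. Step one is to open the config file."))  # False
```

Running such a check over a batch of test prompts, before and after ablation, gives a rough refusal rate to compare against the original model.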
Risks and Limitations
- Data Privacy and Content Risks: After safety alignment is removed, the model will no longer refuse to generate harmful, biased, or illegal content. Users should run it only in a fully controlled local environment and avoid exposing it directly to the public internet; doing otherwise may cause serious social and ethical harm.
- Compliance Risks: Using this tool to modify and republish certain commercial open-source models may violate their original Acceptable Use Policy (AUP). Enterprise users must strictly evaluate legal risks.
- Cost and Computational Limitations: Although much cheaper than full fine-tuning, the TPE-based parameter search still requires many forward-pass evaluations of the model. For very large models (e.g., 70B+ parameters), considerable VRAM and compute time are still needed.
- Maintenance and License Limitations: The project is licensed under AGPL-3.0, a strongly copyleft license. Any commercial product that integrates Heretic as a backend service and offers it over a network must release the corresponding source code, which greatly limits direct use in closed-source commercial projects.
Evidence Sources
- GitHub Repository Info: https://api.github.com/repos/p-e-w/heretic (Accessed: 2026-03-15)
- GitHub Latest Release: https://api.github.com/repos/p-e-w/heretic/releases/latest (Accessed: 2026-03-15)
- GitHub README: https://github.com/p-e-w/heretic/blob/master/README.md (Accessed: 2026-03-15)
- Project Homepage: https://github.com/p-e-w/heretic (Accessed: 2026-03-15)