
Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

Ali Nemati · 3 days ago · 27 sec read

A training-free method duplicates selected layers in a large language model so that activations pass through those "reasoning circuits" more than once per forward pass. Applied to a 24B-parameter model, duplicating three layers raised a logical-deduction benchmark score from 0.22 to 0.76 and improved several other benchmarks as well. Varying which layers are duplicated appears to produce distinct cognitive modes within the same model, offering a way to specialize an LLM for particular tasks without any additional training.
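The summary describes the technique only at a high level, so the sketch below is a minimal illustration using Hugging Face Transformers rather than the author's code: it rebuilds the decoder stack of a Llama/Mistral-style model so that chosen layers run twice per forward pass. The checkpoint name, the layer indices, and the repeat count are all assumptions for illustration, not details from the post.

```python
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical checkpoint: the post mentions "a 24B LLM" but this summary
# does not name it, so this model ID is an assumption.
MODEL_NAME = "mistralai/Mistral-Small-24B-Instruct-2501"


def duplicate_layers(model, layer_indices, repeats=2):
    """Rebuild the decoder stack so each layer whose original index is in
    `layer_indices` appears `repeats` times in a row. Weights are copied,
    never trained, so this is a pure architecture edit."""
    new_layers = torch.nn.ModuleList()
    # model.model.layers is an nn.ModuleList in Llama/Mistral-style models.
    for i, layer in enumerate(model.model.layers):
        for _ in range(repeats if i in layer_indices else 1):
            new_layers.append(copy.deepcopy(layer))
    # Re-index attention so each position gets its own KV-cache slot
    # (attribute layout varies by architecture; this matches Llama/Mistral).
    for idx, layer in enumerate(new_layers):
        if hasattr(layer, "self_attn") and hasattr(layer.self_attn, "layer_idx"):
            layer.self_attn.layer_idx = idx
    model.model.layers = new_layers
    model.config.num_hidden_layers = len(new_layers)
    return model


model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Which layers act as "reasoning circuits" is the post's finding; these
# indices are placeholders, not the author's actual choice.
model = duplicate_layers(model, layer_indices={20, 21, 22})
```

Because the duplicated layers reuse existing weights, the modified model can be benchmarked as-is; the only costs are the extra memory for the copied layers and the added forward-pass compute.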

Read the full post and discussion on Hacker News

