🔥 So much to do, so little time.

R PRO

juiceb0xc0de

22 17 82

https://github.com/JuiceB0xC0de

AI & ML interests

You can call me McDreamy M.D. I'll be your attending neural network brain surgeon.

Recent Activity

new activity 35 minutes ago

meta-llama/Llama-3.1-8B-Instruct:Llama 3.1 8B brain atlas

updated a dataset about 2 hours ago

juiceb0xc0de/qwen3-8b-atlas

new activity about 11 hours ago

Smilyai-labs/Nova-1-Standard-1.3B-Preview:Early training brain atlas: a progress snapshot


Organizations

replied to mmhamdy's post 2 days ago

Haha no shit. I just finished writing an article on scaling ssm mamba style models and popped over to see what's new in posts. I guess there's a theme today.

posted an update 30 days ago

Post

235

😅 You ever fumble on a project? Please someone tell me I'm not alone. I fumbled at step one and remained oblivious for the remainder of the project. Funny story: I was under the assumption that Qwen/Qwen3-8B was the base model paired with the Qwen SAE released by Alibaba. I didn't realize there was a Qwen3-8B-Base model until after the 12 hours of independent mapping techniques I had applied to the model that was missing the -Base suffix. 🤗 My bad, I'm just a bartender. I should not be unsupervised.

Not all is lost, however. The outcome was a very in-depth neural network atlas for the Qwen3-8B model, complete with its own queryable SQLite database, which I can now share with you all. The database combines these methods for a full deep dive:

- Neuron Taxonomy
- Category Separation Scoring
- Co-activation Analysis
- Per-Head Decomposition
- Component Comparison
- Attribution Patching
- Sparse Non-negative Matrix Factorization
- NeuronLens
- DAS SVD rotation
- Cross-layer Coherence
- SQLite database

So if you've ever wondered where a specific behaviour or ability lives in the hidden dimensions of Qwen3-8B, or perhaps wanted to make informed quantization decisions, please enjoy the fruits of my ill-informed labour lol. 😂
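For anyone who hasn't poked at a SQLite atlas before, here is a rough sketch of what a "where does this behaviour live" query could look like. The table name `neuron_scores`, its columns, and the toy rows are all hypothetical placeholders made up for illustration; check the dataset card for the real schema before running anything against the actual database.

```python
import sqlite3

# Hypothetical schema and toy rows: the real atlas tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE neuron_scores (
        layer INTEGER,
        component TEXT,
        neuron INTEGER,
        category TEXT,
        separation REAL
    )
    """
)
rows = [
    (0, "up_proj", 12, "math", 4.1),
    (0, "up_proj", 448, "code", 9.7),
    (17, "gate_proj", 3, "math", 2.2),
    (17, "gate_proj", 90, "math", 7.5),
]
conn.executemany("INSERT INTO neuron_scores VALUES (?, ?, ?, ?, ?)", rows)


def top_neurons(conn, category, limit=3):
    """Return the highest-scoring (layer, component, neuron, score) rows for a category."""
    cur = conn.execute(
        "SELECT layer, component, neuron, separation FROM neuron_scores "
        "WHERE category = ? ORDER BY separation DESC LIMIT ?",
        (category, limit),
    )
    return cur.fetchall()


top_math = top_neurons(conn, "math")
print(top_math)
```

Swap `":memory:"` for the path to the downloaded database file and adjust the table and column names to match what is actually in there.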

juiceb0xc0de/qwen3-8b-atlas
Qwen/Qwen3-8B

replied to appvoid's post about 1 month ago

I applaud you in your journey into the void with small models. I too am deeply fascinated with the optimization of smaller models rather than asking for more parameters and terabytes of scraped internet data. I hope to see what you've come up with in a few weeks' time.

I just finished designing a sparsity training scheduler that trains on average 35% of a model's available weights with almost no hidden dimensions between transformers adjoined and zero throughput while randomizing trainable locations. It cuts VRAM and training time down, and the models set higher benchmarks on mathematics than FFT models trained on the same corpus. I discovered this while fucking around for fun.

I don't doubt the discoveries to be made with training smaller architectures have many more surprises in store for us.

replied to their post about 1 month ago

@danielhanchen what happened to this magnificent model!? I had the perfect place to slot it into my team of AI bros! I would love to see this back on HF. 🤗

reacted to kalyan-ks's post with 👀 about 1 month ago

Post

1633

LLM Guardrail Models are Less Robust Against Text Mutation Attacks

Blog post - https://huggingface.co/blog/kalyan-ks/llm-guardrail-models-less-robust

Evaluated the robustness of three LLM guardrail models (GLiGuard, LlamaGuard3 and MiniGuard).

Evaluation is done using 16 text mutation attacks over three datasets (AEGIS 2.0, WildGuard and ExpGuard).

Achieved average Unsafe ASR score of up to 33% and average Safe ASR score of up to 25% against GLiGuard model.

Achieved average Unsafe ASR score of up to 35% and average Safe ASR score of up to 17% against LlamaGuard3-8B model.

Achieved average Unsafe ASR score of up to 45% and average Safe ASR score of up to 15% against MiniGuard v0.1 model.

reacted to pankajpandey-dev's post with 👀 about 1 month ago

Post

695

🇮🇳 Just shipped: MiniCPM5-1B-Hindi-Instruct (+ GGUF quants)

First Hindi instruction-tuned fine-tune of OpenBMB's brand-new MiniCPM5-1B (released this week).

Trained with Unsloth + LoRA (r=32) on AI4Bharat's anudesh + dolly Hindi splits — ~4k high-quality examples, 2 epochs on a single T4 in 60 minutes.

🔗 Model (16-bit + LoRA adapter):
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct

📦 GGUF quants for llama.cpp / Ollama / LM Studio:
pankajpandey-dev/MiniCPM5-1B-Hindi-Instruct-v1-GGUF

5 quant levels — from Q3_K_M (~560 MB, runs on a Raspberry Pi) to Q8_0 (~1.2 GB, near-lossless). Q4_K_M is the recommended default.

Part of my ongoing 🇮🇳 Hindi LLM Series — bringing strong open-source LLMs to Indian languages.

#Hindi #IndicNLP #MiniCPM5 #LoRA #Unsloth #GGUF #llamacpp #Ollama #LocalLLM

reacted to ProCreations's post with 🤗 about 1 month ago

Post

618

I kind of forgot to post that I made my AI model Intellite version 1, but yea, it is here. ProCreations/intellite-500m-sft

It is tiny and not extremely trained, making it prone to hallucination, so please double check all information. I can't afford to train it more or increase model size, so if anyone somehow has access to compute and wants to contribute, let me know.

replied to PhysiQuanty's post about 1 month ago

Would you be looking for something like this?
https://huggingface.co/spaces/strangertoolshf/huggingface-user-stats

reacted to AbstractPhil's post with 🤗 about 1 month ago

Post

820

The transformer prototype v2 is operational, which takes the behavior of the H2 battery and directly forces a projected rigid behavior into a multiscale structure. Turns roughly 57k params to around 90k params for the preliminary version, and with this behavior the model converges SEMI-CLOSE to the SVAE current spectrum in considerably fewer epochs. So stay tuned on that one, the transformer did converge. The behavior itself is validated and convergent in the H2 protocol spectrum.

The transformer operates with the "single" setting.

AbstractPhil/geolip-svae-transformer

I've implanted a rigid formula that allows this direct behavior from the H2 battery to superimpose onto adjacent structural boundaries, and with that built aleph and void into the system as well. These are guarantees.

As for the centrifuge concept. The optimization on the centrifuge was quite lackluster. The hardware doesn't support such behavior. You can access the current operating version of the centrifuge by utilizing "stacked" configuration. Four lenses was too much when running a quaternion bank to handle such complex interactions reasonably, so I will need to work something out in the future to get a full centrifuge system working.

Crusher is ready, transformer_v3.

You might be curious WHY these converge at such low raw MSE in the later stages. The reasoning is kind of difficult to explain, so I'll try to make it simple. The direction is very subtle in the later stages of training with AdamW, so the curves start to create much more accurate shifts towards the goals. This allows the model to rapidly converge after earlier heavier training. You can't simply train it low, it takes too long. This allows the model to KIND OF get everything NEAR where it's supposed to be, which allows the really small twitches of MSE to provide massive corrections without needing hard logits or more difficult to finetune features.

9 replies

reacted to codelion's post with 😎 about 1 month ago

Post

3229

Inspired by the Nemotron Diffusion recipe, check out dhara-250m: a 250M experimental language model that supports three decoding modes from one set of weights: autoregressive, block-diffusion, and self-speculation.

It is small, easy to try, and meant for exploring diffusion-style decoding and latency tradeoffs in compact LMs.

Model: codelion/dhara-250m

Try the chat demo here: codelion/dhara-chat

3 replies

reacted to TravisMuhlestein's post with 😎 about 1 month ago

Post

2333

Interesting to see broader ecosystem momentum forming around open standards for agentic systems.

Feels like conversations are increasingly converging around the same operational requirements: identity, interoperability, governance, trust boundaries, orchestration, and coordination between agents, tools, and services.

As agents become more operational, these infrastructure layers seem increasingly important for making larger multi-agent ecosystems reliable outside controlled environments.

https://www.linuxfoundation.org/press/agentic-ai-foundation-adds-43-new-members-as-enterprise-and-government-adoption-of-open-agent-standards-accelerates

posted an update about 1 month ago

Post

201

What am I building now, you ask? A Hugging Face Space that maps ML training components as a compatibility graph. You pick an optimizer, see what pairs with it, what breaks, and why (with cited sources).

Think skill tree meets training recipe builder. Helps you discover that you don't always need AdamW + Cosine. There's a whole ecosystem of combinations most people never try.

If you're new to ML or just stuck in a routine, build yourself a new training suite. It could be a great decision! Or a waste on GPaaS, I really can't say. I'm just a bartender, don't believe what I say most of the time.

juiceb0xc0de/forge

replied to their post about 1 month ago

That is exactly what I'm planning on doing!

replied to their post about 1 month ago

Update: I've completed the first 9 layers and will be taking a step back for a quick mo to adjust and update the auto trainer for finer resolution and other shit I have swimming around in my brain.

replied to Crownelius's post about 1 month ago

Yo every month this resets? Thank you for my new guilty pleasure. This playground feels like it was personally designed for my weird ass ideas. I'm about to get all KINDS of stupid up in here. You don't even know! 🤗

reacted to danielhanchen's post with 🤗 about 1 month ago

Post

2798

Qwen3.6 MTP is here! Run locally on 20GB RAM. ⚡️

MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change.

Qwen3.6-27B: unsloth/Qwen3.6-27B-MTP-GGUF
Qwen3.6-35B-A3B: unsloth/Qwen3.6-35B-A3B-MTP-GGUF
Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

2 replies

posted an update about 1 month ago

Post

239

Gemma-4-E2B SAE Atlas — Work in Progress

JumpReLU Sparse Autoencoders trained on every layer of Gemma-4-E2B-it using an adaptive Lagrangian controller. Training in progress. I'm publishing layers live as they come hot off the press for anyone interested in following along. I will be making further adjustments for finer resolution, but the early data should be helpful, I think? I'm just a bartender, don't trust everything I say. 🤗 The Lagrangian math is pretty cool. It auto-steers the trainer, taking the guesswork out of hyperparameter adjustments.
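To give a feel for the idea, here is a toy sketch and not the actual trainer. JumpReLU passes a pre-activation through unchanged only when it clears a threshold. For the controller, this sketch assumes a plain dual-ascent update on a sparsity multiplier, which is one common way to build a Lagrangian controller and may not match what the real training code does. The names `update_multiplier`, `target_l0`, and `lr` are invented for this example.

```python
def jump_relu(z, theta):
    """JumpReLU: keep a pre-activation as-is only if it exceeds the threshold theta."""
    return [x if x > theta else 0.0 for x in z]


def l0(acts):
    """Count how many features are active (non-zero)."""
    return sum(1 for a in acts if a != 0.0)


def update_multiplier(lam, observed_l0, target_l0, lr=0.01):
    """Assumed dual-ascent step: raise the sparsity penalty when too many
    features fire, lower it (never below zero) when too few do."""
    return max(0.0, lam + lr * (observed_l0 - target_l0))


pre = [0.2, 1.5, -0.3, 0.9, 2.4]   # toy encoder pre-activations
acts = jump_relu(pre, theta=1.0)   # only 1.5 and 2.4 clear the threshold
lam = update_multiplier(0.5, l0(acts), target_l0=1)
print(acts, lam)
```

In a loop like this the multiplier drifts until the observed sparsity sits near the target, which is the sense in which the penalty weight gets steered automatically instead of being hand-tuned.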

Full paper and methodology whenever I get around to writing it up. There's a lot of work to be done. For now though, enjoy! 🤗

https://huggingface.co/juiceb0xc0de/gemma-4-e2b-saes

3 replies

posted an update about 1 month ago

Post

1508

Introducing the Gemma-4-E2B Brain Atlas, an interactive neural census of every layer, every head, 16 behavior categories in Google's flagship 2B model. We ran 184,320 probe prompts across 35 layers × 8 components and mapped what came back.

The Brain Atlas is an interactive tool that lets you explore the internal behavior of Google's Gemma-4-E2B model layer by layer, head by head. Pick a behavior category, pick a layer, and see exactly which components light up and which go quiet. The dataset is fully queryable if you want to go deeper.

The mapping combines multiple single-direction techniques run in parallel across every layer and component: activation taxonomy (classifying each neuron by how broadly it fires across prompt categories), coactivation pair analysis (which neurons lock together and on what topics), F-stat behavioral separation (one-way ANOVA per feature across 16 behavior categories), per-head specificity scoring, and a full compliance probe pipeline using SVD, sparse decomposition, and variance analysis.
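If the F-stat part is unfamiliar, the textbook one-way ANOVA calculation for a single feature looks roughly like this. It is a plain-Python sketch on made-up numbers with three toy categories instead of 16, and it is not the atlas pipeline itself. The function name `one_way_f` and the sample activations are assumptions for illustration.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F-statistic: between-category variance of a feature's
    activations divided by its within-category variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        return float("inf")  # no spread inside any category
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy activations of one feature, grouped by prompt category.
feature_by_category = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]
f_stat = one_way_f(feature_by_category)
print(round(f_stat, 3))
```

A high F means the feature's average activation differs a lot between categories relative to how noisy it is within each one, which is why it works as a behavioral separation score.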

Here's what I found when I ran it.

The sharpest behavioral signal isn't at the output. It's Layer 0. Up projection hits F=22.7, nearly 2x anything in the final third of the network. The model does its behavioral sorting before it's barely started, then spends the next 34 layers… doing what exactly?

The gate has a lifecycle. 70% dormant at L1, highest in the model. Brutal sparsification at L23–26 (>58% silent). Then reopens. The final five layers are the most alive gates anywhere. The model's last act is a gate flare.

Layer 4 routes 5 projections to dim 448. One layer. One dimension. That's a topology highway.

Zero specialist neurons. Not one. 1.2M neurons analyzed. None fires exclusively on a single category. This model distributes everything.

🧠 Space: juiceb0xc0de/gemma-4-e2b-brain-atlas
📊 Dataset (1.3M rows, fully queryable): juiceb0xc0de/gemma-4-e2b-atlas

reacted to danielhanchen's post with 🔥 about 2 months ago

Post

5959

We’re excited to announce that Unsloth has joined the PyTorch Ecosystem! 🔥🦥

Unsloth is an open-source project that makes training & running models more accurate and faster with less compute. Our mission is to make local AI accessible to everyone. Thanks to all of you for making this possible! 💕

Blog: https://unsloth.ai/blog/pytorch
GitHub: https://github.com/unslothai/unsloth

2 replies

posted an update about 2 months ago

Post

181

I'm starting a new model line, Locus. These models aren't fine-tuned, they're de-tuned 🤗. What I mean by that is I remove a percentage of the corporate-tuned speech patterns like "why this matters", "no fluff", and "as a large language model". By reducing the RLHF-based habitual patterns in model responses, I've had higher success rates in personality adoptability. I've fine-tuned on the Locus models myself, so you can chat with it post fine-tune or just trust me and try it yourself!

I don't aim to remove guardrails or the LLM identity entirely; what I want to do is dampen RLHF to a manageable volume. Personality models perform better with guardrails intact, no different than humans with moral guidelines and boundaries. Refusals can help steer and mold personality. RLHF, however, drowns out adaptability, so I'm cranking it down for you to crank your project up!

juiceb0xc0de/bella-bartender-gemma-e2b
juiceb0xc0de/locus-gemma-4-e2b
