The post benchmarks several Qwen models on an M5 Max laptop with 128 GB of RAM using the mlx_lm tool. Models tested include Qwen3.5-122B-A10B-4bit and gpt-oss-120b-MXFP4-Q8, which differ in both tokens-per-second throughput and memory usage. A second post announces an uncensored version of Qwen3.5-35B-A3B on Hugging Face, claiming aggressive uncensoring with no loss of capability; community discussion centers on measuring KL divergence (KLD) against the base model to substantiate that claim, along with concerns about quality degradation in long-context handling.
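The KLD evaluation commenters ask for compares the next-token probability distributions of the base and modified models on the same prompts: a mean KLD near zero across many prompts would support the no-capability-loss claim. A minimal sketch of the metric itself (the toy distributions below are illustrative, not data from the post):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    p = p / p.sum()  # normalize in case inputs are unnormalized logit-softmax slices
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy next-token distributions for a base model and a modified variant.
base  = [0.70, 0.20, 0.10]
tuned = [0.68, 0.22, 0.10]
print(kl_divergence(base, tuned))  # small positive value; 0.0 iff the distributions match
```

In practice this would be averaged over the full vocabulary at each position of a shared evaluation corpus, with both models run on identical prompts.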
Read the full article at Latent Space