jundot/omlx — LLM inference server with continuous batching & SSD caching for Apple Silicon

Ali Nemati · Feb 21 · 32 sec read

7 stars | 0 forks | Python

LLM inference server with continuous batching & SSD caching for Apple Silicon — managed from the macOS menu bar

What it does

oMLX is an LLM inference server for Apple Silicon that supports continuous batching and SSD-backed caching, with the server itself managed from the macOS menu bar. It simplifies running local LLMs, making them practical for real coding tasks.
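As a rough illustration, many local inference servers of this kind expose an OpenAI-compatible HTTP API, so a client could talk to it with a few lines of Python. Note this is a hypothetical sketch: the port, endpoint path, and model name below are assumptions, not documented oMLX values, so check the repo's README for the actual interface.

```python
import requests

# Assumed OpenAI-compatible endpoint for a locally running server.
# The host, port, path, and model name are placeholders.
BASE_URL = "http://localhost:8080/v1"

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "local-model",  # placeholder model identifier
        "messages": [
            {"role": "user", "content": "Write a Python function that reverses a string."}
        ],
        "max_tokens": 256,
    },
    timeout=60,
)
response.raise_for_status()

# Print the assistant's reply from the first completion choice.
print(response.json()["choices"][0]["message"]["content"])
```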

Why it matters: oMLX turns a Mac into an efficient local LLM inference server that you manage directly from the menu bar.

View on GitHub

