Models

We found 155 models
kimi-k2.6
Text Generation · Moonshot AI · Hosted

Kimi K2.6 is a frontier-scale open-source 1T parameter model with a 262.1k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

  • Function calling
  • Reasoning
  • Vision
glm-4.7-flash
Text Generation · Zhipu AI · Hosted

GLM-4.7-Flash is a fast and efficient multilingual text generation model with a 131,072 token context window. Optimized for dialogue, instruction-following, and multi-turn tool calling across 100+ languages.

  • Function calling
  • Reasoning
gpt-oss-120b
Text Generation · OpenAI · Hosted

gpt-oss-120b is one of OpenAI's open-weight models, designed for powerful reasoning, agentic tasks, and versatile developer use cases. It targets production, general-purpose, high-reasoning workloads.

  • Function calling
  • Reasoning
llama-4-scout-17b-16e-instruct
Text Generation · Meta · Hosted

Meta's Llama 4 Scout is a natively multimodal mixture-of-experts model with 17 billion active parameters and 16 experts. The Llama 4 models use a mixture-of-experts architecture to offer industry-leading performance in text and image understanding.

  • Batch
  • Function calling
  • Vision
claude-opus-4.8
Text Generation · Anthropic · Proxied

Claude Opus 4.8 is Anthropic's most capable generally available model, with a step-change improvement in agentic coding over Claude Opus 4.7. It uses adaptive thinking to calibrate reasoning per task and supports a one million token context window at standard pricing.

flux-2-pro-preview
Text-to-Image · Black Forest Labs · Proxied

FLUX.2 [pro] Preview is Black Forest Labs' recommended default for production image generation and editing. It tracks the latest [pro] weights and offers strong multi-reference support.

flux-2-max
Text-to-Image · Black Forest Labs · Proxied

FLUX.2 [max] is Black Forest Labs' highest-quality image model, with top editing consistency, the strongest prompt following, and grounding search for visualizing real-time information.

flux-2-flex
Text-to-Image · Black Forest Labs · Proxied

FLUX.2 [flex] is Black Forest Labs' fine-grained control variant of FLUX.2. It exposes tunable inference steps, guidance, and prompt upsampling for typography-heavy and production workflows.

grok-imagine-video
Text-to-Video · xAI · Proxied

xAI's video generation model. Generates, edits, and extends videos from text and image inputs with native synchronized audio including dialogue, sound effects, and music. Supports multiple creative modes (normal, fun, custom).

grok-imagine-image-quality
Text-to-Image · xAI · Proxied

xAI's higher-fidelity text-to-image model optimized for sharper details, more accurate compositions, and stronger text rendering. Supports image editing via reference images and masks. Trades speed for quality compared to grok-imagine-image. Default output is 2k resolution.

grok-4.3
Text Generation · xAI · Proxied

xAI's Grok 4.3 model with a 1M-token context window and strong agentic tool calling with minimal hallucinations. Accepts text and image inputs, and supports function calling, structured outputs, and configurable reasoning effort (none, low, medium, high).

grok-imagine-image
Text-to-Image · xAI · Proxied

xAI's Grok Imagine image model. Generates and edits images from text and reference-image inputs with configurable aspect ratio and resolution.

grok-4.20-multi-agent-0309
Text Generation · xAI · Proxied

xAI's Grok 4.20 multi-agent model with a 2M-token context window. Multiple agents collaborate in parallel to perform deep research tasks, with function calling, structured outputs, and reasoning capabilities.

grok-4.20-0309-non-reasoning
Text Generation · xAI · Proxied

xAI's Grok 4.20 non-reasoning model. Skips the thinking trace for fast, single-pass responses while keeping the same training as the reasoning variant.

grok-4.20-0309-reasoning
Text Generation · xAI · Proxied

xAI's Grok 4.20 reasoning model. Uses extended thinking to work through complex problems, returning a reasoning trace alongside the final answer.

q3-pro
Text-to-Video · Vidu · Proxied

Vidu Q3 Pro is a high-quality video generation model supporting text-to-video, image-to-video, and start/end-frame-to-video workflows with audio and up to 16-second clips.

q3-turbo
Text-to-Video · Vidu · Proxied

Vidu Q3 Turbo is a faster version of Vidu Q3 optimized for lower-latency video generation while maintaining audio support and up to 16-second clips.

gen-4.5
Text-to-Video · RunwayML · Proxied

RunwayML's video generation model supporting both text-to-video and image-to-video with customizable duration, aspect ratio, and content moderation controls.

recraftv4-vector
Text-to-Image · Recraft · Proxied

Generates production-ready SVG vector graphics from text prompts with clean geometry, structured layers, and editable paths.

recraftv4-pro
Text-to-Image · Recraft · Proxied

Recraft V4 Pro generates high-resolution, art-directed images at 2048px+ with strong composition, text rendering, and design taste. Built for print and production work.

recraftv4-pro-vector
Text-to-Image · Recraft · Proxied

Generates detailed, production-ready SVG vector graphics from text prompts with fine geometry, scalable to any size for print and design work.

recraftv4
Text-to-Image · Recraft · Proxied

Recraft V4 generates art-directed images with strong composition, accurate text rendering, and design taste built in. Fast and cost-efficient at standard resolution.

v6
Text-to-Video · PixVerse · Proxied

PixVerse v6 is the latest PixVerse video model, supporting videos with a customizable duration from 1 to 15 seconds and audio generation.

recraftv3
Text-to-Image · Recraft · Proxied

Recraft V3 is the previous-generation text-to-image model from Recraft, well-suited to design-quality compositions, brand-aware imagery, and accurate text rendering.

v5.6
Text-to-Video · PixVerse · Proxied

PixVerse v5.6 is a video generation model supporting text-to-video and image-to-video with audio generation, customizable aspect ratios, and up to 1080p output.

tts-1-hd
Text-to-Speech · OpenAI · Proxied

OpenAI's high-definition text-to-speech model producing higher-quality audio output.

tts-1
Text-to-Speech · OpenAI · Proxied

OpenAI's text-to-speech model optimized for real-time use with low latency.

o4-mini
Text Generation · OpenAI · Proxied

OpenAI's fast, lightweight reasoning model optimized for multi-step problem solving at lower cost.

gpt-image-2
Text-to-Image · OpenAI · Proxied

OpenAI's next-generation image model that creates and edits images from text prompts, with support for multiple quality levels, sizes, and output formats. Note: transparent backgrounds are not supported; use openai/gpt-image-1.5 for transparent PNGs.

gpt-5.5-pro
Text Generation · OpenAI · Proxied

GPT-5.5 pro uses OpenAI's Responses API with built-in tools, improved reasoning, and stateful context management.

gpt-image-1.5
Text-to-Image · OpenAI · Proxied

OpenAI's image generation model that creates and edits images from text prompts, supporting multiple quality levels and output sizes.

gpt-5.5
Text Generation · OpenAI · Proxied

GPT-5.5 is OpenAI's flagship model with strong coding, reasoning, and multimodal capabilities.

gpt-5.4-nano
Text Generation · OpenAI · Proxied

GPT-5.4 nano is OpenAI's smallest and fastest model, optimized for edge and low-latency use cases.

gpt-5.4-pro
Text Generation · OpenAI · Proxied

GPT-5.4 pro uses OpenAI's Responses API with built-in tools, improved reasoning, and stateful context management.

gpt-5.4
Text Generation · OpenAI · Proxied

GPT-5.4 is OpenAI's flagship model with strong coding, reasoning, and multimodal capabilities.

gpt-5.4-mini
Text Generation · OpenAI · Proxied

GPT-5.4 mini is a smaller, faster, and more cost-efficient version of GPT-5.4 for lightweight tasks.

gpt-5
Text Generation · OpenAI · Proxied

OpenAI's model excelling at coding, writing, and reasoning.

gpt-4o-transcribe
Automatic Speech Recognition · OpenAI · Proxied

A speech-to-text model that uses GPT-4o to transcribe audio with improved word error rate and better language recognition compared to the original Whisper models.

gpt-4.1-mini
Text Generation · OpenAI · Proxied

Fast, affordable version of GPT-4.1 with a million-token context window.

gpt-4.1
Text Generation · OpenAI · Proxied

OpenAI's flagship GPT model for complex tasks with a million-token context window.

speech-2.8-turbo
Text-to-Speech · MiniMax · Proxied

MiniMax Speech 2.8 Turbo turns text into natural, expressive speech with voice cloning, emotion control, and 40+ language support at faster speeds.

music-2.6
Music Generation · MiniMax · Proxied

MiniMax's music generation model that creates full-length songs with vocals from text prompts and lyrics, or instrumental tracks. Supports BPM/key control and auto-generated lyrics.

speech-2.8-hd
Text-to-Speech · MiniMax · Proxied

MiniMax Speech 2.8 HD focuses on studio-grade audio generation with emotion control, multilingual support (40+ languages), and voice cloning.

m2.7
Text Generation · MiniMax · Proxied

MiniMax's M2.7 language model with multilingual capabilities.

hailuo-2.3-fast
Text-to-Video · MiniMax · Proxied

A lower-latency version of Hailuo 2.3 that preserves core motion quality, visual consistency, and stylization while enabling faster iteration.

tts-2
Text-to-Speech · Inworld · Proxied

Inworld's most powerful and expressive text-to-speech model. Builds on TTS 1.5 with rich expressive speech, real-time latency, natural language steering (e.g. [whisper], [say excitedly]), and stronger multilingual support across 15 production languages plus 90+ experimental languages.

hailuo-2.3
Text-to-Video · MiniMax · Proxied

A high-fidelity video generation model optimized for realistic human motion, cinematic VFX, expressive characters, and strong prompt and style adherence across text-to-video and image-to-video workflows.

tts-1.5-max
Text-to-Speech · Inworld · Proxied

Highest-quality text-to-speech with under 200ms latency, emotion control, and 15-language support.

tts-1.5-mini
Text-to-Speech · Inworld · Proxied

Ultra-fast, cost-efficient text-to-speech with approximately 120ms latency and 15-language support.

veo-3.1-fast
Text-to-Video · Google · Proxied

A faster version of Veo 3.1 optimized for lower latency while maintaining high-quality video and audio output.

veo-3-fast
Text-to-Video · Google · Proxied

A faster version of Veo 3 optimized for lower-latency video generation with audio support.

veo-3.1
Text-to-Video · Google · Proxied

Google's latest video generation model with improved quality, motion, and audio generation.

nano-banana-pro
Text-to-Image · Google · Proxied

Google's higher-quality image generation model with improved detail and prompt adherence.

veo-3
Text-to-Video · Google · Proxied

Google's video generation model capable of producing high-quality videos with optional audio from text prompts.

nano-banana
Text-to-Image · Google · Proxied

Google's fast image generation model producing high-quality images from text prompts.

nano-banana-2
Text-to-Image · Google · Proxied

Google's second-generation image generation model with improved quality and speed.

gemini-3.1-pro
Text Generation · Google · Proxied

Google's most intelligent Gemini model with improved reasoning, a medium thinking level, and a 1M-token context window.

imagen-4
Text-to-Image · Google · Proxied

Google's latest image generation model producing high-quality, photorealistic images from text prompts with support for multiple aspect ratios.

gemini-3.1-flash-tts
Text-to-Speech · Google · Proxied

gemini-3-flash
Text Generation · Google · Proxied

Gemini 3 Flash is Google's fast multimodal model with frontier intelligence, superior search, and grounding capabilities.

gemini-3.1-flash-lite
Text Generation · Google · Proxied

Google's lightest and most cost-efficient Gemini model for high-throughput tasks.

gemini-2.5-flash-lite
Text Generation · Google · Proxied

Google's lightest and most cost-efficient Gemini 2.5 model for high-throughput tasks.

gemini-2.5-pro
Text Generation · Google · Proxied

Google's most capable Gemini 2.5 model with strong reasoning, thinking support, and a 1M-token context window.

gemini-2.5-flash
Text Generation · Google · Proxied

Google's fast multimodal Gemini 2.5 model with strong reasoning and a 1M-token context window.

seedream-4.5
Text-to-Image · ByteDance · Proxied

Seedream 4.5 builds on 4.0 with multi-reference image support, batch generation, and sequential image generation.

seedream-5-lite
Text-to-Image · ByteDance · Proxied

Seedream 5 Lite is a lighter, faster version of the Seedream 5 family with multi-reference and batch generation support.

seedream-4.0
Text-to-Image · ByteDance · Proxied

Seedream 4.0 is ByteDance's image creation model that combines text-to-image generation and image editing into a single architecture, offering fast, high-resolution output up to 4K.

seedance-2.0-fast
Text-to-Video · ByteDance · Proxied

Faster variant of ByteDance's Seedance 2.0 video model. Trades some quality for speed while sharing the same multimodal architecture. Supports text-to-video, image-to-video, native audio generation, multimodal references (images, videos, audio), video editing, and video extension.

universal-3-pro
Automatic Speech Recognition · AssemblyAI · Proxied

AssemblyAI's Universal 3 Pro speech recognition model for high-accuracy transcription.

seedance-2.0
Text-to-Video · ByteDance · Proxied

ByteDance's next-generation video model with a unified multimodal architecture. Generates high-quality video with synchronized audio from text, images, video clips, and audio inputs. Supports multimodal references (up to 9 images, 3 videos, 3 audio files), native audio generation, video editing, video extension, intelligent duration, and adaptive aspect ratio.

claude-sonnet-4.5
Text Generation · Anthropic · Proxied

Claude Sonnet 4.5 is the best coding model to date, with significant improvements across the entire development lifecycle.

claude-sonnet-4.6
Text Generation · Anthropic · Proxied

Claude Sonnet 4.6 is Anthropic's latest balanced model offering strong coding, reasoning, and agentic capabilities with improved instruction following.

claude-sonnet-4
Text Generation · Anthropic · Proxied

Claude Sonnet 4 delivers superior coding and reasoning while responding more precisely to instructions, a significant upgrade over previous versions.

claude-opus-4.6
Text Generation · Anthropic · Proxied

Claude Opus 4.6 is Anthropic's flagship language model built for complex, multi-step work in coding, financial analysis, and legal reasoning. It uses extended thinking to work through complex problems carefully and features a one million token context window.

claude-opus-4.7
Text Generation · Anthropic · Proxied

Claude Opus 4.7 is Anthropic's most capable generally available model, with a step-change improvement in agentic coding over Claude Opus 4.6. It uses adaptive thinking to calibrate reasoning per task and supports a one million token context window at standard pricing.

wan-2.6-image
Text-to-Image · Alibaba · Proxied

Alibaba's Wan 2.6 text-to-image model generating images from text prompts with optional negative prompts and customizable dimensions.

claude-haiku-4.5
Text Generation · Anthropic · Proxied

Claude Haiku 4.5 delivers coding performance similar to that of larger models at one-third the cost and more than twice the speed.

                                                                                                                                                    Alibaba logoqwen3-max
                                                                                                                                                    Text GenerationAlibabaProxied

                                                                                                                                                    Alibaba's Qwen 3 Max is a large language model with strong coding, reasoning, and multilingual capabilities, served via DashScope's OpenAI-compatible endpoint.

                                                                                                                                                      Alibaba logoqwen3.5-397b-a17b
                                                                                                                                                      Text GenerationAlibabaProxied

                                                                                                                                                      Alibaba's Qwen 3.5 is a 397B-parameter mixture-of-experts model with 17B active parameters, offering strong reasoning capabilities with efficient inference.

                                                                                                                                                        Alibaba logohh1-t2v
                                                                                                                                                        Text-to-VideoAlibabaProxied

                                                                                                                                                        Alibaba's HappyHorse 1.0 text-to-video model. Generates videos from a text prompt with configurable resolution, aspect ratio, and duration (3-15s).

                                                                                                                                                          Alibaba logohh1-i2v
                                                                                                                                                          Image-to-VideoAlibabaProxied

                                                                                                                                                          Alibaba's HappyHorse 1.0 image-to-video model. Animates a reference image with an optional text prompt. Supports 720P and 1080P output with durations from 3 to 15 seconds.

                                                                                                                                                            Google logogemma-4-26b-a4b-it
                                                                                                                                                            Text GenerationGoogleHosted

                                                                                                                                                            Gemma 4 is Google's most intelligent family of open models, built from Gemini 3 research to maximize intelligence-per-parameter.

                                                                                                                                                            • Function calling
                                                                                                                                                            • Reasoning
                                                                                                                                                            • Vision
                                                                                                                                                            NVIDIA logonemotron-3-120b-a12b
                                                                                                                                                            Text GenerationNVIDIAHosted

                                                                                                                                                            NVIDIA Nemotron 3 Super is a hybrid MoE model with leading accuracy for multi-agent applications and specialized agentic AI systems.

                                                                                                                                                            • Function calling
                                                                                                                                                            • Reasoning
                                                                                                                                                            Moonshot AI logokimi-k2.5
                                                                                                                                                            Text GenerationMoonshot AIHosted

                                                                                                                                                            Kimi K2.5 is a frontier-scale open-source model with a 256k context window, multi-turn tool calling, vision inputs, and structured outputs for agentic workloads.

                                                                                                                                                            • Function calling
                                                                                                                                                            • Planned deprecation
                                                                                                                                                            • Reasoning
                                                                                                                                                            • Vision
                                                                                                                                                            Black Forest Labs logoflux-2-klein-9b
                                                                                                                                                            Text-to-ImageBlack Forest LabsHosted

FLUX.2 [klein] 9B is an ultra-fast, distilled image model with enhanced quality. It unifies image generation and editing in a single model, delivering state-of-the-art quality and enabling interactive workflows, real-time previews, and latency-critical applications.

                                                                                                                                                            • Partner
                                                                                                                                                            Black Forest Labs logoflux-2-klein-4b
                                                                                                                                                            Text-to-ImageBlack Forest LabsHosted

FLUX.2 [klein] is an ultra-fast, distilled image model. It unifies image generation and editing in a single model, delivering state-of-the-art quality and enabling interactive workflows, real-time previews, and latency-critical applications.

                                                                                                                                                            • Partner
                                                                                                                                                            Black Forest Labs logoflux-2-dev
                                                                                                                                                            Text-to-ImageBlack Forest LabsHosted

FLUX.2 [dev] is an image model from Black Forest Labs that generates highly realistic and detailed images, with multi-reference support.

                                                                                                                                                            • Partner
                                                                                                                                                            Deepgram logoaura-2-es
                                                                                                                                                            Text-to-SpeechDeepgramHosted

                                                                                                                                                            Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

                                                                                                                                                            • Batch
                                                                                                                                                            • Partner
                                                                                                                                                            • Real-time
                                                                                                                                                            Deepgram logoaura-2-en
                                                                                                                                                            Text-to-SpeechDeepgramHosted

                                                                                                                                                            Aura-2 is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

                                                                                                                                                            • Batch
                                                                                                                                                            • Partner
                                                                                                                                                            • Real-time
                                                                                                                                                            IBM logogranite-4.0-h-micro
                                                                                                                                                            Text GenerationIBMHosted

                                                                                                                                                            Granite 4.0 instruct models deliver strong performance across benchmarks, achieving industry-leading results in key agentic tasks like instruction following and function calling. These efficiencies make the models well-suited for a wide range of use cases like retrieval-augmented generation (RAG), multi-agent workflows, and edge deployments.

                                                                                                                                                            • Function calling
                                                                                                                                                            Deepgram logoflux
                                                                                                                                                            Automatic Speech RecognitionDeepgramHosted

                                                                                                                                                            Flux is the first conversational speech recognition model built specifically for voice agents.

                                                                                                                                                            • Partner
                                                                                                                                                            • Real-time
                                                                                                                                                            plamo-embedding-1b
                                                                                                                                                            Text EmbeddingspfnetHosted

                                                                                                                                                            PLaMo-Embedding-1B is a Japanese text embedding model developed by Preferred Networks, Inc. It can convert Japanese text input into numerical vectors and can be used for a wide range of applications, including information retrieval, text classification, and clustering.

                                                                                                                                                              gemma-sea-lion-v4-27b-it
                                                                                                                                                              Text GenerationaisingaporeHosted

SEA-LION (Southeast Asian Languages In One Network) is a collection of Large Language Models (LLMs) that have been pretrained and instruct-tuned for the Southeast Asia (SEA) region.

                                                                                                                                                                indictrans2-en-indic-1B
                                                                                                                                                                Translationai4bharatHosted

IndicTrans2 is the first open-source transformer-based multilingual NMT model that supports high-quality translations across all 22 scheduled Indic languages.

                                                                                                                                                                  Google logoembeddinggemma-300m
                                                                                                                                                                  Text EmbeddingsGoogleHosted

EmbeddingGemma is a 300M-parameter open embedding model from Google, state-of-the-art for its size, built from Gemma 3 (with T5Gemma initialization) and the same research and technology used to create Gemini models. EmbeddingGemma produces vector representations of text, making it well-suited for search and retrieval tasks, including classification, clustering, and semantic similarity search. This model was trained on data in 100+ spoken languages.
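
As a rough illustration of semantic similarity search over embedding vectors, the sketch below ranks documents by cosine similarity to a query. The vectors are hypothetical 3-dimensional stand-ins, not real model output, and `cosine_similarity` is an illustrative helper rather than part of any catalog API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat a zero vector as having no similarity to anything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings.
query = [0.9, 0.1, 0.0]
documents = {
    "doc_about_cats": [0.8, 0.2, 0.1],
    "doc_about_tax_law": [0.0, 0.1, 0.9],
}

# Sort document names from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked)
```

In practice the vectors would come from an embedding model and have hundreds of dimensions; the ranking step stays the same.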

                                                                                                                                                                    Deepgram logoaura-1
                                                                                                                                                                    Text-to-SpeechDeepgramHosted

                                                                                                                                                                    Aura is a context-aware text-to-speech (TTS) model that applies natural pacing, expressiveness, and fillers based on the context of the provided text. The quality of your text input directly impacts the naturalness of the audio output.

                                                                                                                                                                    • Batch
                                                                                                                                                                    • Partner
                                                                                                                                                                    • Real-time
                                                                                                                                                                    Leonardo logolucid-origin
                                                                                                                                                                    Text-to-ImageLeonardoHosted

Lucid Origin from Leonardo.Ai is the company's most adaptable and prompt-responsive model to date. Whether you're generating images with sharp graphic design, stunning full-HD renders, or highly specific creative direction, it adheres closely to your prompts, renders text with accuracy, and supports a wide array of visual styles and aesthetics – from stylized concept art to crisp product mockups.

                                                                                                                                                                    • Partner
                                                                                                                                                                    Leonardo logophoenix-1.0
                                                                                                                                                                    Text-to-ImageLeonardoHosted

                                                                                                                                                                    Phoenix 1.0 is a model by Leonardo.Ai that generates images with exceptional prompt adherence and coherent text.

                                                                                                                                                                    • Partner
                                                                                                                                                                    OpenAI logogpt-oss-20b
                                                                                                                                                                    Text GenerationOpenAIHosted

                                                                                                                                                                    OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases – gpt-oss-20b is for lower latency, and local or specialized use-cases.

                                                                                                                                                                    • Function calling
                                                                                                                                                                    • Reasoning
                                                                                                                                                                    Pipecat logosmart-turn-v2
                                                                                                                                                                    Voice Activity DetectionPipecatHosted

An open-source, community-driven, native audio turn detection model, now in its second version.

                                                                                                                                                                    • Batch
                                                                                                                                                                    • Real-time
                                                                                                                                                                    Qwen logoqwen3-embedding-0.6b
                                                                                                                                                                    Text EmbeddingsQwenHosted

                                                                                                                                                                    The Qwen3 Embedding model series is the latest proprietary model of the Qwen family, specifically designed for text embedding and ranking tasks.

                                                                                                                                                                      Deepgram logonova-3
                                                                                                                                                                      Automatic Speech RecognitionDeepgramHosted

Transcribe audio using Deepgram’s speech-to-text model.

                                                                                                                                                                      • Batch
                                                                                                                                                                      • Partner
                                                                                                                                                                      • Real-time
                                                                                                                                                                      Qwen logoqwen3-30b-a3b-fp8
                                                                                                                                                                      Text GenerationQwenHosted

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support.

                                                                                                                                                                      • Batch
                                                                                                                                                                      • Function calling
                                                                                                                                                                      • Reasoning
                                                                                                                                                                      Google logogemma-3-12b-it
                                                                                                                                                                      Text GenerationGoogleHosted

Gemma 3 models are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Gemma 3 models are multimodal, handling text and image input and generating text output. They offer a large 128K context window and multilingual support in over 140 languages, and are available in more sizes than previous versions.

                                                                                                                                                                      • LoRA
                                                                                                                                                                      • Planned deprecation
                                                                                                                                                                      MistralAI logomistral-small-3.1-24b-instruct
                                                                                                                                                                      Text GenerationMistralAIHosted

                                                                                                                                                                      Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

                                                                                                                                                                      • Function calling
                                                                                                                                                                      Qwen logoqwq-32b
                                                                                                                                                                      Text GenerationQwenHosted

QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ is capable of thinking and reasoning, and can achieve significantly better performance in downstream tasks, especially on hard problems. QwQ-32B is the medium-sized reasoning model, which achieves competitive performance against state-of-the-art reasoning models such as DeepSeek-R1 and o1-mini.

                                                                                                                                                                      • LoRA
                                                                                                                                                                      • Reasoning
                                                                                                                                                                      Qwen logoqwen2.5-coder-32b-instruct
                                                                                                                                                                      Text GenerationQwenHosted

Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder currently covers six mainstream model sizes (0.5, 1.5, 3, 7, 14, and 32 billion parameters) to meet the needs of different developers, and improves upon CodeQwen1.5.

                                                                                                                                                                      • LoRA
                                                                                                                                                                      BAAI logobge-reranker-base
                                                                                                                                                                      Text ClassificationBAAIHosted

Unlike an embedding model, a reranker takes a question and a document as input and directly outputs a similarity score instead of an embedding. You can get a relevance score by passing a query and a passage to the reranker, and the score can be mapped to a float value in [0,1] with a sigmoid function.
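
The sigmoid mapping mentioned above can be sketched as follows. The raw scores and passage names are hypothetical values chosen for illustration, not output from the model, and `sigmoid` is a plain helper written here rather than part of any reranker API.

```python
import math


def sigmoid(raw_score: float) -> float:
    """Map an unbounded relevance score to a float in [0, 1]."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    exp_score = math.exp(raw_score)
    return exp_score / (1.0 + exp_score)


# Hypothetical raw reranker scores for three passages.
raw_scores = {"passage_a": 4.2, "passage_b": -1.3, "passage_c": 0.0}

# Sigmoid is monotonic, so ranking by raw score and by mapped score agree.
ranked = sorted(raw_scores.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(name, round(sigmoid(score), 3))
```

Because the mapping preserves order, it is only needed when a bounded score is wanted, for example to apply a fixed relevance threshold.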

                                                                                                                                                                        Meta logollama-guard-3-8b
                                                                                                                                                                        Text GenerationMetaHosted

                                                                                                                                                                        Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and in LLM responses (response classification). It acts as an LLM – it generates text in its output that indicates whether a given prompt or response is safe or unsafe, and if unsafe, it also lists the content categories violated.

                                                                                                                                                                        • LoRA
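
Since Llama Guard 3 reports its verdict as generated text, a caller usually has to parse that text. The sketch below assumes the output has `safe` or `unsafe` on the first line and, when unsafe, a comma-separated list of category codes on the second line. That layout is an assumption not stated on this page, and `GuardVerdict` and `parse_guard_output` are hypothetical helper names.

```python
from dataclasses import dataclass, field


@dataclass
class GuardVerdict:
    safe: bool
    categories: list[str] = field(default_factory=list)


def parse_guard_output(text: str) -> GuardVerdict:
    """Parse classifier text into a verdict (assumed two-line format)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty classifier output")

    label = lines[0].lower()
    if label == "safe":
        return GuardVerdict(safe=True)
    if label == "unsafe":
        # Assumed: the second line holds comma-separated category codes.
        categories = []
        if len(lines) > 1:
            categories = [c.strip() for c in lines[1].split(",") if c.strip()]
        return GuardVerdict(safe=False, categories=categories)

    # Fail loudly on anything unexpected instead of guessing a verdict.
    raise ValueError(f"unexpected label: {lines[0]!r}")


print(parse_guard_output("unsafe\nS1,S10"))
```

Raising on an unrecognized label is a deliberate choice here: for a safety check, an unparseable response should not be silently treated as safe.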
                                                                                                                                                                        DeepSeek logodeepseek-r1-distill-qwen-32b
                                                                                                                                                                        Text GenerationDeepSeekHosted

                                                                                                                                                                        DeepSeek-R1-Distill-Qwen-32B is a model distilled from DeepSeek-R1 based on Qwen2.5. It outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.

                                                                                                                                                                        • Reasoning
                                                                                                                                                                        Meta logollama-3.3-70b-instruct-fp8-fast
                                                                                                                                                                        Text GenerationMetaHosted

                                                                                                                                                                        Llama 3.3 70B quantized to fp8 precision, optimized to be faster.

                                                                                                                                                                        • Batch
                                                                                                                                                                        • Function calling
                                                                                                                                                                        Meta logollama-3.2-1b-instruct
                                                                                                                                                                        Text GenerationMetaHosted

                                                                                                                                                                        The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

                                                                                                                                                                          Meta logollama-3.2-3b-instruct
                                                                                                                                                                          Text GenerationMetaHosted

                                                                                                                                                                          The Llama 3.2 instruction-tuned text only models are optimized for multilingual dialogue use cases, including agentic retrieval and summarization tasks.

                                                                                                                                                                            Meta logollama-3.2-11b-vision-instruct
                                                                                                                                                                            Text GenerationMetaHosted

                                                                                                                                                                            The Llama 3.2-Vision instruction-tuned models are optimized for visual recognition, image reasoning, captioning, and answering general questions about an image.

                                                                                                                                                                            • LoRA
                                                                                                                                                                            • Vision
                                                                                                                                                                            Black Forest Labs logoflux-1-schnell
                                                                                                                                                                            Text-to-ImageBlack Forest LabsHosted

                                                                                                                                                                            FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

                                                                                                                                                                              Meta logollama-3.1-8b-instruct-awq
                                                                                                                                                                              Text GenerationMetaHosted

                                                                                                                                                                              Quantized (int4) generative text model with 8 billion parameters from Meta.

                                                                                                                                                                              • Planned deprecation
                                                                                                                                                                              Meta logollama-3.1-8b-instruct-fp8
                                                                                                                                                                              Text GenerationMetaHosted

                                                                                                                                                                              Llama 3.1 8B quantized to FP8 precision.

                                                                                                                                                                                MyShell logomelotts
                                                                                                                                                                                Text-to-SpeechMyShellHosted

                                                                                                                                                                                MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai.

                                                                                                                                                                                  Meta logollama-3.1-8b-instruct
                                                                                                                                                                                  Text GenerationMetaHosted

                                                                                                                                                                                  The Meta Llama 3.1 collection comprises pretrained and instruction-tuned multilingual large language models (LLMs). The instruction-tuned, text-only models are optimized for multilingual dialogue use cases and outperform many of the available open-source and closed chat models on common industry benchmarks.

                                                                                                                                                                                  • Planned deprecation
                                                                                                                                                                                  BAAI logobge-m3
                                                                                                                                                                                  Text EmbeddingsBAAIHosted

                                                                                                                                                                                  Multi-Functionality, Multi-Linguality, and Multi-Granularity embeddings model.

                                                                                                                                                                                    Meta logometa-llama-3-8b-instruct
                                                                                                                                                                                    Text GenerationMetaHosted

                                                                                                                                                                                    Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

                                                                                                                                                                                    • Planned deprecation
                                                                                                                                                                                    OpenAI logowhisper-large-v3-turbo
                                                                                                                                                                                    Automatic Speech RecognitionOpenAIHosted

                                                                                                                                                                                    Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation.

                                                                                                                                                                                    • Batch
                                                                                                                                                                                    Meta logollama-3-8b-instruct-awq
                                                                                                                                                                                    Text GenerationMetaHosted

                                                                                                                                                                                    Quantized (int4) generative text model with 8 billion parameters from Meta.

                                                                                                                                                                                    • Planned deprecation
                                                                                                                                                                                    llava-1.5-7b-hfBeta
                                                                                                                                                                                    Image-to-Textllava-hfHosted

                                                                                                                                                                                    LLaVA is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture.

                                                                                                                                                                                      OpenAI logowhisper-tiny-enBeta
                                                                                                                                                                                      Automatic Speech RecognitionOpenAIHosted

                                                                                                                                                                                      Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680k hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need for fine-tuning. This is the English-only version of the Whisper Tiny model which was trained on the task of speech recognition.

                                                                                                                                                                                        Meta logollama-3-8b-instruct
                                                                                                                                                                                        Text GenerationMetaHosted

                                                                                                                                                                                        Generation over generation, Meta Llama 3 demonstrates state-of-the-art performance on a wide range of industry benchmarks and offers new capabilities, including improved reasoning.

                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        MistralAI logomistral-7b-instruct-v0.2Beta
                                                                                                                                                                                        Text GenerationMistralAIHosted

                                                                                                                                                                                        The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2. Mistral-7B-v0.2 has the following changes compared to Mistral-7B-v0.1: 32k context window (vs 8k context in v0.1), rope-theta = 1e6, and no Sliding-Window Attention.

                                                                                                                                                                                        • LoRA
                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        Google logogemma-7b-it-loraBeta
                                                                                                                                                                                        Text GenerationGoogleHosted

                                                                                                                                                                                        This is a Gemma-7B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

                                                                                                                                                                                        • LoRA
                                                                                                                                                                                        Google logogemma-2b-it-loraBeta
                                                                                                                                                                                        Text GenerationGoogleHosted

                                                                                                                                                                                        This is a Gemma-2B base model that Cloudflare dedicates for inference with LoRA adapters. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.

                                                                                                                                                                                        • LoRA
                                                                                                                                                                                        Meta logollama-2-7b-chat-hf-loraBeta
                                                                                                                                                                                        Text GenerationMetaHosted

                                                                                                                                                                                        This is a Llama 2 base model that Cloudflare dedicates to inference with LoRA adapters. Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format.

                                                                                                                                                                                        • LoRA
                                                                                                                                                                                        Google logogemma-7b-itBeta
                                                                                                                                                                                        Text GenerationGoogleHosted

                                                                                                                                                                                        Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. They are text-to-text, decoder-only large language models, available in English, with open weights, pre-trained variants, and instruction-tuned variants.

                                                                                                                                                                                        • LoRA
                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        hermes-2-pro-mistral-7bBeta
                                                                                                                                                                                        Text GenerationnousresearchHosted

                                                                                                                                                                                        Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes! Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.

                                                                                                                                                                                        • Function calling
                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        MistralAI logomistral-7b-instruct-v0.2-loraBeta
                                                                                                                                                                                        Text GenerationMistralAIHosted

                                                                                                                                                                                        The Mistral-7B-Instruct-v0.2 Large Language Model (LLM) is an instruct fine-tuned version of the Mistral-7B-v0.2.

                                                                                                                                                                                        • LoRA
                                                                                                                                                                                        Unum logouform-gen2-qwen-500mBeta
                                                                                                                                                                                        Image-to-TextUnumHosted

                                                                                                                                                                                        UForm-Gen is a small generative vision-language model primarily designed for Image Captioning and Visual Question Answering. The model was pre-trained on the internal image captioning dataset and fine-tuned on public instructions datasets: SVIT, LVIS, VQAs datasets.

                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        Meta logobart-large-cnnBeta
                                                                                                                                                                                        SummarizationMetaHosted

                                                                                                                                                                                        BART is a transformer encoder-encoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder. You can use this model for text summarization.

                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        Microsoft logophi-2Beta
                                                                                                                                                                                        Text GenerationMicrosoftHosted

                                                                                                                                                                                        Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of Synthetic and Web datasets for NLP and coding.

                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        Defog logosqlcoder-7b-2Beta
                                                                                                                                                                                        Text GenerationDefogHosted

                                                                                                                                                                                        This model is intended to be used by non-technical users to understand data inside their SQL databases.

                                                                                                                                                                                        • Planned deprecation
                                                                                                                                                                                        Meta logodetr-resnet-50Beta
                                                                                                                                                                                        Object DetectionMetaHosted

                                                                                                                                                                                        DEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images).

                                                                                                                                                                                          ByteDance logostable-diffusion-xl-lightningBeta
                                                                                                                                                                                          Text-to-ImageByteDanceHosted

                                                                                                                                                                                          SDXL-Lightning is a lightning-fast text-to-image generation model. It can generate high-quality 1024px images in a few steps.

                                                                                                                                                                                            dreamshaper-8-lcm
                                                                                                                                                                                            Text-to-ImagelykonHosted

                                                                                                                                                                                            Stable Diffusion model that has been fine-tuned to be better at photorealism without sacrificing range.

                                                                                                                                                                                              RunwayML logostable-diffusion-v1-5-img2imgBeta
                                                                                                                                                                                              Text-to-ImageRunwayMLHosted

                                                                                                                                                                                              Stable Diffusion is a latent text-to-image diffusion model capable of generating photo-realistic images. Img2img generate a new image from an input image with Stable Diffusion.

                                                                                                                                                                                                RunwayML logostable-diffusion-v1-5-inpaintingBeta
                                                                                                                                                                                                Text-to-ImageRunwayMLHosted

                                                                                                                                                                                                Stable Diffusion Inpainting is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input, with the extra capability of inpainting the pictures by using a mask.

                                                                                                                                                                                                  Stability.ai logostable-diffusion-xl-base-1.0Beta
                                                                                                                                                                                                  Text-to-ImageStability.aiHosted

                                                                                                                                                                                                  Diffusion-based text-to-image generative model by Stability AI. Generates and modify images based on text prompts.

                                                                                                                                                                                                    BAAI logobge-large-en-v1.5
                                                                                                                                                                                                    Text EmbeddingsBAAIHosted

                                                                                                                                                                                                    BAAI general embedding (Large) model that transforms any given text into a 1024-dimensional vector

                                                                                                                                                                                                    • Batch
                                                                                                                                                                                                    BAAI logobge-small-en-v1.5
                                                                                                                                                                                                    Text EmbeddingsBAAIHosted

                                                                                                                                                                                                    BAAI general embedding (Small) model that transforms any given text into a 384-dimensional vector

                                                                                                                                                                                                    • Batch
                                                                                                                                                                                                    Meta logollama-2-7b-chat-fp16
                                                                                                                                                                                                    Text GenerationMetaHosted

Half-precision (fp16) generative text model with 7 billion parameters from Meta

                                                                                                                                                                                                    • Planned deprecation
                                                                                                                                                                                                    MistralAI logomistral-7b-instruct-v0.1
                                                                                                                                                                                                    Text GenerationMistralAIHosted

                                                                                                                                                                                                    Instruct fine-tuned version of the Mistral-7b generative text model with 7 billion parameters

                                                                                                                                                                                                    • LoRA
                                                                                                                                                                                                    • Planned deprecation
                                                                                                                                                                                                    BAAI logobge-base-en-v1.5
                                                                                                                                                                                                    Text EmbeddingsBAAIHosted

                                                                                                                                                                                                    BAAI general embedding (Base) model that transforms any given text into a 768-dimensional vector

                                                                                                                                                                                                    • Batch
                                                                                                                                                                                                    HuggingFace logodistilbert-sst-2-int8
                                                                                                                                                                                                    Text ClassificationHuggingFaceHosted

Distilled BERT model that was fine-tuned on SST-2 for sentiment classification

                                                                                                                                                                                                      Meta logollama-2-7b-chat-int8
                                                                                                                                                                                                      Text GenerationMetaHosted

                                                                                                                                                                                                      Quantized (int8) generative text model with 7 billion parameters from Meta

                                                                                                                                                                                                      • Planned deprecation
                                                                                                                                                                                                      Meta logom2m100-1.2b
                                                                                                                                                                                                      TranslationMetaHosted

                                                                                                                                                                                                      Multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation

                                                                                                                                                                                                      • Batch
                                                                                                                                                                                                      Microsoft logoresnet-50
                                                                                                                                                                                                      Image ClassificationMicrosoftHosted

50-layer deep image classification CNN trained on more than 1M images from ImageNet

                                                                                                                                                                                                        OpenAI logowhisper
                                                                                                                                                                                                        Automatic Speech RecognitionOpenAIHosted

                                                                                                                                                                                                        Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

                                                                                                                                                                                                          Meta logollama-3.1-70b-instruct
                                                                                                                                                                                                          Text GenerationMetaHosted

                                                                                                                                                                                                          The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.

                                                                                                                                                                                                          • Planned deprecation
                                                                                                                                                                                                          Meta logollama-3.1-8b-instruct-fast
                                                                                                                                                                                                          Text GenerationMetaHosted

                                                                                                                                                                                                          [Fast version] The Meta Llama 3.1 collection of multilingual large language models (LLMs) is a collection of pretrained and instruction tuned generative models. The Llama 3.1 instruction tuned text only models are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.