Llama 3.2 11B Vision Instruct | Pricing | Token Size | LMSys Score

Llama 3.2 11B Vision Instruct

Overview

Llama 3.2 11B Vision Instruct is a mid-sized, 11-billion-parameter model that accepts both text and image input. It extends the earlier text-only Llama models with visual understanding, covering tasks such as image captioning and visual reasoning. Like the other vision models in the Llama 3.2 series, it uses a dedicated vision adapter that integrates with the pre-trained language model to handle multimodal tasks. This makes it well suited to applications that need detailed image analysis alongside text input, such as interactive assistants that answer questions about images.

Specializations

Vision-Language Model: Understands and generates text based on visual input.
Mid-sized Model: Balanced performance and efficiency.
Instruction-Tuned: Capable of following instructions related to visual content.

Integration Guide (JavaScript)

To use this model through Portkey, follow these steps:

1. Install Portkey SDK:

npm install --save portkey-ai

2. Set up client with Portkey:

// Import and initialize Portkey
import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY", // Replace with your Portkey API key
    virtualKey: "VIRTUAL_KEY" // Your Fireworks Virtual Key created in Portkey
})
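
Hard-coding the two keys is fine for a quick test, but they are usually read from the environment instead. The following is a minimal sketch under that assumption. The variable names PORTKEY_API_KEY and FIREWORKS_VIRTUAL_KEY and the helper loadPortkeyConfig are illustrative choices, not something Portkey prescribes.

```javascript
// Hypothetical helper: builds the options object for `new Portkey(...)`
// from environment variables. The variable names are assumptions.
function loadPortkeyConfig(env = process.env) {
  const apiKey = env.PORTKEY_API_KEY;
  const virtualKey = env.FIREWORKS_VIRTUAL_KEY;

  // Fail early with a clear message rather than sending a request without keys.
  const missing = [];
  if (!apiKey) missing.push('PORTKEY_API_KEY');
  if (!virtualKey) missing.push('FIREWORKS_VIRTUAL_KEY');
  if (missing.length > 0) {
    throw new Error(`Missing environment variable(s): ${missing.join(', ')}`);
  }

  return { apiKey, virtualKey };
}
```

With the portkey-ai package from step 1 installed, the client could then be created as new Portkey(loadPortkeyConfig()).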

3. Make a request:

// Make a chat completion request
const chatCompletion = await portkey.chat.completions.create({
    messages: [{ role: 'user', content: 'Say this is a test' }],
    model: 'accounts/fireworks/models/llama-v3p2-11b-vision-instruct',
});

console.log(chatCompletion.choices);
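
The request in step 3 sends text only. Because this is a vision model, a typical request would also carry an image. The sketch below assumes the endpoint accepts the OpenAI-style content array with text and image_url parts. The helper names buildVisionMessages and describeImage are hypothetical, and client stands for the portkey instance created in step 2.

```javascript
const MODEL = 'accounts/fireworks/models/llama-v3p2-11b-vision-instruct';

// Builds a single user message that pairs a text prompt with an image URL.
// Assumption: the provider accepts OpenAI-style `text` and `image_url` parts.
function buildVisionMessages(prompt, imageUrl) {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        { type: 'image_url', image_url: { url: imageUrl } },
      ],
    },
  ];
}

// Sends the request through the given client and returns the text of the
// first choice rather than the whole `choices` array.
async function describeImage(client, prompt, imageUrl) {
  const completion = await client.chat.completions.create({
    model: MODEL,
    messages: buildVisionMessages(prompt, imageUrl),
  });
  return completion.choices[0].message.content;
}
```

A call might look like await describeImage(portkey, 'What is in this image?', imageUrl), where imageUrl is a publicly reachable image address of your own.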

Model Specifications

Release Date: 25 September 2024
Max. Context Tokens: 128K
Max. Output Tokens: not listed
Model Size: 11B
Knowledge Cut-Off Date: December 2023

License: Llama 3.2 Community License (openly available weights)
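
The context limit in the specifications covers the prompt and the generated output together, so a rough pre-flight check can catch oversized requests before they are sent. The sketch below is only an approximation. It assumes that 128K means 131,072 tokens and that English text averages about 4 characters per token. Neither figure comes from this page, and image input, which also consumes tokens, is not counted.

```javascript
const CONTEXT_TOKENS = 128 * 1024; // assumption: "128K" = 131,072 tokens
const CHARS_PER_TOKEN = 4; // assumption: crude average for English text

// Rough token estimate from character count; not a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// True if the estimated prompt size plus the requested output budget
// stays within the assumed context window.
function fitsInContext(promptText, maxOutputTokens) {
  return estimateTokens(promptText) + maxOutputTokens <= CONTEXT_TOKENS;
}
```

For exact counts, the model's own tokenizer would be needed. This check is only meant to flag requests that are clearly too large.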