Overview
Llama 3.2 11B Vision Instruct is an 11-billion-parameter multimodal model that accepts both text and images as input and generates text. It extends the earlier text-only Llama models with visual understanding for tasks such as image captioning and visual reasoning. Like the other vision models in this series, it uses a dedicated vision adapter that connects an image encoder to the pre-trained language model, so the language model can draw on image content in multimodal tasks. The model suits applications that need detailed image analysis alongside text input, which makes it a practical choice for developers building interactive AI systems.
Specializations
Vision-Language Model: Understands and generates text based on visual input.
Mid-sized Model: Balanced performance and efficiency.
Instruction-Tuned: Capable of following instructions related to visual content.
Integration Guide (JavaScript)
To use this model through Portkey, follow these steps:
1. Install Portkey SDK:
npm install --save portkey-ai
2. Set up client with Portkey:
// Import and initialize Portkey
import Portkey from 'portkey-ai'
const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY", // Replace with your Portkey API key
  virtualKey: "VIRTUAL_KEY" // Your Fireworks Virtual Key created in Portkey
});
3. Make a request:
// Make a chat completion request
const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'accounts/fireworks/models/llama-v3p2-11b-vision-instruct',
});
console.log(chatCompletion.choices);
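The request above sends text only. Since this is a vision model, a request that includes an image may be more representative. The sketch below assumes the provider accepts OpenAI-style content parts (a `text` part plus an `image_url` part) for this model. The helper names `buildVisionMessages` and `describeImage`, the prompt, and the `max_tokens` value are illustrative and not part of the Portkey SDK.

```javascript
// Build an OpenAI-style multimodal user message: one text part plus one image part.
// Assumption: the provider accepts `image_url` content parts for this model.
function buildVisionMessages(prompt, imageUrl) {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: prompt },
        { type: 'image_url', image_url: { url: imageUrl } },
      ],
    },
  ];
}

// `portkey` is the client created in step 2; `imageUrl` is supplied by the caller.
async function describeImage(portkey, imageUrl) {
  const chatCompletion = await portkey.chat.completions.create({
    messages: buildVisionMessages('Describe this image in one sentence.', imageUrl),
    model: 'accounts/fireworks/models/llama-v3p2-11b-vision-instruct',
    max_tokens: 256, // illustrative cap on the response length
  });
  // Return only the generated text of the first choice.
  return chatCompletion.choices[0].message.content;
}
```

To try it, call `describeImage(portkey, 'https://example.com/photo.jpg')` with the client from step 2 and a publicly reachable image URL (the URL here is a placeholder), then log the resolved string.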
Model Specifications
Release Date: September 25, 2024
Max. Context Tokens: 128K
Max. Output Tokens: 8K
Model Size: 11B
Knowledge Cut-Off Date: December 2023
License: Open-Source
© 2024 Portkey, Inc. All rights reserved