Overview
The Llama 3.2 90B Vision Instruct model is a 90-billion-parameter multimodal model that accepts both text and image inputs. It is aimed at visual recognition, image reasoning, and captioning tasks. Its architecture feeds image encoder representations into the language model through cross-attention layers, which lets it handle tasks such as visual question answering and document understanding, including interpreting charts and graphs. Despite its size, recent evaluations indicate that it performed about the same as its base version across various datasets, which raises questions about how much instruction fine-tuning contributes in this case.
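To make the cross-attention idea more concrete, here is a toy single-head sketch in plain JavaScript. It is not the model's actual implementation: it leaves out learned query/key/value projections, multiple heads, and everything else a production layer would include, and the function names (`dot`, `softmax`, `crossAttention`) are invented for illustration. Each text-token vector acts as a query and attends over image-patch vectors, which serve as both keys and values here.

```javascript
// Toy single-head cross-attention (illustration only, no learned weights).
// textStates:    one vector per text token (used as queries)
// imageFeatures: one vector per image patch (used as keys and values)

function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function softmax(scores) {
  const max = Math.max(...scores); // subtract the max for numerical stability
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

function crossAttention(textStates, imageFeatures) {
  const dim = imageFeatures[0].length;
  const scale = Math.sqrt(dim);
  return textStates.map((query) => {
    // Attention weights of this text token over every image patch.
    const weights = softmax(imageFeatures.map((key) => dot(query, key) / scale));
    // Weighted sum of the image patch vectors.
    return Array.from({ length: dim }, (_, d) =>
      imageFeatures.reduce((sum, value, j) => sum + weights[j] * value[d], 0)
    );
  });
}
```

Each output row is a mix of image-patch vectors, weighted by how well each patch matches the text token. That is the sense in which image information is injected into the language model's token representations.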
Specializations
Vision-Language Model: Understands and generates text based on visual input.
Large-Scale Model: Powerful capabilities for complex tasks like image description and question answering.
Instruction-Tuned: Capable of following instructions related to visual content.
Integration Guide (JavaScript)
To use this model through Portkey, follow these steps:
1. Install Portkey SDK:
npm install --save portkey-ai
2. Set up client with Portkey:
// Import and initialize Portkey
import Portkey from 'portkey-ai';

const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY", // Replace with your Portkey API key
  virtualKey: "VIRTUAL_KEY", // Your Fireworks Virtual Key created in Portkey
});
3. Make a request:
// Make a chat completion request.
// Top-level await only works in an ES module (for example a .mjs file).
const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'accounts/fireworks/models/llama-v3p2-90b-vision-instruct',
});
console.log(chatCompletion.choices);
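The request above is text-only. Because this is a vision model, a request that includes an image may be more representative. The sketch below assumes the endpoint accepts OpenAI-style multimodal content parts (a `text` part plus an `image_url` part); check the Portkey and Fireworks documentation before relying on that. The helper names `buildVisionMessages` and `askAboutImage`, the `max_tokens` value, and the example URL are illustrative, not part of the Portkey SDK.

```javascript
// Build an OpenAI-style multimodal user message: one text part and one image part.
// Assumption: the provider accepts `image_url` content parts in this shape.
function buildVisionMessages(question, imageUrl) {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: question },
        { type: 'image_url', image_url: { url: imageUrl } },
      ],
    },
  ];
}

// `client` is expected to be a Portkey instance, such as `portkey` from step 2.
async function askAboutImage(client, question, imageUrl) {
  const response = await client.chat.completions.create({
    model: 'accounts/fireworks/models/llama-v3p2-90b-vision-instruct',
    messages: buildVisionMessages(question, imageUrl),
    max_tokens: 512, // arbitrary cap chosen for this example
  });
  return response.choices[0].message.content;
}
```

With the client from step 2, a call might look like `const answer = await askAboutImage(portkey, 'What does this chart show?', 'https://example.com/chart.png');`, where the URL is a placeholder.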
Model Specifications
Release Date: September 25, 2024
Max. Context Tokens: 128K
Max. Output Tokens: 8K
Model Size: 90B
Knowledge Cut-Off Date: December 2023
License: Open-Source
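The context and output limits above can be applied on the client side before a request is sent. The following is a minimal sketch under two assumptions that the specifications do not confirm: that "128K" and "8K" mean 128 × 1024 and 8 × 1024 tokens, and that generated tokens count toward the context window. `MODEL_LIMITS` and `clampMaxTokens` are hypothetical names, and the prompt token count is assumed to come from whatever tokenizer or estimate you already use.

```javascript
// Limits taken from the specifications above; exact token counts are assumptions.
const MODEL_LIMITS = {
  maxContextTokens: 128 * 1024, // "128K" read as 131072
  maxOutputTokens: 8 * 1024,    // "8K" read as 8192
};

// Return a max_tokens value that respects both the output cap and the
// context space left after the prompt.
function clampMaxTokens(requested, promptTokens = 0, limits = MODEL_LIMITS) {
  const roomInContext = Math.max(0, limits.maxContextTokens - promptTokens);
  return Math.max(0, Math.min(requested, limits.maxOutputTokens, roomInContext));
}
```

The result can be passed as `max_tokens` in the chat completion request, so an oversized value is reduced locally instead of being rejected or truncated by the provider.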
© 2024 Portkey, Inc. All rights reserved