Llama 3.2 90B Vision Instruct | Pricing | Token Size | LMSys Score

Llama 3.2 90B Vision Instruct

Overview

The Llama 3.2 90B Vision Instruct model is a multimodal model with 90 billion parameters that accepts both text and image inputs. It is designed for visual recognition, image reasoning, and captioning. Its architecture feeds image encoder representations into the language model through cross-attention layers, which lets it handle tasks such as visual question answering and document understanding, including interpreting charts and graphs. Despite its size, recent evaluations indicate that it performed similarly to its base version across various datasets, which raises questions about how much instruction fine-tuning adds in this context.
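To make the cross-attention idea concrete, here is a toy sketch in which text-token vectors (queries) attend over image-patch vectors (keys and values). It is only an illustration of the general mechanism, not the model's actual implementation: the function names, the 2-dimensional embeddings, and the absence of learned projection matrices are all simplifying assumptions.

```javascript
// Toy single-head cross-attention: text tokens (queries) attend over image
// patch embeddings (keys/values). Plain arrays only; a real model would also
// apply learned projection matrices, which are omitted here.

// Dot product of two equal-length vectors.
function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Numerically stable softmax over an array of scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((sum, e) => sum + e, 0);
  return exps.map((e) => e / total);
}

// For each text query, return a weighted average of the image value vectors,
// weighted by the scaled dot-product similarity between query and image keys.
function crossAttention(textQueries, imageKeys, imageValues) {
  const scale = Math.sqrt(imageKeys[0].length);
  return textQueries.map((q) => {
    const weights = softmax(imageKeys.map((k) => dot(q, k) / scale));
    return imageValues[0].map((_, d) =>
      weights.reduce((sum, w, i) => sum + w * imageValues[i][d], 0)
    );
  });
}

// Hypothetical embeddings: one text token and two image patches.
const textQueries = [[1, 0]];
const imageKeys = [[10, 0], [0, 10]];
const imageValues = [[1, 0], [0, 1]];

// The query aligns with the first patch, so the output is dominated by
// that patch's value vector.
const attended = crossAttention(textQueries, imageKeys, imageValues);
console.log(attended);
```

In this sketch the text token "looks at" whichever image patch it is most similar to, which is the intuition behind letting a language model condition its output on visual features.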

Specializations

Vision-Language Model: Understands and generates text based on visual input.
Large-Scale Model: Powerful capabilities for complex tasks like image description and question answering.
Instruction-Tuned: Capable of following instructions related to visual content.

Integration Guide (JavaScript)

To use this model through Portkey, follow these steps:

1. Install Portkey SDK:

npm install --save portkey-ai

2. Set up client with Portkey:

// Import and initialize Portkey
import Portkey from 'portkey-ai'

const portkey = new Portkey({
  apiKey: "PORTKEY_API_KEY", // Replace with your Portkey API key
  virtualKey: "VIRTUAL_KEY" // Your Fireworks Virtual Key created in Portkey
})
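Step 2 uses placeholder strings for the keys. In practice you may prefer to read them from the environment rather than commit them to source. The helper below is a minimal sketch of that approach; the function name and the variable names `PORTKEY_API_KEY` and `PORTKEY_VIRTUAL_KEY` are assumptions for illustration, not names required by the SDK.

```javascript
// Build the options object for `new Portkey(...)` from environment variables.
// The environment variable names are assumed; use whatever your deployment defines.
function portkeyOptionsFromEnv(env) {
  const apiKey = env.PORTKEY_API_KEY;
  const virtualKey = env.PORTKEY_VIRTUAL_KEY;

  // Collect every missing variable so the error reports all of them at once.
  const missing = [];
  if (!apiKey) missing.push('PORTKEY_API_KEY');
  if (!virtualKey) missing.push('PORTKEY_VIRTUAL_KEY');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  return { apiKey, virtualKey };
}

// Demonstration with a stand-in environment object (values are fake).
const exampleOptions = portkeyOptionsFromEnv({
  PORTKEY_API_KEY: 'pk-example',
  PORTKEY_VIRTUAL_KEY: 'vk-example',
});
console.log(exampleOptions);

// In an application, with Portkey imported as in step 2:
//   const portkey = new Portkey(portkeyOptionsFromEnv(process.env));
```

Failing early with a clear message tends to be easier to debug than an authentication error returned by the first request.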

3. Make a request:

// Make a chat completion request
const chatCompletion = await portkey.chat.completions.create({
  messages: [{ role: 'user', content: 'Say this is a test' }],
  model: 'accounts/fireworks/models/llama-v3p2-90b-vision-instruct',
});

console.log(chatCompletion.choices);
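The request in step 3 is text-only, while the model's main use is reasoning over images. The sketch below builds a user message that pairs a question with an image. It assumes the provider accepts OpenAI-style content parts (`type: 'text'` and `type: 'image_url'`), which the source page does not confirm; the helper name and the example URL are hypothetical.

```javascript
// Build an OpenAI-style chat message that pairs a text question with an image URL.
// Assumption: the gateway and provider accept `image_url` content parts.
function buildVisionMessage(question, imageUrl) {
  return {
    role: 'user',
    content: [
      { type: 'text', text: question },
      { type: 'image_url', image_url: { url: imageUrl } },
    ],
  };
}

// Hypothetical image URL, for illustration only.
const message = buildVisionMessage(
  'What trend does this chart show?',
  'https://example.com/chart.png'
);
console.log(JSON.stringify(message, null, 2));

// With the client from step 2, the request would then look like:
//   const chatCompletion = await portkey.chat.completions.create({
//     messages: [message],
//     model: 'accounts/fireworks/models/llama-v3p2-90b-vision-instruct',
//   });
//   console.log(chatCompletion.choices[0].message.content);
```

Keeping the message construction in a small function makes it easy to reuse the same shape for captioning, chart reading, or document questions by changing only the question text.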

Model Specifications

Release Date: September 25, 2024
Max. Context Tokens: 128K
Max. Output Tokens:
Model Size: 90B
Knowledge Cut-Off Date: December 2023

License: Llama 3.2 Community License
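Because the page lists a 128K context window but no output limit, one practical check is how many tokens remain for the response once the prompt is counted. The helper below is a rough sketch under two assumptions: that "128K" means 128,000 tokens, and that prompt and output share the same window. The function name is hypothetical, and the prompt token count must come from a tokenizer or from the usage data your provider returns.

```javascript
// "128K" from the specifications, assumed here to mean 128,000 tokens.
const MAX_CONTEXT_TOKENS = 128000;

// Return how many tokens are left for the model's output after the prompt,
// assuming prompt and output share one context window.
function remainingOutputBudget(promptTokens, maxContextTokens = MAX_CONTEXT_TOKENS) {
  if (!Number.isInteger(promptTokens) || promptTokens < 0) {
    throw new RangeError('promptTokens must be a non-negative integer');
  }
  // Clamp at zero: a prompt larger than the window leaves no room for output.
  return Math.max(0, maxContextTokens - promptTokens);
}

console.log(remainingOutputBudget(100000)); // 28000
console.log(remainingOutputBudget(200000)); // 0
```

A result of 0 signals that the prompt should be shortened or split before sending the request.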