What is Together AI
Together AI is the AI Acceleration Cloud. It provides easy-to-use APIs and scalable GPU infrastructure for running, fine-tuning, training, and deploying generative AI models, with a focus on optimizing both performance and cost for AI workloads.
How to use Together AI
Users can start building with Together AI through its API. The platform covers the full model lifecycle: inference, fine-tuning, training, and deployment on dedicated hardware.
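Because the API is OpenAI-compatible, a chat completion can be requested with a plain HTTP POST. The sketch below uses only the Python standard library; the endpoint path and model id follow the OpenAI-compatible API shape, and the model name is an example from the model library rather than a fixed requirement.

```python
import json
import os
import urllib.request

# Together AI's OpenAI-compatible chat completions endpoint.
API_URL = "https://api.together.xyz/v1/chat/completions"

def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble the HTTP request; sending it requires a valid API key."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request(
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model id; see the model library
    "Say hello",
    os.environ.get("TOGETHER_API_KEY", "YOUR_KEY"),
)
# With a real key set, urllib.request.urlopen(req) sends the request and
# returns a JSON body in the familiar OpenAI chat-completions format.
```

Because the request/response format matches OpenAI's, existing OpenAI client libraries can also be pointed at this base URL instead of hand-rolling HTTP.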
Features of Together AI
- Model Platform:
  - 200+ generative AI models (open-source and multimodal)
  - OpenAI-compatible APIs
  - Serverless Inference
  - Dedicated Endpoints
  - Fine-Tuning
  - Together Chat app
- Code Execution:
  - Code Sandbox (AI development environments)
  - Code Interpreter (execute LLM-generated code)
- Tools:
  - Which LLM to Use (model selection guide)
- GPU Cloud:
  - Instant Clusters (self-serve, up to 64 NVIDIA GPUs)
  - Reserved Clusters (64 to 10,000+ NVIDIA GPUs)
  - Global data center locations (25+ cities)
  - Slurm (cluster management system)
  - Access to NVIDIA GPUs (GB200 NVL72, HGX B200, H200, H100)
- Solutions:
  - Enterprise-grade infrastructure
  - Customer stories
  - Focus on open-source AI
  - Support for various industries and use cases
- Developers:
  - Documentation
  - Research
  - Model Library
  - Cookbooks (implementation guides)
  - Example apps
Use Cases of Together AI
Together AI can be used to scale businesses with AI across a range of industries and use cases. Customer stories highlight applications such as accelerating training and inference and building AI customer-support bots.
Pricing
Pricing is published for both the platform and GPU usage: per-token and per-minute pricing for inference, separate pricing for LoRA and full fine-tuning, and hourly rates plus custom pricing for GPU Clusters.
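Per-token inference billing is simple to estimate: multiply input and output token counts by the per-million-token rates for the chosen model. The helper below is a minimal sketch; the rates used are placeholders, not Together AI's actual prices, which vary by model and are listed on the pricing page.

```python
def token_cost(input_tokens: int, output_tokens: int,
               price_in_per_m: float, price_out_per_m: float) -> float:
    """Estimate inference cost in dollars from per-1M-token rates."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m

# Placeholder rates ($ per 1M tokens), purely illustrative:
cost = token_cost(10_000, 2_000, 0.88, 0.88)
print(f"${cost:.5f}")  # cost for 10k input + 2k output tokens at $0.88/1M
```

Some models price input and output tokens differently, which is why the sketch keeps the two rates separate.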