llmasaservice.io

Integrate and Scale AI Features with Ease

With LLM as a Service, securely integrate multiple LLM providers with advanced routing, metering, load balancing, streaming conversations, and more.

Want to See It in Action?

Shark-Puddle.com was built in a few days to showcase LLMasaService.io and put it through its paces. Pitch your business idea (real or just for fun) to see how it works, view the source code, and try it for yourself!

Streaming Conversational AI

Secure implementation of streaming responses and conversations.

LLM Proxy & "Smart" Routing

Route requests to the right LLM provider and model based on prompt complexity & sensitivity.

Load Balancing and Failover

For when, not if, a provider has an outage.

Safety & Compliance

Redact or tokenize PII, block toxic prompts, and audit call logs.

Secure and Scalable

Secure storage of API keys, dev/ops control panel, call analytics and load shedding.

Data Tenancy (EU)

Route customers to specific LLM providers by region or other settings.

Response Caching

Reduce cost by providing cached responses for repeated prompts.

Customer Token Budgeting

Give trial customers a token allowance to start, meter usage, and integrate billing.

Add Production-Quality AI Features

Safe, Fast and Reliable Generative AI

Outages, rate limits, scalability, and security are all significant challenges when integrating public LLM features into your software. Learn what we offer.

Multiple vendors and models - reliable and convenient

Multiple vendors and models

Avoid vendor or model lock-in, and manage model groups, including API keys, in one place. Keep vendor-specific API code and model names OUT of your source code.

New Model Introduction and Sunsetting

There will always be a new and better model. You won’t have to change your code to try new models out or to make them generally available in your application.

Get in touch with us today to pilot LLMasaService.io

Smart Routing - right model at the right price for any prompt

Smart ML Model Routing

Use our pre-trained model to choose between a general and a stronger model. You can also request a stronger model from code, or based on customer feedback about prompt quality.

Route or block sensitive or unsafe prompts

Control how prompts get routed when they contain hateful content, PII, or specific keywords. You can route these prompts to internal models or block them.
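A minimal sketch of that route-or-block decision follows. It is an illustration only: the rule shape, the function names, and the rough email check standing in for PII detection are all assumptions, not the LLMasaService.io implementation.

```typescript
// Hypothetical types and names; none of these come from LLMasaService.io.
type RouteDecision = "block" | "internal" | "default";

interface RoutingRules {
  blockedKeywords: string[];  // prompts containing these are rejected
  internalKeywords: string[]; // prompts containing these stay on internal models
}

// Very rough stand-in for PII detection: flags anything shaped like an email address.
const EMAIL_PATTERN = /[^\s@]+@[^\s@]+\.[^\s@]+/;

// Case-insensitive check for whether the text contains any of the keywords.
function containsAny(text: string, keywords: string[]): boolean {
  const lower = text.toLowerCase();
  return keywords.some((k) => lower.indexOf(k.toLowerCase()) !== -1);
}

function decideRoute(prompt: string, rules: RoutingRules): RouteDecision {
  // Blocking takes priority over every other rule.
  if (containsAny(prompt, rules.blockedKeywords)) {
    return "block";
  }
  // Sensitive prompts are kept away from public providers.
  if (EMAIL_PATTERN.test(prompt) || containsAny(prompt, rules.internalKeywords)) {
    return "internal";
  }
  return "default";
}

// Example rules (placeholder keywords).
const exampleRules: RoutingRules = {
  blockedKeywords: ["forbidden-term"],
  internalKeywords: ["payroll"],
};
```

Checking the block rule first means a prompt that is both sensitive and disallowed is rejected rather than forwarded to an internal model.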

Prompt template library for re-use and consistency

Create re-usable prompts

Create and manage prompts in one place and reuse them throughout your applications.

Change and test prompts without code deploy

Make changes to prompts without needing to redeploy production code. 

Trustworthy AI - Make AI Safe and Consistent

Detect and Block Toxic Prompts

Log harmful prompts and prevent them from being accepted. Stop violent, hateful, sexual, profane, or graphic prompts before a vendor has the chance to respond inappropriately.

Manage Brand and Policy Instructions

Define your brand and legal policy instructions in one place. These instructions will be injected into EVERY prompt ensuring your responses stay on message.
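As a rough illustration of what injecting instructions into every prompt can look like, the sketch below prepends centrally defined policies as a system message. The message shape and the helper name are assumptions made for this example, not the service's API.

```typescript
// Hypothetical message shape and helper; not the LLMasaService.io API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Returns a new message list with the policies prepended as one system message.
function withPolicyInstructions(policies: string[], messages: ChatMessage[]): ChatMessage[] {
  if (policies.length === 0) {
    return [...messages];
  }
  const policyMessage: ChatMessage = { role: "system", content: policies.join("\n") };
  return [policyMessage, ...messages];
}

// Example: policies are defined once and applied to every outgoing conversation.
const brandPolicies = ["Answer in a friendly, concise tone.", "Do not give legal advice."];
const outgoing = withPolicyInstructions(brandPolicies, [
  { role: "user", content: "What is your refund policy?" },
]);
```

Because the helper returns a new array, the caller's original messages are left untouched and the policies can change without any edits to application prompts.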

Streaming AI chat example from heyCASEy.io.

Make Streaming AI Chat Features a Snap

Want to add some cool AI chat features to your product?

We did, too – and getting it to work with code examples was easy – but reliably scaling it and doing it securely was a lot harder.

Even at moderate scale, keeping it reliable and available was a major headache. Provider outages caused our product to fail at the worst times (like product demos).

What does it take to get our AI features Production-Ready?

We realized we needed multiple LLM providers so that we could gracefully fail over to another when, not if, one had an outage.

Different providers had different rate limits, so we added the ability to retry a request against a different provider whenever we hit a rate limit.
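The failover and retry behavior described above can be sketched as trying providers in priority order and moving on when one is rate limited or down. The `Provider` interface, the `retryable` flag, and the function names are hypothetical and chosen for this example; real provider SDKs signal rate limits and outages in their own ways.

```typescript
// Hypothetical provider abstraction; real provider SDKs differ.
interface Provider {
  name: string;
  complete: (prompt: string) => Promise<string>;
}

// Hypothetical convention: wrappers mark rate-limit and outage errors as retryable.
function unavailable(message: string): Error & { retryable: boolean } {
  const err = new Error(message) as Error & { retryable: boolean };
  err.retryable = true;
  return err;
}

function isRetryable(err: unknown): boolean {
  return typeof err === "object" && err !== null &&
    (err as { retryable?: boolean }).retryable === true;
}

// Tries each provider in order and returns the first successful response.
async function completeWithFailover(providers: Provider[], prompt: string): Promise<string> {
  const failures: string[] = [];
  for (const provider of providers) {
    try {
      return await provider.complete(prompt);
    } catch (err) {
      // Fail over only on availability problems; rethrow anything else.
      if (!isRetryable(err)) {
        throw err;
      }
      const reason = err instanceof Error ? err.message : String(err);
      failures.push(`${provider.name}: ${reason}`);
    }
  }
  throw new Error(`All providers failed (${failures.join("; ")})`);
}
```

Errors that are not marked retryable, such as a malformed request, are rethrown immediately, since sending the same bad request to another provider would not help.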

And let’s not forget EU customers. Without data tenancy settings to route AI chat requests to LLM providers in the EU, they wouldn’t be able to use our software. 

We added response caching, a developer control panel, customer token allowances, secure API key storage, load shedding, and PII data redaction, too. 

And now we’ve packaged up everything we’ve learned for you to use in your applications. 

Make adding streaming AI features easy and focus on adding value to your customers.

Check out the NPM Package Documentation

There are two parts to using LLMasaService.io: our developer control panel and this library, which connects your code to our service and is deployed as a standard NPM package.

Designed for Developers

Visibility, Control, and Security with our Developer Control Panel.

Bring multiple providers online, use the “chaos monkey” to test outage behavior, monitor requests and token usage, set up customer tiers and securely store all your API Keys in one place. 

Rapid build

Easy to use code examples

Get your streaming AI chat features online in record time. Build on a reliable service that takes the guesswork out of making AI features scalable and secure, so you can focus on your unique value to customers.

Ready to Get Started?

  1. Create an Account
  2. Configure your LLM providers and options
  3. Retrieve your Project Key

2. Add llmasaservice calls to your app

  1. Import the npm package or make direct fetch calls
  2. Implement the example code using your Project Key
  3. Any trouble? Contact us.