llmasaservice.io

Integrate and Scale AI Features with Ease

With LLM as a Service, securely integrate multiple LLM providers with advanced routing, metering, load balancing, streaming conversations, and more.

Want to See It in Action?

Shark-Puddle.com was built in a few days to showcase using LLMasaService.io and to put it through its paces.  Pitch your (real or just for fun) business idea and see how it works, view the source code, and try it for yourself!

Streaming Conversational AI

Secure implementation of streaming responses and conversations.

LLM Proxy and Routing

Choose which LLM provider and model handles each call, on a prompt-by-prompt (or customer-by-customer) basis in your application.

Load Balancing and Failover

...for when, not if, a provider has an outage.

Customer Token Budgeting

Give trial customers a token allowance to start, meter usage, and integrate billing.

Secure and Scalable

Secure storage of API keys, dev/ops control panel, call analytics and load shedding.

Data Tenancy (EU)

Route customers to specific LLM providers by region or other settings.

Response Caching

Reduce cost by providing cached responses for repeated prompts.

PII Data Redaction

Transparent tokenization so that no vendor gets passed PII data.
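To make "transparent tokenization" concrete, here is a minimal TypeScript sketch of the general idea: swap sensitive values for placeholder tokens before a prompt leaves your system, then restore them in the response. It only handles email addresses, and the function names and `<EMAIL_n>` token format are assumptions made up for this illustration, not how LLMasaService.io implements redaction.

```typescript
// Illustrative sketch only: names and token format are hypothetical.
interface Redaction {
  redacted: string;            // prompt that is safe to send to a vendor
  tokens: Map<string, string>; // placeholder -> original value
}

// Deliberately simple email pattern; real PII detection covers far more.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

function redactEmails(prompt: string): Redaction {
  const tokens = new Map<string, string>();
  const seen = new Map<string, string>(); // original value -> placeholder
  const redacted = prompt.replace(EMAIL_PATTERN, (match) => {
    let placeholder = seen.get(match);
    if (placeholder === undefined) {
      // Repeated addresses reuse the same placeholder.
      placeholder = `<EMAIL_${seen.size + 1}>`;
      seen.set(match, placeholder);
      tokens.set(placeholder, match);
    }
    return placeholder;
  });
  return { redacted, tokens };
}

// Put the original values back into text returned by the model.
function restoreTokens(text: string, tokens: Map<string, string>): string {
  let restored = text;
  tokens.forEach((original, placeholder) => {
    restored = restored.split(placeholder).join(original);
  });
  return restored;
}
```

The vendor only ever sees the placeholders, while the mapping needed to reverse them stays on your side.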

The Road to Production-Quality AI Features

Delivering Reliable AI features

Outages, rate limits, scalability, and security are all significant challenges when integrating public LLM features into your software.  See how we approached tackling these in this short video.

Streaming AI chat example from heyCASEy.io.

Make Streaming AI Chat Features a Snap

Want to add some cool AI chat features to your product?

We did, too – and getting it to work with code examples was easy – but reliably scaling it and doing it securely was a lot harder.

Even at moderate scale, keeping it reliable and available was a major headache.  Provider outages caused our product to fail at the worst times (like product demos).

 

What does it take to get our AI features Production-Ready?

We realized we needed multiple LLM providers so that we could gracefully fail over to another when, not if, one of them had an outage.

Different providers had different rate limits, so we added the ability to retry a request to different providers whenever we hit a rate limit. 
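As a rough illustration of failing over on outages and retrying elsewhere on rate limits, here is a minimal TypeScript sketch. The `Provider` interface, `ProviderResult` type, and `completeWithFailover` function are hypothetical names invented for this example. They are not the LLMasaService.io API, which handles this on the service side.

```typescript
// Hypothetical types for illustration only; not the LLMasaService.io API.
type ProviderResult =
  | { ok: true; text: string }
  | { ok: false; reason: "rate_limited" | "outage" };

interface Provider {
  name: string;
  complete(prompt: string): Promise<ProviderResult>;
}

interface FailoverResult {
  provider: string;  // name of the provider that answered
  text: string;
  skipped: string[]; // providers tried first, e.g. "primary: rate_limited"
}

// Try each provider in priority order and return the first success.
async function completeWithFailover(
  providers: Provider[],
  prompt: string
): Promise<FailoverResult> {
  const skipped: string[] = [];
  for (const provider of providers) {
    let result: ProviderResult;
    try {
      result = await provider.complete(prompt);
    } catch {
      // Treat any thrown error (network failure, server error) as an outage.
      result = { ok: false, reason: "outage" };
    }
    if (result.ok) {
      return { provider: provider.name, text: result.text, skipped };
    }
    skipped.push(`${provider.name}: ${result.reason}`);
  }
  throw new Error(`All providers failed (${skipped.join(", ")})`);
}
```

Keeping the `skipped` list makes it easy to log which providers were passed over and why, which helps when tuning provider order.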

And let’s not forget EU customers. Without data tenancy settings to route AI chat requests to LLM providers in the EU, they wouldn’t be able to use our software. 
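Region-based routing can be pictured as a filter applied before any provider is called. The sketch below is a simplified TypeScript illustration under assumed names (`Customer`, `ProviderEndpoint`, `providersForCustomer`). The real service exposes this as configuration rather than code you write.

```typescript
// Hypothetical shapes for illustration; not the LLMasaService.io data model.
type Region = "eu" | "us";

interface ProviderEndpoint {
  name: string;
  region: Region; // where this provider deployment processes data
}

interface Customer {
  id: string;
  dataRegion?: Region; // set only for customers with a tenancy requirement
}

// Return the endpoints a customer's prompts are allowed to reach.
function providersForCustomer(
  customer: Customer,
  endpoints: ProviderEndpoint[]
): ProviderEndpoint[] {
  if (customer.dataRegion === undefined) {
    return endpoints; // no restriction: any provider may serve the request
  }
  const allowed = endpoints.filter((e) => e.region === customer.dataRegion);
  if (allowed.length === 0) {
    // Fail closed rather than send data outside the required region.
    throw new Error(`No provider available in region "${customer.dataRegion}"`);
  }
  return allowed;
}
```

Failing closed is the key design choice here: when no in-region provider is configured, the request is rejected instead of silently falling back to another region.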

We added response caching, a developer control panel, customer token allowances, secure API key storage, load shedding, and PII data redaction, too. 
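Of these, response caching is the simplest to picture in code. The following TypeScript sketch shows one possible approach, assuming an in-memory cache keyed by model and whitespace-normalized prompt with a time-to-live. `createResponseCache` and its parameters are hypothetical names for this example, and the service's actual caching rules may differ.

```typescript
// Illustrative in-memory cache; names and behavior are assumptions.
interface CacheEntry {
  text: string;
  expiresAt: number; // epoch milliseconds
}

function createResponseCache(
  ttlMs: number,
  now: () => number = () => Date.now() // injectable clock, handy for tests
) {
  const entries = new Map<string, CacheEntry>();

  // Same model + same prompt (ignoring extra whitespace) = same key.
  const keyFor = (model: string, prompt: string): string =>
    `${model}\n${prompt.trim().replace(/\s+/g, " ")}`;

  return {
    get(model: string, prompt: string): string | undefined {
      const key = keyFor(model, prompt);
      const entry = entries.get(key);
      if (entry === undefined) return undefined;
      if (entry.expiresAt <= now()) {
        entries.delete(key); // expired: drop it and report a miss
        return undefined;
      }
      return entry.text;
    },
    set(model: string, prompt: string, text: string): void {
      entries.set(keyFor(model, prompt), { text, expiresAt: now() + ttlMs });
    },
  };
}
```

A caller would check `get` before contacting a provider and call `set` after a successful response, so repeated prompts inside the TTL window cost no tokens.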

And now we’ve packaged up everything we’ve learned for you to use in your applications. 

Make adding streaming AI features easy and focus on adding value to your customers.

Get in touch with us today to pilot LLMasaService.io

Check out the NPM Package Documentation

There are two parts to using LLMasaService.io: our developer control panel, and this library, which connects your code to our service and is deployed as a standard NPM package.

Designed for Developers

Visibility, Control, and Security with our Developer Control Panel.

Bring multiple providers online, use the “chaos monkey” to test outage behavior, monitor requests and token usage, set up customer tiers, and securely store all your API keys in one place.

Rapid build

Easy-to-use code examples

Get your streaming AI chat features online in record time, and build on a reliable service that takes the guesswork out of building AI features that are scalable and secure, allowing you to focus on your unique value to customers. 

Ready to Get Started?

  1. Create an Account
  2. Configure your LLM providers and options
  3. Retrieve your Project Key

2. Add llmasaservice calls to your app

  1. Import the npm package or make direct fetch calls
  2. Implement the example code using your Project Key
  3. Any trouble? Contact us here.