Is Hosting Your Own LLM Cheaper than OpenAI?

Sawera Khadium

3 min read · Feb 5, 2024

𝐂𝐎𝐌𝐌𝐎𝐍𝐋𝐘 𝐀𝐒𝐊𝐄𝐃 𝐐𝐔𝐄𝐒𝐓𝐈𝐎𝐍 𝐁𝐘 𝐒𝐓𝐀𝐑𝐓𝐔𝐏𝐒

𝐎𝐩𝐞𝐧𝐀𝐈 𝐀𝐏𝐈 𝐏𝐫𝐢𝐜𝐢𝐧𝐠:

Charges are calculated per token. 1,000 tokens is approximately 750 words.

Model-wise cost:

1. 𝐆𝐏𝐓-𝟒

𝐈𝐧𝐩𝐮𝐭 𝐂𝐨𝐬𝐭->$0.03 / 1K tokens 𝐎𝐮𝐭𝐩𝐮𝐭 𝐂𝐨𝐬𝐭-> $0.06 / 1K tokens

2. 𝐆𝐏𝐓-𝟑.𝟓 𝐓𝐮𝐫𝐛𝐨

 𝐈𝐧𝐩𝐮𝐭 𝐂𝐨𝐬𝐭-> $0.0010 / 1K tokens 𝐎𝐮𝐭𝐩𝐮𝐭 𝐂𝐨𝐬𝐭->$0.0020 / 1K tokens

Let's estimate the monthly cost of an average AI app that uses these APIs, for example an Email Copywriting Agent app.

𝐅𝐨𝐫 𝐒𝐡𝐨𝐫𝐭 𝐜𝐨𝐧𝐭𝐞𝐧𝐭 𝐚𝐩𝐩 𝐒𝐨𝐥𝐮𝐭𝐢𝐨𝐧𝐬:

Suppose this app writes marketing copy for users, taking around 100–150 words as input and producing about 600 words of content as output.

That means one email uses around 1,000 tokens per user: about 200 input tokens plus 800 output tokens.

1. GPT-4 Model

Let's break down the cost calculation based on the given information for GPT-4:

Input Cost:

Input tokens = (150 words) * (1000 tokens / 750 words) = 200 tokens
Input Cost = (200 tokens) * ($0.03 / 1K tokens) = $0.006

Output Cost:

Output tokens = (600 words) * (1000 tokens / 750 words) = 800 tokens
Output Cost = (800 tokens) * ($0.06 / 1K tokens) = $0.048

Total Cost:

Total Cost per user = Input Cost + Output Cost = $0.006 + $0.048 = $0.054
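The same arithmetic can be written as a small Python sketch, which makes it easy to plug in your own word counts and prices. The function names `words_to_tokens` and `cost_per_request` are made up for this illustration, and the 750-words-per-1,000-tokens ratio is only the rule of thumb used above, not an exact tokenizer count.

```python
WORDS_PER_1K_TOKENS = 750  # rule of thumb from this article, not an exact tokenizer count


def words_to_tokens(words):
    """Estimate a token count from a word count."""
    return words * 1000 / WORDS_PER_1K_TOKENS


def cost_per_request(input_words, output_words, input_price_per_1k, output_price_per_1k):
    """Estimate the API cost in dollars for a single request."""
    input_cost = words_to_tokens(input_words) / 1000 * input_price_per_1k
    output_cost = words_to_tokens(output_words) / 1000 * output_price_per_1k
    return input_cost + output_cost


# Prices per 1K tokens as listed in this article (they may have changed since).
gpt4 = cost_per_request(150, 600, 0.03, 0.06)
gpt35 = cost_per_request(150, 600, 0.0010, 0.0020)

print(f"GPT-4: ${gpt4:.4f} per email")
print(f"GPT-3.5 Turbo: ${gpt35:.4f} per email")
```

For real traffic, an actual tokenizer would give better numbers than the word-count estimate, since token counts vary with language and formatting.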

If we receive 1,000 user requests per day to write copywriting emails, then the average monthly cost would be approximately:

Month Total Cost = (Cost per user) x (1000 requests per day) x (30 days)
𝐌𝐨𝐧𝐭𝐡 𝐓𝐨𝐭𝐚𝐥 𝐂𝐨𝐬𝐭 = $1,620

Therefore, with the GPT-4 model, if 1,000 users use your service every day for 30 days, it would cost $1,620 in total.

2. GPT-3.5 Turbo

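With the same 200 input tokens and 800 output tokens:

Cost per user = (200 tokens) * ($0.0010 / 1K tokens) + (800 tokens) * ($0.0020 / 1K tokens) = $0.0002 + $0.0016 = $0.0018
Monthly Cost = ($0.0018 per user) x (1000 requests per day) x (30 days)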
𝐌𝐨𝐧𝐭𝐡 𝐓𝐨𝐭𝐚𝐥 𝐂𝐨𝐬𝐭 = $54

Therefore, with the GPT-3.5 Turbo model, if 1,000 users use your service every day for 30 days, it would cost $54 in total.
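To try other traffic levels, the monthly formula can be sketched in a few lines of Python. The helper name `monthly_api_cost` is hypothetical, and the per-request costs are the ones worked out above; a 30-day month and one request per user per day are assumed, as in the article.

```python
def monthly_api_cost(cost_per_request, requests_per_day, days=30):
    """Monthly API bill in dollars, assuming a constant number of requests per day."""
    return cost_per_request * requests_per_day * days


# Per-request costs from the calculations above.
per_request = {"GPT-4": 0.054, "GPT-3.5 Turbo": 0.0018}

for model, cost in per_request.items():
    print(f"{model}: ${monthly_api_cost(cost, 1000):,.0f} per month")
```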

Host Your Own LLM Pricing:

Llama-2 7B on AWS

The choice of server type significantly influences the cost of hosting your own Large Language Model (LLM) on AWS, because different models have different server requirements. The Llama-2 7B (7 billion parameter) model needs at least an EC2 g5.2xlarge instance, priced at around $850 per month.

Connecting the model to an API for usage (with AWS API Gateway and AWS Lambda) incurs an additional cost. However, at 1,000 requests per day, this expense remains below $100 per month.

In summary, the estimated monthly cost for AWS hosting, including server and API usage, is approximately $1,000.

One little catch here:

Given OpenAI’s token-based pricing, a rise in your daily requests to 2,000 would double the monthly GPT-4 cost from $1,620 to $3,240.

The AWS setup, by contrast, can handle this increased load without additional scaling, so its monthly cost stays at about $1,000.

For an application handling 2,000 requests per day, choosing the AWS setup over the GPT-4 API is therefore the prudent decision.
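One way to see where the two options cross over is to compute a break-even request volume. The sketch below is only an illustration: `break_even_requests_per_day` is a made-up helper, and it assumes the hosting bill stays flat at about $1,000 regardless of load, which is the simplification used in this article.

```python
def break_even_requests_per_day(fixed_monthly_cost, api_cost_per_request, days=30):
    """Daily request volume at which the API bill equals the fixed hosting bill."""
    return fixed_monthly_cost / (api_cost_per_request * days)


AWS_MONTHLY = 1000  # estimated Llama-2 7B hosting cost from this article

gpt4_break_even = break_even_requests_per_day(AWS_MONTHLY, 0.054)
gpt35_break_even = break_even_requests_per_day(AWS_MONTHLY, 0.0018)

print(f"GPT-4 break-even: {gpt4_break_even:,.0f} requests/day")
print(f"GPT-3.5 Turbo break-even: {gpt35_break_even:,.0f} requests/day")
```

Below the break-even volume the API is cheaper, and above it the fixed-price server wins. With these numbers, the crossover against GPT-4 arrives at a few hundred requests per day, while GPT-3.5 Turbo stays cheaper than the server until well past ten thousand.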

Upgrading Your Custom Model:

Suppose users complain about the subpar quality of the copywriting emails generated by Llama-2 7B, and the model turns out to be unsuitable for the intended use case. Subsequent experimentation shows that Llama-2 13B significantly improves the output quality.

However, adopting Llama-2 13B requires a more robust server, which raises costs to approximately $5,000 per month. That is about $1,760 more than the $3,240 that GPT-4 would cost through the OpenAI API at 2,000 requests per day.
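Putting all four options side by side for a given traffic level can help here. The following is a rough sketch using only the estimates from this article; the dictionary names and the `monthly_costs` helper are hypothetical, and the self-hosted figures are treated as flat monthly fees that do not grow with load.

```python
# Per-request API costs and flat hosting costs, all taken from this article's estimates.
API_COST_PER_REQUEST = {"GPT-4 API": 0.054, "GPT-3.5 Turbo API": 0.0018}
SELF_HOSTED_MONTHLY = {"Llama-2 7B on AWS": 1000, "Llama-2 13B on AWS": 5000}


def monthly_costs(requests_per_day, days=30):
    """Return a dict of option name -> estimated monthly cost in dollars."""
    costs = {
        name: price * requests_per_day * days
        for name, price in API_COST_PER_REQUEST.items()
    }
    costs.update(SELF_HOSTED_MONTHLY)  # flat fees, independent of request volume
    return costs


# Print the options from cheapest to most expensive at 2,000 requests per day.
for name, cost in sorted(monthly_costs(2000).items(), key=lambda item: item[1]):
    print(f"{name:<20} ${cost:>8,.0f}")
```

Cost is only one axis, of course: the table says nothing about output quality, which is what prompted the move to the 13B model in the first place.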

Conclusion:

Here are the key takeaways from today:

Experiment with various models to identify the ones that yield optimal results.
Determine the expected input and output text volumes for each model.
If the text volume is consistent and low, and security is not a primary concern, opting for OpenAI may be the preferable choice.
Otherwise, consider running a cost analysis for AWS to make an informed decision based on your specific requirements.