Is Hosting Your Own LLM Cheaper than OpenAI?
OpenAI API Pricing:
OpenAI charges per token. 1,000 tokens is approximately 750 words.
Cost by model:
1. GPT-4
Input cost: $0.03 / 1K tokens. Output cost: $0.06 / 1K tokens.
2. GPT-3.5 Turbo
Input cost: $0.0010 / 1K tokens. Output cost: $0.0020 / 1K tokens.
Consider the monthly cost of an average AI app that uses these APIs, for example an email copywriting agent app.
For short-content app solutions:
Suppose this app writes marketing copy for users, taking around 100 to 150 words as input and producing about 600 words of output.
That means one email uses around 1,000 tokens per user.
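The 1,000-token figure follows from the rule of thumb that 1,000 tokens is about 750 words. A minimal Python sketch of that conversion, where the function name `words_to_tokens` is an illustrative assumption and the ratio is only an approximation (real token counts depend on the tokenizer and the text):

```python
def words_to_tokens(words: int) -> int:
    """Estimate tokens from words using the 1,000 tokens ~ 750 words rule of thumb."""
    return round(words * 1000 / 750)


input_tokens = words_to_tokens(150)   # upper end of the 100 to 150 word input
output_tokens = words_to_tokens(600)  # 600-word generated email
total_tokens = input_tokens + output_tokens
print(input_tokens, output_tokens, total_tokens)
```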
1. GPT-4 Model
Let's break down the cost calculation based on the given information for GPT-4:
Input Cost:
Input tokens = (150 words) * (1000 tokens / 750 words) = 200 tokens
Input Cost = (200 tokens) * ($0.03 / 1K tokens) = $0.006
Output Cost:
Output tokens = (600 words) * (1000 tokens / 750 words) = 800 tokens
Output Cost = (800 tokens) * ($0.06 / 1K tokens) = $0.048
Total Cost:
Total Cost per user = Input Cost + Output Cost = $0.006 + $0.048 = $0.054
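The arithmetic above can be reproduced in a few lines of Python. This is a sketch under the article's assumptions (200 input tokens, 800 output tokens, and the GPT-4 prices listed earlier); the function name `request_cost` is illustrative and not part of any OpenAI library.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_1k: float, output_price_per_1k: float) -> float:
    """Return the dollar cost of one request under per-1K-token pricing."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# GPT-4 prices quoted in this article: $0.03 input, $0.06 output per 1K tokens.
gpt4_cost = request_cost(200, 800, 0.03, 0.06)
print(f"GPT-4 cost per request: ${gpt4_cost:.3f}")
```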
If we receive 1,000 user requests per day to write copywriting emails, then the average monthly cost would be approximately:
Monthly Total Cost = (Cost per user) x (1000 requests per day) x (30 days)
Monthly Total Cost = $0.054 x 1000 x 30 = $1,620
Therefore, with the GPT-4 model, if 1,000 users use your service every day for 30 days, it would cost $1,620 in total.
2. GPT-3.5 Turbo
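The same calculation with the GPT-3.5 Turbo prices:
Input Cost = (200 tokens) * ($0.0010 / 1K tokens) = $0.0002
Output Cost = (800 tokens) * ($0.0020 / 1K tokens) = $0.0016
Total Cost per user = $0.0002 + $0.0016 = $0.0018
Monthly Total Cost = ($0.0018 per user) x (1000 requests per day) x (30 days)
Monthly Total Cost = $54
Therefore, with the GPT-3.5 Turbo model, if 1,000 users use your service every day for 30 days, it would cost $54 in total.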
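To compare both models side by side, the monthly figures can be computed from a small price table. A sketch, assuming the article's workload (200 input and 800 output tokens per request, 1,000 requests per day, 30 days); the dictionary keys and the `monthly_cost` helper are illustrative names, not OpenAI API identifiers.

```python
# Prices per 1K tokens as quoted in this article: (input, output).
PRICES_PER_1K = {
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo": (0.0010, 0.0020),
}


def monthly_cost(model: str, requests_per_day: int, days: int = 30,
                 input_tokens: int = 200, output_tokens: int = 800) -> float:
    """Estimate the monthly API bill in dollars for a fixed-size request."""
    input_price, output_price = PRICES_PER_1K[model]
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1000
    return per_request * requests_per_day * days


for model in PRICES_PER_1K:
    print(f"{model}: ${monthly_cost(model, 1000):,.2f} per month")
```

Because the bill is linear in request volume, calling `monthly_cost("gpt-4", 2000)` simply doubles the GPT-4 figure.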
Host Your Own LLM Pricing:
Llama-2 7B on AWS
The server type largely determines the cost of hosting your own large language model (LLM) on AWS, and different models need different servers. The Llama-2 7B (7 billion parameter) model needs at least an EC2 g5.2xlarge instance, priced at around $850 per month.
Connecting the model to an API (using AWS API Gateway and AWS Lambda) adds further cost, but at 1,000 requests per day this expense stays below $100 per month.
In summary, the estimated monthly cost of AWS hosting, including server and API usage, is approximately $1,000.
One little catch here:
Because OpenAI's pricing is token-based, a rise to 2,000 daily requests would double the monthly GPT-4 bill from $1,620 to $3,240.
The AWS setup, however, should handle this increased load without additional scaling, keeping the monthly cost stable at around $1,000.
For an application serving 2,000 requests per day, choosing the AWS setup is the sensible business decision.
Upgrading your custom model:
Now suppose users complain about the subpar quality of the copywriting emails generated by Llama-2 7B, and the model proves unsuitable for the intended use case. Subsequent experimentation shows that Llama-2 13B significantly improves the output quality.
However, adopting Llama-2 13B requires a more robust server, which raises costs substantially to approximately $5,000 per month. That is about $1,760 more than the $3,240 the GPT-4 API would cost at 2,000 requests per day.
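The trade-off in these last two sections comes down to a break-even point: the daily request volume at which a fixed hosting bill equals the per-request API bill. A rough sketch, assuming the article's figures ($0.054 per GPT-4 request, about $1,000 per month for the Llama-2 7B setup, about $5,000 per month for Llama-2 13B) and assuming the hosting cost stays flat as traffic grows; `break_even_requests_per_day` is an illustrative name.

```python
def break_even_requests_per_day(fixed_monthly_cost: float,
                                api_cost_per_request: float,
                                days: int = 30) -> float:
    """Daily requests at which a pay-per-request API costs as much as fixed hosting."""
    return fixed_monthly_cost / (api_cost_per_request * days)


GPT4_COST_PER_REQUEST = 0.054  # from the GPT-4 calculation in this article

for name, hosting_cost in [("Llama-2 7B", 1000), ("Llama-2 13B", 5000)]:
    threshold = break_even_requests_per_day(hosting_cost, GPT4_COST_PER_REQUEST)
    print(f"{name}: self-hosting is cheaper above ~{threshold:.0f} requests/day")
```

Under these assumptions, 2,000 requests per day sits above the break-even point for the 7B setup but below the one for the 13B setup, which is why the recommendation flips when the larger model is needed.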
Conclusion:
Here are the key takeaways from today:
- Experiment with various models to identify the ones that yield optimal results.
- Determine the expected input and output text volumes for each model.
- If the text volume is consistent and low, and security is not a primary concern, opting for OpenAI may be the preferable choice.
- Otherwise, consider running a cost analysis for AWS to make an informed decision based on your specific requirements.