Money matters

Although trivial, we often tend to underestimate that statement or do not pay enough attention when developing proof-of-concept applications, which later makes it unoptimized all the way to production. In pursuit of quick results, we (I) often develop POCs in the fastest possible way, skipping some deeper considerations on load times, general performance and ... cost. A classic example is the use of the popular ChatGPT API. Here is a quote on their pricing:

Language models

Multiple models, each with different capabilities and price points. Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.

GPT-4

Model
Prompt
Completion
8K context
$0.03 / 1K tokens
$0.06 / 1K tokens
32K context
$0.06 / 1K tokens
$0.12 / 1K tokens

Full Pricing: https://openai.com/pricing

Is this too much you'd ask? Well, let's consider a standard relatively simple prompt:

TOKENS: We can consider a prompt (a question we ask ChatGPT) of around 100 tokens. The response from ChatGPT can be around 400 for simple questions and more than 1000 tokens for more complex ones.
So in total, we can expect around 500-1000 tokens with a single call to the API. If we consider the above pricing, that makes around $0.05 in total.

* The above example is based on the more expensive GPT4 model. Prices can be lower when using gpt-3.5-turbo model.
LOAD TIME: Another consideration we might have when using ChatGPT API is load time. An average response from ChatGPT API can vary between a couple of seconds to more than ten. Which is not really great, especially compared to an average load times in Oracle APEX applications or REST API calls to the Oracle Database.

What happens often is we create an APEX page, hook up ChatGPT API (using a custom package and/or PL/SQL page process) and start throwing prompts at it.

How do we optimize?

Having said all of the above we aim to optimize our apps in several ways: Load time, Price and Data Consistency.

Why Data Consistency?

ChatGPT and other Language models of this type frequently produce DIFFERENT replies to the same prompt. And it's something we're not used to, particularly as Oracle Database and APEX developers.

What I came up with is a local table, storing all requests and responses to the ChatGPT API. In case users use the same prompt - the response is returned from the local table. What's GOOD about this approach:

💰 We save money - because of the decreased amount of calls to ChatGPT when we have the prompt locally cached.
🚅 We improve load times - because getting a cached response from the local database is always faster than ChatGPT responses.
🎯 We get consistent data - because we always get the same response for querying with the same prompt. Once cached in the local database, we no longer go to ChatGPT for consecutive ones.
🏦 We can track our budget and usage - By having all ChatGPT API calls logged, we can easily track our usage. We can perform different analyses, budgets, etc. We could even set alarms (using APEX Automations) for different usage levels.

Using the ChatGPT API

I won't go into big details here. Michelle Skamene has already covered that in great detail in her blog post here: https://www.laureston.ca/2023/01/17/building-a-chatgpt-application-with-oracle-apex-in-record-time/

I will just note that I have named my package chatgpt_util with a procedure called ask_chatgpt. This procedure makes the REST API call to ChatGPT using apex_web_service.make_rest_request procedure and is used in my example below.

Creating database objects

To accomplish the local database caching and prompts reuse, I have set up the following database objects on top of a ChatGPT REST API packaged procedure - chatgpt_util.ask_chatgpt. I have built that package in a similar way to the one described above.

Tables

create table chatgpt_requests 
(    id     number generated by default on null as identity minvalue 1 maxvalue 99999999 increment by 1 start with 1 not null enable, 
    request_date date default sysdate not null enable, 
    prompt clob not null enable, 
    response_text clob, 
    prompt_tokens number, 
    completion_tokens number, 
    total_tokens number, 
    username varchar2(1000 char) not null enable, 
    constraint chatgpt_requests_pk primary key (id)
    using index  enable
) ;

Procedure

https://gist.github.com/plamen9/c4495c564abff055e71419500c6671f6

A sample ChatGPT API JSON response, used in the previous procedure

{
    "id": "cmpl-777777uqMvflrK5AtjAGxKASASAS",
    "object": "text_completion",
    "created": 1684244512,
    "model": "text-davinci-003",
    "choices": [
        {
            "text": "ChatGPT sample response",
            "index": 0,
            "logprobs": null,
            "finish_reason": "stop"
        }
    ],
    "usage": {
        "prompt_tokens": 129,
        "completion_tokens": 23,
        "total_tokens": 152
    }
}

Follow me

Like that post? Follow me on Twitter and LinkedIn!

APEX App Lab - Blog

APEX App Lab - Blog

Better ChatGPT use with a local database cache

Save money on your ChatGPT API requests. And improve speed and data consistency.

Table of contents

Money matters

Language models

GPT-4

How do we optimize?

Using the ChatGPT API

Creating database objects

Follow me

Model	Prompt	Completion
8K context	$0.03 / 1K tokens	$0.06 / 1K tokens
32K context	$0.06 / 1K tokens	$0.12 / 1K tokens