It’s no secret that digital assistants in 2023 are clunky and stale. Siri has lost its edge, while Alexa and Google Assistant were thrown at every wall to see where they might stick. Each responds to a limited set of instructions, making them only as useful as the number of commands one can memorize. While it’s clear that generative AI and digital assistants will converge at some point, I’m too impatient to wait.

So I’m building “Use Your Brain”, an Apple Shortcut that relays queries from Siri to OpenAI’s GPT-4 LLM and returns the results through Siri’s own interface. It will be a simple implementation to start, just enough for a single prompt-and-response interaction with GPT-4. Once I have the basic concept working, I’ll consider how to expand it to be more practically useful.

ChatGPT is just a web interface for GPT-4, so I’ll be using the OpenAI API to interact with the LLM directly. The documentation is well written, and the API itself is straightforward. Authentication is done by passing an API key as a bearer token, and the endpoint I’m using is for chat completion, which at the time of writing is https://api.openai.com/v1/chat/completions. Chat completion accepts a single message or an entire conversation, so I can use the same endpoint later when I’m ready to expand the shortcut’s functionality to allow for conversations.
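To make that concrete, here’s a minimal sketch in Swift of the same authenticated request made outside of Shortcuts; reading the key from an environment variable is just a stand-in for wherever the key actually lives:

```swift
import Foundation

// Sketch of the authenticated request to the chat completion endpoint.
// Reading the key from the environment is a stand-in for real key storage.
let apiKey = ProcessInfo.processInfo.environment["OPENAI_API_KEY"] ?? ""

var request = URLRequest(url: URL(string: "https://api.openai.com/v1/chat/completions")!)
request.httpMethod = "POST"
request.setValue("Bearer \(apiKey)", forHTTPHeaderField: "Authorization")
request.setValue("application/json", forHTTPHeaderField: "Content-Type")
```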

The endpoint only requires two parameters in the request body: model and messages. Model is an identifier for the particular OpenAI model the request should be sent to; in this case, the value will be “gpt-4”. Messages is an array of message objects, each with a role and some content, and the prompt itself is passed as a message with the “user” role. I’ll also be including a max_tokens parameter set to 2048, which tells GPT-4 the maximum number of tokens to generate in the response. One token is approximately four characters of English text, so one hundred tokens works out to roughly seventy-five words. I’m setting this value high to avoid responses being cut off, but I’ll also pass a system message along with the prompt that tells GPT-4 to keep its answers concise:

You are a helpful personal assistant designed to be used through Apple’s Shortcuts. The user may be interacting with you via speech-to-text on their device, so your responses must be concise and able to be read aloud within 20 to 30 seconds.
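Putting that together, the request body the shortcut builds looks roughly like this, sketched here as a Swift dictionary ready to be serialized to JSON (the user prompt is a placeholder of my own):

```swift
import Foundation

// Rough shape of the JSON body sent to the chat completion endpoint.
// The user prompt is a placeholder; the system message is abridged from above.
let body: [String: Any] = [
    "model": "gpt-4",
    "max_tokens": 2048,
    "messages": [
        ["role": "system",
         "content": "You are a helpful personal assistant designed to be used through Apple’s Shortcuts. …"],
        ["role": "user",
         "content": "How long should I steep green tea?"]
    ]
]
// In Shortcuts this serialization happens implicitly; here it’s explicit.
let bodyData = try! JSONSerialization.data(withJSONObject: body)
```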

The API returns a JSON response with a number of properties:

  • id
  • object
  • created
  • model
  • choices
    • index
    • message
    • finish_reason
  • usage
    • prompt_tokens
    • completion_tokens
    • total_tokens

For this use case, I’m only interested in choices, which contains one or more responses from the LLM. The other information in the response body, such as the reason the response finished generating and the token counts for the prompt, the completion, and the conversation as a whole, may be useful later for multi-prompt conversations.

I haven’t yet determined in what cases there might be more than one object within choices, but when a user converses with ChatGPT, the interface presents an option to regenerate a response, which I suspect is related. I’ll assume that only one response is returned by the LLM, so I’m targeting it with choices[0].message.content. Its value is a string containing the actual message I’m looking for.
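In the shortcut itself that’s a couple of dictionary lookups; expressed in Swift with a Decodable model covering only the fields I care about, the extraction would look something like this:

```swift
import Foundation

// Decodable models covering only the response fields this shortcut needs;
// any other properties in the JSON are simply ignored during decoding.
struct ChatResponse: Decodable {
    struct Message: Decodable {
        let role: String
        let content: String
    }
    struct Choice: Decodable {
        let index: Int
        let message: Message
    }
    let choices: [Choice]
}

// `data` is assumed to be the raw JSON body returned by the API.
func extractReply(from data: Data) throws -> String? {
    let response = try JSONDecoder().decode(ChatResponse.self, from: data)
    return response.choices.first?.message.content  // choices[0].message.content
}
```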

Then it’s as simple as displaying that value in an alert, followed by a dialog asking the user whether they’d like to send another prompt. If so, the shortcut restarts; otherwise, it exits.
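Sketched outside Shortcuts, the whole flow is a simple loop; sendPrompt here is a hypothetical wrapper around the request and parsing shown above:

```swift
// Command-line stand-in for the shortcut’s alert-and-ask-again flow.
// sendPrompt(_:) is a hypothetical wrapper around the request and parsing above.
while true {
    print("Prompt: ", terminator: "")
    guard let prompt = readLine(), !prompt.isEmpty else { break }
    print(sendPrompt(prompt))                             // the "alert" step
    print("Send another prompt? (y/n): ", terminator: "")
    guard readLine()?.lowercased() == "y" else { break }  // otherwise exit
}
```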

So far in testing, this shortcut has worked remarkably well, and fairly quickly, across all devices and modalities that support Shortcuts, whether I’m hands-free with AirPods in or working around the house and asking questions through a HomePod.

Multi-prompt conversations are the biggest upgrade I foresee at the moment, but even with single, isolated prompts this shortcut is proving to be very useful.