
ChatGPT is getting three main new abilities:

  1. Speaking
  2. Hearing
  3. Seeing

These upgrades add voice and image capabilities to ChatGPT. The launch provides a new way to interact with ChatGPT: you can hold a voice conversation with it, or show it an image of exactly what you’re talking about.

Is this another AI advancement to be worried about?

ChatGPT’s Speaking and Hearing

What is it?

The main innovation is the ability to hold a back-and-forth conversation with ChatGPT using your voice. The interface is Siri-like: you speak to the device, and the chatbot replies out loud. The feature relies on a new text-to-speech model as well as Whisper, an open-source speech recognition system.
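
As a rough illustration of how those pieces could fit together, here is a minimal Python sketch of one voice turn: speech recognition, then the chatbot, then text-to-speech. OpenAI has not published how ChatGPT wires these stages together, so everything here is an assumption. The names `voice_turn`, `VoiceTurn`, `transcribe`, `chat`, and `speak` are hypothetical placeholders, not part of any OpenAI API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; none of these come from an OpenAI library.
Transcriber = Callable[[bytes], str]   # speech recognition (the role Whisper plays)
ChatModel = Callable[[str], str]       # the chatbot that writes a text reply
Synthesizer = Callable[[str], bytes]   # text-to-speech


@dataclass
class VoiceTurn:
    heard: str    # what the recognizer understood
    reply: str    # the chatbot's text answer
    audio: bytes  # the spoken version of that answer


def voice_turn(
    audio_in: bytes,
    transcribe: Transcriber,
    chat: ChatModel,
    speak: Synthesizer,
) -> VoiceTurn:
    """Run one conversational turn: audio in, audio out."""
    heard = transcribe(audio_in)   # step 1: speech to text
    reply = chat(heard)            # step 2: text prompt to text reply
    audio_out = speak(reply)       # step 3: text reply to speech
    return VoiceTurn(heard=heard, reply=reply, audio=audio_out)


if __name__ == "__main__":
    # Stand-in stages so the sketch runs without any model or network access.
    turn = voice_turn(
        b"what is the weather like",
        transcribe=lambda audio: audio.decode("utf-8"),
        chat=lambda text: f"You asked: {text}",
        speak=lambda text: text.encode("utf-8"),
    )
    print(turn.reply)
```

Passing the three stages in as functions keeps the sketch independent of any particular model; a real recognizer or speech synthesizer could be swapped in without changing `voice_turn`.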

To use the voice feature, speak your prompt just as you would type a text prompt, then wait for the chatbot to respond with its new voice.

Voice is only available to users with a Plus or Enterprise subscription, and only in the ChatGPT mobile app.

Bringing Vision to ChatGPT

Another worthy addition to ChatGPT is the ability to show the bot images. Image input within the app lets users chat about an image. This seems to be an important step for AI, since it breaks the barrier between text and the physical world.

You can also direct the bot's attention to a specific part of an image by circling it.

As with the voice feature, image input is only available to users with a Plus or Enterprise subscription.

Potential use cases with voice and image input

The most important use cases for the new features are everyday ones. Explaining something out loud is often faster than typing it, so voice input can speed up a conversation.

With image input, you can back up your reasoning or observations by sharing a glimpse of the physical world instead of describing it in words.

There are also possibilities to make money with these new features. In the near future, many prompts designed for voice and image input are likely to appear. These prompts could be sold on AI marketplaces, either on their own or bundled with other AI prompts.

Is it Safe?

Now the big question: how safe is it to give ChatGPT voice and image input?

In the section titled “Making vision both useful and safe”, OpenAI says it has “taken technical measures to significantly limit ChatGPT’s ability to analyze and make direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.”

This statement shows that OpenAI has taken some precautions to respect privacy when image input is used to analyze people, although the page doesn’t mention how the company will use the images sent within the app.

It remains to be seen how the AI giant will use voice and image data for future models or further research. All in all, it is best to take precautions when using these new features and leave out any private or personal information, just as you would when using text input with the chatbot.

OpenAI also says it will expand access to these features in the coming weeks, so keep an eye out for updates in your app settings.


  • Who can use voice and images in ChatGPT?
    • Users with ChatGPT Plus or Enterprise subscriptions get access to the new voice and image features.
  • How do I speak with ChatGPT?
    • Here are the steps to use your voice to speak with ChatGPT:
      1. Download and open the ChatGPT app on your mobile device.
      2. Go to Settings, then New Features.
      3. Next, opt into Voice Conversations.
  • Can ChatGPT output audio?
    • Yes. With the new voice features, ChatGPT can now have back-and-forth voice conversations with users.


