BAM Key Details:

  • GPT-4, the newest iteration of OpenAI’s large language model (the tech behind ChatGPT), is arriving this month. It’s also multimodal, presenting yet another challenge to one of its most publicized rivals, Google’s Bard.
  • Multimodal AI can work with multiple kinds of input, going beyond text to include speech, images, and video, while GPT-3 and GPT-3.5 are text-only.

The newest iteration of ChatGPT is just around the corner, expected to arrive this month. 

And, according to the latest news from Microsoft Germany, it will be multimodal, meaning Google has even further to go to keep up—especially after the very public mistakes made by its own AI search engine (Bard). 

After all, when users type a question into an actual search engine, relied upon by millions of people across the globe, they have a reasonable expectation that the results will be accurate. 

Still, don’t discount Google yet. After all, it’s synonymous with “online search,” and that likely won’t change anytime soon. 

It’s also worth noting that multimodal functionality doesn’t guarantee accuracy. We’ve already mentioned some of ChatGPT’s limitations, especially for those who expect it to do all their work for them (which we do not recommend). 

That said, the news about GPT-4 is exciting. After all, GPT-3 has already changed how real estate professionals work—and we can’t wait to dive into the next version.

Microsoft Germany confirms the (approximate) arrival date

Andreas Braun, CTO of Microsoft Germany, confirmed in a recent news release that GPT-4 is coming within a week of March 9, 2023—and that it will, in fact, be multimodal. 

Modality refers to the type of input an AI can process. Multimodal AI can operate within multiple kinds of input, going beyond the usual text to include images, video, and sound/speech.

According to the MS Germany news report, GPT-4 may be able to operate in at least those four modalities.

We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different possibilities—for example, videos…

Andreas Braun

CTO of Microsoft Germany

For reference, GPT-3 and GPT-3.5 only operated in one modality—text. 

While Holger Kenn, Microsoft’s Director of Business Strategy, explained what multimodal generally means and what those modalities could include, the report didn’t say whether GPT-4 specifically will be able to function within all four of the previously mentioned modalities.

But it seems likely that those four were explicitly mentioned for a reason. 

Kenn explained what multimodal AI is about and what it can potentially do, like translating text into images or even into music and/or video. 

Microsoft is also working on “confidence metrics” to ground its AI in facts and strengthen the accuracy of its responses.

Microsoft Kosmos-1

Although it was underreported in the U.S., Microsoft released another multimodal AI language model, called Kosmos-1, at the beginning of March 2023.

According to the German news site 

The team subjected the pre-trained model to various tests, with good results in classifying images, answering questions about image content, automated labeling of images, optical text recognition and speech generation tasks…. Visual reasoning, i.e., drawing conclusions about images without using language as an intermediate step, seems to be a key here…

Holger Kenn

Microsoft Director of Business Strategy

Kosmos-1 integrates the modalities of text and images. GPT-4 is expected to go even further, adding video and, by all appearances, sound.

GPT-4 works across multiple languages

Reporting confirms that GPT-4 can work across all languages. One example has it receiving a question in German (or English, etc.) and giving an answer in Italian. And while that sounds like an improbable request, it becomes more practical when the user has a question to ask and the source material is in a language they don’t speak. 

The point of this breakthrough is to transcend language barriers by pulling knowledge from sources across all languages. That would make GPT-4 similar to Google’s multimodal AI called MUM (Multitask Unified Model), which can provide answers in English when the data for it exists only in another language. 

GPT-4 applications

There’s no explicit announcement as to where GPT-4 will show up, but the report did mention Azure OpenAI.

Meanwhile, Google’s continuing struggle to integrate competing AI technology into its own search engine strengthens the perception that it’s falling behind. While Google already integrates AI in products like Google Lens and Google Maps, among others, GPT-4’s approach is to use AI as assistive technology to help users with small tasks.

Those little things have a way of piling up and taking time and energy away from tasks that can’t be delegated.

No wonder ChatGPT has already captivated real estate professionals and content creators across the globe. We’re already dreaming up ways to incorporate the different modalities, and can’t wait to test them out. And, as always, we’ll keep you updated.