The race is on to bring the technology behind ChatGPT to the smartphone in your pocket. And judging by the astonishing speed at which technology is advancing, the latest moves in artificial intelligence could transform mobile communications and computing much faster than seemed likely just a few months ago.
As tech companies rush to incorporate generative AI into their software and services, they face significantly higher processing costs. The concern has weighed heavily on Google, with Wall Street analysts warning that the company’s profit margins could be squeezed if internet search users come to expect AI-generated content in standard search results.
Running generative AI on mobile phones, rather than through the cloud on servers run by Big Tech groups, could answer one of the biggest economic questions raised by the latest tech fad.
Google said last week that it was able to run a version of PaLM 2, its latest large language model, on a Samsung Galaxy phone. While it has not publicly demonstrated the scaled-down model, called Gecko, the move is the latest sign that a form of AI that once required computing resources found only in a data center is quickly starting to make its way to many more places.
The shift could make services like chatbots much cheaper for businesses to run and pave the way for more transformative applications using generative AI.
“You have to make AI hybrid… [running in both] the data center and locally, otherwise it will cost too much money,” Cristiano Amon, chief executive of mobile chip company Qualcomm, told the Financial Times. Leveraging unused processing power on mobile phones was the best way to spread the cost, he said.
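As a rough illustration of the hybrid split Amon describes, the sketch below routes a request to a model on the handset when the job is small enough and falls back to a data center otherwise. It is a minimal Python sketch under assumed conditions: the local_generate and cloud_generate functions and the 512-token cutoff are hypothetical stand-ins, not Qualcomm’s or any vendor’s actual API.

```python
# Minimal sketch of hybrid on-device/cloud inference. The two
# generate functions below are hypothetical stand-ins, not a real API.

MAX_LOCAL_TOKENS = 512  # assumed capacity of a small on-device model


def local_generate(prompt: str) -> str:
    """Stand-in for a quantized model running on the phone itself."""
    return f"[on-device reply to a {len(prompt.split())}-word prompt]"


def cloud_generate(prompt: str) -> str:
    """Stand-in for a call to a large model in a data center."""
    return f"[cloud reply to a {len(prompt.split())}-word prompt]"


def generate(prompt: str) -> str:
    # Keep short, simple requests on the handset; send the rest to the
    # cloud. A real router would also weigh battery, connectivity and
    # the quality the task demands.
    if len(prompt.split()) <= MAX_LOCAL_TOKENS:
        return local_generate(prompt)
    return cloud_generate(prompt)


if __name__ == "__main__":
    print(generate("Suggest a short reply to this email."))
```

The economics follow directly from the split: every request answered on the handset is one that never touches a paid-for server.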
When the launch of ChatGPT late last year brought generative AI to widespread attention, the prospect of bringing it to phones seemed distant. In addition to training the so-called large language models behind such services, the work of inference, or running the models to produce results, is also computationally demanding. Phones lack the memory to hold large models like the one behind ChatGPT, as well as the processing power needed to run them.
Generating a response to a query on a device, rather than waiting for a remote data center to produce a result, could also reduce latency or lag when using an application. When a user’s personal data is used to refine generative responses, keeping all processing on a phone could also improve privacy.
More than anything else, generative AI could make it easier to carry out common tasks on a smartphone, particularly those that involve producing text. “You could embed [the AI] in every office application: get an email, suggest a response,” Amon said. “You’re going to need the ability to do these things both locally and in the data center.”
Rapid advances in some of the underlying models have changed the equation. The biggest and most advanced ones, like Google’s PaLM 2 and OpenAI’s GPT-4, have dominated the headlines. But an explosion of smaller models has made some of the same capabilities available in less technically demanding ways. They have benefited in part from new fine-tuning techniques based on more careful curation of the datasets language models are trained on, reducing the amount of information the models must contain.
According to Arvind Krishna, chief executive of IBM, most companies looking to use generative AI in their services will get much of what they need by combining a number of these smaller models. Speaking last week as IBM announced a technology platform to help its customers tap into generative AI, he said many would choose to use open source models, where the code is more transparent and can be adapted, in part because that would make it easier to fine-tune the technology on their own data.
Some of the smaller models have already demonstrated striking capabilities. They include LLaMA, an open source language model released by Meta, which is said to match many of the capabilities of far larger systems.
LLaMA comes in various sizes, the smallest of which has only 7 billion parameters, far fewer than the 175 billion in GPT-3, the revolutionary OpenAI language model released in 2020. (The number of parameters in GPT-4, released this year, has not been disclosed.) A research model based on LLaMA and developed at Stanford University has already been shown running on one of Google’s Pixel 6 phones.
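To give a sense of how accessible models of this size have become, the snippet below loads a quantized 7 billion-parameter LLaMA-family checkpoint through the community llama-cpp-python bindings, which target exactly this kind of modest hardware. It is an illustrative sketch, not the Stanford demonstration itself, and the checkpoint path is a hypothetical placeholder.

```python
# Illustrative sketch: run a quantized 7bn-parameter LLaMA-family
# model locally via the community llama-cpp-python bindings.
# The checkpoint path below is a hypothetical placeholder.
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-7b-q4.bin")

output = llm(
    "Q: Suggest a one-line reply to an email asking to reschedule. A:",
    max_tokens=48,   # keep generation short, as a phone would
    stop=["Q:"],     # stop before the model invents a new question
)
print(output["choices"][0]["text"].strip())
```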
In addition to their much smaller size, the open source nature of models like these has made it easier for researchers and developers to adapt them to different computing environments. Qualcomm earlier this year showed off what it claimed was the first Android phone running the Stable Diffusion image-generation model, which has about 1 billion parameters. The chipmaker had “quantized”, or compressed, the model to make it easier to run on a phone without sacrificing its accuracy, said Ziad Asghar, senior vice president at Qualcomm.
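Quantization itself is a general technique rather than anything proprietary: it stores a model’s weights at lower numerical precision so they take up less memory. The sketch below shows the idea on a toy network using PyTorch’s dynamic quantization; it is a minimal illustration of the concept, not Qualcomm’s actual pipeline, which the company has not detailed.

```python
# Minimal sketch of quantization: store weights as 8-bit integers
# instead of 32-bit floats. A generic PyTorch illustration, not
# Qualcomm's actual pipeline.
import torch
import torch.nn as nn

# A toy two-layer network standing in for a much larger model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512))

# Dynamic quantization keeps the Linear weights as int8 and converts
# back on the fly, cutting weight memory roughly 4x versus float32.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface and output shape as before
```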
With most of the work on adapting models to phones still at an experimental stage, it was too early to gauge whether the efforts would lead to truly useful mobile applications, said Ben Bajarin, an analyst at Creative Strategies. He predicted that the first wave of mobile models, with between 1 billion and 10 billion parameters, would yield relatively rudimentary apps, such as voice-activated photo editing and simple question answering.
Zoubin Ghahramani, vice president of Google DeepMind, the internet company’s artificial intelligence research arm, said its Gecko mobile model could process 16 tokens per second, tokens being the short units of text that large language models work with. Most large models use one to two tokens per word generated, suggesting that Gecko could produce around 10-15 words per second on a phone, making it potentially suitable for suggesting text messages or short email replies.
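The arithmetic behind that estimate is straightforward, and the few lines below work it through. The 1.3 tokens-per-word figure is an assumed rough average for English text under common tokenizers, not a number from Google.

```python
# Back-of-envelope check on Gecko's output rate. The tokens-per-word
# average is an assumed figure for English text, not from Google.
tokens_per_second = 16   # Gecko's reported processing rate
tokens_per_word = 1.3    # assumed rough average under common tokenizers

words_per_second = tokens_per_second / tokens_per_word
print(f"~{words_per_second:.0f} words per second")  # ~12, within 10-15
```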
The particular requirements of mobile phones have meant that attention has quickly shifted to so-called multimodal models, which can work with a range of inputs including images and text, Qualcomm’s Asghar said. Mobile applications are likely to make heavy use of speech and images, he added, rather than the text-heavy interactions more common on a personal computer.
The astonishing speed with which generative AI is starting to move to smartphones, meanwhile, is set to increase the focus on Apple, which has thus far stood apart from the speculative frenzy around the technology.
Well-known flaws in generative AI, such as the tendency of large models to “hallucinate”, or respond with fabricated information, meant Apple was unlikely to integrate the technology into the iPhone operating system for some time, Bajarin said. Instead, he predicted the company would look for ways to make it easier for app developers to start experimenting with the technology in their own services.
“That’s the position you’ll also see from Microsoft and Google: they’re all going to want to give developers the tools to go and compete [with generative AI applications],” Bajarin said.
With Apple’s Worldwide Developers Conference kicking off on June 5, preceded by Microsoft’s Build event for developers, the fight for developer attention is about to intensify. Generative AI may still be in its infancy, but the rush to get it into the hands, and pockets, of many more users is already well under way.