Ever since ChatGPT made waves, tech companies have been racing to release their artificial intelligence competitors, and Google has stepped up its AI game with Gemini.
Launched in December 2023 by Google DeepMind, this next-generation model has been integrated across a range of Google products, including Google Search, Workspace and even Pixel phones, making these tools smarter, more responsive and more intuitive.
So what is Gemini? It's a large language model developed to understand and generate text that reads like something a human might write. It was designed to bring advanced AI to both everyday user interactions and complex enterprise solutions, and you can interact with Google's LLM via the Gemini chatbot on the web or through the mobile app.
Gemini comes in four models: Ultra, Pro, Flash and Nano, each designed for different use cases. One standout feature is its expanded token context window, which allows for longer, more coherent responses. Gemini 1.5 Flash now offers a 1 million-token window, while the 1.5 Pro model pushes that to a whopping 2 million. In comparison, ChatGPT caps out at 32,000 tokens in its expanded version.
If you haven't yet developed a soft spot for everything AI-related and all of this sounds confusing, keep on reading.
As technology develops, new artificial intelligence terminology arises. Before we dive deeper, let's quickly break down some previously mentioned key terms.
Generative AI (or gen AI) refers to AI systems that can create content -- think text, images or even music -- based on the data they've been trained on. LLMs like Gemini are a type of generative AI. They learn from massive datasets of text and code and then use that knowledge to understand and generate human-like text.
You've likely interacted with an LLM before, whether through an online customer service chatbot or even ChatGPT.
These chatbots use LLMs to engage in real-time conversations, provide you with information and solve problems -- though sometimes they miss the mark and give some weird answers. Those misfires are called AI hallucinations, and Google has had its fair share of mishaps with them. But more on that later.
Tokens are the building blocks of text that AI models use to process language. When AI reads and generates text, it breaks everything into small chunks called tokens. These can be whole words, parts of words or even punctuation. For example, in the sentence "Hello, world!" the AI might treat "Hello" and "," as separate tokens.
So when we talk about token limits (e.g., Gemini's aforementioned million-token context window), we're talking about how much the AI can "remember" from the conversation to keep things coherent and relevant.
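To make that concrete, here's a rough Python sketch. It uses a simple word-and-punctuation split as a stand-in for Gemini's real tokenizer (production models use subword tokenizers, so actual counts differ), and it shows how a fixed context window forces a chatbot to drop older messages once the token budget runs out.

```python
import re

def rough_tokenize(text: str) -> list[str]:
    # Crude stand-in for a real tokenizer: split text into words and punctuation.
    # Gemini's actual tokenizer works on subword pieces, so real counts will differ.
    return re.findall(r"\w+|[^\w\s]", text)

print(rough_tokenize("Hello, world!"))  # ['Hello', ',', 'world', '!']

def trim_to_context_window(messages: list[str], max_tokens: int) -> list[str]:
    """Keep only the most recent messages that fit inside the token budget."""
    kept, used = [], 0
    for message in reversed(messages):          # walk from newest to oldest
        cost = len(rough_tokenize(message))
        if used + cost > max_tokens:
            break                               # older messages fall out of "memory"
        kept.append(message)
        used += cost
    return list(reversed(kept))                 # restore chronological order

history = [
    "How do context windows work?",
    "They cap how much text the model can consider at once.",
    "So what happens to old messages?",
]
print(trim_to_context_window(history, max_tokens=20))  # the oldest message gets dropped
```

The bigger the window -- 32,000 tokens versus 1 million or 2 million -- the more of the conversation survives that trimming step, which is why the expanded context windows matter for long, back-and-forth chats.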
One thing to keep in mind: as with any new technology, Gemini is still under development, and artificial intelligence continues to improve practically daily.
So how does Gemini play into the devices you use every day? For starters, it's built right into Google's Pixel phones, supercharging many of the phones' AI features.
You've probably used your Pixel to transcribe a voice message or generate a quick email response. That's Gemini doing its magic in the background, helping your Pixel get things done faster and more intuitively.
Gemini also plays a big role in AI Overviews on Google Search. If you've noticed more detailed, contextually rich answers popping up at the top of your searches, that's because of this integration. Gemini helps break down complicated topics into bite-size explanations in the search results.
Google came under fire for some of the advice AI Overviews served up at launch, including suggestions to eat rocks daily and put glue on pizza. Google reacted promptly and has since fine-tuned the tool.
All users in the US aged 13 and older who manage their own Google accounts can access AI Overviews. In other countries, such as the UK, India, Mexico, Brazil, Indonesia and Japan, users aged 18 and above can also access this feature. Google plans to continue expanding it globally, with the goal of reaching over a billion users by the end of 2024.
Now, some people don't like this feature, and the downside is that you can't disable AI Overviews. However, we've covered a few workarounds that might help with that.
When Gemini first launched, it didn't take long for things to go sideways. Google faced criticism over hallucinations and over how Gemini depicted historical figures and people of different races. It made headlines for generating images of Black and Asian Nazi soldiers, which, as you can imagine, didn't go over well. Critics accused Google of trying so hard to show diversity that it made things worse. Google hit the brakes on Gemini's image generation of people, promising to clean up the mess.
On Aug. 28, after refining the technology, Google announced the latest version of its text-to-image tool, Imagen 3, which will soon be available to Gemini Advanced, Business and Enterprise subscribers. However, the ability to generate images of people is still on hold as Google plays it safe this time around.
Earlier, on Aug. 13, Google launched Gemini Live for Advanced subscribers on Android devices, with plans to expand to iOS soon. Gemini Live offers hands-free, real-time conversations with 10 new voice options, even when the app is in the background or your phone is locked. You can also pause and resume conversations whenever you want, which is a neat feature.
Gemini is free as a personal AI assistant, offering access to the 1.5 Flash model with a 32,000-token context window -- perfect for long, back-and-forth conversations. But for more advanced features, Gemini has a few subscription plans:
For developers and businesses, Google has set up a tiered pricing structure for its Gemini API models, like Flash and Pro. Developers can access them through Google Cloud's API services and integrate AI capabilities directly into applications.
Both models offer scalable AI usage, with pricing and rate limits that vary by tier and token length. You can check detailed pricing rates on Google's official website. There's also a free tier, giving you a taste with limited usage -- great for testing the waters before diving in.
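For a sense of what that looks like in practice, here's a minimal sketch that calls the Gemini API from Python using the google-generativeai package Google offers for this. The model name is one of the tiers mentioned above, and the GEMINI_API_KEY environment variable is just a placeholder for however you store your key.

```python
import os

import google.generativeai as genai  # pip install google-generativeai

# Authenticate with an API key created in Google AI Studio / Google Cloud.
# Reading it from an environment variable (the name here is our choice) keeps it out of the code.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# Pick a model tier; 1.5 Flash is the lower-cost, high-volume option.
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    "Explain what a token context window is in two sentences."
)
print(response.text)

# Billing is per token, so counting a prompt's tokens up front helps estimate cost.
print(model.count_tokens("Explain what a token context window is in two sentences."))
```

Since pricing scales with the tokens you process, that last call is a handy way to see how close a prompt is to the limits of whichever tier you're on.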