How to Chat With Your PDFs In Gemini
One of the most useful things you can do with a generative AI model is hand it a long document and prompt it about the contents. In a way, you’re chatting with the PDF. I want to show you how to do just that right within Google Drive.
Why Chat with PDFs?
You can ask a chatbot just about anything about the PDF you supplied, and it’ll answer with specific, usually accurate information pulled from that document.
For example, you could give it a textbook and ask it to summarize a chapter, explain a diagram, solve a problem, draw a table, create a cheat sheet, design a study plan, or make flashcards. Maybe you could even ask it to create a practice quiz. The chatbot can act like a tutor and teach you from the textbook. The possibilities are endless.
It feels like something out of science fiction because these bots are surprisingly good at it. When chatting with a PDF, the bot is less likely to just fabricate info, and you can always ask it to refer you to the page number to verify the details.
Generally, these bots don’t do well with big PDF files. They either cap the file size or lock large uploads behind a paywall. Even if they let you upload a big file, they might lose the context after a few messages. That’s because bots, powered by large language models, rely on something called tokens to retain the “context” of a conversation. A token is a small chunk of text, roughly four characters long on average. An AI bot only has a limited number of tokens to play with.
Broadly speaking, the more tokens a bot supports, the longer it can “remember” the ongoing conversation without losing context. When a bot loses context, it “forgets” the previous conversation, meaning you have to feed it the same information all over again. And the longer your document, the faster you get to that point.
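To make the idea of "losing context" concrete, here is a minimal Python sketch. It is a toy model, not how Gemini or any other bot actually manages its memory: the function names, the 50-token window, and the sample messages are all made up for illustration, and the four-characters-per-token rule is only the rough estimate mentioned above, not a real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # rough rule of thumb, not an exact tokenizer


def estimate_tokens(text):
    """Approximate the token count as one token per four characters."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def trim_to_window(messages, window_tokens):
    """Keep the most recent messages whose estimated total fits the window."""
    kept = []
    used = 0
    # Walk backwards from the newest message to the oldest.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > window_tokens:
            break  # everything older than this point is "forgotten"
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


conversation = [
    "Here is chapter one of the textbook. " * 20,  # stand-in for a long excerpt
    "Summarize the chapter.",
    "Now make five flashcards from it.",
]

remembered = trim_to_window(conversation, window_tokens=50)
print(len(remembered))  # 2: the long excerpt no longer fits
```

With a window this small, only the two short prompts survive and the document excerpt itself drops out, which is why you would have to feed it to the bot again.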
Google’s Gemini won’t lose context easily because its context window for document analysis holds about 1 million tokens. According to Google, that’s more than any other commercial bot offers. If you’re working with lengthy documents, Gemini should do a better job than ChatGPT.
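For a sense of scale, here is a back-of-the-envelope calculation in Python. It takes the 1 million token figure and the rough four-characters-per-token estimate from the text. The 3,000 characters per page is purely my assumption for a dense page of text, so treat the result as a ballpark, not a spec.

```python
CONTEXT_WINDOW_TOKENS = 1_000_000  # figure quoted in the article
CHARS_PER_TOKEN = 4                # rough rule of thumb from the article
CHARS_PER_PAGE = 3_000             # assumption: a dense page of plain text


def pages_that_fit(window_tokens=CONTEXT_WINDOW_TOKENS):
    """Estimate how many text pages fit in a context window of this size."""
    total_chars = window_tokens * CHARS_PER_TOKEN
    return total_chars // CHARS_PER_PAGE


print(pages_that_fit())  # 1333
```

Under these assumptions, the window holds well over a thousand pages of plain text. Images, scans, and the conversation itself also use up tokens, so the real number will be lower.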
Gemini integrated with Google Drive is the best way I’ve found for working with PDFs. You can ask Gemini questions about the PDFs, prompt it to generate content based on the PDF, or combine PDFs with other files in your Google Drive to build a better context. I say PDF, but chats work with any document type, including Google Docs.
AI chatbots are amazingly clever tech, but they also spit out made-up or incorrect information (at times it can even be harmful info). You shouldn’t seek financial, legal, or medical advice from them. Google warns against taking any professional advice from Gemini. I wouldn’t even recommend uploading a sensitive PDF (say, your medical records or banking details) to these services.
Gemini in Google Drive is Powerful
It doesn’t matter what kind of PDF you’re working with. Gemini handles scanned PDFs and long, complicated PDFs really well. I even sent it a PDF of sheet music, and it was able to understand what it was and explain it to me. Even complicated formatting and images didn’t throw it off.
To be clear, I don’t mean it’ll answer every question about a piece of sheet music—you still might get the occasional “I’m still learning and can’t help with that,” but it does surprisingly well.
Gemini for Google Drive is bundled with premium Google accounts. If you want to use it for free on a personal account, you’ll need to activate Google Workspace Labs. Google has locked Workspace Labs behind an invite-only system, so you can only activate it when Google invites you to join the beta testing program. You might have seen an invitation to enable AI in Google Docs or other Workspace apps. If you enable Labs anywhere, you should immediately see Gemini in your Google Drive as well.
The Google Drive mobile app doesn’t have this feature. Instead, you can use the Gemini Android app with the Google Workspace extension enabled. That’s just a workaround though, and it doesn’t work all that well.
For the best experience, log into your Google Drive on the desktop web browser.
In addition to English, Gemini in Google Drive supports these seven languages: Spanish, French, German, Italian, Japanese, Korean, and Portuguese.
How to Chat With Your PDFs In Gemini
You can access a PDF within Gemini in two ways:
Click the Gemini button on Google Drive (the sparkle icon in the top corner). A chat box should open asking for a prompt. Type “@” here, followed by the PDF file name. Gemini will give you autofill suggestions as you type. Once you’ve selected the right file, type your question or prompt and send it.
Alternatively, you can right-click the file and choose “Ask Gemini” from the context menu. Gemini will autofill the filename with the prompt “Tell me about this file” for you and generate a detailed overview of the PDF. Longer PDFs get longer, more detailed summaries. If the file isn’t already in your Google Drive, you’ll have to drag and drop it in from your computer first.
If you need to bring in another file (it doesn’t have to be a PDF) for added context, type “@” again and give Gemini the file name. Each file should have its own chip.
From here, you can follow up with any queries or prompts you have. If Gemini loses context, you can, once again, type “@” followed by the file name to bring the bot back on track. You will also find a sources tab at the bottom of Gemini’s responses. Depending on how many files you’re working with, it will list either a single source or several.
Taking Gemini in Google Drive For a Spin
Allow me to show you what all this looks like with a real-world example. I started a conversation with Gemini by asking about a 400-page biology textbook, a file of around 50 MB. I asked Gemini to teach me a section, and it gave me a breakdown of the whole thing. It reads the text and even “sees” the visuals.
I followed up with more queries and it answered splendidly. I asked it to organize the information into a table for better clarity. It did that beautifully too. I even asked it to explain graphs and diagrams just by specifying the page number and the figure number. It found exactly what I needed and explained it in plenty of detail. I wrapped up the conversation with a request for flashcards and a mock quiz. Both seemed helpful and error-free.
Gemini within Google Drive does a great job of assisting with PDFs. And the best part is you don’t have to worry about uploading multiple files or losing context. It’s all already there on your Drive. I emphasized PDFs, but it works equally well for other document types too. If you’ve worked with Google Docs for some time, you can now search and interact with that entire library using Gemini.