⚠️ AI Models Running Out of Text Data by 2026
Happy Tuesday! Last week was action-packed, and the news keeps heating up.
Welcome, tech friends
Let's dive in! 👇
This week, TikTok and ByteDance are all over our Quick Hits. For Trending Tools, we’re showcasing the week’s most exciting products. This week’s Top Story looks at the looming AI data crisis. For this week’s Deep Dive, we review how ChatPLG uses Agentic RAG techniques to copilot go-to-market strategies. In this week’s Social Pulse, we pull from the Lenny vault with Ramp’s Head of Product, Geoff Charles.
This week:
⚡ Quick Hits: News from Amazon, ByteDance, and MIT
🏆 Top Story: AI Models Running Out of Text Data by 2026
🔍 Deep Dive: Using Agentic RAG to power ChatPLG
🌐 Social Pulse: Geoff Charles of Ramp and Solana’s Teleport
🔥 TRENDING TOOLS
🔒 Subscribe to gain access to this feature.
🏆 TOP STORY
⚠️ AI Models Running Out of Text Data by 2026
Made in Midjourney: “a futuristic scene of an empty bookshelf --v 1”
TL;DR: AI models might use all the internet’s available text data by 2026. This scarcity could force tech companies to turn to private information, synthetic data, or lower-quality sources, posing significant ethical and technical challenges.
Key points:
Data Depletion: AI models could consume the internet's freely available text data by as early as 2026.
Future Data Sources: To continue improving, AI might need to use private data, synthetic data, or low-quality sources.
Quality Concerns: Algorithms trained on insufficient or low-quality data produce unreliable outputs.
Legal and Ethical Issues: Harvesting private or intellectual property data without permission could lead to legal battles.
Other Bottlenecks: Future AI advancements could be hindered by power consumption, training costs, and hardware availability.
Why does it matter?
The prospect of AI models exhausting the internet's text data marks a turning point in LLM development. As high-quality data runs out, relying on private or synthetic data raises serious ethical concerns and legal challenges.
Moreover, this shift could compromise the quality and reliability of AI outputs, affecting various applications from everyday technology to critical systems. The tech industry's ability to adapt to these changes while maintaining ethical standards will be crucial.
🔍 DEEP DIVE
An in-depth breakdown of something interesting
Using Agentic RAG to power ChatPLG
Today, we’re diving into something super cool in the world of AI called Agentic RAG.
Most tech professionals have probably heard of RAG, but Agentic RAG takes knowledge-gathering a step further by using multiple agents.
Introducing RAG
Imagine you have a giant library in your brain. That’s an LLM. It is trained on millions of pages of data. But sometimes, LLMs get a little stuck because they can only use what they already know.
This is where RAG steps in. RAG stands for Retrieval-Augmented Generation, a technique that combines the strengths of retrieval-based and generative AI.
RAG acts like a supercharged librarian, pulling together the most relevant info from vast resources. It finds the correct information and helps the LLMs give better answers.
RAG-based startups have been the talk of the town. They’ve been snatching venture capital for several years, mainly for their timing and monetizable use cases, like customer service and sales.
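To make the librarian analogy concrete, here is a minimal sketch of the two RAG steps: retrieve the most relevant text, then place it in front of the question. It assumes a tiny in-memory document list and uses word-count cosine similarity as a stand-in for real embeddings and a vector database. The names `DOCUMENTS`, `retrieve`, and `build_prompt` are made up for illustration, and the final LLM call is left out.

```python
import math
import re
from collections import Counter

# Toy "library" (assumed data). A real system would store chunks in a vector database.
DOCUMENTS = [
    "Product-led growth relies on the product itself to drive acquisition.",
    "Retrieval Augmented Generation fetches documents before the model answers.",
    "A go-to-market strategy defines how a product reaches its customers.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """The 'librarian' step: return the k documents most similar to the question."""
    q = tokenize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[str]) -> str:
    """The 'augmentation' step: put the retrieved context in front of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # The resulting prompt would be sent to an LLM; that call is omitted here.
    print(build_prompt("What is a go-to-market strategy?", DOCUMENTS))
```

The point of the split is that the model never has to "remember" the answer. It only has to read the context it was handed.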
ChatPLG RAG setup
Instructions: Role, Objectives, Tasks, Desired behavior, Expertise, Prompt security
Vector Database: General knowledge
RAG tools (building agentic inside)
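One way to picture the setup above is as a small configuration object. The sketch below is purely illustrative: it assumes the list groups into instruction sections, a vector database of knowledge, and RAG tools. The class name `AssistantSetup`, the collection name `general_knowledge`, the tool name `url_scraper`, and all placeholder text are invented for this example and are not ChatPLG's actual prompts or code.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantSetup:
    """Hypothetical container mirroring the three parts of the setup listed above."""

    # Instruction sections, keyed by the names used in the list above.
    instructions: dict[str, str] = field(default_factory=dict)
    # Names of knowledge collections held in the vector database (assumed naming).
    knowledge_collections: list[str] = field(default_factory=list)
    # Names of the retrieval tools the assistant may call (assumed naming).
    rag_tools: list[str] = field(default_factory=list)

    def system_prompt(self) -> str:
        """Render the instruction sections as one system prompt, in insertion order."""
        return "\n\n".join(
            f"## {name}\n{text}" for name, text in self.instructions.items()
        )


# Placeholder values for illustration only; these are not ChatPLG's real prompts.
setup = AssistantSetup(
    instructions={
        "Role": "<who the assistant is>",
        "Objectives": "<what it should achieve>",
        "Tasks": "<what it does, step by step>",
        "Desired behavior": "<tone and output format>",
        "Expertise": "<domains it should know>",
        "Prompt security": "<what it must refuse to reveal>",
    },
    knowledge_collections=["general_knowledge"],
    rag_tools=["url_scraper"],
)

if __name__ == "__main__":
    print(setup.system_prompt())
```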
Imagine having a team of genius assistants at your fingertips, always ready to provide accurate and insightful answers. Agentic RAG is like that, but each genius is called an agent and has a particular skill. When you ask a tricky or multi-threaded question, these agents work together to find the best information, compare facts, and even ask more questions to ensure they get everything right.
ChatPLG's current workflow uses RAG with primitive agentic behavior, is powered by GPT-4, and follows the setup outlined above.
How Do These Smart Agents Work?
Breaking Down Questions: If you ask a big question like, "What are the latest discoveries in space?" agents will split it into smaller questions like "What new planets have been found?" and "How do these discoveries affect our knowledge of space?"
Finding Information: Each agent searches for the best information, just like a detective looking for clues. They use different tools and databases to get the most accurate answers.
Putting It All Together: After gathering all the info, agents work together to give you a complete, well-thought-out answer.
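The three steps above can be sketched in a few lines of Python. This is a simplified illustration, not how ChatPLG or any particular framework does it. The decomposition is hard-coded where a real agent would likely ask an LLM, and the two tools (`search_catalog`, `search_papers`) are hypothetical stubs that only echo the sub-question.

```python
from typing import Callable


# Hypothetical tools: each maps a sub-question to a snippet of "evidence".
def search_catalog(sub_question: str) -> str:
    return f"[catalog] notes on: {sub_question}"


def search_papers(sub_question: str) -> str:
    return f"[papers] notes on: {sub_question}"


def decompose(question: str) -> list[str]:
    """Step 1: break a big question into smaller ones.

    A real agent would ask an LLM to do this; here the split is hard-coded.
    """
    if "discoveries in space" in question:
        return [
            "What new planets have been found?",
            "How do these discoveries affect our knowledge of space?",
        ]
    return [question]


def gather(
    sub_questions: list[str], tools: list[Callable[[str], str]]
) -> dict[str, str]:
    """Step 2: each sub-question is handled with one tool, assigned round-robin."""
    findings = {}
    for i, sub_question in enumerate(sub_questions):
        tool = tools[i % len(tools)]
        findings[sub_question] = tool(sub_question)
    return findings


def synthesize(question: str, findings: dict[str, str]) -> str:
    """Step 3: merge the findings into one answer (an LLM would write this up)."""
    lines = [f"Question: {question}"]
    lines += [f"- {sq} -> {evidence}" for sq, evidence in findings.items()]
    return "\n".join(lines)


def answer(question: str) -> str:
    sub_questions = decompose(question)
    findings = gather(sub_questions, [search_catalog, search_papers])
    return synthesize(question, findings)


if __name__ == "__main__":
    print(answer("What are the latest discoveries in space?"))
```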
Seeing ChatPLG RAG in action
For this demo, I’ve considered creating an SF Dinner Series for Jobseeking Product Managers. The demo below shows how easy it is to generate an actionable strategy with assistants like ChatPLG.
There are several “notice this RAG” moments in the video below:
The feature list
The URL scraper
It’s generative in its examples
Its step-by-step fashion
Here is a 60-second video of me using ChatPLG to bootstrap GTM for this Product Manager Dinner Series concept:
Why is Agentic RAG So Cool?
🧠 Smart Planning: Agents carefully plan their steps, breaking down big questions into smaller, manageable tasks to ensure they gather the most accurate and relevant information.
🛠️ Resourceful Tool Use: These agents can access and utilize various tools and resources, from private on-premise servers to internet search engines, to find the best and most up-to-date answers quickly and reliably.
🔄 Continuous Learning: The more agents work, the smarter they become. They learn from each question and answer, constantly improving their ability to handle complex queries and provide better results.
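Here is a rough sketch of the tool-use and learning ideas above. It assumes a deliberately naive routing rule and treats "learning" as a simple memory of earlier answers, which is how many agent setups approximate it in practice, rather than retraining the model. The `Agent` class and both tool functions are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical tools standing in for an on-premise server and a web search engine.
def internal_docs(query: str) -> str:
    return f"internal: {query}"


def web_search(query: str) -> str:
    return f"web: {query}"


@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]
    # Memory of past questions and results (the "learning" in this sketch).
    memory: dict[str, str] = field(default_factory=dict)
    # Number of tool calls made so far.
    calls: int = 0

    def choose_tool(self, query: str) -> str:
        """Naive routing: questions containing the word 'our' go to internal docs."""
        words = re.findall(r"[a-z]+", query.lower())
        return "internal_docs" if "our" in words else "web_search"

    def ask(self, query: str) -> str:
        key = query.strip().lower()
        if key in self.memory:
            # Seen before: reuse the stored result instead of calling a tool again.
            return self.memory[key]
        result = self.tools[self.choose_tool(query)](query)
        self.calls += 1
        self.memory[key] = result
        return result


if __name__ == "__main__":
    agent = Agent(tools={"internal_docs": internal_docs, "web_search": web_search})
    print(agent.ask("What is our churn rate?"))
    print(agent.ask("What are the latest pricing trends?"))
    print(agent.ask("What is our churn rate?"))  # answered from memory
    print(agent.calls)
```

A production agent would usually let the LLM pick the tool and would store memories as embeddings, but the control flow stays roughly the same.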
Real-Life Superpowers
Agentic RAG can help in many incredible ways, such as:
📚 Helping Students: Imagine having a super smart tutor who can answer any question, help with homework, and provide detailed explanations for complex topics, making learning easier and more fun.
🔬 Assisting Scientists: Scientists can use Agentic RAG to quickly gather and understand new research, compare findings across multiple studies, and even generate summaries of the latest discoveries, speeding up their work and leading to new breakthroughs.
🤝 Enhancing Customer Service: Companies can use Agentic RAG to handle customer inquiries with precision, providing quick and accurate responses to tricky questions, solving problems efficiently, and improving overall customer satisfaction.
⚕️ Supporting Doctors: In the medical field, doctors can use Agentic RAG to access the latest medical research, diagnose conditions more accurately, and find the best treatment options, enhancing patient care.
✍️ Empowering Writers: Authors and content creators can leverage Agentic RAG to generate ideas, conduct in-depth research, and even draft content, making the writing process faster and more innovative.
⚖️ Boosting Legal Research: Lawyers can utilize Agentic RAG to sift through vast amounts of legal documents, precedents, and case studies, helping them build stronger cases and stay updated on legal developments.
🏫 Improving Education Systems: Schools and educators can use Agentic RAG to create personalized learning plans, develop curriculum materials, and provide additional support to students, enhancing the educational experience.
📈 Enhancing Business Decisions: Businesses can rely on Agentic RAG to analyze market trends, gather competitive intelligence, and provide data-driven insights, leading to better strategic decisions and growth opportunities.
💻 Facilitating E-Learning: Online learning platforms can integrate Agentic RAG to offer personalized study materials, real-time feedback, and interactive learning experiences, making education more accessible and engaging.
🔍 Optimizing Research Projects: Researchers across various fields can use Agentic RAG to compile and analyze data, generate hypotheses, and synthesize information from diverse sources, driving innovation and discovery.
The Future of RAG
As technology continues to advance, RAG and Agentic RAG techniques will become more mainstream.
The future of AI is bright and full of possibilities. Whether you’re a student, a professional, or simply curious, understanding the basic principles of RAG can dramatically enhance your workflows and your understanding of how to apply generative AI.
🌐 SOCIAL PULSE
Highlights from social media and key topics of the week
You're telling me,
there's an 'Uber' competitor on Solana called 'Teleport'?
I just came across this. I own 0, not sure if there's a token.
- Drivers earn "much more".
- Users pay "a lot less".
- Uber takes 44%, teleport takes 15%
- Onchain rewards to motivate early adopters
-… x.com/i/web/status/1…— MattyVerse (@DCLBlogger)
1:40 AM • Jun 20, 2024
From the author:
PS. Now that you’ve signed up for the newsletter, you’ll receive updates in your inbox. You can also log in to the website to read the full archives and other posts as they’re published.
🛎️ General Housekeeping Notice: Check your spam folder if you can’t find the newsletter. And please mark this address as ‘not spam.’ If the newsletter isn’t in your spam folder, look in the Promotions tab.
You can always see everything on the website.
Thanks again, and please tell a few friends about this community!