How To "Chat" With Your Documents Offline

AI Breakfast
February 14, 2024

Happy Valentine’s Day. It’s Wednesday, February 14th.

Did you know? On this day in 2005, the YouTube.com domain was registered.

In today’s email:

AI Software and Infrastructure
AI Research and Development
AI Applications and Business
AI Ethics and Policy
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

In partnership with PromptHero

Want to learn how to create an AI influencer?

PromptHero is offering a course on how to do it.

Achieving character consistency is one of the biggest challenges when generating virtual influencers. PromptHero’s course will dive deep into the techniques needed to achieve it: ControlNet, LoRAs, ADetailer, custom models – and more!

These expert techniques will be key on your path to creating all sorts of AI humans, from video game characters to AI influencers.

There's a good chance that AI-generated models and influencers will take over in the next decade. We're already seeing accounts gather millions of followers with no humans involved.

AI Breakfast readers can save 10% on the course here.

Today’s trending AI news stories

AI Software and Infrastructure

> Nvidia's "Chat with RTX" offers a new way to interact with your files. The free 35GB download uses Retrieval Augmented Generation (RAG) with open language models (Llama 2, Mistral) to turn your documents, notes, PDFs, and even YouTube transcripts into a personalized AI chatbot. You can ask questions in natural language and get answers grounded in your own data. For added privacy, "Chat with RTX" processes everything locally on your Windows PC (an RTX 30 series or newer GPU and 16GB of RAM are required).
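For readers new to RAG, here is a minimal sketch of the core idea: split your documents into chunks, find the chunks most similar to the question, and paste them into the prompt the local model sees. This is not Nvidia's implementation. As a simplifying assumption it uses word-count vectors and cosine similarity where a real system would use a neural embedding model and a vector index, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase word tokens (stand-in for a neural embedding model).
    return re.findall(r"[a-z0-9]+", text.lower())

def chunk(document: str, size: int = 40) -> list[str]:
    # Split a document into fixed-size windows of words.
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Score every chunk against the question and keep the k most similar.
    chunks = [c for doc in documents for c in chunk(doc)]
    q_vec = Counter(tokenize(question))
    ranked = sorted(chunks, key=lambda c: cosine(q_vec, Counter(tokenize(c))), reverse=True)
    return ranked[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    # Retrieved chunks become context, so the model answers from your own files.
    context = "\n---\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "The quarterly report shows revenue grew 12 percent.",
    "Our cat Mochi sleeps on the windowsill every afternoon.",
]
print(build_prompt("How much did revenue grow?", docs))
```

The prompt string would then be passed to the local LLM; because the answer is generated from retrieved text, the chatbot can respond about files it was never trained on.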

> OpenAI has added a memory feature to ChatGPT that lets the chatbot remember information from previous conversations and use it in new chats. Users can control what ChatGPT remembers and forgets, either directly in the chat window or through settings. The memory function likely works by building a searchable database of facts extracted from conversations. The feature is currently being tested with a limited number of users, with a wider rollout expected later.
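OpenAI has not published how memory works, so the following is only a toy illustration of the "searchable database of facts" guess above: facts are stored as text, recalled when a new message shares words with them, and deleted on request. The class and method names are hypothetical, and a real system would more likely use embeddings and ignore common words like "a".

```python
import re
from dataclasses import dataclass, field

def _words(text: str) -> set[str]:
    # Lowercase word set used for crude keyword matching.
    return set(re.findall(r"[a-z0-9']+", text.lower()))

@dataclass
class MemoryStore:
    """Hypothetical memory: facts are stored as text and recalled by word overlap."""
    facts: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        # Skip exact duplicates so repeated mentions do not pile up.
        if fact not in self.facts:
            self.facts.append(fact)

    def forget(self, keyword: str) -> None:
        # Drop every fact containing the keyword, like a user asking the bot to forget it.
        self.facts = [f for f in self.facts if keyword.lower() not in _words(f)]

    def recall(self, message: str, limit: int = 3) -> list[str]:
        # Return up to `limit` stored facts sharing at least one word with the new message.
        query = _words(message)
        return [f for f in self.facts if query & _words(f)][:limit]

memory = MemoryStore()
memory.remember("User has a daughter named Ava")
memory.remember("User prefers concise answers")
print(memory.recall("Plan a birthday party for my daughter"))
```

Recalled facts would be added to the model's context for the new chat, which is how information can carry over between otherwise separate conversations.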

> Apple introduces Keyframer, an AI-powered tool that generates animation code from static SVG images and natural language descriptions. Built on large language models (likely GPT-4), Keyframer lets users describe the movement they want (e.g., "Make the clouds drift slowly to the left"), and the tool writes the corresponding CSS animation code. Users keep granular control, refining the animation iteratively through additional prompts or direct code edits. The research explores how LLMs can be applied to animation, and user interviews highlight the importance of feedback and of maintaining creative agency throughout the process.
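To show what "CSS animation code" means here, the hypothetical helper below emits the kind of rule a prompt like "Make the clouds drift slowly to the left" might produce. It is not Keyframer output: the element id `clouds`, the -50px offset, and the 10-second duration are assumptions chosen for illustration.

```python
def drift_animation(element_id: str, distance_px: int, duration_s: float) -> str:
    """Return CSS that slides an SVG element horizontally, back and forth, forever."""
    name = f"{element_id}-drift"
    return (
        f"@keyframes {name} {{\n"
        f"  from {{ transform: translateX(0); }}\n"
        f"  to {{ transform: translateX({distance_px}px); }}\n"
        f"}}\n"
        f"#{element_id} {{\n"
        # 'alternate' reverses direction on every cycle, so the clouds drift out and back.
        f"  animation: {name} {duration_s}s linear infinite alternate;\n"
        f"}}\n"
    )

print(drift_animation("clouds", -50, 10))
```

A negative `distance_px` moves the element left; "slowly" maps to a long duration. Refining the animation by prompt amounts to changing values like these, which users can also edit directly.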

> Nvidia CEO Jensen Huang fires back at OpenAI's Sam Altman, dismissing the need for a multi-trillion dollar AI chip initiative. Huang insists that ongoing gains in processor efficiency will make such an investment obsolete. Nvidia touts its CUDA architecture as the linchpin of AI development, providing researchers unmatched versatility and access to fuel innovation. While acknowledging the GPU's central role in the AI arms race, Huang hints at growing competition. Despite U.S. export controls, Nvidia seeks to maintain dominance in the lucrative Chinese market with modified GPUs, but warns of the rise of domestic Chinese AI chip makers.

> Cohere for AI releases Aya, an open-source multilingual large language model (LLM) that exceeds the language coverage of existing models. Aya follows instructions in 100+ languages thanks to fine-tuning on a new dataset of prompt/completion pairs spanning diverse languages, including underrepresented ones like Azerbaijani and Welsh. The dataset was built with machine translation plus human curation for cultural nuance. Aya outperforms other open-source multilingual LLMs, opening doors for language research, preservation, and inclusion in AI advances.

> Sam Altman revealed that GPT-5 will offer advances in intelligence and speed, and potentially multimodal processing. Altman stresses the model's focus on generalizable intelligence rather than narrow task specialization. No release date has been confirmed, but "Gobi", a rumored multimodal AI model trained on vast amounts of data and slated for spring 2024, has fueled speculation that it could be the anticipated GPT-5.

> Google is expanding generative AI capabilities for US and UK advertisers within its automatically created assets (ACA) feature. ACAs now use AI to instantly generate broader, more creative ad ideas. Advertisers who opt in get AI-generated headlines and ad copy that work alongside their responsive search ads. The AI analyzes the advertiser's landing page, existing ads, and keywords so that generated content matches what searchers are looking for, improving ad relevance. Google pitches ACAs as a way to explore different ad concepts and stresses that they supplement your existing material rather than replace it.

AI Applications and Business

> Microsoft appears to be developing “Automatic Super Resolution”, an AI-powered image upscaling feature for games on Windows 11. Spotted in a Windows beta build by a user on X, it is said to be inspired by Nvidia’s DLSS technology and could significantly boost frame rates and visual detail. Nvidia’s DLSS uses neural networks trained on high-resolution images to reconstruct detail from lower-resolution game renders, which boosts overall performance. Unlike DLSS, which requires Nvidia RTX hardware, Microsoft’s upscaler has no announced hardware requirements.
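The performance gain from upscaling comes from shading fewer pixels per frame. The quick calculation below uses example resolutions (assumptions, not figures from Microsoft or Nvidia) to show how much work is saved when a game renders internally at a lower resolution and an AI model upscales the result.

```python
def pixel_savings(render: tuple[int, int], output: tuple[int, int]) -> float:
    """How many times fewer pixels are shaded when rendering at `render` and upscaling to `output`."""
    return (output[0] * output[1]) / (render[0] * render[1])

# 4K output (3840x2160) from a 1440p internal render (2560x1440).
print(pixel_savings((2560, 1440), (3840, 2160)))
# 4K output from a 1080p internal render (1920x1080).
print(pixel_savings((1920, 1080), (3840, 2160)))
```

Rendering at 1440p shades 2.25x fewer pixels than native 4K, and 1080p shades 4x fewer. Real frame-rate gains are smaller than these ratios, because not every part of a frame's cost scales with resolution and the upscaler itself takes time to run.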

> ElevenLabs now lets you earn money by sharing your voice. Actors can create high-quality AI replicas of their voices and earn rewards each time their voice is used from the Voice Library. Actors retain control, setting rates and managing how their voice is used. For top talent, ElevenLabs offers licensing deals with upfront payments and increased visibility as a featured voice on the platform.

AI Ethics and Policy

> Protesters rallied outside OpenAI’s offices, demanding that the company end its Pentagon contract and halt all work on artificial general intelligence (AGI). The groups Pause AI and No AGI organized the action, which was triggered by OpenAI’s recent removal of language prohibiting military use of its AI from its usage policy. They fear that AGI could surpass human intellect and pose dangers ranging from societal upheaval to psychological harm. Pause AI wants a global pause on AGI development until it is deemed safe, while No AGI opposes its creation entirely.

> The US Patent and Trademark Office (USPTO) maintains that only humans can be inventors, so AI systems cannot be named as inventors on patents. However, the agency's updated guidance makes clear that using AI in the invention process does not rule out a patent. Inventors must disclose any AI usage and demonstrate a significant contribution to the invention's conception. Merely prompting an AI, overseeing its work, or owning the AI system does not make a person an inventor. These clarifications follow earlier decisions by the USPTO and the US Copyright Office rejecting patent and copyright applications that credited an AI system as the sole creator.

> Penn Engineering became the first Ivy League university to offer an undergraduate degree in Artificial Intelligence (AI). This transformative program, supported by Raj and Neera Singh, addresses the urgent need for innovative AI leaders who can harness its power responsibly. Students will delve into machine learning, computing algorithms, data analytics, and advanced robotics, preparing them to create AI solutions that revolutionize diverse fields. The curriculum features world-renowned faculty and leverages the state-of-the-art facilities of Penn's data science hub, Amy Gutmann Hall.

5 new AI-powered tools from around the web

Ava is an AI-powered Business Development Representative (BDR) that automates prospecting, crafts personalized emails, and schedules meetings, streamlining the sales pipeline.

Lazy AI lets you build web apps, AI tools, and automations with simple commands and one-click deployment.

Augie Storyteller uses AI-driven script generation, diverse animation styles, and customizable voice narration so you can transform your ideas into customized animated videos.

FForward.ai is an AI-powered tool that leverages natural language processing (NLP) to analyze customer interviews, extracting key needs, opportunities, and themes to enhance product roadmap prioritization.

InfoBaseAI integrates AI chat with your documents (PDFs, webpages) for Q&A, insights, note-taking, and side-by-side workflow with customizable AI models and plugins.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

📄 OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

This paper introduces OS-Copilot, a framework for building AI agents that interact with many parts of a computer's operating system. Unlike previous efforts focused on single applications, OS-Copilot lets agents control web browsers, code terminals, files, multimedia, and other software. Using OS-Copilot, the researchers built FRIDAY, a self-improving AI agent that automates general computer tasks. FRIDAY outperforms existing AI assistants on the GAIA benchmark and shows strong self-learning capabilities, mastering unfamiliar applications like Excel and PowerPoint through practice.
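One common pattern behind "self-improving" agents is a growing library of reusable tools: the first time the agent meets a new kind of task it acquires a tool, and afterwards it reuses that tool instead of starting over. The toy sketch below shows only that idea and is not FRIDAY's code. `ToolLibrary` and the `learn_tool` callback are hypothetical, with `learn_tool` standing in for an LLM that writes and verifies new tool code.

```python
from collections.abc import Callable

Tool = Callable[[str], str]

class ToolLibrary:
    """Toy agent memory: reuse a stored tool if one exists, otherwise learn one and keep it."""

    def __init__(self, learn_tool: Callable[[str], Tool]):
        self.tools: dict[str, Tool] = {}
        self.learn_tool = learn_tool  # stands in for an LLM generating a new tool

    def run(self, task_type: str, payload: str) -> str:
        if task_type not in self.tools:
            # First encounter with this kind of task: acquire a tool and store it.
            self.tools[task_type] = self.learn_tool(task_type)
        return self.tools[task_type](payload)

def learn(task_type: str) -> Tool:
    # Hypothetical "learning" step: pick a string transformation for the task.
    print(f"learning a tool for {task_type!r}")
    return str.upper if task_type == "shout" else str.lower

agent = ToolLibrary(learn)
print(agent.run("shout", "hi"))   # learns the tool, then uses it
print(agent.run("shout", "ok"))   # reuses the stored tool without learning again
```

The design choice is that skill acquisition is paid once per task type; over time the library grows and more tasks are handled by stored tools.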

📄 UFO: A UI-Focused Agent for Windows OS Interaction

UFO is an AI agent developed by Microsoft that automates complex tasks on Windows computers. It understands commands in plain language and uses GPT-Vision to analyze what's on your screen. UFO works out which Windows applications to use, navigates their menus and controls, and even switches between programs to get the job done. A key feature is its ability to act directly on the Windows interface, which makes it a true 'action' model. The researchers describe UFO as the first AI agent of its kind designed specifically for Windows, offering users a new level of convenience and efficiency.

📄 World Model on Million-Length Video And Language With RingAttention

Researchers at UC Berkeley have developed the Large World Model (LWM), an AI model capable of processing extremely long sequences of video and text. This pushes the boundaries of AI understanding of both human knowledge and the physical world. A key technical innovation is RingAttention, which lets the model scale to million-length sequences without approximating attention. The paper tackles the challenges of training on such long sequences, proposing solutions like masked sequence packing, loss weighting, and model-generated QA datasets. LWM excels at tasks like understanding long videos (e.g., answering questions about hour-long YouTube compilations) and retrieving information from extremely long documents. Importantly, the researchers have fully open-sourced their 7B-parameter LWM models, paving the way for further advances in this field.
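How can attention over a million tokens be exact without one device holding the whole sequence? RingAttention splits the sequence into blocks spread across hosts arranged in a ring: each host keeps one block of queries while the key/value blocks are passed around the ring, and each host accumulates its result with a streaming ("online") softmax. The sketch below simulates that on a single machine with NumPy and checks it against ordinary attention. It is an illustration under simplifying assumptions (no causal mask, no multi-head, no overlap of communication with compute), not LWM's code.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: ordinary softmax attention over the whole sequence at once."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

def ring_attention(q, k, v, n_hosts):
    """Exact attention computed block by block, as if KV blocks circulated around a ring of hosts."""
    q_blocks = np.array_split(q, n_hosts)
    k_blocks = np.array_split(k, n_hosts)
    v_blocks = np.array_split(v, n_hosts)
    scale = 1.0 / np.sqrt(q.shape[-1])
    outputs = []
    for host, qb in enumerate(q_blocks):            # each host keeps its own query block
        row_max = np.full(len(qb), -np.inf)          # running max score per query row
        denom = np.zeros(len(qb))                    # running softmax denominator
        numer = np.zeros((len(qb), v.shape[-1]))     # running weighted sum of values
        for step in range(n_hosts):
            j = (host - step) % n_hosts              # KV block received from the neighbor this step
            scores = qb @ k_blocks[j].T * scale
            new_max = np.maximum(row_max, scores.max(axis=-1))
            rescale = np.exp(row_max - new_max)      # correct earlier sums for the new max
            weights = np.exp(scores - new_max[:, None])
            denom = denom * rescale + weights.sum(axis=-1)
            numer = numer * rescale[:, None] + weights @ v_blocks[j]
            row_max = new_max
        outputs.append(numer / denom[:, None])
    return np.concatenate(outputs)

rng = np.random.default_rng(0)
q = rng.normal(size=(12, 8))
k = rng.normal(size=(12, 8))
v = rng.normal(size=(12, 8))
print(np.allclose(ring_attention(q, k, v, n_hosts=4), full_attention(q, k, v)))
```

Because the running max and sums are rescaled whenever a new block arrives, the final result matches full attention exactly (up to floating point), while each host only ever holds one query block and one key/value block at a time.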

📄 BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data

Researchers at Amazon AGI have developed BASE TTS, which they describe as the largest text-to-speech model to date. Trained on 100K hours of speech and comprising 1 billion parameters, the system sets a new benchmark in speech naturalness. One key innovation is the use of novel "speechcodes", a representation built on a self-supervised speech learning model. This technique disentangles speaker identity, allowing the model to focus on the core phonetic and prosodic elements of speech. BASE TTS also introduces a convolution-based decoder that converts speechcodes into waveforms efficiently enough for real-time synthesis. The researchers investigated how scaling the model and dataset improves the system's ability to understand complex text, resulting in more contextually appropriate prosody, and they created a specialized dataset to evaluate these "emergent abilities" in large-scale TTS models.
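Speechcodes are discrete tokens, which lets a language-model-style transformer predict speech the way it predicts text. The basic step of turning continuous self-supervised features into tokens can be sketched as nearest-neighbor lookup in a codebook (vector quantization). This is a generic illustration, not BASE TTS's tokenizer, which is additionally trained to disentangle speaker identity; the tiny 2-D codebook and feature frames are made-up values, and in practice the codebook would be learned (for example with k-means).

```python
import numpy as np

def to_speechcodes(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each feature frame (T x D) to the index of its nearest codebook vector (K x D)."""
    # Squared Euclidean distance between every frame and every codebook entry: shape (T, K).
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])    # K=3 hypothetical code vectors
frames = np.array([[0.1, -0.1], [4.8, 5.2], [0.9, 1.2]])     # T=3 hypothetical feature frames
print(to_speechcodes(frames, codebook).tolist())
```

The output is a sequence of integer IDs, one per frame; a decoder (convolutional, in BASE TTS) then has to turn such ID sequences back into audio.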

📄 Learning Continuous 3D Words for Text-to-Image Generation

Researchers at the University of Oxford and Adobe Research propose a new way to add fine-grained control to text-to-image generation. Their method, Continuous 3D Words, introduces special tokens into text-to-image models that represent continuous attributes like illumination, object orientation, and camera effects. Users can adjust these attributes with sliders during generation, giving precise control. The technique is trained using images rendered from a single 3D mesh and achieves fine control over a wide range of attributes. The researchers also use strategies to keep attributes properly separated and to generalize to new objects. Their study shows that Continuous 3D Words, used together with text prompts, significantly improve image generation quality and expressiveness.
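A "continuous word" differs from an ordinary token in that its embedding is computed from a number rather than looked up in a vocabulary. The sketch below assumes, for illustration, that a small MLP maps a slider value in [0, 1] to a token embedding, which is then spliced into the prompt's embedding sequence. The class, dimensions, and random weights are hypothetical stand-ins for parameters that would be learned from 3D renders; the paper's exact architecture may differ.

```python
import numpy as np

class Continuous3DWord:
    """Hypothetical sketch: a tiny MLP maps a slider value to a token embedding."""

    def __init__(self, embed_dim: int = 8, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Random weights stand in for parameters that would be learned from rendered images.
        self.w1 = rng.normal(size=(1, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(size=(hidden, embed_dim))

    def embed(self, value: float) -> np.ndarray:
        # Smooth function of the slider value, so nearby values give nearby embeddings.
        h = np.tanh(np.array([[value]]) @ self.w1 + self.b1)  # shape (1, hidden)
        return (h @ self.w2)[0]                               # shape (embed_dim,)

def insert_token(prompt_embeddings: np.ndarray, token: np.ndarray, position: int) -> np.ndarray:
    """Splice the attribute token into a prompt's (L x D) embedding sequence."""
    return np.insert(prompt_embeddings, position, token, axis=0)

lighting = Continuous3DWord(embed_dim=8)
prompt = np.zeros((4, 8))  # placeholder embeddings for a 4-token text prompt
conditioned = insert_token(prompt, lighting.embed(0.3), position=2)
print(conditioned.shape)
```

Moving the slider changes only the value passed to `embed`, so the image model receives a smoothly varying token alongside the unchanged text prompt.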

Where’s the comic?

As of this morning, ChatGPT is offline and returning an “Error in Moderation” message to users.

Unfortunately, this has led to a temporary pause in our ChatGPT Creates Comics feature.

See you on Friday… hopefully!

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.