Google’s “Project Ellmann” May Become Your Biographer
Good morning. It’s Monday, December 11th.
Did you know: 15 years ago today, Google Chrome was released?
In today’s email:
Google’s AI Developments
AI Governance, Regulation, and Ethics
AI in Business and Market Dynamics
AI and Human Interactions
5 New AI Tools
Latest AI Research Papers
New Feature: ChatGPT Creates New Yorker Cartoons
You read. We listen. Let us know what you think by replying to this email.
Interested in reaching 46,512 smart readers like you? To become an AI Breakfast sponsor, apply here.
Today’s trending AI news stories
Google’s AI Developments
> Google’s “Project Ellmann” explores the use of its AI model, Gemini, to create a multimodal chatbot biographer. The AI-driven system aims to chronicle a user’s life story by analyzing mobile data, such as photos and search history. It is designed to recognize significant life events, provide deeper context to images, and infer personal milestones. An accompanying chatbot, “Ellmann Chat,” would draw on prior knowledge about the user to enhance the interaction. The project is still in an early experimental phase, and whether it will be integrated into products like Google Photos remains uncertain. Google told The Verge that Ellmann was an early internal experiment.
> Google’s AI-powered NotebookLM, developed using the new Gemini Pro, has transitioned from its experimental phase to a fully-fledged service. Initially introduced as Project Tailwind at I/O 2023, NotebookLM is designed to organize and summarize notes, highlighting key topics and generating questions for deeper understanding. The latest update boasts 15 new user-centric features, including a new noteboard space for pinning quotes and excerpts, direct citation links, and text summarization. It also suggests writing improvements and content formats like emails or scripts.
> Oriol Vinyals, co-lead of the Gemini team at Google DeepMind, addressed criticism of the staged Gemini multimodal demo video. He clarified that the demo’s user prompts and outputs were real but shortened for brevity. The video aimed to inspire developers by showing potential uses of Gemini Pro and Ultra. However, it faced internal criticism for making results with Gemini look unrealistically easy to achieve, prompting memes and jokes among Google employees. Despite the controversy, Google maintains that the user input and output shown are authentic, although the interaction was not real-time or speech-based.
AI Governance, Regulation, and Ethics
> The EU has reached a historic agreement on the world’s first comprehensive laws to regulate AI. The landmark deal, reached after a strenuous 37-hour negotiation, is set to govern AI, social media, and search engines. The new regulations focus on high-risk AI applications, emphasizing human rights and safety. They introduce a tiered system of regulation based on the computational intensity of AI models: the highest-risk category is determined by the number of floating-point operations (FLOPs) required to train a model. The deal has yet to be formalized into law.
> MIT has published a series of policy briefs on AI governance, aimed at guiding U.S. policymakers. The main paper, “A Framework for U.S. AI Governance: Creating a Safe and Thriving AI Sector,” proposes using existing regulatory bodies and legal frameworks for AI oversight. It emphasizes that regulating an AI tool appropriately requires identifying its purpose. The briefs suggest a practical approach: start with existing regulation of high-risk human activities and extend it to AI. Additional papers cover specific AI challenges, including misinformation and surveillance.
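To make the compute-based tiering in the EU item above concrete, here is a minimal sketch of how training compute might be estimated and compared against a threshold. It assumes the common rule of thumb of roughly 6 FLOPs per parameter per training token for dense transformer models, and a threshold of 10^25 FLOPs, the figure widely reported for the deal. Neither number comes from this newsletter, and the function names and model sizes are hypothetical.

```python
# Rough training-compute check. All constants here are assumptions:
# - 6 * parameters * tokens is a rule-of-thumb estimate, not an exact count.
# - 1e25 FLOPs is the widely reported threshold, not stated in the newsletter.

ASSUMED_THRESHOLD_FLOPS = 1e25


def estimate_training_flops(num_parameters: float, num_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer (6 * N * D)."""
    return 6.0 * num_parameters * num_tokens


def exceeds_threshold(flops: float, threshold: float = ASSUMED_THRESHOLD_FLOPS) -> bool:
    """Return True if the estimated compute is above the assumed threshold."""
    return flops > threshold


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only for illustration.
    examples = [
        ("smaller model", 7e9, 2e12),   # 7B parameters, 2T tokens
        ("larger model", 1e12, 1e13),   # 1T parameters, 10T tokens
    ]
    for name, params, tokens in examples:
        flops = estimate_training_flops(params, tokens)
        print(f"{name}: {flops:.2e} FLOPs, above threshold: {exceeds_threshold(flops)}")
```

Under these assumptions, only very large training runs would land in the top tier. The real legal test will depend on the final text of the law.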
AI in Business and Market Dynamics
> The UK’s Competition and Markets Authority (CMA) is scrutinizing the partnership between Microsoft and OpenAI to determine whether it amounts to a ‘relevant merger’. This follows recent management upheavals at OpenAI, including Sam Altman’s brief dismissal and reinstatement. Microsoft’s increased involvement, now including a non-voting observer seat on the board, raises competition concerns. The CMA’s inquiry will examine whether the relationship, including Microsoft’s significant investment and the companies’ collaborative AI development, effectively restricts competition, taking into account their influence over AI foundation models and the broader market.
> Elon Musk’s AI startup xAI recently launched Grok, its ChatGPT competitor, for X Premium+ subscribers. However, Grok faced scrutiny after it allegedly cited OpenAI’s use case policy in response to a user query, prompting speculation that it might use OpenAI’s codebase. Musk fired back at these claims, pointing out that xAI trains on data from the web, where ChatGPT outputs are now prevalent. xAI co-founder Igor Babuschkin clarified that Grok doesn’t use OpenAI code and said an update is planned to address the issue. Musk, an original co-founder of OpenAI, distanced himself from the company in 2018.
AI and Human Interactions
> Ray Kurzweil, a former Google engineer and author of The Singularity is Near, predicts that by 2030 humans could achieve immortality through the use of nanobots. These nanobots, he claims, will repair damaged cells and tissues, effectively reversing aging and making humans immune to lethal diseases. Kurzweil, known for his accurate predictions in the past, foresees these developments in the realms of genetics, nanotechnology, and robotics, potentially transforming human existence.
> Presto Automation, a company offering AI-powered drive-thru ordering services, is reportedly using human labor for over 70% of its orders, contrary to its claims of advanced automation. According to SEC filings, the majority of Presto’s order completions involve off-site human workers, particularly in the Philippines. This practice misleads consumers about the true capabilities of the AI. Cases like this one and the Nate app, where AI claims masked human labor, raise ethical concerns about AI authenticity and labor exploitation in lower-wage regions.
In partnership with DEMOSTACK
Demostack has just released its AI Data Generator for demos.
Now sales teams can customize demos at scale with Generative AI.
3 Demo Use Cases for the AI Data Generator:
Scrub PII: Ensure customer data protection without burdening product or R&D teams.
Make Dummy Data SMART: Generate context-specific data for every demo scenario.
Fill Demos with Data: Add data to empty, stale, or outdated demo environments.
When expanding to international markets, new industries, launching new products, or going upmarket, the demo must reflect the new prospect.
Sell internationally with local phone numbers, names and addresses
Break into new verticals with new product content and terminology
Sell upmarket by intelligently replacing smaller numbers with larger numbers
Customized demos land better and win deals.
5 new AI-powered tools from around the web
Holly AI is hiring on autopilot. Automated candidate vetting with your AI-powered virtual recruiter. Holly helps you source, vet and engage with candidates.
Video Translation by Akool transforms videos with one-click translation, featuring audio-detection, high-quality translations, natural voice dubbing, and synchronized lip movements for global reach, educational accessibility, and enhanced business and content engagement.
CopilotKit, an open-source platform, revolutionizes app development with Copilot Portal for in-app AI chatbots and Copilot Textarea for AI-assisted writing. Designed for developers, it simplifies integrating AI features into React apps.
Vizard, an AI-driven tool, repurposes long videos into engaging social media shorts for platforms like TikTok and YouTube. It automates cutting, reframing, captioning, and publishing, streamlining content creation for a broader audience reach.
Flux streamlines PCB and circuit design via a browser-based app. Its Flux Copilot offers intuitive AI assistance for brainstorming, datasheet queries, part connections, and research, enhancing the agility and efficiency of hardware development.
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
“Everything of Thoughts” (XOT) presents a new approach to thought generation in LLMs that aims to overcome the limitations of existing paradigms like Chain-of-Thought. XOT integrates Monte Carlo Tree Search (MCTS) and reinforcement learning, enhancing LLMs’ problem-solving abilities by incorporating external knowledge into thought processes. The method demonstrates strong performance, efficiency, and flexibility across complex tasks like the Game of 24, 8-Puzzle, and Pocket Cube, significantly outperforming existing methods. XOT’s ability to generate comprehensive cognitive mappings with minimal LLM interactions and its capacity for unconstrained thinking allow it to handle problems with multiple solutions efficiently.
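For readers unfamiliar with the Game of 24 benchmark mentioned above: the task is to combine four numbers with +, -, * and / so that the result is exactly 24. The sketch below is a plain exhaustive search written only to illustrate the task. It is not XOT’s method and uses neither MCTS nor an LLM, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def solve_24(numbers, target=24):
    """Return one expression over `numbers` that equals `target`, or None."""
    # Fractions keep the arithmetic exact, so division never rounds.
    items = [(Fraction(n), str(n)) for n in numbers]
    return _search(items, Fraction(target))


def _search(items, target):
    # Base case: a single value is left; check whether it hits the target.
    if len(items) == 1:
        value, expr = items[0]
        return expr if value == target else None

    # Pick any two values, combine them, and recurse on the smaller list.
    for i, j in combinations(range(len(items)), 2):
        (a, ea), (b, eb) = items[i], items[j]
        rest = [items[k] for k in range(len(items)) if k not in (i, j)]
        candidates = [
            (a + b, f"({ea} + {eb})"),
            (a * b, f"({ea} * {eb})"),
            (a - b, f"({ea} - {eb})"),
            (b - a, f"({eb} - {ea})"),
        ]
        if b != 0:
            candidates.append((a / b, f"({ea} / {eb})"))
        if a != 0:
            candidates.append((b / a, f"({eb} / {ea})"))
        for value, expr in candidates:
            found = _search(rest + [(value, expr)], target)
            if found is not None:
                return found
    return None


if __name__ == "__main__":
    print(solve_24([1, 2, 3, 4]))   # prints one valid expression
    print(solve_24([1, 1, 1, 1]))   # prints None: no solution exists
```

Brute force works here because the puzzle is tiny. The search space grows quickly for harder tasks, which is why approaches like XOT try to guide the search instead of enumerating it.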
The paper introduces a model that combines 2D and 3D diffusion processes to generate high-quality 3D objects. This model addresses the limitations of previous 3D generative models, such as low texture quality and geometric inconsistencies, by integrating a signed distance field (SDF) for 3D learning and multi-view images for 2D learning. It utilizes pretrained 3D and 2D diffusion models, fine-tuned jointly for enhanced 3D generation. The model allows separate control over 2D texture and 3D geometry, and its outputs can be used as initializations for optimization-based methods, reducing processing time significantly. This approach leads to diverse, scalable, and high-quality 3D generation, as validated by training on the ShapeNet and Objaverse datasets.
The paper introduces PATHFINDER, a new tree-search-based method for generating reasoning paths with Large Language Models (LLMs). Designed to tackle complex tasks requiring multi-step reasoning, PATHFINDER employs dynamic decoding with varying sampling methods, integrating constraints for enhanced quality and efficiency. It features pruning to manage computational load and uses similarity-based functions for accurate candidate selection. The approach improves performance on arithmetic and commonsense reasoning tasks, showing a superior ability to generate longer, more diverse reasoning chains than existing models.
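As a rough illustration of what similarity-based candidate selection can look like, the sketch below picks the reasoning chain that agrees most with the other candidates, using Jaccard overlap of word sets as the similarity measure. This is an assumption made for illustration, not PATHFINDER’s actual scoring function, and all names are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def select_by_agreement(candidates: list[str]) -> str:
    """Return the candidate with the highest mean similarity to the others."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    if len(candidates) == 1:
        return candidates[0]

    def mean_similarity(index: int) -> float:
        scores = [
            jaccard(candidates[index], other)
            for j, other in enumerate(candidates)
            if j != index
        ]
        return sum(scores) / len(scores)

    # max() keeps the earliest candidate when scores tie.
    best = max(range(len(candidates)), key=mean_similarity)
    return candidates[best]


if __name__ == "__main__":
    chains = [
        "3 + 4 = 7 so the answer is 7",
        "3 plus 4 = 7 so the answer is 7",
        "the answer is 12",
    ]
    print(select_by_agreement(chains))
```

The idea is that an outlier chain shares little with the rest and scores low, so the selection favors answers that several sampled paths converge on.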
The research introduces ECLIPSE, a new, resource-efficient training strategy for text-to-image (T2I) diffusion models. ECLIPSE uses just 3.3% of the parameters and 2.8% of the data required by advanced models like DALL-E-2, yet surpasses baseline T2I priors in performance. The method leverages pre-trained vision-language models, such as CLIP, for knowledge distillation into a compact prior model, enhancing efficiency without sacrificing quality. Under limited resources, ECLIPSE achieves an average preference score of 71.6% and matches larger models in text composition abilities. This approach offers a promising direction for building high-quality T2I models with significantly reduced computation and data requirements.
DreaMoving is a novel video generation framework that uses diffusion models to create high-quality, customized human dance videos. It features Video ControlNet for motion control and Content Guider for identity preservation, and is trained on Internet-sourced dance videos with captions from MiniGPT-v2. The framework enhances temporal consistency with motion blocks and offers precise appearance control through image and text prompts. It adapts efficiently to various styles, demonstrating strong generalization capabilities in human-centric video generation.
Enjoy our new feature:
ChatGPT + DALLE 3 Creates New Yorker Cartoons
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.