Gemini Video Hoax, Grok Available, and AI Mind Reading?
Good morning. It’s Friday, December 8th.
In today’s email:
AI Model Developments and Launches
AI Integration in Devices and Software
AI Innovations and Breakthroughs
OpenAI Controversy
10 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think by replying to this email.
Interested in reaching 46,310 smart readers like you? To become an AI Breakfast sponsor, apply here.
Today’s trending AI news stories
AI Model Developments and Launches
> Google has unveiled its advanced AI model, Gemini, in three versions: Nano, Pro, and Ultra. Nano, designed for mobile devices, and Pro, which rivals GPT-3.5, are currently accessible, while Ultra, slated for release in 2024, aims to outperform GPT-4. Gemini Pro, integrated into Bard, outperformed GPT-3.5 in six out of eight benchmarks, most notably Massive Multitask Language Understanding (MMLU).
Google’s roadmap includes embedding Gemini across various products, with the Nano model featuring in the Pixel 8 Pro for summarizing voice memos. Independent evaluations of Gemini’s capabilities, particularly in audio and video understanding, are anticipated.
Google’s Bard chatbot, initially trailing behind ChatGPT, is now significantly enhanced with the integration of Google’s new Gemini AI model. Gemini Pro powers Bard in English across 170 countries, with Google CEO Sundar Pichai noting its improvement in factual responses and coding assistance. A future upgrade, Bard Advanced, will utilize Gemini Ultra for more sophisticated multimodal interactions.
However, Google's Gemini demo video, which reached several million views in its first day, has been revealed as misleading. The video, titled "Hands-on with Gemini: Interacting with multimodal AI," appeared to show the model responding in real time across tasks including sketch interpretation, voice queries, and gesture recognition. It has since been disclosed that the demo used still image frames and text prompts, not live interactions. While the responses were generated by Gemini, the video misrepresented how the model is actually used and how quickly it responds, raising concerns about the integrity of Google's AI demonstrations. | Read the Gemini whitepaper
> Meta launches “Imagine with Meta,” a standalone web-based generative AI experience for creating high-resolution images from text prompts, similar to DALL-E and Stable Diffusion. Initially available in the U.S., it produces four images per prompt. Meta plans to implement invisible watermarks for transparency and traceability. AI-generated images have previously raised concerns over racial bias. The watermarking aligns with increasing regulatory emphasis on AI-generated content transparency.
> Apple introduces MLX and MLX Data, open-source machine learning frameworks designed to run efficiently on Apple Silicon. MLX, inspired by PyTorch and JAX, uses a unified memory model, so operations can run on any supported device without copying data between them. MLX Data, compatible with MLX, PyTorch, and JAX, focuses on efficient data loading. These tools mark a strategic entry into the AI development space for Apple, which is traditionally known for its conservative approach and for embedding AI into products without highlighting it in marketing.
> Liquid AI, an MIT spinoff led by robotics expert Daniela Rus, is developing a new type of AI dubbed liquid neural networks. These networks, smaller and less resource-intensive than traditional AI models, draw inspiration from the simple neural structures of roundworms. They excel in processing sequential data and adapting to new circumstances, making them suitable for tasks such as autonomous navigation and analyzing variable phenomena. Having raised $37.5 million in seed funding, Liquid AI intends to commercialize these networks by offering a platform for customers to create their own models and providing on-premises AI infrastructure.
> Stability AI introduces StableLM Zephyr 3B, a new 3 billion parameter Large Language Model (LLM), 60% smaller than 7B models, enabling efficient use on edge devices. Tailored for instruction following and Q&A tasks, it's an extension of the StableLM 3B-4e1t model, inspired by Zephyr 7B. Developed using supervised fine-tuning and Direct Preference Optimization (DPO), it excels in text generation, performing competitively with larger models in tests like MT Bench and AlpacaEval. This versatile, lightweight model supports diverse linguistic tasks and is available under a non-commercial license.
> Google DeepMind's AlphaCode 2, based on the Gemini Pro decoder, surpasses 85% of human coders in competitions. It's an enhancement over AlphaCode, featuring advanced sampling and selection techniques for optimal code generation. Tested on Codeforces, it solved 43% of problems, showing potential for improvement with Gemini Ultra. AlphaCode 2 exemplifies the future of AI-assisted human programming, aiding in code design and implementation.
AI Integration in Devices and Software
> Microsoft plans to launch Hudson Valley, an AI-focused Windows update in 2024, emphasizing annual major releases. This update, integrating advanced AI throughout the OS, features an AI-powered Windows Shell with an advanced Copilot for enhanced search and workflow, natural language search, Super Resolution for media upscaling, and real-time multi-language Live Captions. It will also include a dedicated “creator” area and significant energy-saving improvements.
> Musk's xAI released Grok, an AI chatbot, to Premium+ subscribers on the social media platform X early this morning. Grok is powered by the Grok-1 model and inspired by "The Hitchhiker's Guide to the Galaxy"; it offers witty, perceptive responses with a "fun" mode, multitasking, shareable chats, and feedback features. Having acquired X (then Twitter) for $44 billion in 2022, Musk plans to expand in Japan, focusing on app development and market-specific advertising.
AI Innovations and Breakthroughs
> Mind reading: Japanese scientists achieved over 75% accuracy in reconstructing images from brain activity using AI. Participants viewed 1,200 diverse images in an fMRI machine, with AI generating detailed “score charts” based on factors like color and shape. Their brain activity was linked to these charts using a neural signal translator, which adapted to new brain inputs. When subjects later visualized these images, their brain activity was re-measured and translated into new charts. These were processed by a generative AI in a 500-step revision to recreate the original images, surpassing previous accuracy rates of 50.4%.
> Leonardo.Ai, a generative AI art platform, secured $31M in funding to enhance its consumer and enterprise offerings. With over seven million users and 700 million images generated, it offers unique control over AI-generated art, blending text and sketch prompts. Initially focused on gaming, it now caters to various creative industries. The funding will expand sales, marketing, and engineering, further developing its enterprise version with collaboration tools and private cloud hosting.
> Meta introduces Purple Llama, an open project for responsible AI development, offering trust and safety tools, focusing on cybersecurity and input/output safeguards. With over 100 million Llama model downloads, Purple Llama aims to build developer trust, incorporating red and blue team approaches for risk mitigation. Initial tools assess LLM cybersecurity risks and filter AI-generated code for security and content appropriateness. Licensed for both research and commercial use, Purple Llama encourages collaborative, standardized AI development, with partners like AI Alliance, AMD, and Google Cloud.
OpenAI Controversy
In a revealing interview, Helen Toner, a former board member at OpenAI, shed light on the contentious dismissal of Sam Altman, stating trust issues were the primary cause, not AI safety. The decision was part of a strategic effort to fortify OpenAI's core mission. Toner addressed rumors about the organization's stability following Altman's exit, relating to pressure from an OpenAI lawyer. The situation intensified after Altman challenged Toner's paper criticizing AI hype. The ensuing internal power struggle, marked by Altman's alleged manipulation of board members against Toner, culminated in the board's resignation and Altman's subsequent reinstatement.
10 new AI-powered tools from around the web
Demostack’s AI Data Generator turns “dummy data” smart in sales demos. Create a diverse demo library effortlessly, tailored for every segment, persona, and industry using AI.
TableFlow is an open-source CSV importer for apps, offering customizable, AI-powered data validation and scalable importing. It saves engineering time, ensures data accuracy, and is available for self-hosting or as a cloud service.
Kommunicate GenAI supercharges customer support with generative AI-powered chatbots, easily trained using documents, FAQs, and knowledge bases. It can be deployed on websites, mobile apps, and messaging platforms like WhatsApp, Messenger, and Telegram.
Stey.ai leverages AI to analyze user behavior data, pinpointing reasons for user drop-offs. It offers session replays, natural language search for specific actions, AI-generated summaries, and user experience analysis reports, enhancing product experience.
Streak AI is a CRM that integrates AI into Gmail, automating data entry, offering natural language queries for deal insights, AI-generated summaries, and customizable pipelines without coding, enhancing team productivity and CRM management.
Superpowered AI offers an API for seamless knowledge retrieval, enhancing LLM applications. It features improved RAG technology for reliable, context-rich information access, ideal for customer support and productivity apps.
Strut AI combines AI-powered tools for writers, enabling project management, note-taking, and drafting. It offers a sleek text editor, collaborative features, and customizable voice/tone options, all in a free, writer-focused package.
Openlayer offers advanced AI monitoring, evaluation, and versioning tools for LLMs and ML products. Features include intelligent testing, proactive alerts, easy integration, and robust reliability analysis. SOC 2 Type 2 compliant, with free options available.
JetBrains AI supercharges IDEs and .NET tools with AI-driven capabilities, including code completion, refactoring suggestions, and documentation generation, along with an integrated AI chat interface.
Resume.co is an intuitive online resume builder, featuring a wide range of customizable templates and an AI-enhanced writing tool. It simplifies creating professional resumes and cover letters, offers ATS-friendly formats, and provides valuable job search advice.
Latest AI Research Papers
arXiv is a free online library where researchers share pre-publication papers.
Bridge-TTS, a breakthrough text-to-speech system, utilizes Schrödinger bridges, replacing noisy Gaussian priors in diffusion models with clean, deterministic text latents. This approach, offering a data-to-data process, significantly improves synthesis quality and efficiency. It provides robust structural information about the generation target, enhancing interpretability and tractability. Empirical results on the LJ-Speech dataset demonstrate Bridge-TTS's superiority over diffusion counterparts in both 50-step and 1000-step synthesis, establishing a new benchmark in text-to-speech synthesis.
The study introduces a block caching technique for diffusion models enhancing efficiency by 1.5x-1.8x without sacrificing image quality. It targets redundant computations in denoising networks, identifying consistent minimal changes over time for reuse in subsequent steps. Applied to models like LDM and EMU, this method outperforms traditional speed-oriented approaches, yielding more detailed, vibrant images. Validated through experiments and evaluations, it proves effective in generating superior quality images within similar computational limits.
The paper introduces “Illustrated Instructions”, combining LLMs with text-to-image diffusion models to create Stacked Diffusion. This model generates instructional content with visuals, outperforming baselines and sometimes even human articles. It supports goal suggestion, error correction, and personalized instructions, offering potential beyond static web articles. The approach leverages spatial tiling for image consistency, text embedding concatenation, and step-positional encoding, achieving high validity, consistency, and efficacy in instructions.
HyperDreamer is an advanced 3D content generation framework that creates hyper-realistic, editable 3D models from single images. Key features include 360° viewable mesh modeling with high-resolution textures, realistic material property rendering with semantic segmentation, and interactive, text-guided editing. It significantly outperforms existing methods in producing high-quality, editable 3D content, making it a valuable tool for various applications.
"Large Language Models for Mathematicians" explores the transformative role of models like ChatGPT in mathematical contexts. Focusing on the underlying transformer architecture, it assesses how these models handle mathematical problems, often outperforming older algorithms but sometimes erring in complex tasks. Despite their limitations in generating precise proofs, LLMs show promise as auxiliary tools, offering search engine capabilities and idea generation for mathematicians. The study underscores LLMs' evolving efficacy, especially in computation and collaborative tasks, while cautioning against over-reliance due to inherent limitations in understanding and creating sophisticated mathematical content.
Thank you for reading today’s edition.
Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.