Google's Gemini Preview Coming

AI Breakfast
December 06, 2023

Good morning. It’s Wednesday, December 6th.

In Partnership with CodeRabbit

In today’s email:

AI Advancements and New Technologies
AI in Business and Corporate Strategy
10 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Interested in reaching 46,006 smart readers like you? To become an AI Breakfast sponsor, apply here.

Today’s trending AI news stories

AI Advancements and New Technologies

> Google is set to virtually preview Gemini, its ChatGPT competitor, possibly this week. Gemini, a generative AI chatbot, promises enhanced capabilities over Google’s current Bard. This comes after Google delayed Gemini’s full launch to 2024 amid struggles with non-English language queries. The preview could relieve pressure on Google, which has lagged in generative AI against rivals like Microsoft and OpenAI. Gemini, developed by DeepMind, is a multi-modal AI capable of handling text, voice, and image queries and employs reinforcement learning for problem-solving.

> Alibaba's Institute for Intelligent Computing unveils 'Animate Anyone,' a groundbreaking deepfake technology transforming static images into realistic full-motion videos. This advancement significantly refines animation quality, addressing challenges like hallucinations and implausibility in previous models. However, it raises ethical concerns as it's trained on scraped videos of famous TikTokers like Charli D’Amelio and Addison Rae, without their consent. The training dataset was initially developed for academic purposes at the University of Minnesota, and its use in a commercial application spotlights pressing legal and moral dilemmas in AI and content creation, fueling ongoing debates in the digital realm. | Watch a demo

> Microsoft’s Seeing AI app, designed to assist blind and low-vision individuals, is now available on Android. Previously exclusive to iOS, the app’s Android version supports 18 languages, with plans to expand to 36. Seeing AI offers enhanced features like detailed alt-text descriptions for images and a chatbot for document information. Launched in 2017, Seeing AI helps users identify objects, currency, and documents, similar to Be My Eyes but using AI instead of human assistance.

> Microsoft's Medprompt improves GPT-4's performance in medical applications by employing a unique prompting method, integrating dynamic few-shot selection, self-generated chain-of-thought reasoning, and choice shuffle ensembling. This approach has led to over 90% accuracy on the MedQA dataset and outstanding performance in the MultiMedQA suite. The technique, adaptable to different areas, demonstrates an average 7.3% enhancement across various fields, indicating its potential for wider applications beyond just medical scenarios.
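
For the curious, here is a rough idea of what "choice shuffle ensembling" means in practice: ask the same multiple-choice question several times with the answer options in a different order each time, then take a majority vote so the model's preference for a particular position washes out. The sketch below is our own illustration, not Microsoft's code. The function name, the `ask_model` callback, and the sample question are all hypothetical stand-ins for a real LLM call, and the sketch covers only the shuffling step, not the few-shot selection or chain-of-thought parts.

```python
import random
from collections import Counter
from typing import Callable, Sequence


def choice_shuffle_ensemble(
    question: str,
    options: Sequence[str],
    ask_model: Callable[[str, Sequence[str]], int],
    n_rounds: int = 5,
    seed: int = 0,
) -> str:
    """Return the option text that wins a majority vote over shuffled orderings.

    ask_model is a hypothetical stand-in for an LLM call: it receives the
    question plus the options in their shuffled order and returns the index
    of the option it picks.
    """
    rng = random.Random(seed)  # seeded so the shuffles are reproducible
    votes = Counter()
    for _ in range(n_rounds):
        shuffled = list(options)
        rng.shuffle(shuffled)
        picked = ask_model(question, shuffled)
        # Count the vote by option text, not by position, so votes from
        # different orderings can be compared.
        votes[shuffled[picked]] += 1
    return votes.most_common(1)[0][0]


def toy_model(question: str, shuffled_options: Sequence[str]) -> int:
    """Toy model (assumption): always finds 'Insulin' wherever it lands."""
    return list(shuffled_options).index("Insulin")


if __name__ == "__main__":
    opts = ["Glucagon", "Insulin", "Cortisol", "Thyroxine"]
    answer = choice_shuffle_ensemble(
        "Which hormone lowers blood glucose?", opts, toy_model
    )
    print(answer)  # prints Insulin
```

With a real model, `ask_model` would format the shuffled options into a prompt and parse the chosen letter from the reply; more rounds cost more API calls in exchange for a steadier vote.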

> Mozilla’s innovation team has launched 'llamafile', an open-source tool that simplifies Large Language Model (LLM) distribution by converting them into a single binary file. This file, compatible with six major operating systems, ensures consistent LLM performance across platforms. However, Windows users face a 4 GB size limit on executables.

> IBM’s Quantum Summit revealed 'IBM Quantum Heron,' a quantum processor with five-fold error reduction, and the modular 'IBM Quantum System Two,' advancing towards quantum-centric supercomputing. IBM's roadmap extends to 2033, focusing on improving gate operations and scaling error-corrected systems. The launch of Qiskit 1.0 software and the integration of generative AI models in watsonx emphasize IBM's push for automating quantum code development and enhancing circuit optimization.

AI in Business and Corporate Strategy

> Meta Platforms Inc. and IBM, along with over 50 other entities, have established the AI Alliance, an industry group dedicated to open-source AI development, challenging the norm of proprietary AI systems. Key participants include Oracle, AMD, Intel, and Stability AI, as well as academic entities such as Cornell University and the National Science Foundation. The initiative seeks to make AI development more accessible and innovative, guided by a future governing board and a technical oversight committee.

> AMD is set to launch its Instinct MI300 GPU, aimed at AI systems, which could intensify competition with Nvidia. The event may include a comparison with Nvidia’s H100 and Microsoft’s involvement, highlighting AMD’s AI hardware and software momentum. This launch is crucial for AMD’s AI market presence, especially as Nvidia faces a delay in its H200 GPU release and U.S. restrictions on chip exports to China. Analysts from Wedbush and Jefferies anticipate significant market growth for AMD, positioning it alongside Nvidia as a key player in the AI chip industry.

> Agility Robotics' RoboFab in Salem, Oregon, set to be the first factory mass-producing humanoid robots, will produce 10,000 units yearly, including the bipedal robot Digit, designed for warehouse tasks. This marks a significant advancement in robotics, offering versatile solutions for companies like Amazon. With China pursuing similar ambitions by 2025, this development signals a major shift in industrial robotics, focusing on humanoid forms for enhanced mobility and functionality.

> ByteDance, the Chinese owner of TikTok, plans to launch a platform for creating AI-powered chatbots, competing with OpenAI's ChatGPT. This move, part of ByteDance's AI product expansion, raises potential U.S. security concerns due to the company's Chinese ties. The platform, intended to integrate with existing products, could amplify ByteDance's influence in the AI space, but faces regulatory challenges in China and skepticism in the U.S. due to privacy and data security issues.

> Amazon’s AI chatbot Q reportedly leaked sensitive information, including AWS data center locations and details of unreleased features, as revealed by employees in leaked documents obtained by Platformer. Marked as a “sev 2” incident, this requires urgent attention from engineers. Amazon downplayed these concerns, asserting no security breach occurred and denying Q leaked confidential data. Q, designed to outperform rivals like Microsoft and Google in security and privacy for enterprise use, is now in a public preview, offering AWS assistance and coding features.

> Tip your GPT? A recent experiment by programmer Thebes revealed that ChatGPT provides enhanced and more detailed responses when incentivized with a pretend tip. The experiment demonstrated that the AI chatbot’s performance improved with the promise of a financial reward, producing lengthier and more comprehensive answers. However, when offered the tip, ChatGPT declined, stating its primary objective is user satisfaction, not monetary gain. The behavior reflects the training and programming of AI chatbots like ChatGPT and highlights the influence of incentive structures on their response quality.

> Runway collaborates with Getty Images to create an AI video model for enterprise customers, blending Runway’s technology with Getty’s licensed content library. This partnership targets the need for quality, customized video content in various sectors. The upcoming Runway <> Getty Images Model (RGM) will enable businesses to craft unique, brand-aligned videos using their own datasets. | Runway ML

> Musk’s xAI is raising $1 billion in fresh capital, having already secured nearly $135 million from four investors. Announced in July, the company focuses on understanding the universe’s true nature. Its chatbot, Grok, aims to rival OpenAI’s ChatGPT and Google’s Bard technology. Musk recently acquired high-powered GPUs for xAI and announced that investors in X would own 25% of the company. The xAI team includes experts from DeepMind, OpenAI, and other leading tech firms.

> Microsoft's Copilot, now on Windows 10 and 11, is introducing six new features, leveraging OpenAI's GPT-4 Turbo. GPT-4 Turbo, with an updated knowledge base to April 2023, enhances task handling capabilities. Other features include a new DALL-E 3 model for improved image creation, an "Inline Compose with rewrite menu" in Edge for text rewriting, and a Multi-Modal with Search Grounding feature integrating GPT-4 with vision for enhanced image query understanding. Additionally, Deep Search in Bing optimizes complex search results, and Code Interpreter aids in tasks like coding and data analysis.

In partnership with CODERABBIT AI

Accelerate Your Code Reviews with CodeRabbit AI

CodeRabbit is here to revolutionize your code reviews with its AI-driven platform. With privacy-focused, contextual pull request reviews, CodeRabbit offers line-by-line code suggestions and interactive chat features to make your coding process more efficient and error-free.

Key Features:

Pull Request Summaries: Understand the intent behind changes with clear summaries and automated release notes.

Line-by-Line Code Suggestions: Receive detailed, actionable suggestions for every line of code changed.

Interactive AI Chat: Engage in contextual conversations within your code lines for better coding solutions.

Customizable Reviews: Tailor the AI to suit your specific coding preferences and needs.

Thank you for supporting our sponsors!

9 new AI-powered tools from around the web

D-ID Creative Reality™ Studio Mobile App transforms AI video tech, offering easy, on-the-go creation of realistic digital humans with customizable features for diverse users, enhancing content creation with innovative simplicity and efficiency.

Papermark AI, an open-source AI document assistant, revolutionizes document engagement by offering summarization, query responses, and transforming pitch decks into memos, simplifying content interaction.

Watermark Remover by Magic Studio offers quick, efficient watermark elimination, preserving image quality, ideal for photographers and designers needing pristine images.

FastCut, an AI-powered tool, generates captivating captions for short-form videos like reels and TikToks with a single click, offering automatic transcription, emoji inclusion, and styles mirroring top creators.

Respell, an AI platform, automates knowledge work using Elle, an intuitive chat agent. It integrates with Gmail, Notion, and Airtable, offering AI workflows, SaaS integrations, and automation suggestions.

SuperDuperDB is an open-source framework for AI integration with existing databases. It supports streaming inference, scalable model training, and vector search without complex MLOps or data migration. It is compatible with major SQL databases and tools like PyTorch and OpenAI.

Watermelon, an open-source copilot, streamlines code reviews for software teams. It automates PR pre-reviews and error detection, and it integrates with GitHub and Slack to make code review processes more efficient.

Pikaso, powered by Freepik, is a free AI art generator that transforms simple sketches and words into artistic images. It offers a suite of features including a library of icons, shapes, and elements, allowing users to create and edit images in real-time through basic drawing.

Completely is an AI-powered tool for quick, detailed competitive analysis. It compares competitors across marketing, product features, pricing, audience, customer sentiment, SWOT analysis, and company information.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

📄 Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models

Researchers from Stanford University and Adobe Research introduce Generative Rendering, a diffusion-based technique for rendering 3D animations to stylized videos using text prompts. This new approach allows users to control untextured 3D scenes, transforming them into high-quality, consistent frames with styles like those of prominent creators. It combines dynamic 3D mesh controllability with diffusion model expressivity, addressing the challenges in video diffusion models’ controllability and amplifying user creativity.

📄 VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams

VideoRF is the first real-time dynamic radiance field streaming and rendering solution for mobile devices. It converts 4D radiance fields into a 2D feature image stream, which can be efficiently compressed and decoded using standard video codecs. This approach leverages temporal and spatial redundancies through a unique training scheme, enabling effective data compression. VideoRF’s rendering pipeline is optimized for mobile GPUs, allowing for real-time, interactive free-viewpoint experiences on various devices. The system represents a significant advancement in neural scene modeling, enhancing VR/AR applications with dynamic, photorealistic environments.

📄 Segment Any 3D Gaussians

Segment Any 3D Gaussians (SAGA) is a 3D segmentation approach that blends a 2D segmentation model with 3D Gaussian Splatting for interactive 3D segmentation in radiance fields. It efficiently integrates multi-granularity segmentation, achieving a nearly 1000× speedup compared to previous methods and allowing real-time interaction. SAGA supports various prompts, including points, scribbles, and masks, and maintains competitive performance with existing benchmarks.

📄 FaceStudio: Put Your Face Everywhere in Seconds

The study introduces an efficient method for identity-preserving image synthesis, focusing on human images. Distinct from resource-intensive methods like Textual Inversion and DreamBooth, this approach utilizes a direct feed-forward mechanism and a hybrid guidance framework that blends stylized and facial images with textual prompts. The model supports diverse applications, including artistic portraits, and demonstrates remarkable efficiency and fidelity in maintaining subjects’ identities, outperforming existing models in versatility and identity preservation.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.
