Google's new PaLI-3 vision language model
Good morning. It’s Wednesday, October 25th.
Did you know: On this day in 2001, Windows XP was released.
In today’s email:
Tech Company Initiatives
AI Developments and Tools
Investments and Industry Impact
AI Hardware and Accelerators
AI Solutions for Enterprises
Robotics and Automation in AI
5 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.
Today’s trending AI news stories
Tech Company Initiatives
Google's new PaLI-3 vision language model matches the performance of models 10x larger Google Research and Google DeepMind have introduced PaLI-3, a vision-language model (VLM) with 5 billion parameters that outperforms models ten times larger on various multimodal benchmarks. PaLI-3 processes images and language, excelling in tasks like answering questions about images and videos, recognizing objects, and reading text in images. This smaller and more environmentally friendly VLM showcases the potential of the SigLIP training method and could pave the way for larger, scaled-up models in the future. Despite its modest size, PaLI-3 performs on par with larger VLMs on multiple image-to-text benchmarks.
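For context on the SigLIP method mentioned above: SigLIP-style training replaces the usual softmax contrastive loss over an image-text batch with a pairwise sigmoid loss that scores every image-text pair independently. The snippet below is a minimal NumPy sketch of that loss, with the temperature `t` and bias `b` fixed as constants for illustration (in practice they are learned); it is not PaLI-3's training code.

```python
import numpy as np

def sigmoid_contrastive_loss(img_emb, txt_emb, t=10.0, b=-10.0):
    """Pairwise sigmoid loss in the style of SigLIP (illustrative sketch).

    img_emb, txt_emb: (batch, dim) L2-normalized embeddings, where row i of
    each matrix comes from the same image-text pair.
    """
    logits = t * img_emb @ txt_emb.T + b           # (batch, batch) pairwise scores
    labels = 2 * np.eye(len(img_emb)) - 1          # +1 for matching pairs, -1 otherwise
    # -log(sigmoid(labels * logits)), via logaddexp for numerical stability,
    # averaged over all pairs here for simplicity
    return np.mean(np.logaddexp(0.0, -labels * logits))

# Toy usage with random unit vectors standing in for real embeddings
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(4, 16)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(sigmoid_contrastive_loss(img, txt))
```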
Reddit finally takes its API war where it belongs: to AI companies Reddit, after contentious API access pricing changes, is now targeting generative AI companies for potential data deals. It reportedly met with top AI firms about being paid for its data. Reddit might consider blocking search crawlers from Google and Bing if a deal isn’t reached, potentially impacting its discoverability. Blocking search could be a risky strategy, given Reddit’s reliance on search engine traffic. Reddit is exploring options to diversify its revenue streams, including an IPO.
Apple’s job listings suggest it plans to infuse AI in multiple products Apple is intensifying its focus on integrating generative AI into its products, as evidenced by recent job listings. The company is seeking talent for roles related to generative AI across various departments, including app development, conversational AI, text generation, and more.
The Humane AI Pin runs GPT-4 and flashes a 'Trust Light' when it's recording Humane’s AI Pin, set to launch on November 9th, has been named one of Time Magazine’s “Best Inventions of 2023.” The AI Pin attaches magnetically to clothing and uses proprietary software along with OpenAI’s GPT-4 to power its features. The “Trust Light” illuminates when the camera, microphone, or sensors are active. Humane has generated excitement but faces competition in the AI gadget market. Co-founder Imran Chaudhri has described the Pin as a unique wearable device that doesn’t require a smartphone.
Midjourney has a new website and a major update in the pipeline The initial version of the site focuses on enhanced image and prompt search across user-generated and community-generated images, along with improved image viewing with prompts. The most significant addition is browser-based image generation, although its launch date remains unspecified. The company aims to provide a more accessible and user-friendly experience, seeking growth opportunities beyond its existing Discord platform.
AI Developments and Tools
This new data poisoning tool lets artists fight back against generative AI The newly developed tool, Nightshade, empowers artists to protect their work from being used without permission by generative AI models. Nightshade subtly alters the pixels in digital art, causing unpredictable glitches in AI models when their training data is poisoned with such manipulated images. This tool aims to rebalance the power dynamic between artists and AI companies that exploit their creations. Nightshade exploits a security vulnerability in generative AI models, potentially impacting large AI datasets and challenging tech companies to address this threat. While concerns about misuse exist, Nightshade offers a promising solution to protect artists’ intellectual property rights in the digital age.
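Nightshade's exact optimization isn't detailed in this summary, but the general idea behind this class of tools is to add a visually subtle perturbation that drags an image's representation toward an unrelated concept in the feature space of an image encoder, so that models trained on the poisoned image learn the wrong association. The sketch below is a generic, hypothetical PyTorch illustration of that idea with a stand-in encoder and a simple projected-gradient loop; it is not Nightshade's actual algorithm.

```python
import torch
import torch.nn as nn

# Stand-in feature extractor; a real poisoning tool would target (or approximate)
# the encoder used in the generative model's training pipeline -- an assumption here.
encoder = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def poison(image, target_image, eps=8 / 255, steps=50, lr=0.01):
    """Nudge `image` toward `target_image` in feature space while keeping the
    pixel change within an L-infinity budget of `eps` (conceptual sketch only)."""
    delta = torch.zeros_like(image, requires_grad=True)
    target_feat = encoder(target_image).detach()
    for _ in range(steps):
        loss = torch.norm(encoder(image + delta) - target_feat)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()   # step toward the target concept's features
            delta.clamp_(-eps, eps)           # keep the edit visually subtle
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()

# Toy usage: a "cat" image nudged toward a "dog" image's features
cat, dog = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
poisoned = poison(cat, dog)
```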
GPT-4 can infer your income, location or gender from chats A study by ETH Zurich reveals that GPT-4 and similar large language models can deduce personal information such as location, income, and gender from online conversations. This raises concerns about the privacy risks associated with language models, extending beyond data memorization. The study highlights that even with text anonymization and model alignment, user privacy remains vulnerable to language model queries, prompting a need for stronger anonymization methods and a broader discussion on privacy implications in the context of language models.
Twelve Labs is building models that can understand videos at a deep level The San Francisco-based startup is pioneering AI models for deep video understanding, merging natural language with video content. The company’s models enable developers to create applications for searching videos, classifying scenes, summarizing content, and more. Twelve Labs’ technology has potential applications in ad insertion, content moderation, media analytics, and automatic highlight reel generation. The startup’s latest multimodal model, Pegasus-1, aims to provide human-level video comprehension. It sets itself apart with quality models and fine-tuning capabilities. Twelve Labs recently secured $10 million in strategic funding from Nvidia, Intel, and Samsung Next, bringing its total funding to $27 million.
Investments and Industry Impact
China Widens Lead Over US in AI Patents After Beijing Tech Drive Extending its dominance in AI patent filings, Chinese institutions submitted 29,853 AI-related patents in 2022, nearly 80% more than the US, whose filings decreased by 5.5%. China now represents over 40% of global AI patent applications in the past year, highlighting the nation’s commitment to shaping the future of AI technology. Japan and South Korea also played prominent roles in AI patent applications in 2022.
Wall Street wants to know how Google is going to profit from AI In Google’s Q3 earnings conference, investors pressed for details on how the company plans to profit from AI investments. While executives highlighted early ad success and the upcoming launch of Search Generative Experience (SGE), questions lingered about AI’s monetization timeline. Although Alphabet reported 11% revenue growth, its stock dipped due to disappointing cloud revenue. As capital expenditures surged to $8 billion, primarily driven by AI, concerns arose about ROI in the early stages of advanced AI adoption.
AI Hardware and Accelerators
AMD Reportedly Receives Orders For Next-gen Instinct MI300X AI Accelerators From Oracle AMD is reportedly gaining traction in the AI industry as Oracle and IBM express interest in its next-generation Instinct MI300X AI accelerators and FPGAs. AMD aims to compete with NVIDIA in AI with its upcoming Instinct MI300X GPU accelerator, closing the performance and integration gap. Oracle has reportedly placed orders for the MI300X, planning to adopt a “dual-source” approach with AMD and NVIDIA AI GPUs. This suggests large-scale adoption of MI300X accelerators is expected by mid-2024.
Qualcomm focuses on generative AI capabilities with Snapdragon 8 Gen 3 A leaked internal document suggests that the chip can handle AI models with over 10 billion parameters, including Meta’s Llama 2 at 20+ tokens per second. Qualcomm is utilizing Stable Diffusion technology for generative AI backgrounds. There are indications that this chip might power Samsung’s Galaxy S24, expected in February 2024.
AI Solutions for Enterprises
Lenovo and NVIDIA Announce Hybrid AI Solutions to Help Enterprises Quickly Adopt GenAI The expanded partnership aims to provide fully integrated systems that enable businesses to deploy tailored generative AI applications across industries. The solutions include accelerated systems, AI software, and expert services, allowing enterprises to build custom AI models using the NVIDIA AI Foundations cloud service and run them on-premises using Lenovo systems. This partnership simplifies the path to generative AI adoption for businesses and supports AI-driven transformations.
Robotics and Automation in AI
Researchers create magnetic microrobots that work together to assemble objects in 3D environments In this breakthrough achievement, two magnetic microrobots collaborate to pick up, move, and assemble objects in 3D environments. The robots, measuring just 1 millimeter in size, showcased their ability to perform complex tasks like manipulating cubes in a 3D space. This achievement holds immense promise for biomedical applications, allowing for non-contaminating remote manipulation of biomedical samples. The research was conducted as part of the European RĔGO project and opens new possibilities for micro-sized, untethered, stimuli-responsive robot swarms.
FCC aims to investigate the risk of AI-enhanced robocalls The FCC is set to investigate the potential risks associated with AI-enhanced robocalls and how they fit within existing consumer protections. Chairwoman Jessica Rosenworcel has proposed a Notice of Inquiry to examine the implications of AI-generated robocalls under the Telephone Consumer Protection Act. While acknowledging AI’s potential for enhancing communication networks, the inquiry aims to address concerns about AI’s misuse in robocalling, ensuring a balance between leveraging AI for productivity and preventing abusive practices.
5 new AI-powered tools from around the web
Reclaim.ai is an AI-powered time-tracking tool designed to help users analyze their weekly time allocation. It provides insights into various aspects of productivity, including time spent in meetings, task and habit-tracking, work-life balance statistics, and more.
PriceParrot is a tool designed to help businesses monitor and analyze competitors’ pricing and market trends. It offers a streamlined dashboard with real-time data and actionable insights to aid in making data-driven decisions. The tool integrates AI-driven analysis and predictive modeling for advanced competitive intelligence.
Autotab is a GitHub repository that provides a tool for creating browser agents to automate tasks. Users can record actions and develop automated agents for task automation, though some coding knowledge may be required to utilize these tools effectively.
Questgen.ai is an AI tool that can generate various types of quizzes, including MCQs, True/False questions, Fill-in-the-blanks, and more, from any text of up to 25,000 words. It enables the generation of Bloom’s Taxonomy-based MCQs and similar questions to expand question banks.
Finetalk offers a customizable AI chatbot for customer support that can address visitor inquiries in real time, powered by AI trained on your custom data. It’s designed for easy integration onto websites, providing instant 24/7 customer support. The chatbot can be tailored to match your website’s theme and controlled through a user-friendly dashboard.
arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.
The paper presents INSTRUCTEXCEL, a benchmark that explores the use of Large Language Models (LLMs) to generate code in Excel OfficeScripts from natural language instructions. The dataset consists of over 10k samples covering 170+ Excel operations across 2,000 publicly available Excel spreadsheets. The study evaluates the performance of LLMs like GPT-3.5 Turbo and GPT-4 in zero-shot, few-shot, and supervised fine-tuning settings. Results indicate that INSTRUCTEXCEL is a challenging benchmark, with GPT-4 showing improvements over GPT-3.5. The paper, titled “InstructExcel: A Benchmark for Natural Language Instruction in Excel,” also discusses the potential applications and utility of INSTRUCTEXCEL for automating Excel tasks.
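To make the benchmark's input/output format concrete, here is a small, hypothetical instruction/OfficeScript pair in the shape INSTRUCTEXCEL evaluates (natural language in, OfficeScript out), with a deliberately crude exact-match check standing in for the paper's actual metrics. Neither the example nor the scoring comes from the dataset.

```python
# Hypothetical (not from the dataset) instruction and reference output in the
# natural-language-to-OfficeScript format the benchmark evaluates.
instruction = "Bold the header row of Sheet1 and freeze it."

reference_officescript = """
function main(workbook: ExcelScript.Workbook) {
  const sheet = workbook.getWorksheet("Sheet1");
  sheet.getRange("1:1").getFormat().getFont().setBold(true);  // bold the header row
  sheet.getFreezePanes().freezeRows(1);                       // keep it visible while scrolling
}
"""

def exact_match(generated: str, reference: str) -> bool:
    """Crude stand-in for scoring: the paper reports richer metrics; this only
    illustrates that a model's generated script is compared against a reference."""
    return generated.strip() == reference.strip()

# A model that reproduced the reference verbatim would score True here.
print(exact_match(reference_officescript, reference_officescript))
```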
The paper explores Constitutional AI (CAI), an innovative approach for training AI models that replaces traditional human feedback with a guiding “constitution.” It investigates the impact of different constitutional principles on AI behavior, addressing complex issues like power-seeking tendencies. The research introduces Trait Preference Models (Trait PMs) aimed at mitigating specific problematic traits, such as the desire for power, and presents Good for Humanity preference models (GfH PMs) that excel at promoting ethical behavior. The paper emphasizes the approach’s relevance in guiding AI systems toward alignment with human values, offering an inventive path for responsible AI development.
This paper introduces DEsignBench, a benchmark for evaluating text-to-image (T2I) models in visual design scenarios. It addresses the question of how state-of-the-art T2I models can be applied to practical visual design tasks beyond generating visually pleasing results. DEsignBench consists of two main categories: design technical capability and design application scenarios, examining core design abilities and their integration. The benchmark collects 215 prompts, evaluated both by humans and the GPT-4V model, focusing on visual aesthetics, image-text alignment, and design creativity. Additionally, it explores automatic evaluation using large language models, providing a cost-effective approach. The benchmark offers valuable insights into T2I models’ potential for assisting visual design and provides a gallery comparing results from various T2I models.
This paper addresses the issue of hallucinations in Multimodal Large Language Models (MLLMs), where the generated text doesn’t align with the image content. Unlike existing approaches that require retraining with specific data, the paper introduces a training-free method called Woodpecker. Woodpecker employs five stages: key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction to identify and correct hallucinations in the generated text. This method is implemented as a post-remedy process and is adaptable to various MLLMs, offering interpretability by accessing intermediate outputs. The evaluation demonstrates significant improvements in accuracy on benchmark datasets, highlighting the potential of this approach in mitigating hallucinations.
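For readers who want to picture the pipeline's control flow, the skeleton below lays out the five stages named above as plain Python functions composed into a post-hoc correction step. The function bodies are placeholders (the actual system relies on an MLLM plus external vision tools such as detectors and VQA models), so this is a structural outline rather than the authors' implementation.

```python
from typing import List

def extract_key_concepts(answer: str) -> List[str]:
    """Stage 1: list the objects/attributes the generated answer claims exist."""
    ...

def formulate_questions(concepts: List[str]) -> List[str]:
    """Stage 2: turn each claimed concept into a question that can be verified."""
    ...

def validate_visual_knowledge(image, questions: List[str]) -> dict:
    """Stage 3: answer the questions from the image itself (e.g. detector/VQA tools)."""
    ...

def generate_visual_claims(evidence: dict) -> List[str]:
    """Stage 4: rewrite the verified evidence as explicit claims about the image."""
    ...

def correct_hallucinations(answer: str, claims: List[str]) -> str:
    """Stage 5: edit the original answer so it is consistent with the claims."""
    ...

def woodpecker_style_correction(image, answer: str) -> str:
    """Training-free post-remedy: every intermediate output can be inspected,
    which is where the interpretability mentioned above comes from."""
    concepts = extract_key_concepts(answer)
    questions = formulate_questions(concepts)
    evidence = validate_visual_knowledge(image, questions)
    claims = generate_visual_claims(evidence)
    return correct_hallucinations(answer, claims)
```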
The study evaluates the constraint satisfaction abilities of advanced language models in the context of information retrieval. The authors introduce KITAB, a new dataset with over 600 authors and 13,000 queries, designed to measure language models’ capabilities in satisfying constraints. They assess performance under various conditions, including the presence or absence of context, and explore factors such as constraint type and author popularity. Results indicate that while cutting-edge language models exhibit limitations, context availability only partially mitigates these issues. The study highlights the challenges in constraint satisfaction and provides valuable insights for future model development. The KITAB dataset is made available for further research.
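As a rough illustration of what "satisfying constraints" means in this setting, the sketch below checks a model's answer to a query like "list this author's books whose titles start with S" against a ground-truth catalog, separating correct, constraint-violating, and likely-fabricated titles. The constraint, author, and titles are invented for illustration, and the scoring is not KITAB's official metric.

```python
def evaluate_constraint_answer(model_titles, ground_truth_titles, constraint):
    """Split a model's answer into satisfied, constraint-violating, and
    not-in-catalog titles (illustrative scoring only)."""
    catalog = {t.lower() for t in ground_truth_titles}
    satisfied, violating, not_in_catalog = [], [], []
    for title in model_titles:
        if title.lower() not in catalog:
            not_in_catalog.append(title)   # book the author never wrote
        elif constraint(title):
            satisfied.append(title)        # real book that meets the constraint
        else:
            violating.append(title)        # real book that breaks the constraint
    return satisfied, violating, not_in_catalog

# Invented example: "list Jane Doe's novels whose titles start with 'S'"
ground_truth = ["Silent Harbor", "Morning Tide", "Second Light"]
model_answer = ["Silent Harbor", "Morning Tide", "Shadow Country"]
print(evaluate_constraint_answer(model_answer, ground_truth,
                                 lambda t: t.lower().startswith("s")))
```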
Thank you for reading today’s edition.
Your feedback is valuable.
Respond to this email and tell us how you think we could add more value to this newsletter.