Secret Q-Star Project Acknowledged by Altman

AI Breakfast
December 01, 2023

Good morning. It’s Friday, December 1st.

Did you know: ChatGPT just had its first anniversary?

In today’s email:

AI Research and Development
AI in Industry and Investments
Corporate AI Strategies and Developments
AI Tools and Platforms
6 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think by replying to this email.

Interested in reaching 45,566 smart readers like you? To become an AI Breakfast sponsor, apply here.

Today’s trending AI news stories

AI Research and Development

> OpenAI CEO Sam Altman indirectly confirms the existence of the mysterious Q* project, previously rumored as a secret AGI initiative. While details remain undisclosed, Altman refers to it as an "unfortunate leak" and chooses not to comment further. Speculation suggests Q* might blend Large Language Models with advanced planning algorithms and could mark a significant advancement in AI, potentially leading to more capable systems than current ones like ChatGPT.

> Meta's AI lab celebrates its 10th anniversary with three innovative AI projects: Ego-Exo4D, Seamless Communication, and Audiobox. Ego-Exo4D is a dataset for video learning and multimodal perception, Seamless Communication involves AI models for improved multilingual communication, and Audiobox is a generative AI model for custom audio creation using voice and text inputs. These projects aim to enhance AI's role in Meta's big bets on augmented reality, robotic learning, and audio generation.

> DeepMind researchers have discovered that AI can mimic human social learning, using a method called reinforcement learning in a simulated environment, GoalCycle3D. This technique enables AI agents to quickly learn and retain skills with minimal human data, suggesting a new direction in AI advancement. This approach could lead to a cultural evolution in AI, where behaviors accumulate over generations, aiding in the progress towards artificial general intelligence.

> Nvidia CEO Jensen Huang predicts at the 2023 NYT DealBook Summit that artificial general intelligence (AGI) will be realized in five years. He acknowledges current AI limitations in mimicking complex human reasoning. His forecast joins a range of tech industry opinions on AGI, which weigh its potential for problem-solving against risks like fake news and authoritarian regimes.

> Perplexity, a San Francisco startup, has introduced two online large language models, pplx-7b-online and pplx-70b-online, capable of accessing real-time internet data for current responses, unlike offline models like GPT-3.5. These models, fine-tuned on diverse, high-quality data, aim for accuracy and helpfulness, showing superior performance to GPT-3.5 in robustness and academic knowledge. This development addresses challenges in keeping LLM responses fresh and accurate.

AI in Industry and Investments

> Together, a generative AI startup, has raised $102.5 million in Series A funding from Kleiner Perkins, Nvidia, and Emergence Capital. This investment will boost its cloud platform, supporting developers in building with both open and custom AI models, and offering an alternative to vendor lock-in. The company provides a cost-effective cloud platform with thousands of GPUs, and works with organizations to integrate AI into various applications.

> AWS and NVIDIA have partnered to enhance generative AI capabilities by providing advanced infrastructure, software, and services. AWS will be the first cloud provider to introduce NVIDIA GH200 Grace Hopper Superchips, host NVIDIA DGX Cloud, and develop Project Ceiba, a GPU-driven AI supercomputer. The collaboration will also see the launch of new Amazon EC2 instances powered by NVIDIA GPUs, with NVIDIA software on AWS to further generative AI development, aiming to spur AI innovation across multiple industries.

> Microsoft's £2.5 billion investment in the UK aims to expand next-generation AI data center infrastructure and train over a million people for the AI economy. This investment, the largest in Microsoft's 40-year UK history, includes adding 20,000 advanced GPUs by 2026 to enhance machine learning and AI models. The expansion will occur across London, Cardiff, and potentially Northern England, focusing on sustainable development with renewable energy-powered data centers.

> US intelligence is investigating Abu Dhabi's AI firm G42, linked to Huawei, for potential transfer of advanced US tech to China. The CIA has profiled CEO Peng Xiao. Amidst concerns, the Biden administration urges G42 to cut ties with Chinese entities and considers sanctions. G42, working with Microsoft, Dell, and OpenAI, is developing a leading AI supercomputer using Cerebras' AI chips.

Corporate AI Strategies and Developments

> Sam Altman has officially resumed his role as CEO of OpenAI, with Microsoft gaining a new nonvoting seat on the company's board. This move follows a tumultuous period at OpenAI, which saw almost the entire staff threaten to quit over Altman's initial ousting. The reorganization leaves questions about the future of chief scientist Ilya Sutskever, despite Altman expressing no ill will towards him. OpenAI's revamped all-male board includes notable figures like Larry Summers and Bret Taylor, with Taylor as chair.

> ByteDance, known for TikTok, has launched "ChitChop," an AI platform by POLIGON featuring over 200 robot services. It caters to six areas: creation, AI drawing, recreation, study, work, and life, each with multiple AI tools. This reflects a trend among Chinese firms like Huawei Cloud, Alibaba, and Baidu, who are increasingly deploying large-scale AI products globally, intensifying competition in the international AI market.

AI Tools and Platforms

> Bitmagic is publicly testing an AI-based tool that lets users create 3D games from text prompts, currently available on Windows. This platform, featuring asset generation and immersive graphics, aims to simplify game development. Future expansions include iOS, Android, and multiplayer options. Access is available through the Bitmagic Discord server.

> Amazon SageMaker Studio now offers Code Editor, an IDE based on Visual Studio Code Open Source, to improve the machine learning development experience. It supports VS Code extensions, integrates with AWS services like Amazon CodeWhisperer, and enables GitHub collaboration. Customizable with SageMaker Distribution containers, it operates on a pay-as-you-go basis for resources. Code Editor is available in various AWS Regions, focusing on efficient and familiar ML project development.

6 new AI-powered tools from around the web

Morph 1.0, an AI-powered Business Intelligence dashboard, simplifies extracting insights from fragmented data across various platforms without coding, offering scalable databases, collaborative tools, and adaptable BI features for professionals in diverse fields.

Manot AI is an insight management platform enhancing computer vision models by identifying their weaknesses, significantly boosting model refinement, accuracy, and cost efficiency, catered for product managers, CV engineers, and data scientists.

Sider 4.0, enhancing AI group chats in Chrome, compares responses from ChatGPT, GPT-4, Claude, and Bard to minimize errors, integrates with webpages, images, and PDFs, and boasts over 2 million users and a high user rating.

Vela Terminal, an AI-native VC co-pilot, aids in spotting market trends, mapping relationships, and identifying fast-growing startups. Developed by ex-Google founders, it offers a unique, open-source approach with premium features for partners.

Qonqur, a gesture-controlled mind mapping app, enhances presentations, learning, brainstorming, and research. Compatible with laptops, smartphones, and USB webcams, it operates without VR glasses and is available at a discounted beta price.

ChatMaxima, a conversational marketing SaaS, offers an intuitive chatbot builder for AI chatbots across various platforms like websites and social media, enhancing customer engagement with centralized communication, analytics, and scalable pricing plans.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

📄 MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

The MMMU benchmark is a new evaluation standard for multimodal models, featuring 11.5K complex questions from college-level exams, quizzes, and textbooks. Covering six core disciplines, 30 subjects, and 183 subfields, it includes diverse image types like diagrams and music sheets. MMMU tests models on advanced perception and reasoning, demanding expert knowledge. Its challenging nature is highlighted by GPT-4V's 56% accuracy, indicating significant room for model improvement and guiding future research towards expert artificial general intelligence.

📄 YUAN 2.0: A Large Language Model with Localized Filtering-based Attention

Yuan 2.0, a sophisticated language model, integrates Localized Filtering-based Attention (LFA) to better understand natural language, scaling from 2.1 to 102.6 billion parameters. It employs innovative data filtering and generation for improved training, and a unique distributed training method enhancing efficiency. Yuan 2.0 excels in code generation, math problem-solving, and chatting, surpassing many existing models. Its open-source release, including weights and code, contributes significantly to AI advancements.

📄 LEDITS++: Limitless Image Editing using Text-to-Image Models

LEDITS++ revolutionizes image manipulation using text-to-image diffusion models by offering an efficient, precise, and versatile technique. Unlike current methods that are computationally heavy, induce drastic image changes, or lack support for multiple edits, LEDITS++ eliminates the need for extensive tuning. It employs a new inversion approach for efficient diffusion sampling and supports multiple simultaneous edits. Its implicit masking techniques ensure changes are confined to relevant areas, enhancing precision. LEDITS++ is architecture-agnostic, compatible with various diffusion models, and demonstrates superior performance in empirical evaluations including the new TEdBench++ benchmark.

📄 PoseGPT: Chatting about 3D Human Pose

PoseGPT innovatively employs LLMs for understanding and generating 3D human poses from images or descriptions. Integrating SMPL poses as tokens, it overcomes traditional methods’ limitations, offering nuanced scene comprehension and reasoning. This framework excels in speculative pose generation and reasoning-based estimation, outperforming current multimodal LLMs. PoseGPT’s approach marks a significant leap in human pose analysis, blending advanced AI with an intuitive grasp of body language.

📄 TrustMark: Universal Watermarking for Arbitrary Resolution Images

TrustMark, a new GAN-based digital watermarking method, strikes a balance between watermark imperceptibility and recovery accuracy. It’s uniquely designed for robust watermark embedding in images, resistant to various perturbations. TrustMark also includes TrustMark-RM for watermark removal, facilitating image re-watermarking.

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.