Watch The "Athletically Intelligent" Robot Dog

AI might kill us all. Until then, enjoy this email.

AI Breakfast
October 07, 2023

Good morning. It’s Saturday, October 7th.

Did you know: OpenAI CEO Sam Altman appeared on the Joe Rogan Experience yesterday? Check out the episode here on Spotify.

In today’s email:

Editor’s note: What happened to Friday’s email? Unfortunately, COVID put me out yesterday. So here it is, a day late, but with bonus news!

Generative AI and Video Generation
Autonomous Systems and Robotics
AI Hardware and Infrastructure
AI in Education
AI Ethics and Governance
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s trending AI news stories

Generative AI and Video Generation

Wayve’s GAIA-1 is making strides in autonomous vehicle training by generating synthetic videos. This generative model uses text, images, video, and action data to create diverse traffic scenarios for training self-driving cars. GAIA-1 acts as a “world model,” understanding and separating driving concepts and improving autonomous systems’ safety and efficiency. With nine billion parameters and the ability to predict different futures and react to scenarios, it’s a powerful tool for autonomous vehicle development. Wayve plans to continue scaling GAIA-1, optimizing its efficiency, and expanding its capabilities to capture a broader perspective in autonomous driving scenarios.

Researchers at the National University of Singapore have unveiled Show-1, an advanced AI model for text-to-video generation. Show-1 takes a hybrid approach, combining pixel-based and latent-based diffusion models to create high-quality videos from textual descriptions. This combination allows precise text-to-video alignment while maintaining efficiency during upscaling. Show-1 achieves results comparable to or better than existing methods while using only 20-25% of the GPU memory required by pixel-based models. These developments are poised to benefit open-source applications significantly.

Canva Bolsters AI Toolkit with Video Generation by Runway: Canva has launched Magic Studio, an AI-powered toolset in collaboration with Runway AI, designed to simplify visual content creation. It offers generative AI features such as Magic Switch for format transformation, Magic Media to convert text into images or videos, and Magic Design for automated video and presentation creation. Canva’s AI strategy focuses on developing proprietary models and collaborating with partners like Runway. The company is committed to responsible AI usage, refraining from generating images of celebrities or any content related to public figures or third-party intellectual property. Canva also introduced the Canva Shield initiative to enhance trust, safety, and privacy.

Autonomous Systems and Robotics

AI approach yields ‘athletically intelligent’ robotic dog: Stanford University and Shanghai Qi Zhi Institute have developed an autonomous dog-like robot capable of navigating and overcoming physical obstacles using a simplified vision-based algorithm.

The AI-powered “robodog” can scale objects, leap across gaps, crawl under thresholds, and squeeze through crevices. Unlike previous systems, this robodog combines perception and control, using a depth camera and machine learning to process inputs and perform agile movements. The key innovation is its autonomy, as it doesn’t rely on complex reward systems or real-world reference data, making it adaptable to new environments. The researchers aim to further enhance the algorithm’s real-world autonomy using 3D vision and graphics.

MIT’s “Air-Guardian” – AI Copilot Enhances Human Precision for Safer Skies: Developed by MIT CSAIL, this system uses eye-tracking and neural network techniques to understand a pilot’s attention and collaboratively make decisions. Air-Guardian reduces flight risks and improves navigation. The technology has broader applications in other fields like cars, drones, and robotics. Its adaptability and dynamic features make it stand out, offering a balanced partnership between humans and machines for improved safety and collaboration.

AI Hardware and Infrastructure

OpenAI is exploring the development of its own artificial intelligence chips to address the shortage of costly AI chips, and is even considering potential acquisitions. CEO Sam Altman has highlighted the difficulty of procuring graphics processing units, a market dominated by Nvidia, whose chips currently power OpenAI’s software. While the decision is pending, this move would align OpenAI with major tech players like Google and Amazon, which design their own chips for AI applications. Developing custom chips would be a substantial investment but could reduce reliance on commercial providers in the long run.

Gradient raises $10M to let companies deploy and fine-tune multiple LLMs. Led by Wing VC, the funding also involved Mango Capital, Tokyo Black, The New Normal Fund, Secure Octane, and Global Founders Capital. Gradient aims to simplify the deployment of specialized and fine-tuned large language models (LLMs) in the cloud, enabling organizations to integrate numerous LLMs into a single system. Their platform provides open-source LLMs and industry-specific models, granting users full ownership and control over their data and models.

AI in Education

ChatGPT to be allowed in Australian schools next year, following unanimous support from education ministers. The national framework, revised by the national AI taskforce, addresses the use of AI technology in education. Despite initial concerns and restrictions, all states and territories (excluding South Australia) will implement the framework from term 1 next year. The adoption includes a $1 million investment in Education Services Australia to establish “product expectations” for generative AI technology. Critics warn of the need for proper governance and regulation to prevent misuse and ensure equitable access to AI in education.

How AI threatens to make traditional college degree 'obsolete': LinkedIn CEO Ryan Roslansky emphasizes that AI makes it nearly impossible for a single educational degree to suffice throughout one’s career. The job market will increasingly demand adaptable employees with soft skills, rendering four-year program-oriented degrees less valuable. As AI-driven technological innovation reshapes industries, skills acquired in college may become obsolete. A shift in job skills, with AI integration in the workplace, raises concerns about potential job displacement. Experts call for measures to address AI’s impact on the workforce.

AI Ethics and Governance

Microsoft's Bing chat botches election information, “endangers democracy”, study finds: A study by AlgorithmWatch and AI Forensics, in partnership with Swiss media outlets SRF and RTS, has uncovered inaccuracies in Microsoft’s Bing Chat responses to questions about elections in Germany and Switzerland. The study found that Bing Chat provided incorrect information, including misleading poll results and incorrect party candidates. While Microsoft claims to be improving the accuracy of Bing Chat, critics argue that it fails to address fundamental issues with large language models.

OpenAI's Official Justification To Why Training Data Is Fair Use and Not Infringement: OpenAI’s submission to the United States Patent and Trademark Office (USPTO) regarding intellectual property protection for artificial intelligence innovation asserts that training AI systems aligns with current fair use laws. They emphasize the transformative nature of AI system training and its positive societal impact. OpenAI also highlights the need for resolving legal uncertainties surrounding copyright implications for AI developers. The organization aims to foster innovation in generative AI systems while advocating for a balanced approach that encourages creativity without imposing undue copyright burdens.

5 new AI-powered tools from around the web

GPT Pilot is an AI-powered developer tool that claims to boost productivity by 20x. It delegates 95% of coding tasks to an LLM, transforming the app development landscape. Open-source and user-friendly, it’s poised to redefine the way applications are created.

Fill 3D is a virtual staging platform that uses generative 3D fill to produce photorealistic images in about a minute. It delivers high-resolution, realistic results with limitless refinements at no extra expense.

Image App is a versatile AI-driven image generation tool, harnessing cutting-edge AI models like Stable Diffusion, DALL-E, and others. Users can even train custom models using its platform. With free and subscription-based plans, it provides various image generation and model training choices to cater to diverse needs.

ChatWebby AI is a versatile AI chatbot builder that transforms websites, documents, audio, videos, and more into interactive chatbots. This no-code platform seamlessly connects to your site, reducing costs and providing 24/7 support. With future plans for human interactions and various integrations, it’s shaping the future of AI chatbots.

Enterpret revolutionizes customer feedback by unifying data sources and crafting adaptive AI models tailored to your feedback ecosystem. It extracts precise insights for informed product growth. Trusted by companies like Canva and Notion, Enterpret is on a mission to centralize, understand, and shape the future of customer-driven innovation.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

📄 Decoding speech perception from non-invasive brain recordings

The study explores decoding speech perception from non-invasive brain recordings, a critical goal in healthcare and neuroscience. It introduces a model trained using contrastive learning to decode perceived speech from EEG and MEG recordings, achieving up to 41% accuracy in identifying speech segments. The research addresses challenges posed by noisy signals and the unknown representation of speech in the brain. By leveraging pretrained speech models and a multi-subject architecture, it offers insights for future Brain-Computer Interface (BCI) development. However, clinical deployment for speech production decoding requires further steps.

📄 Large Language Model Cascades with Mixture of Thought Representations For Cost-Efficient Reasoning

In this paper, the authors propose a novel approach to cost-efficient reasoning using large language models (LLMs) such as GPT-4. They introduce the concept of LLM cascade, where a weaker but more affordable LLM is used to handle simpler questions, and a stronger and more expensive LLM is invoked only for challenging questions. To decide when to route a question to the stronger LLM, they introduce the idea of “answer consistency” from the weaker LLM as a signal of question difficulty. They propose several methods for answer sampling and consistency checking, including leveraging a mixture of two thought representations. Experimental results on various reasoning tasks demonstrate that their LLM cascades achieve comparable performance to using the stronger LLM alone but at only 40% of the cost.
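
For readers who want to see the routing idea in code, here is a minimal sketch of a consistency-based cascade. It is an illustration, not the authors' implementation: `weak_sample` and `strong_answer` are hypothetical callables standing in for the two LLMs, and the sample count and agreement threshold are assumed values. The paper's mixture of thought representations is left out for brevity.

```python
from collections import Counter
from typing import Callable, List, Tuple


def cascade_answer(
    question: str,
    weak_sample: Callable[[str], str],    # hypothetical: one sampled answer from the cheap LLM
    strong_answer: Callable[[str], str],  # hypothetical: one answer from the expensive LLM
    n_samples: int = 5,                   # assumed value, not taken from the paper
    threshold: float = 0.8,               # assumed value, not taken from the paper
) -> Tuple[str, str]:
    """Return (answer, model_used) using majority-vote agreement as the difficulty signal."""
    # Sample the cheap model several times on the same question.
    samples = [weak_sample(question) for _ in range(n_samples)]
    top_answer, votes = Counter(samples).most_common(1)[0]
    agreement = votes / n_samples
    # High agreement suggests an easy question, so keep the cheap answer.
    if agreement >= threshold:
        return top_answer, "weak"
    # Low agreement suggests a hard question, so pay for the strong model.
    return strong_answer(question), "strong"


def make_stub(answers: List[str]) -> Callable[[str], str]:
    """Build a fake model that replays a fixed list of answers (for demonstration only)."""
    it = iter(answers)
    return lambda _question: next(it)
```

In practice the threshold controls the trade-off: a higher value sends more questions to the stronger model and raises cost, while a lower value accepts more answers from the weaker one.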

📄 How Far Are Large Language Models From Agents With Theory-of-Mind?

The authors introduce a new evaluation paradigm called T4D (Thinking for Doing) to assess whether large language models (LLMs) can connect their understanding of other people’s mental states to appropriate actions in social scenarios. Unlike traditional social reasoning tasks that focus on making inferences, T4D requires LLMs to decide on proper actions based on observed mental states. The authors find that while LLMs perform well on inference-based tasks such as ToMi, they struggle with T4D, highlighting the challenge of translating mental state understanding into practical actions. They propose a zero-shot prompting framework, Foresee and Reflect (FaR), which significantly improves LLM performance on T4D, addressing the key bottleneck of identifying implicit inferences.

📄 EcoAssistant: Using LLM Assistant More Affordably and Accurately

The authors introduce EcoAssistant, a framework designed to make using Large Language Models (LLMs) as assistants for code-driven queries more affordable and accurate. Users often rely on LLMs to generate code for tasks like fetching external information via APIs, but LLMs may not get it right on the first try, leading to iterative code refinement. EcoAssistant addresses this by allowing LLMs to converse with an automatic code executor to iteratively refine code. It also employs a hierarchy of LLM assistants, starting with cheaper options and escalating to more expensive ones if necessary. Additionally, EcoAssistant retrieves solutions from past queries to help future ones. Empirical results show that EcoAssistant offers cost savings and improved accuracy compared to individual LLMs like GPT-4.
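
The three ingredients described above (an executor loop, a cheap-to-expensive hierarchy, and reuse of past solutions) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the EcoAssistant codebase: the assistants are hypothetical stub functions, retrieval is a naive word-overlap check, and `exec` stands in for what would need to be a sandboxed executor.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical assistant interface: (query, context lines) -> Python source code.
Assistant = Callable[[str, List[str]], str]


def run_code(code: str) -> Tuple[bool, str]:
    """Run generated code and report (success, output or error).
    Plain exec is unsafe; a real system would isolate this step."""
    scope: Dict[str, object] = {}
    try:
        exec(code, scope)
        return True, str(scope.get("result"))
    except Exception as exc:
        return False, repr(exc)


def solve(
    query: str,
    assistants: List[Tuple[str, Assistant]],  # ordered cheapest first
    memory: List[Tuple[str, str]],            # past (query, working code) pairs
    max_turns: int = 2,                       # assumed retry budget per assistant
) -> Optional[Tuple[str, str]]:
    """Return (assistant_name, output) from the cheapest assistant that succeeds."""
    words = set(query.lower().split())
    # Naive retrieval: reuse code from past queries that share any word.
    examples = [code for past, code in memory if words & set(past.lower().split())]
    for name, assistant in assistants:
        context = list(examples)
        for _ in range(max_turns):
            code = assistant(query, context)
            ok, output = run_code(code)
            if ok:
                memory.append((query, code))  # store the solution for future queries
                return name, output
            context.append(f"error: {output}")  # feed the failure back for refinement
    return None


def cheap(query: str, context: List[str]) -> str:
    # Stub weak model: only succeeds when it can copy a retrieved solution.
    solutions = [c for c in context if not c.startswith("error:")]
    return solutions[0] if solutions else "result = 1 / 0"


def expensive(query: str, context: List[str]) -> str:
    # Stub strong model: always returns working code.
    return "result = 6 * 7"
```

With these stubs, the first query escalates to the expensive assistant, and a later similar query is handled by the cheap one because the stored solution is retrieved for it.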

📄 Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion

Kandinsky is a cutting-edge text-to-image synthesis model that combines latent diffusion techniques with image prior principles and boasts 3.3 billion parameters. Kandinsky supports various generative modes, including text-to-image generation, image fusion, text and image fusion, image variation generation, and text-guided inpainting/outpainting. It achieved an FID score of 8.03 on the COCO-30K dataset, making it a top open-source performer in image generation quality. The system includes user-friendly interfaces and is available as open-source software for both non-commercial and commercial purposes.

Thank you for reading today’s edition.

Your feedback is valuable.

Respond to this email and tell us how you think we could add more value to this newsletter.