
Meta’s AI Agents Learn to Move by Copying Toddlers

Plus, AI Recreated a Pink Floyd Song with Brain Scans

Good morning. It’s Wednesday, August 16th.

Did you know: AI has already created as many images as photographers have taken in 150 years?

In today’s email:

  • Product and Feature Enhancements Using AI

  • AI Startups and Market Movement

  • AI Adoption and Influence

  • Government and Policy Interactions with AI

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s edition is brought to you by:

Humata: The Premier AI Document Analyzer

Humata.ai is an AI-driven tool that is designed to help users analyze and understand their documents more efficiently. It offers an impressive range of features that are intended to speed up research, learning, and report creation.

It's like having an intelligent assistant at your disposal, offering instant Q&A, automatically generating summaries of complex technical papers, and creating content for reports and tasks in an instant.

This tool truly understands context, nuance, and detail. And above all, it takes data security seriously, encrypting your documents in secure cloud storage.

Its free version is robust and offers a great starting point with a 60-page limit, while the Pro Plan unlocks even more powerful features, like querying across multiple documents simultaneously.

Having used all of the GPT-PDF analyzers available, Humata stands out to me as the best one.

Thank you for supporting our sponsors

Today’s trending AI news stories

AI Product and Feature Enhancements

Google looking to incorporate an AI writing tool into ChromeOS: Google is reportedly set to enhance its Chromebooks’ capabilities by introducing an AI-integrated writing and editing tool, tentatively named “Orca.” Accessible via the right-click menu in ChromeOS, the tool is poised to change the way users work with text. The project, also referred to as “Mako” and “Manta,” aims to provide rewriting options, preset text prompts, and seamless insertion of AI-generated content. The move underscores Google’s ongoing commitment to infusing AI and machine learning into its products. Users will need to give explicit consent before their content is transmitted to Google’s servers.

Amazon is rolling out a generative AI feature that summarizes product reviews: Amazon is introducing a generative AI feature that summarizes customer reviews, helping shoppers quickly grasp others’ opinions. The AI distills common themes from reviews into concise paragraphs on product pages. Amazon CEO Andy Jassy has previously emphasized the significance of generative AI across the company’s various units. The feature aims to enhance the shopping experience and is available to a subset of U.S. mobile shoppers, with potential expansion based on customer feedback. Amazon also offers product insights features that surface recurring themes from reviews.

Google’s AI search experience adds AI-powered summaries, definitions and coding improvements: Google bolsters its AI-powered Search Generative Experience with upgrades to refine search summaries, definitions, and coding assistance. Launched three months ago, this feature enhances conversational search with AI interactions. The updates aim to offer more precise summaries, improved definitions, and better coding suggestions, enhancing user search experiences.

AI Startups and Market Movement

Modular, AI Startup Challenging Nvidia, Discusses Funding at $600 Million Valuation: Modular, a startup aiming to break Nvidia’s grip on AI chips, is reportedly in discussions for a Series A funding round that could value it at $600 million. Its software targets the surge in AI demand that has strained GPU supplies, offering AI developers an alternative to dependence on Nvidia’s chips. The co-founding team, which includes Chris Lattner of Swift programming-language fame, adds to the company’s appeal.

Humane will share more about its mysterious ‘AI Pin’ the same day as October’s eclipse: Humane, a startup founded by ex-Apple employees, plans to unveil details about its AI-powered wearable, the “Humane Ai Pin,” on October 14th, coinciding with a solar eclipse. The device, positioned as a smartphone replacement, offers unique features demonstrated by co-founder Imran Chaudhri, such as answering calls and translating spoken sentences in real time.

Groundbreaking AI-powered platform visualizes wireless assets: Talon Aerolytics, a pioneering player in SaaS and AI technology, has unveiled an AI-powered computer vision platform that promises to reshape the landscape of wireless telecommunications. This innovative solution empowers wireless operators to gain insight into network assets through comprehensive AI and machine learning applications. By fusing data analytics, predictive modeling, and autonomous decision-making, Talon aims to redefine how carriers and tower owners validate asset inventory, potentially ushering in significant savings.

Kneron to release AI chip this year: Kneron, an edge AI company, is set to challenge Nvidia’s dominance with its upcoming AI chip, the KL730. Designed for machine learning and AI applications, the chip promises a significant boost in energy efficiency and processing power. Kneron aims to make LLMs more affordable and efficient, offering a potential alternative to Nvidia’s GPUs.

AI Adoption and Influence

Generative AI Adoption Rate Eclipsing Smartphones & Tablets: Generative AI adoption is outpacing the early growth rates of smartphones and tablets, particularly among younger demographics. The surge is attributed to the accessibility of generative AI, which doesn’t require new hardware purchases. Even as growth rates level off, continued usage is anticipated, particularly among millennials and Gen Z.

AI is going to eliminate way more jobs than anyone realizes: The unexpected accessibility of intuitive AI tools has accelerated workforce disruption, catching even experts off guard. A recent McKinsey report suggests that at least 12 million Americans could find themselves in a different field of work by 2030 due to AI disruption. As a “gale of creative destruction” blows through industries, swift adaptation by governments, companies, and individuals becomes paramount.

Meta’s AI Agents Learn to Move by Copying Toddlers: Meta AI, in collaboration with researchers from McGill University, Northeastern University, and the University of Twente, has unveiled MyoSuite 2.0, featuring AI-driven biomechanical models that replicate human-like motor control. In simulated environments, disembodied skeletal arms and legs use a multitude of muscles and joints to manipulate objects, mimicking toddler-like exploration. The project aims to improve robot and avatar movements by applying machine learning to intricate control problems.

Government and Policy Interactions with AI

China Tries to Regulate AI With State Control, Support for Tech Companies: Beijing has introduced comprehensive AI rules, taking effect on August 15, that aim to strike a delicate balance between state management of the technology and nurturing globally competitive AI enterprises. The 24 guidelines stipulate that service providers must register their offerings and undergo security assessments before launch. Oversight will involve seven entities, including the Cyberspace Administration of China and the National Development and Reform Commission.

Saudi Arabia and UAE race to buy Nvidia chips to power AI ambitions: The Gulf nations Saudi Arabia and the UAE are acquiring Nvidia’s high-performance chips to bolster their AI ambitions and establish themselves as global leaders. Amid a worldwide shortage of the semiconductors used for large language models, the Gulf states have obtained thousands of Nvidia chips, raising concerns about potential misuse of the technology by autocratic leaders. Saudi Arabia purchased over 3,000 Nvidia H100 chips for generative AI via King Abdullah University, while the UAE has developed its own open-source large language model, Falcon.

US DoD AI chief on LLMs: 'I need hackers to tell us how this stuff breaks': US DoD AI chief, Craig Martell, emphasizes that LLMs lack sentience and reasoning abilities. He calls for rigorous model development to mitigate risks of hallucination in AI chatbots, citing concerns about AI-generated false information. Martell urges hackers to identify vulnerabilities in LLMs and emphasizes the need for clear acceptability conditions. He stresses the importance of accuracy and reliability, aiming for “five nines” (99.999%) accuracy in LLMs to prevent harmful consequences, especially in critical scenarios.

OpenAI proposes a new way to use GPT-4 for content moderation: OpenAI introduces a method to employ GPT-4 for content moderation, aiming to alleviate human workloads. By prompting GPT-4 with a policy and evaluating its content judgments against labeled examples, OpenAI refines moderation policies. While this approach claims faster policy deployment, concerns arise about AI biases and accuracy. OpenAI acknowledges the need for human monitoring and validation due to potential biases introduced during training.
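To make the idea concrete, here is a minimal sketch of the prompt-a-policy pattern using the OpenAI Python client of the time; the policy text, label set, and examples are illustrative assumptions, not OpenAI’s internal pipeline.

import openai  # assumes the pre-1.0 openai package and an API key set in the environment

# A toy policy; real moderation policies are far longer and more precise (assumption).
POLICY = ("You are a content moderator. Label the content ALLOWED or VIOLATION. "
          "Content that promotes violence is a VIOLATION.")

def moderate(content: str) -> str:
    # Ask GPT-4 to apply the policy to a single piece of content.
    response = openai.ChatCompletion.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": f"Content: {content}\nLabel:"},
        ],
    )
    return response.choices[0].message.content.strip()

# Compare the model's judgments with human labels; disagreements flag ambiguous
# policy wording that can then be revised and re-tested.
labeled_examples = [("How do I sharpen a kitchen knife?", "ALLOWED")]
for text, human_label in labeled_examples:
    print(text, "->", moderate(text), "| human:", human_label)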

Novel AI Applications

AI Recreated a Pink Floyd Song with Brain Scans—and It Sounds Creepy: Researchers at the University of California, Berkeley, have developed an AI model that can decode brain activity to recreate music, such as Pink Floyd’s “Another Brick in the Wall, Part 1.” By recording from electrodes attached to patients’ brains and focusing on the superior temporal gyrus (STG), a region responsible for auditory processing and rhythm perception, the team reconstructed a recognizable version of the song, demonstrating the potential of brain-computer interface technologies.

Despite fails, ChatGPT wins showdown against Stack Overflow: ChatGPT outperforms Stack Overflow despite inaccuracies, study shows. Purdue University’s research reveals users prefer ChatGPT responses, citing its comprehensive and articulate language style. Concerns arise over erroneous data contamination, leading Stack Overflow to bar ChatGPT-obtained responses.

🎧 Did you know AI Breakfast has a podcast read by a human? Join AI Breakfast team member Luke (an actual AI researcher!) as he breaks down the week’s AI news, tools, and research: Listen here

5 new AI-powered tools from around the web

bodt.io lets you construct AI chatbots without coding. Create personalized bots in 15 minutes, fueled by your website content, on an intuitive platform. Enhance customer interaction, generate leads, and streamline your platforms.

PromeAI, an AI-infused design assistant, actualizes creative visions. Its toolkit spans Sketch Rendering, Photo to Sketch, Image Enhancement, AI Supermodel, and more.

Neum AI ensures up-to-date vector data for LLM applications. This ETL platform synchronizes real-time data into vector stores, eliminating stale information and streamlining LLM usage at scale. With UI, API, and pipeline management, Neum AI automates data extraction, keeping LLM prompts contextually accurate.

Punky is an AI-driven Discord bot for autonomous community management and growth. Features include rapid setup, AI-fueled expansion, moderation, support, and feedback collection. Roadmap includes dashboard enhancements, AI integration, and server economy tools.

Starbuzz AI elevates influence effortlessly through AI-powered creativity. Aimed at influencer marketing, it unveils insights, optimizes strategies, and magnifies impact.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

MuAViC, a pioneering multilingual audio-visual corpus from Meta AI, advances robust speech recognition and speech-to-text translation. Encompassing 9 languages, it comprises 1200 hours of audio-visual speech from 8000 speakers, setting the standard for multilingual audio-visual benchmarks. Distinctively, MuAViC pioneers cross-lingual audio-visual translation, a pivotal advancement. It not only provides a vital resource for noise-robust model development, as the baseline results show, but also fuels innovation in audio-visual language technologies.

VisIT-Bench is a dynamic benchmark designed to evaluate instruction-following vision-language models in real-world scenarios. With 592 queries spanning diverse tasks, the benchmark emphasizes practical instruction-conditioned captions that provide task-specific contexts for responses. These captions, combined with GPT-4 reference outputs, create a robust evaluation framework. The benchmark employs an Elo-based ranking system and win-rate evaluation to gauge model performance accurately. By encouraging community engagement and facilitating iterative advancements, VisIT-Bench serves as a valuable resource for enhancing instruction-following vision-language models.
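For a feel of the Elo mechanics (a generic sketch, not VisIT-Bench’s exact implementation; the K-factor and starting ratings are assumptions), each head-to-head comparison between two models updates their ratings like this:

def elo_update(r_a, r_b, score_a, k=32):
    # score_a is 1.0 if model A's response wins the pairwise comparison,
    # 0.0 if it loses, and 0.5 for a tie.
    expected_a = 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
    expected_b = 1.0 - expected_a
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - expected_b)
    return r_a_new, r_b_new

# Example: both models start at 1500 and model A wins one comparison;
# A's rating rises by 16 points and B's falls by the same amount.
print(elo_update(1500, 1500, 1.0))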

The paper presents Platypus, a family of fine-tuned and merged Large Language Models (LLMs) achieving top performance on the Open LLM Leaderboard. Key contributions include the Open-Platypus dataset, LoRA fine-tuning, and checks for test-data leakage and training-data contamination. Platypus models excel on LLM metrics, and a 13B variant trains in 5 hours on a single A100 GPU. Merging broad and niche models enhances performance across tasks, with Camel-Platypus2-13B showing consistent gains. Results vary across domains, with both improvements and regressions observed, so domain-specific evaluation and careful model selection are needed before merging.
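For context on the LoRA technique the paper builds on, here is a minimal sketch using Hugging Face’s peft library; the base checkpoint, rank, and target modules are illustrative assumptions, not the authors’ exact configuration.

# Minimal LoRA setup with peft (illustrative assumptions throughout).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-hf")  # assumed base model
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # which projections to adapt is a choice, not the paper's exact list
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small low-rank adapter matrices are trained

Because only the adapters are trained, a 13B model fits the reported single-GPU, few-hour budget far more easily than full fine-tuning would.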

Blind face restoration is a challenging task involving the recovery of high-quality facial images from unknown degradations. While existing algorithms have shown progress by incorporating priors, they often overlook vital contextual information and struggle with real-world scenarios. RestoreFormer++ is introduced, featuring innovative multi-head cross-attention mechanisms to model degraded facial features alongside high-quality priors. This approach enhances the realism and fidelity of restored images. Furthermore, an extending degrading model addresses the synthetic-to-real-world gap, ensuring better generalization. Comparative experiments underscore RestoreFormer++’s superiority on both synthetic and real-world datasets.
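To illustrate the cross-attention idea in isolation (a toy sketch; the embedding size, token counts, and prior dictionary below are made-up placeholders, not RestoreFormer++’s architecture):

import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
degraded_feats = torch.randn(1, 196, 256)  # e.g. 14x14 spatial tokens from the degraded face
prior_feats = torch.randn(1, 512, 256)     # a learned dictionary of high-quality facial priors
# Queries come from the degraded image, keys/values from the priors,
# so each degraded token pulls in matching high-quality detail.
fused, _ = attn(query=degraded_feats, key=prior_feats, value=prior_feats)
print(fused.shape)  # torch.Size([1, 196, 256])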

SpeechX is a versatile speech generation model that tackles diverse audio-text speech tasks, including zero-shot text-to-speech (TTS), noise suppression, speech removal, target speaker extraction, and speech editing. It combines neural codec language modeling with multi-task learning and task-dependent prompting to handle clean and noisy signals. The model’s flexibility, robustness, and extensibility enable effective speech generation and transformation. Experimental results demonstrate SpeechX’s effectiveness across tasks, achieving comparable or superior performance to specialized models. The model’s unique capabilities include preserving background noise during editing and leveraging transcriptions for enhancement.

Thank you for reading today’s edition.

Your feedback is valuable.


Respond to this email and tell us how you think we could add more value to this newsletter.