AI Breakfast
Forget smartwatches, Microsoft may make a backpack with an AI assistant

September 01, 2023

Good morning. It’s Friday, September 1st.

Did you know: Interested in the process of making YouTube shorts with AI? Here’s a great blog post on Medium detailing the step-by-step process.

In today’s email:

Wearable and Hardware Innovations
AI in Communication and Meetings
AI in Education
Computer Vision and Models
Research and Innovations
Business and Valuation
Legal and Ethics
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s trending AI news stories

Wearable and Hardware Innovations

Forget smartwatches, Microsoft may make a backpack with an AI assistant Microsoft has filed a patent for an AI-assisted wearable backpack that could use sensors to scan an area and AI to respond to users based on their surroundings. The device could be used for things like providing directions, identifying objects, or even helping skiers choose the best path down a mountain. However, many patents never see the light of day, so it is not yet clear whether Microsoft will actually create this product.

Nvidia and AMD say they face new restrictions on AI chip sales Nvidia and AMD are facing new restrictions on AI chip sales in some Middle Eastern countries, expanding beyond China and Russia. While both companies haven’t specified which countries are affected, these restrictions involve a subset of Nvidia’s top-end chips. The Biden administration clarified it hadn’t “blocked” chip sales to the Middle East, with details on specific licenses not publicly disclosed. Nvidia anticipates no immediate significant impact, while AMD has not commented. These restrictions are part of wider export controls and may affect Nvidia’s A100 and H100 chips, which have been subject to previous export restrictions.

Qualcomm banks on AI to get a bigger share of the automotive chip market Qualcomm aims to increase its presence in the automotive chip market by leveraging generative AI. The company demonstrated how its chips can power AI-driven car assistants that help drivers with daily tasks, highlighting its ambition to become an AI-focused company in the automotive sector. While automotive currently constitutes just 3% of Qualcomm’s revenue, the company anticipates over $9 billion in sales from this sector by 2031. Qualcomm faces competition in this market from chipmakers like Intel and Nvidia.

Intel Reveals Two New Xeon Processor Lines at Hot Chips 2023 Intel introduces two new Xeon processor lines, one built on E-cores for efficiency and one on P-cores for high performance, catering to diverse server-level needs. The Sierra Forest and Granite Rapids chips, based on the Intel 3 process node, offer improved power efficiency and core performance, respectively. With applications in AI and machine learning, these processors aim to enhance performance per watt and throughput for server-grade applications. The new chips give data centers more flexibility and are expected to roll out in the first half of 2024.

AI in Communication and Meetings

Duet AI for Google Meet can take notes, summarize, and even attend meetings Google Meet introduces AI-powered features to enhance meetings. Duet AI can take real-time notes, generate summaries, and allow private chats during meetings. Users can also let Duet “attend” meetings on their behalf, generating discussion points. While these features promise convenience, their effectiveness in accurately capturing meeting content remains uncertain. Google aims to further improve Meet’s features with dynamic layouts and teleprompter support. AI integration marks Google’s commitment to advancing video conferencing, addressing the evolving needs of remote work and hybrid environments.

AI in Education

OpenAI Introduces Special Tutor Prompts to Implement ChatGPT in Classrooms OpenAI aims to introduce ChatGPT into classrooms, offering educators guidelines to harness its potential responsibly. Despite concerns about plagiarism, OpenAI suggests applications like language support for English learners, formulating test questions, and promoting critical thinking skills. Teachers are encouraged to evaluate students’ work thoroughly and use ChatGPT as a supportive tool rather than a shortcut for answers. The company acknowledges the challenge of distinguishing AI-generated content and provides comprehensive prompts to guide its use as a tutor or assistant in educational settings. Responsible integration can ensure AI tools enhance, rather than hinder, the learning experience.

Animoca subsidiary builds AI and NFT tools for educators TinyTap, an ed-tech subsidiary of Animoca Brands, introduces new AI and NFT tools for educators. The AI integration allows educators to create educational games and images based on prompts, enhancing content creation for personalized learning experiences. The tools draw on over a decade within the TinyTap system, with plans to expand further. The company also aims to partner with Open Campus to allow NFT holders to mint TinyTap games into NFTs, injecting new liquidity into education. AI and NFTs are seen as transformative elements in the future of education.

Computer Vision and Models

DINOv2: Meta's foundational model for computer vision is now open source Meta has released DINOv2, a self-supervised computer vision model, as open source under the Apache 2.0 license. DINOv2 is designed for various computer vision tasks like semantic image segmentation and monocular depth estimation. Alongside DINOv2, Meta introduced FACET (FAirness in Computer Vision EvaluaTion), a benchmark for assessing fairness in computer vision models. This dataset includes 32,000 images of 50,000 people, focusing on demographic attributes and physical features. FACET aims to become a standard for evaluating fairness in computer vision models, promoting inclusivity in AI applications.

Research and Innovations

AI predicts chemicals’ smells from their structures Researchers have developed an AI system capable of describing the smells of compounds by analyzing their molecular structures. This system can assign descriptive words like “fruity” or “grassy” to various chemical structures, potentially assisting in the design of synthetic scents and offering insights into how the human brain interprets odors. The AI identified correlations between chemical structures and smells, creating a principal odor map. In tests, the AI predictions closely matched human descriptions, making it a valuable tool in industries like food and cleaning products. However, it does not explain the biological processes behind human smell perception.

AI-powered drone beats human champion pilots An AI-powered drone algorithm named Swift has beaten human world champion drone racers in head-to-head races. Developed by researchers at the University of Zurich, Swift won 15 out of 25 races against human champions and even achieved the fastest lap on a 3D race course. The algorithm uses deep reinforcement learning to navigate the course, processing video data and sensor readings to calculate optimal commands. While Swift’s success has implications for AI in real-world challenges such as search and rescue, its military applications remain uncertain.

Business and Valuation

Generative AI startup AI21 Labs lands $155M at a $1.4B valuation AI21 Labs, a Tel Aviv-based generative AI startup, has secured $155 million in a Series C funding round, valuing the company at $1.4 billion. The funding round was led by investors including Walden Catalyst, Pitango, SCB10X, b2venture, and Samsung Next. AI21 Labs specializes in developing text-generating AI tools, and this substantial investment will likely support the company’s efforts to advance its AI technology and expand its operations.

Legal and Ethics

OpenAI disputes authors’ claims that every ChatGPT response is a derivative work OpenAI responds to class-action lawsuits from authors who claim ChatGPT was trained on pirated copies of their books. OpenAI seeks dismissal of claims including vicarious copyright infringement, DMCA violation, and more, arguing the authors misunderstand copyright law and the transformative nature of AI models. OpenAI asserts its use of copyrighted materials for innovative language models falls within legal boundaries, citing precedents. The company aims to clarify that not every ChatGPT output is a derivative work and challenges the authors’ claims on legal and factual grounds.

Google's new AI-powered search results are ripping off news sites Google’s new AI-generated search result summaries are drawing criticism for surfacing unattributed content from news sites. The AI-generated digests are designed to provide users with quick overviews of search results without needing to leave the search page. Critics argue that these summaries amount to content theft and might encourage media organizations to place more of their work behind paywalls. Media companies are concerned about the impact on their credibility and potential revenue, prompting some to seek payment from AI companies to use their content for training language models.

🎧 Did you know AI Breakfast has a podcast read by a human? Join AI Breakfast team member Luke (an actual AI researcher!) as he breaks down the week’s AI news, tools, and research: Listen here

5 new AI-powered tools from around the web

Checklist Generator AI introduces AI Checklist Generator, transforming checklist creation with AI automation. Say goodbye to manual tasks, access templates for diverse industries, and streamline processes from tax planning to software development.

Vscoped is an AI-powered transcription service that quickly converts video and audio content to text. With customizable styles, it ensures accurate representation while reflecting individual voice and brand. The platform offers embedded subtitles, versatile editing, formatting tools, and multilingual capability, catering to diverse language needs.

EdTools revolutionizes education with an array of innovative web apps for teachers, students, and parents. It optimizes classroom management, task handling, and communication. Personalized student dashboards showcase assignments and assessments. AI-powered tools enhance efficiency. Parent communication is facilitated through a dedicated dashboard, streamlining school routines.

Tidio’s Lyro AI chatbot empowers small and medium businesses with personalized customer assistance, improving response times and relieving support teams. Built on AI models like Claude LLM, Lyro handles common queries, accelerating customer service while learning from interactions. Democratizing AI for SMBs, Lyro offers a game-changing approach to efficient customer support.

Sprig AI Analysis for Surveys revolutionizes survey data interpretation. Swift insights from AI-generated summaries, eliminating manual sorting. Custom queries and correlations boost comprehension. Tailored follow-up questions unveil hidden trends. Free plan includes surveys, replays, and AI analysis, deepening user insights for actionable improvements in product experiences.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

📄 TouchStone: Evaluating Vision-Language Models by Language Models

The paper proposes a new evaluation method for large vision-language models (LVLMs). The method, called TouchStone, is based on a comprehensive visual dialogue dataset that covers five major categories of abilities and 27 subtasks. The authors argue that existing evaluation methods for LVLMs are not comprehensive enough: they focus on recognition and comprehension abilities but neglect visual storytelling abilities. TouchStone addresses this limitation by evaluating LVLMs on a wider range of tasks, including basic descriptive ability, visual recognition ability, visual comprehension ability, and multi-image analysis ability. The authors also argue that existing evaluation methods are not objective enough because they rely on human evaluation, which can be subjective and time-consuming. TouchStone instead uses a language model as a judge, which the authors present as more objective and scalable.

📄 Learning Vision-based Pursuit-Evasion Robot Policies

The paper introduces a new approach for training vision-based pursuit-evasion robot policies under real-world constraints. It employs privileged learning, leveraging a fully-observable policy to supervise a partially-observable one. The study explores strategies for handling partial observability and uncertainty, with a focus on pursuit-evasion interactions. The synthesized policy showcases behaviors like information gathering, intent prediction, and anticipation, demonstrated on a physical quadruped robot equipped with an RGB-D camera. The work bridges the gap between strategic robot behavior and real-world, multi-agent interactions, paving the way for more advanced autonomous systems.

📄 RoboTAP: Tracking Arbitrary Points for Few-Shot Visual Imitation

RoboTAP introduces a novel approach to teaching robots new tasks rapidly. It leverages dense tracking (TAPIR) to extract meaningful motion patterns from demonstrations. The system autonomously identifies active points for each task, formulates motion plans, and executes them using a visual-servoing controller. RoboTAP demonstrates success in various complex manipulation tasks, even with only a few demonstrations. The approach avoids task-specific engineering and provides a versatile solution for quick task onboarding. While limited to visual control and single-plan execution, RoboTAP offers potential applications for efficient data-gathering and real-world problem-solving. Further exploration includes integrating with larger-scale models.

📄 LM-INFINITE: Simple on-the-Fly Length Generalization for Large Language Models

The paper addresses the length generalization problem in Transformer-based Large Language Models (LLMs). LLMs struggle with longer sequences due to issues with relative positional encodings. The authors propose LM-Infinite, a simple solution involving a Λ-shaped attention mask and bounded distances, without parameter updates. Theoretical and empirical analysis reveals the factors contributing to generalization failures, including unseen distances and implicitly encoded positional information. LM-Infinite is shown to maintain fluency and generation quality for longer sequences, outperforming fine-tuning. It provides a practical approach to leveraging LLMs on extended contexts efficiently, improving their performance on downstream tasks.
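
For readers curious what a Λ-shaped attention mask with bounded distances might look like, here is a minimal sketch in plain Python. It is our own illustration, not the authors' code: the function names and the parameters n_global, n_local, and d_max are assumptions made for this example. The idea shown is that each token attends only to the first few tokens of the sequence plus a recent local window, and that relative distances are capped at a maximum value.

```python
def lambda_mask(seq_len, n_global, n_local):
    """Build a causal, Lambda-shaped attention mask (illustrative only).

    mask[i][j] is True when query position i may attend to key position j.
    n_global and n_local are hypothetical parameter names for this sketch.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i              # never attend to future tokens
            is_global = j < n_global     # the first few tokens of the sequence
            is_local = i - j < n_local   # the most recent tokens
            row.append(causal and (is_global or is_local))
        mask.append(row)
    return mask


def bounded_distance(i, j, d_max):
    """Cap the relative distance between query i and key j at d_max."""
    return min(i - j, d_max)


if __name__ == "__main__":
    # Print the mask for a short sequence: "x" = attend, "." = masked.
    for row in lambda_mask(seq_len=8, n_global=2, n_local=3):
        print("".join("x" if allowed else "." for allowed in row))
```

Printed row by row, the allowed positions form two arms (a column at the start and a band along the diagonal), which is where the Λ name comes from. The window sizes here are arbitrary small numbers chosen for readability.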

📄 LLaSM: Large Language and Speech Model

The paper introduces the Large Language and Speech Model (LLaSM), a multi-modal model that combines speech and text processing for improved human-AI interaction. LLaSM is trained using an end-to-end approach, aligning speech and text embeddings through a modal adaptor. It offers cross-modal conversational capabilities, allowing it to understand and follow speech-language instructions. The model’s effectiveness is demonstrated through experiments, highlighting its natural and convenient interaction with users. The authors release a comprehensive speech-text cross-modal instruction-following dataset, LLaSM-Audio-Instructions, and provide code and a demo. LLaSM’s potential integration with visual modalities is suggested for future work.

Thank you for reading today’s edition.

Your feedback is valuable.

Respond to this email and tell us how you think we could add more value to this newsletter.