New "Groq" AI 10x Faster than GPT-4?

AI Breakfast
February 21, 2024

Good morning. It’s Wednesday, February 21st.

Did you know: On February 20, 1962, John Glenn became the first American to orbit the Earth.

In today’s email:

Advancements in AI Models and Technology
Research and Scientific Breakthroughs
Policy, Governance, and Societal Impact
5 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics

You read. We listen. Let us know what you think by replying to this email.

Today’s trending AI news stories

Advancements in AI Models and Technology

> Fastest LLM yet? Groq has surged into the spotlight across social media, challenging the supremacy of ChatGPT and inviting comparisons to Elon Musk's similarly named Grok. Running on the company's proprietary ASIC chip tailored for large language models (LLMs), Groq generates approximately 500 tokens per second, far outpacing the roughly 40 tokens per second of ChatGPT running GPT-3.5. Groq Inc. calls the chip the industry's first language processing unit (LPU), a design that avoids reliance on expensive GPUs. Despite its recent surge in popularity, Groq Inc. has been in operation since 2016, striving to redefine AI processing and provide viable alternatives to conventional GPU-based systems.
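To put the two throughput figures quoted above in perspective, here is a small back-of-the-envelope sketch. The 1,000-token answer length is a hypothetical value chosen for illustration, and the rates are the approximate numbers from the story, not measurements.

```python
def generation_time(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to stream num_tokens at a steady decoding rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


GROQ_TPS = 500.0      # approximate rate quoted in the story
CHATGPT_TPS = 40.0    # approximate rate quoted in the story
ANSWER_TOKENS = 1000  # hypothetical answer length, for illustration only

speedup = GROQ_TPS / CHATGPT_TPS
groq_seconds = generation_time(ANSWER_TOKENS, GROQ_TPS)
chatgpt_seconds = generation_time(ANSWER_TOKENS, CHATGPT_TPS)

print(f"Speedup: {speedup:.1f}x")
print(f"{ANSWER_TOKENS} tokens: {groq_seconds:.1f}s vs {chatgpt_seconds:.1f}s")
```

At these rates the ratio works out to 12.5x, which is roughly where the "10x faster" headline comes from.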

> ElevenLabs is expanding its technology to generate sound effects. The company, recently valued at over $1 billion, demonstrated the feature by adding audio like "waves crashing," generated from text descriptions, to clips from OpenAI's text-to-video tool. This could greatly enhance the realism of AI-generated video. ElevenLabs was founded by Piotr Dabkowski and Mati Staniszewski, who were inspired by issues with dubbing Hollywood films into other languages. The company's technology could have broad applications in content creation.

> Adobe has integrated a new AI Assistant into its Acrobat and Reader PDF software. The AI Assistant, currently in beta, uses a conversational AI engine to answer questions about PDF documents and generate summaries. Users interact with the assistant through a dedicated interface within the software. It can cite specific information within a document and provide concise overviews. Adobe states that user data privacy is protected. The AI Assistant is currently included in Acrobat Standard and Pro subscriptions, with plans for a future paid add-on for full functionality.

> Meta has released the MMCSG (Multi-Modal Conversations in Smart Glasses) dataset for research into automatic speech recognition and activity detection. Collected with Aria glasses, the dataset includes multi-channel audio (seven microphones), video, and inertial data (accelerometer, gyroscope). Participant data is anonymized for privacy. The MMCSG dataset, released under Meta's Data License, could enable real-time language translation and highlights the company's broader AI research investments, including "Imagined with AI" labeling and the Artemis AI chip.

> An App-less Smartphone? Deutsche Telekom will unveil a concept AI phone at Mobile World Congress that aims to replace traditional apps with a single AI assistant. Powered by Brain.ai, the phone uses generative AI to predict and carry out user tasks based on voice or text commands. This eliminates the need to switch between multiple apps. While the concept will be demonstrated at the conference, a commercially available product may take time to develop. This approach is similar to the concept behind the Rabbit R1, another AI-powered device.

> Elon Musk's tweets suggest a potential partnership between X (formerly Twitter) and AI image generator Midjourney. This move signals a shift towards AI-powered content creation on social media platforms and could accelerate Musk's vision for X as an "Everything App." Industry experts see it as a possible strategic response to OpenAI's AI video tool, Sora, fueling speculation about a rivalry.

Research and Scientific Breakthroughs

> Can AI Determine a Person's Sex From Brain Scans? A new study from Stanford Medicine demonstrates that an artificial intelligence model can determine an individual's sex from brain scans with over 90% accuracy. The research analyzes dynamic MRI scans, highlighting specific brain networks (including the default mode, striatum, and limbic networks) as key to distinguishing between male and female brains. This breakthrough could have significant implications for understanding sex-based differences in neurological and psychiatric disorders, potentially leading to more targeted treatment approaches.

> The All of Us initiative has reached a notable milestone in advancing inclusive healthcare. Examining over 245,000 genomes, it has pinpointed 275 million previously unreported genetic variants, shedding light on potential contributors to type 2 diabetes. Published across premier scientific journals, these findings underscore the role of AI in precision medicine. By prioritizing diversity, the program confronts historical disparities in genetic research, ushering in a new era of more comprehensive healthcare solutions.

Policy, Governance, and Societal Impact

> Search Engine Traffic to Plummet? According to Gartner, Inc., a seismic shift in the world of search engines is on the horizon. By 2026, traditional search engine volume is expected to drop by 25%, with AI chatbots and virtual agents taking the reins. This forecast will be the focal point of discussions at the upcoming Gartner Tech Growth & Innovation Conference in Grapevine, Texas, on March 20-21. Alan Antin, Vice President Analyst at Gartner, underscores the growing influence of Generative AI (GenAI) solutions, which are poised to supplant traditional search engines for user queries. As this transition unfolds, the emphasis will pivot towards content quality and authenticity, prompting a strategic rethink in marketing channels.

> The House of Representatives has formed a bipartisan task force dedicated to artificial intelligence. House Speaker Mike Johnson and Minority Leader Hakeem Jeffries announced the 24-member group, led by Representatives Jay Obernolte and Ted Lieu of California. The task force aims to develop AI policy recommendations for Congress, addressing issues like deepfakes and misinformation. Members with diverse backgrounds will collaborate on principles and proposals. Johnson highlighted the need to understand AI's impact, while Jeffries emphasized ensuring the technology benefits everyone and is used responsibly.

In partnership with SCRIBE

Automatically create step-by-step guides with Scribe

Scribe just bagged $25M in Series B funding to give you a well-deserved break from answering people’s questions all day. Instead, just create step-by-step guides (automatically, thanks to AI) to share with your team.

• Capture any process using the Chrome extension

• Easily customize steps and redact sensitive info

• Share with colleagues – and get back to your work

Thank you for supporting our sponsors!

5 new AI-powered tools from around the web

Mindware facilitates AI agents’ internet access via an API gateway. Simplifies API integration, real-time data retrieval, and workflow automation.

Repeto.ai accelerates learning through AI assistance. Upload documents, engage with a knowledge-driven chatbot, utilize smart note-taking features, generate adaptive quizzes, and visualize complex topics.

Otio is an AI-driven research and writing platform for scholars. It captures academic content, organizes it with AI, and provides summaries and AI-assisted writing grounded in sources.

Onvo AI simplifies custom dashboard creation with its AI-powered dashboard and report builder SDK. It integrates multiple data sources, enhancing data analytics capabilities.

Varolio unifies messages, leads, and tasks into an AI-powered inbox for accelerated deal closure. Automates responses, converts emails to CRM entries, and prioritizes leads securely.

Latest AI Research Papers

arXiv is a free online library where researchers share pre-publication papers.

📄 AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling

AnyGPT introduces an any-to-any multimodal language model, integrating speech, text, images, and music with discrete representations for unified processing. It demonstrates promising results across various cross-modal tasks, proving the efficacy of discrete representations in unifying modalities within a large language model. Leveraging data-level preprocessing, AnyGPT facilitates integration of new modalities without altering existing architecture or training paradigms. It synthesizes AnyInstruct-108k, a comprehensive multimodal instruction dataset, enabling handling of arbitrary combinations of multimodal inputs and outputs. Despite stable training, challenges like higher loss and limited music modeling persist. Enhancing tokenizers and context length could improve comprehension and generative potential, fostering advancements in multimodal language models.

📄 Speculative Streaming: Fast LLM Inference without Auxiliary Models

The paper introduces Speculative Streaming, a method developed by Apple to accelerate the decoding of large language models without the need for auxiliary draft models. This technique leverages innovations such as multi-stream attention and parallel speculation and verification to achieve significant speedups across various tasks while maintaining generation quality. Notably, Speculative Streaming requires far fewer extra parameters than previous methods like Medusa, making it suitable for resource-constrained devices. By eliminating the need to manage multiple models during execution, Speculative Streaming offers a streamlined and efficient solution for decoding large language models, particularly in applications like summarization, structured queries, and meaning representation.
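For readers new to the idea, here is a toy sketch of the generic speculate-then-verify loop that this family of methods builds on. It is not Apple's implementation: Speculative Streaming produces its guesses inside the target model itself, whereas this sketch uses a separate `draft` callable purely for illustration. The `target` and `draft` functions are made-up stand-ins for real models, and verification is simulated token by token rather than in one parallel forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy decoding with k-token speculation.

    Returns the full token sequence and the number of verification rounds.
    The output always matches plain greedy decoding with `target`.
    """
    out = list(prompt)
    goal = len(prompt) + max_new_tokens
    rounds = 0
    while len(out) < goal:
        rounds += 1
        # 1. Speculate: cheaply guess the next k tokens.
        guesses: List[Token] = []
        ctx = list(out)
        for _ in range(k):
            guess = draft(ctx)
            guesses.append(guess)
            ctx.append(guess)
        # 2. Verify: keep guesses until the first disagreement with the target,
        #    then keep the target's own token at that position instead.
        ctx = list(out)
        for guess in guesses:
            correct = target(ctx)
            ctx.append(correct)
            if guess != correct:
                break
        out = ctx[:goal]
    return out, rounds


def target(ctx: List[Token]) -> Token:
    # Toy "large model": counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def draft(ctx: List[Token]) -> Token:
    # Toy speculator: agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, rounds = speculative_decode(target, draft, prompt=[0], max_new_tokens=12)
print(tokens, rounds)
```

Every round yields at least one verified token, so the loop never does worse than one round per token, and it needs fewer rounds whenever the guesses are accepted.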

📄 GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements

The paper, from researchers at Meta, investigates methods to bolster the reasoning capabilities of large language models (LLMs) through global and local refinements, focusing on when, where, and how to refine. It introduces Stepwise Outcome-based Reward Models (SORMs) to enhance the evaluation of intermediate reasoning steps, thereby improving refinement accuracy without human annotation. SORMs approximate the expected future reward of the optimal policy by predicting the correctness of intermediate steps, providing guidance for refinement decisions. The study contrasts Outcome-based Reward Models (ORMs) and Process-based Reward Models (PRMs), highlighting ORM limitations in assessing intermediate steps. It proposes combining global and local refinements using ORM reranking, resulting in a significant enhancement of LLM accuracy on math tasks.
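The reranking step mentioned above can be pictured as a simple "score every candidate, keep the best" selection. The sketch below is an illustration only: the `Candidate` type, the `toy_orm` scorer, and its scores are hypothetical stand-ins, not the paper's models or results.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    source: str  # "draft", "global_refinement", or "local_refinement"
    answer: str


def rerank(
    candidates: List[Candidate],
    orm_score: Callable[[Candidate], float],
) -> Candidate:
    """Return the candidate that the outcome reward model scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=orm_score)


# Hypothetical ORM outputs: estimated probability that each answer is correct.
TOY_SCORES = {
    "draft": 0.35,
    "global_refinement": 0.55,
    "local_refinement": 0.80,
}


def toy_orm(candidate: Candidate) -> float:
    return TOY_SCORES[candidate.source]


candidates = [
    Candidate("draft", "41"),
    Candidate("global_refinement", "43"),
    Candidate("local_refinement", "42"),
]
best = rerank(candidates, toy_orm)
print(best.source, best.answer)
```

Keeping the original draft among the candidates means a refinement is only adopted when the scorer prefers it.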

📄 LONGAGENT: Scaling Language Models to 128k Context through Multi-Agent Collaboration

The paper presents LONGAGENT, a method leveraging multi-agent collaboration to scale up large language models (LLMs) like LLaMA to process long texts of up to 128K tokens. It addresses the challenge of performance degradation in LLMs when processing inputs exceeding 100K tokens, known as "lost in the middle." LONGAGENT consists of a leader and multiple members, where the leader guides members in acquiring information from text chunks, resolves conflicts arising from model hallucinations, and deduces the final response. The approach outperforms GPT-4 in tasks such as 128K-long text retrieval and multi-hop question answering, showing promise in long-text processing. Additionally, a new evaluation benchmark, Needle in a Haystack PLUS, is introduced to comprehensively assess LLMs' long-text capabilities.
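The leader-and-members pattern described above can be sketched in a few lines. This is a toy under stated assumptions, not the paper's system: each "member" here is a regular-expression lookup rather than an LLM, the leader settles disagreements with a plain majority vote instead of the communication described in the paper, and all function names and the sample document are hypothetical.

```python
import re
from collections import Counter
from typing import List, Optional


def split_into_chunks(text: str, words_per_chunk: int) -> List[str]:
    """Split a long text into consecutive chunks of at most words_per_chunk words."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def member_answer(chunk: str) -> Optional[str]:
    """Toy member: reports the passcode if its chunk mentions one."""
    match = re.search(r"passcode is (\d+)", chunk)
    return match.group(1) if match else None


def leader_answer(text: str, words_per_chunk: int = 50) -> Optional[str]:
    """Toy leader: queries one member per chunk and takes a majority vote."""
    answers = [
        answer
        for answer in map(member_answer, split_into_chunks(text, words_per_chunk))
        if answer is not None
    ]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


filler = "lorem ipsum " * 200
document = filler + "The passcode is 7421. " + filler
answer = leader_answer(document)
print(answer)
```

One limitation of this simplification is that a fact split across a chunk boundary would be missed; overlapping chunks are a common way to reduce that risk.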

📄 A Touch, Vision, and Language Dataset for Multimodal Alignment

The paper introduces a dataset with 44K vision-touch pairs, blending English labels from humans and GPT-4V. It trains a tactile encoder for classification and a touch-vision-language model for generation, outperforming existing models by enhancing alignment. The TVL dataset facilitates multimodal understanding, addressing touch-language integration challenges. Unlike prior works limited to closed vocabularies, this dataset incorporates open-vocabulary language labels. The study also proposes a methodology for training from pseudo-labels, leveraging a vision-only model to generate textual labels for tactile data. Despite limitations, the research pioneers touch-vision-language alignment, offering insights for future multimodal research and applications in robotics and generative models.

ChatGPT Creates Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.