GPT-4V Access Rolled Out to Thousands

AI Breakfast
October 13, 2023

Good morning. It’s Friday, October 13th.

Did you know: GPT-4V was rolled out to the majority of ChatGPT Plus subscribers yesterday? Check to see if you have access.

In today’s email:

Technological Advancements and Applications in AI
AI Tools and Platforms in Software Development
AI in Education and Tutoring
AI and Health
AI in Customer Service and User Interaction
AI and Content Generation
AI and Media Entertainment
AI, Ethics, and Accuracy
Regional Focus on AI Development
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s trending AI news stories

Technological Advancements and Applications in AI

AI just got 100-fold more energy efficient: Northwestern University researchers have unveiled a nanoelectronic device that performs real-time classification tasks without relying on energy-consuming cloud infrastructure. Consuming 100 times less energy than current technologies, the device enables rapid data processing within wearables, offering immediate medical diagnostics. With enhanced security and decreased reliance on conventional computing hardware, the device marks a significant step toward sustainable AI integration.

AI reads text from an ancient Herculaneum scroll for the first time: Luke Farritor, a 21-year-old computer science student, has read Greek text in a carbonized scroll from the ancient Roman city of Herculaneum, which was buried by the eruption of Mount Vesuvius in AD 79. The Vesuvius Challenge, a global contest, awarded Farritor a prize for reading over 10 characters on a small section of papyrus. The application of AI to decipher ancient texts is part of a broader trend, with AI tools aiding the study of languages from Korean to Akkadian, which was used in ancient Mesopotamia.

Google’s AI-powered search experience can now generate images, and write drafts: Google’s AI-powered search feature, the Search Generative Experience (SGE), has been enhanced to incorporate image generation and draft writing capabilities, enabling a more dynamic and interactive search process. With the conversational search mode, users can now engage with the search engine in a more intuitive manner, allowing for the creation of visual content and written drafts directly within the search experience.

Adobe introduces the next generation of Firefly image models at the MAX creative conference: These models are set to advance image editing, design, and vector graphics. Firefly Image 2 stands out for generating photorealistic images, handling text input better, and offering a wider range of creative control. The Vector Model creates vector graphics from text, and Firefly Design supports template generation. The introduction of these models coincides with a new billing model for Creative Cloud plans.

AI Tools and Platforms in Software Development

OpenAI plans major updates to lure developers with lower costs: OpenAI aims to make AI development cheaper and faster for developers and businesses. It will add memory storage to its developer tools, potentially reducing costs by up to 20 times, along with vision capabilities for image analysis. The updates align with OpenAI’s ambition to become a leading developer platform, expanding beyond consumer applications. The new features are expected to be unveiled at OpenAI’s first developer conference in November, encouraging companies to use AI to build chatbots and autonomous agents. Developers can also expect a stateful API that reduces the cost of resending conversation history.
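To see why keeping conversation state on the server could cut costs, consider a rough back-of-the-envelope sketch. It assumes a stateless chat API where the client resends the whole history on every call, and a hypothetical stateful API where only the new message is sent. The function names and the fixed `tokens_per_turn` are illustrative assumptions, not OpenAI's pricing or API.

```python
def stateless_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens sent when every call resends the full history so far.

    Assumption: each turn adds the same number of tokens to the history.
    """
    # Call k carries k turns of history, so the total grows quadratically.
    return sum(tokens_per_turn * k for k in range(1, turns + 1))


def stateful_input_tokens(turns: int, tokens_per_turn: int) -> int:
    """Input tokens sent when the server remembers earlier turns (hypothetical)."""
    # Only the newest turn is transmitted on each call.
    return tokens_per_turn * turns


if __name__ == "__main__":
    for n in (5, 10, 40):
        resend = stateless_input_tokens(n, 100)
        remembered = stateful_input_tokens(n, 100)
        print(n, resend, remembered, resend / remembered)
```

Under these assumptions the gap widens with conversation length, which is the general effect a stateful API targets. Actual savings would depend on how the provider bills for stored context.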

Docker Debuts Generative AI Stack and Docker AI, aimed at helping developers create generative AI applications with ease. This initiative combines Docker with the Neo4j graph database, LangChain model chaining technology, and Ollama for running large language models (LLMs). The GenAI stack can be used locally on developers’ systems and was unveiled at the DockerCon 23 conference in Los Angeles, furthering Docker’s commitment to facilitating AI development.

Replit AI has announced the release of Replit Code V1.5 3B, an open-source code generation language model, on Hugging Face. This model is designed for code completion tasks and offers extensive training data from permissively licensed sources, state-of-the-art results, multi-language support, and the incorporation of the latest AI techniques. It can be used as a foundational model for fine-tuning in application-specific contexts, and it has demonstrated impressive performance compared to larger models. Developers can utilize this model for code generation tasks, further enhancing AI’s role in software development.

AI in Education and Tutoring

AI tutor launched in Australia to help students get through exams: Australia’s Zookal has unveiled “Exam Prep Zookal Genius,” an AI-powered tutor designed to support students during high school exams at a fraction of the cost of human tutors. Priced at just $9.99 a month, the platform offers access to AI astronaut tutor “Zookie,” which provides instant answers and results, renders visual content, generates exam questions, and supports a wide range of subjects. Zookal Genius is a departure from other AI tutors, having been developed over six years with integrated large language models, and the company plans to expand globally in 2024.

AI and Health

Scientists develop AI tool that predicts virus mutations: Harvard Medical School and University of Oxford researchers unveiled EVEscape, an AI tool that predicts new variants of viruses such as SARS-CoV-2. Published in Nature, the study showcases EVEscape’s capacity to identify concerning mutations, aiding in the development of effective vaccines and therapies. The tool’s potential extends to other viruses as well.

AI in Customer Service and User Interaction

Microsoft Gears up for a Revolutionary Natural Language Customer Support AI: The company has filed a patent outlining its plans for an AI system to understand and effectively address customer support requests using natural language processing (NLP). This advanced AI system aims to revolutionize customer support by employing NLP technology to decode customer issues, generate text-encoding representations of problems, and match them with suitable operating procedures.

Character.AI introduces group chats where people and multiple AIs can talk to each other: The AI chatbot startup, backed by a16z, has introduced a new feature that enables group chats in which both people and multiple AIs can take part. The feature extends the capabilities of Character.AI’s chatbot platform, allowing users to have more interactive and dynamic conversations with AI companions.

AI and Content Generation

Deloitte Digital’s Latest Research Forecasts Generative AI’s Transformation of Content Marketing. The study reveals that 26% of marketers are already using Generative AI, with an additional 45% planning to adopt it by the end of 2024. In a digital world, content marketing has become crucial, with demand growing significantly. Using Generative AI, marketers can meet the increasing demand for personalized content more efficiently, saving time and enhancing productivity. Deloitte Digital is at the forefront of this transformative shift, emphasizing the importance of high-quality, personalized content.

Box launches AI-focused Hubs for curated search: Deeply integrated with Box AI, Hubs allows users to organize documents, access critical information quickly, and simplify content creation. This approach addresses the challenge of navigating vast amounts of unstructured enterprise data, making it instantly discoverable and useful. Hubs can be used for various purposes, such as providing HR resources, sharing sales enablement materials, or making any enterprise knowledge accessible and valuable. Box AI’s capabilities will roll out to Enterprise Plus customers in November, with a set number of queries available and options for larger-scale use cases.

AI and Media Entertainment

‘South Park’ Tackling AI for Next Event Special, Releases Teaser: “South Park” is set to tackle the theme of artificial intelligence in its next special event, titled “South Park: Joining the Panderverse.” The special explores AI’s impact on the characters’ lives and will be available on Paramount+. This marks the show’s first special in over a year and continues its tradition of addressing current societal issues with humor and satire.

AI, Ethics, and Accuracy

Meta shows how to reduce hallucinations in ChatGPT & Co with prompt engineering: Meta AI’s new method, Chain-of-Verification (CoVe), combats misinformation in language models like ChatGPT. CoVe prompts the chatbot to draft an initial response, generate verification questions about that draft, answer those questions independently of the draft so its errors are not simply repeated, and then revise the response in light of the answers. The method significantly enhances accuracy for list-based and long-form questions, outperforming newer models and even those with access to external facts. Future enhancements may integrate external knowledge, enabling the model to consult external databases when answering verification questions.
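The CoVe loop can be sketched in a few lines. This is a minimal illustration, not Meta's implementation: `llm` is a hypothetical stand-in for any function that maps a prompt string to a completion string, and the prompt wording is made up for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def chain_of_verification(question: str, llm: LLM) -> Tuple[str, List[Tuple[str, str]]]:
    """Return (final_answer, [(verification_question, answer), ...])."""
    # 1. Draft an initial answer.
    draft = llm(f"Answer the question.\nQuestion: {question}")

    # 2. Plan verification questions, one per line, based on the draft.
    plan = llm(
        "List one fact-checking question per line for this answer.\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    checks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 3. Answer each check in a fresh prompt that does NOT include the draft,
    #    so a mistake in the draft cannot leak into the verification step.
    evidence = [(check, llm(check)) for check in checks]

    # 4. Revise the draft in light of the independent answers.
    notes = "\n".join(f"Q: {q}\nA: {a}" for q, a in evidence)
    final = llm(
        "Rewrite the draft so it agrees with the verified facts.\n"
        f"Question: {question}\nDraft: {draft}\nVerified facts:\n{notes}"
    )
    return final, evidence
```

The key design choice is step 3: the verification questions are answered without the draft in the prompt, which is what keeps the check independent of the original response.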

Regional Focus on AI Development

New York wants to be AI's world capital: New York seeks to dethrone the Bay Area as the global tech capital and boost the generative AI industry in key sectors like finance, communications, media, law, and medicine. A 370-event “Tech Week” is set to kick off on October 16, showcasing the city’s tech potential. With around 370,000 tech jobs, New York’s tech sector outranks its finance industry, reflecting the city’s growing importance in the tech landscape.

5 new AI-powered tools from around the web

Blaze, the marketing tool for AI entrepreneurs, was voted #1 on ProductHunt. Blaze is the AI tool that helps teams-of-one create better content in half the time — all in their brand voice. Blaze supports an end-to-end process for creating AI-assisted marketing content in a truly modern document editor.

LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that integrates vision and language understanding. It sets a new benchmark in accuracy for ScienceQA tasks, demonstrating capabilities similar to those of multimodal GPT-4.

Animaker iOS 2.0 introduces the world’s first Avatar Maker for iOS & iPadOS, offering over a billion unique avatar possibilities and a wide array of assets. The app empowers users to create professional animated videos effortlessly, directly from their mobile devices, with enhanced customization and animation features.

BlazeSQL acts as a personal AI Data Analyst for SQL databases, catering to both non-technical and technical users. It generates SQL code, executes queries, resolves errors, and visualizes data, empowering swift dashboard creation. Available for Mac and Windows, it ensures secure and private data handling, simplifying data insights for users.

LongLLaMA [GitHub] is a powerful language model adept at handling extensive text contexts of up to 256,000 tokens. Built on OpenLLaMA and fine-tuned with the Focused Transformer method, it offers a 3B base variant under the Apache 2.0 license. Its notable feature lies in effectively managing longer contexts than its training data, accompanied by Hugging Face integration tools.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

📄 MEMGPT: Towards LLMs as Operating Systems

MEMGPT introduces a pioneering approach for managing extended contexts in Large Language Models (LLMs). Inspired by traditional OS memory systems, it efficiently handles data beyond the fixed context limits. By effectively managing memory tiers, it achieves an illusion of infinite context within the limitations of LLMs. This novel solution significantly enhances performance in domains where fixed context length severely restricts existing LLMs, such as document analysis and multi-session chat. With a focus on dynamic control flow and self-directed memory management, MEMGPT sets a new precedent for enabling deep context understanding in LLMs. It represents a significant step forward in addressing the challenges of long-context tasks.
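The memory-tier idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the `TieredMemory` class is hypothetical, word counts stand in for tokens, and a keyword match stands in for retrieval. In MemGPT itself the model decides when to move data between tiers; this sketch uses a fixed eviction rule instead.

```python
from collections import deque
from typing import Deque, List


class TieredMemory:
    """Toy two-tier memory: a small 'main context' plus an unbounded archive."""

    def __init__(self, max_main_words: int) -> None:
        self.max_main_words = max_main_words  # stand-in for a token limit
        self.main: Deque[str] = deque()       # what the model would see
        self.archive: List[str] = []          # evicted, but still searchable

    def _main_size(self) -> int:
        return sum(len(message.split()) for message in self.main)

    def add(self, message: str) -> None:
        """Append a message, evicting the oldest ones when over budget."""
        self.main.append(message)
        # Always keep the newest message, even if it alone exceeds the budget.
        while self._main_size() > self.max_main_words and len(self.main) > 1:
            self.archive.append(self.main.popleft())

    def recall(self, keyword: str) -> List[str]:
        """Return archived messages that mention the keyword (case-insensitive)."""
        needle = keyword.lower()
        return [message for message in self.archive if needle in message.lower()]


if __name__ == "__main__":
    memory = TieredMemory(max_main_words=6)
    memory.add("my dog is named Rex")
    memory.add("I live in Oslo")
    memory.add("hello")
    print(list(memory.main))
    print(memory.recall("rex"))
```

Older messages leave the visible context but remain retrievable, which is how a bounded context can appear much larger than it is.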

📄 Jigsaw: Supporting Designers in Prototyping Multimodal Applications by Assembling AI Foundation Models

The paper introduces Jigsaw, a prototype system enabling designers to leverage AI foundation models seamlessly for creative tasks. Using puzzle pieces as metaphors, it facilitates the combination of various foundation model capabilities across different modalities. The system comprises the Catalog Panel for model selection, the Assembly Panel for model assembly, and the Input and Output Panels for data handling. With an Assembly Assistant recommending model chains, Jigsaw helps designers explore, prototype, and document design ideas effectively. Addressing challenges such as model awareness, user-friendliness, model integration, and slow prototyping, Jigsaw aims to streamline the creative process with AI.

📄 LEMUR: Harmonizing Natural Language and Code For Language Agents

The paper introduces Lemur and Lemur-Chat, novel language models combining natural language and coding proficiency to serve as adaptable language agents. Meticulous pre-training using a code-intensive corpus and instruction fine-tuning on text and code data results in state-of-the-art performance across diverse benchmarks. Lemur-Chat’s harmonization of natural and programming languages significantly narrows the gap with proprietary models in agent capabilities. Comprehensive evaluations demonstrate their superiority over existing open-source models. The research emphasizes optimizing the synergy between natural language and coding for advanced language agents.

📄 LANGNAV: Language as a Perceptual Representation For Navigation

The research introduces “LangNav,” leveraging language as a navigational perceptual representation. It employs off-the-shelf vision models for text generation from visual input, facilitating trajectory synthesis and sim-to-real transfer. The approach utilizes prompted GPT-4 to generate synthetic trajectories, enhancing a smaller language model’s performance, surpassing vision-based counterparts in data-scarce scenarios. LangNav demonstrates superior capability, harnessing language as a robust perceptual medium for navigation. The findings suggest the potential of language-driven navigation for real-world applications, highlighting the viability of language as a powerful tool for addressing navigational challenges with limited data availability.

📄 PROMETHEUS: Inducing Fine-Grained Evaluation Capability In Language Models

In this work, the researchers propose PROMETHEUS, an open-source Large Language Model (LLM) designed for fine-grained text evaluation, addressing limitations posed by closed-source LLMs. Trained on the FEEDBACK COLLECTION dataset, PROMETHEUS demonstrates a strong Pearson correlation with human evaluators and outperforms ChatGPT significantly. The team presents it as an accessible and reliable evaluation tool for diverse custom criteria, emphasizing its reproducibility, cost-effectiveness, and potential as a universal reward model. The code, dataset, and model are released for broader academic use.

Thank you for reading today’s edition.

Your feedback is valuable.

Respond to this email and tell us how you think we could add more value to this newsletter.

Did you know we wrote a book? Grab Decoding AI: A Non-technical Explanation of Artificial Intelligence today for just $2.99!