Test This New, Free AI Image Generator

AI Breakfast
January 24, 2024

Good morning. It’s Wednesday, January 24th.

In Partnership with MIT

Did you know: 40 years ago today, the original Macintosh was unveiled?

In today’s email:

Advances in AI Technology
AI in Business and Startups
AI in Public Sector and Defense
AI in Consumer Applications
5 New AI Tools
Latest AI Research Papers
ChatGPT Creates Comics

You read. We listen. Let us know what you think by replying to this email.

AI: Balancing Risk and Return

Innovative technologies are revolutionizing business as we know it, and they’re more accessible than ever. But to truly harness the transformative potential of AI, you need to know how and when to use it. And which pitfalls to avoid.

The six-week Artificial Intelligence: Implications for Business Strategy online short course from MIT Sloan School of Management and MIT Computer Science and Artificial Intelligence Laboratory explores AI’s business applications and challenges.

Choose this program to:

Optimize your business: Leverage AI, ML, and robotics to drive efficiencies, improve productivity, and support your growth.
Develop a strategic roadmap: Apply your knowledge to effectively integrate AI into your business.
Gain a dual perspective: Benefit from a course designed by two prestigious schools — the MIT Sloan School of Management and the MIT CSAIL.
Conveniently build career-critical skills: Follow a program that fits your schedule and benefit from 24/7 support and various payment options.

Today’s trending AI news stories

Advances in AI Technology

> Open-source PixArt-δ image generator spits out high-resolution AI images in mere seconds and is currently free to test. PixArt-δ integrates the Latent Consistency Model (LCM) and ControlNet, generating 1024 x 1024 pixel images in under 20 seconds. It offers enhanced precision in text-to-image synthesis and is efficient on GPUs with 8-GB VRAM. So far, PixArt-δ demonstrates superior image quality and control compared to Stability AI's SDXL Turbo, making it a significant competitor in the open-source image generator space.

> Google AI introduces Lumiere, a pioneering text-to-video model. This innovative tool transforms textual prompts into full video clips, revolutionizing video creation. Lumiere uniquely combines spatial and temporal downsampling, elevating video quality and length. It offers features like motion stylization, image-to-video conversion, and advanced editing capabilities. This breakthrough paves the way for versatile AI-driven content creation, reshaping the digital media landscape.

> Elon Musk, the CEO of Tesla and xAI, has announced the upcoming release of Grok-1.5, the latest version of xAI's language model. Expected next month, this new iteration promises "substantial improvements," including fewer instances of erroneous information, often referred to as 'hallucinations' in AI parlance. Grok, built by xAI, the artificial intelligence company Musk launched last July, is currently accessible to X Premium Plus subscribers. In addition to general use, Musk plans to integrate a smaller version of Grok into Tesla vehicles.

AI in Business and Startups

> Scientists Laurent Sifre and Karl Tuyls, formerly of Google's DeepMind, plan to launch an AI startup, Holistic, in Paris, eyeing over 200 million euros in funding. The initiative, set to develop a novel AI model, marks a significant move in the competitive AI industry, where Google-owned DeepMind vies with major players like OpenAI, the maker of ChatGPT. The venture reflects a growing trend, similar to Mistral AI's recent successful funding, of former DeepMind talent founding new AI companies.

> The leaked final text of the EU AI Act, due for a vote in February, signals a boost for open-source AI in Europe. It exempts third-party developers providing AI tools, services, or components under an open license from certain compliance obligations, except for foundation models. This move aims to foster trustworthy AI systems within the EU by encouraging standard documentation practices like model cards and datasheets. The Act recognizes the potential of open-source software and data, including models, to spur innovation and economic growth within the Union.

> Google has terminated its contract with Appen, an Australian firm instrumental in training its AI models for Bard and other services. This decision, part of Alphabet's supplier review for efficiency, leaves Appen without a significant revenue stream. Appen's workers, crucial yet underpaid, face layoffs despite union efforts for fair wages. The move reflects industry challenges, with companies like Microsoft, Meta, and Amazon also utilizing such data firms for AI training.

> Chinese startup 01.AI, founded by AI luminary Kai-Fu Lee, is gaining rapid prominence in the open-source AI landscape with its model Yi-34B, which surpasses Meta's Llama 2 in several benchmark tests. Distinguished for its proficiency in Mandarin and English, Yi-34B is quickly becoming a developer favorite, topping AI model rankings on Hugging Face. Although it is a young company, 01.AI encourages open-source development to build a robust developer community and foster innovative AI applications.

AI in Public Sector and Defense

> The Pentagon's advancements in autonomous technology are reshaping military strategy, as evidenced by recent breakthroughs in AI-powered swarm drones and ships. The Navy's 4th Fleet has been instrumental in this shift, utilizing combined air and sea drone operations in multinational exercises to enhance rapid enemy detection and neutralization. Concurrently, the Air Force is exploring the potential of AI in guiding high-performance drones, suggesting a transformative approach to aerial combat.

> AI and law enforcement: Police are exploring the fusion of DNA-based facial predictions with facial recognition technology to crack cold cases. Parabon NanoLabs, pioneering this AI-driven approach, generates 3D facial models from DNA at crime scenes, aiming to assist detectives. However, this approach, blending AI-generated facial phenotyping with facial recognition systems, sparks debates on scientific accuracy and ethical concerns, primarily focusing on the risks of misidentification and implications for civil liberties.

AI in Consumer Applications

> Google’s Chrome has a fresh set of experimental generative AI capabilities, aiming to refine the user interface by introducing intelligent tab management, tailored AI-crafted themes, and supportive AI-driven writing tools for the web. The Tab Organizer feature streamlines the grouping of tabs, automatically generating names and emojis to enhance organization. Moreover, Chrome now allows users to create bespoke browser themes through a sophisticated text-to-image diffusion model. Additionally, the forthcoming "Help me write" function is set to provide AI-guided support for composing web-based content, easing the writing process for users.

> ElevenLabs introduces its innovative Dubbing Studio, transforming video localization for global audiences in 29 languages. This sophisticated platform allows meticulous management of video translations, providing features like speaker identification and modifiable scripts. Users can fine-tune dialogue, ensuring precise accents and tones. It addresses the nuances of language translation, making it easy to correct inaccuracies such as a mix-up between the Spanish terms “banco” and “orilla.” The studio’s launch is accompanied by a multilingual showcase on YouTube demonstrating its broad linguistic capabilities.

5 new AI-powered tools from around the web

Composer IRA, an automated trading platform, now enables active trading of stocks and ETFs in IRA accounts, leveraging real-time market signals. It offers personalized strategy creation, 1200+ community-built strategies, or AI-generated strategies, all executed automatically.

Auro Journal is a daily journaling iOS app converting voice entries into text, offering intelligent summaries, and providing insights through mood and tone analysis, fostering introspection and emotional understanding.

Is It You?, an AI-driven app, crafts a chatbot advocate that mimics your conversational style, offering a unique, interactive experience by answering questions and sharing with friends, fostering engaging and personalized communication.

PentaCue offers advanced SEC filings analysis with real-time EDGAR access, detailed citations, simultaneous document insight extraction, and a transparent AI algorithm, serving investors and finance professionals with accurate, up-to-date financial insights.

ChatPhoto is an AI tool that converts image content into detailed text, enabling users to interact with photos through questions, extract text, gain insights, or create stories and captions in multiple languages.

arXiv is a free online library where researchers share pre-publication papers.

📄 DITTO: Diffusion Inference-Time T-Optimization for Music Generation

DITTO (Diffusion Inference-Time T-Optimization) is a versatile, training-free framework for precise control of pre-trained text-to-music diffusion models, enabling intricate music generation tasks like inpainting, outpainting, and structural adjustments. Leveraging gradient checkpointing for memory efficiency, DITTO manipulates initial noise latents to shape music creation, achieving superior performance in controllability, audio quality, and computational efficiency. This innovative approach significantly enhances the flexibility and quality of music generation, offering a wide array of creative possibilities in the field.

📄 Make-A-Shape: a Ten-Million-scale 3D Shape Model

Make-A-Shape is a pioneering 3D generative model trained on over 10 million diverse 3D shapes, featuring a novel wavelet-tree representation for compact and expressive encoding. The framework supports a wide array of applications, including conditional generation from images, point clouds, and voxels, and even shape completion. Despite its scale and complexity, Make-A-Shape generates intricate structures, plausible topologies, and clean surfaces in seconds, marking a significant advancement in the field of large-scale 3D generative modeling.

📄 StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion

StreamVoice is developed in collaboration between the Audio, Speech, and Language Processing Group (ASLP@NPU) at the School of Computer Science, Northwestern Polytechnical University, Xi’an, China, and ByteDance Inc. This innovative streamable, context-aware language model is designed for real-time, zero-shot voice conversion, enabling the conversion of source speech to any target speaker's voice promptly. ByteDance's involvement in the project brings together expertise from academia and industry, contributing to the pioneering features of StreamVoice, such as its fully causal language model and temporal-independent acoustic predictor, which facilitate streaming conversion without the need for complete source speech. The partnership underscores the commitment to advancing the frontiers of voice conversion technology, showcasing StreamVoice's capability to maintain comparable zero-shot performance to non-streaming VC systems in real-time applications.

📄 Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text

"Binoculars" is a zero-shot method to detect Large Language Model (LLM) generated text, utilizing two pre-trained LLMs to compute contrasting scores, significantly outperforming existing methods. It employs log perplexity and cross-perplexity to normalize detection, making it robust against varied prompts and contexts. Extensive evaluations demonstrate its reliability across multiple sources and languages, maintaining high precision even in challenging scenarios. While highly effective, the method's limitations and potential ethical implications are acknowledged, emphasizing the necessity for careful application and acknowledging the potential for evasion by sophisticated adversaries.

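For the curious, here is a rough sketch of the scoring idea, assuming the score is the ratio of one model's log perplexity on the text to the cross-perplexity between the two models' next-token predictions. The function names and the toy probability tables are illustrative assumptions, not the authors' code; a real detector would pull these probabilities from two pre-trained LLMs.

```python
import math

def log_perplexity(token_ids, probs):
    """Average negative log-probability the model gave to the tokens that actually appear."""
    return -sum(math.log(p[t]) for t, p in zip(token_ids, probs)) / len(token_ids)

def cross_perplexity(probs_a, probs_b):
    """Average cross-entropy between the two models' next-token distributions."""
    total = 0.0
    for pa, pb in zip(probs_a, probs_b):
        total -= sum(a * math.log(b) for a, b in zip(pa, pb))
    return total / len(probs_a)

def binoculars_style_score(token_ids, probs_a, probs_b):
    """Ratio of log perplexity to cross-perplexity (assumed form of the score)."""
    return log_perplexity(token_ids, probs_a) / cross_perplexity(probs_a, probs_b)

# Toy setup (hypothetical): a 3-token vocabulary, 2 positions, and two
# "models" that happen to predict the same next-token distributions.
toy_probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]

expected_text = [0, 1]    # the tokens the models considered most likely
surprising_text = [1, 2]  # tokens the models considered less likely

low = binoculars_style_score(expected_text, toy_probs, toy_probs)
high = binoculars_style_score(surprising_text, toy_probs, toy_probs)
print(round(low, 3), round(high, 3))  # 0.667 1.333
```

In this toy run, the text that matches the models' expectations gets the lower score. Under the assumption above, a low score is the signal that text looks machine-generated, and a threshold would separate the two cases.
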
📄 Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding

Meta-prompting revolutionizes language model (LM) operations by transforming a single LM into a versatile conductor, guiding it to manage and synthesize responses from multiple expert model instances. This technique breaks down complex tasks into smaller subtasks, each tackled by expert models under specific instructions, while the central LM ensures cohesion and accuracy. Exceptionally effective in zero-shot, task-agnostic scenarios, meta-prompting simplifies user interaction and significantly enhances LM performance across diverse tasks by integrating high-level instructions and external tools like Python interpreters. Testing with GPT-4 demonstrates its superiority over traditional methods, marking a substantial leap in LM scaffolding techniques.
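
The conductor-and-experts loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `ASK name: instruction` and `FINAL: answer` message formats, the function names, and the toy conductor and calculator are all hypothetical stand-ins. In the paper, the conductor and the experts are instances of the same language model; here they are plain Python functions so the example runs on its own.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]

def meta_prompt(task: str, conductor: Model, experts: Dict[str, Model],
                max_rounds: int = 5) -> str:
    """Each round, the conductor either consults a named expert or returns a final answer."""
    history: List[str] = [f"TASK: {task}"]
    for _ in range(max_rounds):
        decision = conductor("\n".join(history))
        if decision.startswith("FINAL:"):
            return decision[len("FINAL:"):].strip()
        # Assumed message format: "ASK <expert name>: <instruction>"
        header, _, instruction = decision.partition(":")
        name = header.replace("ASK", "", 1).strip()
        expert = experts.get(name)
        if expert is None:
            reply = f"(no expert named {name!r})"
        else:
            # The expert sees only its instruction, not the whole transcript.
            reply = expert(instruction.strip())
        history.append(f"{name} said: {reply}")
    return "(no final answer within the round limit)"

# Hypothetical stand-ins for language-model calls.
def toy_conductor(transcript: str) -> str:
    if "Calculator said:" not in transcript:
        return "ASK Calculator: 6*7"
    answer = transcript.rsplit("Calculator said:", 1)[1].strip()
    return f"FINAL: {answer}"

def toy_calculator(instruction: str) -> str:
    left, right = instruction.split("*")
    return str(int(left) * int(right))

result = meta_prompt("What is 6*7?", toy_conductor, {"Calculator": toy_calculator})
print(result)  # 42
```

Swapping the toy functions for real model calls keeps the same shape: the conductor decides who to ask, each expert gets a fresh, narrow instruction, and the conductor folds the replies into one answer.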

ChatGPT Attempts Comics

Thank you for reading today’s edition.

Your feedback is valuable. Respond to this email and tell us how you think we could add more value to this newsletter.

Interested in reaching smart readers like you? To become an AI Breakfast sponsor, apply here.