Midjourney 5.2 is Now Available
Good morning. It’s Friday, June 23rd.
Did you know: On this day 27 years ago, the Nintendo 64 system was first released.
In today’s email:
Midjourney 5.2: AI Image Creation Gets an Upgrade
Stability AI debuts SDXL 0.9 for advanced AI image generation
YouTube set to launch AI-powered dubbing tool
Workplace surveillance enabled by military AI tools
AWS announces $100M fund for generative AI startups
Cortical Labs plans to create brain cell-powered computers
New Turing test proposed by DeepMind's co-founder
"Neuroforecasting" AI system accurately predicts pop hits
4 New AI Tools
Latest AI Research Papers
You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.
Today’s edition is brought to you by:
The most accurate AI representation of you, built on the highest security standards.
You control your data. You own your Doppl. Doppl uses cutting-edge machine learning tech to create the most accurate and interactive representation of a unique you via photos, videos, texts and audio.
Your AI twin, or Doppl, is a natural evolution of your digital identity. It also serves as a timeless memory of you. Sign up for our waitlist and be the first to receive the latest updates on our upcoming launch.
Today’s trending AI news stories
Midjourney 5.2: AI Image Creation Gets an Upgrade
Midjourney has announced the release of its 5.2 version, an update to its AI image generator that includes several enhancements long-awaited by the community.
One of the key updates in 5.2 is the introduction of 'Zoom out,' also known as 'outpainting' in other AI image generators. This feature allows the AI to expand on a previously created image, providing additional context to it.
For instance, in a portrait, users can now see more of the environment surrounding the subject. By default, Midjourney offers two zoom levels (1.5x and 2x), but allows advanced users to customize the zoom level and aspect ratio. Additionally, the 'Make Square' function can transform a non-square image into a square one using outpainting.
Along with these features, Midjourney 5.2 brings a new 'Aesthetics System' designed to produce sharper and more visually attractive images. Improvements in text comprehension mean that the AI image engine can now generate visuals that align more closely with the given text.
5.2 also introduces the '/shorten' command, which allows users to identify words in a prompt that have little to no effect. This helps users optimize their prompts more efficiently.
The updated 5.2 version is now available as a test version and is set as the default. Users wishing to disable it or revert to a previous version can use the '/settings' command in Discord.
Quick News
Stability AI launches SDXL 0.9: A Leap Forward in AI Image Generation Stability AI announces the launch of SDXL 0.9 with an impressive parameter count and the ability to generate hyper-realistic imagery. The model is now accessible via ClipDrop, with API availability coming soon. Researchers can apply for access to the model, and an open release is planned for mid-July with SDXL 1.0.
YouTube is getting AI-powered dubbing: YouTube is introducing an AI-powered dubbing tool to assist creators in dubbing their videos in multiple languages. The tool, developed by Aloud, transcribes, translates, and produces the dub based on the video’s content. Currently available in a limited number of languages, YouTube plans to enhance the tool by making translated audio tracks sound like the creator’s voice with improved expression and lip sync in the future. The feature is expected to be implemented in 2024.
Military AI’s Next Frontier: Your Work Computer: Military AI tools originally developed for intelligence purposes are now being sold to employers, enabling surveillance and tracking of employees. These tools, repurposed from authoritarian regimes, use advanced data analytics and deep learning to identify labor organizing, internal leakers, and company critics. The expansion of these surveillance techniques raises concerns about privacy and the abuse of power.
AWS launches $100M program to fund generative AI initiatives: Amazon Web Services (AWS) has launched the AWS Generative AI Innovation Center, a $100 million fund to support startups focused on generative AI. The program aims to connect AWS data scientists and engineers with customers and partners to accelerate innovation and success with generative AI.
This AI Startup Wants To Be The Next Nvidia By Building Brain Cell-Powered Computers: Cortical Labs, an Australian startup, is planning to reshape the AI industry with its human brain cell-powered computers. The company intends to commercially sell these biological computers, capable of tasks like playing video games, by the end of the year.
DeepMind's co-founder suggested testing an AI chatbot's ability to turn $100,000 into $1 million to measure human-like intelligence: Mustafa Suleyman, co-founder of DeepMind, has proposed a modern Turing test to measure an AI chatbot's intelligence. The test involves assessing the chatbot's ability to increase a $100,000 investment to $1 million, suggesting that AI capabilities extend beyond mere language proficiency.
AI Can Spot Pop Hits Better Than Humans: US researchers have developed neuroforecasting, an AI system capable of predicting hit pop songs with 97% accuracy, thereby outperforming human judgment. The system could potentially optimize song selection for playlists and may be applied to other forms of entertainment, such as movies and TV shows.
🎧 Did you know AI Breakfast has a podcast read by a human? Join AI Breakfast team member Luke (an actual AI researcher!) as he breaks down the week’s AI news, tools, and research: Listen here
4 new AI-powered tools from around the web
Waveline Extract is a powerful API for extracting data from documents, images, and PDFs using optical character recognition and AI. Easily extract information from various formats such as text, PDFs, and spreadsheets. Simply upload a document or enter text manually to get started.
Mokkup.ai is a free, cloud-based mock dashboard creation tool that helps analysts create aesthetic wireframes without any design experience. Visualize ideas, customize wireframe mockups, and bring them to life with 100+ templates, drag-and-drop elements, and customizable screen sizes.
Remotebase 2.0 is a marketplace that connects startups with the top 1% of remote software engineering talent. Using an enhanced vetting process, Remotebase ensures that only skilled professionals make it into their talent pool. They have expanded their focus to include new roles, including AI engineers.
OMMM SPACE is a neuro-acoustic technology program designed to help you focus, relax, and sleep better. It combines voice-neuro feedback with traditional therapy to help manage anxiety and panic attacks. Simply record your voice, listen with headphones, and experience immediate stress and anxiety relief.
arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.
Google AudioPaLM is set to redefine speech comprehension and generation. This groundbreaking language model excels at understanding and translating spoken and written language with exceptional accuracy and speed. It easily converts speech to text, offers real-time language translation, and even replicates voices in different languages. Expect this research to directly integrate into future Google products.
This preprint compares the inference performance of deep learning models on four edge platforms: NVIDIA Jetson Nano, Intel Neural Stick, Google Coral USB Dongle, and Google Coral PCIe. The study analyzes the models as feature extractors and highlights the fastest average inference times, particularly on Google platforms for newer models like MobileNet and EfficientNet. The Intel Neural Stick is identified as a versatile accelerator suitable for running various architectures. The findings aim to provide guidance to engineers in the development of AI edge systems.
RepoFusion is a framework proposed to enhance code completion accuracy by incorporating relevant repository context into language models. The framework utilizes Fusion-In-Decoder architecture and combines multiple retrieved repository contexts with the surrounding context to generate accurate predictions. The authors conducted experiments on single-line code completion tasks and found that RepoFusion, despite its smaller size, outperformed larger models trained on next-token prediction objectives. The study also explores various design choices and releases a dataset of Java repositories, Stack-Repo, along with code and trained checkpoints. The results demonstrate the effectiveness of training code models with repository context.
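To make the Fusion-in-Decoder idea concrete, here is a minimal sketch of how the input pairs might be assembled; this is not the paper's implementation, and the function and variable names are purely illustrative:

```python
def build_fid_inputs(repo_contexts, surrounding_context, max_contexts=4):
    """Pair each retrieved repository context with the code surrounding
    the completion point. In a Fusion-in-Decoder setup, each pair is
    encoded separately, and the decoder attends jointly over all of the
    encoded sequences to produce the completion."""
    return [f"{ctx}\n{surrounding_context}" for ctx in repo_contexts[:max_contexts]]

# Toy usage: two retrieved snippets plus the code around the cursor
contexts = [
    "// Utils.java: static int clamp(int v, int lo, int hi)",
    "// Config.java: static final int MAX_RETRIES = 3;",
]
inputs = build_fid_inputs(contexts, "int retries = ")
```

Each element of `inputs` would then be tokenized and fed to a separate encoder pass before fusion in the decoder.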
The paper examines the potential benefits and challenges of using LLMs in the context of Polis, a platform for scalable deliberation. The authors explore how LLMs can enhance the facilitation, moderation, and summarization of Polis conversations, enabling more efficient and insightful collective decision-making. They highlight the promise of LLMs in empowering the public through new methods of categorization and consensus-building. The paper also discusses the risks associated with LLMs and proposes principles and techniques for mitigating them. It concludes with suggestions for future research directions to further augment tools like Polis with LLMs.
This paper introduces FastSAM, a real-time solution for the segment-anything task in computer vision. The proposed method decouples the task into two stages: instance segmentation using a CNN-based detector, followed by prompt-guided selection. By leveraging the efficiency of CNNs, FastSAM achieves comparable performance to the Segment Anything Model (SAM) while significantly reducing computational demands. Experimental results demonstrate its effectiveness and generalization performance on multiple benchmarks. The proposed approach offers a practical and high-speed solution for various vision tasks and highlights the potential of lightweight CNN models in complex vision applications.
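The second stage, prompt-guided selection, can be illustrated with a toy sketch: given all instance masks from stage one, pick the mask(s) matching a point or box prompt. This is only a simplified illustration of the concept, not FastSAM's actual code; the function names and the IoU-based box matching are assumptions:

```python
def select_by_point(masks, point):
    """Return the candidate masks (2D bool grids) that cover the prompt point."""
    r, c = point
    return [m for m in masks if m[r][c]]

def box_iou(mask, box):
    """IoU between a mask and a rectangular box prompt (r0, c0, r1, c1)."""
    r0, c0, r1, c1 = box
    inter = sum(mask[r][c] for r in range(r0, r1) for c in range(c0, c1))
    mask_area = sum(sum(row) for row in mask)
    box_area = (r1 - r0) * (c1 - c0)
    union = mask_area + box_area - inter
    return inter / union if union else 0.0

def select_by_box(masks, box):
    """Return the single mask whose IoU with the box prompt is highest."""
    return max(masks, key=lambda m: box_iou(m, box))

def make_mask(size, r0, c0, r1, c1):
    """Build a size x size boolean mask that is True inside the rectangle."""
    return [[r0 <= r < r1 and c0 <= c < c1 for c in range(size)] for r in range(size)]

# Toy example: two 8x8 instance masks from a hypothetical stage-one detector
a = make_mask(8, 1, 1, 4, 4)
b = make_mask(8, 5, 5, 8, 8)
masks = [a, b]
point_hits = select_by_point(masks, (2, 2))   # only mask `a` covers (2, 2)
box_match = select_by_box(masks, (5, 5, 8, 8))
```

The real model applies this kind of selection to masks produced by its CNN detector, which is where the speedup over SAM's heavy image encoder comes from.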
Thank you for reading today’s edition.
Your feedback is valuable.
Respond to this email and tell us how you think we could add more value to this newsletter.