Newest AI Text-to-Video Features

AI Breakfast
September 11, 2023

Good morning. It’s Monday, September 11th.

Did you know: OpenAI CEO Sam Altman is also behind Worldcoin, a controversial new cryptocurrency?

In today’s email:

AI Development and Competition
Emerging Technologies
AI Safety and Ethics
AI Job Market Trends
AI in Construction and Sustainability
AI in Media and Content Creation
AI in Finance and Trading
AI in E-commerce
5 New AI Tools
Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

Today’s trending AI news stories

AI Development and Competition

Meta sets GPT-4 as the bar for its next AI model, says a new report: Meta is reportedly gearing up to train a new AI chatbot model that it hopes will rival OpenAI’s GPT-4. The company is acquiring AI training chips and expanding its data centers to support the effort. Training is expected to begin in early 2024, with CEO Mark Zuckerberg emphasizing that the model be made available for building AI tools. Meta aims to reduce its reliance on Microsoft’s Azure cloud platform and to speed up development of AI tools that emulate human expression. The move aligns with Meta’s other generative AI efforts, including rumored AI ‘personas.’ The company faces competition in the AI space from Apple, Google, Microsoft, and Amazon.

Russian AI bot shows larger potential than ChatGPT: Yandex claims that its AI bot, YandexGPT, has shown greater potential than OpenAI’s US-based ChatGPT. Dmitry Masyuk, head of the IT giant’s search and ad technologies business group, stated that YandexGPT’s “basic model steadily surpasses” ChatGPT 3.5 when generating responses in Russian. He also said that it outperforms ChatGPT in several cases and gives better answers in English than the US-developed Llama 2 7B model. However, he noted that the systems excel in different areas, which makes direct comparisons challenging.

Emerging Technologies

Move over AI, quantum computing will be the most powerful and worrying technology: Quantum computing is set to become the dominant technology, with 2023 as the pivotal “reset year” for its development. Quantum computers, operating with qubits instead of traditional binary bits, promise unparalleled computational power. They excel at optimization and probabilistic simulations, impacting logistics, healthcare, finance, and more. However, challenges like error correction and qubit isolation remain. Despite short-term hype, quantum computing’s long-term potential is expected to revolutionize multiple industries and reshape global dynamics. Daniel Doll-Steinberg, co-founder of EdenBase, emphasizes the importance of understanding and addressing quantum computing’s security and operational implications.

AI Safety and Ethics

In their quest for AGI safety, researchers are turning to mathematical proof: Researchers Max Tegmark and Steve Omohundro propose enhancing AI safety by using mathematical proofs and formal verification. Under this approach, an AI system would have to provide a mathematical proof of safety before taking an action, which would prevent it from performing unsafe or non-beneficial actions. While technical challenges remain, advances in machine learning for automated theorem proving offer optimism that the concept can be implemented effectively.

OpinionGPT demonstrates the impact of training data on AI bias: Researchers at Humboldt University of Berlin have introduced OpinionGPT, a language model designed to illustrate the influence of training data on AI models. The model, based on Meta’s 7-billion-parameter LLaMA v1 model, was fine-tuned on Reddit data from thematic subreddits related to specific social dimensions such as politics, geography, gender, and age. OpinionGPT lets users explore the biases present in AI models and answers according to the biases it has learned. However, the researchers acknowledge the model’s limitations, chiefly that its responses reflect the Reddit version of each demographic. Future versions aim to address these challenges by representing combinations of different biases for more nuanced responses.

AI Job Market Trends

AI expert is a hot new position in the freelance jobs market: The freelance job market is experiencing a surge in demand for AI experts, with generative AI-related job postings growing by nearly 250% from July 2021 to July 2023, according to Indeed. LinkedIn reports a 21-fold increase in job postings referencing OpenAI’s “GPT” or “ChatGPT” since November 2022. As AI technology continues to reshape industries, businesses seek skilled freelance developers to integrate AI into their platforms, creating opportunities for AI professionals. A LinkedIn survey indicates that 44% of U.S. executives plan to expand their use of AI, highlighting the growing importance of AI skills in the job market.

AI in Construction and Sustainability

MIT student uses AI to design buildings with less concrete: MIT student Jackson Jewett is using AI for topology optimization to design more efficient and less material-intensive concrete structures. By employing algorithms that create structures meeting performance requirements while minimizing resource consumption, Jewett aims to reduce carbon emissions from the construction industry, which is responsible for roughly 8% of global CO2 emissions. His work could help curb emissions and make construction more sustainable.

AI in Media and Content Creation

Text-to-Video AIs Runway Gen-2 and Pika Labs get new features: RunwayML and Pika Labs are updating their AI-driven video systems. RunwayML’s Gen-2 now includes camera control features that allow selective zooming and control over camera movement. Pika Labs introduces image animation and a frame rate increase to 24 frames per second for smoother videos. Pika Labs, a Runway competitor, has gained around 160,000 users on Discord and received $15 million in funding, including investments from former GitHub CEO Nat Friedman, since its founding in April.

AI in Finance and Trading

Nasdaq receives SEC approval for AI-based trade orders: Nasdaq has received approval from the United States Securities and Exchange Commission (SEC) to operate its AI-driven order type, the dynamic midpoint extended life order (M-ELO). This AI system uses real-time reinforcement learning to execute orders, potentially speeding up trade execution. During research and testing, dynamic M-ELO demonstrated a 20.3% increase in fill rates and an 11.4% reduction in mark-outs. By analyzing market conditions and optimizing holding periods in real-time, this AI-driven order type aims to improve fill rates without increasing market impact. Nasdaq continues to integrate AI into its financial services offerings.

AI in E-commerce

‘Magical’ Listing Tool Harnesses the Power of AI to Make Selling on eBay Faster, Easier, and More Accurate: eBay has introduced a new AI-powered listing tool that allows sellers to create detailed listings quickly and easily. Using AI, the tool can analyze images and generate titles, descriptions, and additional item information, including product release date and category. This feature aims to simplify the listing process for sellers and improve the shopping experience for buyers. eBay has been integrating AI technologies into its marketplace to make buying and selling more efficient and accurate. The image-based listing tool is part of eBay’s ongoing efforts to enhance user experiences with AI.

5 new AI-powered tools from around the web

Cosmos is an intelligent video cataloging tool powered by AI, simplifying footage organization and search. It aids in discovering engaging scenes, offers content editing capabilities, and provides updates and tutorials on its latest developments.

G-Prompter empowers users to craft personalized AI prompters, offering diverse artistic styles and self-training options. It also offers the option to access OpenAI’s API for preformatted queries and platform support, enhancing its versatility.

Algomo offers an AI-driven, ChatGPT-like chatbot for websites, reducing customer service queries by up to 85%. Featuring a quick no-code setup, it learns from your data, integrates with major tools, and supports various languages. Algomo’s unique features include AI-human agent collaboration, optimization of GPT-4, automatic escalation, and website scalability.

Second offers automated codebase migrations and upgrades with AI agents, streamlining the digital transformation process for businesses. Users can connect Second to their GitHub repos, execute modules, and receive pull requests, reducing developer time spent on maintenance. It supports various codebase transformations and plans to introduce an SDK for community module development.

Loom AI is an AI-driven video messaging tool that enhances workplace productivity. It offers AI-generated titles, summaries, chapters, and task features for improved video message creation and management. Additional functionalities include filler word removal, silence removal, voice, and cam avatars, enhancing overall efficiency.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

📄 Towards Practical Capture of High-Fidelity Relightable Avatars

The paper introduces TRAvatar, a practical framework for capturing high-fidelity 3D avatars in diverse lighting conditions. It combines efficient data capture in a specialized Light Stage with a novel network architecture designed to handle linear lighting responses. The approach disentangles dynamic geometry and reflectance, enabling real-time avatar animation and relighting. The network is trained on image sequences, avoiding explicit tracking, and offers superior performance in photorealistic avatar creation and relighting. The paper discusses related work and the design of the capture apparatus. It concludes with an explanation of the training framework, network architecture, data capture, and loss functions.

📄 Large-Scale Automatic Audiobook Creation

The paper introduces a system for large-scale automatic audiobook creation. By leveraging advances in neural text-to-speech and scalable computing, the system can generate thousands of high-quality, open-license audiobooks with various speaking styles and voices, including user-generated voices. The result is over five thousand open-license audiobooks, contributing approximately thirty-five thousand hours of speech to the open-source community. A live demonstration app enables users to create personalized audiobooks quickly. The system improves audiobook accessibility and availability.

📄 ProPainter: Improving Propagation and Transformer for Video Inpainting

ProPainter, a novel video inpainting framework, combines dual-domain propagation and an efficient mask-guided sparse video Transformer. Unlike previous methods, ProPainter performs global image propagation with GPU-based flow consistency checks and uses flow-based deformable alignment for feature propagation, addressing issues of texture misalignment and blurry results. Additionally, it introduces an efficient video Transformer that reduces computational complexity and memory usage. ProPainter outperforms existing methods by a significant margin in terms of PSNR, providing an effective and efficient solution for video inpainting while allowing long-range propagation and attention. These innovations offer valuable insights for the video inpainting community.

📄 Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

The paper introduces sparse Mobile Vision Mixture-of-Experts (V-MoEs) as a method to scale down Vision Transformers (ViTs) for resource-constrained vision applications. Sparse MoEs enable model size decoupling from inference efficiency by routing entire images to experts rather than individual patches. A stable training procedure using super-class information guides the router. Empirical results demonstrate that Mobile V-MoEs achieve a superior trade-off between performance and efficiency compared to dense ViTs, with improvements up to 4.66% on ImageNet-1k. The proposed approach presents a promising solution for efficient and mobile-friendly vision models, addressing the challenges of resource-constrained environments.
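To make the per-image routing idea above concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the mean-pooled image vector, the linear router weights, and the function names (`route_image`, `route_patches`) are all assumptions chosen for illustration, and the super-class guidance used in training is not modeled.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(patches):
    """Average the patch embeddings into one image-level vector (an assumption)."""
    n = len(patches)
    dim = len(patches[0])
    return [sum(p[d] for p in patches) / n for d in range(dim)]

def route_image(patches, router_weights, k=1):
    """Pick the top-k experts once for the whole image.

    router_weights[e] is a hypothetical weight vector for expert e.
    Returns (expert_index, gate_probability) pairs; every patch of the
    image would then be processed by these same experts.
    """
    image_vec = mean_pool(patches)
    logits = [sum(w * x for w, x in zip(weights, image_vec))
              for weights in router_weights]
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)
    return [(e, probs[e]) for e in ranked[:k]]

def route_patches(patches, router_weights):
    """Per-patch routing, for comparison: each patch picks its own expert."""
    chosen = []
    for p in patches:
        logits = [sum(w * x for w, x in zip(weights, p))
                  for weights in router_weights]
        chosen.append(max(range(len(logits)), key=lambda e: logits[e]))
    return chosen

# Toy data: 4 patches with 2-dimensional embeddings, 3 experts.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2], [0.9, 0.1]]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

per_image = route_image(patches, router_weights, k=1)
per_patch = route_patches(patches, router_weights)

print("per-image experts:", [e for e, _ in per_image])
print("per-patch experts:", per_patch)
```

In this toy setup, per-patch routing touches two different experts for a single image, while per-image routing touches only one. That illustrates how routing whole images can keep the number of experts used per inference small.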

📄 From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting

The study explores the balance between informativeness and readability in text summarization. Using a Chain of Density (CoD) prompt, GPT-4 generates summaries with varying levels of density by iteratively adding entities from the source text without increasing the length. Human preferences indicate that an intermediate level of density is favored, striking a balance between clarity and informativeness. Automatic metrics align with human judgments, with Step 4 summaries being rated the highest. This research provides insights into the trade-off between densification and readability in summarization. The datasets and code are available for further study.
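The iterative densification described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's prompt or code. The prompt wording, the one-call-per-step loop, the `llm` callable, and the simple token-matching `entity_density` measure are all hypothetical, and the stub model below returns canned text so the example runs without any API.

```python
import string

def build_cod_prompt(article, previous_summary, max_words):
    """Build an illustrative densification prompt (wording is an assumption)."""
    if not previous_summary:
        return (f"Article:\n{article}\n\n"
                f"Write a summary of at most {max_words} words.")
    return (f"Article:\n{article}\n\n"
            f"Previous summary:\n{previous_summary}\n\n"
            "Identify 1-3 informative entities from the article that are "
            "missing from the previous summary, then rewrite the summary to "
            f"include them without exceeding {max_words} words.")

def chain_of_density(article, llm, steps=5, max_words=80):
    """Call `llm` once per step, feeding each summary into the next prompt."""
    summaries = []
    previous = ""
    for _ in range(steps):
        previous = llm(build_cod_prompt(article, previous, max_words))
        summaries.append(previous)
    return summaries

def entity_density(summary, entities):
    """Distinct listed entities found in the summary, divided by token count."""
    tokens = [t.strip(string.punctuation).lower() for t in summary.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    wanted = {e.lower() for e in entities}
    return len(wanted & set(tokens)) / len(tokens)

# Stub model: returns canned summaries of equal length and rising density.
article = ("Nasdaq received SEC approval for its AI-driven order type, "
           "dynamic M-ELO.")
canned = iter([
    "This article discusses an exchange that got approval for something new",
    "Nasdaq got approval from the SEC for a new order type",
    "Nasdaq got SEC approval for M-ELO, a new AI-driven order type",
])
prompts = []

def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; records each prompt."""
    prompts.append(prompt)
    return next(canned)

summaries = chain_of_density(article, fake_llm, steps=3, max_words=11)
densities = [entity_density(s, ["Nasdaq", "SEC", "M-ELO"]) for s in summaries]

for s, d in zip(summaries, densities):
    print(f"{len(s.split())} words, density {d:.2f}: {s}")
```

Each canned summary is 11 words long, and the entity density rises from step to step. That is the trade-off the study asks human readers to judge: more information in the same space, at some cost to readability.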

Thank you for reading today’s edition.

Your feedback is valuable.

Respond to this email and tell us how you think we could add more value to this newsletter.