Stable Diffusion XL: Next-Gen Text-to-Image

...and how AI is saving pink dolphins

Good morning. It’s Friday, July 28th.

Now This: AI Breakfast is starting a Q&A feature for each edition! Are you looking for a specific AI tool? Curious about the state of open-source? Have a proclamation about AGI? Submit this form and your question may be featured in the next edition!

In today’s email:

  • Stable Diffusion XL 1.0: Next-gen text-to-image

  • Microsoft trials AI tech for Japan's government

  • ServiceNow, NVIDIA, and Accenture boost generative AI for enterprises

  • Amazon's AI service competes with Microsoft and Google

  • Meta plans to charge cloud providers for AI tech

  • NVIDIA H100 GPUs now on AWS for leading-edge AI

  • G3PO: OpenAI's rumored answer to Llama 2

  • McKinsey predicts workforce changes due to generative AI

  • Intel CEO plans AI for all products

  • AI helps track endangered pink dolphins

  • 5 New AI Tools

  • Latest AI Research Papers

You read. We listen. Let us know what you think of this edition by replying to this email, or DM us on Twitter.

This week’s editions are brought to you by:

MinervaAI: Empowering AML Compliance with AI-driven Solutions

Streamline and automate regulatory obligations, KYC, EDD, and ongoing monitoring, with auditable reports and reduced manual work.

Trusted by leading companies, MinervaAI offers a cloud-based risk assessment platform for real-time financial crime analysis and decreased risk.

Today’s trending AI news stories

Stable Diffusion XL 1.0 Launches: Next-Gen Text-to-Image Model with Fine-Tuning Feature: Stability AI has launched its latest text-to-image model, Stable Diffusion XL (SDXL) 1.0, including on Amazon Bedrock. The model brings significant improvements in color, contrast, lighting, and shadows, along with faster image processing. It can be accessed through Stability AI's API, on GitHub, and in the Clipdrop and DreamStudio applications. A new beta feature lets users fine-tune image generation with as few as five images. With a 3.5B-parameter base model and a 6.6B-parameter model ensemble pipeline, SDXL 1.0 is one of the most powerful open-access image models available.

Microsoft to Trial AI Technology for Japan's Government Operations: Japan's Digital Agency will test AI technology from OpenAI, the Microsoft-backed startup, for government tasks such as preparing meeting minutes and analyzing government statistics. This marks the first instance of Microsoft AI technology being deployed outside of Europe.

ServiceNow, NVIDIA, and Accenture Collaborate to Boost Generative AI in Enterprises: ServiceNow, NVIDIA, and Accenture have launched the AI Lighthouse program, aimed at accelerating the development and adoption of generative AI capabilities in enterprises. By combining ServiceNow's enterprise automation platform, NVIDIA's AI supercomputing and software, and Accenture's consulting and deployment services, the program aims to assist customers in designing and implementing generative AI use cases. This collaboration will empower enterprises to enhance workflows, increase developer productivity, and obtain intelligent search results through generative AI technology.

Amazon's AI Service Draws Thousands of Users, Challenging Microsoft and Google: Amazon's cloud division, AWS, is competing with Microsoft and Google in the AI space, attracting thousands of customers to try its generative AI service, Amazon Bedrock. Companies such as Sony, Ryanair, and Sun Life have already tested the service, and Amazon plans to make it available to all customers in the near future.

Meta Plans to Charge Cloud Providers for AI Technology: Meta Platforms Inc. is considering charging major cloud computing companies, including Microsoft, Amazon, and Google, for reselling its large language model, Llama 2. While this move may not generate significant revenue initially, CEO Mark Zuckerberg believes it could become a long-term revenue source.

AWS Introduces Healthcare-Focused Generative AI Services: Amazon has launched AWS HealthScribe, an AI-powered platform that helps clinicians transcribe and analyze conversations with patients to create electronic health records. Powered by Amazon Bedrock, Amazon's platform for building generative AI apps, HealthScribe can identify speaker roles and segment transcripts into relevant clinical categories. HealthScribe is currently available for general medicine and orthopedics and lets clinicians review AI-generated notes before finalizing records, with an emphasis on security and privacy.

Samsung Shifts Focus to High-End AI Chips Amidst Memory Chip Production Cutbacks: Samsung Electronics is scaling back its memory chip production, including NAND flash, following a $3.4 billion operating loss in the second quarter of 2023. However, the company intends to concentrate on producing high-performance memory chips, such as high bandwidth memory, to meet the growing demand for AI, 5G, IoT, and graphics processing applications. Samsung predicts a gradual rebound in global memory chip demand in the latter half of the year.

NVIDIA H100 GPUs Now Accessible on AWS for Leading-Edge AI: NVIDIA H100 GPUs are now available on Amazon Web Services (AWS) through the new Amazon EC2 P5 instances, delivering high performance for generative AI models and other workloads. With features such as the Transformer Engine and Tensor Cores, the H100 provides powerful AI training and inference capabilities. The P5 instances are designed to shorten training times and improve scalability for AI/ML and HPC workloads, promising up to six times faster training and up to 40% lower training costs.

CrowdStrike to Potentially Acquire Bionic.AI in Multimillion-Dollar Deal: CrowdStrike, a leading cybersecurity company, is reportedly in advanced negotiations to acquire Bionic.AI for between $200 million and $300 million. The acquisition of Bionic.AI, a security-focused company, would be another notable merger-and-acquisition deal in the cybersecurity industry, signaling potential consolidation in the market.

AWS Summit in New York Unveils Cutting-Edge AI and ML Innovations: The recent AWS Summit in New York City brought a wave of AI/ML announcements. These included the integration of foundation models with agents, AWS Entity Resolution for linking related records, and Amazon EC2 P5 instances powered by NVIDIA H100 Tensor Core GPUs. Amazon QuickSight gained generative business intelligence capabilities, and AWS Glue Studio notebooks now integrate Amazon CodeWhisperer, an AI coding companion that gives developers real-time recommendations as they build data integration jobs.

OpenAI Reportedly Plans G3PO as a Rival to Meta's Llama 2: OpenAI is reportedly preparing an open-source model, G3PO, that is expected to rival other LLMs, including Meta's open-source Llama 2. Specific details about G3PO have not been disclosed, but it is believed to be OpenAI's response to growing AI competition, particularly from Meta's Llama 2 partnership with Microsoft. The release date remains unknown, as OpenAI is busy with other projects, such as the Superalignment initiative and the potential development of an app store and personalized ChatGPT.

McKinsey Report Predicts Major Workforce Changes Due to Generative AI: A new McKinsey report predicts significant workforce shifts by 2030 driven by generative AI. The report does not foresee mass unemployment, but it expects around 12 million workers to switch to new fields. It stresses training and early preparation, including STEM education starting in high school, to ready workers for these changes. The report also estimates that automation accelerated by generative AI could take over up to 30% of work hours, affecting a wide range of industries and job roles.

Universities Embrace AI as Cheating Prevention Challenges Mount: Several universities participating in a Senate inquiry have acknowledged the difficulty of preventing students from using AI to cheat on assessments. As a response, the tertiary sector is moving away from banning AI technologies and exploring alternative assessment methods. Generative AI is rapidly gaining popularity in academia, prompting universities to adopt new teaching and assessment approaches.

Stack Overflow Unveils OverflowAI to Enhance Platform Experience: Stack Overflow is launching OverflowAI, a suite of generative AI capabilities for both its public site and its Stack Overflow for Teams enterprise offering. The new capabilities include upgraded AI-powered search that lets users ask questions conversationally and generates answers from Stack Overflow's extensive question-and-answer database. The goal is to complement the existing community-based model while helping users of all experience levels find relevant information efficiently. OverflowAI's alpha release is scheduled for August.

Intel CEO Announces AI Integration in All Intel Products: Intel CEO Pat Gelsinger announced plans to integrate AI into all of Intel's products during the company's Q2 2023 earnings call. Meteor Lake, Intel's upcoming consumer chip, will be the first to feature a built-in neural processor for machine learning tasks. Intel aims to make AI a standard feature across its product range, from hearing aids to edge platforms for various use cases, meeting clients' needs for local AI processing and reducing reliance on cloud-based solutions.

Meta's AI-Powered Ad Sales Drive Share Surge: Meta shares jumped nearly 8% following an optimistic revenue forecast. AI-powered ad sales were instrumental in driving engagement and revenue growth during the second quarter. Analysts praised Meta's "monster guidance" and projected growth rate, which solidify its position as a leader in digital advertising. However, concerns persist over rising expenses in the AI race, leaving growth in capital expenditures for 2024 uncertain.

AI Enables Tracking of Endangered Pink Dolphins: Researchers are using AI to track and map the movements of endangered pink dolphins in the Amazon River. They trained a neural network to recognize the dolphins' distinctive clicks and whistles, offering a less invasive alternative to conventional tracking methods such as GPS tags or drones. The system can distinguish between two dolphin species and offers valuable insights for conservation efforts.

Adobe's Photoshop Introduces "Generative Expand" AI Tool: Adobe has unveiled "Generative Expand" in Photoshop, which lets users add AI-generated imagery beyond the borders of an image's original size. The new tool complements the existing "Generative Fill" feature, which generates imagery from text prompts or existing elements in an image. Adobe aims to position AI as an assistant to people rather than a replacement, and has taken precautions to vet data and avoid sensitive areas such as recognizable people or trademarked products. "Generative Fill" was already used extensively during beta testing.

Tech Giants and Startups Collaborate to Form AI Industry Body: Google, Microsoft, OpenAI, and Anthropic have established the Frontier Model Forum, an industry body dedicated to promoting the responsible development of advanced AI models. The forum focuses on AI technology that goes beyond current capabilities, with specific objectives such as AI safety research, responsible model deployment, and addressing trust and safety risks. Membership is open to organizations actively developing frontier models.

5 new AI-powered tools from around the web

ProApp offers an affordable, no-code AI-driven platform for learning design with interactive courses, design challenges, and mentorships to upskill users in various design fields. The app also features design resources, workshops, and user testimonials.

Hissab is a versatile text-based calculator app offering various functions like quick and complex calculations, math operations, date and time calculations, unit conversions, and binary operations. The pro version includes currency exchange rates, autocomplete suggestions, organization tools, and breakdowns of large unit values.

ezML offers a user-friendly, cloud-based platform for easily integrating computer vision into apps. No training data or machine learning code is required: users create custom vision systems in three simple steps.

Storyboard Hero’s AI Storyboard Generator allows video agencies and creators to quickly generate convincing video concepts and storyboards, reducing time and costs significantly. Seamlessly organize concepts, scripts, and AI-generated images into storyboards, and export PDFs with your branding.

Fillout is a user-friendly, AI-powered form builder for creating customizable forms. With an intuitive no-code editor and AI assistance, users can customize forms and generate brand-matching images and themes. Fillout’s AI also updates forms automatically as needed. The tool is free to use, with unlimited forms and 1,000 submissions per month.

arXiv is a free online library where scientists share their research papers before they are published. Here are the top AI papers for today.

Carnegie Mellon University researchers created WebArena, an environment that simulates real-world web scenarios for building and evaluating agents on everyday tasks. It includes fully functional websites from domains such as e-commerce and social forums. The researchers also release a diverse benchmark of 812 web-based tasks that evaluates the functional correctness of task execution. The results underscore how hard it is to achieve robust agent performance and highlight the need for further advances. WebArena is a significant step toward human-like problem-solving agents, and the environment is available for research purposes.

Researchers from NVIDIA Research, University of Toronto, and Stanford University have introduced “trajdata,” a unified interface addressing challenges in accessing multiple human trajectory datasets for trajectory forecasting research. The diverse data formats and APIs used in these datasets have hindered cross-dataset evaluations. trajdata offers a standardized representation and API, enabling seamless comparisons and comprehensive empirical evaluations. The researchers provide valuable insights for future dataset development.

Researchers from Google Research and Google DeepMind are taking a bold step toward revolutionizing biomedical AI. Their brainchild, Med-PaLM Multimodal (Med-PaLM M), promises to be a game-changer in medicine. To test it, they curated MultiMedBench, an extensive benchmark of 14 diverse biomedical tasks, including image interpretation and genomic variant calling. The results are striking, with the model outperforming specialist counterparts in many cases. While further validation is needed, the project is a significant step toward versatile and powerful biomedical AI systems that may shape the future of healthcare.

The Points-to-3D framework, reported by researchers at DAMO Academy, Alibaba Group, offers a new solution to the challenges of text-to-3D generation. By bridging the gap between sparse 3D points and shape-controllable 3D content, the approach addresses view inconsistency and limited shape control. The framework uses knowledge distilled from both 2D and 3D diffusion models to guide the generation process. By incorporating controllable sparse 3D points and optimizing a NeRF model with score distillation, Points-to-3D achieves notable improvements in realism, view consistency, and controllability for 3D content generation.

WavJourney leverages LLMs to create captivating audio content guided by text instructions. Given textual descriptions of auditory scenes, the system generates a structured audio script that weaves together speech, music, and sound effects. The script is then converted into a computer program that uses task-specific audio generation models to bring the content to life. WavJourney’s interactive and interpretable design fosters collaboration between humans and machines, opening new possibilities for multimedia content creation. Real-world applications, including science fiction and education scenarios, showcase WavJourney’s potential.

Thank you for reading today’s edition.

Your feedback is valuable.


Respond to this email and tell us how you think we could add more value to this newsletter.