What is Generative AI?
Generative AI, sometimes called gen AI, is artificial intelligence (AI) that can create original content—such as text, images, video, audio, or software code—in response to a user’s prompt or request.
Generative AI relies on sophisticated machine learning models called deep learning models—algorithms that simulate the learning and decision-making processes of the human brain. These models work by identifying and encoding the patterns and relationships in huge amounts of data, and then using that information to understand users’ natural language requests or questions and respond with relevant new content.
AI has been a hot technology topic for the past decade, but generative AI, and specifically the arrival of ChatGPT in 2022, has thrust AI into worldwide headlines and launched an unprecedented surge of AI innovation and adoption. Generative AI offers enormous productivity benefits for individuals and organizations, and while it also presents very real challenges and risks, businesses are forging ahead, exploring how the technology can improve their internal workflows and enrich their products and services. According to research by the management consulting firm McKinsey, one-third of organizations are already using generative AI regularly in at least one business function.¹ Industry analyst Gartner projects more than 80% of organizations will have deployed generative AI applications or used generative AI application programming interfaces (APIs) by 2026.²
How generative AI works
Broadly speaking, generative AI operates in three phases:
- Training, to create a foundation model that can serve as the basis of multiple gen AI applications.
- Tuning, to tailor the foundation model to a specific gen AI application.
- Generation, evaluation, and retuning, to assess the gen AI application’s output and continually improve its quality and accuracy.
Training
Generative AI begins with a foundation model—a deep learning model that serves as the basis for multiple different types of generative AI applications. The most common foundation models today are large language models (LLMs), created for text generation applications, but there are also foundation models for image generation, video generation, and sound and music generation—as well as multimodal foundation models that can support several kinds of content generation.
Training a foundation model means feeding it huge volumes of raw, unstructured data and having it learn, in a self-supervised way, to predict missing or next elements of that data. The result of this training is a neural network of parameters—encoded representations of the entities, patterns and relationships in the data—that can generate content autonomously in response to inputs, or prompts.
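To make this concrete, here is a minimal, illustrative sketch (in PyTorch) of the self-supervised objective behind most LLM training: predicting the next token. The tiny model and random "tokens" are stand-ins for a real transformer and real text.

```python
import torch
import torch.nn as nn

# Minimal sketch of self-supervised next-token training, the objective behind
# most large language models. The tiny model here is illustrative; real
# foundation models are transformer networks with billions of parameters.
vocab_size, embed_dim = 1000, 64
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),   # predict a distribution over the vocabulary
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (8, 33))   # stand-in for tokenized text
inputs, targets = tokens[:, :-1], tokens[:, 1:]  # each token predicts the next one

logits = model(inputs)                           # (batch, seq, vocab)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                  # adjust parameters to reduce error
optimizer.step()
```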
Tuning
Fine-tuning
Fine-tuning involves feeding the model labeled data specific to the intended application, such as the kinds of questions or prompts the application is likely to receive, together with corresponding correct answers in the desired format. Fine-tuning is labor-intensive; developers often outsource the task to companies with large data-labeling workforces.
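As an illustration only, the sketch below shows the mechanics of fine-tuning: training continues from pretrained weights on a small labeled dataset, typically at a much lower learning rate. The "pretrained" layer and the random labeled batch are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

# Fine-tuning continues training from pretrained weights on a small,
# labeled, task-specific dataset (all values here are placeholders).
pretrained_model = nn.Linear(64, 2)    # stands in for a pretrained network

features = torch.randn(16, 64)         # encoded task-specific examples
labels = torch.randint(0, 2, (16,))    # human-supplied labels

# A much lower learning rate than pretraining, so tuning adjusts the model
# gently instead of overwriting what it already knows
optimizer = torch.optim.AdamW(pretrained_model.parameters(), lr=1e-5)
loss = nn.CrossEntropyLoss()(pretrained_model(features), labels)
loss.backward()
optimizer.step()
```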
Reinforcement learning with human feedback (RLHF)
In RLHF, human users respond to generated content with evaluations the model can use to update itself for greater accuracy or relevance. Often, RLHF involves people 'scoring' different outputs in response to the same prompt. But it can be as simple as having people type or talk back to a chatbot or virtual assistant, correcting its output.
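One common way to operationalize those human evaluations is a reward model trained on pairwise preferences. The sketch below shows that step only; the random tensors are placeholders for encoded prompt and response pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of reward modeling, a common first step in RLHF: human raters pick
# the better of two responses, and a model learns to score accordingly.
reward_model = nn.Linear(64, 1)   # maps a response representation to a scalar score

chosen = torch.randn(8, 64)       # responses humans rated higher
rejected = torch.randn(8, 64)     # responses humans rated lower

# Pairwise (Bradley-Terry) loss: push chosen scores above rejected scores
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
loss.backward()
# The trained reward model then guides further tuning of the generator,
# for example with reinforcement learning algorithms such as PPO.
```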
Generation, evaluation, more tuning
Developers and users continually assess the outputs of their generative AI applications and use those evaluations to further tune the model for even greater quality and accuracy.
Generative AI model architectures
Truly generative AI models—deep learning models that can autonomously create content on demand—have evolved over the last dozen years or so. The milestone model architectures during that period include:
- Variational autoencoders (VAEs), which drove breakthroughs in image recognition, natural language processing, and anomaly detection.
- Generative adversarial networks (GANs) and diffusion models, which improved the accuracy of previous applications and enabled some of the first AI solutions for photo-realistic image generation.
- Transformers, the deep learning model architecture behind the foremost foundation models and generative AI solutions today.
Variational autoencoders (VAEs)
An autoencoder is a deep learning model comprising two connected neural networks: One that encodes (or compresses) a huge amount of unstructured, unlabeled training data into parameters, and another that decodes those parameters to reconstruct the content. Technically, autoencoders can generate new content, but they’re more useful for compressing data for storage or transfer, and decompressing it for use, than they are for high-quality content generation.
Introduced in 2013, variational autoencoders (VAEs) can encode data like an autoencoder, but decode multiple new variations of the content. By training a VAE to generate variations toward a particular goal, it can ‘zero in’ on more accurate, higher-fidelity content over time. Early VAE applications included anomaly detection (e.g., medical image analysis) and natural language generation.
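A minimal VAE sketch in PyTorch follows (dimensions chosen for illustration): the encoder produces a distribution over latent codes rather than a single code, and sampling from that distribution before decoding is what yields new variations.

```python
import torch
import torch.nn as nn

# Minimal variational autoencoder: the encoder outputs the mean and
# log-variance of a latent distribution; decoding samples from it.
class VAE(nn.Module):
    def __init__(self, data_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(data_dim, 2 * latent_dim)  # mean and log-variance
        self.decoder = nn.Linear(latent_dim, data_dim)

    def forward(self, x):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return self.decoder(z), mu, logvar

vae = VAE()
recon, mu, logvar = vae(torch.rand(4, 784))
# Training minimizes reconstruction error plus a KL term that keeps the latent
# distribution close to a standard Gaussian; after training, decoding random
# z ~ N(0, I) generates novel samples.
new_sample = vae.decoder(torch.randn(1, 16))
```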
Generative adversarial networks (GANs)
GANs, introduced in 2014, also comprise two neural networks: A generator, which generates new content, and a discriminator, which evaluates the accuracy and quality of the generated data. These adversarial algorithms encourage the model to generate increasingly high-quality outputs.
GANs are commonly used for image and video generation, but can generate high-quality, realistic content across various domains. They've proven particularly successful at tasks such as style transfer (altering the style of an image from, say, a photo to a pencil sketch) and data augmentation (creating new, synthetic data to increase the size and diversity of a training data set).
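The sketch below shows one adversarial training step, with toy linear networks standing in for real generator and discriminator architectures.

```python
import torch
import torch.nn as nn

# Minimal GAN training step: the discriminator learns to separate real data
# from generated data, while the generator learns to fool it.
latent_dim, data_dim = 16, 64
G = nn.Linear(latent_dim, data_dim)   # generator: latent noise -> data
D = nn.Linear(data_dim, 1)            # discriminator: data -> real/fake score
bce = nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(8, data_dim)           # stand-in for real training data
fake = G(torch.randn(8, latent_dim))      # generated data

# Discriminator step: learn to label real as 1 and generated as 0
d_loss = bce(D(real), torch.ones(8, 1)) + bce(D(fake.detach()), torch.zeros(8, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step: learn to make the discriminator score fakes as real
g_loss = bce(D(fake), torch.ones(8, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```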
Diffusion models
Also introduced in 2014, diffusion models work by first adding noise to the training data until it's random and unrecognizable, and then training the algorithm to iteratively reverse that process, removing the noise step by step to reveal a desired output.
Diffusion models take more time to train than VAEs or GANs, but ultimately offer finer-grained control over output, particularly for high-quality image generation. DALL-E, OpenAI's image-generation tool, is driven by a diffusion model.
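Here is a sketch of the diffusion training objective; the noise schedule and the denoiser network are toy stand-ins. A sample is noised by a random amount, and the network learns to predict the noise that was added.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Diffusion training sketch: corrupt clean data with a known amount of noise,
# then train a network to predict that noise. Generation runs the process in
# reverse, from pure noise back to data.
data_dim, steps = 64, 1000
alpha_bar = torch.linspace(0.9999, 0.0001, steps)  # fraction of signal left at step t
denoiser = nn.Linear(data_dim + 1, data_dim)       # toy stand-in for a real network

x0 = torch.randn(8, data_dim)                      # stand-in for clean training data
t = torch.randint(0, steps, (8,))                  # random noise level per sample
noise = torch.randn_like(x0)
a = alpha_bar[t].unsqueeze(1)
x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise       # forward process: add noise

# The denoiser sees the noisy sample plus the (normalized) timestep
pred = denoiser(torch.cat([x_t, t.unsqueeze(1).float() / steps], dim=1))
loss = F.mse_loss(pred, noise)
loss.backward()
```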
Transformers
First documented in a 2017 paper by Ashish Vaswani and others, transformers evolve the encoder-decoder paradigm to enable a big step forward in the way foundation models are trained, and in the quality and range of content they can produce. These models are at the core of most of today's headline-making generative AI tools, including ChatGPT and GPT-4, Copilot, BERT, Bard, and Midjourney, to name a few.
Transformers use a concept called attention—determining and focusing on what's most important about data within a sequence (a minimal version of the computation is sketched after this list)—to:
- process entire sequences of data (e.g., sentences instead of individual words) simultaneously;
- capture the context of the data within the sequence;
- encode the training data into embeddings (vector representations) that capture the data and its context.
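Here is scaled dot-product attention, the core transformer operation, in a minimal PyTorch form; the shapes are illustrative.

```python
import torch

# Scaled dot-product attention: each position scores every other position,
# and the scores weight a mixture of the sequence's values, so the whole
# sequence is processed at once, in context.
def attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # relevance of each position
    weights = torch.softmax(scores, dim=-1)                 # normalized attention weights
    return weights @ v                                      # context-aware mixture of values

seq_len, dim = 10, 64
x = torch.randn(1, seq_len, dim)    # embeddings for a 10-token sequence
out = attention(x, x, x)            # self-attention: the sequence attends to itself
```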
In addition to enabling faster training, transformers excel at natural language processing (NLP) and natural language understanding (NLU), and can generate longer sequences of data—e.g., not just answers to questions, but poems, articles or papers—with greater accuracy and higher quality than other deep generative AI models. Transformer models can also be trained or tuned to use tools—e.g., a spreadsheet application, HTML, a drawing program—to output content in a particular format.
Benefits of generative AI
Enhanced creativity
Gen AI tools can inspire creativity through automated brainstorming—generating multiple novel versions of content. These variations can also serve as starting points or references that help writers, artists, designers, and other creators plow through creative blocks.
Improved (and faster) decision-making
Generative AI excels at analyzing large datasets, identifying patterns and extracting meaningful insights—and then generating hypotheses and recommendations based on those insights to support executives, analysts, researchers, and other professionals in making smarter, data-driven decisions.
Dynamic personalization
In applications like recommendation systems and content creation, generative AI can analyze user preferences and history and generate personalized content in real time, leading to a more tailored and engaging user experience.
Constant availability
Generative AI operates continuously without fatigue, providing around-the-clock availability for tasks like customer support chatbots and automated responses.
What generative AI can create
Text
Generative models, especially those based on transformers, can generate coherent, contextually relevant text—everything from instructions and documentation to brochures, emails, website copy, blogs, articles, reports, papers, and even creative writing. They can also perform repetitive or tedious writing tasks (e.g., drafting summaries of documents or meta descriptions of web pages), freeing writers’ time for more creative, higher-value work.
Images and video
Image generation models such as DALL-E, Midjourney, and Stable Diffusion can create realistic images or original art. They can perform style transfer, image-to-image translation, and other image editing or enhancement tasks. Emerging generative AI video tools can create animations from text prompts and apply special effects to existing videos more quickly and cost-effectively than other methods.
Sound, speech, and music
Generative models can synthesize natural-sounding speech and audio content for voice-enabled AI chatbots, digital assistants, audiobook narration, and other applications. The same technology can generate original music that mimics the structure and sound of professional compositions.
Software code
Generative AI can generate original code, autocomplete code snippets, translate between programming languages, and summarize code functionality. It enables developers to quickly prototype, refactor, and debug applications while offering a natural language interface for coding tasks.
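As one example of that natural language interface, a developer might request code from a hosted model using the OpenAI Python SDK, as sketched below; the model name is illustrative, and other providers' APIs follow a similar pattern.

```python
# Sketch: generating code from a plain-English request via a hosted LLM.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{
        "role": "user",
        "content": "Write a Python function that deduplicates a list while preserving order.",
    }],
)
print(response.choices[0].message.content)
```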
Design and art
Generative AI models can generate unique works of art and assist in graphic design. Applications include dynamic generation of environments, characters or avatars, and special effects for virtual simulations and video games.
Simulations and synthetic data
Generative AI models can be trained to generate synthetic data or synthetic structures based on real or synthetic data. For example, generative AI is applied in drug discovery to generate molecular structures with desired properties, aiding in the design of new pharmaceutical compounds.
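As a simple illustration of the synthetic-data idea, the sketch below fits a classical generative model (a Gaussian mixture, standing in for a deep generative model) to simulated measurements and then samples brand-new records from it.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Fit a generative model to "real" data, then sample synthetic records
# that are statistically similar but not copies of any original row.
real_data = np.random.rand(500, 4)        # stand-in for a real dataset

model = GaussianMixture(n_components=3).fit(real_data)
synthetic_data, _ = model.sample(1000)    # 1,000 new synthetic rows
```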