Data is the fuel for generative AI. Vast amounts of data, and the cloud's crucial ability to store and process it at scale, drove the rapid rise of powerful foundation models. If enterprises can corral their scattered data and make it all available, they can fine-tune these models or use retrieval-augmented generation (RAG) to tailor them to their business needs.
However, the relationship between data and AI goes both ways. AI, too, can be used to improve and enhance your data and make it available for analysis.
While companies have invested heavily in data over the past few years, they often find that it hasn't been enough. The rise of AI has drawn attention to the gaps in their data and the difficulties in accessing or interpreting it. Data may be isolated in organisational silos; it could be incomplete or poor in quality, making it difficult to work with.
Below are three examples of using AI to fuel your data rather than vice-versa. Use cases like these may give you quick wins while also generating value from your data asset.
One of the most resource-intensive tasks in any data project, often consuming as much as 60-70% of the effort, is preparing and moving data for analytics, known as the extract, transform, and load (ETL) process. This is why AWS is working toward a zero-ETL future.
Fortunately, generative AI can be used to automatically analyse the source and target data structures and then map one onto the other. AWS' generative AI coding assistant, Amazon Q Developer, can build data integration pipelines using natural language. This not only reduces the time and effort required but also helps maintain consistency across different ETL processes, making ongoing support and maintenance easier.
Enterprises often have both structured (e.g., customer profiles and sales orders) and unstructured (e.g., social media or customer feedback) data held in a variety of data sources, formats, schemas, and types. The Amazon Q data integration in AWS Glue can generate ETL jobs for over 20 common data sources, including PostgreSQL, MySQL, Oracle, Amazon Redshift, Snowflake, Google BigQuery, DynamoDB, MongoDB, and OpenSearch.
With generative AI for ETL and data pipelines, data engineers, analysts, and scientists can spend more time solving business problems and deriving insights from the data and less time laying out the plumbing. It is a generative AI use case that most enterprises can start right away.
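To make the idea concrete, here is a minimal sketch of the kind of pipeline an assistant like Amazon Q Developer might generate from a natural-language request. The schemas and field names (`cust_nm`, `ord_amt`, and so on) are invented for illustration, and an in-memory SQLite table stands in for a real analytics target:

```python
import sqlite3

# Hypothetical source records in a legacy schema; the field names
# here are invented purely for illustration.
source_rows = [
    {"cust_nm": "Alice", "ord_amt": "120.50", "ord_dt": "2024-01-15"},
    {"cust_nm": "Bob", "ord_amt": "75.00", "ord_dt": "2024-01-16"},
]

def transform(row):
    # Map the source schema onto a cleaner target schema,
    # casting the amount from string to float along the way.
    return (row["cust_nm"], float(row["ord_amt"]), row["ord_dt"])

# Load into an analytics-friendly target table (in-memory here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL, order_date TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 (transform(r) for r in source_rows))

total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 195.5
```

The value of generating such code automatically is less in any single pipeline than in keeping dozens of them consistent: the same mapping conventions, casting rules, and error handling across every source system.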
We often speak of democratising data across an organisation, i.e., taking it out of the hands of the specialists and making it available to everyone. Data analysts and data scientists often find themselves burdened with large, complex projects, limiting their ability to deliver daily, actionable insights to everyone. A barrier to democratisation, however, is that not everyone has the skills to work rigorously and creatively with data.
With generative AI, you can interact with your data using conversational queries and natural language without having to wait for someone to build reports and dashboards to find information, reducing time to value. For instance, a retail executive can ask, "What were our top-performing product categories last quarter, and what factors contributed to their success?"
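Under the hood, services like this translate a natural-language question into a structured query over known data. The toy sketch below illustrates the pattern with a simple pattern-matcher standing in for the language model; the table and column names are invented, and a real system would send the question plus the schema to a model rather than match regular expressions:

```python
import re

# Toy stand-in for an LLM: map a natural-language question onto a
# SQL template. Table and column names are hypothetical.
QUERY_TEMPLATES = [
    (re.compile(r"top[- ]performing (\w+)", re.I),
     "SELECT {col}, SUM(revenue) AS total FROM sales "
     "GROUP BY {col} ORDER BY total DESC LIMIT 5"),
]

def question_to_sql(question):
    for pattern, template in QUERY_TEMPLATES:
        match = pattern.search(question)
        if match:
            # Naively treat the matched noun as a column name.
            return template.format(col=match.group(1))
    return None

sql = question_to_sql("What were our top-performing categories last quarter?")
print(sql)
```

The hard parts a production service handles, and this toy does not, are grounding the question in the actual schema, handling ambiguity, and explaining the answer back in plain language.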
Regional supply-chain specialists at BMW Group, a global manufacturer of premium automobiles and motorcycles, have been using the generative AI assistant Amazon Q in QuickSight to swiftly respond to supply chain visibility requests from senior stakeholders, like board members.
Data has the power to influence change, but that requires compelling storytelling. Generative AI can make data easy to work with and enjoyable to use by creating visually appealing documents and presentations that bring the data to life. A side benefit is that it can help people across the organisation become more familiar with the data and its interpretation, making the data useful for more complex AI applications.
As enterprises advance in analytics and AI, many realise they lack the data needed to support their newly envisioned use cases. And acquiring third-party data can be prohibitively expensive. Moreover, in regulated industries like healthcare and financial services, where data privacy and security are paramount, using actual customer data may not be possible. Data required to test edge cases in business processes is often limited.
This is where AI-generated, high-fidelity synthetic data comes into play for testing, training, and innovation. It mimics the statistical properties and patterns of real datasets while preserving privacy and eliminating sensitive information. It can also augment training data for AI models where real data is scarce or sensitive. In addition, executives can use synthetic data for scenario planning, modelling various business situations and testing strategies to mitigate risk.
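The core idea, preserving statistical properties while sharing no real record, can be sketched in a few lines. This is a deliberately simple illustration using a fitted Gaussian; the transaction amounts are made up, and real synthetic-data tools (such as the GANs and variational autoencoders discussed below) capture far richer structure than two summary statistics:

```python
import random
import statistics

random.seed(42)  # make the sketch reproducible

# Pretend these are sensitive transaction amounts we cannot share.
real_amounts = [120.5, 75.0, 310.2, 98.4, 150.0, 220.9, 45.3, 180.7]

# Fit simple summary statistics to the real data...
mu = statistics.mean(real_amounts)
sigma = statistics.stdev(real_amounts)

# ...then sample a synthetic dataset from the fitted distribution.
# No real record is copied, but aggregate behaviour is similar.
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

print(round(statistics.mean(synthetic), 1))
```

A downstream test suite or dashboard built against `synthetic` sees realistic totals and spreads without ever touching the original records.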
Merck, a global pharmaceutical company, uses synthetic data and AWS services to reduce false reject rates in their drug inspection process. The company has reduced its false reject rate by 50% by developing synthetic defect image data with tools like generative adversarial networks (deep learning models that pit two neural networks against each other to generate new synthetic data) and variational autoencoders (generative neural networks that compress data into a compact representation and then reconstruct it, learning to generate new data in the process).
AI-generated synthetic data can unleash innovation and help in creating delightful customer experiences. Amazon One is a fast and convenient service that allows customers to make payments, present their loyalty card, verify their age, and enter a venue using only their palm.
AWS needed a large dataset of palm images to train the system, including variations in lighting, hand poses, and conditions like the presence of a bandage. The team even trained the system to detect highly detailed silicone hand replicas using AI-generated synthetic data. Customers have already used Amazon One more than three million times with 99.9999% accuracy.
These three examples demonstrate how generative AI can unlock the potential of data, extracting value more quickly and delivering tangible wins. From automating tedious data integration tasks to empowering business users with conversational analytics, generative AI can help teams work smarter, not harder. And by generating synthetic data for testing and innovation, enterprises can fuel new ideas and capabilities that were previously out of reach. The key is to view not just your data as the fuel for generative AI, but also generative AI as a powerful new tool you can apply to your data.