From the course: AI Data Strategy: Data Procurement and Storage
Best practices for sourcing unstructured data
- [Instructor] In the last video, you saw how to source structured data for ML models, but generative AI brings us into a whole new world, one where we need vast amounts of unstructured data like text, images, and code. Apart from the sheer volume, the challenge with generative AI is finding the diverse, high-quality data your models need to learn from and, consequently, to generate meaningful content. Teams often think they need to collect every bit of data they can find, but having the biggest dataset doesn't guarantee success. What matters is having the right kind of data diversity. Think about a large language model like GPT. It isn't trained only on perfectly written books and articles. It must also understand how people actually communicate, everything from formal documents to casual conversations, and from technical manuals to creative writing. This diversity is what allows the model to generate appropriate responses in different contexts. Remember the star framework we discussed…
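The point that the biggest dataset isn't automatically the best one can be made concrete with a quick check of how evenly a corpus covers different kinds of communication. The sketch below is a minimal illustration, not a method from this course. It assumes each document already carries a register label, and the four labels (`formal`, `casual`, `technical`, `creative`) are a hypothetical taxonomy based on the examples in this lesson. As one possible evenness measure, it uses Shannon entropy normalized by the number of target registers. `diversity_report` and `TARGET_REGISTERS` are illustrative names.

```python
import math
from collections import Counter

# Hypothetical register taxonomy, based on the examples in this lesson.
TARGET_REGISTERS = ("formal", "casual", "technical", "creative")


def diversity_report(labels, targets=TARGET_REGISTERS):
    """Summarize how evenly a corpus covers the target registers.

    Returns each register's share, a normalized Shannon entropy
    ("evenness": 0 = a single register, 1 = perfectly balanced),
    and the registers with no documents at all.
    Labels outside `targets` are ignored.
    """
    if len(targets) < 2:
        raise ValueError("need at least two target registers")
    counts = Counter(labels)
    total = sum(counts.get(t, 0) for t in targets)
    if total == 0:
        raise ValueError("no documents match the target registers")

    shares = {t: counts.get(t, 0) / total for t in targets}
    # Shannon entropy over the target registers (empty registers contribute 0).
    entropy = -sum(p * math.log(p) for p in shares.values() if p > 0)
    evenness = entropy / math.log(len(targets))
    missing = [t for t in targets if counts.get(t, 0) == 0]
    return {"shares": shares, "evenness": evenness, "missing": missing}


if __name__ == "__main__":
    # A large corpus dominated by formal text vs. a smaller, balanced one.
    big = ["formal"] * 9000 + ["technical"] * 1000
    small = ["formal", "casual", "technical", "creative"] * 500
    print(diversity_report(big))
    print(diversity_report(small))
```

In this toy comparison, the 10,000-document corpus scores an evenness of roughly 0.23 and has no casual or creative text at all, while the 2,000-document balanced corpus scores near 1.0. Real projects would need a much richer taxonomy and a reliable way to label documents, which this sketch takes for granted.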
Contents
- Sourcing structured data for ML-driven AI products (6m 50s)
- Best practices for sourcing unstructured data (4m 32s)
- Understanding bias in traditional ML systems (6m 42s)
- Bias in generative AI: Challenges and mitigation strategies (6m 19s)
- Framework for bias mitigation in AI (4m 2s)
- Building intelligent systems with data protection (5m 13s)
- Open data platforms: Democratizing AI development (5m 1s)
- Leveraging APIs for AI (6m 45s)
- Building sustainable data ecosystems (5m 3s)