IBM® watsonx.data empowers enterprises to scale their analytics and AI capabilities with a purpose-built data store, leveraging an open lakehouse architecture. Through its robust querying, governance, and open data formats, IBM watsonx.data facilitates seamless data access and sharing. With IBM watsonx.data, you can swiftly connect to data, extract actionable insights, and optimize data warehouse expenses.
IBM Storage Ceph offers high-performance, scalable object storage that can handle large datasets efficiently. IBM watsonx.data and IBM Storage Ceph can be a powerful combination for building a scalable and cost-effective data lakehouse solution.
This IBM Redbooks® publication describes how the storage capabilities of IBM Storage Ceph and the analytics features of IBM watsonx.data combine to form a data and AI platform. This platform helps you extract insights from your data and make data-driven decisions.
Using a real-world scenario involving an S3 data lake, you can explore each stage of the data pipeline - ingestion, transformation, and consumption - with step-by-step instructions and hands-on examples. All code samples created for the book's scenarios are available on the Redbooks GitHub repository.
The target audience for this book is data architects, data analysts, data engineers, and data scientists who are building AI and data lakehouse solutions.
For more information about the architecture and technology behind IBM Storage Ceph, see the IBM Redpaper IBM Storage Ceph Concepts and Architecture Guide, REDP-5721.
Chapter 1. Introduction
Chapter 2. The modern data lake architecture
Chapter 3. Building a scalable data lake with Ceph Object Storage
Chapter 4. Replacing Hadoop Distributed File System (HDFS) with IBM Storage Ceph Object Storage
Chapter 5. IBM Storage Ceph with IBM watsonx.data
Chapter 6. Retail use case
Chapter 7. Ingest: Landing and raw zones
Chapter 8. Transform: Staging and curated zones
Chapter 9. Consume
Appendix A. Configuring RADOS Gateway using the IBM Storage Ceph graphical user interface
Appendix B. Configuring the command line tools for watsonx.data
The material included in this document is in DRAFT form and is provided 'as is' without warranty of any kind. IBM is not responsible for the accuracy or completeness of the material, and may update the document at any time. The final, published document may not include any, or all, of the material included herein. Client assumes all risks associated with Client's use of this document.