Published on 03 July 2024, updated 03 July 2024
ISBN-10: 0738461539
ISBN-13: 9780738461533
IBM Form #: SG24-8563-00
Authors: Daniel Parkes, Daniel Dominguez Cuadrado, Franck Malterre, Jean-Charles (JC) Lopez, Jun Liu, Kyle Bader, Poyraz Sagtekin, Tushar Agrawal and Vasfi Gucer
IBM watsonx.data empowers enterprises to scale their analytics and AI capabilities with a purpose-built data store that is based on an open lakehouse architecture. Through its robust querying, governance, and support for open data formats, IBM watsonx.data facilitates seamless data access and sharing. With IBM watsonx.data, you can quickly connect to data, extract actionable insights, and reduce data warehouse costs.
IBM Storage Ceph offers high-performance, scalable object storage that can handle large datasets efficiently. IBM watsonx.data and IBM Storage Ceph can be a powerful combination for building a scalable and cost-effective data lakehouse solution.
This IBM Redbooks® publication explains how the robust storage capabilities of IBM Storage Ceph and the advanced analytics features of IBM watsonx.data come together to form a powerful data and AI platform. This platform helps you unlock valuable insights from your data and make data-driven decisions.
Through a real-world scenario that involves an Amazon Simple Storage Service (S3) data lake, you can explore each stage of the data pipeline (ingestion, transformation, and consumption) with step-by-step instructions and hands-on examples. All code samples created for the book's scenarios are available in the Redbooks GitHub repository.
The target audience for this book is data architects, data analysts, data engineers, and data scientists who build AI and data lakehouse solutions.
Chapter 1. Introduction
Chapter 2. The modern data lake architecture
Chapter 3. Building a scalable data lake with Ceph Object Storage
Chapter 4. Replacing Hadoop Distributed File System with IBM Storage Ceph Object Storage
Chapter 5. IBM Storage Ceph with IBM watsonx.data
Chapter 6. Retail use case
Chapter 7. Ingest: Landing and raw zones
Chapter 8. Transform: Staging and curated zones
Chapter 9. Consume
Appendix A. Configuring RADOS Gateway by using the IBM Storage Ceph GUI
Appendix B. Configuring the command-line tools for IBM watsonx.data
Appendix C. Additional material