Unlocking Data Insights and AI: IBM Storage Ceph as a Data Lakehouse Platform for IBM watsonx.data and Beyond

A draft IBM Redbooks publication

Last updated on 03 June 2024

IBM Form #: SG24-8563-00


Authors: Daniel Parkes, Daniel Dominguez Cuadrado, Franck Malterre, Jean-Charles (JC) Lopez, Jun Liu, Kyle Bader, Poyraz Sagtekin, Tushar Agrawal and Vasfi Gucer

    Abstract

    IBM® watsonx.data empowers enterprises to scale their analytics and AI capabilities with a purpose-built data store, leveraging an open lakehouse architecture. Through its robust querying, governance, and open data formats, IBM watsonx.data facilitates seamless data access and sharing. With IBM watsonx.data, you can swiftly connect to data, extract actionable insights, and optimize data warehouse expenses.

    IBM Storage Ceph offers high-performance, scalable object storage that can handle large datasets efficiently. IBM watsonx.data and IBM Storage Ceph can be a powerful combination for building a scalable and cost-effective data lakehouse solution.

    This IBM Redbooks® publication dives into how IBM Storage Ceph's robust storage capabilities and IBM watsonx.data's advanced analytics features come together to form a powerful data and AI platform. This platform empowers you to unlock valuable insights from your data and make data-driven decisions.

    Using a real-world scenario involving an S3 data lake, you can explore each stage of the data pipeline (ingestion, transformation, and consumption) with step-by-step instructions and hands-on examples. All code samples created for the book's scenarios are available on the Redbooks GitHub repository.

    The target audience for this book is data architects, data analysts, data engineers, and data scientists who are building AI and data lakehouse solutions.

    For more information about the architecture and technology behind IBM Storage Ceph, see the IBM Redpaper IBM Storage Ceph Concepts and Architecture Guide, REDP-5721.

    Table of Contents

    Chapter 1. Introduction

    Chapter 2. The modern data lake architecture

    Chapter 3. Building a scalable data lake with Ceph Object Storage

    Chapter 4. Replacing Hadoop Distributed File System (HDFS) with IBM Storage Ceph Object Storage

    Chapter 5. IBM Storage Ceph with IBM watsonx.data

    Chapter 6. Retail use case

    Chapter 7. Ingest: Landing and raw zones

    Chapter 8. Transform: Staging and curated zones

    Chapter 9. Consume

    Appendix A. Configuring RADOS Gateway using the IBM Storage Ceph graphical user interface

    Appendix B. Configuring the command line tools for watsonx.data

     

    Special Notices

    The material included in this document is in DRAFT form and is provided 'as is' without warranty of any kind. IBM is not responsible for the accuracy or completeness of the material, and may update the document at any time. The final, published document may not include any, or all, of the material included herein. Client assumes all risks associated with Client's use of this document.