Unlocking Data Insights and AI: IBM Storage Ceph as a Data Lakehouse Platform for IBM watsonx.data and Beyond

An IBM Redbooks publication

Published on 03 July 2024, updated 03 July 2024

ISBN-10: 0738461539
ISBN-13: 9780738461533
IBM Form #: SG24-8563-00


Authors: Daniel Parkes, Daniel Dominguez Cuadrado, Franck Malterre, Jean-Charles (JC) Lopez, Jun Liu, Kyle Bader, Poyraz Sagtekin, Tushar Agrawal and Vasfi Gucer

    Abstract

    IBM watsonx.data empowers enterprises to scale their analytics and AI capabilities with a purpose-built data store, leveraging an open lakehouse architecture. Through its robust querying, governance, and open data formats, IBM watsonx.data facilitates seamless data access and sharing. With IBM watsonx.data, you can swiftly connect to data, extract actionable insights, and optimize data warehouse expenses.

    IBM Storage Ceph offers high-performance, scalable object storage that can handle large datasets efficiently. IBM watsonx.data and IBM Storage Ceph can be a powerful combination for building a scalable and cost-effective data lakehouse solution.

    This IBM Redbooks® publication describes how the robust storage capabilities of IBM Storage Ceph and the advanced analytics features of IBM watsonx.data combine to form a data and AI platform. With this platform, you can extract insights from your data and make data-driven decisions.

    Using a real-world scenario involving an Amazon Simple Storage Service (S3) data lake, you can explore each stage of the data pipeline (ingestion, transformation, and consumption) with step-by-step instructions and hands-on examples. All code samples that are created for the book's scenarios are available at the Redbooks GitHub repository.

    The target audience for this book is data architects, data analysts, data engineers, and data scientists working on creating AI and data lakehouse solutions.

    Table of Contents

    Chapter 1. Introduction

    Chapter 2. The modern data lake architecture

    Chapter 3. Building a scalable data lake with Ceph Object Storage

    Chapter 4. Replacing Hadoop Distributed File System with IBM Storage Ceph Object Storage

    Chapter 5. IBM Storage Ceph with IBM watsonx.data

    Chapter 6. Retail use case

    Chapter 7. Ingest: Landing and raw zones

    Chapter 8. Transform: Staging and curated zones

    Chapter 9. Consume

    Appendix A. Configuring RADOS Gateway by using the IBM Storage Ceph GUI

    Appendix B. Configuring the command-line tools for IBM watsonx.data

    Appendix C. Additional material