
Unlocking Data Insights and AI: IBM Storage Ceph as a Data Lakehouse Platform for IBM watsonx.data and Beyond

An IBM Redbooks publication


Published on 03 July 2024, updated 03 July 2024

Format: PDF (7.0 MB)

ISBN-10: 0738461539
ISBN-13: 9780738461533
IBM Form #: SG24-8563-00


Authors: Daniel Parkes, Daniel Dominguez Cuadrado, Franck Malterre, Jean-Charles (JC) Lopez, Jun Liu, Kyle Bader, Poyraz Sagtekin, Tushar Agrawal, and Vasfi Gucer


Abstract

IBM watsonx.data enables enterprises to scale their analytics and AI capabilities with a purpose-built data store based on an open lakehouse architecture. Through its robust querying, governance, and support for open data formats, IBM watsonx.data facilitates seamless data access and sharing. With IBM watsonx.data, you can quickly connect to data, extract actionable insights, and reduce data warehouse costs.

IBM Storage Ceph offers high-performance, scalable object storage that handles large datasets efficiently. Together, IBM watsonx.data and IBM Storage Ceph form a powerful combination for building a scalable and cost-effective data lakehouse solution.

This IBM Redbooks® publication explores how the robust storage capabilities of IBM Storage Ceph and the advanced analytics features of IBM watsonx.data combine to form a powerful data and AI platform. This platform helps you unlock valuable insights from your data and make data-driven decisions.

Through a real-world scenario involving an Amazon Simple Storage Service (S3) data lake, you can explore each stage of the data pipeline (ingestion, transformation, and consumption) with step-by-step instructions and hands-on examples. All code samples created for the book's scenarios are available in the Redbooks GitHub repository.

The target audience for this book is data architects, data analysts, data engineers, and data scientists who build AI and data lakehouse solutions.

Table of Contents

Chapter 1. Introduction

Chapter 2. The modern data lake architecture

Chapter 3. Building a scalable data lake with Ceph Object Storage

Chapter 4. Replacing Hadoop Distributed File System with IBM Storage Ceph Object Storage

Chapter 5. IBM Storage Ceph with IBM watsonx.data

Chapter 6. Retail use case

Chapter 7. Ingest: Landing and raw zones

Chapter 8. Transform: Staging and curated zones

Chapter 9. Consume

Appendix A. Configuring RADOS Gateway by using the IBM Storage Ceph GUI

Appendix B. Configuring the command-line tools for IBM watsonx.data

Appendix C. Additional material