InfoSphere DataStage Parallel Framework Standard Practices

An IBM Redbooks publication

Note: This publication is now archived. For reference only.

Published on 30 July 2010, updated 12 February 2013

  1. .EPUB (8.6 MB)
  2. .PDF (6.9 MB)

ISBN-10: 0738434477
ISBN-13: 9780738434476
IBM Form #: SG24-7830-00


Authors: Julius Lerm and Paul Christensen

Abstract

In this IBM® Redbooks® publication, we present guidelines for the development of highly efficient and scalable information integration applications with InfoSphere™ DataStage® (DS) parallel jobs.

InfoSphere DataStage is at the core of IBM Information Server, providing components that yield a high degree of freedom. For any particular problem there might be multiple solutions, which tend to be influenced by personal preferences, background, and previous experience. All too often, those solutions yield less than optimal, and non-scalable, implementations.

This book includes a comprehensive, detailed description of the available components and explains how to use them to obtain scalable and efficient solutions for both batch and real-time scenarios.

The advice provided in this document is the result of the combined, proven experience of a number of expert practitioners in the field of high-performance information integration, evolved over several years.

This book is intended for IT architects, Information Management specialists, and Information Integration specialists responsible for delivering cost-effective IBM InfoSphere DataStage performance on all platforms.

Table of Contents

Chapter 1. Data integration with Information Server and DataStage

Chapter 2. Data integration overview

Chapter 3. Standards

Chapter 4. Job parameter and environment variable management

Chapter 5. Development guidelines

Chapter 6. Partitioning and collecting

Chapter 7. Sorting

Chapter 8. File Stage usage

Chapter 9. Transformation languages

Chapter 10. Combining data

Chapter 11. Restructuring data

Chapter 12. Performance tuning job designs

Chapter 13. Database Stage guidelines

Chapter 14. Connector Stage guidelines

Chapter 15. Batch data flow design

Chapter 16. Realtime data flow design

Appendix A. Runtime topologies for distributed transaction jobs

Appendix B. Standard practices summary

Appendix C. DataStage naming reference

Appendix D. Example job template

Appendix E. Understanding the parallel job score

Appendix F. Estimating the size of a parallel dataset

Appendix G. Environment variables reference

Appendix H. DataStage data types

 
