Data Management and IBM IIS Tools

IIS Tool Suite Overview

According to a study by a leading market research and advisory company, the data generated in the past two years is many times more than what was generated in the two decades before. It has not just multiplied; it has also become more complex and varied, and it is being generated at a much faster rate than ever. These factors present a data integration challenge to industries and businesses that want to use their data to build strategies, provide services and introduce policy regulations, so that the business can narrow or completely close the gap between data and analytics.

IIS Tools Suite

Interpreting the Suite

Following are the components listed in the suite:

  1. Metadata Asset Manager comes with a built-in staging area, which is a separate schema within the metadata repository. It allows you to analyze the imported contents, fix any duplicate metadata or identity issues, and then reimport. It also allows you to preview imports before they are shared to the repository, giving you complete confidence in the enterprise metadata that is shared.
  2. Metadata Asset Manager can be accessed only by users with the appropriate metadata user privileges.

InfoSphere Information Analyzer

This tool helps you understand the quality of your data, which in turn helps you gain confidence in it. If your organization has a data integration solution in place but still sees anomalies that lower data quality and cause more and more exceptions, it is possible that the business rules and transformation rules were framed without a complete understanding of the data. This is where Information Analyzer (IA) is useful.

  1. Can perform Column Analysis, Primary Key Analysis, Natural Key Analysis, Foreign Key Analysis, Cross-Domain Analysis and Baseline Analysis. I will cover each of these types of analysis in detail in another blog, or you may reach out to us for a demo on request.
  2. Can generate various types of built-in reports based on the analyses performed. These reports can be generated in different formats, such as HTML, PDF and Excel, and support different languages.
  3. The rules created in IA can be used directly in DataStage jobs.
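To make the first point more concrete, here is a minimal sketch of the kind of statistics a column analysis and a primary key analysis collect. It is plain Python, not IA's own API; the function names `profile_column` and `is_key_candidate` and the sample rows are hypothetical and exist only for illustration.

```python
from collections import Counter


def infer_type(value):
    """Classify a string value as integer, decimal or string (illustrative only)."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        return "string"


def profile_column(values):
    """Summarize one column: row count, nulls, distinct values, inferred types."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "inferred_types": dict(Counter(infer_type(v) for v in non_null)),
    }


def is_key_candidate(values):
    """A column can serve as a key only if it has no nulls and no duplicates."""
    non_null = [v for v in values if v not in (None, "")]
    return len(non_null) == len(values) and len(set(values)) == len(values)


# Hypothetical sample data
rows = [
    {"customer_id": "101", "zip": "10001"},
    {"customer_id": "102", "zip": ""},
    {"customer_id": "103", "zip": "1000A"},
    {"customer_id": "103", "zip": "10001"},
]
columns = {name: [r[name] for r in rows] for name in rows[0]}
profiles = {name: profile_column(vals) for name, vals in columns.items()}
```

In this sample, the `zip` profile shows one null and one value that is not an integer, and `customer_id` fails the key check because of a duplicate. These are the kinds of findings that should shape business and transformation rules before they are written.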

InfoSphere FastTrack

This is the most underrated tool of the suite, but used properly it can accelerate an integration project by reducing the time IT takes to develop code that implements a basic level of business checks and transformations. FastTrack eliminates the tedious copying and pasting from the various documents that are maintained to provide specifications to a developer. It links this varied information and makes it available in the IIS metadata repository rather than being shared in Excel spreadsheets.

  1. Considerably reduces the time and effort needed to maintain different versions of source-to-target mappings
  2. Acts as a self-service platform that lets users apply Excel-like logic and automatically create DataStage jobs without DataStage coding skills
  3. The jobs created can act as a starting point or as a template for a DataStage developer, defining how to build applications
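The idea of a source-to-target mapping that drives generated code can be sketched in a few lines. This is only an illustration in Python, not how FastTrack stores or executes mappings; the `MAPPING` structure, the column names and the `apply_mapping` helper are all assumptions made for this example.

```python
# Hypothetical source-to-target mapping specification: each entry names a
# target column, the source columns it reads, and a simple transformation rule.
MAPPING = [
    {
        "target": "CUSTOMER_NAME",
        "sources": ["first_name", "last_name"],
        "rule": lambda first, last: f"{first.strip()} {last.strip()}".upper(),
    },
    {
        "target": "COUNTRY_CODE",
        "sources": ["country"],
        "rule": lambda country: country.strip().upper()[:2],
    },
]


def apply_mapping(record, mapping):
    """Build one target record by applying every mapping rule to a source record."""
    return {
        entry["target"]: entry["rule"](*(record[s] for s in entry["sources"]))
        for entry in mapping
    }


source = {"first_name": " Ada ", "last_name": "Lovelace", "country": "gb "}
target = apply_mapping(source, MAPPING)
```

Because the specification is data rather than hand-written code, a new version of the mapping only changes `MAPPING`, which is the maintenance benefit described in the first point above.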

InfoSphere DataStage and QualityStage

This is one of the most powerful integration platforms. Its secret sauce is the parallelism and pipelining framework, which can run either on-premises or in the cloud. It is highly flexible and scalable and provides integration of heterogeneous sources, including Hadoop and stream-based sources, on both distributed and mainframe platforms. It provides end-to-end integration quickly and enables you to cleanse, transform, monitor and deliver data anytime, anywhere.

  1. Packs a unique feature, "DataStage Balanced Optimization", which helps you fully harness available resources by pushing processing down to your relational databases.
  2. Supports direct integration with Amazon S3 storage to move data to and from the cloud. You can also integrate seamlessly with REST applications, web services, and XML and JSON messages.
  3. Provides a common client tool for integration (DataStage) and data enrichment (QualityStage), helping your organization integrate, match, standardize and cleanse data and govern data quality exceptions.
  4. Offers end-to-end integration with the other suite tools, leveraging shared metadata content and a unified installation and deployment.
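The pipelining and partitioning ideas behind the engine can be sketched with Python generators. This is a single-process illustration of the concepts only, not DataStage code: in a real parallel job the stages and partitions run as separate processes. The stage names and sample rows are hypothetical.

```python
def read_rows(rows):
    """Source stage: emit rows one at a time."""
    for row in rows:
        yield row


def cleanse(rows):
    """Transform stage: standardize the name as soon as a row arrives."""
    for row in rows:
        yield {**row, "name": row["name"].strip().title()}


def keep_valid(rows):
    """Filter stage: drop rows with a negative amount."""
    for row in rows:
        if row["amount"] >= 0:
            yield row


def partition(rows, n, key):
    """Split rows into n partitions by key modulo n (stand-in for hash partitioning)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[key] % n].append(row)
    return parts


data = [
    {"id": 1, "name": " alice ", "amount": 10},
    {"id": 2, "name": "BOB", "amount": -5},
    {"id": 3, "name": "carol", "amount": 7},
    {"id": 4, "name": "dave ", "amount": 3},
]

# Pipelining: each row flows through all stages without waiting for the
# upstream stage to finish the whole data set.
pipeline = keep_valid(cleanse(read_rows(data)))

# Partitioning: rows are spread over partitions that could be processed in parallel.
parts = partition(pipeline, 2, "id")
```

No stage materializes the full data set before handing rows on, which is the essence of pipelining; partitioning is what lets the same logic scale across more processors.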

InfoSphere Information Governance Catalog

Referred to as IGC for short, this tool is the starting point of any enterprise-level data analysis and integration project. IGC is a browser-based tool that gives an organization a wide range of functionality for understanding and governing its information.

  1. It allows you to attach rules created in IA to the assets in IGC, thereby enhancing the governance process.
  2. It follows a workflow-like process for creating, revising and publishing changes. All users are therefore aware of the changes, and data specialists as well as business analysts can review them and either publish or reject them.
  3. With the separately licensed "InfoSphere Glossary Anywhere" client, you can browse glossary content from any desktop without logging in to IGC. This tool lets you search the glossary for single words or phrases from any text-based document opened on a Windows desktop. The license is bundled with a REST API that has the same search capabilities.

Information Governance Dashboard

IGD is used to evaluate, assess and monitor the governance policies and rules assigned in IGC. You can also query and visualize the various types of metadata cataloged by IIS products. The dashboard provides elements such as SQL views, a set of predefined Cognos reports and workspaces, and a Cognos Framework Manager model that you can use to understand the classes and relationships that contribute to a report.

  1. The included Cognos reports can be used to assess governance progress and can be customized
  2. You can monitor exceptions by following links into the Data Quality Exception Console and IBM Stewardship Center to review and process exception sets and correct the exception records

Information Services Director

ISD allows users to rapidly deploy their Information Server logic as services, which are deployed locally in your application server. For example, suppose you have created a DataStage job that integrates data from various source systems, and an external application now wants to consume that data for order entry. ISD makes this easy to achieve.

  1. It acts as an abstraction and encapsulation layer that hides the complexity of the implementation from the consumer of the published data
  2. Once enabled, the integration service can be invoked through a binding protocol such as web service (SOAP), REST or RSS.
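From the consumer's side, calling such a service over a REST binding is an ordinary HTTP request. The sketch below only builds the request with Python's standard library and does not send it. The base URL, the operation name `lookupCustomer` and the payload shape are placeholders invented for this example; the real endpoint depends on how the service is deployed.

```python
import json
import urllib.request


def build_service_request(base_url, operation, payload):
    """Prepare a JSON POST request for a deployed service operation."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{operation}",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and operation, for illustration only.
request = build_service_request(
    "https://isd.example.com/services/orders",
    "lookupCustomer",
    {"customerId": 101},
)

# Sending it would look like this (not executed here):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```

The consuming application only needs the URL and the payload contract; it does not need to know that a DataStage job sits behind the service.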

InfoSphere Data Click

Building a data lake is an innovative way to use organizational data for better reporting and advanced analytics. Organizations that are trying to build a data lake (which is built on Hadoop or another big data platform) usually adopt a hybrid system that combines Hadoop and relational databases. This requires some of the data or systems to be migrated to the Hadoop system, and this is where Data Click can help your organization. Data Click is a self-service tool that provides on-demand data integration; it is a separately installed add-on component of InfoSphere BigInsights.

  1. If the target is a Hadoop system, Data Click automatically creates a Hive table in the target directory for each source table.
  2. Data Click can also be used for relational database migration projects, for example from Oracle to SQL Server.
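To give a feel for what "creating a Hive table for each source table" involves, here is a minimal sketch that turns a source table description into a Hive DDL statement. It is not Data Click's implementation: the type mapping in `TYPE_MAP`, the storage format and the `hive_ddl` helper are assumptions chosen for illustration.

```python
# Assumed mapping from relational source types to Hive types (illustrative only).
TYPE_MAP = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "NUMBER": "DOUBLE",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def hive_ddl(table, columns, location):
    """Generate a CREATE EXTERNAL TABLE statement for one source table.

    columns is a list of (column_name, source_type) pairs; unknown source
    types fall back to STRING.
    """
    cols = ",\n  ".join(
        f"`{name}` {TYPE_MAP.get(src_type.upper(), 'STRING')}"
        for name, src_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}/{table}'"
    )


# Hypothetical source table
ddl = hive_ddl(
    "customers",
    [("id", "integer"), ("name", "varchar"), ("signup", "date")],
    "/data/landing",
)
```

Printing `ddl` gives a statement that can be run in Hive once the data files have been copied to the target directory.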

IBM Stewardship Center and Data Quality Exception Console

IBM Stewardship Center is a browser-based tool that leverages the strength of IBM Business Process Manager (BPM). Stewardship Center provides a process designer view for configuring the BPM workflows. Data stewards are increasingly responsible for improving data quality so that their data assets deliver value. Using Stewardship Center, data stewards can address data quality and governance challenges.

  1. Stewardship Center is coupled with IGC and IA and includes pre-built workflows that notify stewards of any asset activity and allow them to approve or reject rule changes
  2. The Data Quality Exception Console displays the exceptions and exception records generated by any non-compliance with a rule in IA. All projects, jobs or rules that generate exceptions can send them to the console
  3. You can manage the priority of quality issues in the exception console and manage the exception sets with different BPM process applications in Stewardship Center
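The flow from a data rule to a prioritized list of exception records can be sketched as follows. This is a plain Python illustration, not the console's data model: the `ExceptionRecord` class, the two rules and their priorities are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExceptionRecord:
    rule: str       # name of the rule that was violated
    priority: int   # 1 is the most urgent
    record: dict    # the offending data record


# Hypothetical data rules: (name, priority, check that returns True when compliant)
RULES = [
    ("email_present", 1, lambda r: bool(r.get("email"))),
    ("age_in_range", 2, lambda r: 0 <= r.get("age", -1) <= 120),
]


def evaluate(records, rules):
    """Return one exception per rule violation, most urgent first."""
    exceptions = [
        ExceptionRecord(name, priority, record)
        for record in records
        for name, priority, check in rules
        if not check(record)
    ]
    return sorted(exceptions, key=lambda e: e.priority)


records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 150},
    {"id": 3, "email": "c@example.com", "age": -4},
]
exceptions = evaluate(records, RULES)
```

Here record 2 violates both rules and record 3 violates one, so a steward would see three exception records with the missing email at the top of the list.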

Conclusion

IBM InfoSphere Information Server offers the tools above to build data integration and management services for large-scale enterprises. These tools help you understand, cleanse, transform and deliver unified, trusted, enterprise-wide data to your critical business initiatives.
