Informatica: Normalizer transformation

Chapter 16: Normalizer Transformation:

> Transformation type: Active and connected.

> The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. Thus it takes normalized data and produces normalized data in the output. Thus number of rows in the output generally increases and this makes it an active transformation.

> When we say multiple occurring, it could be single column occurring mulitple times or a group of columns occurring multiple times.

> Normalizer can be used to deal with

multiple-occurring columns and
multiple record types created using redefines.

> The Normalizer transformation parses multiple-occurring columns. They can come due to Occurs clause in cobol file layout or denormalized data from relational source.

For ex: You might have a relational table that stores four quarters of sales by store.

The following source rows contain four quarters of sales by store:

Store1 100 300 500 700

Store2 250 450 650 850

The Normalizer returns a row for each store and sales combination. It also returns an index that identifies the quarter number:

Store1 100 1

Store1 300 2

Store1 500 3

Store1 700 4

Store2 250 1

Store2 450 2

Store2 650 3

Store2 850 4

> For each source row the integration service generates a key for each source row.

The Integration Service increments the generated key sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence. But each of the row will have the same sequence number. This is called Generated key

> In the output all the single occurring columns in the input, duplicates are created.

Store1 and Store2 getting repeated in the output.

> Types of normalizer transformation:

♦ VSAM Normalizer transformation. A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source column through one input port.

The normalizer columns are created from the COBOL source.

♦ Pipeline Normalizer transformation. A transformation that processes multiple-occurring data from relational tables or flat files. The pipeline Normalizer transformation represents multiple-occurring columns with one input port for each source column occurrence.

The normalizer columns are to be created manually.

Normalizer transformation tabs:

♦ Transformation. Enter the name and description of the transformation. The naming convention for an Normalizer transformation is NRM_TransformationName.

♦ Ports. View the transformation ports and attributes.

♦ Properties.

♦ Normalizer. Define the structure of the source data. The Normalizer tab defines source data as columns and groups of columns.

♦ Metadata Extensions.

1) PORTS tab:

With normalizer we configure columns in the normalizer tab and the designer creates the ports which we can view on the ports tab.

Pipeline and VSAM Normalizer transformations represent multiple-occurring source columns differently.

> A VSAM Normalizer transformation has one input port for a multiple-occurring column.

> A pipeline Normalizer transformation has multiple input ports for a multiple-occurring column.

When a source column is multiple-occurring, the pipeline and VSAM Normalizer transformations have one output port for the column. The transformation returns a row for each source column occurrence.

The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column.

The generated column ID is an index for the instance of the multiple-occurring data. For example, if a column occurs four times in a source record, the Normalizer returns a value of 1, 2, 3, or 4 in the generated column ID based on which instance of the multiple-occurring data occurs in the row.

The naming convention for the Normalizer generated column ID is GCID_< o ccuring_field_name >.

The Normalizer transformation has at least one generated key port. The Integration Service increments the generated key sequence number each time it processes a source row.

Changing ports: You can change the ports on a pipeline Normalizer transformation by editing the columns on the Normalizer tab. To change a VSAM Normalizer transformation, you need to change the COBOL source and recreate the transformation. This means the ports for the cobol source normalizer are read only.

2) PROPERTIES port:

We generally configure following normalizer transformation properties:

Reset: At the end of a session, resets the value sequence for each generated key

value to the value it was before the session.

Restart: Starts the generated key sequence at 1. Each time you run a session, the key sequence value starts at 1 and overrides the sequence value on the

Ports tab.

Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation.

3) Normalizer Tab:

The Normalizer tab defines the structure of the source data. The Normalizer tab defines source data as columns and groups of columns.

The column level number identifies groups of columns in the data. Level numbers define a data hierarchy.

Quarterly_Data is a group-level column. It is Level 1. The Quarterly_Data group occurs four times in each row. Sales_by_Quarter and Returns_by_Quarter are Level 2 columns and belong to the group.

Each column has an Occurs attribute. The Occurs attribute identifies columns or groups of columns that occur more than once in a source row.

Editing normalizer columns: When you create a pipeline Normalizer transformation, you can edit the columns. When you create a VSAM Normalizer transformation, the Normalizer tab is read-only.

Level Group columns: Columns in the same group occur beneath a column with a lower level number. When each column is the same level, the transformation contains no column groups.

Occurs: The number of instances of a column or group of columns in the source row.

Normalizer Transformation Generated Keys:

> The Normalizer transformation returns at least one generated key column (note this is different than GCID column) in the output row. The Integration Service increments the generated key sequence number each time it processes a source row.

> When you create a Normalizer transformation, the generated key value is 1 by default. The naming convention for the Normalizer generated key is GK_< redefined_field_name >.

> You can view the current generated key values on the Normalizer transformation Ports tab. At the end of each session, the Integration Service updates the generated key value in the Normalizer transformation to the last

value generated for the session plus one. The maximum generated key value is 9,223,372,036,854,775,807.

> You cannot change the current generated key values on the Normalizer transformation Ports tab.

How to change the Generated Key Values ?

We cannot directly edit the value of the GK, however we can change the generated key value in the following ways:

♦ Reset the generated key sequence: When you reset the generated key sequence, the Integration Service resets the generated key start value back to the value it was before the session. Can be set on the properties tab.

♦ Restart the generated key sequence: When you restart the generated key sequence, the Integration Service starts the generated key sequence at 1 the next time it runs a session.

When you restart the generated key sequence, the generated key start value does not change in the Normalizer transformation until you run a session. When you run the session, the Integration Service overrides the sequence number value on the Ports tab.

When you reset or restart the generated key sequence, the reset or restart affects the generated key sequence values the next time you run a session. You do not change the current generated key sequence values in the

Normalizer transformation. When you reset or restart the generated key sequence, the option is enabled for every session until you disable the option.

VSAM Normalizer Transformation: COBOL source

> The VSAM Normalizer transformation is basically the source qualifier for a COBOL source definition.

> A COBOL source is a flat file that can contain multiple-occurring data and multiple types of records in the same file.

A COBOL source definition can have an OCCURS statement that defines a multiple-occurring column.

The COBOL source definition can also contain a REDEFINES statement to define more than one type of record in the file.

> Assume we have a file structure as follows:

01 SALES_RECORD.

03 HDR_DATA.

05 HDR_REC_TYPE PIC X.

05 HDR_STORE PIC X(02).

03 STORE_DATA.

05 STORE_NAME PIC X(30).

05 STORE_ADDR1 PIC X(30).

05 STORE_CITY PIC X(30).

03 DETAIL_DATA REDEFINES STORE_DATA.

05 DETAIL_ITEM PIC 9(9).

05 DETAIL_DESC PIC X(30).

05 DETAIL_PRICE PIC 9(4)V99.

05 DETAIL_QTY PIC 9(5).

05 SUPPLIER_INFO OCCURS 4 TIMES .

10 SUPPLIER_CODE PIC XX.

10 SUPPLIER_NAME PIC X(8).

The first three characters of each sales record is the header. The header includes a record type and a store ID. The value of Hdr_Rec_Type defines whether the rest of the record contains store information or merchandise information.

The Sales_Rec, Hdr_Data, Store_Data, Detail_Data, and Supplier_Info columns are group- level columns that identify groups of lower level data. Group-level columns have a length of zero because they contain no data. None of these columns are output ports in the source definition.

The Normalizer transformation contains at least one generated key output port. When the COBOL source has multiple-occurring columns, the Normalizer transformation has a generated column ID output port.

No of redefines = number of GK( if no redefine then atleast one GK)

NO of occurs = number of GCIDs

The COBOL source row might contain the following data:

Item1 ItemDesc 100 25 A Supplier1 B Supplier2 C Supplier3 D Supplier4

The Normalizer returns the following detail data rows from the COBOL source row:

Item1 ItemDesc 100 25 A Supplier1 1 1

Item1 ItemDesc 100 25 B Supplier2 1 2

Item1 ItemDesc 100 25 C Supplier3 1 3

Item1 ItemDesc 100 25 D Supplier4 1 4

Each output row contains a generated key and a column ID. The Integration Service updates the generated key value when it processes a new source row. In the detail data rows, the generated key value is 1.

The column ID defines the Supplier_Info column occurrence number. The Integration Service updates the column ID for each occurrence of the Supplier_Info. The column ID values are 1, 2, 3, 4 in the detail data rows.

VSAM Normalizer Ports Tab:

The VSAM Normalizer Ports tab shows the transformation input and output ports.

The transformation does not have input or output ports for group level columns.

VSAM Normalizer Tab:

When you create a VSAM Normalizer transformation, the Mapping Designer creates the columns from a COBOL source. The Normalizer tab displays the same information as the COBOL source definition. You cannot edit the columns on a VSAM Normalizer tab.

Attribute Description

POffs: Physical offset. Location of the field in the file. The first byte in the file is zero.

Plen: Physical length. Number of bytes in the field.

Column Name: Name of the source field.

Level: Provides column group hierarchy.

Occurs: The number of instances of a column or group of columns in the source row.

Datatype: The transformation datatype can be String, Nstring, or Number.

Prec: Precision. Length of the column.

Scale: Number of decimal positions for a numeric column.

Picture: How the data is stored or displayed in the source. Picture 99V99 defines a numeric field with two implied decimals. Picture X(10) indicates ten characters.

Usage: COBOL data storage format such as COMP, BINARY, and COMP-3.

Key Type: Type of key constraint to apply to this field. When you configure a field as a primary key, the Integration Service generates unique numeric IDs for this field when running a session with a COBOL source.

Signed (S) Indicates whether numeric values are signed.

Trailing Sign (T) Indicates that the sign (+ or -) exists in the last digit of the field. If not enabled, the sign appears as the first character in the field.

Included Sign (I) Indicates whether the sign is included in any value appearing in the field.

Real Decimal Point (R)

Indicates whether the decimal point is a period (.) or the decimal point is represented by the V character in a numeric field.

Redefines Indicates that the column REDEFINES another column.

Steps to create VSAM Normalizer transformation (.ie for COBOL source):

When you create a VSAM Normalizer transformation, you drag a COBOL source into a mapping and the Mapping Designer creates the transformation columns from the source. The Normalizer transformation is the source qualifier for the COBOL source in the mapping.

The Mapping Designer identifies nested records and multiple-occurring fields in the COBOL source. It creates the columns and ports in the Normalizer transformation from the source columns.

To create a VSAM Normalizer transformation:

1. In the Mapping Designer, create a new mapping or open an existing mapping.

2. Drag a COBOL source definition into the mapping.

The Designer adds a Normalizer transformation and connects it to the COBOL source definition.

3. If the Create Normalizer Transformation dialog box appears, you can choose from the following options:

♦ VSAM Source. Create a transformation from the COBOL source definition in the mapping.

♦ Pipeline. Create a transformation, but do not define columns from a COBOL source.

To create the VSAM Normalizer transformation, select the VSAM Normalizer transformation option. The dialog box displays the name of the COBOL source definition in the mapping. Select the COBOL source definition and click OK.

Pipeline Normalizer Transformation:

When you create a pipeline Normalizer transformation, you define the

columns based on the data the transformation receives from another type of transformation such as a Source Qualifier transformation. The Designer creates the input and output Normalizer transformation ports from the columns you define

Above shows the Normalizer transformation columns for a transformation that receives four sales columns in each relational source row.

Pipeline Normalizer Ports Tab:

The pipeline Normalizer Ports tab displays the input and output ports for the transformation. It has one input/output port for each single-occurring column you define in the transformation. It has one port for each occurrence of a multiple-occurring column.

To change the ports in a pipeline Normalizer transformation, modify the columns in the Normalizer tab.

Pipeline Normalizer Tab

When you create a pipeline Normalizer transformation, you define the columns on the Normalizer tab. The Designer creates input and output ports based on the columns you enter on the Normalizer tab.

Steps to Create a Pipeline Normalizer Transformation:

When you create a pipeline Normalizer transformation, you define the columns on the Normalizer tab.

1. In the Transformation Developer or the Mapping Designer, click Transformation > Create. Select Normalizer transformation. Enter a name for the Normalizer transformation.

The naming convention for Normalizer transformations is NRM_TransformationName .

Click Create and click Done.
Open the Normalizer transformation and click the Normalizer tab.
Click Add to add a new column. The Designer creates a new column with default attributes. You can change the name, datatype, precision, and scale.
To create a multiple-occurring column, enter the number of occurrences in the Occurs column.
To create a group of multiple-occurring columns, enter at least one of the columns on the Normalizer tab. Select the column. Click Level.
You can change the column level for other columns to add them to the same group. Select a column and click Level to change it to the same level as the column above it. Columns in the same group must appear sequentially in the Normalizer tab
Change the occurrence at the group level to make the group of columns multiple-occurring.
Click Apply to save the columns and create input and output ports.

The Designer creates the Normalizer transformation input and output ports. In addition, the Designer creates the generated key columns and a column ID for each multiple-occurring column or group of columns.
Select the Properties tab to change the tracing level or reset the generated key sequence numbers after the next session.

Note that we cannot drag columns to the normalizer transformation. We need to

create ports on the normalizer tab and then input-output ports automatically created on the ports tab.

Example of Using a Normalizer Transformation in a Mapping:

When a Normalizer transformation receives more than one type of data from a COBOL source, you need to connect the Normalizer output ports to different targets based on the type of data in each row.

Ex: The Sales_File source record contains either store information or information about items that a store sells.

The COBOL source definition and the Normalizer transformation have columns that represent fields in both types of records. You need to filter the store rows from the item rows and pass them to different targets.

The Hdr_Rec_Type defines whether the record contains store or merchandise data. When the Hdr_Rec_Type value is “S,” the record contains Store_Data. When the Hdr_Rec_Type is “D,” the record contains Detail_Data. Detail_Data always includes four occurrences of Supplier_Info fields.

To filter data, connect the Normalizer output rows to a Router transformation to route the store, item, and supplier data to different targets

The Normalizer transformation creates a generated key when the COBOL source contains a group of multiple-occurring columns. You can pass a group of multiple-occurring columns to a different target than the other columns in the row.

The Normalizer transformation generates a GK_Detail_Sales key for each source row. The GK_Detail_Sales key represents one Detail_Record source row.

Important notes:

When you create ports manually, add them on the Normalizer tab in the transformation, not the Ports tab.

Informatica

Pages

Normalizer transformation

No comments:

Post a Comment