Chapter
16: Normalizer Transformation:
>
Transformation type: Active and connected.
>
The Normalizer transformation receives a row that contains
multiple-occurring columns and returns a row for each instance of the
multiple-occurring data. Thus it takes normalized data and produces
normalized data in the output. Thus number of rows in the output
generally increases and this makes it an active transformation.
>
When we say multiple occurring, it could be single column occurring
mulitple times or a group of columns occurring multiple times.
>
Normalizer can be used to deal with
- multiple-occurring columns and
- multiple record types created using redefines.
>
The Normalizer transformation parses multiple-occurring columns. They
can come due to Occurs clause in cobol file layout or denormalized
data from relational source.
For
ex: You might have a relational table that stores four quarters
of sales by store.
The
following source rows contain four quarters of sales by store:
Store1
100 300 500 700
Store2
250 450 650 850
The
Normalizer returns a row for each store and sales combination. It
also returns an index that identifies the quarter number:
Store1
100 1
Store1
300 2
Store1
500 3
Store1
700 4
Store2
250 1
Store2
450 2
Store2
650 3
Store2
850 4
>
For
each source row the integration service generates a key for each
source row.
The
Integration Service increments the generated key sequence number each
time it processes a source row. When the source row contains a
multiple-occurring column or a multiple-occurring group of columns,
the Normalizer transformation returns a row for each occurrence. But
each of the row will have the same sequence number. This is called
Generated key
>
In the output all the single occurring columns in the input,
duplicates are created.
Store1
and Store2 getting repeated in the output.
>
Types of normalizer transformation:
♦
VSAM Normalizer transformation.
A non-reusable transformation that is a Source Qualifier
transformation for a COBOL source. The
column attributes are read-only.
The
VSAM Normalizer receives a multiple-occurring source column through
one input port.
The
normalizer columns are created from the COBOL source.
♦
Pipeline Normalizer
transformation. A
transformation that processes multiple-occurring data from
relational tables or flat files. The
pipeline Normalizer transformation represents multiple-occurring
columns with one input port for each source column occurrence.
The
normalizer columns are to be created manually.
Normalizer
transformation tabs:
♦ Transformation.
Enter the name and description of the transformation. The naming
convention for an Normalizer transformation is
NRM_TransformationName.
♦ Ports.
View the transformation ports and attributes.
♦
Properties.
♦
Normalizer.
Define the structure of the source data. The Normalizer tab defines
source data as columns and groups of columns.
♦
Metadata Extensions.
1)
PORTS tab:
- With normalizer we configure columns in the normalizer tab and the designer creates the ports which we can view on the ports tab.
- Pipeline and VSAM Normalizer transformations represent multiple-occurring source columns differently.
>
A VSAM Normalizer transformation has one input port for a
multiple-occurring column.
>
A pipeline Normalizer transformation has multiple input ports for a
multiple-occurring column.
- When a source column is multiple-occurring, the pipeline and VSAM Normalizer transformations have one output port for the column. The transformation returns a row for each source column occurrence.
- The Normalizer transformation has a generated column ID (GCID) port for each multiple-occurring column.
The
generated column ID is an index for the instance of the
multiple-occurring data. For example, if a column occurs four times
in a source record, the Normalizer returns a value of 1, 2, 3, or 4
in the generated column ID based on which instance of the
multiple-occurring data occurs in the row.
The
naming convention for the Normalizer generated column ID is GCID_<
o ccuring_field_name >.
- The Normalizer transformation has at least one generated key port. The Integration Service increments the generated key sequence number each time it processes a source row.
- Changing ports: You can change the ports on a pipeline Normalizer transformation by editing the columns on the Normalizer tab. To change a VSAM Normalizer transformation, you need to change the COBOL source and recreate the transformation. This means the ports for the cobol source normalizer are read only.
2)
PROPERTIES port:
We generally configure following
normalizer transformation properties:
Reset:
At the end of a session, resets the value sequence for each generated
key
value
to the value it was before the session.
Restart:
Starts the generated key sequence at 1. Each time you run a session,
the key sequence value starts at 1 and overrides the sequence value
on the
Ports
tab.
Tracing
Level:
Sets the amount of detail included in the session log when you run a
session containing this transformation.
3)
Normalizer Tab:
The
Normalizer tab defines the structure of the source data. The
Normalizer tab defines source data as columns and groups of columns.
The
column level number identifies groups of columns in the data. Level
numbers define a data hierarchy.
Quarterly_Data
is a group-level column. It is Level 1. The Quarterly_Data group
occurs four times in each row. Sales_by_Quarter and
Returns_by_Quarter are Level 2 columns and belong to the group.
Each
column has an Occurs attribute. The Occurs attribute identifies
columns or groups of columns that occur more than once in a source
row.
Editing
normalizer columns: When you create a pipeline Normalizer
transformation, you can edit the columns. When you create a VSAM
Normalizer transformation, the Normalizer tab is read-only.
Level
Group columns: Columns in the same group occur beneath a column
with a lower level number. When each column is the same level, the
transformation contains no column groups.
Occurs:
The number of instances of a column or group of columns in the source
row.
Normalizer
Transformation Generated Keys:
>
The Normalizer transformation returns at
least one generated key column
(note this is different than GCID
column) in the output
row. The Integration Service increments the generated key sequence
number each time it processes a source row.
>
When you create a Normalizer transformation, the generated key value
is 1 by default. The naming convention for the Normalizer generated
key is GK_< redefined_field_name >.
>
You can view the current generated key values on the Normalizer
transformation Ports tab. At the end of each session, the Integration
Service updates the generated key value in the Normalizer
transformation to the last
value
generated for the session plus one. The maximum generated key value
is 9,223,372,036,854,775,807.
>
You cannot change the current generated key values on the Normalizer
transformation Ports tab.
How
to change the Generated Key Values ?
We
cannot directly edit the value of the GK, however we can change the
generated key value in the following ways:
♦
Reset the generated key sequence:
When you reset the generated key sequence, the Integration Service
resets the generated key start value back to the value it was before
the session. Can be set on the properties tab.
♦
Restart the generated key
sequence: When you restart the
generated key sequence, the Integration Service starts the generated
key sequence at 1 the next time it runs a session.
When
you restart the generated key sequence, the generated key start value
does not change in the Normalizer transformation until you run a
session. When you run the session, the Integration Service overrides
the sequence number value on the Ports tab.
When
you reset or restart the generated key sequence, the reset or restart
affects the generated key sequence values the next time you run a
session. You do not change the current generated key sequence values
in the
Normalizer
transformation. When you reset or restart the generated key sequence,
the option is enabled for every session until you disable the option.
VSAM
Normalizer Transformation: COBOL source
>
The VSAM Normalizer transformation is basically the
source qualifier for a COBOL source definition.
>
A COBOL source is a flat file that can contain multiple-occurring
data and multiple types of records in the same file.
A
COBOL source definition can have an OCCURS statement that defines a
multiple-occurring column.
The
COBOL source definition can also contain a REDEFINES statement to
define more than one type of record in the file.
>
Assume we have a file structure as follows:
01
SALES_RECORD.
03
HDR_DATA.
05
HDR_REC_TYPE PIC X.
05
HDR_STORE PIC X(02).
03
STORE_DATA.
05
STORE_NAME PIC X(30).
05
STORE_ADDR1 PIC X(30).
05
STORE_CITY PIC X(30).
03
DETAIL_DATA REDEFINES STORE_DATA.
05
DETAIL_ITEM PIC 9(9).
05
DETAIL_DESC PIC X(30).
05
DETAIL_PRICE PIC 9(4)V99.
05
DETAIL_QTY PIC 9(5).
05
SUPPLIER_INFO OCCURS 4 TIMES .
10
SUPPLIER_CODE PIC XX.
10
SUPPLIER_NAME PIC X(8).
The
first three characters of each sales record is the header. The header
includes a record type and a store ID. The value of Hdr_Rec_Type
defines whether the rest of the record contains store information or
merchandise information.
The
Sales_Rec, Hdr_Data, Store_Data, Detail_Data, and Supplier_Info
columns are group- level columns that identify groups of lower level
data. Group-level columns have a length of zero because they contain
no data. None of these columns are output ports in the source
definition.
The
Normalizer transformation contains at least one generated
key output port. When the COBOL
source has multiple-occurring columns, the Normalizer
transformation has a generated column ID output port.
No
of redefines = number of GK( if no redefine then atleast one GK)
NO
of occurs = number of GCIDs
The
COBOL source row might contain the following data:
Item1
ItemDesc 100 25 A Supplier1 B Supplier2 C Supplier3 D Supplier4
The
Normalizer returns the following detail data rows from the COBOL
source row:
Item1
ItemDesc 100 25 A Supplier1 1 1
Item1
ItemDesc 100 25 B Supplier2 1 2
Item1
ItemDesc 100 25 C Supplier3 1 3
Item1
ItemDesc 100 25 D Supplier4 1 4
Each
output row contains a generated key and a column ID. The Integration
Service updates the generated key value when it processes a new
source row. In the detail data rows, the generated key value is 1.
The
column ID defines the Supplier_Info column occurrence number. The
Integration Service updates the column ID for each occurrence of the
Supplier_Info. The column ID values are 1, 2, 3, 4 in the detail data
rows.
VSAM
Normalizer Ports Tab:
The
VSAM Normalizer Ports tab shows the transformation input and output
ports.
The
transformation does not have input or output ports for group level
columns.
VSAM
Normalizer Tab:
When
you create a VSAM Normalizer transformation, the Mapping Designer
creates the columns from a COBOL source. The Normalizer tab displays
the same information as the COBOL source definition. You cannot edit
the columns on a VSAM Normalizer tab.
Attribute
Description
POffs:
Physical offset. Location of the field in the file. The first byte in
the file is zero.
Plen:
Physical length. Number of bytes in the field.
Column
Name: Name of the source field.
Level:
Provides column group hierarchy.
Occurs:
The number of instances of a column or group of columns in the source
row.
Datatype:
The transformation datatype can be String, Nstring, or Number.
Prec:
Precision. Length of the column.
Scale:
Number of decimal positions for a numeric column.
Picture:
How the data is stored or displayed in the source. Picture 99V99
defines a numeric field with two implied decimals. Picture X(10)
indicates ten characters.
Usage:
COBOL data storage format such as COMP, BINARY, and COMP-3.
Key
Type: Type of key constraint to apply to this field. When you
configure a field as a primary key, the Integration Service generates
unique numeric IDs for this field when running a session with a COBOL
source.
Signed
(S) Indicates whether numeric values are signed.
Trailing
Sign (T) Indicates that the sign (+ or -) exists in the last
digit of the field. If not enabled, the sign appears as the first
character in the field.
Included
Sign (I) Indicates whether the sign is included in any value
appearing in the field.
Real
Decimal Point (R)
Indicates
whether the decimal point is a period (.) or the decimal point is
represented by the V character in a numeric field.
Redefines
Indicates that the column REDEFINES another column.
Steps
to create VSAM Normalizer transformation (.ie for COBOL source):
When
you create a VSAM Normalizer transformation, you drag a COBOL source
into a mapping and the Mapping Designer creates the transformation
columns from the source. The Normalizer transformation is the source
qualifier for the COBOL source in the mapping.
The
Mapping Designer identifies nested records and multiple-occurring
fields in the COBOL source. It creates the columns and ports in the
Normalizer transformation from the source columns.
To
create a VSAM Normalizer transformation:
1.
In the Mapping Designer, create a new mapping or open an existing
mapping.
2.
Drag a COBOL source definition into the mapping.
The
Designer adds a Normalizer transformation and connects it to the
COBOL source definition.
3.
If the Create Normalizer Transformation dialog box appears, you can
choose from the following options:
♦ VSAM
Source. Create a transformation from the COBOL source definition in
the mapping.
♦ Pipeline.
Create a transformation, but do not define columns from a COBOL
source.
To
create the VSAM Normalizer transformation, select the VSAM Normalizer
transformation option. The dialog box displays the name of the COBOL
source definition in the mapping. Select the COBOL source definition
and click OK.
Pipeline
Normalizer Transformation:
When
you create a pipeline Normalizer transformation, you define the
columns
based on the data the transformation receives from another type of
transformation such as a Source Qualifier transformation. The
Designer creates the input and output Normalizer transformation ports
from the columns you define
Above
shows the Normalizer transformation columns for a transformation that
receives four sales columns in each relational source row.
Pipeline
Normalizer Ports Tab:
The
pipeline Normalizer Ports tab displays the input and output ports for
the transformation. It has one input/output port for each
single-occurring column you define in the transformation. It has one
port for each occurrence of a multiple-occurring column.
To
change the ports in a pipeline Normalizer transformation, modify the
columns in the Normalizer tab.
Pipeline
Normalizer Tab
When
you create a pipeline Normalizer transformation, you define the
columns on the Normalizer tab. The Designer creates input and output
ports based on the columns you enter on the Normalizer tab.
Steps
to Create a Pipeline Normalizer Transformation:
When
you create a pipeline Normalizer transformation, you define the
columns on the Normalizer tab.
1.
In the Transformation Developer or the Mapping Designer, click
Transformation > Create. Select Normalizer transformation. Enter a
name for the Normalizer transformation.
The
naming convention for Normalizer transformations is
NRM_TransformationName .
- Click Create and click Done.
- Open the Normalizer transformation and click the Normalizer tab.
- Click Add to add a new column. The Designer creates a new column with default attributes. You can change the name, datatype, precision, and scale.
- To create a multiple-occurring column, enter the number of occurrences in the Occurs column.
- To create a group of multiple-occurring columns, enter at least one of the columns on the Normalizer tab. Select the column. Click Level.
- You can change the column level for other columns to add them to the same group. Select a column and click Level to change it to the same level as the column above it. Columns in the same group must appear sequentially in the Normalizer tab
- Change the occurrence at the group level to make the group of columns multiple-occurring.
- Click Apply to save the columns and create input and output ports.The Designer creates the Normalizer transformation input and output ports. In addition, the Designer creates the generated key columns and a column ID for each multiple-occurring column or group of columns.
- Select the Properties tab to change the tracing level or reset the generated key sequence numbers after the next session.
Note
that we cannot drag columns to the normalizer transformation. We need
to
create
ports on the normalizer tab and then input-output ports automatically
created on the ports tab.
Example
of Using a Normalizer Transformation in a Mapping:
When
a Normalizer transformation receives more than one type of data from
a COBOL source, you need to connect the Normalizer output ports to
different targets based on the type of data in each row.
Ex:
The Sales_File source record contains either store information or
information about items that a store sells.
The
COBOL source definition and the Normalizer transformation have
columns that represent fields in both types of records. You need to
filter the store rows from the item rows and pass them to different
targets.
The
Hdr_Rec_Type defines whether the record contains store or merchandise
data. When the Hdr_Rec_Type value is “S,” the record contains
Store_Data. When the Hdr_Rec_Type is “D,” the record contains
Detail_Data. Detail_Data always includes four occurrences of
Supplier_Info fields.
To
filter data, connect the Normalizer output rows to a Router
transformation to route the store, item, and supplier data to
different targets
The
Normalizer transformation creates a generated key when the COBOL
source contains a group of multiple-occurring columns. You can pass a
group of multiple-occurring columns to a different target than the
other columns in the row.
The
Normalizer transformation generates a GK_Detail_Sales key for each
source row. The GK_Detail_Sales key represents one Detail_Record
source row.
Important
notes:
When
you create ports manually, add them on the Normalizer tab in the
transformation, not the Ports tab.
No comments:
Post a Comment