Rank Transformation


Chapter 17 : Rank Transformation

  • Active and connected.
  • We can select top 10 or bottom 10 rows .. thus the number of rows decreases and thus it is an active transformation.
  • During a session, the Integration Service compares an input row with rows in the data cache. If the input row out-ranks a cached row, the Integration Service replaces the cached row with the input row.
  • If you configure the Rank transformation to rank across multiple groups (Using Group by in SQL), the Integration Service ranks incrementally for each group it finds.
  • The Integration Service stores group information in an index cache and row data in a data cache. If you create multiple partitions in a pipeline, the Integration Service creates separate caches for each partition.

  • Following is what we do when we create a Rank transformation:

Enter a cache directory.
Select the top or bottom rank.(TYpe of Rank)
Select the input/output port that contains values used to determine the rank. You can select only one port to define a rank.
Select the number of rows falling within a rank.
Define groups for ranks (define the columns to be used for GROUP BY)

  • Following are the types of ports on the Rank Transformation:

  1. Input: Minimum one: Create an input port to receive data from another
    transformation.
  2. Output: Minimum one: Create an output port for each port you want to link to another transformation. You can designate input ports as output ports.
  3. Variable Ports: Not Required: Can use to store values or calculations to use in an expression. Variable ports cannot be input or output ports. They pass data within the transformation only.
  4. Rank Port: Only one: Use to designate the column for which you want to rank values. You can designate only one Rank port in a Rank transformation. The Rank port is an input/output port.
You must link the Rank port to another transformation.

  • Rank index:The Designer creates a RANKINDEX port for each Rank transformation. The Integration Service uses the Rank Index port to store the ranking position for each row in a group.The RANKINDEX is an output port only. You can pass the rank index to another transformation in the mapping or directly to a target.

If two rank values match, they receive the same value in the rank index and the transformation skips the next value. Just like Rank SQL function.

  • Defining Groups:
    Like the Aggregator transformation, the Rank transformation lets you group information.
    Example: If you want to select the 10 most expensive items by manufacturer, you would first define a group for each manufacturer.

  • Steps to create a Rank Transformation:
  1. In the Mapping Designer, click Transformation > Create. Select the Rank transformation. Enter a name for the Rank. The naming convention for Rank transformations is RNK_ TransformationName.
  2. Link columns from an input transformation to the Rank transformation.
  3. Click the Ports tab and select the Rank (R) option for the rank port.
  4. If you want to create groups for ranked rows, select Group By for the port that defines the group.
  5. Click the Properties tab and select whether you want the top or bottom rank.
  6. For the Number of Ranks option, enter the number of rows you want to select for the rank

  • Following describes the properties tab in more detail:
    Setting
    Meaning
    Cache Directory
    Local directory where the Integration Service creates the index and data cache files. By default, the Integration Service uses the directory entered in the Workflow Manager for the process variable $PMCacheDir.
    Top/Bottom
    Specifies whether you want the top or bottom ranking for a column.
    Number of Ranks
    Number of rows you want to rank.
    Tracing Level
    Determines the amount of information the Integration Service writes to the session log about data passing through this transformation in a session.
    Rank Data Cache Size
    Data cache size for the transformation. Default is 2,000,000 bytes.You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
    Rank Index Cache Size
    Index cache size for the transformation. Default is 1,000,000 bytes. You can configure a numeric value, or you can configure the Integration Service to determine the cache size at runtime. If you configure the Integration Service to determine the cache size, you can also configure a maximum amount of memory for the Integration Service to allocate to the cache.
    Transformation Scope
    Specifies how the Integration Service applies the transformation logic to incoming data:

    - Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.

    - All Input. Applies the transformation logic on all incoming data. When you choose All Input, the PowerCenter drops incoming transaction boundaries.

1 comment:

  1. It was really a nice article and i was really impressed by reading this Informatica Online Course

    ReplyDelete