
Creating and Working with TS Quality Projects

Version 10.5

October 2006

Opening this package indicates your acceptance of the terms and conditions of the Harte-Hanks license agreement. The customer acknowledges and agrees that (a) the System and all related documentation are confidential trade secrets of Harte-Hanks or Harte-Hanks licensors and (b) title to and intellectual property rights in the System and related documentation (including without limitation all copyright, trademark, trade secret and patent rights) are and shall remain the confidential proprietary property and information of Harte-Hanks and Harte-Hanks licensors.

The customer shall use the System only in accordance with this Agreement. The customer shall not disclose, copy, or reproduce any portion of the System or documentation in any form to any third person without the prior written consent of Harte-Hanks, nor allow third parties to do the same. The customer shall keep the System and all confidential information in the strictest confidence.


Trillium Software System is a registered trademark of Harte-Hanks. UNIX is a registered trademark of UNIX System Labs, Inc. AIX, AS/400, CICS, OS/390, RS-6000, and NUMA-Q are registered trademarks of International Business Machines Corporation. HP-UX is a registered trademark of Hewlett-Packard Company. Windows NT, Windows 98, Windows 2000, and Windows XP are registered trademarks of Microsoft Corporation. Solaris and Java are registered trademarks of Sun Microsystems. Unisys is a registered trademark of Unisys Corporation. ZIP Code, ZIP +4 and CASS are registered trademarks of the U.S. Postal Service. PAF is a registered trademark of the Royal Mail. InstallShield is a registered trademark and service mark of InstallShield Corporation. All other brand names and products are trademarks or registered trademarks of their respective companies.

Copyright © 2006 Trillium Software, a division of Harte-Hanks, Inc.


All rights reserved.

Contents


CHAPTER 1

Introduction ................................................................ 1-1


Sample Project......................................................... 1-2

CHAPTER 2

Working with a Project ............................................ 2-1


Types of Projects ...................................................... 2-3
Using the Control Center ........................................... 2-4
Start the Control Center .......................................... 2-4
Set Up the Control Center........................................ 2-5
Creating a Project..................................................... 2-9
Understanding the Control Center Features .................2-16
Project Panel ........................................................2-16
Project Viewer.......................................................2-17
Step Viewer ..........................................................2-19
Using the Data Flow Architect....................................2-20
Graphics View .......................................................2-21
List View ..............................................................2-26
Using a Project Step ................................................2-28
The Data Dictionary Language (DDL) .........................2-33
Methods of Creating a DDL .....................................2-33
Creating a DDL Using the DDL Editor........................2-36
Creating a DDL in a Text Editor ...............................2-39
Type Keyword .......................................................2-42

CHAPTER 3

Investigating Your Data .......................................... 3-1


View Data Using the Data Browser .............................. 3-3
View DDLs Using the DDL Editor ................................. 3-9
Analyze Data Using TS Discovery...............................3-12
Identify the Problems with Data ................................3-13

CHAPTER 4

Using the Global Steps ............................................. 4-1


Using the Global Data Router ..................................... 4-3
Input and Output Settings ....................................... 4-3
Process Settings ..................................................... 4-5
Run the Global Data Router and View Results ............4-11

CHAPTER 5

Cleansing Your Data ................................................. 5-1


Using the Transformer .............................................. 5-3
Input and Output Settings ....................................... 5-3
Using Multiple Input Files to Create an Output DDL ..... 5-7
Process Settings ..................................................... 5-9
Conditionals............................................................5-21
Syntax .................................................................5-21
Operators in Conditional Statements ........................5-26
Operators for Asian Characters................................5-28
Build a Conditional Statement .................................5-35
Select or Bypass Records........................................5-37
Additional Settings.................................................5-38
Run the Transformer and View Results .....................5-39

CHAPTER 6

Standardizing Your Data ......................................... 6-1


Using the Customer Data Parser ................................. 6-3
Understanding Parsing Logic Flow ............................... 6-4
How the Customer Data Parser Identifies Business Names .... 6-5
CDP Parsing Process ............................................... 6-5
Customer Data Parser for China, Japan, Korea, and Taiwan .. 6-8
PREPOS ...............................................................6-12
Input and Output Settings ......................................6-14
Process Settings ....................................................6-16
Additional Settings.................................................6-22
Run the Customer Data Parser and View Results........6-25
Analyze Results .....................................................6-25
Statistics File ........................................................6-26
Using the Business Data Parser .................................6-27
BDP Parsing Process ..............................................6-27
Additional Settings.................................................6-34
Run the Business Data Parser and View Results .........6-36

CHAPTER 7

Tuning the Parsing Rules ........................................ 7-1


Understanding the Parser Definitions Tables ................. 7-3
Standard and User Definitions Tables ........................ 7-3

Syntax of Definitions............................................... 7-4
Synonym..............................................................7-12
Special Entries ......................................................7-14
Conventions in Parsing Customization ........................7-21
How to Customize the Parser Definition Tables for Japan.. 7-23
Clue Table ............................................................7-23
Name Tables.........................................................7-26
jp_bnp_name.txt...................................................7-27
jp_bnp_name_h.txt ...............................................7-28
jp_pnp_name.txt...................................................7-29
Using the Parser Customization Editor ........................7-31
View a Standard Definitions Table ............................7-31
View and Correct City Problems ...............................7-33
View and Correct Pattern Problems ..........................7-37
Save the Entries....................................................7-40
Re-Run Customer Data Parser .................................7-40
View Errors in Parsing Customization........................7-40
CHAPTER 8

Analyzing Single Data .............................................. 8-1


Using the TS Quality Analyzer .................................... 8-3
Start the TS Quality Analyzer ................................... 8-3
Data Entry and Cleansing ........................................ 8-4
Advanced Details.................................................... 8-7
Matching ............................................................... 8-8
Organize Database ................................................8-10

CHAPTER 9

Enriching Your Data .................................................. 9-1


Sorting for the Postal Matcher .................................. 9-2
Input and Output Settings ....................................... 9-2
Process Settings ..................................................... 9-5
Additional Settings.................................................. 9-6
Run the Sorting Utility and Check Results .................. 9-8
Using the Postal Matchers.......................................... 9-9
Input and Output Settings ....................................... 9-9
Process Settings ....................................................9-11
Additional Settings.................................................9-13
Match Levels.........................................................9-15
Dual Address Information .......................................9-17

Browsing the Postal Directory....................................9-20
City Level Directory ...............................................9-20
Street Level Directory ............................................9-21
Street Details........................................................9-22
CHAPTER 10

Linking Your Data .................................................... 10-1


Using the Window Key Generator...............................10-3
Input and Output Settings ......................................10-4
Process Settings ....................................................10-5
Run the Window Key Generator and View Results ......10-9
Sorting the Record by the Window Key ..................... 10-10
Input and Output Settings .................................... 10-10
Process Settings .................................................. 10-11
Run the Sorting Utility and Check Results ............... 10-12
Using Relationship Linker........................................ 10-13
Linking Examples .................................................. 10-14
Window Linking ................................................... 10-18
Run the Relationship Linker and View Results .......... 10-23
Reference Linking ................................................ 10-24
Run the Relationship Linker and View Results .......... 10-29

CHAPTER 11

Tuning the Linking Rules ....................................... 11-1


Using the Relationship Linker Results Analyzer ............11-3
View the Linking Results .........................................11-3
Edit Fields to Display..............................................11-7
Save Fields to Display ............................................11-8
View Records in a Range ........................................11-9
Using the Relationship Linker Rule Editor .................. 11-12
View the Linking Rules ......................................... 11-12
Customize the Field and Pattern Lists ..................... 11-15
Re-Run the Relationship Linker and View Results ..... 11-19
Using the Data Comparison Calculator ...................... 11-21

CHAPTER 12

Selecting the Best Record ..................................... 12-1


Using the Create Common Utility ...............................12-3
Input and Output Settings ......................................12-4
Process Settings ....................................................12-6
Additional Settings............................................... 12-10

Run the Create Common and View Results .............. 12-11
Create Common Decision Routines........................... 12-12
Decision Routine Selections for a Single Field .......... 12-14
CHAPTER 13

Manipulating Your Data ......................................... 13-1


Using the Data Reconstructor ....................................13-3
Rules File .............................................................13-3
Input and Output Settings .................................... 13-20
Settings for the Data Reconstructor ....................... 13-22
Setting the Rules File ........................................... 13-22
Setting the Use Rule ............................................ 13-22
Additional Settings............................................... 13-24
Run Data Reconstruction and View Results.............. 13-26
Bringing the Data Together ..................................... 13-27
Add a Global Transformer step .............................. 13-27
Input and Output Settings .................................... 13-29
Process Settings .................................................. 13-31
Run Transformer and View Results......................... 13-32

CHAPTER 14

Packaging Projects ................................................. 14-1


Batch Script............................................................14-3
Create a Script ......................................................14-3
Edit a Script..........................................................14-4
Run a Script .........................................................14-5
Create Multiple Batch Files ......................................14-6
Exporting/Importing Projects ....................................14-7
Export Projects......................................................14-8
Import Projects .....................................................14-9
Import Projects from Windows to UNIX................... 14-10
Real-Time Processing ............................................. 14-11
The Director ....................................................... 14-11
Moving From Batch to Real-Time ........................... 14-14
Linking Single Record Using the TS Quality Analyzer. 14-14

CHAPTER 15

Working from the Command Line ....................... 15-1


Executing TS Quality Modules....................................15-3
Syntax .................................................................15-3
Program Names ....................................................15-4

CHAPTER 16

Working with the TS Quality Utilities ................ 16-1


File Display Utility....................................................16-3
Input and Output Settings ......................................16-3
Outer Key and Inner Key ........................................16-3
Title and Delimiters................................................16-5
Field Settings ........................................................16-8
File Update Utility .................................................. 16-10
Match Keys and Fields .......................................... 16-10
Input and Output Settings .................................... 16-15
Match Key Settings .............................................. 16-15
Transaction Output Settings.................................. 16-16
Frequency Count Utility .......................................... 16-17
Input and Output Settings .................................... 16-18
Count Settings .................................................... 16-18
Merge Split Utility .................................................. 16-19
Input and Output Settings .................................... 16-19
Using Multiple Input Files to Create an Output DDL .. 16-19
Merge Files ......................................................... 16-21
Split a File .......................................................... 16-23
Merge and Split Files............................................ 16-25
Resolve Utility ....................................................... 16-27
Input and Output Settings .................................... 16-28
Link Field ........................................................... 16-29
Set Selection Utility ............................................... 16-30
Input and Output Settings .................................... 16-30
Select Records .................................................... 16-31
Sort Utility............................................................ 16-33

CHAPTER 17

Customizing the Control Center .......................... 17-1


Changing the Control Center Display Settings..............17-3

APPENDIX A

The Data Dictionary Language and DDL Types ......... A-1


The Data Dictionary Language.................................... A-2
Data Dictionary Language (DDL) Types ....................... A-3
Encoding (Code Page) ............................................. A-3
Trillium Types ........................................................ A-6
Date Format .......................................................... A-8

CLASS Keyword ....................................................A-10
APPENDIX B

Parser Review Code ..................................................B-1


Parser Results .......................................................... B-2
Parser Completion Codes (CDP/BDP) ......................... B-2
Customer Data Parser Review Code/Review Groups .... B-3
Review Group Hierarchy .......................................... B-8
Business Data Parser Review Code...........................B-11
Customer Data Parser Review Codes/Review Groups for Asia-Pacific Countries ... B-12


CHAPTER 1

Introduction

This book is intended for users who wish to learn how to use TS
Quality. It provides step-by-step instructions to set up a project and
process data. The book assumes that the users have installed TS
Quality Server, TS Quality Client, TS Quality Country Template
Projects and Postal Tables according to Installing TS Quality, and
read the introductory book, Getting Started with TS Quality.
This book covers the basic functions of TS Quality, but users should
also consult companion materials, such as TS Quality Reference
Guide and TS Quality Online Help to utilize the full capabilities of
TS Quality.
See Getting Started with TS Quality for the complete
list of TS Quality documentation and materials.

Sample Project
In this book, a global sample project (TMT project) is used to
illustrate various TS Quality functions. The TMT (TrilMedTech)
project contains customer data from the United States, United
Kingdom, Canada and Germany. The record data consists of typical
business database fields:

- Customer business name
- Contact name
- Phone number
- Address information
- Product information
- Various dates
- Account representative
- Account status
- Customer identification numbers

The goal of this sample project is to create a consolidated customer view and to eliminate the poor data quality and redundancy in the sample data. Through this project, you will complete several tasks:
- analyze the data and identify issues
- cleanse and standardize data elements
- enrich address information and identify duplicate records
- link duplicate records
- package processes and create a batch file

At the conclusion of this initial batch process, the output file will contain one contact name per business location.


CHAPTER 2

Working with a Project

In order to process your data, we strongly recommend that you first
create a project. A project includes a set of steps (core modules) for
centralized access and allows you to manage data processing tasks
easily. Projects are created in the Control Center, the graphical user
interface. Within a project, you can run processes, view data, create
and edit DDLs, modify settings, analyze output and tune the overall
process. Projects within the Control Center are mainly used to
create and test batch process flows for later use in a production
environment.
This chapter focuses on these topics:
- Project types
- Starting and setting up the Control Center
- Creating and working with projects

For an overview of the TS Quality Control Center and projects, refer to Getting Started with TS Quality.

Types of Projects
A project is a combination of one or more modules and tasks that
process a particular set of data in a job flow. Each module in a
project is called a step. A project includes all required data files,
DDL files, settings files, output, statistics files, user-defined tables
and batch scripts for modules. Within a project, you can run the
entire job flow, from the Transformer to the Relationship Linker, or
only part of the flow.
There are two types of projects:
- Standard Project - a basic project which includes predefined modules
- Custom Project - a complex project for advanced users

The Create New Project Wizard will guide you through creating a project. You will be prompted to select a type at the beginning of the Wizard. Both standard and custom projects may later be modified by adding and deleting steps, or can be customized by adding user-defined components.
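Conceptually, a project's job flow is an ordered sequence of steps that you can run in full or in part. The sketch below models that idea in Python purely for illustration -- TS Quality projects are defined in the Control Center, not in code, and the step names here are simply module names taken from later chapters:

```python
# Illustrative only: model a project's job flow as an ordered list of steps.
steps = ["Transformer", "Customer Data Parser", "Postal Matcher",
         "Window Key Generator", "Relationship Linker"]

def run_flow(steps, start=None, stop=None):
    """Return the steps that would run: the entire flow, or only a slice."""
    i = steps.index(start) if start else 0
    j = steps.index(stop) + 1 if stop else len(steps)
    return steps[i:j]

print(run_flow(steps))                          # the entire job flow
print(run_flow(steps, start="Postal Matcher"))  # only part of the flow
```

Running only part of the flow is what you do in practice when tuning a single step, such as re-running the parser after customizing its definition tables.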

Using the Control Center


Start the Control Center

Make sure that TS Quality Server has been configured correctly. Refer to Getting Started with TS Quality for server configuration.

To start the Control Center
1. Double-click the TS Quality v10.5 Server icon on the desktop, or select Start, Programs, Trillium Software System, TS Quality, v10.5, Start Server.
2. Double-click the TS Quality v10.5 Control Center icon on the desktop, or select Start, Programs, Trillium Software System, TS Quality, v10.5, Control Center. This starts the TS Quality Client and the Control Center.
3. The Start Up screen appears.

Figure 2.1 Start Up Screen

The Control Center's main window is behind the Start Up screen. The main window contains tool bars and a tools palette to give you quick access to the most commonly used tools and applications.

Refer to Getting Started with TS Quality for an overview of Control Center Tools on the Tools Palette.

Figure 2.2 Control Center Main Window (showing the Main Menu, Tool Bar, Tools Palette, and Work Area)

Set Up the Control Center

When you start the Control Center for the first time, you should set up General Preferences for the basic Control Center settings. General Preferences include several options:
- Startup options
- Default project directories
- Project input staging area
- Location of the Online Help directory and the Web browser path
- Editors and statistics viewer programs
- TS Discovery launch directory
- Text and color used within the Control Center
To set up General Preferences
1. Select Setup from the main menu.
2. Select Preferences. There are two tabs, General and Display.

Figure 2.3 General Preferences


3. The General tab allows you to decide which applications or functions to launch upon starting the Control Center. Select or specify options based on the table below:

On Startup - Determines how the Control Center handles projects upon startup.
  Open the last project - The project that you were working on in your previous session automatically opens upon startup.
  Default - No projects are launched upon startup.

Other Startup Options - Select one or more of these check boxes to determine which applications or features will be displayed upon startup.
  Show Session Viewer - The Session Viewer opens upon startup.
  Show Toolbar - The Toolbar is displayed upon startup.
  Show Tool Palette - The Tool Palette is displayed upon startup.
  Show Startup Page - The Start Up screen is displayed upon startup.
  Automatically Backup Projects (.prj only) - When checked, a backup file of your .PRJ file is automatically created. Checking this option does NOT back up your entire project; it simply creates a copy of your main .PRJ file.

Default Project Directory - Enter the directory where project and step files will be stored.
  Default: C:\TrilliumSoftware\tsq10r5s\mynewdir

Input Staging Directory - Enter the directory where input data files for the project or step will be stored.
  Default: C:\TrilliumSoftware\tsq10r5p

Help Directory - Enter the directory where Help files are stored.
  Default: C:\TrilliumSoftware\tsq10r5c\doc

My Editor - Enter the path and executable file of your text editor to display and edit text files within the Control Center.

My Statistics Viewer - Enter the path and executable file of the application used to display statistics files within the Control Center.

My browser - Enter the path and executable file of your Internet browser, used to display on-line documentation under the Help Menu. In order to access the online manuals, you must specify a default web browser.
  Example: C:\Program Files\Internet Explorer\IEXPLORE.EXE

Discovery Launch Directory - Enter the directory path used to launch TS Discovery.

4. Click OK.

See Changing the Control Center Display Settings on page 17-3 for display settings.
To get Help
Once you have specified a web browser in Control Center Preferences, you may view the online help manuals.
1. Select Setup, Preferences.
2. On the General tab, set My browser to: C:\Program Files\Internet Explorer\IEXPLORE.EXE
3. Select OK to close the Preferences window.
4. From the main menu, select Help. The TS Quality option opens the home page of the TS Quality documentation set.
5. The TS Quality Control Center Help opens the documentation for the Control Center.
6. TS Quality on the Web will automatically connect you to the trilliumsoftware.com website if you are connected to a network. Once on the website, you can access technical support, software upgrades and downloads, educational offerings and more.
7. Program-specific help is also available on the Advanced tab of each program step.

If you are a new user, be sure to register on the www.trilliumsoftware.com website for a wealth of technical user information and support.


Creating a Project
The Control Center allows you to create a standard or a custom
project. The standard project option is recommended for new users
and may later be modified to meet your specific data cleansing
needs. The custom project option is used to create a more complex
project and is recommended for more experienced users. The
Project Wizard will guide you through the project creation
process.
In order to create a TS Quality project you will need certain information:
- The name and location of your input data file(s). The input data file(s) should be either:
  - a fixed field file, or
  - a delimited file
- The name and location of your input Data Dictionary Language (DDL) file(s). The input DDL file(s) should be in either:
  - XML format (.ddx), or
  - text format (.ddt)

See The Data Dictionary Language (DDL) on page 2-33 for detailed information on DDL files.
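For orientation, here is what a small delimited input file with a header row might look like. This is a generic illustration only -- the field names and values are invented, and this is not a Trillium-specific format; your input DDL defines the real record layout. A fixed field file would instead place each field at a fixed byte position.

```python
import csv
import io

# Hypothetical pipe-delimited input with a header row. The header supplies
# field names; a fixed field file would rely entirely on the DDL instead.
sample = (
    "business_name|contact_name|phone|country\n"
    "TrilMedTech Ltd|J. Smith|555-0100|US\n"
    "TrilMedTech GmbH|A. Braun|555-0200|DE\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="|")
records = list(reader)
print(len(records))                  # 2
print(records[0]["business_name"])   # TrilMedTech Ltd
```

When a delimited file has a header like this, the Wizard can build the DDL for you automatically, using the header values as field names.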

To create a project, follow this process:
- Select a project type
- Specify project settings
- Specify input data and input DDL files
- Set up name and address format
- Review the project summary
To select a project type
1. From the main menu select File, New Project. The Create New Project Wizard appears.
2. On the Choose Project Type window, select either the Create a standard project option or the Create a custom project option.
3. Select Next.
4. In the Choose Project Option window, select one of the following options:

Standardize - Identifies, verifies and normalizes data.

Standardize and Enrich - Identifies, verifies and normalizes data. Improves data using the Postal Matchers.

Standardize, Enrich and Link - Identifies, verifies and normalizes data. Improves data using the Postal Matchers. Groups data by identifying relationships and by applying specific linking rules.

Other Custom Process - Includes separate components comprising the options above.

To specify project settings
1. Select Next. On the Specify Project Settings window, configure the following settings:

Project Name - Name of the project.

Project Directory Path - Project location on the server. You can create a project anywhere, but it must be located on the server where the TS Quality Server application has been installed.

Single or Multiple Country Project - Specify whether the project contains data from one or multiple countries.

Input Files Country of Origin - Select a country for your input data. If you selected Multiple-Country (Global) Project in the option above, this option is not available.

2. Select Next.

Figure 2.4 Project Settings


Multi-country project
3. If you selected Multiple-Country (Global) Project, follow these steps. If not, go to step 4.
   The Select Global Project Countries window indicates which country template projects are installed on the server. Select all countries you are using, and Add them to the box on the right. Use the CTRL key to make multiple selections.
   Specify whether you are using a single input file or multiple country input files. If you are using multiple country input files, you must select define input files now or define input files later. If you define input files now, provide the input file name, format, and DDL in the Specify Multiple Inputs window. Click Next.

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

Figure 2.5 Select Global Project
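To make the delimiter note concrete, a small sketch (Python for illustration only; the record content is invented):

```python
# The five delimiters accepted by the Wizard, mapped to their characters.
# Any character not in this list must be entered enclosed in quotation marks.
VALID_DELIMITERS = {"Tab": "\t", "Space": " ",
                    "Semicolon": ";", "Comma": ",", "Pipe": "|"}

record = "TrilMedTech GmbH;A. Braun;555-0200"
fields = record.split(VALID_DELIMITERS["Semicolon"])
print(fields)  # ['TrilMedTech GmbH', 'A. Braun', '555-0200']
```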


To specify input data and input DDL
1. On the Specify Input Data and Format window, use the File Chooser to select the input data file name.
2. Specify whether the input file format is Fixed or Delimited. If the file is delimited, select the delimiter from the drop-down list and define whether the input file has a header or not.
   If you don't have a DDL file for your delimited input, it will be created automatically using the header as field names.

If you are using a delimited file as input in the Wizard, the subsequent input files and all output files in the project become fixed field files.

3. Use the File Chooser to enter the Data Dictionary Language (DDL) file name. Select Next.

Figure 2.6 Specify Input Data and Format


4. If you are creating a custom project, the Select Project Components window now appears. Select the desired project components. The order of your selections will determine the sequence of steps in your project. Select Next. (If you are creating a standard project, skip this step and go to the next step.)

To set up name and address format
1. At the end of the previous step, the Set Up Name and Address Format window appears. Here, you can drag and drop name and address field names onto the Name and Address Palette as in a typical mailing label format. The Dictionary Field Names box shows all of the field names found on the input DDL file. Select the field and drag it to the Name and Address Palette. The actual record data is displayed in the Preview Name Address area.
   After dragging selected fields to the palette, you can make multiple fields single-line by editing them in the palette.

The Apply button must be selected for the Control Center to accept your desired name and address format.

Figure 2.7 Set Up Name and Address Format

2. Review your records in the specified format, using the View Records buttons. Click Apply to accept the data format.

To review the project summary

1. Select Next. The Summary window indicates the options that you have selected for this project.

2. If you need to change these options, click Back to return to the appropriate window. Click Finish to accept these settings and create the new project.

3. The status bar at the bottom of the Control Center will indicate that it is copying the appropriate country templates and building the project components.

4. When the process is complete, the Data Flow Architect area of the Control Center will be populated with the new project.

Figure 2.8 Data Flow Architect - New Project

Understanding the Control Center Features
The Control Center consists of three layers:
Project Panel
Project Viewer
Step Viewer

Project Panel
The Project Panel is displayed when the Control Center is opened. Existing projects appear as a suitcase icon labeled with the user's hostname and the project name.

To explore the Project Panel
1. Click the close icon to close an open project and to view the Project Panel.

Figure 2.9 Project Panel


2. Right-click the project icon. From this contextual menu you can Open, Delete, or view the project's Properties. Select Properties. There are two tabs, General and Contents.

   The General tab displays basic information about the project such as Name, Type, Owner, Version, Creation Date, Last Modified, Last Executed, and Location.

   The Contents tab displays content-related information about the project such as Country List, Module List, and Comments.

Figure 2.10 Project Properties

Project Viewer
The Project Viewer displays all modules or steps within a project.

To explore the Project Viewer
1. Double-click the project icon. The Project Viewer opens.

2. The Project Viewer contains three views:

Figure 2.11 Project Viewer

   Project Components View lists the project steps: first by country, and then by steps within that country.
   Graphics View displays steps in order of processing, using a graphical flowchart format.
   List View lists steps in order of processing.

   See Using the Data Flow Architect on page 2-20 for more information about these views.

Step Viewer
In the Step Viewer you can set up the module, specify input and
output files, modify program tasks and conditions, customize rules,
run the module, and view and analyze output files, statistics and
logs.
To open the Step Viewer
1. Double-click either the module icon in the Graphics View, the module in the List View, or the module in the Project Components View.

Figure 2.12 Step Viewer

Using the Data Flow Architect


Once your project has been created, the Data Flow Architect
(DFA) presents your project in the Graphics View. The DFA lets
you review and modify the data quality process. Step modules are
displayed in a flowchart model, with connection arrows used to
identify the flow of data. You can create step connections and job
flows to run in batches. These flow charts can be customized and
printed for easy illustration of the data quality process.

Figure 2.13 Data Flow Architect - Graphics View

Graphics View
In the Graphics View, you can perform various step-specific tasks:
run, rename, and move steps
delete and connect steps
copy steps
change settings files

Figure 2.14 Menu from a Step


To run steps
1. To run a single step, right-click it and select Run Selected from the pop-up menu.

2. To run multiple steps, use CTRL+click to select several steps. Once the steps are selected, right-click and select Run Selected.

3. To run steps that are connected, right-click on the desired starting point and select Select All Downstream, All Dependencies, or Whole Flow. Once you make the appropriate selection, right-click and select Run Selected.
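The Select All Downstream option can be pictured as a walk over the step connections. Below is a minimal sketch in Python; the step names and the edge-map representation are hypothetical, for illustration only:

```python
def all_downstream(step, edges):
    """Collect a step and everything reachable through its outgoing
    connections (a depth-first walk over the flow graph)."""
    selected, stack = set(), [step]
    while stack:
        current = stack.pop()
        if current not in selected:
            selected.add(current)
            stack.extend(edges.get(current, []))
    return selected

# Hypothetical three-step flow: transformer -> parser -> matcher
flow = {"transformer": ["parser"], "parser": ["matcher"]}
print(sorted(all_downstream("transformer", flow)))
# ['matcher', 'parser', 'transformer']
```

Starting the walk from a mid-flow step (for example, "parser") selects only that step and the steps after it, which is the behavior you want when re-running part of a job flow.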
To rename steps
1. Right-click a step and select Rename from the pop-up menu.

2. Enter a unique step name and click OK.

To move steps
1. To move a single step, click and hold the step and drag it to a new location.

2. To move the entire job flow, click on the first step, hold down the CTRL key, and click all the other steps in the job flow. You may now drag the complete flow to a new location. Or, right-click a step and select the Select All Downstream option, then drag it to a new location.

To connect steps
1. To connect two steps, right-click the first step and select Start Connection, then click the second step. Or, click the connection area on the first step and click the second step to connect it.

To remove a connection
1. To remove a connection, right-click the step and select Remove Incoming Connection or Remove Outgoing Connection.

To move a Connection Area
1. Position your cursor over the connection area on the step until it changes to a cross hair. Right-click and select Move to Bottom, Move to Top, Move to Left, or Move to Right.

Figure 2.15 Step Connection Area

To copy a step module
1. To copy a step module, right-click the module and select Copy Selected.

2. In the List View, select the module to copy from the list and click the Copy Selected Step button in the toolbar above.

To change a settings file
1. To change a settings file, right-click a step module and choose Change Settings File.

2. Select the settings file you want to use to replace content in the step's current settings file. You must select a settings file of the same type. For example, if the step is a Transformer step, you must select a transfrmr.stx file.

3. A confirmation dialog will appear. Click Yes to copy the contents of the selected settings file.

Data Flow Architect Settings


In addition to the step-specific tasks, you can make changes to the
Data Flow Architect itself:
Lock steps
Add comment
Select all steps
Add new steps
Print the Data Flow Architect
Set preferences
For Preferences settings, see Set Up the Control Center
on page 2-5. Also see Changing the Control Center
Display Settings on page 17-3.

Figure 2.16 Menu from the DFA


To lock steps
1. Right-click anywhere inside the DFA (except on a specific step) and select Lock. This locks all steps into place. Remember to unlock the DFA if you wish to add, delete, or move a step.

To add a comment
1. Right-click anywhere inside the DFA (except on a specific step) and select Add Comment.

2. Enter a comment in the Edit Comment window and click OK. The comment is inserted in the DFA window. You can also drag this comment to another location.

3. To edit comments, right-click on the comment and select Edit, Resize, Hide, or Delete. You can also select Show All Comments, Show Comment Borders, and Delete All Comments from the DFA menu.

To select all steps
1. Right-click anywhere inside the DFA (except on a specific step) and select Select All Steps. This selects all steps in the DFA.

To add new steps
1. Right-click anywhere inside the DFA (except on a specific step) and select Add New Step from Palette. This opens the Step Palette on the left side of the DFA.

2. Select a step from the Step Palette and drag and drop it on the DFA. Choose a country, provide a name for this step, and then click OK.

To print the Data Flow Architect
1. Right-click anywhere inside the DFA (except on a specific step) and select Print Data Flow Architect. The Page Setup window opens.

2. Specify the page settings and click OK. You have several print options:
   Display landscape paper boundary
   Display portrait paper boundary
   Display architect title imprint

List View
In the List View, you can view steps in the order in which they will
be processed. A step may be opened by double-clicking it. From the
List View you can perform several tasks:
Open, rename, add, delete, and reorder steps
Generate a batch script to run selected steps
For information on batch scripts, see Batch Script on page 14-3.
To open the List View
1. Select the List View tab to view the project steps.

2. Click a step in the List View and the toolbar options become available.
Figure 2.17 List View Tab


To open the step
1. In the List View, highlight a step.

2. Click the open icon on the toolbar.

To rename the step
1. In the List View, highlight a step.

2. Click the rename icon on the toolbar.

3. In the Provide a Unique Step Name box, enter the new name for the step. Click OK.

To add steps
1. In the List View, highlight a step.

2. Click the add icon on the toolbar. The Step Palette appears on the left. Drag and drop the desired step into the List View.

3. In the Choose Country Name box, select a country from the drop-down list. Click OK.

4. The new step is added after the step you highlighted.

To delete steps
1. In the List View, highlight one or more steps.

2. Click the delete icon on the toolbar.

To move steps
1. In the List View, highlight one or more steps.

2. Use the up and down arrow buttons to move the selected steps into the desired order for processing.

Using a Project Step


A project contains a series of steps. The configuration of a step window is the same for all steps, so the following procedures apply to all modules.

To open a step
1. Double-click the step in the Project Steps By Country list, or double-click the step icon in the Data Flow Architect pane. The Step Window appears. The Step Window contains three tabs:
   Input Settings
   Output Settings
   Results

The input, output, and other settings are explained in detail for each step in subsequent chapters. This section provides information on the general procedures for a project step.

Input Settings tab

Use the Input Settings tab to specify the Input File Name and Input DDL Name.

To specify input files
1. Type a file name in the Input File Name and Input DDL Name text boxes. You can use the File Chooser button to select the files.

2. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

To replace the input files
1. Type a file name in the Input File Name and Input DDL Name text boxes. You can use the File Chooser button to select the input files.

2. Click Replace. The file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

To delete the input files
1. Highlight the row in the Input Data File Name and Input DDL Name columns that contains the file names you want to delete.

2. Click Delete.

The Data Browser can be invoked to browse the input file. The Dictionary Editor can be invoked to view or edit the DDL. The Comment icon allows the user to add comments and notes related to the step.

Figure 2.18 Step Window - Input Settings

Output Settings tab

The Output Settings tab lets you specify the Output File Name,
the Output DDL Name, the Statistics File Name, and the
Process Log Name.
To specify output files
1. Type a file name in the Output File Name and Output DDL Name text boxes. You can use the File Chooser button to select the files.

2. Type a file name in the Statistics File Name and Process Log Name text boxes. You can use the File Chooser button to select the files.

Figure 2.19 Step Window - Output Settings


Advanced Settings

Most step configurations are made in the Advanced Settings window. Advanced Settings options allow the user to customize settings for each step. The appearance of the Advanced Settings window varies depending on the step.

To open Advanced Settings
1. Click the Advanced... button from the step.

Figure 2.20 Step Window - Advanced Settings


Results tab

The Results tab displays output information related to the step's execution.
   Statistics - The Statistics tab shows statistics from the run, which may be viewed using the My Statistics Viewer icon or the Spreadsheet Viewer. (You can specify the editor to use as the My Statistics Viewer when setting up Preferences for the Control Center.) The Spreadsheet Viewer displays the statistics in an MS Excel format.
   Process Log - The Process Log tab displays processing statistics from the step run.
   Error Log - The Error Log tab displays any errors encountered during the step run. Process and Error Logs may be viewed using the Text Viewer.

Figure 2.21 Step Window - Results


If the Process Log exceeds the capacity of the window, you
can click the Text Viewer icon to display the entire file in
a separate window.

Save and Run a Step

After you finish configuring your settings, you can save your settings without running the step, or run the step.

To save a step without running
1. Click Save to save your settings.

To run a step
1. Click Run at the bottom of the step, or right-click the step icon and select Run Selected. Clicking the Run button saves your settings by default and then runs the program.

   Select the Save button on a step to save any changes made to the settings if you are not going to Run the step. Changes are automatically saved when a step is Run.

The Data Dictionary Language (DDL)

The Data Dictionary Language (DDL) is a collection of English statements used to define file and record layouts. DDLs are used throughout the TS Quality system. A file that contains DDL components is called a DDL file. DDL files are either in XML format or in text format.

XML Format
File extension is .ddx (example: input.ddx)

Text Format
File extension is .ddt (example: input.ddt)

See Chapter 2 in Getting Started with TS Quality for the location of default DDL files in the directory structure.

Methods of Creating a DDL

You can create DDLs by the following methods:

Data Dictionary Editor (DDL Editor)
You can use the Data Dictionary Editor in the Control Center to create a DDL or modify an existing DDL. The default format for the DDL Editor is XML. Users can convert XML files to text files, or text files to XML files, in the DDL Editor.

Any Text Editor
You can use any text editor to create DDLs in text format. Special text formatting, such as underline or bold, should not be applied because the software will be unable to read it.

Delimited Files Considerations

The input and output files for TS Quality can be fixed-field files or delimited files. Internally, the delimited file's records are put into a fixed format for processing according to the DDL. For delimited files, every field in the DDL should reflect the maximum field length.

For example, if you have a field on input called ADDR_LINE_1 and the value is "10 Main St", then a field length of 10 bytes for that field will be sufficient, but a field length of 8 bytes will truncate it to "10 Main ". If you have that field on output and the line was changed to "10 Main Street" by processing, then a field length of 10 will truncate the output to "10 Main St". Make sure that you have enough field length for each field on the DDL for delimited files.
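The pad-or-truncate behavior described above can be sketched in a few lines of Python. The helper below is illustrative only (it is not part of TS Quality), but it reproduces the ADDR_LINE_1 example:

```python
def to_fixed(values, lengths):
    """Pad or truncate each delimited value to its DDL field length,
    producing one fixed-format record."""
    return "".join(value[:length].ljust(length)
                   for value, length in zip(values, lengths))

# A 10-byte ADDR_LINE_1 holds "10 Main St" exactly...
print(repr(to_fixed(["10 Main St"], [10])))      # '10 Main St'
# ...but silently truncates the longer corrected address.
print(repr(to_fixed(["10 Main Street"], [10])))  # '10 Main St'
```

Because the truncation is silent, sizing every field to its maximum expected length is the only safe choice for delimited input and output.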

Keywords in a DDL
A DDL uses the keywords shown in Table 2.1. Required keywords
are listed in bold.
Table 2.1 DDL Keywords

Record Name - The name of a record in the DDL, 1 to 32 characters long. If it contains embedded spaces, it must be enclosed in double quotes.

Record Length - The total record length in bytes. The total length of the record must be equal to the sum of the lengths of all fields.

Field Name - The name of the field. If it contains embedded spaces, it must be enclosed in double quotes. At least one field statement per file is required. Maximum 32 bytes. Field names should only contain letters, numbers, and underscores.

Type - Data type for the field. You can specify the appropriate character encoding or other type of value. See Type Keyword on page 2-42.

Redefine - Redefine the field to a specific byte position in the record. See The REDEFINE Function on page 2-40.

Start Position - The relative byte position of a field within the record. DDLs are zero-based; therefore, the first field of a record generally begins in column zero.

Length - The length of a field in bytes. The number must be a positive integer greater than zero. If the entity is a field, the length must be less than the Record Length. Two fields cannot occupy the same space, unless one field is a redefinition of the other. If the entity is a subfield, the length must be less than that of the parent field. The sum of all field lengths must equal the length of the record.

Default - The default value for the field. The value must agree in type with the Type. Numbers may be positive or negative. Values:
   SPACES fills the field length with spaces.
   -1 for a numeric with a negative value.
   0 for a numeric.
   '0' for a character field.
   "0" for a string field.

Comment - The comment for the field.

Attributes - Allows data in the field to be passed through a TS Quality step without any data interpretation or translation. Any field type can be used because there will be no data translation. Value:
   NOVALIDATION - data in the field will remain as is.

CLASS - Converts any 2-digit year into a 4-digit year. If used, it must immediately follow a Field statement, and it is required to be on the input DDL. Values:
   DATE BACKWARD
   DATE FORWARD
   DATE WINDOW {nnn}
   See CLASS Keyword on page A-10 to learn more about the Class specifications.
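The 2-digit to 4-digit year expansion performed by CLASS can be illustrated with a short sketch. The pivot value and the century mapping below are assumptions made for illustration; the authoritative DATE BACKWARD, DATE FORWARD, and DATE WINDOW rules are in Appendix A:

```python
def expand_year(yy, rule="WINDOW", pivot=50):
    """Illustrative 2-digit to 4-digit year expansion.
    The pivot and century choices here are assumptions, not the
    documented CLASS behavior."""
    if rule == "BACKWARD":
        return 1900 + yy          # treat every 2-digit year as 19xx
    if rule == "FORWARD":
        return 2000 + yy          # treat every 2-digit year as 20xx
    # WINDOW: years below the pivot go to 20xx, the rest to 19xx
    return 2000 + yy if yy < pivot else 1900 + yy

print(expand_year(3))    # 2003
print(expand_year(75))   # 1975
```

Whatever the exact rule, the point is the same: the input DDL must declare the conversion, because a bare 2-digit year is ambiguous once records from different decades are mixed.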

Creating a DDL Using the DDL Editor


You can create a DDL from a fixed-field data file or a delimited data
file. Complete the following steps to create a new DDL.
To create a DDL from a fixed-field data file
1. Open the DDL Editor from the Control Center. Select New from the File menu. A new empty DDL opens.

2. Select DDL Builder from the Tools menu. In the Select Data File section, enter the file name and record length, and select the encoding for your data file.

Figure 2.22 DDL Builder

3. In the Record section, highlight a portion of the record you want to make a field in the DDL. The Start and End Position automatically appear in the windows.

4. Specify the Field Name and select the Field Type.

5. Click Add to DDL. The new field will be added to the DDL table.

6. Repeat this process until all fields are defined in the DDL.

7. Save the DDL, using a .ddx extension for the dictionary file name.


To create a DDL from a delimited data file

If you are using the Project Wizard to create a project and you don't have a DDL file for delimited input, it will be created automatically using the header as field names.

1. Open the DDL Editor from the Control Center.

2. Select New from the File menu. A new empty DDL opens.

3. Select Tools, Create DDL from Delimited File. Select the delimited filename and delimiter.

4. Specify the output DDL filename. The first part of the delimited file will be displayed in the Sample Data Preview window.

Figure 2.23 Generate Dictionary from Delimited File

5. Click Create. The new DDL will be automatically created. Save the DDL using a .ddx extension for the dictionary file name.

For delimited files, every field in the DDL will reflect the maximum field length.
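The automatic generation described above can be approximated in a short Python sketch. The field-sizing heuristic here (size each field to the longest value observed, including the header) is an assumption for illustration, not necessarily what the DDL Editor does:

```python
import csv
import io

def ddl_from_delimited(text, delimiter=","):
    """Emit a text-format DDL for a delimited file, taking field names
    from the header row and sizing each field to the longest value
    seen in the sample."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    lengths = [max(len(name), *(len(row[i]) for row in data))
               for i, name in enumerate(header)]
    lines = ["Type is FIXED", f"Length is {sum(lengths)}", ""]
    position = 0
    for name, length in zip(header, lengths):
        lines += [f"Field is {name}", "Type is ASCII",
                  f"Starts in column {position}", f"Length is {length}", ""]
        position += length
    return "\n".join(lines)

sample = "NAME,CITY\nAnn Smith,Boston\nBob,Springfield\n"
print(ddl_from_delimited(sample))
```

Note how the record length is simply the sum of the field lengths, and each Starts in column value is zero-based, matching the keyword rules in Table 2.1.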

Creating a DDL in a Text Editor

When creating a DDL in a text editor, make sure to include the keywords and follow the set grammar that must be used in creating DDLs.

Syntax
Use the following syntax:
   Keyword [is, are, in] Parameter

Keywords are case-insensitive.
   For example, the following keywords all mean the same thing: "Field", "FIELD", and "field".

Brackets
   The actual brackets [ ] are not physically entered in a DDL file. Punctuation and noise words such as "is", "are", and "in" can be used; they are highly recommended to make subsequent reading more understandable.

Parameters are case-sensitive.
   All name and string value parameters are case-sensitive. String values are enclosed within double quotes (example: "Hello World").

Tab characters are not allowed in a DDL.

Always define until the last carriage return.

Comments can be enclosed between the string pairs "/*" and "*/", or can be indicated by the prefix string "//".

Example
/* This is a comment that extends over two lines
delimited by the slash and asterisk pairs */
//This is a comment to the end of this line
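A minimal sketch of how a reader might tokenize one such statement, assuming only the grammar rules listed above (case-insensitive keywords, optional is/are/in noise words, and the two comment styles); this parser is illustrative and is not part of TS Quality:

```python
import re

def parse_statement(line):
    """Read one DDL statement: strip /*...*/ and // comments, accept
    case-insensitive keywords, and treat is/are/in as optional."""
    line = re.sub(r"/\*.*?\*/|//.*", "", line)
    m = re.match(r"\s*(\w+)\s+(?:(?:is|are|in)\s+)?(.+?)\s*$", line, re.I)
    return (m.group(1).upper(), m.group(2)) if m else None

print(parse_statement("Field is input_line_1"))  # ('FIELD', 'input_line_1')
print(parse_statement("LENGTH 200  // bytes"))   # ('LENGTH', '200')
```

A comment-only line yields no statement at all, which is why comments can safely be interleaved anywhere in the file.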

DDL Components in Text Format

A text DDL consists of two main sections: Record information and Body information.

Record information:
Type is FIXED
Length is 200

Body information:
Field is input_line_1
Type is ASCII
Starts in column 0
Length is 50

Field is input_line_2
Type is NOTRANS
Starts in column 50
Length is 50
Default is 0

Field is input_line_3
Class is DATE FORWARD
Type is ASCII
Starts in column 100
Length is 50

Field is input_line_4
Type is ASCII
Starts in column 150
Length is 50
Attributes are NOVALIDATION

The REDEFINE Function

By redefining fields with the REDEFINE keyword, you can use part of the field, or the same field with a different name, in the output. Redefining fields requires listing two fields: the field to be redefined, followed by a field listing that is the redefinition.

The Starts in position may be maintained manually. However, automatic renumbering of the Starts in position is facilitated through the //REDEFINE statement. When the Recalculate Positions function in the DDL Editor encounters the string //REDEFINE ahead of a pair of field definitions, it will not increment the Starts in number for the second field definition.

If you are using a delimited file for input, you cannot use the Redefine function on the output DDL.

Type is FIXED
Length is 200

//REDEFINE
Field is ORIGINAL_RECORD
Type is ASCII
Starts in COLUMN 0
Length is 200

Field is input_line_1
Type is ASCII
Starts in COLUMN 0
Length is 100

Field is input_line_2
Type is ASCII
Starts in COLUMN 100
Length is 100
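The renumbering rule can be sketched as follows. The field-list shape (a list of dicts with a redefine flag) is a hypothetical representation chosen for illustration, but the computed positions reproduce the ORIGINAL_RECORD example:

```python
def recalculate_positions(fields):
    """Sketch of Recalculate Positions: a field flagged //REDEFINE is
    assigned the running position but does not advance it, so the
    field(s) that redefine it start at the same byte offset."""
    position = 0
    for field in fields:
        field["start"] = position
        if not field.get("redefine"):
            position += field["length"]
    return fields

layout = recalculate_positions([
    {"name": "ORIGINAL_RECORD", "length": 200, "redefine": True},
    {"name": "input_line_1", "length": 100},
    {"name": "input_line_2", "length": 100},
])
print([(f["name"], f["start"]) for f in layout])
# [('ORIGINAL_RECORD', 0), ('input_line_1', 0), ('input_line_2', 100)]
```

In other words, the redefined field and its first redefinition share a start position, and subsequent fields resume normal incrementing.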

Type Keyword

The Type is required for every field entity. There are two Type categories: encoding (code page) and date format. The following list shows the main values used for the Type keyword.

Encoding (Code Page)
   Encoding is a mapping of binary values to code positions to represent characters of data. It is also called a code page. The main character encodings used in TS Quality include ASCII, Latin1, and Latin2. See Appendix A for the complete list of encodings.

Date Format
   Date format is a type of data which may contain only valid dates. See Appendix A for the complete list of date formats.

Class Keyword
   The Class keyword specifies the format to be used for the date field. By using the Class keyword, you can convert any 2-digit year into a 4-digit year. See Appendix A for the complete list of Class keywords.

CHAPTER 3

Investigating Your Data

After you create a project, you must investigate your data before
working with any processes. Investigation helps you determine how
well your data conforms to rules that govern acceptable limits and
requirements for data elements, and helps you understand what
data quality processes need to be put in place. Investigate your
data with the Data Browser, DDL Editor, and TS Discovery.
This chapter focuses on four tasks:
View data using the Data Browser
View DDL using the DDL Editor
Analyze data using TS Discovery
Identify problems with the data

View Data Using the Data Browser


For detailed
information
on the Data
Browser, see
the Online
Help.

The Data Browser lets you view a data file to verify its format as
described by the data dictionary language (DDL) file. You can verify
the format on either a record-by-record or on a field-by-field
basis.
To open the Data Browser and view the input data
1.

Double-click the projects suitcase icon in the Data Flow


Architect and open the project. Existing projects are shown
as a suitcase icon with the users hostname and project
name.

Figure 3.1 Project in the Data Flow Architect


2. Double-click the first step (for example, inputTransformer) to open the step.

Figure 3.2 inputTransformer Step


3. On the Input Settings tab, select the first entry in the entry listing options. The input file name and corresponding DDL file name will already be populated. These files were specified during the Creating Project Wizard process (see Creating a Project on page 2-9).

Figure 3.3 Input Settings Tab


4. Select the Data Browser icon next to the input file name. The Data Browser opens the input file with its corresponding DDL.

   You can also open the Data Browser from the Tools Palette by double-clicking the Data Browser icon. In this case, you must select the input file and DDL to view. Opening the Data Browser from within a step automatically opens the tool, the file, and its corresponding DDL.

You can sort the fields: click on the Field Name, Start Position, Length, or Type column headers.

5. The Field Selection window opens. This window shows all the fields that exist in the input DDL.

6. Select the fields you want to display in the upper pane and click Add. To select all the fields, click Add All.

7. After the fields appear in the Selected Fields list box, you have several options:
   Clear all the fields by clicking Clear
   Change the position of a field or delete it by selecting the field, then clicking the up or down arrow button to move it, or the Delete button to delete it

   The order of the fields determines the order in which they will be displayed when you browse the records.

   Save the selected fields (called the view) in a file by clicking the Save button. See the next procedure, To save the view, for more details on saving the fields.

Figure 3.4 Field Selection Window


8. Click Display.


9. Browse the data and verify that the field names reflect the data contained within them.

   You can display the data by Record Numbers or Byte Offsets; select either option in the Options menu.

Figure 3.5 Input Data


To save the view
You can save or store a view of data in the Data Browser. If you frequently look at the same fields in a file, saving a view can save time.

1. In the Field Selection window, select the fields you want to view using the CTRL key. For example, select Phone, Country, Start_date, and Product_type.

2. Select Add to add the selected fields.

3. Click Save to save the selected fields.


4. The Save window opens. Name this view and save it in the desired directory. The view file will have the extension .cuv.

Figure 3.6 Save View File


To view a stored view
1. To view a stored view, click Load in the Field Selection window. The Customized View window will show all stored views.

2. Click the view name and select OK. The fields will be loaded in the Selected Fields window. Select Display to view the stored fields.

Figure 3.7 Customized View Window


3. Select File, Exit and close the Data Browser.

View DDLs Using the DDL Editor


For detailed
information
on the DDL
Editor, see
the Online
Help.

The Data Dictionary Editor (DDL Editor) lets you view existing
data dictionary language (DDL) files.
To open the DDL Editor and view a DDL
1.

On the Input Settings tab, click Dictionary Editor


next
to the input DDL name. The DDL Editor will open the input
DDL file. The DDL is displayed in a table.

Figure 3.8 Data Dictionary Editor


2. The upper frame shows the Record Name, Record Length, and Update ORIGINAL_RECORD Length option:

   Record Name - The record's name.
   Record Length - Total length of the record represented by this DDL, in bytes.
   Update ORIGINAL_RECORD Length - The ORIGINAL_RECORD length update option. See the Online Help for details.

3. The lower spreadsheet shows the details of the selected DDL. You can edit all items in the columns of this table; see the Online Help for details. Refer to the following table and verify each item in the DDL:

   Field Name - DDL fields listed row by row, in the order that they appear in the DDL. The standard field names are displayed in blue; other unique field names are displayed in black.
   Type - Field type (encoding). See Encoding (Code Page) on page A-3 for details of encoding.
   Redef (Redefine) - Indicates whether the field is redefined to a specific byte position in the record. Y = field is redefined; blank = field is not redefined.
   Start Pos. - The zero-based byte position where the field begins in the record.
   Length - The length of the field in bytes.
   Default - Default value for the field.
   Comment - The comments for a field.
   Attribute - Indicates whether the data is to be passed through a step without any validation. NOVALIDATION = data in the field remains as is, even if it is in a different field type.
   Class - Converts a 2-digit year into a 4-digit year.

4. Select File, Exit and close the DDL Editor.

5. Click the close button in the upper right-hand corner to close the step.

Analyze Data Using TS Discovery


TS Discovery is a data profiling tool used to discover and analyze data quality. If you want to analyze data in more detail to reveal data anomalies, broken data rules, misaligned data relationships, and other characteristics, we recommend using TS Discovery before running other TS Quality processes.

One (1) license for TS Discovery is included with the TS Quality Client. You can launch TS Discovery by clicking the TS Discovery icon on the Control Center toolbar.

Instructions for TS Discovery are not included in this book. Refer to the TS Discovery manuals for more information.

Figure 3.9 TS Discovery


Identify the Problems with Data


By browsing the input data and input DDL files, you can identify many issues with data, such as misspellings, inconsistent formats, incorrect entries, and duplicate records. Depending on the problems with the data, you must decide which cleansing and standardization processes are necessary.

For example, the following issues have been identified in the data in the sample TMT project:

•  The input file contains data from multiple countries (US, CA, DE, GB)
•  The Phone number field has variations in the phone formats
•  The country names in the Country field are inconsistent
•  The date format in the Start_date field is different from the date format in the Last_contact_date field
•  There are different values for the same products in the Product_type field
•  There appear to be misspelled addresses in the data
•  There are duplicate records across the data

These issues will be corrected in the subsequent chapters of this guide. First, the global data is separated into four (4) input data files. Next, data cleansing and standardization are performed at each country level. After the addresses are validated and corrected, the records are linked to identify the duplicate data. At the end of the process, the best records with the most recent information will be output, and a batch script for the entire process will be created for production use.


CHAPTER 4

Using the Global Steps

After you have investigated the data and identified the issues, you can begin to process the data. First, use the Global Data Router to separate the multi-country input file into country-specific files. One advantage of running the Router step before cleansing and standardizing your data is that it enables data to be standardized at the country level. This ensures that further processing is done at a country-specific level.

In this chapter, you will perform these tasks:

•  Specify the input and output files
•  Identify the rules files used to determine the country of origin
•  Identify the Global Geography table, which contains state, city, locality, post code and word/pattern structures
•  Define the settings for the Global Data Router. These include the ability to:
   -  Use a Country Code field to identify country of origin
   -  Review the country list and determine the countries which are available to the Global Data Router
   -  Modify the default list of fields to scan for the country of origin
•  Run the Global Data Router and view results


Using the Global Data Router


The Global Data Router scans an input file that contains record data from more than one country, identifies the country-specific data, and then creates one output file per country that contains only the data specific to the country you selected.

The Global Data Router uses Rules Files that contain country-related word definitions and tables. These rules specify how many output files to generate and which countries are identified. The Router supports input data from most countries.
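The routing behavior described above can be sketched as follows. This is an illustrative Python sketch, not Trillium code: `detect_country` stands in for the rules-based country identification, and the `<filename>_<country>` bucket naming follows the output naming convention described later in this chapter.

```python
# Illustrative sketch (not Trillium code) of the Global Data Router idea:
# group records into one bucket per detected country; records whose
# country cannot be determined go to a NOMATCH bucket.
def route_records(records, detect_country, base_name="customers"):
    buckets = {}
    for rec in records:
        country = detect_country(rec)          # e.g. "us", "ca", or None
        key = f"{base_name}_{country}" if country else "NOMATCH"
        buckets.setdefault(key, []).append(rec)
    return buckets

# Example: route by an explicit country-code field.
recs = [{"country": "US"}, {"country": "CA"}, {"country": ""}]
out = route_records(recs, lambda r: r["country"].lower() or None)
# out has keys "customers_us", "customers_ca", and "NOMATCH"
```

In the real Router, country detection also scores free-text name and address lines against the rules files, not just an explicit country-code field.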

Input and Output Settings


Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

Since the Global Data Router step is usually the first step in the
project, it uses the Input File Name and Input DDL Name specified
in the Project Wizard as the default inputs.
To specify input and output files

1.  Open the Global Data Router step and click the Input Settings tab.

2.  Enter file names in the Input File Name and Input DDL Name text boxes.

3.  Click the Output Settings tab.

4.  Enter file names in the Output File Name and Output DDL Name text boxes.

    Separate Output
    If you want a separate output file for each country, select Generate a separate output file per country. When this option is selected, an underscore (_) and an asterisk (*) will be added automatically to the file name you specified in the Output File Name text box. After processing, each output file name will include a country suffix in lower case. For example, the US data will be named <filename>_us, and the Canadian data will be named <filename>_ca.


    Single Output
    If you are generating a single output file for all countries, deselect Generate a separate output file per country. In this case, all data, separated by country, will be written to the single output file you specified.

    If you provided Name and Address data in the Project Wizard, the output DDL will contain a series of redefines for the Name and Address data. Redefines are used to map the customer-defined field name to the TS Quality Name and Address reserved DDL field name. The input fields are mapped to the reserved TS Quality name and address field names INPUT_LINE_01 through INPUT_LINE_10. If a name or address line contains multiple fields, the input fields are mapped to INPUT_LINE_02a, INPUT_LINE_02b, etc.

A red flag indicates a REQUIRED field for this operation.

5.  Enter file names in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have a unique file qualifier.

1.  Click Advanced and navigate to Input, Settings.

2.  Select Input Data File Qualifier (default is INPUT).

3.  Click Advanced and navigate to Output, Settings.

4.  Select Output Data File Qualifier (default is OUTPUT).

You may also specify the following settings:



To specify the NOMATCH file

The NOMATCH file contains records where the Global Data Router was not able to determine the country of origin.

1.  Click Advanced and navigate to Process, Settings.

2.  Locate Nomatch File and specify the file.

To specify the starting record

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Start at Record. This specifies the record in the input data file at which the Global Data Router will begin processing (default is 1).

To specify the maximum number of records to process

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every nth record only

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.
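The three input sampling settings (Start at Record, Process a Maximum of, Process Nth Sample) can be combined. The sketch below shows one plausible reading of how they interact; it is an illustration, not Trillium code, and the exact interaction in TS Quality may differ.

```python
# Sketch (not Trillium code) of combining Start at Record,
# Process a Maximum of, and Process Nth Sample.
def sample_records(records, start_at=1, maximum=None, nth=1):
    selected = []
    for i, rec in enumerate(records, start=1):
        if i < start_at:
            continue                      # skip until the starting record
        if (i - start_at) % nth != 0:
            continue                      # keep only every nth record
        selected.append(rec)
        if maximum is not None and len(selected) >= maximum:
            break                         # stop at the maximum
    return selected

# Records 1..10, starting at record 3, every 2nd record, at most 3:
print(sample_records(list(range(1, 11)), start_at=3, nth=2, maximum=3))
# → [3, 5, 7]
```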

To use a delimited file

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed within quotation marks.

If you are using a delimited file for input and/or output, you must specify delimited settings.

1.  Click Advanced and navigate to Input, Settings.

2.  Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3.  For output, click Advanced and navigate to Output, Settings.

4.  Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.
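For readers unfamiliar with delimited data, the sketch below shows what a pipe-delimited file looks like when parsed. This is an illustration only (Python's standard csv module); TS Quality parses delimited input internally according to the settings above.

```python
# Illustration of pipe-delimited data; not how TS Quality reads files.
import csv
import io

data = io.StringIO("name|phone\nJane Smith|207-555-4423\n")
rows = list(csv.reader(data, delimiter="|"))
print(rows[1])   # → ['Jane Smith', '207-555-4423']
```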

Process Settings
Once you have specified input and output files, you are ready to
specify the settings to process your data. Do this in the Advanced
Settings window.


Rules Files

The Global Data Router uses two rules files to determine country of origin. Rules files contain entries that define the resource tables used by the Global Data Router program, as well as country-specific data.

•  Global Rules File: Defines rules that apply to all countries. It also contains translation tables, street types, city definitions, and other rules that require lengthy entries.
•  Country Rules File: Defines rules that apply to specific countries.

See Global Data Router in the TS Quality Reference Guide for details of these rules files.
To specify the Rules Files

1.  Open the Global Data Router step.

2.  Click Advanced and navigate to Process, Settings.

3.  Locate the Global Rules File and Country Rules File and specify the files.

    Default Global Rules File:
    \TrilliumSoftware\tsq10r5s\tables\general_resources\rtrules1.win

    Default Country Rules File:
    \TrilliumSoftware\tsq10r5s\tables\general_resources\rtrules2.win

You can edit the Rules Files. You may also use the Customer Rules File, which allows you to add your own user-defined rules. See Global Data Router in the TS Quality Reference Guide for details.

Global Geography Table

In addition to the Rules Files, the Global Data Router uses a Global Geography Table that contains state, city, locality, post code and word/pattern structures. This table is read-only and may not be changed.


To specify the Global Geography Table

1.  Click Advanced and navigate to Process, Settings.

2.  Locate the Global Geography Table and specify the file.

    Default Global Geography Table:
    TrilliumSoftware\tsq10r5s\tables\general_resources\GLOBRTR.tbl

For China, Japan, Korea, and Taiwan, you must specify the APGLBRTR.tbl geography file using the Global Geog APAC File Name settings. If you want to include other countries such as the US, you must also specify the regular GLOBRTR.tbl geography file.

Country Settings

If the data has a country code field, you must specify the field name for the country code. This ensures that the Global Data Router uses the data in this field to identify and score the country of origin.

To specify a Country Code Field

1.  Click Advanced and navigate to Process, Settings.

2.  Locate the Country Code Field and select the appropriate field name from the drop-down list.


Figure 4.1 Country Code Field


To review the country list

Make sure the Country List identifies the valid country choices for your data.

1.  Navigate to the Country List, Country settings. The Country Names are automatically entered based on your selection in the Project Create Wizard.

2.  Review the list and confirm that the Country List identifies the valid country choices for your data.

Figure 4.2 Country List


Fields Settings

You must tell the Global Data Router which fields contain country of origin data. When there is no valid country code or the country code is suspect, the Field Settings determine which fields the Global Data Router will inspect.

To specify fields to scan for country of origin data

Navigate to Fields, Field. Select the field name that contains information for country of origin. If you have a valid country code field, you can select that field. This means that the program will only scan that field for country of origin data.

Figure 4.3 Field Settings

DDL Settings

If you choose, you can specify separate output DDLs for each country. If this is not specified, the output DDL specified in the Output Settings will be used.

To specify a separate DDL for each country

1.  Click Advanced and navigate to DDL, Settings.

2.  Select the DDL file for each country from the drop-down list.


Additional Settings

You can specify the following additional settings:

See Global Data Router in the TS Quality Reference Guide for the complete settings information.

To enable the debug function

1.  Click Advanced and navigate to Process, Settings.

2.  Select Enable Debug Output.

3.  In the Debug File text box, accept the default path and file name, or enter a new file name. Debugging information will be written to this file.

To count the number of records processed

1.  Click Advanced and navigate to Process, Settings.

2.  Enter a value in the Sample Count text box. This value determines how frequently TS Quality will report while processing data. The number that you enter is the number of records that TS Quality will process before printing a progress report to the screen. For example, if you enter 50, TS Quality will print a message after processing 50, 100, 150 records, and so on.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.
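The Sample Count behavior amounts to a simple modulo check while processing, as the hypothetical helper below sketches. This is not Trillium code; `process_with_progress` and its `report` callback are illustrative names.

```python
# Sketch (not Trillium code) of Sample Count: report progress every
# sample_count records while processing.
def process_with_progress(records, sample_count=1, report=print):
    processed = 0
    for rec in records:
        processed += 1                 # ... process the record here ...
        if processed % sample_count == 0:
            report(f"Processed {processed} records")
    return processed
```

With `sample_count=50`, the report callback fires after records 50, 100, 150, and so on, matching the example in the text.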

To specify settings file encoding

1.  In Settings File Encoding, select the correct encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.


Run the Global Data Router and View Results

To run the Global Data Router and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1.  Click OK to close Advanced Settings.

2.  Click Run to run the Global Data Router. You can also right-click on a step and select Run Selected.

3.  Select OK.

4.  On the Results tab, select the Statistics sub-tab. The Statistics sub-tab will show the number of records included in each country-specific file. The NOMATCH file contains any records where the Global Data Router was unable to determine the country of origin.

Figure 4.4 Global Data Router Statistics



CHAPTER 5

Cleansing Your Data

After you separate the input data into country-specific data, you can start the cleansing process. This chapter explains how to cleanse the data using the Transformer.

In this chapter, you will perform these tasks:

•  Specify the input and output files
•  Use character translation to convert particular hexadecimal values
•  Use field scanning to change field values
•  Use table recoding to recode the values in a field using a literal or mask shape
•  Use conditionals to control the field scan and table recode settings
•  Run the Transformer and review the results


Using the Transformer


The Transformer converts input data from one or more files and formats to a single output, based on fields specified by one or more Data Dictionary Language (DDL) files. The Transformer lets you convert and merge records from up to ten input files into a single, standard format.

The Transformer performs several functions:

•  Scan data records for defined shapes (masks) and literal values, and then move, recode, or delete the data
•  Apply sophisticated conditional logic to perform an unlimited number of data transformations
•  Modify field lengths
•  Recode character fields, based on a user-defined external table
•  Identify and separate records that reject the conversion process so that they can be more closely examined

Input and Output Settings


The Transformer uses the output from the Global Data Router step
as input. If the Transformer step is the first step in your project, it
will use the Input File Name and Input DDL Name specified in the
Project Wizard as the default inputs.
To specify input and output files

1.  Open the Transformer step and select the Input Settings tab.

2.  Specify a file name in the Input File Name and Input DDL Name text boxes.

3.  Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

    - OR -

    Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

    The Transformer can use up to ten input files simultaneously.

4.  Navigate to the Output Settings tab.

5.  Specify a file name in the Output File Name and Output DDL Name text boxes.

6.  Specify a file name in the Statistics File Name and Process Log Name text boxes.

If you provided the Name and Address data during the Project Wizard, the output DDL will contain a series of redefines for the Name and Address data. Redefines map the customer-defined field name to the TS Quality Name and Address reserved DDL field name. The input fields are mapped to reserved TS Quality name and address field names INPUT_LINE_01 through INPUT_LINE_10. If a name or address line consists of multiple fields, the input fields are mapped to INPUT_LINE_02a, INPUT_LINE_02b, etc.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1.  Click Advanced and navigate to Input, Settings.

2.  Specify Input Data File Qualifier (default is INPUT).

3.  Click Advanced and navigate to Output, Settings.

4.  Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:


To specify multiple input files

If you have multiple input files, make sure that the settings will be applied to your desired input file.

1.  Click Advanced and navigate to Input, Settings.

2.  Select the appropriate input file from the Input Files text box at the top.

Figure 5.1 Transformer Multiple Input Files

3.  Specify your settings for the desired input file.

To specify an exceptions file

1.  Click Advanced and navigate to Input, Settings.

2.  In the Exceptions File text box, accept the default file or specify the path and name of the file that contains exception records. Exception records contain data such as incorrect records or field types.

To use a delimited file

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

If you are using a delimited file for input and/or output, you must specify delimited settings.

1.  Click Advanced and navigate to Input, Settings.

2.  Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3.  For output, click Advanced and navigate to Output, Settings.

4.  Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.


To specify the origin of record

The File Source and Source Field work together. If you specify one of these values, you must specify the other value. If you delete one of these values, you must delete the other value.

1.  Click Advanced and navigate to Input, Settings.

2.  In File Source, enter text to specify the origin of the data file.

3.  Select File Source Encoding from the drop-down list.

4.  Navigate to Output, Settings. In Source Field, select the DDL field to receive the origin of record you specified in File Source.

To specify the starting record

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every nth record only

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.


Using Multiple Input Files to Create an Output DDL

You can specify up to a maximum of ten (10) input files and their associated DDLs and use these to create a common output file for later processing by modules downstream in your workflow. This process requires that, after you specify the input files, you map input fields from the associated DDLs to a common output DDL file.
To add multiple input files and map fields

1.  Double-click a Transformer step to open the Transformer Input Settings window.

2.  In the Input Data File field, type or browse to the input file you wish to use.

3.  In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.

4.  Click Add.

5.  Repeat Steps 2-4 until you've added all DDL files you want to use to create the common output format.

6.  Click the Define Output DDL button (bottom left).

7.  The Define Output DDL dialog appears.


Figure 5.2 Define Output DDL dialog

8.  Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.

9.  Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:

    •  Add: adds the selected input DDL field to the output DDL list.
    •  Delete: deletes a selected output DDL field from the list.
    •  Move Up: moves the selected field in the output DDL list up one row.
    •  Move Down: moves the selected field in the output DDL list down one row.
    •  Redefine: redefines an input field as a portion of an output field. Use this option to map multiple input fields to the same redefined output field.
    •  Consolidate: consolidates an input field with an existing output field. Use this option when two or more fields have different names but contain the same data, such as zipcode, ZIP5, and postal_code.

    For Redefine and Consolidate, make sure that the lengths of the input fields do not exceed the overall length of the redefined or consolidated output DDL field.

10. When you are ready, click Save to save the output DDL field mapping. When the Transformer step runs, it will create an output DDL file that uses this mapping.
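The Consolidate idea above can be sketched with a simple field-name mapping. This is an illustration of the concept only, not how TS Quality implements it; the `CONSOLIDATION` table and `to_common_format` are hypothetical names.

```python
# Sketch (not Trillium code) of Consolidate: input fields with different
# names but the same meaning map to one output DDL field.
CONSOLIDATION = {
    "zipcode": "postal_code",
    "ZIP5": "postal_code",
    "postal_code": "postal_code",
}

def to_common_format(record):
    """Map one input record (as a dict) onto the common output layout."""
    out = {}
    for name, value in record.items():
        out[CONSOLIDATION.get(name, name)] = value
    return out

# Two input files with different field names yield the same output field:
print(to_common_format({"name": "Ann", "ZIP5": "04401"}))
print(to_common_format({"name": "Bo", "zipcode": "90210"}))
```

Both records end up with a single `postal_code` field, which is what lets downstream steps process the merged output uniformly.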

Process Settings
Once you have specified the input and output files, you can
configure the settings to process your data. The settings for
processing are managed in the Advanced Settings window.

Character Translation

The Transformer lets you convert an original hexadecimal value to another hexadecimal value.

To convert a hex value

1.  Click Advanced and navigate to Input, Character Translation.

2.  Specify a value for Input Field Name. This is the field to which the hex translation is applied.

3.  Specify a value for From Hex Value. This is the original hex value which will be translated to another hex value.

4.  Specify a value for To Hex Value. This is the hex value to which the original value is translated.
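The effect of the From Hex Value / To Hex Value pair can be sketched as a byte substitution within the field. This is an illustrative sketch, not Trillium code.

```python
# Sketch (not Trillium code) of character translation: replace one
# hexadecimal byte value with another within a field.
def translate_hex(field: bytes, from_hex: str, to_hex: str) -> bytes:
    return field.replace(bytes.fromhex(from_hex), bytes.fromhex(to_hex))

# Replace hex 09 (tab) with hex 20 (space):
print(translate_hex(b"A\tB", "09", "20"))   # → b'A B'
```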

Field Scanning

The Field Scanning function converts the values in a field. You can scan values and then Change, Copy, Cut, and Flag the values.
Scan and Change

To scan a field and change its value

A red flag indicates a REQUIRED field for this operation.

1.  Click Advanced and navigate to Output, Field Scanning. Select the Change tab.

2.  Refer to the following table and specify values for Change in the Field Scanning window:

Setting                 Description
Scan Field              Field in the DDL file that specifies the location
                        in which to perform the scan.
Field Justification     Specifies how data contained in the field is
                        aligned:
                        •  Left/Right Adjust - remove all spaces around
                           the value, pad the field with spaces, and
                           change multiple spaces between the values to a
                           single space
                        •  Left/Right Trim - remove all spaces around the
                           value and pad the field with spaces
                        •  Left/Right Pack - remove all spaces, pack
                           left/right, and pad the field with spaces
                        •  No Justification (default) - no action is taken
                        Note for Asian Character Data: There is no
                        distinction between full-width spaces and
                        half-width spaces in the Field Justification
                        operation. Full-width spaces within the text are
                        converted to half-width spaces.
Scan Format             Indicates the format of the value for which to
                        scan: either a Literal value (the actual value) or
                        a Mask value (the shape of the value).
Scan Value              User-defined value for which to scan in a
                        specified scan field.
Change Value            User-defined value that replaces the scan value.
Change Occurrences      Numeric value that indicates how many times to
                        scan for a value in a particular word or field.
Scan Position           The physical location in the field at which to
                        begin scanning for the value: the exact Beginning
                        of the field, anywhere in the field (Default), or
                        the exact End of the field.
Scan Level              Indicates whether to scan for a value at either
                        the Field level or at the Word level.
Scan Direction          Indicates the direction of the scan: Right-to-Left
                        or Left-to-Right.
Between Substring       String of user-defined characters between which to
                        scan.
And Substring           Ending substring between which to scan.
Retain Between          Whether to retain the scanned-for value between
Characters              characters (check box).
Scan Value Encoding     Specifies the code page used by the scan value.
Change Value Encoding   Specifies the code page used by the change value.
Between Substring       Code page used by a string of characters between
Encoding                which to scan.
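The four Field Justification modes can be sketched as string operations on a fixed-width field. This is an illustrative sketch, not Trillium code; only the "left" variants are shown (the "right" variants mirror them with right-padding).

```python
# Sketch (not Trillium code) of the left Field Justification modes
# applied to a fixed-width field.
def left_adjust(value, width):
    # trim, collapse runs of spaces to a single space, pad to width
    return " ".join(value.split()).ljust(width)

def left_trim(value, width):
    # trim surrounding spaces only; inner spacing is kept
    return value.strip().ljust(width)

def left_pack(value, width):
    # remove all spaces, then pad to width
    return value.replace(" ", "").ljust(width)

print(repr(left_adjust("  MAIN   ST ", 10)))  # → 'MAIN ST   '
print(repr(left_pack("  MAIN   ST ", 10)))    # → 'MAINST    '
```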

Example

In this example, the phone number currently has dashes and spaces. To match more accurately, you should remove the dashes and spaces from the phone number. To change the phone number format, scan the Phone field for the Literal value - (a dash) using the following criteria:

Scan Field            Phone
Field Justification   Left Pack
Scan Format           Literal Value
Scan Value            - (a dash)
Scan Position         Default
Scan Level            Field
Change Value          "" (two sets of double quotes)
Change Occurrences    A (for All)

These settings will cause the Transformer to scan the Phone field for the literal value - at the Field level. If the value is found, the Transformer will left-pack the value and change it to nothing.

Phone Field (before)   Phone Field (after)
207-555-4423           2075554423

Two sets of double quotes as the Change Value will change the value to nothing.
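The scan-and-change example above can be sketched in Python. This is an illustration, not Trillium code; the field width of 12 is an assumption standing in for the DDL field length.

```python
# Sketch (not Trillium code) of the example: left-pack the Phone field,
# then change every occurrence of the literal "-" to nothing.
def scan_and_change(field, scan_value, change_value="", width=12):
    packed = field.replace(" ", "")                    # Left Pack
    return packed.replace(scan_value, change_value).ljust(width)

print(scan_and_change("207-555-4423", "-").strip())    # → 2075554423
```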


Scan and Copy/Cut

To scan a field and copy or cut its value

A red flag indicates a REQUIRED field for this operation.

1.  Click Advanced and navigate to Output, Field Scanning. Select the Copy or Cut tab.

2.  Refer to the following table and specify values for Copy or Cut in the Field Scanning window:

Field                 Description
Scan Field            Field in the DDL file that specifies the location in
                      which to scan.
Target Field          Specifies the field in which to store the scan
                      result.
Field Justification   Specifies how data contained in the field is
                      aligned:
                      •  Left/Right Adjust - remove all spaces around the
                         value, pad the field with spaces, and change
                         multiple spaces between the value to a single
                         space
                      •  Left/Right Trim - remove all spaces around the
                         value and pad the field with spaces
                      •  Left/Right Pack - remove all spaces, pack
                         left/right, and pad the field with spaces
                      •  No Justification (default) - no action is taken
                      Note for Asian Character Data: There is no
                      distinction between full-width spaces and half-width
                      spaces in the Field Justification operation.
                      Full-width spaces within the text are converted to
                      half-width spaces.
Scan Format           Indicates the format of the value for which to scan:
                      a Literal value or a Mask value.
Scan Value            User-defined value for which to scan in a specified
                      field.
Retain Scan Value     When checked, retains the scanned-for value in the
                      target field.
Scan Level            Indicates whether to scan at either the Field level
                      or the Word level.
Scan Position         Indicates the physical location in the field at
                      which to begin scanning: the exact Beginning of the
                      field, anywhere in the field (Default), or the exact
                      End of the field.
Scan Direction        Indicates the direction of the scan: Right-to-Left
                      or Left-to-Right.
Scan Capture          Indicates the data to capture, based on the position
                      of the scanned-for value in the word or field.
Word Delimiter        Specifies the delimiter used to separate words
                      within a field.
Between Substring     String of user-defined characters between which to
                      scan.
And Substring         Ending substring between which to scan.
Retain Between        When checked, retains the scanned-for value between
Substring             substrings.
Scan Value Encoding   Specifies the code page used by the scan value.
Word Delimiter        Specifies the code page used by the word delimiter.
Encoding
Between Substring     Code page used by a string of characters between
Encoding              which to scan.

Scan and Flag

To scan a field and flag its value

1.  Click Advanced and navigate to Output, Field Scanning. Select the Flag tab.

2.  Refer to the following table and specify values for Flag in the Field Scanning window:

Setting               Description
Scan Field            Field in the DDL file that specifies the location in
                      which to scan.
Target Field          Field that stores the result of the scan.

A red flag indicates a REQUIRED field for this operation.


Field Justification Specifies how data contained in the field is aligned:


Left/Right Adjust- remove all spaces left/right to
the value, pad the field with spaces, and change
multiple spaces between the value to single space
Left/Right Trim - remove all spaces left/right to
the value and pad the field with spaces
Left/Right Pack - remove all spaces, pack left/
right, and pad the field with spaces
No Justification (default) - no action is taken
Note for Asian Character Data: There is no
distinction between full-width spaces and half-width
spaces in the Field Justification operation. Full-width
spaces within the text are converted to half-width
spaces.
Scan Format - Indicates the format of the value for which to scan: either a Literal value (the actual value) or a Mask value (the shape of the value)

Scan Value - User-defined value for which to scan in a specified field

Retain Scan Value - When checked, retains the scanned-for value in the target field

Scan Level - Indicates whether to scan for a value at the Field level or the Word level

Scan Position - Indicates the physical location in the field at which to begin scanning for the scan value: the exact Beginning of the field, anywhere in the field (Default), or the exact End of the field

Scan Direction - Indicates the direction of the scan, either Right-to-Left or Left-to-Right

Word Delimiter - Specifies the delimiters used to separate words within a field

Flag Value - Specifies the user-defined value for a flag

Between Substring - String of user-defined characters between which to scan

And Substring - Ending substring between which to scan

Retain Between Substring - When checked, retains the scanned-for value between substrings

Scan Value Encoding - Specifies the code page used by the scan value

Cleansing Your Data


Word Delimiter Encoding - Specifies the code page used by the word delimiter

Flag Value Encoding - Indicates the code page used for a flag value

Between Substring Encoding - Code page used by a string of characters between which to scan

Example
For example, to flag the Doctor_flag field in this example, scan the Title field for the Literal value DR using the following criteria. Literal values are always case sensitive.

Scan Field - Title
Target Field - Doctor_flag
Field Justification - No Justification
Scan Format - Literal Value
Scan Value - DR
Retain Scan Value - Checked
Scan Position - Default
Scan Level - Field
Flag Value - Y

These options direct the Transformer to scan the Title field for the literal value DR at the Field level. If the value is found, the Transformer retains the scan value (DR) in the source field and places the flag value Y in the Doctor_flag field.
Title Field: DR
Doctor_flag Field: Y
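The scan-and-flag behavior above can be mimicked in plain Python. This is a sketch of the semantics only; `scan_and_flag` and its parameters are illustrative names, not Transformer APIs.

```python
def scan_and_flag(record, scan_field, target_field, scan_value, flag_value):
    # Scan the source field for the literal value (case sensitive,
    # anywhere in the field, as with the Default scan position).
    if scan_value in record.get(scan_field, ""):
        # Place the flag value in the target field.
        record[target_field] = flag_value
    return record

record = {"Title": "DR", "Doctor_flag": ""}
scan_and_flag(record, "Title", "Doctor_flag", "DR", "Y")
print(record)  # {'Title': 'DR', 'Doctor_flag': 'Y'}
```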


Table Recoding
The Transformer's Table Recoding function converts the values in a field using an external, user-defined recode table. You can recode literal or mask values.

Mask
Masks are character representations of a data value which define each character in the data value as follows:

Code - Represents
a - Any letter (a-z, A-Z)
n - A numeral (0-9)
explicit - Any data value element that is not a number or a letter is shown exactly as it appears in the data value, including spaces.

Value - Pattern shown in TS Quality
Jane Smith - aaaa aaaaa
5.00E+02 - n.nna+nn
$400.00 - $nnn.nn
05/31/2005 - nn/nn/nnnn
jane_smith@abc.com - aaaa_aaaaa@aaa.aaa
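The mask rules above are simple enough to express in a short sketch (plain Python, illustrative only; `mask` is not a TS Quality function):

```python
def mask(value):
    # Build the mask: letters -> 'a', digits -> 'n',
    # anything else (spaces, punctuation) is kept as-is.
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("a")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)

print(mask("Jane Smith"))   # aaaa aaaaa
print(mask("$400.00"))      # $nnn.nn
print(mask("05/31/2005"))   # nn/nn/nnnn
```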

To perform table recoding

1. Create a user-defined recode table. You can create a recode table in any text editor.

2. Table recoding uses a comma-delimited file with one column for the original value and a second column for the recoded value. You can use any filename or suffix that you want, as long as the file itself is comma-delimited.

Example
In this example, the Start_date field has a variety of data shapes and formats, such as 1/1/2005 and 1/01/2005. Create a recode table as shown to change the mask shapes for the Start_date field, so that every Start_date has the format MM-DD-YYYY.

Figure 5.3 Sample Recode Table (columns: Original Mask, Recode Mask; n = numeric)

3. The table requires a DDL that assigns field names to the two columns. Create a DDL file that corresponds to the recode table. For example, a DDL file for the table above would look like this:

Figure 5.4 Sample DDL for the Recode Table

You can specify up to five (5) fields for Lookup Table Fields, Lookup Output Fields, and Recode Output Fields. When specifying multiple fields, separate them with commas.

4. After you create a recode table and associated DDL, click Advanced in the Transformer step and navigate to Output, Table Recoding.

5. Enter a Table Qualifier. A Table Qualifier is a unique name given to a table file. Each table file must have its own unique file qualifier.

6. Enter names for the Table File and Table DDL File.

7. Specify the Table File Delimiter.

8. Specify Lookup Table Fields. These fields are a list of DDL field names in the recode table where the original values are described.

9. Specify Lookup Output Fields. These fields are a list of DDL field names in the output file where the program looks for the original values.

10. Specify the Lookup Output Fields Format: Literal or Mask.

11. Specify Recode Table Fields. These fields are a list of DDL field names in the recode table where the recoded values are described.

12. Specify the Recode Table Fields Format: Literal or Mask.

13. Specify Recode Output Fields. These fields are a list of DDL field names from the output DDL which are used to store the recode value from the recode table.

Below are the sample settings for the Start_date field:

Table Qualifier - TBL1
Table File - datamask.csv
Table DDL File - datamask.ddx
File Delimiter - Comma
Lookup Table Fields - originalmask
Lookup Output Fields - Start_date
Lookup Output Fields Format - Mask
Recode Table Fields - recodemask
Recode Table Fields Format - Mask Value
Recode Output Fields - Start_date

These settings tell the Transformer to scan the Start_date field for its mask and recode the value according to the recode table (datamask.csv). After running the Transformer, every Start_date will have the format MM-DD-YYYY.

The .ddx and .csv suffixes are not required for the files to work; however, we recommend that you use them to avoid confusion.
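The mask-based recode described above can be sketched in plain Python. This is an illustration of the idea, not the Transformer; the recode-table contents and function names are assumptions, and the actual lookup is performed by the product.

```python
import csv, io

# A recode table in the spirit of datamask.csv:
# original mask -> recode mask (illustrative rows).
recode_csv = "n/n/nnnn,nn-nn-nnnn\nn/nn/nnnn,nn-nn-nnnn\nnn/nn/nnnn,nn-nn-nnnn\n"
recode = dict(csv.reader(io.StringIO(recode_csv)))

def mask(value):
    # letters -> 'a', digits -> 'n', everything else kept as-is
    return "".join("a" if c.isalpha() else "n" if c.isdigit() else c
                   for c in value)

def recode_date(value):
    # Look up the value's mask; if it recodes to nn-nn-nnnn,
    # rewrite the date into the MM-DD-YYYY shape.
    if recode.get(mask(value)) == "nn-nn-nnnn":
        mm, dd, yyyy = value.split("/")
        return f"{int(mm):02d}-{int(dd):02d}-{yyyy}"
    return value  # no matching mask: leave the value unchanged

print(recode_date("1/1/2005"))   # 01-01-2005
print(recode_date("1/01/2005"))  # 01-01-2005
```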
14. You can also specify the following setting:

Lookup Fields Case-Sensitive - Enables or disables the case-sensitive table lookup. By default, the lookup is case-insensitive. For example, Rick will match either RICK or riCK.


Conditionals
Conditionals control the flow of TS Quality processes by
performing specific operations on data records, or by running
functions. In the Transformer, the Conditionals function controls all
other functions including character translation, field scanning and
table recoding. The conditionals settings are specified in the
Advanced Settings, Conditionals window. This section explains
the conditionals syntax and sample usage, and then teaches you
how to build a conditional statement.
If you are using translation, recode, and/or scan
functions in the Transformer, you must specify
Conditionals. See Build a Conditional Statement on
page 5-35 for instructions.
In addition to the Transformer, you can use conditional statements
for the following TS Quality modules:
Customer Data Parser
Business Data Parser
File Display Utility
File Update Utility
Set and Selection Utility

Syntax
An IF/ELSE statement is used to describe the condition. The following syntax must be used to build the conditional statement:

IF [condition]
RUN [function1]
SET [function2]
ELSE
RUN [function3]
SET [function4]
ENDIF

The IF keyword allows you to conduct conditional tests on values in the field. When conditions are True, the RUN and/or SET keywords following IF are executed. When conditions are False, the RUN and/or SET keywords following the ELSE keyword are executed. A conditional statement always closes with ENDIF. Refer to the following table for a list of keywords used in conditional statements.
Table 5.1 Keywords of Conditional Statements

Keyword - Description
IF - Begins a statement. When conditions are True, the action statements following the IF keyword are executed. Required.
RUN - Precedes action commands.
SET - Precedes assignment commands.
ELSE - When conditions are False, the action statements following the ELSE keyword are executed.
ELSEIF - When the IF conditions are False, the ELSEIF condition is evaluated.
ENDIF - Closes out a conditional statement. Required.

IF Statement
The IF statement sets the condition. The IF statement is defined by:
DDL field names
Operators (arithmetic/comparison/logical)
Field value(s)

Literal field values such as "Boston" must be enclosed in double quotation marks. Field names and numeric values do not need the quotation marks. If numeric values such as 123 are enclosed in quotation marks, they are read as literal values instead of numeric values.


Example
IF (age > 18 AND state IN ("NY", "MA")) OR first_name LIKE "*ob"

IF statements can be nested as long as a corresponding ENDIF statement closes out each IF statement, as in this nested sample:

IF [condition1]
IF [condition2]
SET [function1]
ELSE
RUN [function2]
ENDIF
SET [function3]
ELSE
RUN [function4]
ENDIF

RUN/SET Statements
The RUN/SET statement contains the function to perform.

RUN
The RUN statement is defined by:
Function names as defined in the Transformer's settings file
Entry IDs (list of entries) to be executed (comma-delimited values or ranges of values)

Example
IF (age > 18)
RUN FIELD_SCANNING(2,3)
RUN CHARACTER_TRANSLATION(3-5)
ENDIF

In the first RUN statement of this example, the numbers in parentheses (2,3) apply to ENTRY_ID 2 and ENTRY_ID 3 under FIELD_SCANNING. In the second RUN statement, the numbers in parentheses (3-5) apply to ENTRY_ID 3, 4, and 5 under CHARACTER_TRANSLATION.

SET
The SET statement takes as arguments:
DDL field names
The equal sign assignment operator (=)
Value or field data arithmetic

Example
IF (age > 18)
SET age = processing_date - birth_date
ENDIF

ELSE Statement
The ELSE statement runs certain statements if a specified condition is False. In other words, you can use an IF/ELSE statement to define two blocks of executable statements: one block to run if the condition is True, the other block to run if the condition is False.

Example
IF (age > 18)
RUN FIELD_SCANNING(2, 3)
SET age = processing_date - birth_date
ELSE
SET record_notes = "Invalid"
ENDIF

In this example, if (age > 18) evaluates as True, RUN FIELD_SCANNING(2, 3) and SET age = processing_date - birth_date are executed. If (age > 18) evaluates as False, the statement SET record_notes = "Invalid" is executed.


ELSEIF Statement
A variation on the IF/ELSE statement allows you to choose from
several alternatives. Adding ELSEIF clauses expands the
functionality of the statement so you can control program flow
based on different possibilities.

Example
IF (age > 21)
RUN FIELD_SCANNING(2,3)
ELSEIF (age > 18)
RUN CHARACTER_TRANSLATION(3-5)
ELSE
RUN FIELD_SCANNING(1)
ENDIF
In this example, if (age > 21) evaluates as True, FIELD_SCANNING
(2, 3) is executed. If (age > 21) evaluates as False, the ELSEIF
(age > 18) condition is performed. If ELSEIF condition (age > 18)
evaluates as True, CHARACTER_TRANSLATION (3-5) is executed. If
all conditions (age > 21) and (age > 18) evaluate as False, then the
statement RUN FIELD_SCANNING (1) is executed.
You can add as many ELSEIF statements as you need to
provide alternative choices. However, note that extensive
use of ELSEIF clauses often becomes cumbersome.
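The control flow above maps directly onto an if/elif/else chain. A plain-Python analogue (the function names merely stand in for the Transformer entries; the return strings are illustrative):

```python
def field_scanning(entries):
    return f"FIELD_SCANNING{entries}"

def character_translation(entries):
    return f"CHARACTER_TRANSLATION{entries}"

def route(age):
    # Mirrors: IF (age > 21) ... ELSEIF (age > 18) ... ELSE ... ENDIF
    if age > 21:
        return field_scanning((2, 3))
    elif age > 18:
        return character_translation((3, 4, 5))
    else:
        return field_scanning((1,))

print(route(30))  # FIELD_SCANNING(2, 3)
print(route(20))  # CHARACTER_TRANSLATION(3, 4, 5)
```

Only the first branch whose condition is True executes, exactly as with ELSEIF.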


Operators in Conditional Statements


The following operators can be used in conditional statements:

Table 5.2 Operators in Conditional Statements

ALL - Performs every defined function entry.
  Example: CHARACTER_TRANSLATION (ALL)
ALWAYS - Always returns True; always performs the specified operation.
  Example: IF (ALWAYS)
AND - Connects two action statements (both must be True).
  Example: IF (age>18 AND gender = "M")
OR - Connects two action statements (at least one must be True).
  Example: IF (age>18 OR year_of_birth > 1987)
UCASE - Converts a literal value or field data to uppercase to evaluate the IF statement.
  Example: IF (UCASE(last_name)="SMITH")
           SET last_name=UCASE(name)
  This example tests the field for the literal of any case combination of SMITH, and if TRUE, it makes the string in the field uppercase.
LCASE - Converts a literal value or field data to lowercase to evaluate the IF statement.
  Example: IF (LCASE(last_name)="smith")
           SET last_name=LCASE(name)
  This example tests the field for the literal of any case combination of smith, and if TRUE, it makes the string in the field lowercase.
= - Is equal to
!=, <> - Is NOT equal to
> - Is greater than
< - Is less than
>= - Is greater than or equal to
<= - Is less than or equal to


Table 5.2 Operators in Conditional Statements (continued)

LIKE - Links a literal with a wildcard asterisk (*) in a field that is used to look for a match. You can place the asterisk before the literal (for example, *LE) to match values that end with the literal, or place it after the literal (for example, LE*) to match values that begin with it. You cannot place an asterisk in the middle of a literal (for example, L*E).
  Example: IF first_name LIKE "*OB"
IN - Means the field value is in the given list.
  Example: IF house_number IN 1,2,3,4
BETWEEN - Means the field value is between the given values.
  Example: IF house_number BETWEEN 12,34
+ - Sum of
- - Difference of
|| - String concatenation
/ - Divided by
* - Multiplied by

Operators for Asian Characters


In addition to the operators in the previous section, TS Quality supports a wide range of operators specific to Asian character data. The following table shows the operators that you can use in conditional logic statements for Asian data.

Table 5.3 Operators for Asian Characters

JTOKATAKANA (Japan) - Transforms Hiragana characters to full-width Katakana.

JTOHIRAGANA (Japan) - Transforms full-width Katakana characters to Hiragana.

CJKTOHALF (China, Japan, Korea, Taiwan) - Transforms full-width characters to their half-width form. This operator automatically processes Katakana accent marks (dakuten and handakuten) appropriately.
  Example: Harte-hanks

CJKTOFULL (China, Japan, Korea, Taiwan) - Transforms half-width characters to their full-width form. This operator automatically processes Katakana accent marks (dakuten and handakuten) appropriately.
  Example: Harte-hanks
JKANATOROMAN (Japan) - Transforms Hiragana and full-width Katakana characters to Hebon-style Romaji.
  Example: jouzousho

JROMANTOKANA (Japan) - Transforms Romaji (Kunrei or Hebon) characters to full-width Katakana.
  Example: haatohankusu

KTOROMAN (Korea) - Transforms Korean Hangul characters to their Romanized forms.
  Example: daechidong


HIRAGANASTOL (Japan) - Transforms small-size yo-on and soku-on characters to their large equivalents, in both Zenkaku (full-width) and Hankaku (half-width) forms.

CTOTRADCHINESE (China, Taiwan) - Transforms all Simplified Chinese characters to their Traditional Chinese equivalents.

CTOSIMPCHINESE (China, Taiwan) - Transforms all Traditional Chinese characters to their Simplified Chinese equivalents.

CJKTOARABICNUM (China, Japan, Korea, Taiwan) - Transforms Chinese number symbols to their Arabic decimal equivalents.
  Example: 150
  Note: Apply this operator only to fields where Chinese number symbols represent actual numbers; otherwise, unintended conversions may occur.


Full-width (Zenkaku) and Half-width (Hankaku) Japanese Characters
The following list shows the Japanese full-width and half-width characters that can be converted using these operators.

Blank character
Romaji: ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
Number: 0123456789
Symbol: ~ ! @ # $ % ^ & * _ + ` - = { } | [ ] \ : ; < > ? , . /
Katakana
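In plain Python, a comparable full-width-to-half-width conversion for the Romaji, number, and symbol characters listed above can be approximated with Unicode NFKC normalization. This is only an approximation of CJKTOHALF, not the operator itself (NFKC also performs other compatibility mappings).

```python
import unicodedata

def to_half_width(text):
    # NFKC normalization folds full-width (Zenkaku) Latin letters,
    # digits, and symbols to their half-width (Hankaku) forms.
    return unicodedata.normalize("NFKC", text)

print(to_half_width("ＡＢＣ１２３"))  # ABC123
```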

Table of Katakana/Hiragana and Their Hebon/Kunrei Romaji Equivalents
Hebon              Kunrei
a i u e o          a i u e o
ka ki ku ke ko     ka ki ku ke ko
sa shi su se so    sa si su se so
ta chi tsu te to   ta ti tu te to
na ni nu ne no     na ni nu ne no
ha hi fu he ho     ha hi fu he ho
ma mi mu me mo     ma mi mu me mo
ya yu yo           ya yu yo
ra ri ru re ro     ra ri ru re ro
wa n wo            wa n wo
ga gi gu ge go     ga gi gu ge go
za ji zu ze zo     za zi zu ze zo
da di du de do     da di du de do
ba bi bu be bo     ba bi bu be bo
pa pi pu pe po     pa pi pu pe po
sha shu sho        sya syu syo
cha chu cho        tya tyu tyo
ja ju jo           zya zyu zyo

How to Use Operators for Asian Characters


Asian Pacific (APAC) operators:
JTOKATAKANA, JTOHIRAGANA, CJKTOHALF, CJKTOFULL, JKANATOROMAN, JROMANTOKANA, KTOROMAN, HIRAGANASTOL, CTOTRADCHINESE, CTOSIMPCHINESE, CJKTOARABICNUM.

The following are some simple examples of conditional statements for APAC operators.

Syntax 1
This syntax is used to convert a literal value or field data in the DDL field.

IF [condition]
SET [DDL field name] = [Operator](DDL field name)
ENDIF

Example 1
In this example, all full-width characters in the INPUT_LINE_01 field are converted to their half-width form.

IF (ALWAYS)
SET INPUT_LINE_01 = CJKTOHALF(INPUT_LINE_01)
ENDIF


Syntax 2
This syntax is used to convert a literal value or field data in DDL field 2 and compare it against the value in DDL field 1 to evaluate the IF statement.
IF [DDL field name 1] = [Operator] (DDL field name 2)
RUN [function]
ENDIF

Example 2
In this example, the program converts the Traditional Chinese
characters in the CUSTOMER_NAME field to Simplified Chinese, and
compares it against the value in the INPUT_LINE_01 field. If that
value is equal to the value in INPUT_LINE_01, it will run
FIELD_SCANNING. If the value is not equal, it will run the
TABLE_RECODING function.
IF INPUT_LINE_01 = CTOSIMPCHINESE (CUSTOMER_NAME)
RUN FIELD_SCANNING(ALL)
ELSE
RUN TABLE_RECODING(ALL)
ENDIF


Build a Conditional Statement


Conditional statements are built in the Conditionals Logic Builder in Advanced Settings. You can specify conditional settings for your input data or output data.

To build a Conditional Statement

1. Click Advanced and navigate to Output, Conditionals.

2. Click Edit Condition to open the Logic Builder window. Notice that the default setting IF (ALWAYS), RUN FIELD_SCANNING (ALL) has been specified. This means that the Field Scanning function will always run for all records.

Figure 5.5 Conditionals Logic Builder



3. Click the button on the upper right and select your Qualifiers For Input Data Files from the pop-up list.

4. Select the condition encoding from the Condition Encoding drop-down list.

5. In the middle pane, place the cursor after RUN FIELD_SCANNING (ALL).

6. In the Key Words box in the lower pane, double-click RUN. The keyword RUN is inserted into the expression at the cursor location.

7. In the Function box in the lower pane, double-click TABLE_RECODING. The function TABLE_RECODING is inserted into the expression at the cursor location.

8. In the middle pane, place the cursor after TABLE_RECODING and type the opening parenthesis.

9. In the Operators box in the lower pane, double-click ALL and close the parentheses. Your expression should now look like this:

IF (ALWAYS)
RUN FIELD_SCANNING (ALL)
RUN TABLE_RECODING (ALL)
ENDIF

10. Click Apply and Close. In this example, the field scanning and table recoding will always run for all records.


Select or Bypass Records


While the Conditionals function is applied to perform specific operations on a record, the Select/Bypass Records function can be used to either Select or Bypass input or output records under certain conditions. The Select/Bypass function uses the Logic Builder located in the Advanced Settings window. You can use Select/Bypass Conditions with most of the TS Quality modules.

To build a Select/Bypass Condition

1. Click Advanced and navigate to Input or Output, Settings.

2. Select the Select Record Conditions or Bypass Record Conditions tab.

3. Click Edit Condition. The Logic Builder window displays.

Figure 5.6 Select and Bypass Condition Logic Builder


4. In the upper pane, select Condition Encoding from the drop-down list.

5. In the list of DDL Fields in the right pane, double-click a DDL field name. This is the field to which you will apply the select/bypass conditions.

6. In the Operators box in the lower right pane, double-click an operator.

7. When you have finished, click Apply and Close.

In the example above, LINE_01<18 indicates that only records in which the value in the DDL field Line_01 is less than 18 will be included and selected for further processing.

You can use any of the operators for the conditional statements to create a select/bypass definition. See Operators in Conditional Statements on page 5-26 for the conditional operators.
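The select/bypass idea amounts to filtering records on a field condition; a plain-Python sketch of the semantics (the function name and record representation are assumptions for the illustration):

```python
def select_records(records, field, predicate):
    # Keep only records whose field value satisfies the condition
    # (e.g. LINE_01 < 18); all other records are bypassed.
    return [r for r in records if predicate(r.get(field))]

records = [{"LINE_01": 12}, {"LINE_01": 25}, {"LINE_01": 3}]
selected = select_records(records, "LINE_01", lambda v: v < 18)
print(selected)  # [{'LINE_01': 12}, {'LINE_01': 3}]
```

A Bypass condition is simply the complement: the records matching the condition are excluded instead of kept.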

Additional Settings
You can also specify the following additional settings. See Transformer in the TS Quality Reference Guide for the complete settings information.

To include record sequence on output

1. Click Advanced and navigate to Output, Settings.

2. In File Sequence Field, select the DDL field which will receive the record sequence number.

To enable the debug function

1. Click Advanced and navigate to Additional....

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default or specify the path and file name of the debug file.

4. Optionally, you can enable the File Trace Key function. If the File Trace Key is specified (field name), the debug file uses the value of that field when reporting.

5. Click Advanced and navigate to Input, Settings. Specify a DDL field name for File Trace Key.


For example, if a record has a Field Scan performed on it, a line is added to the debug file describing the recode. The value of the specified field is used to identify the record in the report. If this function is not used, each record that is read gets a record number assigned based on the order in which it was read.
To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, specify the path and file name for the mask file.

To count the number of records processed

1. Click Advanced and navigate to Additional....

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify settings file encoding

1. Click Advanced and navigate to Additional....

2. In Settings File Encoding, select the appropriate encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.

Run the Transformer and View Results

To run the Transformer and view results

1. Click OK to close the Advanced settings.

2. Click Run to run the Transformer. When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save. You can also right-click a step and select Run Selected.

3. Select OK.

4. On the Results tab, view the Statistics sub-tab. Notice the records affected by the Field Scan and Table Recode.

5. On the Output Settings tab, use the Data Browser to view the Phone and Start_date fields.

6. Run the inputTransformer step. On the Results>Statistics tab, review the output statistics using My Statistics Viewer and the Spreadsheet Viewer. Both viewers allow the user to print statistics for other applications.

7. View the fields on the Output Settings tab using the Data Browser to be sure the scan and recode occurred.

CHAPTER 6

Standardizing Your Data
In this chapter, you will standardize the name and address elements using the Customer Data Parser, then standardize the non-name-and-address elements using the Business Data Parser. This chapter explains the parsing logic used to standardize data elements. You will perform these tasks:

Specify input and output files
Define the settings for the Customer Data Parser and Business Data Parser
Use name generation to determine how many additional records are generated
Set line definitions for input data
Run the Customer Data Parser and Business Data Parser and view results

For Asia-Pacific countries (China, Japan, Korea, and Taiwan), the Customer Data Parser identifies and standardizes the name elements only. Parsing and standardization of address elements for those countries' data is performed by country-specific Postal Matchers.


Using the Customer Data Parser


The Customer Data Parser (CDP) identifies freeform name and address data. The CDP identifies elements of data from the input data file using the data in the fields INPUT_LINE_01 through INPUT_LINE_10. Only the data in the fields INPUT_LINE_01 through INPUT_LINE_10 will be parsed.

The CDP uses country-specific tables to verify and identify data according to each country's postal rules and idioms. Once the data is identified, output is generated. The following two field data types are output:

original data
recoded or standardized data

The parsing process is highly table-driven. This allows users to customize name and address identification for specific business requirements.

The CDP identifies and standardizes name and address data elements. To parse non-name-and-address data elements, such as product name, use the Business Data Parser (BDP).

If the CDP cannot identify a piece of data in the record, an exception is written to the exception file. This file can then be used to add customized entries to the Parser Definitions Table. See Chapter 7, Tuning the Parsing Rules, for instructions on analyzing exceptions and customizing the Parser Definitions Tables.


Understanding Parsing Logic Flow


The CDP assigns all possible attributes to the input name and address data. Based on the attributes, the CDP identifies line types and assigns final attributes based on known patterns. The CDP's output includes original data as well as recoded or standardized values. The Customer Data Parser follows this process:

1. Assign all possible attributes to the words/phrases (tokens) in the Input Name and Address Area, such as Title-Prefix, Given-Name1, Surname, etc.

2. Identify line types according to attribute weights and counts.

3. Search for known patterns and assign final word/phrase attributes.

4. Generate output.

Example
Assume that you have the following name and address data in an input file:

INPUT_LINE_01 - Lexington Drug
INPUT_LINE_02 - Ben K Pike MD
INPUT_LINE_03 - 10 Lois Lane
INPUT_LINE_04 - Lexington 02420

In the above example, INPUT_LINE_01 will be defined as a BUSINESS NAME line because the word Drug has a BUSINESS definition in the word/pattern table, and because Lexington is the same as the city name.

INPUT_LINE_02 will be defined as a PERSONAL NAME line based on its relation to the other lines in the input area and because of the name definitions found on the line. (A detailed explanation of how this particular line was processed follows this section.)


INPUT_LINE_03 will be defined as a STREET line based on its relation to other lines in the input area and because of the HSNO and STR-TYPE attributes found on that line.

INPUT_LINE_04 will be defined as a GEOGRAPHIC line because of the POST CODE mask found in the line, and because the combination of POST CODE and CITY is found in the parsing city table. In this example, the CDP will add the state abbreviation MA to the output record.

How the Customer Data Parser Identifies Business Names
The Customer Data Parser uses the following criteria to determine if a name is a business name. A line is treated as a business name if it:

Contains at least one word with the attribute BUSINESS
Does not contain a word with an attribute of a personal nature (for example, GIVEN-NAME1 or SURNAME)
Begins with the same value as the city and is not further qualified
Contains a word that uses an apostrophe followed by the letter s (possessive form)
Contains an unidentified word that consists of all consonants and is at least four characters long
Does not pass Name Pattern Validation (it will have a reject name form, but will be stored in the PREPOS business name field)
Contains more than one comma on the name line

Note: Pattern processing provides the final attribute assignment for a line, enabling compound business and personal names to be displayed on the same line.
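The criteria above can be sketched as a simple heuristic in plain Python. This is an illustration only, not the CDP's actual table-driven logic; the word sets, rule ordering, and function name are assumptions.

```python
import re

def looks_like_business(line, business_words, personal_words):
    words = line.upper().split()
    # A word of a personal nature rules out a business name.
    if any(w in personal_words for w in words):
        return False
    # At least one word with the BUSINESS attribute.
    if any(w in business_words for w in words):
        return True
    # Possessive form: a word ending in apostrophe-s.
    if any(w.endswith("'S") for w in words):
        return True
    # Unidentified all-consonant word of four or more characters.
    if any(len(w) >= 4 and not re.search(r"[AEIOUY]", w) for w in words):
        return True
    # More than one comma on the name line.
    if line.count(",") > 1:
        return True
    return False

biz = {"DRUG", "CORP", "INC"}
pers = {"BEN", "PIKE"}
print(looks_like_business("Lexington Drug", biz, pers))  # True
print(looks_like_business("Ben K Pike MD", biz, pers))   # False
```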

CDP Parsing Process
This section details the specific processing for INPUT_LINE_02.

Step 1 - Assign all possible attributes

        BEN                    K        PIKE    MD
Name    GVN-NM1, RELATIONSHIP  1ALPHA   ALPHA   TITLE-SUFFIX
Street  ALPHA                  1ALPHA   TYPE    ALPHA
Geog    COUNTRY                1ALPHA   ALPHA   STATE

First, the CDP assigns all possible attributes to each component of data in INPUT_LINE_02 (Ben K Pike MD).

Step 2 - Line type and specific attribute assignment

        BEN       K        PIKE    MD
Name    GVN-NM1   1ALPHA   ALPHA   TITLE-SUFFIX

The CDP identified this line as a Name line because it had more name definitions than street or geography definitions. BEN is no longer considered a RELATIONSHIP attribute since it is not located at the END of the name line.

Step 3 - Pattern lookup and assign final word/phrase attributes

         BEN       K         PIKE      MD
Before   GVN-NM1   1ALPHA    ALPHA     TITLE-SUFFIX
After    GVN-NM1   GVN-NM2   SURNAME   TITLE-SUFFIX

Once the CDP identified the line types and the attributes on those lines, a pattern was created. The CDP then looks the pattern up in the Parser Definitions Table. If the pattern is found, the recode value is returned, as in this example. If the pattern is not found, the CDP will not be able to recode the unknown attributes and it will send the bad name pattern to the parsing exception file for review.

Entry from the Parser Definitions Table (using allowable abbreviations):

GVN-NM1 1ALPHA ALPHA TITLE-SUFFIX PATTERN NAME
RECODE=GVN-NM1(1) GVN-NM2(1) SURNAME(1) TITLE-SUFFIX(1)

The numbers after the attributes in the recode line are referred to as Name Numbers, indicating that the CDP identified one person on this record.
Step 4 - Generate Output

Original Input Data           Standardized Output Data
BUS-NAME: Lexington Drug      BUS-NAME: LEXINGTON DRUG
GVN-NAME1: Ben                GVN-NM1: BENJAMIN
GVN-NAME2: K                  GVN-NM2: K
SURNAME: Pike                 SURNAME: PIKE
TITLE-SUFFIX: MD              TITLE-SUFFIX: MD
HSNO: 10                      HSNO: 10
STREET-NAME: Lois             STREET-NAME: LOIS
STREET-TYPE: Lane             STREET-TYPE: LN
CITY: Lexington               CITY-NAME: LEXINGTON
STATE:                        STATE: MA
POST CODE: 02420              POST CODE: 02420

The CDP can identify name and address elements for many countries, using country-specific definitions tables. The CDP identifies up to ten lines (100 bytes each) of input Name/Address data. It can also identify up to ten names per input record.

Customer Data Parser for China, Japan, Korea, and Taiwan
The Customer Data Parser (CDP) identifies personal and business
names for China, Japan, Korea, and Taiwan as follows.

Step 1 - Token Identification


The first step in parsing Asian words is to isolate words and phrases
into tokens. Tokens may contain one or more characters (and/or
symbols) that are identifiable as a word or word/phrase element. If
commas or spaces are present, these are used to determine where
one token ends and another begins.
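Conceptually, this initial scan resembles a delimiter-based split. The sketch below is illustrative Python, not the Parser's actual implementation:

```python
import re

def tokenize(line):
    """Split an input line into tokens on commas and spaces, the
    delimiters the Parser uses during its initial scan."""
    return [tok for tok in re.split(r"[,\s]+", line) if tok]

print(tokenize("10 Lois Lane, Lexington, MA 02420"))
# → ['10', 'Lois', 'Lane', 'Lexington', 'MA', '02420']
```

Each resulting token is then scanned against the parsing definition tables, which may split it further (for example, separating a surname from a given name).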

Step 2 - Parsing Definition Table Lookup


The second step is to scan each token against one or more parsing
definition tables (also known as a lookup or word/phrase and
pattern table). This process verifies which tokens are a personal or
business name. It also identifies the surname character(s) and
uncovers new tokens based on the look-up results.

How CDP Identifies Tokens for China, Korea, and Taiwan
The first step in parsing is to isolate all words and phrases by
breaking up the input field(s) into recognizable tokens. During the
initial scan, the Parser uses commas or space characters in the
input field to determine where one token ends and the next begins.

Example - China
Input data: 22 135800
Initial token results: (1 name token)


Example - Korea
Input data: , 973-2 3 , 135-280
Initial token results: (1 name token)

Example - Taiwan
Input data: 2 3 106
Initial token results: (1 name token)

How CDP Identifies Chinese and Korean Names


The Customer Data Parser (CDP) uses parsing definition tables
(also known as word/phrase and pattern tables) to identify each
name element.
After initial tokens are created, the Parser scans each token against
the appropriate parsing definition tables. During this process, all
word elements that can be further identified as part of a name, for
example, a surname and given name, are created as separate
tokens.

Example - China
Token results: 2 tokens
Previous Results / New Results / Reasoning: Based on surname lookup


Example - Korea
Token results: 2 tokens
Previous Results / New Results / Reasoning: Based on surname lookup

Example - Taiwan
Token results: 2 tokens
Previous Results / New Results / Reasoning: Based on surname lookup

How CDP Identifies Names for Japan

The basic functionality of the Parser consists of the following three parsing methods:
Personal name parsing (PNP)
Business name parsing (BNP)
Personal/Business parsing (BNP_CLUE)

Personal Name Parsing (PNP)

PNP separates personal names. The Parser separates the input field
into a last name, a first name, a title, and an honorific. It is assumed
that the input data contains the name of only one person, but you
can create multiple output records when multiple personal names
are encountered.
Input data:


Token results: 3 tokens
Last Name    First Name    Honorific

Business Name Parsing (BNP)

BNP separates business names. The Parser separates the input field
into a business name, a business type, and a branch name. You can
create a consistent business name by registering the business name
pattern in the principal business name table. See "How to
Customize the Parser Definition Tables for Japan" in Chapter 7.
Input data: ( )AB
Token results: 4 tokens
Business Name    Principal Name    Business Type    Branch

Principal business name table: B,AB, ,

BNP_CLUE (Personal/Business Name Parsing)

BNP_CLUE determines whether the input is a personal or business
name and separates the input record accordingly.
Input data 1:
Input data 2: ( )AB
Token results: 7 tokens
Last Name    First Name    Honorific    Business Name    Principal Name    Business Type    Branch


Zenkaku and Hankaku Parse

The Parser for Japan can take both zenkaku (full-width) and
hankaku (half-width) fields as input. The zenkaku and hankaku
input fields are specified by the Pr Inp Field Name (zenkaku) and Pr
H Inp Field Name (hankaku) settings in Advanced Input Settings.
You must have zenkaku data in the zenkaku field and hankaku data
in the hankaku field. You can then specify whether to parse only the
zenkaku field, only the hankaku field, or both fields, using the Field
Type Parsing Mode settings in Advanced Process Settings.

Example:
Zenkaku field: INPUT_LINE_01
Hankaku field: FURIGANA_NAME

Zenkaku/Hankaku Mixed: The Parser cannot process fields in which
zenkaku and hankaku data are mixed (except for spaces).
NULL Mixed: Data with a null value in the input field cannot be
processed correctly.
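The zenkaku/hankaku constraint can be checked before data reaches the tool. The sketch below classifies a field using the Unicode East Asian Width property; it is an approximation of the Parser's check, not its actual logic.

```python
import unicodedata

def field_width(text):
    """Classify a field as 'zenkaku', 'hankaku', 'mixed', or 'empty',
    ignoring spaces (the one mix the Parser tolerates)."""
    widths = set()
    for ch in text:
        if ch in (" ", "\u3000"):   # ASCII space and ideographic space
            continue
        # F/W are full-width; everything else is treated as half-width
        # here, which is a simplifying assumption for ambiguous widths.
        widths.add("zenkaku" if unicodedata.east_asian_width(ch) in ("F", "W")
                   else "hankaku")
    if not widths:
        return "empty"
    return widths.pop() if len(widths) == 1 else "mixed"

print(field_width("アイウ"))   # full-width katakana → zenkaku
print(field_width("ｱｲｳ"))     # half-width katakana → hankaku
```

A field that classifies as "mixed" would need to be cleaned or split before the Parser for Japan can process it.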

PREPOS
The CDP then produces a comprehensive data block called the
PREPOS (Parser Repository). The PREPOS contains fixed-fielded
character data, including error codes, identification indicators, name
information, street information, and geographic information. The
Output DDL determines which of these fields are returned to the
Output file.
See Appendix B of TS Quality Reference Guide for a
complete list of PREPOS fields and descriptions.

Example (PREPOS Fields)

Figure 6.1 Sample PREPOS Fields


Input and Output Settings


The CDP uses the output from the Transformer step as its input.
Tip: You can either edit the file names manually or click the File
Chooser icon to browse for and select the file. To view the contents
of your data file, click the Data Browser icon. Use the Dictionary
Editor to view the contents of the DDL file.

To specify input and output files
1. Open the Customer Data Parser step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Click the Dictionary Editor icon and view the input DDL.
4. Note that the CDP only scans the fields defined as INPUT_LINE_01 through INPUT_LINE_10. These mappings were provided by the Project Wizard when you specified the Name and Address data. These are reserved field names, and they represent the input data to the CDP.
5. Close the DDL Editor.
6. Select the Output Settings tab.
7. Specify a file name in the Output File Name and Output DDL Name text boxes.
8. Enter a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

To check the Repository DDL File
1. Click Advanced and navigate to Additional....
2. In Repository DDL File, make sure the correct repository DDL file is specified. This DDL contains the layout of the PREPOS fields.
Country-specific repository DDLs are provided with the program. We recommend that you not change these default PREPOS DDL files. See Appendix B of the TS Quality Reference Guide for a complete list of PREPOS fields and descriptions.

You can also specify the following settings:

To specify the starting record
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records are processed.

To process every nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every nth record will be processed. By default, all records are processed.
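Taken together, Start at Record, Process a Maximum of, and Process Nth Sample select a subset of the input. The following Python generator is a rough model of that selection; how the three settings interact in the product (in particular whether sampling is relative to the starting record) is an assumption here, so consult the reference guide for exact behavior.

```python
def select_records(records, start_at=1, maximum=None, nth=1):
    """Yield the records a step would process, modeling the Start at
    Record, Process a Maximum of, and Process Nth Sample settings."""
    taken = 0
    for i, rec in enumerate(records, 1):
        # Skip records before the starting point, and off-sample records.
        if i < start_at or (i - start_at) % nth != 0:
            continue
        if maximum is not None and taken >= maximum:
            break
        taken += 1
        yield rec

print(list(select_records(range(1, 11), start_at=3, maximum=3, nth=2)))
# → [3, 5, 7]
```

With the defaults (start at 1, no maximum, every record), all input records pass through unchanged.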

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimiter settings. Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed in quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.
3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.
See Encoding (Code Page) on page A-3 for more information on encoding.

To specify an exceptions file
1. Click Advanced and navigate to Process, Settings.
2. In the Exceptions File text box, enter or specify the path and name of the file that contains exception records. Exception records contain data such as incorrect records or field types.

To specify a mask file
1. Click Advanced and navigate to Process, Settings.
2. In the Mask File text box, enter or select the path and file name for the mask file.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions on how to specify select and bypass definitions.

Process Settings
Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.
The navigation pane of the Advanced Settings window contains two tabs:
Parser
Prcustom
The Parser tab contains settings for the Customer Data Parser. The Prcustom tab is used to define settings for the Parser Customization process. The Parser Customization process is explained in the next chapter.
The settings for China, Japan, Korea, and Taiwan differ slightly from those for other countries. Refer to the online Help or the TS Quality Reference Guide for those countries' settings.

Parser Tables
The Customer Data Parser uses two table files to parse the name and address elements of the input data.
Word Pattern Definition File: Defines word patterns for a given country. It contains standard definitions for words and phrases (tokens), and the patterns associated with each line type.
City Directory File: Defines state names, city names, and postal codes for a given country.
To specify the Parser tables
1. Click Advanced and navigate to Process, Settings.
2. Specify the Word Pattern Definition File and City Directory File.
Default Word Pattern Definition File (US):
\TrilliumSoftware\tsq10r5s\<project>\tables\USCDPDEF.len
Default City Directory File (US):
\TrilliumSoftware\tsq10r5s\<project>\tables\USCITY.len
These files are read-only. These tables and parsing city directories carry a two-letter prefix to indicate the country (US = United States, CA = Canada, GB = United Kingdom, DE = Germany, and so forth).


Business Attribute
You must specify whether to enable or disable the business assignment function.
To specify the business attribute
1. Click Advanced and navigate to Process, Settings.
2. Refer to the table below and select one of these options for Assigned Business Attribute:

Setting                  Description
Automatic Business       For any word assigned a BUSINESS attribute, the entire line becomes BUSINESS.
Business via Pattern     Business, possible-business, business-descriptive, and business-redefine attributes are all turned off. Business names are generated only from patterns.
No Business Assignment   Turns off the setting of token meanings for business attributes and possible-business attributes.

Preprocess House Number

The Parser normally pre-processes house numbers before processing street patterns. You must choose whether to preprocess the house number.
To specify whether house numbers are preprocessed
1. Click Advanced and navigate to Process, Settings.
2. Refer to the table below and select one of these options for Preprocess House Number:

Setting                 Description
No Preprocessing        Disable preprocessing.
Minimum Preprocessing   A fractional number like 1 1/2 becomes 11/2. Note that "1 1/2" becomes a HSNO token (the fraction portion must be 3 characters in length and include the /).
Maximum Preprocessing   Option 1: A number like 1 1/2 becomes 11/2. Note that "1 1/2" becomes a HSNO token (the fraction portion must be 3 characters in length and include the /). Option 2: A number like 2420-36 becomes 2420 36 (this option does not work for New York, New Jersey, and Hawaii).
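The two documented rewrites can be sketched with regular expressions. This is illustrative only; the Parser's actual tokenization rules are more involved than these two substitutions.

```python
import re

def preprocess_house_number(line, level="maximum"):
    """Illustrative sketch of the documented rewrites:
    '1 1/2' -> '11/2' (fraction of 3 characters containing '/'),
    and, at maximum preprocessing, '2420-36' -> '2420 36'."""
    # Join a number and a 3-character fraction into one HSNO token.
    line = re.sub(r"\b(\d+) (\d/\d)\b", r"\1\2", line)
    if level == "maximum":
        # Break a hyphenated house-number range into two tokens.
        line = re.sub(r"\b(\d+)-(\d+)\b", r"\1 \2", line)
    return line

print(preprocess_house_number("1 1/2 LOIS LN"))    # → 11/2 LOIS LN
print(preprocess_house_number("2420-36 LOIS LN"))  # → 2420 36 LOIS LN
```

Note that the hyphen rewrite is applied only at the maximum level, matching the Option 2 behavior in the table above.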

Line Definitions
In this example, the input file has two names on each record: the first input line contains a business name, and the second line contains a personal name (contact name). This is a very common data structure. You can pre-define these two line types to the CDP, thus allowing the CDP to work more efficiently.
To set line definitions
1. On the Advanced settings, navigate to Input, Settings.
2. Select the Line Definitions tab. No Pre-definition is set by default. This setting allows the CDP to determine the line type.
3. For each Input Address Line, choose one of the following options:

Setting                         Description
Name Line                       Input Address Line that contains personal names
Business Name Line              Input Address Line that contains business names
Street Line                     Input Address Line that contains street components
Geography Line                  Input Address Line that contains geography components (neighborhood, city, state, county, and postal code)
No Pre-definition               None; the Parser determines the line type (default)
Prohibit Name Line definition   The Parser will not determine the line type

4. On Input Address Line 1, select Business Name Line from the drop-down list. This pre-defines the line as a business name line.
5. Select Name Line from the drop-down list for Input Address Line 2. This pre-defines the line as a name line.

Figure 6.2 CDP Line Definitions

Generate Name Sections

By default, the CDP is set to generate a record for each name found. If you do not want to generate additional output records but would still like to identify all names found on the input records, you can create an additional name section so that all names are stored in the same record. In this case, the output DDL must be modified to store the information from the second name identified by the CDP: the consumer name. In this example, you will create two name sections.
To create two name sections in the output DDL
1. In the DDL Editor, select Tools, Parser Output DDL Generator.
2. In the Country box, select the appropriate country from the drop-down list.
3. In the Number of Name Segments box, select the number of name sections you want to generate.
4. Specify the ORIGINAL_RECORD DDL and the Output DDL file.

Figure 6.3 Parser Output DDL Generator

5. Select Create.
6. Select Yes to redefine the section.
7. Select Yes to see the update.
8. Select File, Exit to close the DDL Editor.

Name Generation
After the Parser processes the input data, it generates name and address records. This process is called name generation. In many cases, one record in the input data contains multiple business or personal names. You must specify how many records to generate when more than one business or personal name is found in the input data.
To define name generation settings
1. Select Advanced and navigate to Output, Name Generation.
2. The right pane of the Customer Data Parser Output Name Generation window contains two tabbed dialog boxes: Field Settings and Entry Settings.
3. Click the Field Settings tab. Refer to the following table and specify the values for these settings.

Setting                                          Description
Generate Business Records for Additional Names   Numeric value between 0-9 that specifies how many business records to generate when more than one business name is present on the input (whether on the same record or on generated name records).
Generate Personal Records for Additional Names   Numeric value between 0-9 that specifies how many personal records to generate when more than one personal name is present on the input (whether on the same record or on generated name records).
Max Original Lines to Generate Names For         Numeric value that indicates the maximum number of original lines for which to generate names. The default is to process all records.

4. For example, the settings below instruct the CDP not to generate additional records for personal or business names.

Figure 6.4 CDP Field Settings

Additional Settings
You can also specify the following settings. See Customer Data Parser in the TS Quality Reference Guide for the complete settings information.

To join name lines
You can join the second name line (INPUT_LINE_02) to the first name line (INPUT_LINE_01) for re-parsing purposes. Both lines must have a valid pattern identified for this to work.
1. Click Advanced and navigate to Input, Join Lines.
2. Specify the From Line Index and To Line Index. From Line Index is the number (1 to 10) of the name line to be joined. To Line Index is the number (1 to 10) of the name line to which the line specified in From Line Index will be joined.
3. Specify the From Line Begin Value and To Line End Value. From Line Begin Value is the character string that is to be at the beginning of the joined line. To Line End Value is the character string that is to be at the end of the joined line.
4. Select either Literal or Mask for From Line Begin Value Format and To Line End Value Format. These are the formats for the values specified in From Line Begin Value and To Line End Value.

To split address lines
You can split the address line before parsing. The CDP works more efficiently if two addresses are split into two physical lines, rather than stored on one line.
1. Click Advanced and navigate to Input, Split Lines.
2. Select either First Occurrence or Last Occurrence for Split Occurrence. First Occurrence splits on the first occurrence of the matching From Line End Value and To Line Begin Value. Last Occurrence splits on the last occurrence of the matching From Line End Value and To Line Begin Value.
3. Specify the From Line Index and To Line Index. From Line Index is the number (1 to 10) of the address line from which to split. To Line Index is the number (1 to 10) of the line where the new line will be inserted.
4. Specify the From Line End Value and To Line Begin Value. From Line End Value is the character string that is to be at the end of the split line. To Line Begin Value is the character string that is to be at the beginning of the split line.
5. Select either Literal or Mask for From Line End Value Format and To Line Begin Value Format. These are the formats for the values specified in From Line End Value and To Line Begin Value.
If an address has ten lines and a line split is performed, the last line will be dropped.
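The effect of the two operations can be sketched as plain string manipulation. This is a simplified illustration; the begin/end value matching rules and the Literal/Mask formats of the real settings are omitted, and the ten-line cap is modeled as a simple truncation.

```python
def join_lines(lines, from_idx, to_idx):
    """Append line `from_idx` to line `to_idx` (1-based), roughly as
    Join Lines does; value matching is omitted for brevity."""
    lines = list(lines)
    lines[to_idx - 1] = lines[to_idx - 1] + " " + lines[from_idx - 1]
    lines[from_idx - 1] = ""
    return lines

def split_line(lines, from_idx, to_idx, end_value):
    """Split line `from_idx` at the first occurrence of `end_value`,
    inserting the remainder as a new line at position `to_idx`.
    If the result exceeds ten lines, the last line is dropped."""
    lines = list(lines)
    head, sep, tail = lines[from_idx - 1].partition(end_value)
    if sep:  # end value found on the source line
        lines[from_idx - 1] = head + sep
        lines.insert(to_idx - 1, tail.strip())
        lines = lines[:10]
    return lines

print(split_line(["10 LOIS LN APT 2 LEXINGTON MA", "X"], 1, 2, "APT 2"))
# → ['10 LOIS LN APT 2', 'LEXINGTON MA', 'X']
```

As the text above notes, splitting helps the CDP because each physical line can then be classified with a single line type.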
To enable the debug function
1. Click Advanced and navigate to Process, Settings.
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file name, or specify the name of the file to which debugging information will be sent.

To count the number of records processed
1. Click Advanced and navigate to Process, Settings.
2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.
This count is written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is 1.

To specify settings file encoding
1. Click Advanced and navigate to Additional....
2. In Settings File Encoding, select the correct encoding from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.

Run the Customer Data Parser and View Results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.
To run the Customer Data Parser and view results
1. Click OK to close Advanced settings.
2. Click Run to run the CDP. You can also right-click a step and select Run Selected.
3. Select OK.
4. On the Results tab, select the Statistics sub-tab. The Statistics file indicates the number of records read into and out of the CDP and displays name, street, and geographic review information.
5. Review the Statistics file. Verify that no additional names were generated by the CDP. Be sure to use the Statistics Viewer and the Spreadsheet Viewer to review the CDP output statistics.

Analyze Results
After running the CDP, the Parser generates Completion Codes and Review Codes to identify specific conditions that occurred for each record being parsed. You can review those codes to analyze the Parser results.
The completion codes are written to the CDP Repository Output Record (PREPOS) in the following field:
pr_completion_code
The review codes are written to the CDP Repository Output Record (PREPOS) in three-character pairs in the following fields:
pr_name_review_codes
pr_street_review_codes
pr_geog_review_codes
pr_misc_review_codes
pr_global_review_codes

To change the review group order, Review Group Order (Process, Settings) can be used to specify the review group hierarchy.

When a record receives a review code, Review Groups are also written to the following field:
pr_rev_group
For multiple review codes, the review group is determined by a default hierarchy table.
See Appendix B for the complete list of Completion Codes, Review Codes, and Review Groups for the Customer Data Parser.

Statistics File
The Parsing Statistics Report is generated by the CDP and summarizes the number and percentage of records distributed over each review group. A brief description of each review group also appears on the statistics report.

Review Group   # of Records   %       Description
0              945            94.5%   No Targeted Conditions Found
1              0              0.0%    Unidentified Item
2              22             2.2%    Mixed Name Forms
3              0              0.0%    Hold Mail
4              0              0.0%    Foreign Address
5              0              0.0%    No Names Identified
6              0              0.0%    No Street Identified
7              2              0.2%    No Geography Identified
8              4              0.4%    Unknown Name Pattern
9              8              0.8%    Derived Genders Conflict
10             11             1.1%    More Than One Middle Name
11             1              0.1%    Unknown Street Pattern
12             0              0.0%    Invalid Directional
13             0              0.0%    Unusual or Long Address
14             3              0.3%    No City or County Identified

Figure 6.5 Sample CDP Statistics



Using the Business Data Parser

For parsing of names and addresses, use the Customer Data Parser.

The Business Data Parser (BDP) uses pattern-recognition technology to identify, verify, and standardize non-name and address components of free-form text. The parsing process is driven by business rules that you can customize to meet your specific business requirements.
Use the Business Data Parser to perform several tasks:
Identify words and phrases in free-form text
Produce standardized and identified output in useful formats
Use customized user-defined attributes
Gain flexibility through an externally-edited set of tables for business rules
Identify words and phrases by their values or their masks
Correct misspellings and enable word or phrase recodes using external tables
Categorize any unique words and phrases using user-defined conditional text
Identify data for review by numerous methods
Produce standard output, so that applications may easily choose needed data elements
Display results in a log file to use for tuning business rules
Collect run statistics to quickly identify development areas
Produce a log that identifies problems to help refine the external word, phrase, and pattern tables

BDP Parsing Process

The Business Data Parser parses data and identifies patterns based on the following criteria:

Step 1 - Assign all possible attributes
The BDP identifies each word and phrase and compares them to the business rule table supplied by the Parsing Customization process. When the BDP finds a word or phrase in the table, it assigns the associated specific attribute for that table entry. For example:

1995    Toyota    Camry
YEAR    MAKE      MODEL

If a word or phrase isn't specified in the table, the BDP assigns it an intrinsic attribute, such as ALPHA or NUMERIC.

Step 2 - Pattern lookup and assign final word/phrase attributes
The BDP looks up the entire combination of words, called a pattern, in the pattern list.
If a match to the pattern list exists, the BDP assigns a final attribute to all words, based on the pattern.
If no match exists, the BDP writes the pattern details to the log file for further review and tuning.

Step 3 - Line type and specific attribute assignment
Each line is then assigned a line type. The default line type of M (Miscellaneous) is assigned to a line unless both of the following conditions are true:
The line matches a pattern in the word and pattern table, and
A line attribute for that pattern is defined.
For example:

1995    Toyota    Camry
YEAR    MAKE      MODEL


You can assign up to fifty user-defined attributes, named USER-NN, where NN is a numeric value between 1 and 50, inclusive.
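Steps 1 through 3 can be sketched as two table lookups. The tables, the "V" line type, and the year-mask rule below are hypothetical illustrations, not the BDP's actual rule tables or attribute names.

```python
# Hypothetical word and pattern tables for the automobile example.
WORD_TABLE = {"TOYOTA": "MAKE", "CAMRY": "MODEL"}
PATTERN_TABLE = {("YEAR", "MAKE", "MODEL"): "V"}   # "V" line type is assumed

def attribute(word):
    """Step 1: table lookup, a mask-style year rule (assumed), then
    intrinsic fallbacks such as ALPHA or NUMERIC."""
    if word.upper() in WORD_TABLE:
        return WORD_TABLE[word.upper()]
    if word.isdigit() and len(word) == 4:
        return "YEAR"
    return "NUMERIC" if word.isdigit() else "ALPHA"

def parse_line(words, log):
    """Steps 2-3: look up the attribute pattern; unmatched patterns go
    to the log, and the line keeps the default type M (Miscellaneous)."""
    attrs = tuple(attribute(w) for w in words)
    line_type = PATTERN_TABLE.get(attrs)
    if line_type is None:
        log.append(attrs)
        line_type = "M"
    return attrs, line_type

log = []
print(parse_line(["1995", "Toyota", "Camry"], log))
# → (('YEAR', 'MAKE', 'MODEL'), 'V')
```

A line whose pattern is absent from the pattern table is logged for review and tuning, which is how the external tables are refined over time.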

Step 4 - Generate Output
The BDP produces a comprehensive data block called the BPREPOS (Business Data Parser Repository). The BPREPOS consists of fixed-fielded character data, including error codes and identification indicators. The Output DDL determines which of these fields are returned to the Output file, and can be customized by the user.
See Appendix B of the TS Quality Reference Guide for a complete list of BPREPOS fields and descriptions.

Example (BPREPOS Fields)

Figure 6.6 Sample BPREPOS Fields



Input and Output Settings

Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

To specify input and output files
1. Open the Business Data Parser step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Navigate to the Output Settings tab.
4. Specify a file name in the Output File Name and Output DDL Name text boxes.
5. Enter a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

To specify the parser field
You must specify the input DDL field that contains the data to be parsed.
1. Click Advanced and navigate to Input, Settings.
2. Select Parse Field from the drop-down list.

To check the Repository DDL File
1. Click Advanced and navigate to Additional....
2. In Repository DDL File, make sure the correct repository DDL file is specified. This DDL contains the layout of the BPREPOS fields.
The country-specific BPREPOS DDL is provided with the program.
You can also specify the following settings:

To specify the starting record
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records are processed.

To process every nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every nth record will be processed. By default, all records are processed.

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimiter settings. Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed in quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.
3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.
See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions on how to specify select or bypass definitions.

Process Settings
Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.
The navigation pane of the Advanced Settings window contains two tabs:
Parser
Prcustom
Settings for the Business Data Parser are shown on the Parser tab. The Prcustom tab contains settings for the Parser Customization process. The Parser Customization process is explained in the next chapter.

Parser Tables
The Business Data Parser uses the Word Pattern Definition table file to parse the non-name and address elements of the data. The Word Pattern Definition table for the Business Data Parser is created by the Parsing Customization process.
For instructions on the Parsing Customization process, see Chapter 7, Tuning the Parsing Rules, and Appendix B.
Word Pattern Definition File: Defines word patterns for a given country. It contains definitions for words and phrases (tokens), and the patterns associated with each line type. These tables use a two-letter prefix to indicate the country: US = United States, CA = Canada, GB = United Kingdom, DE = Germany, and so forth.
To specify the Parser table
1. Click Advanced and navigate to Process, Settings.
2. Specify the Word Pattern Definition File.
Default Word Pattern Definition File (US):
\TrilliumSoftware\tsq10r5s\<project>\tables\USBDPRUL.win

Example
For example, you can create a Word Pattern Definitions table for automobile classification. At least one definition and one pattern entry must be present in the Word Pattern Definitions table.
Entry from Word Pattern Definitions Table

'ACURA'       INSERT MISC DEF ATT=MAKE
'ALFA'        INSERT MISC DEF ATT=MAKE,RECODE='ALFA ROMEO'
'ALFA ROMEO'  INSERT MISC DEF ATT=MAKE
'AMC'         INSERT MISC DEF ATT=MAKE
'AUDI'        INSERT MISC DEF ATT=MAKE
'BERTONE'     INSERT MISC DEF ATT=MAKE
'BMW'         INSERT MISC DEF ATT=MAKE
'BUICK'       INSERT MISC DEF ATT=MAKE
'CADDY'       INSERT SYNONYM='CADILLAC'
'CADI'        INSERT SYNONYM='CADILLAC'
'CADILLAC'    INSERT MISC DEF ATT=MAKE,RECODE='CADILLAC'
'CADY'        INSERT SYNONYM='CADILLAC'
'CHEVROLET'   INSERT MISC DEF ATT=MAKE
'CHEVY'       INSERT MISC DEF ATT=MAKE,RECODE='CHEVROLET'

Figure 6.7 Sample BDP Word Pattern Definition Table
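The behavior implied by these entries can be sketched as follows. This is an illustration of the SYNONYM and RECODE semantics as described here, not the BDP's actual table engine; the dictionaries mirror a few of the sample entries above.

```python
# Hypothetical model of the sample table: SYNONYM entries map a
# variant to its canonical word before the definition lookup, and
# RECODE rewrites the standardized output value.
DEFS = {
    "ALFA": ("MAKE", "ALFA ROMEO"),
    "CADILLAC": ("MAKE", "CADILLAC"),
    "CHEVY": ("MAKE", "CHEVROLET"),
    "CHEVROLET": ("MAKE", None),
}
SYNONYMS = {"CADDY": "CADILLAC", "CADI": "CADILLAC", "CADY": "CADILLAC"}

def standardize(token):
    """Return (attribute, standardized value) for a token; tokens not
    in the table fall back to the intrinsic ALPHA attribute."""
    word = SYNONYMS.get(token.upper(), token.upper())
    att, recode = DEFS.get(word, ("ALPHA", None))
    return att, recode or word

print(standardize("Caddy"))  # → ('MAKE', 'CADILLAC')
print(standardize("Chevy"))  # → ('MAKE', 'CHEVROLET')
```

Routing the CADDY/CADI/CADY variants through a single SYNONYM target keeps the recode in one place, which is the practical reason to prefer synonyms over repeating a RECODE on every spelling.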

Additional Settings
You can also specify the following settings:
See Business Data Parser in the TS Quality Reference Guide
for complete settings information.

To retain original values
You can specify whether you want to retain the original input
data in the Parser output field. The field must be defined as
ORIGINAL by the output DDL. If this setting is not checked,
the Parser formats the data as uppercase and removes
erroneous punctuation.
1. Click Advanced and navigate to Process, Settings.
2. Select Retain Original Value.

To include Unknowns in Standard Original Field
This setting controls whether unknown or undefined tokens
are populated into label lines. When checked, label lines are
populated with the complete input lines, including unknown
or undefined words/tokens. These tokens are standardized
and appear in the same left-to-right order as in the input
line.
1. Click Advanced and navigate to Process, Settings.
2. Select Include Unknowns in Std Original Field.

To populate unknown patterns
When checked, this setting ensures that the Parser populates
user fields with known attributes, even in the event of a
pattern failure.
1. Click Advanced and navigate to Process, Settings.
2. Select Populate Unknown Patterns.

To enable debug function
1. Click Advanced and navigate to Process, Settings.
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file
   name, or enter a file name where debugging information will
   be written.

To count number of records processed
1. Click Advanced and navigate to Process, Settings.
2. In the Sample Count text box, specify the number that
   indicates the increment sample of records to read and
   attempt to process from an input data file.
   This count will be written to the Process Log file. To
   display the Log file, select the Results tab and
   navigate to the Process Log tab after the program is
   run. The default is 1.

To specify settings file encoding
1. Click Advanced and navigate to Additional....
2. In Settings File Encoding, select the correct encoding from
   the drop-down list.
   See Encoding (Code Page) on page A-3 for more
   information on encoding.


Run the Business Data Parser and View Results


To run the Business Data Parser and view results
When you click Run, TS Quality automatically saves your
settings. To save your settings without running the step,
click Save.
1. Click OK to close the Advanced settings.
2. Click Run to run the BDP.
   You can also right-click on a step and select Run
   Selected.
3. Select OK.

4. On the Results tab, select the Statistics sub-tab. The
   Statistics file indicates the number of records read into and
   out of the BDP. It also displays the number of records that
   contain blank data, other lines, and unknown lines.
5. Review the Statistics file. Be sure to use My Statistics Viewer
   and the Spreadsheet Viewer to review the output statistics
   of the BDP.

After you run the BDP, the Parser generates Completion Codes
and Review Codes to identify specific conditions which occurred
for each record being parsed. You can review those codes to
analyze the parser results.
The completion codes are written to the BDP Repository Output
Record (BPREPOS) in the following field:
bp_completion_code
The review codes are written to the BDP Repository Output Record
(BPREPOS) in three-character pairs in the following fields:
bp_misc_review_codes
See Appendix B for the complete list of Completion
Codes and Review Codes for the Business Data Parser.
There are no Review Groups for the Business Data
Parser.


CHAPTER 7

Tuning the Parsing Rules

If the Customer Data Parser cannot recognize a name or address
component, such as a city name or surname, on a record, an
exception is reported. When that occurs, you must change the
parsing rules using Parsing Customization. To use Parsing
Customization, you must first understand how the parser definition
tables work.
This chapter explains the parser definition tables. You will also
perform these tasks:
View parser exceptions
Identify and create an entry for a misspelled city name
Identify and create an entry for a bad name pattern
Review the new entries in the Customized Definitions file
Run Parsing Customization and re-run the Customer Data Parser
Check errors in the Parsing Customization process
This chapter focuses on the Parsing Customization process
for the Customer Data Parser. See Online Help to tune the
parsing rules for the Business Data Parser.


Understanding the Parser Definitions Tables
Standard and User Definitions Tables
For the Business Data Parser, the default standard definitions
table is empty. You must create a table for the Business Data
Parser to run.
The Parser Definitions tables contain both definitions and
word/phrase pattern information. These files are used by the Parser
to identify the components of the input data.
Standard Definitions Table
Standard Definitions tables include all standard definitions
for titles, first names, business names, street components
(type and direction) as well as patterns and masks for other
name and address components. They are supplied with the
program.
Default Standard Definitions Table:
\TrilliumSoftware\tsq10r5s\tables
\parser_rules\xxCDPRUL.win
(xx = two-letter country code)
Standard Definitions tables are identified by a two-letter
prefix to indicate the country. (Example: US = United
States, CA = Canada, GB = Great Britain, and DE =
Germany)
Customized Definitions Table
Customized Definitions tables contain user-created
definitions.
Default Customized Definitions Table:
\TrilliumSoftware\tsq10r5s\<project>\tables
\xxUSERCDP.win (CDP, xx = two-letter country code)
\xxBDPRUL.win (BDP, xx = two-letter country code)

Syntax of Definitions
Entries in Standard and User Definitions tables require a special
syntax. This section describes the syntax for definition entries.

Syntax
TOKEN [OPERATION] LINE-TYPE [POSITION] KEYWORD=ATTRIBUTE, [ATTRIBUTE MODIFIER]

An entry is composed of Token, Operation, Line-type, Position,
Attributes, and Attribute Modifiers. Brackets [] indicate the
enclosed item is optional. The brackets [] are NOT typed on an
actual entry line.

Example
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F

MARY         Token
INS          Operation
NAME         Line Type
BEG          Position
ATT=GVN-NM1  Keyword=Attribute
GEN=F        Attribute Modifier

Token
A token is any word or phrase in the data, or a mask of any word or
phrase. Tokens are informally called the left side of the equation
in a definition table entry. In this example, the token is MARY.
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F
Token entries can be no more than 100 characters in length.
Tokens cannot wrap to a second line. This also affects
word and phrase definitions, masks, and pattern entries.
The Parser identifies four different types of token structures:
Token
Sub-token
Phrase
Mask
The table below describes the token structures and provides
examples.
Table 7.1 Parser Token Structures

Token
The smallest entity that has a meaning by itself.
A token may or may not contain one or more sub-tokens.
Example: 'PIZZA' NAME ATT=BUS

Sub-token
A string entity that has a meaning within a token (e.g., STRASSE). A sub-token may appear at the
beginning or end of the token.
If your data contained BERGENSTRASSE:
Example: STRASSE STREET END-TKN ATT=STR-TYPE
Where:
STREET        the line type
END-TKN       location of the sub-token within the word (also indicates this is a sub-token)
ATT=STR-TYPE  the attribute assignment for table lookup

Beginning-Token (BEG-TKN)
Used for the sub-token position. This keyword indicates that the sub-token lies at the
beginning of a token.
Example: STRASSE STREET BEG-TKN ATT=STR-TYPE

Ending-Token (END-TKN)
Used for the sub-token position. This keyword indicates that the sub-token lies at the
end of a token.
Example: STRASSE STREET END-TKN ATT=STR-TYPE
BEG-TKN and END-TKN are only allowed on street lines. See line types in the following
section for more information.

Phrase
One or more tokens grouped together that have a meaning.
Example: 'HOLD MAIL' STREET ATT=HOLD

Mask
A mask is a description of a word or phrase, using alpha, numeric or special characters to
represent letters, numbers, and special characters. Masks define characters of data elements
using:
n to represent a number (0-9)
a-z to represent alphabetic letters (lowercase only)
Every character that is not a letter or number is represented by the character itself:
/ (forward slash), @ (at symbol), and so forth.
For example, a mask can define any series of five numerals as a ZIP code, instead of entering
each of the 99,999 possible combinations in the table. This mask entry looks like:
nnnnn  MASK GEOG DEF ATT=POSTCODE
Masks may include special characters if they are part of the word representation. For example,
a mask for a nine-digit ZIP code is:
nnnnn-nnnn  MASK GEOG DEF ATT=POSTCODE
Appendix D in the TS Quality Reference Guide lists the valid token tags for Asia-Pacific
countries.
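Conceptually, a mask is a character-class template. The following Python sketch shows one way such a mask could be interpreted; it is an assumption for illustration, not the actual TS Quality matcher:

```python
import re

def mask_to_regex(mask):
    """Translate a Parser-style mask into a regular expression:
    'n' matches one digit, any other letter matches one alphabetic
    character, and every remaining character matches itself literally.
    (Illustrative model only, not TS Quality's implementation.)"""
    parts = []
    for ch in mask:
        if ch == "n":
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"

# 'nnnnn' covers any five numerals, like the postcode mask above
assert re.match(mask_to_regex("nnnnn"), "01886")
# 'nnnnn-nnnn' covers a nine-digit ZIP with its literal hyphen
assert re.match(mask_to_regex("nnnnn-nnnn"), "01886-1234")
# a letter in the data fails the all-numeric mask
assert not re.match(mask_to_regex("nnnnn"), "0188A")
```

The single mask stands in for every possible five-digit combination, which is exactly the economy the text describes.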

Operations
The Parser identifies three types of operations:
Insert
Modify
Delete
In this example, the operation is INS (INSERT).
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F


Underlined letters indicate allowable abbreviations.


Table 7.2 Parser Operations

INSERT
This operation inserts an entry in a table.
MARY  INSERT NAME BEG ATT=GVN-NM1,GEN=F
If omitted, INSERT is assumed by default.

MODIFY
This operation replaces an existing entry in a table. The original entry is deleted and the
modified entry is inserted.
Example:
MARY  MODIFY NAME BEG ATT=GVN-NM1,GEN=F
Modify is used to change definitions in the standard definition table by creating the
entry in the user definitions file. The Parsing Customization process will combine
entries from the two tables into one output (to be used by the Parser).

DELETE
This operation deletes an entry from a table.
Deleting Definitions:
Example:
MARY  DELETE
Deleting Synonyms: With the SYNONYM keyword, you must enter the actual synonym:
Example:
BV  DELETE SYNONYM=BOULEVARD
Deleting Patterns: You must enter the actual pattern followed by DELETE PATTERN.
Example:
GVN-NM1 1ALPHA ALPHA  DELETE PATTERN
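The three operations can be pictured as edits applied to the standard table when Parsing Customization combines the two files. Here is a simplified Python model; the dict representation and function name are hypothetical, not TS Quality internals:

```python
def apply_user_entries(standard, user_entries):
    """Apply INSERT/MODIFY/DELETE operations from a user definitions
    table to a copy of the standard definitions.
    Conceptual model only: entries are (token, operation, definition)."""
    merged = dict(standard)          # leave the standard table untouched
    for token, operation, definition in user_entries:
        if operation in ("INSERT", "MODIFY"):
            # MODIFY is delete-then-insert, which in a dict model is
            # the same as overwriting the key
            merged[token] = definition
        elif operation == "DELETE":
            merged.pop(token, None)
    return merged

standard = {"MARY": "NAME BEG ATT=GVN-NM1,GEN=F"}
user = [
    ("MARY", "MODIFY", "NAME BEG ATT=GVN-NM1,GEN=N"),
    ("BV", "DELETE", None),
]
print(apply_user_entries(standard, user))
# {'MARY': 'NAME BEG ATT=GVN-NM1,GEN=N'}
```

The key point the model captures is that user entries win: a MODIFY in the user file replaces the standard definition in the combined output the Parser uses.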

Line Types
Each definition entry requires a line type assignment. The Parser
identifies four types of lines:
Name
Street
Geography
Miscellaneous
Note that attributes do not cross line types. For instance, an
attribute of GVN-NM1 cannot be used with a line type of STREET.
In this example, the line type is NAME.
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F

Underlined letters indicate allowable abbreviations.

Line Type      Description

NAME           Name of a person or business. Names are usually the first one or two
               lines in an address record.
               BOOKSTORE  NAME DEF ATT=BUS,CAT=S5942

STREET         All descriptions of streets and numeric addressing, including box
               numbers, rural routes, and apartment numbers. A street line is usually
               in the middle of a record, and may be one or more lines.
               LANE  STREET END ATT=STR-TYPE,REC=LN

GEOGRAPHY      The city, state, postal code, and country in the address. Geography
               line(s) are usually at the end of an address record.
               MASSACHUSETTS  GEOG END ATT=STATE,REC=MA

MISCELLANEOUS  Information that does not fit into the other line types, such as account
               name or a comment.
               HOLD MAIL  INSERT MISC DEF ATT=HOLD


Positions
A token may be defined in relation to its position within the name or
address line. There are three types of positions:
Beginning
Ending
Default
In this example, the position is BEG (BEGINNING).
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F

Underlined letters indicate allowable abbreviations.

BEGINNING
This includes the first word in a line, any word that follows a title, or any words that appear
before a first name, including the first name. For example, consider the line:
MR JOSEPH SMITH
Every word except SMITH is considered to be at the beginning of the line.

DEFAULT (optional)
When the physical location of the word in the line is irrelevant, Default is used.
A default word may appear anywhere on the line, including the beginning or end. If this
keyword is omitted from the entry, Default is assumed.

ENDING
The last word and any further non-alphabetic characters are the ending of a line. For
example, consider the line:
BRIARWOOD ESTATES APT 3
Both APT and the apartment number 3 are considered to be at the end of the line.

Attributes
Attributes (ATT=) are line-specific definitions and assign a specific
meaning to a word or mask shape. The following table lists available
attributes organized by line type.
For the complete list of Attributes, see Appendix D in the
TS Quality Reference Guide.
Note that attributes do not cross line types. For instance, an
attribute of GVN-NM1 cannot be used with a line type of STREET.

Attribute                      Description

Name Line Attributes           Attributes used in NAME lines of patterns.
Street Line Attributes         Attributes used in STREET lines of patterns.
Geography Line Attributes      Attributes used in GEOGRAPHY lines of patterns.
Miscellaneous Line Attributes  Attributes used in MISCELLANEOUS lines of patterns.

User-Defined Attributes
If a particular word or phrase does not meet any of the pre-defined
attributes, you may assign it a user-defined attribute. For example:
n-nn-nan  MASK NAME DEF ATT=USER1
Once a user-defined attribute is assigned in the User Definitions
table, the corresponding field name must be included in the CDP
output DDL. For instance, if a USER1 attribute is assigned a value in
the User Definitions table, the field name PR_USER_FIELD_01 must
be added to the CDP output DDL.

Attribute Modifiers
Attributes can be further described by various Attribute Modifiers.
The following section lists all definition modifiers that can be used
after the attribute assignment. All modifiers must be separated
from the attribute by a comma. Valid attribute modifiers are
Gender, Category, Function and Recode.

Gender
The Gender (GEN=) keyword assigns a gender to a name
component. It applies only to definitions for name lines and is
required if the attribute used is GVN-NM1, 2, 3, or 4.
Valid gender codes:
M = Male
F = Female
N = Neuter (gender unknown)
MARY  NAME BEG ATT=GVN-NM1,GEN=F

Category
The Category (CAT=) keyword is a user-defined, free-form means
of categorizing data elements. Categories should be limited to six
characters (based on assigning multiple categories throughout a
record) with a maximum of 50 bytes per record for all categories.
A category can be any value that may prove useful as a group
during parsing of name and address components. For example,
assigning SIC codes to company names allows the distribution of
customer business verticals to be analyzed after the parsing
process is complete.
BOY SCOUTS  NAME DEF ATT=BUS,CAT=S8641

Function
The Function (FUNC=) keyword is used when special functions
should be performed on the entry. This keyword specifies a certain
subroutine, and the functions of that subroutine act on the entry.
BOY SCOUTS  NAME DEF ATT=BUS,FUNC=BES01
There are Special Functions used with the FUNCTION
keyword. See Appendix D in the TS Quality Reference
Guide.

Recode
The Recode (REC=) keyword is used to recode the value. The
value assigned after REC= is the value the Parser will assign to the
recode output field when the defined word is encountered on input.
ROAD  STREET END ATT=STR-TYPE,REC=RD
In the above example, the parser recodes the word ROAD to RD.
ROAD would be the value stored in the original data field
on Parser output. This is the pr_street_type1_original
field in the Parser repository.
RD would be the value stored in the recoded data field on
Parser output. This is the pr_street_type1_recoded field
in the Parser repository.

Recode for Masks
Masks may be used to introduce and/or exclude literals and special
characters in their recodes. For example, a mask for a telephone
number is entered in this manner:
nnn nnn-nnnn  MASK MISC DEF ATT=IGN,REC=(nnn)nnnnnnn
This entry would recode the entry 978 663-9955 to (978) 6639955.

Synonym
A synonym is a shortcut for defining a token entry with the same
value as a prior entry. For example:
PBOX  SYNONYM=PO BOX
This entry identifies PBOX as a synonym of PO BOX.
Synonyms are used to correct common spelling errors. The two
fields in the Parser affected by synonym entries in the definitions
table are called the original and recoded output fields.

It is important to understand the behavior of synonym entries in
conjunction with the recode entry of the resulting definition in
Parser output. See the example below.

Example
The definitions table contains the following entry:
'CENTRE COMMERCIAL'  STREET DEF ATTRIBUTE=TYPE,
                     REC=CCAL
The Parser knows this entry is a TYPE, with a recode value of
CCAL.
The Parser puts CENTRE COMMERCIAL in the original output
field pr_street_type1_original and puts the recoded value
CCAL in the recoded output field pr_street_type1_recoded.
Now a synonym entry is added to use the original definition entry:
'CENTRE COMMERC'  SYNONYM=CENTRE COMMERCIAL
The Parser knows that this is a synonym for CENTRE
COMMERCIAL. It places CENTRE COMMERCIAL (NOT CENTRE
COMMERC) in the original data output field, and places the recoded
value of CCAL in the recoded data output field. This ensures that
you have the correct spelling in the entry.
Manage this behavior through the Retain Original Data settings
(click Advanced, Process, Settings in the Customer Data Parser
step). If this setting contains a value of 1, the original data output
field would contain the original value (not the synonym value as
shown above). See page 6-32 for information on Retain Original
Data settings.
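The synonym-plus-recode behavior described above can be modeled in a few lines of Python. This sketch is illustrative only; the field names mirror the repository fields mentioned in the text, but the lookup logic itself is an assumption:

```python
# Conceptual model of synonym resolution; not TS Quality internals.
DEFINITIONS = {"CENTRE COMMERCIAL": {"att": "TYPE", "recode": "CCAL"}}
SYNONYMS = {"CENTRE COMMERC": "CENTRE COMMERCIAL"}

def parse_street_type(token):
    """Resolve a token through the synonym table, then return the
    values destined for the original and recoded output fields."""
    canonical = SYNONYMS.get(token, token)
    entry = DEFINITIONS[canonical]
    return {
        # the corrected canonical spelling, not the misspelled input
        "pr_street_type1_original": canonical,
        "pr_street_type1_recoded": entry["recode"],
    }

print(parse_street_type("CENTRE COMMERC"))
# {'pr_street_type1_original': 'CENTRE COMMERCIAL',
#  'pr_street_type1_recoded': 'CCAL'}
```

The essential behavior is that the original field receives the canonical spelling from the definition entry, never the misspelled synonym itself.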


Special Entries
In addition to the basic syntax described in the previous sections,
the Parser uses some special entries. This section explains special
entries including:
US city name changes
Non-US city name changes
Multiple definitions for one entry
Patterns

US City Name Changes
City name change entries are entered with an underscore (_) as
the last character of the entry. This notifies the Parser that this is a
city-change, and tells the program to look up the recoded entry in
the City Directory Table.
This directory is used for city verification and correction, and is
based on a primary geography, secondary geography lookup (such
as state or city).

Example
MABEVERLEY_  GEOG DEF ATT=CITY-CHG,
             REC=MABEVERLY
CASAN FRAN_  GEOG DEF ATT=CITY-CHG,
             REC=CASAN FRANCISCO
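A rough Python model of the underscore convention follows. The directory contents and function name are illustrative only; the real City Directory lookup and verification are considerably more involved:

```python
# Illustrative city-change directory keyed by the entry token
# (state prefix + misspelled city), mapping to the corrected recode.
CITY_CHANGES = {
    "MABEVERLEY": "MABEVERLY",
    "CASAN FRAN": "CASAN FRANCISCO",
}

def resolve_city(token):
    """If a definition token ends in '_', treat it as a city-change
    entry: strip the marker and return the corrected directory value.
    Tokens without the marker pass through unchanged."""
    if token.endswith("_"):
        key = token[:-1]
        return CITY_CHANGES.get(key, key)
    return token

print(resolve_city("CASAN FRAN_"))   # CASAN FRANCISCO
```

The trailing underscore is purely a flag on the table entry; the corrected value always comes from the REC= side of the definition.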

Non-US City Name Changes
Countries other than the US use another level of city-name
changes. It allows for additional city verification and correction
based on a complex City Directory Table. An underscore is required
as the last character of the entry.


Level      Description

Post Town  In this example, the program looks up Cheltenham as a valid post town in
           Gloucestershire county. Note that the recode contains only the corrected
           spelling of the post town.
           GLOUCESTERSHIRE CHELTENHAN_  GEOG DEF ATT=CITY-CHG,
                                        REC=CHELTENHAM

Locality   In this example, the program looks up Gotherington as a valid locality in the
           post town of Cheltenham. Note that the recode contains only the
           corrected spelling of the locality.
           CHELTENHAM GOTHERINGTEN_  GEOG DEF ATT=CITY-CHG,
                                     REC=GOTHERINGTON

Multiple definitions for one entry
Occasionally, an entry may contain multiple meanings. This is often
the case when a word has a meaning for more than one line type.
The first definition is entered in the standard way previously
described. Subsequent definitions must be INDENTED under the
initial operational value.
CENTER  NAME DEF ATT=BUS
        STREET END ATT=SEC-TYPE,REC=CTR
        GEOG DEF REC=CENTER
Note that for Geography definitions, tokens are allowed
without an attribute.

Patterns
A pattern consists of attributes and/or intrinsic attributes, which
include any alpha, numeric, or special character representation of a
data element.
Changes can be made to an existing pattern by adding another tag
to the first line, using the MODIFY operation.
'ALPHA ALPHA' MODIFY PATTERN NAME
REC=GVN-NM1(1) SRNM(1)
See MODIFY on page 7-7 for details.
Token identification is converted into meaningful information
through pattern processing. Patterns are created in the same text
file as the Definition entries. The Parser understands the difference
between a definition and a pattern and processes each
appropriately. Because of this, it is not necessary to create the
various entries in any particular order. For organizational purposes,
however, it makes sense to organize the entries by type.

Pattern Structure
The pattern structure uses one or two lines, using the following
structure.

FIRST LINE:
Inbound combination of tokens
This is the combination of attributes the Parser program will
attempt to find in the table. If the exact match of attribute
combination is found, the program changes the attribute
values on output to match the values defined in the RECODE
portion of the pattern (see the following information on
RECODE).
In this example, two words containing letters only are
present, such as two names. The actual data entry could be
John Smith, and the required association to the pattern
would be:
John   ALPHA
Smith  ALPHA
Here, both words are identified as ALPHA attributes. See the
section Intrinsic Attributes for more information. This
portion of the entry must be enclosed in single quotes:
'ALPHA ALPHA'
Keyword indicating this is a pattern:
'ALPHA ALPHA' PATTERN NAME
Keyword indicating to which line type this pattern entry applies:
'ALPHA ALPHA' PATTERN NAME
Valid line type keywords:
NAME
STREET
MISC
SECOND LINE: (Optional: both sets of elements can be on one line.)
The recode keyword followed by an = symbol
The attribute values that follow this keyword redefine the
tokens from their inbound values.
REC=GVN-NM1(1) SRNM(1)
The outbound pattern recode values
This is the combination of attribute values the Parser will use
on output for the data provided on this line. The values that
follow the recode value must be enclosed in single quotes.
Name lines require the name number following each attribute
name. Please see the section on Constructing Name
Patterns for details.
REC=GVN-NM1(1) SRNM(1)

Intrinsic Attributes
An intrinsic attribute is one that represents an individual entity
that did not have a definition entry in the table. This table lists the
main intrinsic attributes used for patterns.
For the complete list of Intrinsic Attributes, see Appendix D
in the TS Quality Reference Guide.
Only the inbound portion of the pattern entry may contain
intrinsic attributes. All outbound portions (recode line)
must contain only non-intrinsic attribute values.
INTRINSIC ATTRIBUTE  ABBR.  DESCRIPTION
ALPHA                       Letters only
HYPHEN                      A hyphen (-)
NUMERIC                     Numerals only

Controlling meanings when a sub-token is present
Assume your data contains BERGENSTRASSE 12. A definition
entry might exist in this format:
STRASSE  STREET ENDING-TOKEN ATT=STR-TYPE-S
The following pattern is required in order to separate the sub-token
from the word:
'ALPHA STR-TYPE-S NUMERIC' PATTERN STREET
REC=STR-NM STR-TYPE HSNO
Or, the following pattern is required in order to keep the sub-token
attached:
'ALPHA NUMERIC' PATTERN STREET
REC=STR-NM HSNO

Assigning a Line Type Through a Pattern
Line Type A
Apartment or house name lines can be set to line type A by the
Parser to represent an apartment line. This allows separate storage
of street components in the Parser output, such as street name,
house number, and apartment or house name information. Do this
simply by adding ATT=APT on any street pattern definition, as in:
'ALPHA COMPLEX-TYPE ALPHA-1NUMERIC' PATTERN STREET ATT=APT
REC='COMPLEX-NAME COMPLEX-TYPE APT-NUM'

If the above pattern had been entered as just a street pattern
(without using the APT attribute), then the following would have
occurred:

Original data:
HAWTHORNE COTTAGE B1F
10 MAIN STREET

Original street-only pattern:
(Z) HAWTHORNE COTTAGE B1F
(S) 10 MAIN STREET

New pattern:
(A) HAWTHORNE COTTAGE B1F
(S) 10 MAIN STREET

Where the Z line sets all data to IGNORE attributes and no
individual storage of the tokens occurs, the A line identifies the
tokens properly and parses them into the appropriate Parser output
fields:
pr_dwelling1_number
pr_complex1_name_recoded
pr_complex1_type_recoded

Constructing Patterns for Name Lines
Unlike street patterns that simply convert an inbound attribute
combination to another version on output, name patterns perform
an additional function. They often contain multiple individual names
on the same line. In some cases, only one last name may have
been given along with three first names, and it is implied that the
last name should be associated with all three first names.
One of the powerful features of the parsing engine uses parsing
customization pattern structures to understand these relationships.
Assume you have this record:
JOHN SMITH & MARY & ROBERT
There are three individuals given but only one last name. In order
to ensure that each first name receives a last name on output, a
pattern can be constructed to perform this association:
'GVN-NM1 ALPHA CTR GVN-NM1 CTR GVN-NM1' PATTERN NAME
REC=GVN-NM1(1) LAST(123) CTR(2) GVN-NM1(2) CTR(3)
GVN-NM1(3)
The numbers in the parentheses following each attribute value in
the recode line indicate the physical name to which that particular
token value is associated. For the last name attribute, the values in
parentheses indicate that this token is associated with all three
individuals.
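The association logic above can be sketched in Python. Here the recode line is modeled as a list of (attribute, name-numbers) pairs aligned with the input tokens; this representation is hypothetical, not the Parser's internal form:

```python
def distribute_names(tokens, recode):
    """tokens: the input words; recode: (attribute, name_numbers)
    pairs aligned with the tokens, where name_numbers lists the
    individuals each token belongs to. Returns one {attribute: word}
    dict per individual, in name-number order."""
    people = {}
    for word, (attribute, owners) in zip(tokens, recode):
        for n in owners:
            # LAST(123) lands the surname on individuals 1, 2 and 3
            people.setdefault(n, {})[attribute] = word
    return [people[n] for n in sorted(people)]

tokens = ["JOHN", "SMITH", "&", "MARY", "&", "ROBERT"]
recode = [("GVN-NM1", [1]), ("LAST", [1, 2, 3]), ("CTR", [2]),
          ("GVN-NM1", [2]), ("CTR", [3]), ("GVN-NM1", [3])]

for person in distribute_names(tokens, recode):
    print(person)
```

Running this yields three individuals, each carrying the shared surname SMITH alongside its own given name, which is exactly the association the pattern's parenthesized name numbers express.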


Conventions in Parsing Customization
This section lists elements the user needs to be aware of in order to
ensure that the Parsing Customization process functions properly.

Comment Lines
Comment lines are specified in entries in two different ways:
Using an asterisk (*) in column 1:
AARON NAME BEG ATT=GVN-NM1,GEN=M
* Gender is required with a GVN-NM1 attribute.
Using a double forward slash (//) on the same line as the entry:
AARON NAME BEG ATT=GVN-NM1,GEN=M // Gender is required
with a GVN-NM1 attribute.
Everything following the // will be ignored.
There must be a space after the double forward slash for
the comment to be valid.
Comments may only contain alpha-numeric characters.
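A small Python sketch of these two comment conventions (illustrative only, not the actual Parsing Customization scanner):

```python
def strip_comments(line):
    """Apply the two comment conventions described above:
    a line whose column 1 is '*' is entirely a comment, and '// '
    (double slash followed by a space) starts an inline comment.
    Returns the remaining entry text, or '' for comment-only lines."""
    if line.startswith("*"):
        return ""
    idx = line.find("// ")          # the space after // is required
    if idx != -1:
        line = line[:idx]
    return line.rstrip()

assert strip_comments("* Gender is required") == ""
assert strip_comments(
    "AARON NAME BEG ATT=GVN-NM1,GEN=M // Gender is required"
) == "AARON NAME BEG ATT=GVN-NM1,GEN=M"
```

Searching for `"// "` rather than `"//"` mirrors the rule that a space must follow the double slash for the comment to be valid.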

Line Lengths
Table entries longer than one line may span multiple lines. Each
additional line within an entry must be indented. Each new entry
must begin in column 1.
The maximum line length for entries is 189 characters, including the
newline character. The entry definition length may not exceed 100
characters. Components of an entry may not span more than one
line.

Quotation Marks
Entries enclosed by single quotation marks are processed as one
entity. If you wish to include a single quotation mark within a
SYMBOL or VALUE, use double quotation marks.
Double quotation marks (") specified within a SYMBOL or within a
VALUE are converted to single quotation marks (') by the system.
For example:
O"BRIEN NAME END ATT=SRNM
If a recode string contains more than one word, the entire string
must be entered in single quotes.
AS TRUSTEE FOR  SYNONYM='TRUSTEE FOR'
MEBAR HARBER_   GEOG DEF ATT=CITY-CHG,
                REC='MEBAR HARBOR'

How to Customize the Parser Definition Tables for Japan
For Japan, special Parser Definition tables are used by the Parser in
addition to the built-in personal and business name dictionaries.
There are two types of tables (Clue Tables and Name Tables), and
they are stored in the ..\tables\aptables\ directory. If the Customer
Data Parser cannot recognize a name component on a record, you
can create an entry in those tables.

Clue Table
The Clue table (jp_clue.txt) is used to store keywords that the
Parser uses to separate input text into tokens and to determine
business/personal classification. You can customize this table. The
following types of keywords are included in the Clue table.
Table 7.3 Tokens for jp_clue.txt

Token Type  Item                  Description
T           Business Type         Words to describe business type.
            Business Name         Parse as business name if this token is
                                  found at the beginning of the string
                                  (excluding business type).
            Business Name Suffix
D           Branch Name           It can be a branch name by itself.
            Branch Name Suffix    Usually this token is merged into the
                                  previous token and constitutes a branch
                                  name.
            Business Keyword      Words that can be part of a business
                                  name or branch name.
            Honorific             Words for honorific.
            Title (position)      Words for title.
            Region

Format
The table consists of the following 4 items. The delimiter for each
item is a comma. If the format is not correct, that line will be
ignored and the subsequent lines will not be recognized properly.

Table 7.4 Format for jp_clue.txt

Position  Item           NULL (not set)
1         Token type     Not allowed
2         Zenkaku field  Allowed
3         Hankaku field  Allowed
4         User comment   Allowed

Example:
D, , , user comment
T,( ),( )
If the user comment is null, the comma between the third
item and the fourth item can be omitted.
[Figure: a sample input string is separated into a business type keyword
(T), an unknown word, and a branch name keyword (D), which are then
written to the business type, business name, and branch name output
fields.]

In this case, one word in the input text matches a business type keyword
(T type) and another matches a branch name keyword (D type), so the
token type for each of those words was determined.
In the final output, the remaining unknown word was recognized as the
business name, and each word was written out to the proper output field.
If an unregistered keyword is found, you can add that
word to this table.
Duplicate words: When you register a new keyword, try
not to register duplicate words in different types.
Character Code: Use CP932 for registration.
Words with Spaces: The only keyword type that can include
spaces is the N type. If you register a keyword that includes
spaces, delete all spaces before and after the entry and
change all spaces within the entry to one hankaku space.
Ex. N, ,Hart Hanks
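A minimal Python sketch of reading one jp_clue.txt entry under the 4-item format above. This is illustrative only; the real program's handling of a malformed line also affects the lines that follow it, which this sketch does not model:

```python
def parse_clue_line(line):
    """Split a jp_clue.txt entry into its comma-delimited items:
    token type, zenkaku field, hankaku field, optional user comment.
    Returns None for lines without 3 or 4 items (the real program
    ignores malformed lines). Conceptual sketch only."""
    items = line.rstrip("\n").split(",")
    if len(items) == 3:
        items.append("")            # trailing comment comma omitted
    if len(items) != 4:
        return None                 # malformed: would be ignored
    token_type, zenkaku, hankaku, comment = items
    return {"type": token_type, "zenkaku": zenkaku,
            "hankaku": hankaku, "comment": comment}

print(parse_clue_line("N, ,Hart Hanks"))
# {'type': 'N', 'zenkaku': ' ', 'hankaku': 'Hart Hanks', 'comment': ''}
```

In practice the file would be read with the CP932 encoding noted above, e.g. `open("jp_clue.txt", encoding="cp932")`.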

Name Tables
Name tables contain additional personal and business names that
are not included in the personal and business name
dictionaries. They also include principal business names. You can
customize these tables.
Table 7.5  List of Name Tables

File               Description
jp_bnp_name.txt    Contains business principal name patterns and business type standard patterns (for the zenkaku field)
jp_bnp_name_h.txt  Contains business principal name patterns and business type standard patterns (for the hankaku field)
jp_pnp_name.txt    Contains last and first names that are not in the name dictionary (initial status is blank)

jp_bnp_name.txt
This table is used to register principal business names and principal
business types for the zenkaku field. It is not used to separate business
name and business type.
Table 7.6  Tokens for jp_bnp_name.txt

Token Type  Item           Description
B           Business name  Used to obtain the principal business name for the business name field and write the principal business name out in the output field.
T           Business type  Used to obtain the principal business type for the business type field and write the principal business type out in the output field.

Format
The table consists of the following 4 items. The delimiter for each
item is a comma.
Table 7.7 Format for jp_bnp_name.txt
Position  Item            Not set (NULL)
1         Token type      Not allowed
2         Business name   Not allowed
3         Principal name  Not allowed
4         User comment    Allowed

Example:
B,JR ,
T,, ,
If the user comment is null, the comma between the third
item and the fourth item can be omitted.
Input: JR

Output: Business type field, Principal business type field, Business name field (JR), Principal business name field, Branch name field
By standardizing the business data using this table, you can achieve
more accurate matching.
Duplicate words: when you register a new keyword, avoid
registering the same word under more than one token type.
Character Code: use CP932 for registration.

jp_bnp_name_h.txt
This table is used to register principal business names and principal
business types for the hankaku field. It is not used to separate business
name and business type. The usage and function of this table are the
same as for jp_bnp_name.txt, except that the field for this table is
Kana.

jp_pnp_name.txt
This table is used to register additional personal names. If you
find last names or first names that are not in the personal name
dictionary, you can add them to this table.
Table 7.8  Tokens for jp_pnp_name.txt

Item        Description
Last name   Register additional last names. The reading of a Kanji name can be registered.
First name  Register additional first names. The reading of a Kanji name can be registered.

Format
This table consists of the following 5 items. The delimiter for each
item is a comma.
Table 7.9 Format for jp_pnp_name.txt
Position  Item                                       Not set (NULL)
1         Token type                                 Not allowed
2         Last name or first name (zenkaku)          Allowed
3         Last name or first name in Kana (hankaku)  Allowed. When parsing a zenkaku first/last name, this field is used as the reading of the name.
4         Not used                                   Allowed
5         User comment                               Allowed

Example:
F, , ,,user comment
Duplicate words: when you register a new keyword, avoid
registering the same word under more than one token type.
Character Code: use CP932 for registration.
Words with Spaces: for integration purposes, small
characters must be converted to large characters when
adding hankaku kana last and first names.

Using the Parser Customization Editor


Parsing Customization is the process of creating entries for words
and phrases in the Customized Definitions Table. Those entries are
created using the Parser Customization Editor. After the new
entries are created and saved, you must re-run the Customer Data
Parser to apply the new parsing rules.

View a Standard Definitions Table

For detailed information on the Parser Customization Editor, see the Online Help.

Before making entries to the Customized Definitions Table, take a
look at the Standard Definitions Table to see how it is constructed.
Standard Definitions Tables vary from country to country.
To view a standard definitions table
1.

Open the Customer Data Parser step and click


Customization Editor. The Parsing Customization
Editor opens.

Customization
Editor button

Figure 7.1 Opening Customization Editor


2.

Select File, Open Standard Definitions.



3.

Locate the Standard Definitions Table in the Open dialogue


box: for example, c:\TrilliumSoftware\tsq10r5s\tables
\parser_rules\USCDPRUL.win.

4.

Click Open. The Standard Definitions table for the selected


country appears.

Figure 7.2 Standard Definitions Table (US)


5.

From the Main Menu select Search, Find Entry to review the
entries in this file.

6.

Select File, Exit to leave the Customization Editor.

View and Correct City Problems


City problems are reported to the exceptions file any time a US city/
state combination cannot be verified. The usual cause is a
misspelled city name.
To view and correct city problems
1.

Open the Customer Data Parser step and click


Customization Editor. The Parsing Customization
Editor opens. When the Customization Editor opens, the
country-specific Customized Definitions file and the country
specific Word/Pattern Problems file will also open.

2.

The left window of the Customization Editor contains a


Navigation area which allows the user to move from
Customized Definitions to specific Word/Pattern Problems.
Click Customized Definitions in the Navigation area. The
screen will show the current customized definitions file,
which is empty by default.

3.

Click below the line of asterisks. This will position your cursor
to enter customized definitions.
Be sure to position the cursor below the line of
asterisks before applying an entry.


Figure 7.3 Parsing Customization Editor


4.

Click US City Problems in the Navigation area. The screen


will display city problems found in the US data.

Figure 7.4 US City Problems


The Frequency column lists the number of times this city
problem occurred, followed by the percentage of total

occurrences this entry represents. Zip, State, and City data
is listed as it appears on the input record.
In this example, the cities FAIRBANK and BAR HARBER
are misspelled. The right-side window displays the record
number(s) for the selection.
5.

Right-click FAIRBANK to start the new entry process. The


cursor appears in the Input Correct City Name box.

Cursor position

Figure 7.5 New Entry Box


6.

Enter the correct spelling of this city as FAIRBANKS. Click


OK. The two letter state abbreviation followed by the
corrected city name will appear after the RECODE = in the
New Entry box.

Figure 7.6 Input Correct City Name




7.

Click Apply. The entry will be added to the Customized


Definitions file wherever the cursor is positioned in the
Customized Definitions file.

8.

In the Navigation area, click Customized Definitions to


view the new entry. The entry would look like this:

Figure 7.7 US City Entry in the Customized


Definitions File

If you accidentally hit the Apply button or if an entry is incorrect, you can modify or delete entries directly in the Customized Definitions file.

9.

In the Navigation area, click US City Problems. Repeat the


correction steps for the city BAR HARBER.

10.

Click Apply. In the Navigation area, click on Customized


Definitions to view the new entry.

Figure 7.8 Multiple US City Entries in the


Customized Definitions File
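Conceptually, each city RECODE entry behaves like a lookup from a state and misspelled-city pair to the corrected name. A sketch of that idea follows; the state codes are hypothetical, since the manual does not show them, and the dictionary stands in for the actual Customized Definitions file format:

```python
# Hypothetical recode entries: (state, misspelled city) -> corrected city.
city_recodes = {
    ("AK", "FAIRBANK"): "FAIRBANKS",
    ("ME", "BAR HARBER"): "BAR HARBOR",
}

def recode_city(state, city, recodes):
    """Return the corrected city name if a RECODE entry exists for
    this state/city pair; otherwise leave the city unchanged.
    Sketch only -- the real lookup is performed by the Customer Data
    Parser using the Customized Definitions file.
    """
    return recodes.get((state, city), city)
```

For example, `recode_city("AK", "FAIRBANK", city_recodes)` yields `"FAIRBANKS"`, while an already-valid city passes through unchanged.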
View and Correct Pattern Problems


Bad name patterns occur when data that the CDP cannot identify
appears on a name line. Any pattern of data that cannot be
completely identified is written to the exceptions file for review.
To view and correct pattern problems
1.

In the Navigation area, click Bad Name Patterns.

2.

Click on the first Bad Name Pattern. The data appearing in


the lower portion of the screen corresponds to the bad
pattern selected. If the Frequency for the pattern is 2 then
the data for the two corresponding records is displayed in the
Pattern Examples window.

Unknown attribute

Figure 7.9 Bad Name Patterns


3.

To correct the Bad Name Pattern you must change any


unknown attributes to a known attribute. Unknown attributes
are displayed in red.

4.

To change the unknown attribute, right-click on the attribute


name (ALPHA). A pop-up list of possible attributes will
appear.


5.

If there are elements of data that you do not wish to maintain, assign an IGNORE attribute to the piece of data.

Double-click on the desired attribute (for example,


SURNAME) in the list. The ALPHA attribute will be replaced
with SURNAME and will appear in italics and blue.

Corrected attribute

Figure 7.10 Corrected Name Pattern


6.

Click the Confirm button


to verify the entry before it is
placed into the Customized Definitions file.


7.

Click Apply to add this pattern to the Customized Definitions.

Confirm button

Figure 7.11 Name Pattern New Entry


If there are multiple names on one name line, the number in parentheses will determine which data element goes with which person.

8.

In the Navigation area, click on Customized Definitions to


view the new entries.

Figure 7.12 Complete New Entries in the Customized


Definitions File


Save the Entries


After the corrections have been completed, save the entries in the
Customized Definitions file. These entries will be merged with the
Standard Definitions entries before the Parser step is run.
To save the entries
1.

Select File, Save from the Main Menu.

Re-Run Customer Data Parser


After the new entries are created and saved, you must re-run the
Customer Data Parser to apply the new parsing rules.
To apply new parsing rules
1.

Run the Customer Data Parser step. When asked "Would you
like to run parsing customization prior to running the
step?", select Yes.

2.

The Customized Definitions will be merged with the Standard


Definitions and the Parser will run using the complete set of
parsing rules.

3.

When the Parser step has run, click on the Customization


Editor button. Navigate to US City Problems and then to
Bad Name Patterns. Notice that the exceptions are no
longer displayed. The entries in the Customized Definitions
file have instructed the Customer Data Parser on how to
handle these situations.

4.

Close the Customization Editor.

View Errors in Parsing Customization


When the Customer Data Parser has run, any errors in the Parsing
Customization process will be identified with the following message:

Figure 7.13 Parsing Customization Error Message


If you get this message, view and correct the errors using the
following steps:
To view errors in Parsing Customization
1.

Open Customization Editor and select File, Open Error


log.

2.

The log displays the error message and indicates the line
number, as well as the entry where the error occurred. A
sample error log is shown below:

Figure 7.14 Parsing Customization Sample Error Log




3.

To correct errors, edit the entry in the Customized Definitions


file. The error in Figure 7.14 indicates that the entry was
duplicated in the Customized Definitions file. One entry
should be deleted from this file.

4.

Save the Customized Definitions file and re-run the Customer


Data Parser.


CHAPTER 8

Analyzing Single Data

Sometimes users need to test and analyze the results of cleansing,
standardization and linking on a single data record. TS Quality
Analyzer allows the user to parse, geocode, and match name and
address data interactively. It is a useful way to test and view
modifications you make to the parsing rules.
In this chapter, you will perform these tasks:
Start the TS Quality Analyzer
Input name and address data
View the cleansed results
Show details for name/address parsing and standardization
Show details for address validation
Match data against your database
Review results of matching
The TS Quality Analyzer is not available for Asia-Pacific
countries.

Using the TS Quality Analyzer


The TS Quality Analyzer processes a single data record for a
specific country. The Analyzer processes each country's data using
the appropriate parsing and geocoding tables. If you have changed
the Customer Data Parser's parsing rules, using the TS Quality
Analyzer is a particularly effective way to test the new rules.
Use the TS Quality Analyzer for several functions:
Review details for name/address parsing and standardization
Review details for address validation
View Customer Data Parser and Postal Matcher details for
name/address record data
View Customer Data Parser Review Group descriptions
View Postal Matcher Return Code descriptions
Add the record to a database file for interactive reference
matching
Match a transaction record to records in a database file
See Linking Single Record Using the TS Quality Analyzer
on page 14-14 for reference matching processing.

Start the TS Quality Analyzer


To start the TS Quality Analyzer
1.

From the Tools palette, select TS Quality Analyzer.

2.

In the Select a Country window, select the country you wish


to work with. Click OK.



3.

The TS Quality Analyzer application opens.

Main Menu
Tool Bar

Figure 8.1 TS Quality Analyzer


There are two tabs in the TS Quality Analyzer:
Standardization - The Standardization tab is used for
cleansing, parsing and postal matching processes. Customer
Data Parser and Postal Matcher will be automatically run for the
selected record and standardization results will be displayed.
Matching - The Matching tab is used for the matching process.
Relationship Linker will be automatically run for the selected
record and matching results will be displayed.
You must first run the Standardization and then run the
Matching. The Matching process takes as input the
cleansed data generated by the Standardization process.

Data Entry and Cleansing


To enter a new record and cleanse the data
1.

Select the Standardization tab.


2.

Select File from the main menu to choose the input and
output mode for the record. For input, select Input Mode,
then either Input Fields or Free Form Input. For output,
select Output Mode, then either Output Fields or Free Form Input.

3.

If you select Input Fields mode, enter the record line by line.
If you select Free Form mode, you can enter the record in
free text format.

4.

Enter the new record data in the Input frame.

To clear the Input frame, click Clear or select Input from the Reset menu.

Input Fields mode

Free Form Input Mode

Figure 8.2 Input Mode


5.

Click Cleanse to parse and geocode the data.




You can also click the Cleanse button in the tool bar.

6.

The cleansed data will appear in the Cleansed window on
the Standardization tab.

Figure 8.3 Cleansed Data


7.

Look at Customer Parser Message and Postal Matcher


Message under the Cleansed window. These messages
indicate whether the data entered is valid or not.
If the data is valid, you will see the following messages:

If the data is invalid, you will see messages like this:

8.

Click Show Details to see the parsing, standardization, and


validation details. The results of the Customer Data Parser


are shown in the lower left window and the results of the
Postal Matcher are shown in the lower right window.

Figure 8.4 Parsing, Standardization, and Validation


Details

Advanced Details
In addition to the parsing, standardization, and validation details,
you can review advanced details of the Customer Data Parser and
Postal Matcher results.
To review advanced details of data
1.

Click Advanced Detail from the main menu. Refer to the


table below and select the desired information:

Select...                                        To...
Customer Data Parser and Postal Matcher Details  Review the PREPOS information returned from the Customer Data Parser and Postal Matcher
Customer Data Parser Review Group Descriptions   Look up the description of the Customer Data Parser Review Group returned
Postal Matcher Return Code Descriptions          Look up the description of the Postal Matcher Return Code
DPV Return Code Descriptions                     Look up the description of the DPV Return Code


Matching
Once the Cleansing step has run, you can match the record against
records in your database.
To match data against database
1.

Select the Matching tab. Notice the window key for the
cleansed record is shown.

Window Key

Figure 8.5 Matching Tab


2.

Click the Plus sign (+) to show the Master Database. The
records in the database are shown in the lower window.

Figure 8.6 Records in Master Database


3.

Click either Match Individual or Match Household to set


the level of matching.

4.

Click Match. You can also click the Match button in the tool bar.

5.
The match results are displayed in the Window Key


Matched Records from the Master Database and
Relationship Linker Matched Records on the right side of
the window.
Window Key Matched Records from the Master
Database shows all records in the database with the same
window key as the input record.

Figure 8.7 Window Key Matched Records


Relationship Linker Matched Records shows all matched
records from the Window Key Matched records.

Figure 8.8 Relationship Linker Matched Records
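The two result windows reflect a two-stage match: window-key candidates are gathered first, then the linking rules are applied only to those candidates. The flow can be sketched as below; the toy surname rule is hypothetical and stands in for the real Relationship Linker rules:

```python
def window_key_candidates(database, window_key):
    """Stage 1: every database record that shares the input
    record's window key."""
    return [r for r in database if r["window_key"] == window_key]

def linked_matches(candidates, rule):
    """Stage 2: apply the linking rules (here, a toy rule) to the
    window-key candidates only."""
    return [r for r in candidates if rule(r)]

# Hypothetical master database records.
master = [
    {"window_key": "K1", "surname": "SMITH"},
    {"window_key": "K1", "surname": "JONES"},
    {"window_key": "K2", "surname": "SMITH"},
]
candidates = window_key_candidates(master, "K1")
matches = linked_matches(candidates, lambda r: r["surname"] == "SMITH")
```

Restricting stage 2 to the window-key candidates is what keeps single-record matching fast: the linking rules never see records outside the input record's window.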


To view and edit linking rules
If you want to see and edit the field and/or pattern list files for
this matching, launch the Relationship Linker Rule Editor within
the TS Quality Analyzer.

1.

Select Match Rules from the Tuning menu.

2.

Select either Consumer or Business, and then select either


Level1 or Level2.

3.

The Relationship Linker Rule Editor opens with the field and/
or pattern files for this matching process.

4.

Review, edit and save the field and/or pattern files.

5.

To match the data again using the updated field and/or


pattern files, go back to the Standardization tab and cleanse
the data again.

6.

Go to the Matching tab and re-run matching by clicking


Match.

Organize Database
To add data to database
At this point, if you decide to keep the input record in the
database, you can add the record.
1.

Click Add to DB. The cleansed and matched input record is


added to your database.

To remove data from database


1.

In the master database, highlight the data you want to


remove.

2.

Click

to remove that data.

To reset database
1.

Select Master Database from the Reset menu.

2.

At the confirmation message, select Yes.


CHAPTER 9

Enriching Your Data




Once the name and address data is parsed, the address data must
be verified and enriched by the Postal Matchers. With the Postal
Matchers, data is matched to directories and appropriate
geographic fields are populated with postal geocoding data. The
Postal Matchers help you locate customers, verify address data, and
improve that data. All Postal Matchers rely on output from the
parsing process to provide addresses for linking purposes.
In this chapter, you will perform these tasks:
Sort the output file from the Customer Data Parser
Specify input, output, and the postal tables for the Postal
Matcher
Run the Postal Matcher and view results
Identify the match level code for a record
View the record and analyze the match to the Postal
Directory
Browse the postal directories for each country
We strongly recommend that the output file from the CDP
be sorted by geographic fields so that the records will be in
geographic order to permit the Postal Matchers to work
most efficiently.

Sorting for the Postal Matcher


The Postal Matchers use output from the parsing process as inputs.
To obtain optimum performance, the input files to the Postal
Matchers must first be sorted in geographic order, using the Sort
Utility. The output file will have the extension .srt to indicate that
the data have been sorted.

Input and Output Settings


The Sort Utility uses the output from the Customer Data Parser step
as input.


To specify input and output files

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

1.

Open the Sorting Utility step and click the Input Settings
tab.

2.

Enter file names in the Input File Name and Input DDL
Name text boxes.

3.

Click Add. The file name is dynamically added to the table in


the Input Data File Name and Input DDL Name columns.
OR
Click Replace. The default file names in the Input Data File
Name and Input DDL Name column are replaced with the
files you just specified.

4.

Select the Output Settings tab.

5.

Enter the Output File Name and Output DDL Name file
names. The Output File Name must have the extension .srt
to indicate this is a sorted file.

6.

Enter file names in the Statistics File Name and Process


Log Name text boxes.

To specify the output file qualifier


A File Qualifier is a unique name given to a data file. For
the Sort Utility, the output data file must have its own unique file
qualifier.
1.

Click Advanced and navigate to Output, Settings.

2.

Select Output Data File Qualifier (default is OUTPUT).

You may also specify the following settings:


To specify the starting record
1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Start at Record. This value


determines the record in the input data file at which the Sort
Utility will begin processing (default is 1).



To specify the maximum number of records to process
1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process a Maximum of. This


value specifies the maximum number of records to process.
By default, all records will be processed.

To process every nth record only

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process Nth Sample. This value


specifies that only every Nth record will be processed. By
default, all records will be processed.
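Taken together, Start at Record, Process a Maximum of, and Process Nth Sample behave roughly as sketched below. The exact way the utility combines the three settings is an assumption here; the sketch assumes the values are 1-based, as described above:

```python
def select_records(records, start_at=1, maximum=None, nth=1):
    """Sketch of record selection: skip records before position
    `start_at`, keep every `nth` record from that point on, and stop
    after `maximum` records have been kept (None = no limit).
    Sketch only -- not the Sort Utility's actual implementation.
    """
    out = []
    for position, record in enumerate(records, start=1):
        if position < start_at:
            continue
        if (position - start_at) % nth != 0:  # every Nth from the start point
            continue
        out.append(record)
        if maximum is not None and len(out) >= maximum:
            break
    return out
```

For example, with `start_at=3`, `nth=2`, and `maximum=3` over records 1 to 10, the sketch keeps records 3, 5, and 7.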

To use a delimited file


If you are using a delimited file for input and/or output, you
must specify delimited settings.
1.

Click Advanced and navigate to Input, Settings.

2.

Select Input Data File Delimiter Encoding and Input


Data File Delimiter from the drop-down list.

3.

For output, click Advanced and navigate to Output,


Settings.

4.

Select Output Data File Delimiter Encoding and Output


Data File Delimiter from the drop-down list.
See Encoding (Code Page) on page A-3 for more
information on encoding.

You can specify records to either Select or Bypass under


certain conditions in both input and output files. See
Select or Bypass Records on page 5-37 for instructions
on how to specify select/bypass definitions.

Process Settings
Once you have identified the input and output files, you are ready
to specify the settings used to process your data. The settings for
processing are managed in the Advanced Settings window.

Sort Fields
To specify sort fields

A red flag indicates a REQUIRED field for this operation.

1.

Click Advanced and navigate to Process, Settings.

2.

Click the Entry Settings tab.

3.

Select the input DDL fields from the drop-down list in the
Key box. These are the fields used in the sort process.
Sort fields are pre-determined according to the
country-specific step. You can change the default
fields by selecting different sort fields.

4.

Select the sort order from the drop-down list in the Order
box. Values are either Ascending Order or Descending
Order.

Geographic fields used in the sort process

Figure 9.1 Sort Entry Settings


To specify collating sequence
You can specify the collating sequence for the sort order. This is
optional.

1.

Click Advanced and navigate to Process, Settings.

2.

Click the Entry Settings tab.

3.

In the Collating Sequence box, specify the collating


sequence. Values are ASCII, EBCDIC, FOLDED_ASCII,
FOLDED_EBCDIC, or MULTI_NATIONAL. If omitted, the
default collating sequence defined by the operating system is
used.
For detailed information on the collating sequence, see
the Sort Utility's Online Help.
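The difference between a byte-order sequence such as ASCII and a case-folded sequence such as FOLDED_ASCII can be illustrated with Python's built-in sort. This is an analogy only, not the Sort Utility's implementation:

```python
names = ["delta", "Alpha", "Charlie", "bravo"]

# ASCII-style: code-point order, so all uppercase letters sort
# before all lowercase letters.
ascii_order = sorted(names)

# FOLDED_*-style: case is ignored when comparing keys.
folded_order = sorted(names, key=str.lower)
```

Here `ascii_order` interleaves nothing (`Alpha`, `Charlie`, `bravo`, `delta`), while `folded_order` orders the names as if case did not exist (`Alpha`, `bravo`, `Charlie`, `delta`).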

Additional Settings
You can specify the following additional settings.
See Sort in the TS Quality Reference Guide for the complete settings information.

To retain the order of same-key records


If you want output data to retain the order of same-key records,
use Stable Sort.
1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

Select Stable Sort.
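What Stable Sort guarantees can be illustrated with Python's `sorted()`, which is itself stable: records with equal keys keep their original input order. The sample records are hypothetical:

```python
# Hypothetical records: (ZIP key, surname), with two equal keys.
records = [("02134", "ADAMS"), ("10001", "BAKER"), ("02134", "CLARK")]

# A stable sort keeps ADAMS before CLARK within the "02134" key group,
# because that was their order in the input.
stable_by_zip = sorted(records, key=lambda r: r[0])
```

Without the stability guarantee, same-key records could come out in any relative order.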

To specify how equal-keyed records are handled


You can specify how duplicate records are handled when there
are duplicate keys.
1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

In the Duplicates box, select an option from the drop-down


list. Values are:
KEEP_ALL - Keeps all the records.
KEEP_ONE - Keeps one record. It does not guarantee
that a particular record within the duplicate set will be
retained.
KEEP_NONE - Keeps none of the records.


JUST_DUPS - Keeps just the duplicates.
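The four Duplicates options behave roughly as sketched below on an already-sorted input. Two assumptions are made here: KEEP_NONE is read as keeping only unique-keyed records, and KEEP_ONE keeps the first record of each set (the manual explicitly does not guarantee which one the utility retains):

```python
from collections import Counter

def handle_duplicates(records, key, mode):
    """Behavioral sketch of the Sort Utility's Duplicates options.
    `key` extracts the sort key; `mode` is KEEP_ALL, KEEP_ONE,
    KEEP_NONE, or JUST_DUPS. Inferred semantics, not product code.
    """
    counts = Counter(key(r) for r in records)
    if mode == "KEEP_ALL":
        return list(records)
    if mode == "KEEP_ONE":
        seen, out = set(), []
        for r in records:
            if key(r) not in seen:   # first of each set; the real
                seen.add(key(r))     # utility may keep a different one
                out.append(r)
        return out
    if mode == "KEEP_NONE":
        # Assumed: drop every record whose key appears more than once.
        return [r for r in records if counts[key(r)] == 1]
    if mode == "JUST_DUPS":
        return [r for r in records if counts[key(r)] > 1]
    raise ValueError(mode)

rows = [("02134", "A"), ("02134", "B"), ("10001", "C")]
```

On `rows`, KEEP_ONE yields one record per key, KEEP_NONE leaves only the unduplicated `10001` record, and JUST_DUPS leaves only the two `02134` records.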


To enable debug function
1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

Select Enable Debug Output.

4.

In the Debug File text box, accept the default path and file
name, or enter a new file name to receive debugging
information.

To count number of records processed


1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

In the Sample Count text box, enter the number that


indicates the increment sample of records to read and
attempt to process from an input data file.
This count will be written to the Process Log file. To
display the Log file, select the Results tab and
navigate to the Process Log tab after the program is
run. The default is always 1.

To specify settings file encoding


1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

In Settings File Encoding, select the appropriate encoding


from the drop-down list.
See Encoding (Code Page) on page A-3 for more
information on encoding.


Run the Sorting Utility and Check Results


To run the Sort Utility and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1.

Click OK to close the Advanced Settings.

2.

Click Run to run the Sorting Utility.


You can also right-click a step and select Run
Selected.

3.

Click OK.

4.

In the Results window, you will see the Statistics sub-tab.


The Sort Key summary is shown on this sub-tab.
TS Quality offers a number of utilities to perform
specific tasks. See Chapter 16, Utilities, for a review
of these tools.


Using the Postal Matchers


Postal Matchers match your data to the country-specific TS
Quality Postal Directories and return address details and
database matches.
Postal Matchers perform these functions:
Verify and assign postal codes to name and address data
Assign delivery point identifier (DPID)
Standardize and correct address components
Provide linked data in a presentation form that meets the
country addressing standards
The TS Quality Postal Directories are included in the
package. Country-specific directories were installed
during the TS Quality installation process. You can
browse the postal directories using the Postal
Directory Browser. See Browsing the Postal
Directory on page 9-20.

Input and Output Settings


The Postal Matcher uses the output from the Sort Utility step as
input to this step.
To specify input and output files
1.

Open the Postal Matcher step and select the Input


Settings tab. Specify the Input File Name and Input DDL
Name.

2.

If you are using the Census tables and/or DPV tables, select
the Include Census Tables or Include DPV Tables box.

3.

Select the Output Settings tab. Specify the Output File


Name and Output DDL Name.



4.

Enter file names in the Statistics File Name and Process


Log Name text boxes.

To specify the input/output file qualifiers


A File Qualifier is a unique name given to a data file. Each
input and output data file must have its own unique file qualifier.
1.

Click Advanced and navigate to Input, Settings.

2.

Select Input Data File Qualifier (default is INPUT).

3.

Click Advanced and navigate to Output, Settings.

4.

Select Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:


To specify the starting record
1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Start at Record. This identifies the


record in the input data file at which the Postal Matcher will
begin processing (default is 1).

To specify the maximum number of records to process


1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process a Maximum of. This


specifies the maximum number of records to process. By
default, all records will be processed.

To process every nth record only


1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process Nth Sample. This


specifies that only every Nth record will be processed. By
default, all records will be processed.

To use a delimited file


If you are using a delimited file for input and/or output, you
must specify delimited settings.


Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.


1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions on select and bypass definitions.

Process Settings
Once you have identified input and output files, you are ready to
specify settings to process your data. The settings for processing
are managed in the Advanced Settings window.

Postal Directories
The country-specific postal directories are included in TS Quality
and were installed when you installed the software. These
directories must be accessible to all projects.
See Installing TS Quality for a complete list of postal
directories for all countries and the locations of those
tables.
To specify postal directories

1. Click Advanced in the Postal Matcher step and navigate to Process, Settings.

The Process Settings window varies from country to country. See the TS Quality Reference Guide for a complete list of settings for each country.

2. If you are using the US Postal Matcher, the settings are displayed in Figure 9.2:

A red flag indicates a REQUIRED field for this operation.

Figure 9.2 Postal Matcher Settings (US)


3. Refer to the table below to define each setting.

Setting                      Description
Postal Base Data File        The file that contains street detail information: for example, USBASE.tbl.
Postal Level1 Data File      The file that contains level 1 street name information: for example, USINDEX1.tbl.
Postal Level2 Data File      The file that contains level 2 city information: for example, USINDEX2.tbl.
Postal Form File             The file that contains the postal certification report. Required for the USPS form.
Postal Form Database Date    Format of the date to display on the report: for example, 'MMM YYYY'.
Postal Form List             Name of the list to be matched against the US tables: for example, 'DATA FILE'.
Postal Form Customer         Client name to display on the report: for example, 'CUSTOMER NAME'.
Postal Form Job Number       The job number to print on the form: for example, 99999.

4. If you have checked the Include Census Tables and/or Include DPV Tables box on the Input Settings tab, the Census Settings and/or DPV Settings window will be enabled under Process. In this case, you must select your census/DPV tables in each window.

Additional Settings
You can also specify the following additional settings.
See Postal Matchers in the TS Quality Reference Guide for the complete settings information.

To enable the debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter a different file name to receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Process, Settings.

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.
To specify the settings file encoding

1. Click the Advanced button and navigate to Process, Settings.

2. In Settings File Encoding, select the correct encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information.

Run the Postal Matcher and View Results


To run the Postal Matcher and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close Advanced Settings.

2. Click Run to run the Postal Matcher.

3. Select OK.

4. On the Results tab, the Statistics subtab appears. Record Matches, Processing, Changes, and Failures are shown on this tab, as seen in Figure 9.3.

Figure 9.3 Postal Matcher Statistics


After running the Postal Matchers, Match Level Codes are generated to identify specific conditions that occur for each record being processed. You should review these codes to analyze the Postal Matcher results.

Match Levels
The Match Level Codes indicate the accuracy of the match between the country geography data and the appropriate postal table. The match level codes are written to the output record in the xx_gout_match_level field.

In actual use, the xx in the description above is replaced with a two-letter country code (for example, US = United States, CA = Canada, GB = Great Britain, and DE = Germany). Thus, xx_gout_match_level becomes US_gout_match_level for United States data.

Figure 9.4 Match Level Codes

There are several Match Level Codes. Some common codes include:

A 0 in the US_GOUT_MATCH_LEVEL field indicates that the input data successfully matched to the directory.

A Y in the US_GOUT_STREET_NAME_CHANGE field indicates that the street name was changed; for example, a misspelled street name was corrected, or the abbreviated street name was expanded to the full street name.

See the TS Quality Reference Guide for a complete list of Match Level Codes for the Postal Matchers.
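As a sketch of how downstream code might inspect these output fields, consider the following Python fragment. The record layout (a dict of output fields) is hypothetical; only the two field names and their meanings come from the text above:

```python
# Hypothetical Postal Matcher output record, represented as a dict of fields.
record = {
    "US_GOUT_MATCH_LEVEL": "0",          # 0 = matched to the directory
    "US_GOUT_STREET_NAME_CHANGE": "Y",   # Y = street name was changed
}

matched = record["US_GOUT_MATCH_LEVEL"] == "0"
street_changed = record["US_GOUT_STREET_NAME_CHANGE"] == "Y"
```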


Dual Address Information


Dual Address On the Same Line
In accordance with CASS requirements, if there are two addresses on the same line (referred to as a dual address), the US Postal Matcher may require both addresses for lookup. Therefore, the Customer Data Parser (CDP) needs to pass both addresses to the US Postal Matcher.

Dual address information is passed to the US Postal Matcher from the CDP using the us_gin and us_gout areas. The following rules describe how a dual address is handled:
1. Of the two addresses in a dual address, if one address is general delivery, then us_gin_street_name will contain the other address, and a G is set in the first position of us_gout_secondary_type.

2. If one of the addresses is a post office box (PO box), then us_gin_street_name will contain the other address, and a P is set in the first position of us_gout_secondary_type. The PO box number is also stored, starting at the second position of us_gout_secondary_type.

3. If the dual address contains both a general delivery address and a PO box number, then PO BOX is stored in us_gin_street_name, and a G is set in the first position of us_gout_secondary_type.

4. If the dual address contains both a street name and a rural route, then the street name is stored in us_gin_street_name, and an R is stored in the first position of us_gout_secondary_type. In addition, the route number is stored starting at the second position of us_gout_secondary_type, and the box number is stored starting at the second position of us_gout_secondary_number.

Currently, the Customer Data Parser handles the following dual address cases:

1. street name / general delivery
2. general delivery / street name
3. street name / PO box
4. PO box / street name
5. general delivery / PO box
6. PO box / general delivery
7. rural route / general delivery
8. general delivery / rural route
9. rural route / PO box
10. PO box / rural route
11. street name / rural route
12. rural route / street name
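The routing logic of rules 1-4 above can be sketched in Python. This is an illustration of the decision order only, not the CDP implementation; the address-kind labels ('street', 'general', 'pobox', 'rural') and the function itself are hypothetical:

```python
def route_dual_address(addr1, addr2, kind1, kind2):
    """Return (value for us_gin_street_name, flag for the first position of
    us_gout_secondary_type) per rules 1-4 above. Illustrative sketch only;
    kind1/kind2 are hypothetical classification labels."""
    kinds = {kind1: addr1, kind2: addr2}
    if "general" in kinds and "pobox" in kinds:
        return kinds["pobox"], "G"   # rule 3: the PO box address is kept, flag G
    if "general" in kinds:
        # rule 1: the non-general-delivery address is kept, flag G
        other = addr1 if kind2 == "general" else addr2
        return other, "G"
    if "pobox" in kinds:
        # rule 2: the non-PO-box address is kept, flag P
        other = addr1 if kind1 != "pobox" else addr2
        return other, "P"
    if "street" in kinds and "rural" in kinds:
        return kinds["street"], "R"  # rule 4: the street name is kept, flag R
    return addr1, " "                # not a recognized dual address case
```

For example, for case 3 (street name / PO box), the street name lands in us_gin_street_name with a P flag. Storing the PO box and route numbers (rules 2 and 4) is omitted from this sketch.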

Dual Address Information Handling


The following table shows where dual address information is stored for the above cases. The Dual Addr Flag is the value set in the first position of us_gout_secondary_type; the last two columns show what is stored starting at the second position of us_gout_secondary_type and us_gout_secondary_number, respectively.

Table 9.1 Dual Address Information Handling

Case  Dual Address                     us_gin_street_name   Flag   us_gout_secondary_type[1]   us_gout_secondary_number[1]
 1    street name / general delivery   street name          G      -                           -
 2    general delivery / street name   street name          G      -                           -
 3    street name / PO box             street name          P      PO box number               -
 4    PO box / street name             street name          P      PO box number               -
 5    general delivery / PO box        PO box               G      -                           -
 6    PO box / general delivery        PO box               G      -                           -
 7    rural route / general delivery   rural route          G      -                           -
 8    general delivery / rural route   rural route          G      -                           -
 9    rural route / PO box             rural route          P      PO box number               -
10    PO box / rural route             rural route          P      PO box number               -
11    street name / rural route        street name          R      route number                PO box number
12    rural route / street name        street name          R      route number                PO box number

The maximum length of the PO box number in cases 3, 4, 9, and 10 is 9; it extends into us_gout_secondary_number. The maximum length of the PO box number in cases 11 and 12 is 6.

Dual Address On Different Lines

When a dual address occurs on different lines, the address closest to the geography line is passed to the US Postal Matcher.

Changes to the PREPOS

If an address contains both a PO box and a rural route with a PO box number, there is no room to store the second PO box number. Therefore, the literal PO BOX is stored in pr_dwelling3_name_recoded and the PO box number is stored in pr_dwelling3_number.


Browsing the Postal Directory


You can browse the postal directories using the Postal Directory
Browser. The Postal Directory Browser contains separate
interactive browsers to view the postal directories for all countries
included in the package. There are three levels for browsing: City
Level, Street Level, and Street Detail.
The Postal Directory Browser is not available for Asia-Pacific (APAC) countries.

City Level Directory

To browse a city level directory

For detailed information on the Postal Directory Browser, see the Online Help.

1. Select Postal Directory Browser on the Tools Palette. The Configuration Dialog box appears.

Figure 9.5 Postal Directory Browser Configuration Dialog

2. From the drop-down menu, select the country whose postal directory you want to browse.

3. Select the directory containing your pdb_settings directory.

4. Click OK. The City Level window for the selected country opens. This window lists cities, ZIP Codes, and finance codes.

Figure 9.6 City Level Directory (US)

5. To search for a particular city, use one of the search boxes in the upper part of the window. For the US, the search boxes are CITY, STATE, ZIPCODE, FINANCE CODE, and US Census Search.

6. As you enter data in a search box, the program searches for your entry. You need only enter information in one of the search boxes for the program to determine the others.

7. To clear the search boxes, click Clear.

Street Level Directory

To browse a street level directory

1. Once you have selected a city, double-click the entry or click Run to bring up the Street Level window. For the US, the Street Level window contains all the street names for the selected city.

Figure 9.7 Street Level Directory (US)

2. To search for a certain street, use the search box. As you enter information in the search box, the program searches for the appropriate entry.

3. To clear the search box, click Clear.

Street Details

To browse the street details

1. Once you have selected a street name, double-click the entry or click Run to bring up the Street Level Details window.

2. The Street Level Details window displays street details under the fields for the selected street.

3. These fields vary from country to country. For example, the US fields would look like this:

Figure 9.8 Street Name (US)

For detailed information on the country fields, see the Postal Directory Browser Online Help.

4. The Postal Directory Browser displays the Street Detail. Scroll to view all data presented by the Postal Directory Browser.

Figure 9.9 Street Details (US)


CHAPTER 10

Linking Your Data

This chapter explains how to link your data. Linking is the process of
identifying records with a matching relationship (consumer/
business) in a file or duplicates in several files. Linking compares
records to determine the level of similarity between them.
The result of the comparisons is categorized as either a passed,
suspect, or failed match, based on the similarity of data elements in
the records, as well as the assigned score of their exceptions.
Data linking involves three steps:
Create window keys using the Window Key Generator
Sort records by the window key using the Sort Utility
Match records using the Relationship Linker


Using the Window Key Generator


The Window Key Generator lets you create window keys that
are used to match records in the Relationship Linker. The
Relationship Linker tries to match records in the same window
key set so that it does not need to compare every record in the
database to every other record.
A window key is constructed from elements of input fields, such as
the first character of a business name and the first five characters
from a postal code field. To generate a window key, you must first
create a Window Key Rule that defines which part of each
element to include in the key. You can use one or more keys to filter
selected records for comparison.

Example
Input Records:
CENTER HOSPITAL
25 BRATTLE LN
ARLINGTON MA 02476

CHEMIST ASSOCIATES
12 BRANTWOOD RD
ARLINGTON MA 02476

Window keys are generated from one of the window key rules provided by the Window Key Generator. For example, Key_List_10 is set to generate the window key as follows:

Key_List_10 rule:
Use the first three characters of the postal code.
Append to this the first character of the business name.
Append to this the first character and subsequent consonants of the street name.
Append to this a 1 if this is a personal name and a 2 if this is a business name.


Window key that is generated:
024CBR2

024CBR2

The same window key is generated for both records, bringing them
into the same match window for comparison purposes. Subsequent
matching rules will indicate that these records are not matches.
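The Key_List_10 rule above can be sketched in Python. This is only an illustration of the idea, not the Window Key Generator's implementation; in particular, the two-character limit on the street-name portion is our assumption, chosen so the sketch reproduces the 024CBR2 keys shown above:

```python
def window_key(postal, name, street, is_business, street_len=2):
    """Build a Key_List_10-style window key (illustrative sketch)."""
    # First character of the street name plus its subsequent consonants,
    # truncated to street_len characters (assumed limit).
    consonants = [c for c in street[1:].upper() if c.isalpha() and c not in "AEIOU"]
    street_part = (street[0].upper() + "".join(consonants))[:street_len]
    # First 3 of postal code + first char of name + street part + type digit.
    return postal[:3] + name[0].upper() + street_part + ("2" if is_business else "1")
```

Both sample records yield the same key, so they fall into the same match window even though their names differ.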

Input and Output Settings

The Window Key Generator uses the Postal Matcher output as input.

To specify input and output files

1. Open the Window Key Generator step and click the Input Settings tab.

2. Enter the Input File Name and Input DDL Name.

3. Click the Output Settings tab and enter the Output File Name and Output DDL Name.

4. Enter a file name in the Statistics File Name and Process Log Name text boxes.

You can also specify these additional settings:

To specify the starting record

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.


To process every nth record only

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This specifies that only every nth record will be processed. By default, all records will be processed.

To use a delimited file

If you are using a delimited file for input and/or output, you must specify the delimiter settings.

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.

Process Settings
Once you have specified input and output files, you can define the
settings to process your data. The settings for processing are
specified in the Advanced Settings window.


Create Window Key Rules


The Window Key is generated from a window key rule selected from
the Key_List. Before you can apply key rules, you must first
construct them.
To define window key rules

You can create up to 30 window keys. The maximum window key length is 50 bytes.

1. Click Advanced. Navigate to Window Key Rules.

2. Select a key file from the list of Key_List_01-30.

3. In Primary Field Name, select from the drop-down list the primary field name you want to use in building the window key.

4. In Number Characters Primary Field, specify the number of characters to use from the Primary Field Name.

5. In Primary Field Winkey Code, select from the drop-down list the conditions you want to apply to the primary field.

Figure 10.1 Window Key Rules

A red flag indicates a REQUIRED field for this operation.

In this example, the Key_List_10 rule is used to generate the window key as follows:
Use the first three characters of the postal code.
Append the first character of the business name.
Append the first character and subsequent consonants of the street name.
Append a 1 if this is a personal name, and a 2 if this is a business name.

You can also specify a secondary window key. The secondary window key will be used if the conditions in Field Value Invoke Secondary Field are met.

6. Review the list of the fields, the number of characters, and the window key codes used in the generation of the window key.

Specify the Window Key Field

The Window Key Field determines where the generated window key will be placed on the output record.

To set window key fields

1. Navigate to Keys, Keys Settings, Source Key. Under Source Key, click a cell and select the name of a Key_List from the drop-down list that appears (Key_List_01 - 30).

2. In Window Key Field Name, select the field name from the drop-down list. The generated window key will be placed into that field on the output record. In this example, the generated window key from KEY_LIST_10 will be placed into the field named WINDOW_KEY_01:

Figure 10.2 Window Key Field

Additional Settings
You can also specify these additional settings:
To enable the debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter the name of a file to receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Process, Settings.

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify the settings file encoding

See Window Key Generator in the TS Quality Reference Guide for complete settings information.

1. Click Advanced and navigate to Process, Settings.

2. In Settings File Encoding, select the encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.

To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter the path and file name for the mask file.


Run the Window Key Generator and View Results

To run the Window Key Generator and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.

2. Click Run to run the Window Key Generator step.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears.

5. Navigate to the Output Settings tab and click the Data Browser icon to view the WINDOW_KEY_01 field.

Figure 10.3 Window Key Generated

6. Notice that all of the generated window keys end with a 2. This means all of the records have been designated as business records.


Sorting the Record by the Window Key

After creating the window keys, but before running the Relationship Linker, the input records must be sorted by the window key. The Sort Utility is used to sort a file into the desired order. In this example, the output file from the Window Key Generator will be sorted by the WINDOW_KEY_01 field. The output file will have the extension .srt to indicate that the file has been sorted.
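Conceptually, this step is just an ascending sort on the window key field. A minimal Python sketch (the tuple layout is an assumption for illustration):

```python
# Hypothetical records as (name, window_key) tuples, as if produced by the
# Window Key Generator.
records = [
    ("Vals Lube Co", "018VA MAI2"),
    ("John C Nicoli", "018NICLIN1"),
    ("Vasco Laboratories", "018VA MAI2"),
    ("J C Nicoli", "018NICLIN1"),
]

# Sort ascending by the window key field (WINDOW_KEY_01 in this example).
records.sort(key=lambda r: r[1])
```

After the sort, records that share a window key sit next to each other, which is exactly what the Relationship Linker's window matching needs.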

Input and Output Settings

The Sort Utility uses the output from the Window Key Generator step as input.

To specify input and output files

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

1. Open the Sorting Utility step.

2. Select the Input Settings tab.

3. Enter file names in the Input File Name and Input DDL Name text boxes.

4. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.
   OR
   Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

5. Navigate to the Output Settings tab.

6. Enter the Output File Name and Output DDL Name. The Output File Name should have the extension .srt to indicate that it is a sorted file.

7. Enter file names in the Statistics File Name and Process Log Name text boxes.


To specify the output file qualifier

The File Qualifier is a unique name given to a data file. For the Sort Utility, the output data file must have a unique file qualifier (with the .srt suffix).

1. Click Advanced and navigate to Output, Settings.

2. Specify the Output Data File Qualifier. The default is OUTPUT.

See Input and Output Settings on page 9-2 for the optional input and output settings for the Sort Utility.

Process Settings
Once you have identified input and output files, you are ready to
define the settings to process your data. The settings for processing
are managed in the Advanced Settings window.

Specify Sort Fields

To specify sort fields and sort order

A red flag indicates a REQUIRED field for this operation.

1. Click Advanced and navigate to Process, Settings.

2. Click Entry Settings.

3. Select the input DDL fields from the drop-down list in the Key box.

4. Select the sort order from the drop-down list in the Order box. Values are either Ascending Order or Descending Order.

Figure 10.4 Sort Field for Window Key

See Additional Settings on page 9-6 for the additional settings for the Sort Utility.

Run the Sorting Utility and Check Results


To run the Sort Utility and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.

2. Click the Run button to run the Sorting Utility. You can also right-click a step and select Run Selected.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears. The Sort Key Summary is shown on this tab.

Be sure the file to be used in the Relationship Linking step is sorted by the appropriate window key.


Using Relationship Linker


The Relationship Linker step identifies the relationships between records in a file at the business and consumer level. It can also identify whether duplicates exist in several files.

The Relationship Linker uses Comparison Routines to determine the level of similarity between records. The result of the comparisons is categorized as either Pass, Suspect, or Fail, based on the similarity of data elements.

There are two types of linking functions:
Window Linking - compares records to other records in the same file
Reference Linking - compares records in the input file to an existing reference file

For each linking function, there are two levels of matching:
Consumer
  Consumer Level 1 - Household level matching
  Consumer Level 2 - Individual level matching
Business
  Business Level 1 - Company level matching
  Business Level 2 - Contact level matching

Comparison Routines are used to compare a variety of types of data, including business names, personal names, and geographic components. For example, the ABSOLUTE routine compares two fields and looks for an exact match.

The next chapter explains how to change and tune the comparison routines. See the TS Quality Reference Guide, Appendix C, for a detailed description of Relationship Linker routines and their associated scoring values.
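As an illustration of these two ideas, exact-field comparison and Pass/Suspect/Fail categorization, consider the sketch below. It is not the product's implementation: the whitespace/case handling and the threshold values are invented for the example, while real scoring is configured in the Relationship Linker.

```python
def absolute(field_a, field_b):
    """Exact-match comparison in the spirit of the ABSOLUTE routine
    (illustrative; case and surrounding whitespace are ignored here)."""
    return field_a.strip().upper() == field_b.strip().upper()

def categorize(score, pass_threshold=90, suspect_threshold=70):
    """Map a combined similarity score to Pass/Suspect/Fail.
    Threshold values are invented for this sketch."""
    if score >= pass_threshold:
        return "Pass"
    if score >= suspect_threshold:
        return "Suspect"
    return "Fail"
```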


Linking Examples
This section contains detailed examples for each stage of matching,
beginning with input data.

Example 1: Sample Input Data


Assume that you have the following input data:
------------------------------------------------------------------
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879
Vals Lubrication     Main St          Tyngsboro   Ma   01879
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862
John Nicole          91 Linnell Cir   Billerica   Ma   01862
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879
------------------------------------------------------------------

Example 2: Data With Appended Window Key


Create a window key (the last field) using Key_List_10. (See Create Window Key Rules on page 10-6 for the rules of Key_List_10.)
------------------------------------------------------------------------------
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------


Example 3: Data Sorted By Window Key


The input records must be sorted by the window key. The Relationship Linker will match records in the same window key set.
------------------------------------------------------------------------------
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------

Example 4: Data Grouped by Matched Level 1 (Households)


After running the Relationship Linker, matched households would look like this:
------------------------------------------------------------------------------
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------


Example 5: Data Grouped by Matched Level 2 (Individuals) in Matched Level 1 (Households)

After running the Relationship Linker, matched individuals in matched households would look like this:
------------------------------------------------------------------------------
*John C Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
 J C Nicoli          25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
*Chris J Nicoli      25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
 C J Nicoli          25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
*John Nicole         91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
*Vals Lube Co        105 Main St      Tyngsboro   Ma   01879   018VA MAI2
 Vals Lubrication    Main St          Tyngsboro   Ma   01879   018VA MAI2
 Vals Lube & Repair  105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------
*Vasco Laboratories  13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------

* Indicates the best record or 'survivor' of the match. See Using the Create Common Utility on page 12-3 to learn more about the best record and survivor record.

Example 6: Data Grouped by Suspect Level 1 (Households)


After running the Relationship Linker, suspect households would look like this:
------------------------------------------------------------------------------
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------


Example 7: Data Grouped by Suspect Level 2 (Individuals) within Suspect Level 1 (Households)

After running the Relationship Linker, suspect individuals in suspect households would look like this:
-----------------------------------------------------------------------------
*John C Nicoli       25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
 J C Nicoli          25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
 John Nicole         91 Linnell Cir   Billerica   Ma   01862  018NICLIN1
*Chris J Nicoli      25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
 C J Nicoli          25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
-----------------------------------------------------------------------------
*Vals Lube Co        105 Main St      Tyngsboro   Ma   01879  018VA MAI2
 Vals Lubrication    Main St          Tyngsboro   Ma   01879  018VA MAI2
 Vals Lube & Repair  105 Main St      Tyngsboro   Ma   01879  018VA MAI2
-----------------------------------------------------------------------------
*Vasco Laboratories  13 Main St       Tyngsboro   Ma   01879  018VA MAI2
-----------------------------------------------------------------------------

Linking Your Data


Window Linking
Window Linking compares records to other records in the same
file. A group of records is matched to each other, one window key
set at a time.
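The window-linking idea, comparing records only within groups that share a window key, can be sketched in a few lines of Python. This is an illustrative sketch of the general technique only, not the Relationship Linker's actual implementation; the record layout, the `is_match` stand-in for field/pattern grading, and the sample data are assumptions:

```python
from itertools import groupby

def window_link(records, window_key, is_match):
    """Group records by window key and compare each pair within a window.

    records    -- list of dicts, pre-sorted by the window key (as after a sort step)
    window_key -- field name used to form candidate windows
    is_match   -- pairwise comparison function (stands in for field/pattern grading)
    """
    links = []
    for key, group in groupby(records, key=lambda r: r[window_key]):
        window = list(group)
        # Only records sharing a window key are ever compared to each other.
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                if is_match(window[i], window[j]):
                    links.append((window[i]["id"], window[j]["id"]))
    return links

recs = [
    {"id": 1, "wkey": "018NICLIN1", "name": "John C Nicoli"},
    {"id": 2, "wkey": "018NICLIN1", "name": "John Nicoli"},
    {"id": 3, "wkey": "018VA MAI2", "name": "Vals Lube Co"},
]
same_last = lambda a, b: a["name"].split()[-1] == b["name"].split()[-1]
print(window_link(recs, "wkey", same_last))  # [(1, 2)]
```

Because `groupby` only groups adjacent records, the input must already be sorted on the window key, which is why window linking consumes the output of a sort step.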

Input and Output Settings


The Relationship Linker uses the output from the Sort Utility 2 step as input.

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it.

To specify input and output files

1. Open the Relationship Linker step and select the Input Settings tab.

2. Specify a file name in the Input File Name and Input DDL Name text boxes.

3. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

   - OR -

   Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

Tip: To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

4. Navigate to the Output Settings tab.

5. Enter file names in the Output File Name and Output DDL Name text boxes.

6. Optionally, specify a Linking File. A linking file indicates which matched records are linked together with common data. If you want to produce a linking file, identify the Linking Data File and Linking DDL File.

7. Enter file names in the Statistics File Name and Process Log Name text boxes.


To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Input, Settings.

2. Specify Input Data File Qualifier (default is INPUT).

3. Click Advanced and navigate to Output, Settings.

4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings.

To specify the starting record

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every Nth record only

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.
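Taken together, Start at Record, Process a Maximum of, and Process Nth Sample act as a simple filter over the input stream. The sketch below illustrates one plausible reading of how the three settings combine; the exact order in which the product applies them is an assumption:

```python
def select_records(records, start_at=1, maximum=None, nth=1):
    """Illustrative filter: begin at record start_at (1-based),
    keep every nth record from there, and stop after maximum records."""
    selected = []
    for offset, rec in enumerate(records[start_at - 1:]):
        if offset % nth != 0:
            continue  # skip records that are not on the Nth sample
        selected.append(rec)
        if maximum is not None and len(selected) >= maximum:
            break
    return selected

data = list(range(1, 11))                          # records 1..10
print(select_records(data))                        # defaults: all records
print(select_records(data, start_at=3, nth=2))     # [3, 5, 7, 9]
print(select_records(data, maximum=4))             # [1, 2, 3, 4]
```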

To use a delimited file

If you are using a delimited file for input and/or output, you must specify delimiter settings.

Note: Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed in quotation marks.

1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.

Basic Settings

You must specify the match method and the name form field.

Note: A red flag indicates a REQUIRED field for this operation.

To select the match method

1. Click Advanced and navigate to Process, Settings.

2. In Match Method, select Window Matching from the drop-down list.

To specify the name form field

1. Click Advanced and navigate to Process, Settings.

2. In Name Form Field, select the name form field from the drop-down list. The Name Form Field contains the Consumer/Business flag. This field is created by the Transformer or Customer Data Parser, and is used by the Relationship Linker to distinguish between consumer and business records.

The flag values are Consumer and Business. The consumer/business flag within the matching window must be the same.

Field and Pattern Files

The Relationship Linker uses Field Files and Pattern Files in the linking process. The default files for your country are included in the TS Quality package.

Field Files - contain the fields to compare in the linking process.

Default Field Files:
\TrilliumSoftware\tsq10r5s\<project>\settings\
  xxbus1fld.stx (business level 1)
  xxbus2fld.stx (business level 2)
  xxcon1fld.stx (consumer level 1)
  xxcon2fld.stx (consumer level 2)
  (xx = 2-digit country code)

Pattern Files - contain the patterns, or report cards, used to determine the level of similarity between the records in the linking process. Each pattern is assigned a number and designated as pass, suspect, or fail.

Default Pattern Files:
\TrilliumSoftware\tsq10r5s\<project>\settings\
  xxbus1pat.stx (business level 1)
  xxbus2pat.stx (business level 2)
  xxcon1pat.stx (consumer level 1)
  xxcon2pat.stx (consumer level 2)
  (xx = 2-digit country code)
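The default file names above follow a fixed scheme (country code, then bus or con, then level 1 or 2, then fld or pat, then .stx), so they can be composed mechanically. A small illustrative helper; the "us" country code in the example is only an assumption:

```python
def default_settings_files(country, kind):
    """Compose default field/pattern file names of the form
    xx<bus|con><1|2><fld|pat>.stx, where xx is the 2-digit country code
    and kind is 'fld' (field file) or 'pat' (pattern file)."""
    return [f"{country}{entity}{level}{kind}.stx"
            for entity in ("bus", "con") for level in (1, 2)]

print(default_settings_files("us", "fld"))
# ['usbus1fld.stx', 'usbus2fld.stx', 'uscon1fld.stx', 'uscon2fld.stx']
```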


To specify field and pattern settings

1. Click Advanced and navigate to Process, Field Pattern Settings.

2. Accept the default country-specific settings files or select a customized settings file from the drop-down list.

See Using the Relationship Linker Rule Editor on page 11-12 to learn more about customizing the field and pattern files.

Window Key Field

The Relationship Linker tries to match records in the same window key set. Therefore, you must specify the window key field.

To specify the window key field

1. Navigate to Process, Transaction Window Settings.

2. In Window Key Field, select the window key field you are using for matching. In this example, Window Key Field is set to WINDOW_KEY_01.

Figure 10.5 Window Key Field

Window Size

You can control how many records are added to the match window. If there are more records with one window key than the value specified, additional windows are created for the remaining records. For example, if you have 1000 records with the same window key and set the value to 500, the records are split into two match windows of 500 records each.
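The window-size cap can be pictured as chunking each window-key group. An illustrative sketch only; whether the product overlaps adjacent windows is not stated here, so this simple non-overlapping split is an assumption:

```python
def split_windows(group, max_size):
    """Split one window-key group into match windows of at most max_size records."""
    return [group[i:i + max_size] for i in range(0, len(group), max_size)]

group = [f"rec{i}" for i in range(1000)]
windows = split_windows(group, 500)
print(len(windows), [len(w) for w in windows])  # 2 [500, 500]
```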


To specify the maximum window size

1. Click Advanced and navigate to Process, Transaction Window Settings.

2. In Maximum Window Size, specify a numeric value.

For the additional settings for window linking, see Additional Settings on page 10-28.

Run the Relationship Linker and View Results

To run the Relationship Linker and view results

Note: When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings and then click Run to run the Relationship Linker. You can also right-click on a step and select Run Selected.

2. Select OK.

3. On the Results tab, the Statistics sub-tab appears. Review the statistics for the Relationship Linker on the sub-tab.

4. Click Results Analyzer to view the record detail for the linking process. The output data set is sorted by matched individual number within matched household number within the window key.

The Results Analyzer allows the user to view the actual data and match results. We will explain this tool in detail in the next chapter.


Reference Linking

Reference Linking compares records in your input file to an existing reference file. It is mainly used to update an existing master file in the database with new records.

For example, suppose you've received a new set of records after running the initial linking. In this case, you would take the new records as your input file and the initial matched records as your reference file. You can compare the input file with the reference file to check whether the new records already exist in the reference file, and update the file if necessary.

If a match is found, a matching key number is copied from the reference record to the input record. If no match is found, a new key number is generated and appended to the input record. The number of output records in reference linking is the same as the number of input records. You can use the matching key numbers to update the reference file.

See Relationship Linker in the TS Quality Reference Guide for detailed information on Reference Linking.
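The key-number behavior described above, copying the key on a match, generating a new one otherwise, and producing exactly one output record per input record, can be sketched as follows. This is an illustrative sketch only: real reference matching uses window keys plus field/pattern grading rather than the exact-key lookup shown, and the field names (wkey, lev1_number) are assumptions:

```python
import itertools

def reference_link(input_recs, reference, match_key, next_key):
    """For each input record: copy the matching key number from the reference
    file, or call next_key() to generate a new one if nothing matches.
    The output has exactly one record per input record."""
    by_key = {r[match_key]: r for r in reference}
    out = []
    for rec in input_recs:
        ref = by_key.get(rec[match_key])
        if ref is not None:
            rec = {**rec, "lev1_number": ref["lev1_number"]}   # copy existing key
        else:
            rec = {**rec, "lev1_number": next_key()}           # assign new key
        out.append(rec)
    return out

ref = [{"wkey": "A1", "lev1_number": "0500001"}]
inp = [{"wkey": "A1"}, {"wkey": "B2"}]
counter = itertools.count(1)
new_key = lambda: f"NM{next(counter):06d}"
print(reference_link(inp, ref, "wkey", new_key))
```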

Input and Output Settings

To specify input and output files

1. Open the Relationship Linker step and select the Input Settings tab.

2. Enter file names in the Input File Name and Input DDL Name text boxes.

3. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

   - OR -

   Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

4. Click the Reference Match checkbox to enable reference matching. When this checkbox is checked, the Reference File and Reference DDL options become enabled.

5. Specify the Reference File and Reference DDL.

6. Navigate to the Output Settings tab.

7. Enter file names in the Output File Name and Output DDL Name text boxes.

8. You may also specify a Linking File. A linking file indicates which matched records are linked together with common data. If you want to produce a linking file, specify the Linking Data File and Linking DDL File.

9. Enter file names in the Statistics File Name and Process Log Name text boxes.

To specify a second output file

A second output file for reference linking contains all records from the reference file that had a matching record in the input file.

1. Click Advanced and navigate to Reference, Output Settings.

2. Specify the Reference Output Data File and Reference Output DDL File.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Reference, Input Settings.

2. Specify the Reference Input Data File Qualifier.

3. Click Advanced and navigate to Reference, Output Settings.

4. Specify the Reference Output Data File Qualifier.

See the Window Linking section for the optional input and output settings.

Basic Settings

The steps for setting the Match Method, Name Form Field, Field Pattern, and Window Key are the same as those for window linking. See Basic Settings on page 10-20 for details.

Specify Matching Numbers

If a match is determined, a matching key number is copied from the reference record to the input record. You must specify the fields where those matching numbers are stored.

Reference Level 1 Number - Identifies the field in the reference file where existing level 1 numbers are stored. For records matched at level 1, this number in the reference file is copied to the input file.

Reference Level 2 Number - Identifies the field in the reference file where existing level 2 numbers are stored. For records matched at level 2, this number in the reference file is copied to the input file.

Reference Record ID - Identifies the field where record IDs are stored. This value must be unique between the reference file and the input file.

You must add these fields to the DDL file prior to attempting reference linking.
To specify matching numbers

1. Click Advanced and navigate to Process, Reference Matching. Enter the Reference Level1 Number field.

2. Enter the Reference Level2 Number field.

3. Enter the Reference Record ID field.

To specify numbers when there is no match

If an input record does not match any record in the reference file at level 1, it is assigned a number built from the Number Generation Start and Number Generation Cycle values.

1. Click Advanced and navigate to Process, Reference Matching.

2. In Number Generation Start, enter a starting number for unmatched new records, such as 0. The starting number will be this value plus 1.

3. In Number Generation Cycle, enter a text or numeric string which will be added to the beginning of the Number Generation Start value, as in NM. If you do not specify a value, the default will be used. The default is YYDDD, where YY is the last 2 digits of the year, and DDD is the day number counted from 1/1.
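Number generation for unmatched records combines the cycle prefix with a counter that starts at the Number Generation Start value plus 1. An illustrative sketch (the exact formatting and padding used by the product are assumptions):

```python
from datetime import date

def key_generator(start=0, cycle=None):
    """Yield key numbers for unmatched records: cycle prefix + counter,
    counting from start + 1. The default cycle is YYDDD, built from the
    last 2 digits of the year and the day-of-year."""
    if cycle is None:
        today = date.today()
        cycle = f"{today.year % 100:02d}{today.timetuple().tm_yday:03d}"
    n = start
    while True:
        n += 1
        yield f"{cycle}{n}"

gen = key_generator(start=0, cycle="NM")
print(next(gen), next(gen), next(gen))  # NM1 NM2 NM3
```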

You can also specify the following additional settings.

To match all reference records

You can control whether to identify all matches when an input record matches more than one record in the reference file.

1. Click Advanced and navigate to Process, Reference Matching.

2. To enable all matches, check the box next to Reference File Match All. If this box is not checked, the Relationship Linker does not attempt to match any additional records in the reference file after matching one record.

To specify the maximum window size

You can control how many records are added to the match window. If there are more records with one window key than the value specified, additional windows are created for the remaining records. For example, if you have 1000 records and set this value to 500, additional match windows are created for the remaining records.

1. Click Advanced and navigate to Process, Reference Matching.

2. In Maximum Window Size, specify a numeric value.


Display Match/Suspect Pattern IDs

If you want to display matched or suspect household/individual pattern IDs in the output, you can specify the fields that store those IDs.

To specify fields for match/suspect pattern IDs

1. Click Advanced and navigate to Process, Reference Matching.

2. In Reference Level1 Pass (Suspect) Pattern Field, specify a DDL field where Level 1 pattern IDs are written out for output.

3. In Reference Level2 Pass (Suspect) Pattern Field, specify a DDL field where Level 2 pattern IDs are written out for output.

Additional Settings

For both Window Linking and Reference Linking, you can configure these additional settings.

See Relationship Linker in the TS Quality Reference Guide for the complete settings information.

To enable the debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter the name of the file which will receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Process, Settings.

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file. The default is 1.

This count is written to the Process Log file. To display the log file, select the Results tab and navigate to the Process Log tab after the program is run.
To specify the settings file encoding

1. Click Advanced and navigate to Process, Settings.

2. In Settings File Encoding, select the encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.

To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter the path and file name for the mask file.

Run the Relationship Linker and View Results

To run the Relationship Linker and view results

Note: When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.

2. Click the Run button to run the Relationship Linker. You can also right-click on a step and select Run Selected.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears. Review the statistics for the Relationship Linker on the sub-tab.

5. Click the Results Analyzer button to view the record detail for the linking process.

The Results Analyzer tool allows the user to view the actual data and match results. We will explain this tool in detail in the next chapter.


CHAPTER 11

Tuning the Linking Rules

The output of the Relationship Linking process is displayed in the
Relationship Linker Results Analyzer. This tool allows you to
view and analyze linked results. After viewing these results, you can
determine if there is a need to customize the rules of the link
process to meet your business requirements.
In this chapter, you will perform these tasks:

- Use the Results Analyzer to view and analyze the results of the Relationship Linker process
- Use the Rule Editor to analyze the linking rules and add a field to compare in the link process
- Customize the field and pattern lists by adding fields and patterns to the process
- Re-run the Relationship Linker using the new linking rules and view results
- Use the Data Comparison Calculator to test the comparison routine and appropriate score


Using the Relationship Linker Results Analyzer

The Relationship Linker Results Analyzer displays linked records in a spreadsheet format. Once you have run the Relationship Linker, you can display the match results using this tool. You can browse the matched results and examine the data to see how records were initially matched. You can then decide if it is necessary to change the business rules to meet your requirements.

View the Linking Results

To start the Results Analyzer

1. Open the Relationship Linker step, and click the Results Analyzer button.

Figure 11.1 Launch Results Analyzer

2. The Relationship Linker Results Analyzer opens.




In the Results Analyzer, each column is titled with either a field
from one of the match comparison list files or a key field from the
DDL file. The individual record data is displayed in the horizontal
row. View linked data by clicking the appropriate tab.

Click on
the tab and
view data

Matched
records

Match key

Figure 11.2 Relationship Linker Results View


Linked records are grouped together by color, alternating between
blue and white. If a record is by itself and not above or below
another record of the same color, then that record did not match
any other record.
If all of the records are business records, as in the
example above, no Consumer_Lev1 or Consumer_Lev2
records are displayed.
Records are displayed based on a specific key, depending on the currently-selected tab. For example, if you are on the Business_Lev1 tab, looking at Business_Lev1 matches, then matches are displayed based on the lev1_matched field.


When viewing the relationship linking results in the Results
Analyzer, you can see Matched records as well as Suspect
records:
The field with
the match key
will be in bold,
and highlighted
in red, to show
that this key is
being used to
show matches.

Matched Displays data with exact matches between


records. All records met the requirements for pass patterns
(patterns that begin with P).
Suspect Lists data with the most likely matches between
records. All records met the requirements for suspect
patterns (patterns that begin with S).
To view matched and suspect records

1. Matched records at the Consumer/Business Level_1 and Level_2 are displayed by default.

2. To view Suspect records at the Consumer/Business Level_1, click the Suspect radio button. To execute this view, click the red exclamation point.

Figure 11.3 Switch Matched and Suspect Records

3. When you view suspect matches, the field for the matched level is highlighted in red and italicized, in addition to the field that contains the match key (highlighted in bold and in red). This shows how the matches reflect in a suspect level versus those in a matched level.

Figure 11.4 Suspect View with Matched Level


4.

To return to the Matched record view at the Consumer/


Business Level_1, click the Matched radio button. To
execute this view, click the red exclamation point.

5.

If you want to review the Suspect records for Level_2,


select the Suspect radio button next to Level_2. To

Creating and Working with TS Quality Projects

Edit Fields to Display

11-7

execute this view click on the red exclamation point, and


then select the Business_Lev2 tab.

Figure 11.5 Business Level View

Edit Fields to Display

In the Results Analyzer view, you can select and delete fields to display.

To select and delete fields to display

Note: If you select Show Standard Fields in the Format menu, it displays only the standard DDL fields.

1. Select Tools, Browse More Fields.

2. The left window shows all Available Fields. Any field can be highlighted and dragged into the Selected Fields window. A field can also be highlighted and moved by clicking Add. If you want to move all fields, click Add All.

3. Click Show. Every field that is shown in the window will appear as a column in the main viewer.

4. To delete fields from the display, select those fields in the Selected Fields window.



5. Click Delete, then Show, to update the display.

Figure 11.6 Select Fields to Display

6. To search for a field, enter the field name in the Search text box. Click Show.

Save Fields to Display

You can also save a view of fields in this window. If you frequently look at the same fields in a file, saving a view can save time.

To save a view of fields

1. In the Browse More Fields window, select fields to display.

2. Click the Save button.

3. In the Save window, name the view, and then identify the desired location for the file.

Figure 11.7 Save Fields to Display

4. To view a stored view, select the name of the view from the drop-down menu in Select a Selected Customized View. The fields will be loaded in the Selected Fields window. Click Show to view the stored fields.

Tip: You can use Back and Forward on the Tools menu to display the previous or next view.

View Records in a Range

If your output file is very large, it is a good idea to search smaller subsets of your file. This will make the program run more quickly. You can select a range of records within the file to view.

To view records in a range

1. Enter a starting record number in the Browse Records From text box and an ending number in the To text box.



2. Click Go. Only records in the specified range will be displayed.

Figure 11.8 View Records in a Range

You can also use the Previous Block and Next Block buttons to browse the data. The program browses in blocks, based on the entered range. For example, if you entered a range of Records 6-10, then click Next Block, the program displays Records 11-15. If you click Previous Block, the program displays Records 1-5. There are also Previous Block and Next Block buttons displayed at the top and bottom of the vertical scroll bar on the right.

Note: If you notice breaks in the record number sequence, it is because each record is either a Consumer or Business level record.

To view records by group size or pattern ID

You can view records in a group, either by group size or pattern ID.

1. Enter values in the Minimum Number of Members and/or Pattern Number text boxes.

Figure 11.9 Record Group Size or Pattern ID

2. Only matched groups that correspond to those values will be displayed. For example, if you enter 2 for the Minimum Number of Members and 100 for the Pattern Number, only matches in groups of two or more with the pattern number 100 will be displayed.

Figure 11.10 View Records by Pattern Number

For more detailed information about the Results Analyzer, see the Online Help.


Using the Relationship Linker Rule Editor

Once you have reviewed the data, you may want to add or change a field and a pattern in the match rules to meet your business requirements. For example, any records at the Contact level (Business_Lev2) that have the same Last_name and Account_number fields should be positively matched together. You can use the Relationship Linker Rule Editor to change the match rules to achieve that goal.

The field and pattern list files used in the Relationship Linker process are displayed in the Relationship Linker Rule Editor.

View the Linking Rules

To start the Rule Editor

1. From the Relationship Linker step, click the Rules Editor button on the bottom left.

Figure 11.11 Launch Relationship Linker Rule Editor

2. The Relationship Linker Rules Editor opens.


To view the Linking Rules

1. When you open existing field and pattern files, the Field List Editor (upper pane) and Grade Pattern Editor (lower pane) open automatically.

2. Select Tile Horizontally or Tile Vertically from the Window menu to view both the field and pattern lists. You can view the Consumer or Business, Level 1 or Level 2 lists by clicking the appropriate tab.

Tip: Click a column heading and drag it to the desired location to rearrange the columns.

Figure 11.12 Relationship Linker Rule Editor



The following table lists the columns in the Field List Editor window and the Grade Pattern Editor window.

Field List Editor

Description - Describes all fields in the field settings file. Double-click the cell to edit it.

Score A - E - Specify up to 5 grade thresholds. For example, the first score is the threshold for grade A and the second score is the threshold for grade B. A through D must be positive; E can be positive or negative.

Comparison Routine - The Linker calls this routine to perform the field comparison. Double-click the cell, select the desired routine from the list, and click OK.

Propagation Routine - The Linker calls this routine to perform the comparison propagation for this field. Double-click the cell and select a routine from the drop-down list.

Field Name 1 - 3 - Specify up to three fields for linking. Double-click the cell to open the field name list and double-click the desired field name.

Routine Modifier - Specify a value passed to a comparison routine. Each routine uses a different number of modifiers; some use none. Double-click the cell to open the list and double-click a modifier.

Grade Pattern Editor

Category - Lists the pattern category: P (Pass), F (Fail), or S (Suspect). Click inside the cell and select the pattern from the drop-down menu.

Pattern ID - The pattern ID is a number ranging from 0 to 999. No duplicates are allowed.

Field Name Columns - The remaining column headings take their names from the Description column in the Field List Editor window. The valid grades are A, B, C, D, and E. The hyphen (-) represents a wildcard character. Click inside the cell and select the grade from the drop-down menu.
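The mechanics the two editors describe, scores mapped to grades A through E by per-field thresholds, and the resulting grade strings matched against patterns with '-' as a wildcard, can be sketched as follows. This is illustrative only; the product's actual comparison routines and grading rules are more involved:

```python
def grade(score, thresholds):
    """Map a comparison score to a grade using thresholds for A..E;
    scores below every threshold get no grade ('-')."""
    for letter, threshold in zip("ABCDE", thresholds):
        if score >= threshold:
            return letter
    return "-"

def match_pattern(grades, pattern):
    """A pattern matches when every non-wildcard position agrees."""
    return all(p == "-" or p == g for g, p in zip(grades, pattern))

thresholds = [100, 90, 80, 70, 60]
scores = [100, 100]          # e.g. last_name and account_number both score 100
grades = "".join(grade(s, thresholds) for s in scores)
print(grades, match_pattern(grades, "AA"))  # AA True
```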

Customize the Field and Pattern Lists

The Field List contains the fields which are compared in the link process. The Pattern List contains patterns used to determine the degree of similarity between records.

You can customize the linking process by adding fields and/or patterns to the process. For example, you can add a field and a pattern to the link rules so that any records at the Contact level (Business_Lev2) with the same Last_name and Account_number fields will be positively matched together.
To add a field to the field list

1. In the Field List Editor, click last_name in the Description column.

2. Select Edit, Insert After Selected Row to add a row for a new field.

Note: If you insert a new row in the Field List Editor, a new column is automatically inserted into the Grade Pattern Editor. Conversely, if you delete a row in the Field List Editor, the corresponding column in the Grade Pattern Editor is also deleted.

3. Double-click the Description column and add a description of account_number for this row.

4. Double-click the Score A column and add 100. This means that you want to compare the Account_numbers for two records and they must match at 100%.



5. Double-click in the Comparison Routine column and select partial1. partial1 is the routine used to compare the actual field data in the Account_number fields.

6. Double-click the Field Name 1 column and select Account_number as the field for comparison.

Figure 11.13 Account_number Field Added


To add a pattern to the pattern list

1. In the Grade Pattern Editor, click Pattern ID 128 and select Edit, Insert Before Selected Row. The new pattern row is added.

2. In the Category column, select P for a positive match pattern (Pass). In the Pattern ID column, give the pattern the number 400, as this is very different from the other patterns in the list.

Note: The Pattern IDs 128 and 400 have no special meaning. They are used here as examples only.

3. Select an A for the grade for the last_name field and for the grade for the account_number field. The grade A means Score A (100) for those fields.

Figure 11.14 Pattern ID 400 Added

4. Select File, Save to save the file. When asked Do you want to continue? select Yes. When asked Do you want to delete subsequent duplicate patterns? select No.

5. Close the Relationship Linker Rule Editor.

6. Close the Results Analyzer.

7. See Checking Errors in the Field and Pattern Lists on page 11-18 to verify any errors in the changes you have made.

Checking Errors in the Field and Pattern Lists


If you have made changes to the field and/or pattern file, make
sure to run the Error Report. The program displays a message if it
discovers a problem in the file, such as missing routines or duplicate
pattern IDs. For example, the following grade pattern file has a
duplicated pattern ID:

Figure 11.15 Duplicate Pattern ID (Pattern ID 102 is duplicated)


To check errors in the field and pattern lists
1. After the changes have been made, select Error Report from the Tools menu. If an error is found, you will receive an error message.

Figure 11.16 Error Message for Single Error

2. The message prompts you to continue. If you click Yes, the error checking continues. You may see another error such as the one below.

Figure 11.17 Error Message for Additional Errors


3. This message tells you that some grade patterns are duplicates. Click Yes to remove all duplicates. Click No to leave the duplicates in the file.
4. Once you have deleted these duplicate patterns, a message appears confirming the deletion.
5. Select Save from the File menu to save the file.

For more detailed information about the Relationship Linker Rule Editor, see the Online Help.

Re-Run the Relationship Linker and View Results

Once a change is made to the field or pattern list, the Relationship Linker process must be re-run. At this time, the Relationship Linker will use the new linking rules you defined.
To run the Relationship Linker with the new rules
1. Open the Relationship Linker step and click Run.
   You can also right-click a step and select Run Selected. This is an alternate way to run the step.

2. Click Results Analyzer to view the new results.
3. Click the Business_Lev2 tab to view the results of the new contact matching.
4. In the lower left corner, type 400 in the Pattern Number box and click OK. This shows only records that were matched using Pattern ID 400. Review the records. Notice that this new field and pattern were able to link records that use nicknames in the first name field.

Figure 11.18 New Matching Results


Using the Data Comparison Calculator


The Data Comparison Calculator can help you determine the
correct comparison routine and appropriate score for fields that you
add to the match process. For example, you can test the difference
between the ABSOLUTE and PARTIAL1 comparison routines and
decide which routine you want to use.
The steps for testing the routines are the same for most of the
comparison routines; the exceptions are SUBSTRING, DATE,
ARRAY1, ARRAY2 and MXDNAME. This section shows the general
steps for using these routines.
See the TS Quality Reference Guide, Appendix C for a
detailed description of Relationship Linker routines and
their associated scoring values.
To perform a comparison test
Check the Match Case box if you are performing a case-sensitive comparison.
1. From the Relationship Linker Results Analyzer, select Tools, Invoke Data Comparison Calculator. The Data Comparison Calculator opens.
2. Enter a value for the first field in the Record 1, Field 1 text box, and then enter a value for the second field in the Record 2, Field 1 text box.
3. Highlight a routine in the Comparison Routines list. If the routine uses modifiers, they will appear in the Routine Modifiers box. Select a modifier from the list or highlight (none) (default).
4. Click Compare. The score appears in the Score box.

Example
In this example, two values in the Account_number field are compared using the ABSOLUTE and PARTIAL1 routines.
ABSOLUTE compares two fields and looks for an exact match. Score 100 is an exact match, including blank vs. blank.
PARTIAL1 compares two fields and looks for an exact match, but applies different scores for blanks. Score 100 is an exact match excluding blank vs. blank, 75 is blank field vs. non-blank field, and 65 is blank field vs. blank field.
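The documented score table for these two routines can be re-expressed as a short sketch. This is not the TS Quality code; the 100/75/65 scores come from the description above, while the score of 0 for a plain mismatch is an assumption made for illustration.

```python
# Illustrative sketch of the ABSOLUTE and PARTIAL1 scoring behavior
# described above; only the documented blank-handling scores are modeled.

def absolute(a, b):
    """Exact match scores 100, including blank vs. blank."""
    return 100 if a == b else 0   # assumption: non-matches score 0 here

def partial1(a, b):
    """Exact match scores 100, but blanks are scored differently."""
    a_blank, b_blank = a.strip() == "", b.strip() == ""
    if a_blank and b_blank:
        return 65                 # blank field vs. blank field
    if a_blank or b_blank:
        return 75                 # blank field vs. non-blank field
    return 100 if a == b else 0   # assumption: non-matches score 0 here

print(absolute("", ""))          # 100 -- two blank account numbers "match"
print(partial1("", ""))          # 65  -- blanks no longer force a match
print(partial1("A123", ""))      # 75
print(partial1("A123", "A123"))  # 100
```

The difference matters in practice: under ABSOLUTE, two records with empty Account_number fields score a perfect 100, while PARTIAL1 keeps them from positive-matching on blanks alone.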
To run the ABSOLUTE and PARTIAL1 comparison routines
1. Type a sample Account_number into the Record 1 and Record 2 Field 1 boxes. Select the PARTIAL1 Comparison Routine from the Comparison Routines list.

Figure 11.19 Data Comparison Calculator

2. Click Compare. The score is 100. Change the Comparison Routine to ABSOLUTE and click Compare. The score is again 100.

3. Clear the Record 1 and Record 2 Field 1 boxes. Now click Compare. The score is again 100. Change the Comparison Routine to PARTIAL1. The score of a blank field to a blank field using PARTIAL1 is 65. This is an important distinction. We did not want two records with blank Account_number fields to positive-match together.

For more detailed information on the Data Comparison Calculator, see the Online Help.


CHAPTER 12

Selecting the Best Record

The Create Common Utility lets you select the best record of a matched set of records (called the survivor), and copy data from one record into a field across the matched set of records. This selection process is defined by decision routines. You can commonize data in the current field or in a new field, using data that originates in another field.
In this chapter, you will perform these tasks:
Understand commonization and survivorship
Determine match key level settings
Identify common fields
Assign a survivor record
Run Create Common and view its results
Use the Data Browser to view the actual record data


Using the Create Common Utility


You can use up to ten levels of output data from the Relationship Linker.

The Create Common Utility allows you to set options that copy data across a linked record set. This module has two major functions:
Commonization: Copy data in one field to other fields in records linked by a match key. You can commonize data in an existing field or in a new field. You can also commonize data sourced from another field.
Survivorship: Select a user-defined survivor record among a group of records, using survivor selection rules. This function flags a single record at any level, indicating the best record of the linked set.
The input data file must be sorted by match keys (such as LEV1_MATCHED) prior to being processed by this module. If you run this module right after the Relationship Linker step, the input file is automatically sorted by the match keys. If you run this module separately, be sure to sort the input file by match keys.

Example
Assume that the best record is determined according to the most
recent date in the Last_contact_date field. In this example, you
want to copy the account representative information with the most
recent contact date to the set of linked records, and then identify
one account representative per business.
Commonize the account representative from the record that
has the most recent Last_contact_date field.
Once the data is copied, place an indicator of 1 into the
Survivor_flag field for the record that has the most recent
Last_contact_date.
This indicator will be used later to select the best records
from the file.


Input and Output Settings


The Create Common Utility uses the output from the Relationship
Linker step as input to this step.
Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

To specify input and output files
1. Open the Create Common step, and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Select the Output Settings tab.
4. Specify the Output File Name and Output DDL Name.
5. Specify a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:

To specify maximum array records
You can specify the number of records held in memory for the Match Key Level 1 setting. (The Match Key Level settings are described later in this chapter.) The default is 10000.
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Maximum Array Records.


To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every Nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimited settings.
Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down list.
3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.
You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.


Process Settings
Once you have specified input and output files, you can specify the
settings to process your data. The settings for processing are
managed in the Advanced Settings window.

Match Key Level Settings

Match Key Level settings specify the field that holds the match key used to group records for evaluation. For example, business records that were matched together usually have the same LEV1_MATCHED number. Only records in the same group will be compared and evaluated.
To specify match key level setting
1. Click Advanced and navigate to Output, Level Settings.
2. In Key Field, select the match key from the drop-down list of DDL fields.

Figure 12.1 Match Key Level Settings

Common Fields

The Common Fields designate the decision routines used to copy data from one field into other fields in the records linked by a common key.
A red flag indicates a REQUIRED field for this operation.
To specify common field
1. Navigate to Output, Common Fields.

2. Specify values for the following settings:

Level ID: Numeric value that specifies the level of commonization, for up to 10 levels of data hierarchy. For example: Level 1=business, Level 2=location, Level 3=contact, and so on.
Test Field: Field that contains information necessary to commonize data across records. Works in conjunction with the decision routines.
Decision Routine Encoding: Type of encoding used by the Decision Routine.
Decision Routine: Defines what data is processed and how.
From Field: Field that contains data which is modified or moved to the Target Field.
Target Field: Field used to store data from the source field, based on the decision routine.

Example
This example uses a decision routine called HIGHCHAR_NBNZ.
The HIGHCHAR_NBNZ routine commonizes the highest value (non-blank, non-zero) that occurs in the Last_contact_date field of all records at a record level of 1.
It copies the value in the Acct_rep field with the most recent Last_contact_date (HIGHCHAR_NBNZ) and puts this value into the Common_rep field.

Figure 12.2 Common Fields Settings

Record 1 contains the highest value in the Last_contact_date field. The data in Acct_rep in Record 1 is commonized into the Common_rep field:


Record     LEV1_MATCHED   Last_contact_date   Acct_rep (Input)   Common_rep (Output)
Record 1   00000013       2005-03-17          JLS                JLS
Record 2   00000013       2003-01-07          BPL                JLS
Record 3   00000013       2004-02-08          JCN                JLS
Record 4   00000014       2005-01-18          KJP                KJP
Record 5   00000014       2003-11-09          MMR                KJP
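The commonization rule in this example can be sketched as follows. This is an illustration of the documented behavior using the sample records above, not the Create Common code; the blank/zero test on the date field is a simplifying approximation.

```python
# Illustrative sketch of HIGHCHAR_NBNZ commonization: within each
# match-key group, find the record with the highest non-blank/non-zero
# value in the test field and copy its Acct_rep into every record's
# Common_rep. The input must already be sorted by the match key.
from itertools import groupby

records = [
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2005-03-17", "Acct_rep": "JLS"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2003-01-07", "Acct_rep": "BPL"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2004-02-08", "Acct_rep": "JCN"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2005-01-18", "Acct_rep": "KJP"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2003-11-09", "Acct_rep": "MMR"},
]

def commonize(recs, key="LEV1_MATCHED", test="Last_contact_date",
              source="Acct_rep", target="Common_rep"):
    for _, grp in groupby(recs, lambda r: r[key]):
        grp = list(grp)
        # keep only non-blank, non-zero test values
        candidates = [r for r in grp if r[test].strip() and r[test].strip("0")]
        best = max(candidates, key=lambda r: r[test])  # highest character value
        for r in grp:
            r[target] = best[source]
    return recs

print([r["Common_rep"] for r in commonize(records)])
# ['JLS', 'JLS', 'JLS', 'KJP', 'KJP']
```

Because the dates use a year-month-day layout, the "highest character value" is also the most recent date, which is why HIGHCHAR_NBNZ works for this field.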

Survivor Record

You can designate a survivor record from a group of records linked by a match key. Any record flagged as the survivor is assigned a flag number. The Assign Survivor function defines the test field, decision routine, and target field for survivor identification.
A red flag indicates a REQUIRED field for this operation.
To assign survivor
1. Navigate to Output, Assign Survivor.
2. Specify values for the following settings:

Level ID: Numeric value that specifies the level of commonization, for up to 10 levels of data hierarchy. For example: Level 1=business, Level 2=location, Level 3=contact, and so on.
Test Field: Field that contains information necessary to commonize data across records. Works in conjunction with the decision routines.
Decision Routine: Specifies which decision routine to use for the survivorship function.
Decision Routine Encoding: Type of encoding used by the Decision Routine.
Target Field: Field used to store data from the source field when the create common rule is satisfied.
Assigned Value: Numeric value that is assigned to the survivor record.

Example
This example uses a decision routine called HIGHCHAR_NBNZ. Assume that the best record is the one with the most recent date (HIGHCHAR_NBNZ) in the Last_contact_date field. This record needs a survivor flag of 1 in the Survivor_flag field to identify it as the best record for the LEV1_MATCHED grouping.

Figure 12.3 Survivor Settings

The HIGHCHAR_NBNZ routine looks for the highest character value (non-blank, non-zero) that occurs in the Last_contact_date field of all records at a record level of 1. In this case, Records 1 and 4 contain the most recent contact date within their groups, so the program takes those records as survivors. As a result, their Survivor_flag fields are flagged with a 1.
Record     LEV1_MATCHED   Last_contact_date   Acct_rep (Input)   Common_rep   Survivor_flag (Output)
Record 1   00000013       2005-03-17          JLS                JLS          1
Record 2   00000013       2003-01-07          BPL                JLS
Record 3   00000013       2004-02-08          JCN                JLS
Record 4   00000014       2005-01-18          KJP                KJP          1
Record 5   00000014       2003-11-09          MMR                KJP
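The survivorship rule described above can be sketched the same way as commonization, except that only the winning record is marked. This is an illustration of the documented behavior, not the Create Common code; the empty-string default for non-survivors is an assumption for this sketch.

```python
# Illustrative sketch of Assign Survivor with a HIGHCHAR-style pick:
# within each match-key group, the record with the highest value in the
# test field receives the Assigned Value ("1") in the target field.
from itertools import groupby

records = [
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2005-03-17"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2003-01-07"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2004-02-08"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2005-01-18"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2003-11-09"},
]

def assign_survivor(recs, key="LEV1_MATCHED", test="Last_contact_date",
                    target="Survivor_flag", assigned_value="1"):
    for _, grp in groupby(recs, lambda r: r[key]):  # input sorted by key
        grp = list(grp)
        best = max(grp, key=lambda r: r[test])      # highest character value
        for r in grp:
            r[target] = assigned_value if r is best else ""
    return recs

print([r["Survivor_flag"] for r in assign_survivor(records)])
# ['1', '', '', '1', '']
```

Exactly one record per LEV1_MATCHED group ends up flagged, matching the table above where Records 1 and 4 are the survivors.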


Additional Settings
You can also specify the following settings:
See Create Common in the TS Quality Reference Guide for complete settings information.

To enable debug function
1. Click Advanced and navigate to Additional....
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file name, or specify a different file to receive debugging information.

To count the number of records processed
1. Click Advanced and navigate to Additional....
2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.
This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify settings file encoding
1. Click Advanced and navigate to Additional....
2. In Settings File Encoding, select the encoding from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.


Run the Create Common and View Results


To run the Create Common and view results
When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.
1. Click OK to close the Advanced Settings.
2. Click Run to run the Create Common Utility.
   You can also right-click a step and select Run Selected.
3. Select OK.
4. On the Results tab, the Statistics sub-tab appears.
5. Navigate to the Output Settings tab and click the Data Browser button next to the Output File Name.
6. In the Field Selection window, select the fields you used for the Create Common process, such as LEV1_MATCHED, Acct_rep, Last_contact_date, Survivor_flag, and Common_rep.
7. Click Display to see the records.
8. Notice that for one Business household, the Active record with the most recent Last_contact_date has a 1 in the Survivor_flag field. All records in a Business household have the Acct_rep from the record with the most recent Last_contact_date copied into the Common_rep field.

Figure 12.4 Create Common Results Displayed


Create Common Decision Routines


Decision Routines are the program rules and instructions used in
the Create Common Utility. They control two functions:
How data is searched for and how commonization will
function within the program
How records will be set up for survivorship
Routines marked For commonization only cant be used
to determine a surviving record.

Table 12-1: Create Common Decision Routines

LOWEST: Lowest numeric value for selected data field
LOWEST_NB: Lowest non-blank numeric value for selected data field
LOWEST_NZ: Lowest non-zero numeric value for selected data field
LOWEST_NBNZ: Lowest non-blank/non-zero numeric value for selected data field
HIGHEST: Highest numeric value for selected data field
HIGHEST_NB: Highest non-blank numeric value for selected data field
HIGHEST_NZ: Highest non-zero numeric value for selected data field
HIGHEST_NBNZ: Highest non-blank/non-zero numeric value for selected data field
LOWCHAR: Lowest character value for selected data field
LOWCHAR_NB: Lowest non-blank character value for selected data field
LOWCHAR_NZ: Lowest non-zero character value for selected data field
LOWCHAR_NBNZ: Lowest non-blank/non-zero character value for selected data field
HIGHCHAR: Highest character value for selected data field
HIGHCHAR_NB: Highest non-blank character value for selected data field
HIGHCHAR_NZ: Highest non-zero character value for selected data field
HIGHCHAR_NBNZ: Highest non-blank/non-zero character value for selected data field
LEAST: Least occurring value for selected field
LEAST_NB: Least occurring non-blank value for selected field
LEAST_NZ: Least occurring non-zero value for selected field
LEAST_NBNZ: Least occurring non-blank/non-zero value for selected field
LITERAL: The specified value of a Selected Data Field; the value is given in parentheses (For Commonization Only). This example searches for the literal value 978-436-8900:
    LITERAL (978-436-8900)
The literal value must be the same length as the test field. If spaces are required in the literal string, the entire LITERAL decision routine must be enclosed in quotes. In the line below, the literal value 978-436-8900 is preceded by four blanks, so the entire routine must be enclosed in quotes:
    "LITERAL (    978-436-8900)"
LONGEST: Compares the length of the test field data on one record against the length of the data in the same field on another record. The system commonizes the longer of the two fields. For example, with Field1 = Smith and Field2 = Smit, the contents of the test field, Smith (the longer of the two), is commonized.
MOST: Most occurring value for selected data field
MOST_NB: Most occurring non-blank value for selected data field
MOST_NZ: Most occurring non-zero value for selected data field
MOST_NBNZ: Most occurring non-blank/non-zero value for selected data field
SHORTEST: Compares the length of the test field data on one record against the length of the data in the same field on another record. The system commonizes the shorter of the two fields. For example, with test fields Smith and Smit, Smit (the shorter of the two) is commonized.
SURVIVOR: Survivor value found in list (For Commonization Only)

Decision Routine Selections for a Single Field

In the examples below, we will consider 10 records, and how the content of those records applies to ten different decision routines.

Record 1: 123
Record 2: 123
Record 3: 456
Record 4: ___
Record 5: ___
Record 6: ___
Record 7: 000
Record 8: 000
Record 9: 000
Record 10: 000

The following table shows sample decision routine results:
Routine        Searches for the...                          To commonize field (Records)
HIGHEST        Highest numeric value                        456 (Record 3)
LOWEST         Lowest numeric value                         ___ (Records 4, 5 and 6)
LOWEST_NB      Lowest, non-blank numeric value              000 (Records 7-10)
LOWEST_NZ      Lowest, non-zero numeric value               ___ (Records 4, 5 and 6)
LOWEST_NBNZ    Lowest, non-blank, non-zero numeric value    123 (Records 1 and 2)
LEAST          Least occurring value                        456 (Record 3)
MOST           Most occurring value                         000 (Records 7-10)
MOST_NZ        Most occurring non-zero value                ___ (Records 4, 5 and 6)
MOST_NBNZ      Most occurring non-blank, non-zero value     123 (Records 1 and 2)
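The selection rules in the table above can be sketched generically. This is an illustrative re-implementation of the documented _NB/_NZ/_NBNZ filtering over the ten sample records (blanks are shown as spaces), not the product code; tie-breaking behavior is not specified here and may differ in the real utility.

```python
# Illustrative sketch of a few decision routines from Table 12-1,
# applied to the ten sample records above ("   " stands for blanks).

values = ["123", "123", "456", "   ", "   ", "   ",
          "000", "000", "000", "000"]

def is_blank(v):
    return v.strip() == ""

def is_zero(v):
    return v.strip("0") == "" and not is_blank(v)

def pick(vals, routine):
    # apply the non-blank / non-zero filters implied by the suffix
    if routine.endswith("_NBNZ"):
        vals = [v for v in vals if not is_blank(v) and not is_zero(v)]
    elif routine.endswith("_NB"):
        vals = [v for v in vals if not is_blank(v)]
    elif routine.endswith("_NZ"):
        vals = [v for v in vals if not is_zero(v)]
    base = routine.split("_")[0]
    if base == "HIGHEST":
        return max(vals)
    if base == "LOWEST":
        return min(vals)
    counts = {v: vals.count(v) for v in vals}
    if base == "MOST":
        return max(counts, key=counts.get)
    if base == "LEAST":
        return min(counts, key=counts.get)

print(pick(values, "HIGHEST"))      # '456'
print(pick(values, "LOWEST"))       # '   '  (blanks sort lowest here)
print(pick(values, "LOWEST_NBNZ"))  # '123'
print(pick(values, "MOST"))         # '000'
print(pick(values, "LEAST"))        # '456'
```

Running the sketch reproduces the table row for row, which makes it a convenient way to predict what a routine will commonize before configuring it.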


CHAPTER 13

Manipulating Your Data

In some cases you may want to manipulate and reconstruct data
elements at certain stages of data processing. Use the Data
Reconstructor to manage various data manipulation tasks. The
Data Reconstructor is particularly useful when global data needs to
be standardized into an identical format at the end of a project.
This chapter explains how to use the Data Reconstructor. You will
perform these tasks:
Specify input, output, and DDL files for the Data
Reconstruction step
Define specific Data Reconstruction rules for each country
Set the Use Rule
Run Data Reconstruction and view results
Use the Data Browser to view the reconstructed data
Generate a single file of all your global data


Using the Data Reconstructor


The Data Reconstructor is a flexible, rule-based data
reconstruction program. It features a rich scripting language with
conditional IF/ELSE capabilities and text manipulation. This
scripting feature enables you to apply rule-based logic at any point
in a job stream or real-time process.
The Data Reconstructor reconstructs addresses from a combination
of data, elements, and postal matcher output fields. Reconstruction
rules can be used to create an input file for a database or to create
delivery address fields with specific size constraints.

Rules File
The Rules file is a plain text file that contains data reconstruction
rules, which are constructed with a special scripting language.
Country-specific rules files are included in the installation package.
These rules use nested IF/ELSE logic that includes selection and
conditional data reconstruction features.
A rules file can contain a single rule or many rules; however, only
one rule can be executed at a time.
Default Rules Files:
C:\TrilliumSoftware\tsq10r5s\<project name>\settings\xxdrrules.sto
where xx is a two-letter country code such as ca, de, gb, or us.



A sample usdrrules.sto file might look like this:

rule label_line
#---------------------------------------#
# Output Alignment Section
#---------------------------------------#
if(out.NEWADDRL4(1:5) = "     ") then
    move out.NEWADDRL5, NEWADDRL4;
    move "     ", NEWADDRL5;
endif;
endrule

The rule keyword begins the rule, label_line is the rule name, and the endrule keyword ends it.

Rule Script Language


The Data Reconstructor provides a rich script language to use when
writing data reconstruction rules. You can combine existing data
elements and literal values to create new data elements, based on
markers you find within the record (such as Parser and Postal
Matcher type fields and flag fields). You can use conditional logic to
accommodate special factors when reconstructing your data. Rules
can be either simple or complex, depending on your business,
country, and language requirements.

Fields

Fields are used in the script language to reference input or output data fields (defined in the DDL files) and literal values. When used to refer to a data field, the field name must exactly match the spelling and case of the name in the corresponding DDL file.

Syntax:
    An optional in., out., IN., or OUT. prefix, followed by the field name, optionally followed by a substring such as [n:n], [n:*], (n:n), or (n:*).

Literal Values:
    A literal value, or one of the keywords BLANKS, ZEROS, or NULLS.

Literal values are string constants that consist of any combination of characters enclosed within either double-quotation marks (") or single-quotation marks (').

    'TS Quality'        or        "TS Quality"
    "Mary said ""you can quote me!"""        or        "This is what Mary's friend said"

A literal string must begin and end with the same type of quotation mark. If you need to include an actual quote character in the string, you can either enter it twice in a row or quote the entire string with the other quote character.
Although there is no practical limitation to the length of a literal value, this version of the Data Reconstructor limits the total combined size of all literals to 100 KBytes.

Data Reconstructor Rules


Reserved Words
The following words are reserved words; they have special meaning
and cannot be used except for their intended purpose:
alphabetic, alphanumeric, and, AND, append, append:0spaces, append:2spaces, append:pack, append_pack, BLANKS, contains, CONTAINS, copy, copy_all, else, endif, endrule, ends_with, ENDS_WITH, EQ, GE, GT, if, in, IN, is, LE, left_justify, left_justify:full, lower_case, LT, move, NE, NULLS, numeric, or, OR, Out, OUT, pack, perform, proper_case, proper_case:a, proper_case:A, proper_case:anyline, proper_case:g, proper_case:G, proper_case:geography, proper_case:n, proper_case:N, proper_case:name, proper_case:s, proper_case:S, proper_case:street, right_justify, right_justify:full, rule, STARTS_WITH, then, title_case, upper_case, ZEROS

Precedence
Precedence controls which operators are executed first in an expression. Operators are grouped into the following levels (from highest to lowest):

Relational operators: GT, LT, GE, LE, <, >, <=, >=
Equality operators: EQ, NE, =, !=, <>, ==
String operators: contains, starts_with, ends_with, CONTAINS, STARTS_WITH, ENDS_WITH, IS
Logical AND operator: AND, and, &, &&
Logical OR operator: OR, or, ||

Example
In the following expression, relational operations are performed first (==, >= and <), followed by the logical AND operation, and finally the logical OR operation:

if(state == "CA" or zip_code >= 10000 AND zip_code < 20000)
    //statement(s);
endif;
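Most general-purpose languages apply the same relative grouping, so the condition can be checked outside the rule script. A minimal Python sketch with hypothetical sample values, for illustration only:

```python
# AND binds tighter than OR, so the condition above groups as:
#   state == "CA"  OR  (zip_code >= 10000 AND zip_code < 20000)
# Python's `and`/`or` have the same relative precedence.

def matches(state, zip_code):
    return state == "CA" or zip_code >= 10000 and zip_code < 20000

print(matches("CA", 99999))  # True  (left side of the OR succeeds)
print(matches("NY", 15000))  # True  (the AND range test succeeds)
print(matches("NY", 25000))  # False (neither side succeeds)
```

If the intent were instead (state == "CA" OR zip_code >= 10000) AND zip_code < 20000, explicit parentheses would be required.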

Associativity
Associativity controls how operators at the same precedence level
are grouped. All operations have left-to-right associativity.

Examples
The following expressions perform the same action:


Example 1

if(prov == "NL" OR prov == "NS" OR prov == "PE" OR prov == "NB")
    move "Atlantic", out.region;

Example 2

if(((prov == "NL" OR prov == "NS") OR prov == "PE") OR prov == "NB")
    move "Atlantic", out.region;

Comments

The Data Reconstructor recognizes three styles of comments:

C Style

Begin with /*, end with */ and include all characters in between.
Comments can span multiple lines.
/*
#... Example of C style comments.
*/
Only C style comments can be embedded in the middle of a line.

C++ Style

Begin with // and extend to the end of the line. If multi-line comments
are required, the comment portion of each line must begin with //.
//
//... This is an example of C++ style comments.
//

Shell Style

Similar to C++ style comments except that # is used instead of //.


Comments begin with # and extend to the end of the line.
#
#... This is an example of shell style comments.
#


Input or Output Dictionary?

By default, the source_field in an action statement and the first field in an IF condition are assumed to be input fields (as defined in the input dictionary). Also, the destination_field in an action statement and the second field in an IF condition are assumed to be output fields (as defined in the output dictionary).
It is possible to override these assumptions in the following ways:
Prefix an input field with in. or IN.
Prefix an output field with out. or OUT.

Example
move out.newline2, OUT.newline1;

You may declare your fields explicitly as input or output by always including the IN. or OUT. prefix (this will also improve your script's readability):

if(in.gout_fail_level != "0") then
    move in.line1, out.line1;
    move in.line2, out.line2;
    move in.line3, out.line3;
    move in.line4, out.line4;
endif;

Selecting a Portion of a Field

The language has a built-in substring capability that allows you to select a portion of an input or output field by specifying a position and length after the field as [n:n].
The first n is the beginning position of the substring. The first character in a field is considered to be in position 1.
The second n is the length of the substring. Length can be specified as * to indicate the remainder of the field.
For example, each of these statements does the same thing:

move "CANADA", OUT.newline4;
move "CANADA", OUT.newline4[1:*];


Substring notation can only be used with DDL fields and cannot be
used with literal values. For example, each of these statements will
generate an error message:
move BLANKS[1:10] , OUT.newline1; // will generate an error
move "CANADA"[2:*], OUT.newline2; // will generate an error
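The position-and-length convention differs from the zero-based slices most languages use, so it is worth pinning down. The helper below is an illustrative sketch of the [n:n] semantics described above (1-based start, length rather than end position, * for the remainder), not part of the rule language itself.

```python
# Illustrative sketch of the [n:n] substring notation: position is
# 1-based, the second number is a length, and "*" means "the remainder
# of the field". Python slices are 0-based, so the helper converts.

def substring(field, start, length="*"):
    if length == "*":                       # [n:*] -- rest of the field
        return field[start - 1:]
    return field[start - 1:start - 1 + length]

city = "Halifax".ljust(30)                  # a 30-byte fixed-width field
print(repr(substring(city, 2, 10)))         # like city[2:10] in the rules
print(substring("CANADA", 1, "*"))          # like [1:*] -- 'CANADA'
```

For example, city[2:10] selects the 10 characters starting at position 2, which is why the text above describes it as a 10-byte substring of the 30-byte city field.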

Binary Data Strings

A binary string constant can be either octal or hexadecimal.

Hexadecimal: the first quote character must be preceded immediately by an upper or lower case x, and each character is represented by its equivalent two-digit hexadecimal value (range 00 - FF). A special case is made for x"CR" (carriage return), which is considered equivalent to x"0D", and x"LF" (line feed), which is considered equivalent to x"0A". For example: X'5368656C646F6E' or x"CRLF".

Octal: the first quote character must be preceded immediately by an upper or lower case o, and each character is represented by its equivalent three-digit octal value (range 000 - 377). For example: O"110141162164154151156147" or o'015012'.

Concatenating Literal Values

Literal values can be joined together using a plus sign (+) as an operator. This can be useful when you need to create a very long literal string or to make your scripts easier to understand.

Example
move "----------------------------------"
   + "-------------------------------------"
   + "--------------------------------", dashed_line_120ch;

move 'Network Pathways Inc., '
   + 'Suite 100-401, '
   + '1600 Bedford Hwy, '
   + 'Bedford, NS, '
   + 'Canada B4A 1E8', return_address;
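The hexadecimal and octal constant formats described under Binary Data Strings can be sanity-checked with a short Python sketch (a hypothetical decoder written for illustration, not part of TS Quality):

```python
def binary_literal(text: str, base: str) -> bytes:
    """Decode a binary string constant: base "x" takes two hex digits
    per byte (with CR/LF shorthand forms), base "o" takes three octal
    digits per byte."""
    if base == "x":
        text = text.replace("CR", "0D").replace("LF", "0A")  # shorthand
        return bytes(int(text[i:i + 2], 16) for i in range(0, len(text), 2))
    return bytes(int(text[i:i + 3], 8) for i in range(0, len(text), 3))

assert binary_literal("5368656C646F6E", "x") == b"Sheldon"
assert binary_literal("CRLF", "x") == b"\r\n"
assert binary_literal("015012", "o") == b"\r\n"
```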

BLANKS, ZEROS and NULLS

The BLANKS, ZEROS and NULLS keywords can be used to set a field entirely to blanks, zeros or binary-zeros. They can also be used to test whether a field contains only blanks, zeros or binary zeros.
Whenever these keywords are used, a literal value is created
dynamically with exactly the right number of blanks, zeros or
NULLS to match the size of the other fields used in the expression.
If, for some reason, all fields in an expression are BLANKS, ZEROS
or NULLS keywords, the length of the resulting literal values will be
one.

Examples

In this example, all fields used within the IF-conditions are one byte
long:
If(BLANKS == BLANKS) then
// always true
endif;
If(BLANKS == ZEROS) then
// always false
endif;
In this example, the length of the BLANKS literal will be ten bytes to
match the 10-byte substring selected from the 30-byte city field
using the city[2:10] notation.
If(city[2:10] == BLANKS) then
// characters 2 through 11 of city are blank
endif;
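The dynamic sizing of these keywords can be made concrete with a small Python sketch (a hypothetical helper, not part of TS Quality):

```python
def blanks_for(other_len: int = 0) -> str:
    """Model of the BLANKS keyword: materialise a blank literal sized
    to the other field in the expression, or length 1 when every
    operand in the expression is a keyword."""
    return " " * other_len if other_len > 0 else " "

city = "B" + " " * 29          # a 30-byte city field, blank after position 1
sub = city[1:11]               # city[2:10] in script notation (1-based)
assert sub == blanks_for(len(sub))   # characters 2 through 11 are blank
assert blanks_for() == " "           # BLANKS == BLANKS compares 1-byte literals
```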

IF Statements

These statements allow you to add conditional logic in your scripts to choose between two or more options. For instance, you can choose to build an output address from Postal Matcher fields or from original input data based on Parser and Postal Matcher flags.

Syntax
IF (condition [AND/OR condition]) [THEN]
    action_statement;
[ELSE action_statement;]
ENDIF;
IF statements consist of three parts:


- the condition(s) to be evaluated
- the action_statement(s) to execute when the condition(s) are TRUE
- the action_statement(s) to execute when the condition(s) are FALSE
When conditions evaluate as TRUE, the action_statement(s) following the conditions are executed; otherwise, the action_statement(s) that follow the else keyword are executed. Conditions must be enclosed in round brackets. The then keyword is optional and can be omitted, or included to improve readability.
When two fields of unequal lengths are compared, the
comparison is made as if the shorter of the two fields
was padded with blanks to match the length of the larger
field.

Example

If the field urban_city_name was 20 bytes long, the following two conditions would be the same:
if(urban_city_name == "BOSTON")
if(urban_city_name == "BOSTON              ")
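The padding rule can be modeled with a short Python sketch (a hypothetical helper, not the product code):

```python
def fields_equal(a: str, b: str) -> bool:
    """Compare two fields the way the rules language does: the shorter
    value is treated as blank-padded to the length of the longer one."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

urban_city_name = "BOSTON".ljust(20)          # a 20-byte field
assert fields_equal(urban_city_name, "BOSTON")
assert not fields_equal(urban_city_name, "BOSTONIA")
```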

Conditions

The program conditions include four relational conditions, two equality conditions and six string conditions, as shown in Table 13.1:
Table 13.1 Data Reconstructor Rules Conditions

Relational Conditions

field1 GT field2, field1 > field2
    Greater Than. True if field1 is greater than field2.

field1 GE field2, field1 >= field2
    Greater Than Or Equal To. True if field1 is greater than field2 or field1 is equal to field2.

field1 LT field2, field1 < field2
    Less Than. True if field1 is less than field2.

field1 LE field2, field1 <= field2
    Less Than Or Equal To. True if field1 is less than field2 or field1 is equal to field2.


Equality Conditions

field1 EQ field2, field1 == field2, field1 = field2
    Equal To. True if field1 is equal to field2.

field1 NE field2, field1 != field2, field1 <> field2
    Not Equal To. True if field1 is not equal to field2.

String Conditions

field1 is numeric
    String is Numeric. True if field1 contains only numerics. Leading and trailing blanks are trimmed from the field before making the comparison. The IS_DIGIT_FNAME table indicates the numeric characters.

field1 is alphabetic
    String is Alphabetic. True if field1 contains only alphas. Leading and trailing blanks are trimmed from the field before the comparison is made. The IS_ALPHA_FNAME table indicates the alphabetic characters.

field1 is alphanumeric
    String is Alphanumeric. True if field1 contains only alphas or numerics. Leading and trailing blanks are trimmed from the field before the comparison. IS_ALPHA_FNAME and IS_DIGIT_FNAME specify the alphabetic and numeric characters.

field1 CONTAINS field2, field1 contains field2, field1 ~= field2
    String Contains. True if field2 is found anywhere within field1. Leading and trailing blanks are trimmed from both fields before comparisons.

field1 STARTS_WITH field2, field1 starts_with field2, field1 ~< field2
    String Starts With. True if field1 starts with field2. Leading and trailing blanks are trimmed from both fields before comparisons.

field1 ENDS_WITH field2, field1 ends_with field2, field1 ~> field2
    String Ends With. True if field1 ends with field2. Leading and trailing blanks are trimmed from both fields before comparisons.


Example
This is an example using all twelve conditions:
if(zip_code GT "10000"            AND
   zip_code LT "50000"            AND
   pr_rev_group GE "008"          AND
   pr_rev_group LE "010"          AND
   pr_gout_fail_level == "0"      AND
   state != "NY"                  AND
   first_name starts_with "PH"    AND
   last_name ends_with "ING"      AND
   company_name contains "TAXI"   AND
   in.birth_date is numeric       AND
   postal_code[1:1] is alphabetic AND
   company_name is alphanumeric) then
    move "1", flag;
else
    move "0", flag;
endif;

Logical Operators

IF conditions can be combined using logical AND and OR operators to create compound conditions.

Table 13-2: Data Reconstructor Logical Operators

condition1 AND condition2, condition1 and condition2, condition1 && condition2
    Logical AND. TRUE only if both condition1 and condition2 are TRUE.

condition1 OR condition2, condition1 or condition2, condition1 || condition2
    Logical OR. TRUE if either condition1 or condition2 is TRUE.

The order of evaluation of compound conditions is described in the section Precedence and Associativity. (See Rules File on page 13-3.) The usual order can be altered using brackets to group the conditions to be evaluated first. See the following example.

Example
if((pr_rev_group == "000" OR pr_rev_group == "009") AND
pr_gout_fail_level == "0") then
    /* Construct a new address from postal matcher output fields */

Nested IF Statements

You can create nested IF statements in which one IF statement is embedded within another.
IF [condition1]
    IF [condition2]
        [Action statement1]
    ELSE
        [Action statement2]
    ENDIF
ELSE
    [Action statement3]
ENDIF

Example
rule LABEL1
if(gb_out_match_level = "0") then
    if(gb_out_dpndthorough_name <> BLANKS) then
        move   gb_out_house_number,      nwaddrl3;
        append gb_out_dpndthorough_name, nwaddrl3;
        append gb_out_dpndthorough_desc, nwaddrl3;
        append gb_out_thorough_name,     nwaddrl4;
        append gb_out_thorough_desc,     nwaddrl4;
    else
        move   gb_out_house_number,      nwaddrl3;
        append gb_out_thorough_name,     nwaddrl3;
        append gb_out_thorough_desc,     nwaddrl3;
    endif;
endif;
endrule
The sample shows one rule definition called LABEL1, which will either populate output fields nwaddrl3 or nwaddrl4, depending on whether the field gb_out_dpndthorough_name is blank or not, as long as the record had a match level of 0. Both nwaddrl3 and nwaddrl4 fields are populated if there was data in the dependent thoroughfare name field.

Action Statements

Syntax
verb[:modifier] [source_field] [,] destination_field;
- or -
perform rule_name;
Some action statements may include a modifier that changes their operation slightly. The modifier must immediately follow the verb and be delimited from it with a single colon.
For example, the append:2spaces statement works like the append statement, with the exception that two spaces are used for a delimiter instead of one. The comma separating the source-field from the destination-field is optional.
Specific action statements take either no, one, or two arguments, as described in the following sections.

Action Statements Requiring No Arguments

There is one action statement that requires no arguments:

copy_all
    Copies all corresponding input fields to output fields. Fields are considered to correspond if they have the same name in both input and output DDL files. Any output fields that do not correspond to input fields are reset to blanks.
    When DDL names are the same, copy_all moves the entire input record to the output record instead of performing field-by-field moves, as an optimization.

The copy_all statement resets to blanks any output field that has no corresponding input field. For this reason, it should always be used at the beginning of your script.
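Treating records as name-to-value maps, copy_all behaves like this rough Python sketch (field names here are invented for the example; a real record is a fixed-layout buffer, not a dict):

```python
def copy_all(in_record: dict, out_field_names: list) -> dict:
    """Sketch of copy_all: every output field takes the value of the
    input field with the same name; unmatched output fields are reset
    to blanks (a single blank here, for brevity)."""
    return {name: in_record.get(name, " ") for name in out_field_names}

record = {"line1": "ACME INC", "line2": "BOSTON"}     # hypothetical input
out = copy_all(record, ["line1", "line2", "line3"])   # line3 has no match
assert out == {"line1": "ACME INC", "line2": "BOSTON", "line3": " "}
```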

Action Statements Requiring One Argument

The script language has six action statements that require a single argument. In each of these statements, the lone argument is used to specify a destination field and cannot be a literal value.

pack
    Removes all blank characters from the destination field.

upper_case
    Converts all of the characters in the destination field to upper case.

lower_case
    Converts all of the characters in the destination field to lower case.

title_case
    Converts all characters in the destination field to a mix of upper and lower case. The first alphabetic character (and any alphabetic that follows a non-alphabetic one) are converted to upper case; the remaining characters are converted to lower case.
    A special exception is made for apostrophe-s, which is converted to lower case. For example, "MARY-JANE'S BAKERY" would be changed to "Mary-Jane's Bakery".

right_justify
    Right-justifies the contents of the destination field. Removes any trailing blanks from the contents of the field.

left_justify
    Left-justifies the contents of the destination field. Removes any leading blanks.

right_justify:full
    Right-justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, given a 20-character field containing the value "EXPIRY  20001127    ", right_justify:full produces "     EXPIRY 20001127".

left_justify:full
    Left-justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, for a 20-character field containing the value "   THE  PIT STOP    ", left_justify:full produces "THE PIT STOP        ".

proper_case
    Converts all characters in the destination field to a mix of upper and lower case using an external UPLOW table. When no corresponding entries are found in the UPLOW table, the destination field is still converted to mixed upper/lower case using title_case logic.

proper_case:a, proper_case:A, proper_case:anyline
    Indicates that the proper_case statement is not for any specific line type. Only the ("A") line-type entries in the UPLOW table will be searched. This is the default operation when no modifier is specified.

proper_case:n, proper_case:N, proper_case:name
    proper_case for a field containing name information. Searches the ("N") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "N" entries.

proper_case:s, proper_case:S, proper_case:street
    proper_case for a field containing street information. Searches the ("S") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "S" entries.

proper_case:g, proper_case:G, proper_case:geography
    proper_case for a field containing geography information. Searches the ("G") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "G" entries.

perform
    Causes all statements in another rule to be executed. Example:
    perform fix_name_line
    This will execute all statements in the previously defined fix_name_line rule in the same rule file. You can only perform a rule that has already been defined within the rule file.
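Several of the case and justification statements above have simple reference semantics. This Python sketch (an approximation written for illustration, not the product code) models three of them:

```python
def title_case(value: str) -> str:
    """Sketch of title_case: uppercase each alphabetic character that
    starts the field or follows a non-alphabetic one; lowercase the
    rest, then force apostrophe-s to lower case."""
    out, prev_alpha = [], False
    for ch in value:
        if ch.isalpha():
            out.append(ch.lower() if prev_alpha else ch.upper())
            prev_alpha = True
        else:
            out.append(ch)
            prev_alpha = False
    return "".join(out).replace("'S", "'s")

def left_justify_full(field: str) -> str:
    """Sketch of left_justify:full: collapse runs of blanks to one
    blank and left-justify within the field's fixed width."""
    return " ".join(field.split()).ljust(len(field))

def right_justify_full(field: str) -> str:
    """Sketch of right_justify:full: same collapsing, right-justified."""
    return " ".join(field.split()).rjust(len(field))

assert title_case("MARY-JANE'S BAKERY") == "Mary-Jane's Bakery"
assert left_justify_full("   THE  PIT STOP    ") == "THE PIT STOP        "
assert right_justify_full("EXPIRY  20001127    ") == "     EXPIRY 20001127"
```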


Action Statements Requiring Two Arguments

The first argument specifies a source-field or a literal value, and the second argument specifies the destination-field.

copy
    Copies the contents of one field to another, adjusting the data type (if necessary) to match the description of the output field in the DDL. If the first argument is a literal, a move operation is performed instead of a copy.

move
    Moves one text field to another. Unlike copy, no conversion from one data type to another is attempted.
    If source-field is longer than destination-field, it is truncated during the move. If source-field is shorter than destination-field, the destination-field is padded with blanks after the move.

append
    Appends the contents of one field to the end of the contents of another field, after first adding a single blank character as a separator.
    If the destination-field is currently empty (all blanks), then a move operation is performed instead of an append. This makes it possible to perform a series of append operations on the same destination-field without creating unwanted blanks at the beginning of the field.
    If there is not enough room at the end of the destination-field, the source-field will be truncated to fit. There must be at least 2 blanks at the end of the destination-field before an append operation will be attempted.

append_pack, append:pack, append:0spaces
    Works like the append statement, but without the blank separator. Appends the contents of one field directly to the end of the contents of another field.
    There must be at least 1 blank at the end of the destination-field before this operation will be attempted.


append:2spaces
    Appends the contents of one field to the end of the contents of another field after first adding two blank characters as a separator. May be required in some countries (e.g. Canada) to separate the postal-code from the remainder of the line.
    If there is not enough room at the end of the destination-field, the source-field will be truncated to fit. There must be at least 3 blanks at the end of the destination-field before an append:2spaces operation will be attempted.
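The append behavior can be approximated in Python as follows (a simplified sketch; fixed-length fields are modeled as blank-padded strings, and the room check is an approximation of the trailing-blanks rule):

```python
def append(source: str, dest: str, sep: str = " ") -> str:
    """Sketch of append: separator-joined, with a plain move when the
    destination is empty and truncation when room runs out."""
    width = len(dest)
    if dest.strip() == "":                 # empty destination -> move
        return source.strip()[:width].ljust(width)
    base = dest.rstrip()
    room = width - len(base) - len(sep)
    if room < 1:                           # needs blanks for separator + data
        return dest
    return (base + sep + source.strip()[:room]).ljust(width)

line = append("MAIN", " " * 20)            # behaves like a move
line = append("ST", line)                  # blank separator added
assert line == "MAIN ST" + " " * 13
```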

Overlapping Fields

Move, append and append_pack/append:pack operations with source and destination-fields that overlap in memory are fully supported. These operations are completed as if a temporary copy of the source-field had been made before the operation started.

Example
move "TRILLIUM", out.temp;
move out.temp[2:4], out.temp[1:4]; // following this move the out.temp
                                   // field will contain "RILLIUM"
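The temporary-copy semantics can be sketched in Python (a hypothetical helper and field, written only to illustrate the overlap rule):

```python
def move_overlapping(field: str, src: tuple, dst: tuple) -> str:
    """Sketch of an overlapping substring move: positions are 1-based
    (pos, length) pairs, and a temporary copy of the source is taken
    before writing, as the manual describes."""
    (s_pos, s_len), (d_pos, d_len) = src, dst
    temp = field[s_pos - 1:s_pos - 1 + s_len]        # temporary copy first
    chars = list(field)
    chars[d_pos - 1:d_pos - 1 + d_len] = list(temp[:d_len].ljust(d_len))
    return "".join(chars)

# hypothetical field: shift a 3-character window one position left
assert move_overlapping("ABCDEF", (2, 3), (1, 3)) == "BCDDEF"
```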

String Variables

String variables must be declared with a STRING keyword before they are used. String variables can be declared in two places:
- At the beginning of the rules file, before any rules are defined
- At the beginning of a specific rule, before the first action statement
String variables have specific properties:
- Names begin with a dollar sign: $NAME
- They are case-sensitive ($NAME vs. $name vs. $Name)
- They have a default length of 256 characters, unless a different length is specified at the time they're first declared

String variables may be used any place in a rule that a DDL field name can be used.

Example
STRING $LAST_NAME[30];      // 30ch long
STRING $last_name;          // 256ch long
STRING $BigBuffer[10000];   // 10,000ch long

rule sample1
    STRING $name[50];
    move in.first_name, $name;
    append in.last_name, $name;
    move $name, out.full_name;
endrule;

Input and Output Settings

In this example, the Data Reconstructor uses the output from the Create Common Utility step as input. By using the same output DDL (global DDL) for all country-specific data, you can standardize the global data into the same format.
The global DDL contains these fields:
- Original name and address fields
- Country-specific Postal Matcher match level codes
- Reconstructed name and address fields based on new standardized, enriched and linked data fields (NEWADDRESSL1 - 10)
- Original user-defined fields
To specify input and output files
1. Open the Data Reconstructor step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.
3. Select the Output Settings tab.
4. Specify the Output File Name and Output DDL Name. Enter a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:

To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every Nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimited settings. Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.
See Encoding (Code Page) on page A-3 for more information on encoding.
You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.

Settings for the Data Reconstructor

Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Setting the Rules File

To specify the rules file
1. Click Advanced and navigate to Process, Settings.
2. Specify the Rules File. Example:
   C:\TrilliumSoftware\tsq10r5s\tmt\settings\usdrrules.sto

Setting the Use Rule

Each rules file can contain a number of rules available for use. Each rule begins with the rule keyword and ends with the endrule keyword. You must specify which rule you are using.

Example
A sample usdrrules.sto file might look like this:
rule label_line
#---------------------------------------#
# Output Alignment Section
#---------------------------------------#
if(out.NEWADDRL4[1:5] = "     ") then
    move out.NEWADDRL5, NEWADDRL4;
    move " ", NEWADDRL5;
endif;
endrule

To specify the Rule
1. Click Advanced and navigate to Process, Settings.
2. Specify the rule name in the Use Rule field. For the example above, enter label_line. If you are using multiple rules in the Rules File, place a comma after each rule name.

Figure 13.1 Rule Settings


Additional Settings
You can also specify the following settings. See Data Reconstructor in the TS Quality Reference Guide for complete settings information.

To use an alphabetic characters table
You can include a table which identifies characters that are alphabetic characters. This setting may be required for the special characters found in many languages. When this table is not specified, your operating system's default settings will be used.
This table is used by the is alphabetic, is alphanumeric, proper_case and title_case rules.
1. Click Advanced and navigate to Process, Settings.
2. In Alpha Defines Table, enter the path and file name of your alphabetic character table.

To use a numeric digits table
You can include a table which identifies characters that are numbers. This setting may be required for the special characters found in many languages. When it is not specified, your operating system's default settings will be used.
This table is used by the is numeric and is alphanumeric rules.
1. Click Advanced and navigate to Process, Settings.
2. In Numeric Defines Table, enter the path and file name of your numeric digit table.

To use a lowercase/uppercase translation table
You can include a table used to translate characters to all lower or upper case. This setting may be required for the special characters found in many languages, or to convert from one code page to another. When this table is not specified, your operating system's default settings will be used.
This table is used by the proper_case, title_case, and lower_case/upper_case rules.
1. Click Advanced and navigate to Process, Settings.
2. In Lowercase Translation File or Uppercase Translation File, enter the path and file name of your table.

To enable the debug function
1. Click Advanced and navigate to Process, Settings.
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file name, or specify the file to which debugging information will be written.

To count the number of records processed
1. Click Advanced and navigate to Process, Settings.
2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.
This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify settings file encoding
1. Click Advanced and navigate to Process, Settings.
2. In Settings File Encoding, select the encoding from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.

Run Data Reconstruction and View Results


To run Data Reconstructor and view results
When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.
2. Click Run to run the Data Reconstruction step. You can also right-click on a step and select Run Selected.
3. Select OK at the Message box.
4. On the Results tab, the Statistics sub-tab appears.
5. Navigate to the Output Settings tab and click the Data Browser button next to the Output File Name.
6. In the Field Selection window, select the fields you used for the Data Reconstruction process, such as NEWADDRL1 - NEWADDRL5.
7. Click Display to review the data and ensure that it has been reconstructed properly.


Bringing the Data Together

The final step in the project uses the Transformer to merge multiple files into a single output file of global data. On input, select only the survivor records for inclusion in the single output data file. On output, you will have a file of survivor records with the accurate account representative assignment and the format required for each country.

Add a Global Transformer step

To merge data from multiple countries together, you must add a Global Transformer step to the project.
To add a Transformer step to the project
If the Data Flow Architect area is locked, unlock it by right-clicking in the Data Flow Architect and selecting Lock.
1. From the Main menu, click Edit, Add new project step from palette. The palette displays all the available steps. You can also select the List View tab and click Add New Step from Palette, or right-click anywhere in the Data Flow Architect area and select Add New Step from Palette.

2. Under the Standardization category, select Transformer. Drag and drop this step onto the Data Flow Architect area.
3. In the Choose Country Name box, select Global.
4. In the Provide a Unique Step Name box, enter a step name: for example, Transformer at End.
5. Drag and drop this Transformer at End step to the end of the country flows and attach the Data Reconstruction steps.
6. To connect the steps, click the right connection area on the Data Reconstruction step, then click anywhere on the Transformer at End step. Alternatively, you can right-click the steps and select Start Connection or End Connection from the pop-up menu.

Figure 13.2 Connecting Steps in Data Flow Architect

See Using the Data Flow Architect on page 2-20 for detailed instructions on how to connect steps.

Input and Output Settings

To specify input and output files
1. Open the Transformer at End step and click Input Settings.
2. Select the Input Data and Input DDL files for the first country (for example, Canada). Click Add.

Figure 13.3 Input File Settings
3. Add the other countries' input data files and DDLs.
Figure 13.4 Input File Settings for All Countries
4. Select the Output Settings tab.
5. Select the Output File Name and Output DDL Name.
Figure 13.5 Output File Settings
6. Enter a file name in the Statistics File Name and Process Log Name text boxes.


Select and Bypass Records

The final step should use only survivor records as input; that is, records with a 1 in the Survivor_flag field. This will create a final output file with one contact per business. This record will also have the commonized account representative data. To achieve this result, use the Select and Bypass Condition function on the input files.
To build a Select/Bypass Condition
See Select or Bypass Records on page 5-37 for more information.

1. Click Advanced and navigate to Input, Settings, Select Record Conditions.
2. Click the first condition row and select Edit Condition. The Logic Builder window appears.
3. Double-click Survivor_flag in the DDL Fields list.
4. Double-click EQUAL TO in the Operators list. Enter 1. This operation will select only records that have a 1 in the Survivor_flag field.
5. Click Apply and Close. Remain on the Advanced Settings window.
Figure 13.6 Select Bypass Logic Builder
6. Select the next input file name under Input Files and repeat the steps.
7. Repeat for all remaining input files.

Process Settings
Once you have specified input and output files, you can specify settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Source Identification
Since you are merging multiple records into one file, an input source identifier should be applied. This indicator designates the origin of the record.
To specify source identification
The File Source field is set to four characters in the standard DDL.

1. Click Additional Settings.
2. Enter File Source to indicate the input origin of the record. For example, use CA for the records from the Canadian file. Remain on the Advanced Settings window.
Figure 13.7 Source Identification
3. Select the next input file name under Input Files and repeat the steps.
4. Repeat for all remaining input files.

Run Transformer and View Results

To run Transformer and view results
When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.
1. Click OK to close the Advanced Settings.
2. Click Run to run the Transformer at End step. You can also right-click on a step and select Run Selected.
3. Select OK.
4. Summary results appear on the Results tab of this step.
Figure 13.8 Summary Results



5. Notice that only selected records from each input file are included.
6. Navigate to Output Settings and click the Data Browser button next to the Output File Name.
7. In the Field Selection window, click Add All and Display.
8. Review the records and fields selected by this process. The final records would look like this:
Figure 13.9 Final Record Output


CHAPTER 14

Packaging Projects
Most users create a script from the steps in the project. This chapter describes how to create and run a script, and provides a summary of Real-Time Processing. Real-Time processing can ensure that new data entering the database is transformed, cleansed, enriched and linked at the point of entry. The TS Quality Analyzer tool can demonstrate a Real-Time processing environment.
This chapter focuses on several tasks:
- Use the List View to order and select steps for inclusion in a batch script
- Generate a script to run all selected steps
- Save, view and run the script
- Export and import projects
- Understand the Director architecture and the role of the cleansing/matching servers
- Move from a batch environment to a real-time environment
- Understand the role of the business rules
- Use the TS Quality Analyzer to sample real-time cleansing and matching


Batch Script
You can combine project steps into a batch script that can be run on UNIX or Windows platforms. The Windows interface lets you set up a project that exists on UNIX (by using mapped network drives), but that can be run using the local Windows client.

Create a Script
Before generating a script to run your project, you must select the steps to be included.
To select steps
1. In the List View in the Control Center's main window, use CTRL+click and select the desired steps.
If you want to select a block of contiguous steps, click on the first step desired and then SHIFT+click on the last step desired.
If you want to select all the steps in the List View, click Select All Steps on the tool bar.

To generate a batch script

All batch files have the extension .bat.

1.

Click Generate Script to Run on the tool bar. The Batch Process window will appear.

Figure 14.1 Batch Process Window


2.

On the Save As tab, specify the name and location of the script.

3.

Click Save to save your steps for a batch process.

Edit a Script

To edit a batch script

1.

Select the Modify tab. The Modify tab allows you to view and edit the batch file. The steps in the batch file are listed on the left and the file contents appear on the right. If you click a step on the left, the right pane automatically pages down to that step's section in the file. Any remarks are preceded with a *rem*.

Figure 14.2 Batch Script - Modify Tab


2.

To edit the script, click Edit. This opens the script in the WordPad editor.

3.

Make your changes and save the file.

4.

Click Refresh to update the file.

Run a Script

To run a batch script

1.

Select the Run tab. From this tab, choose one of the following methods to run the batch file:
Run as an attached process - If you run the batch file as an attached process, the Control Center must remain open. The Notify when job is done running check box is active only when this option is selected.
Run as a detached process - If you run the batch file as a detached process, the process runs independently of the Control Center.

Figure 14.3 Batch Script - Run Tab

2.

Select the appropriate radio button, and select Notify when job is done running if necessary.



3.

Click Run to start the process. If you run the batch as an attached process, you will see the following Message box when the process is complete.

Figure 14.4 Batch Script - Complete Message

4.

Click OK. Review the Console Results. If there are no messages in the Console Results, the batch job ran successfully.
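The attached/detached distinction corresponds to foreground versus background execution at the operating-system level. A minimal POSIX shell sketch of that difference; the commands and file paths here are illustrative, not TS Quality components:

```shell
# Attached: the caller waits for the job to finish before continuing,
# the way the Control Center must stay open for an attached run.
sh -c 'echo attached > /tmp/attached.txt'
echo "attached job finished"

# Detached: the job runs in the background and the caller continues
# immediately, like a batch file that runs independently of the caller.
sh -c 'echo detached > /tmp/detached.txt' &
job=$!
echo "detached job started"
wait "$job"   # the demo waits only so both files exist before inspection
```

In a truly detached run you would omit the final wait; it is included here only so the demonstration is deterministic.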

Create Multiple Batch Files


After running the batch process, the Batch Process window remains
open. If you want to run another batch process, you must close
the window from the first batch process that was run. A new
batch file requires a fresh batch window.


Exporting/Importing Projects
The Export and Import functions take a Control Center project and relocate it to a new release directory, or to another machine, without loss of project steps. This feature lets you move project contents to different platforms and drive locations.

Whether an upgrade is taking place or a project is being moved from one licensed server to another, the Import/Export procedures make it easy to move your data quality process from one location to another. This feature also allows projects from previous versions of TS Quality to be migrated successfully into the current environment.

Projects created with TS Quality are fully exportable and can be imported without loss of user-defined steps. These projects can be exported into a TS Quality directory.

To export and import a project to another physical machine, the project directory and its accompanying subdirectories must be physically copied to a media device and transferred to the other machine.
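Copying a project directory together with its subdirectories is commonly done with a tar archive. A minimal sketch with a made-up directory layout; a real project path and contents will differ:

```shell
# Build a throwaway project layout to archive (illustrative only).
mkdir -p /tmp/demo_project/ddl /tmp/demo_project/settings
echo "sample" > /tmp/demo_project/ddl/input.ddl

# Archive the whole tree, preserving subdirectories, then list the contents.
tar -czf /tmp/demo_project.tar.gz -C /tmp demo_project
tar -tzf /tmp/demo_project.tar.gz
```

The resulting .tar.gz can then be moved to the target machine and unpacked with tar -xzf, keeping the subdirectory structure intact.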


Export Projects
The first step of the export/import procedure is to export the
project.
To export a project

1.

In the Control Center main window, double-click the project that you want to export.

2.

Select File, Export Project...

Figure 14.5 Export Project

3.

In the Export Project window, accept the default exported project file name, or provide the correct file name. The default file name adds a .zip extension to the current file name.

4.

Click Start Export. The export process starts.

5.

Select OK.


Import Projects
The second step of the export/import procedure is to import the
project into the current environment.
To import a project

1.

In the Control Center main window, select File, Import Project...

Figure 14.6 Import Project

2.

Enter file names for the Project to Import (the .zip file), New Project Directory, and New Project Name. Use the navigation buttons to browse for the file and directory.

3.

Under Original Path, you should see the old path for the project, postal table, census table, and software. If you want to change this information, specify the new location under the New Path area and click Substitute.

4.

Click Start Import. The import process starts.

5.

Select OK. The old project will be re-created in the new location.

Note that the imported project is set up exactly as it appeared before it was exported:

All steps are included, and every step module is in the same place in the Data Flow Architect window as it was before the project was exported.

Settings files for each step retain their information. The file contents themselves are unchanged, except when the platform the project is imported on differs from the platform on which it was exported. In that case, the slashes in paths change to the direction that is correct for the new platform. If relative paths were used within the settings files, this feature saves some effort in changing path locations within the files.

Import Projects from Windows to UNIX

If the original platform was Windows and the new platform is UNIX, the slash direction will be reversed. Paths that include drive letters will need to be corrected manually to specify the UNIX location.

Using relative paths (for example, ..\ddl\input.ddl) will save work if you know in advance that your project will be moved from Windows to UNIX. It is the user's responsibility to modify all path definitions in every file and each run command correctly once the import is complete.

We recommend testing the export process by moving or removing some data files from the original project before exporting the project or moving it onto a portable media device.
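If settings files still contain Windows drive-letter paths after the move, they can be corrected with sed. A hedged sketch where the file name, the path, and the UNIX target location are all illustrative:

```shell
# A settings line with a Windows-style path (contents are made up).
printf 'INPUT_FILE C:\\proj\\data\\input.dat\n' > /tmp/step.stx

# Flip backslashes to forward slashes, then swap the drive-letter
# prefix for a UNIX location.
sed -e 's|\\|/|g' -e 's|C:/proj|/home/user/proj|' /tmp/step.stx
```

Running sed with -i (or redirecting to a new file) would apply the change in place; the substitution rules must be adapted to your actual drive letters and directories.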


Real-Time Processing
Now that the batch process is in place, you can leverage the business rules you developed in batch and use them in a real-time environment. By implementing a real-time solution, your data can be transformed, cleansed, enriched, and linked at the point of entry. This section provides an overview of TS Quality real-time processing with the Director. The Director for TS Quality is an optional application. For information on the Director for TS Quality, please contact your Trillium Software sales representative.

The Director
The Director acts as a registry for cleansing and matching servers
that are made available to the calling environment.

Figure 14.7 Director Architecture Overview



Cleansing Server
TS Quality uses a single API approach to simplify the task of moving
data through the various country specific modules. The simple API
eliminates the need for the programmer to know the internal
workings of each TS Quality function. This interface uses a single
configuration file to enable simple construction of complex
transactional data quality processing. The business rules developed
in the batch application are used in this configuration file.

Matching Server
The Matching Server supports reference matching, allowing you to
compare an incoming record to the database of existing records.
Match results are returned to the calling application.

Figure 14.8 Director Architecture - Startup Process


Real-Time Transaction Processing


Once the Director and Cleansing/Matching servers are started, a
transaction record can be processed by the calling application.
Initially, the client makes a request for the cleansing services. The
Director provides a handle to the cleansing server so that the calling
application can communicate directly with the cleansing server. The
application sends the data to the cleansing server and the cleansed
data is returned to the calling application. The handle is then
released back to the Director. The process is repeated for matching.

Figure 14.9 Director Architecture - Record Processing


Through the reference match function in the Relationship Linker,
duplicate records can be identified so that they do not enter the
database.


Moving From Batch to Real-Time


Business rules designed for the batch process can also be used as resources for real-time processing. Batch process DDLs and settings files can be reused in real time. An XML configuration file, specific to the Director Architecture, is used to control the real-time process modules. Arguments in this file make individual modules available to the calling application.

Linking a Single Record Using the TS Quality Analyzer
You can use the TS Quality Analyzer to watch real-time processing.
The TS Quality Analyzer application is written in C# and acts as the
transaction broker for the TS Quality real-time interface.

Process Method
When a record is entered into the data entry window, it is collected and sent to the real-time cleansing engine when the user clicks the Cleanse button. The cleansed results are displayed on the screen.

Next, the cleansed transaction record can be compared to records in the master database with the Relationship Linker's reference match function. Candidate records are retrieved from the database for the given window key. The transaction record and the retrieved records are then sent to the reference match function for comparison.

The calling application must retrieve the records from the database using the window key from the transaction record. The retrieval of records is not a function of TS Quality.

If a match is found, the matched record is displayed. If no match is found, a message is displayed on the screen.

See Matching on page 8-8 for instructions on how to run the Matching function using the TS Quality Analyzer.

CHAPTER 15

Working from the Command Line

This chapter describes commands that run the TS Quality modules
on UNIX and 32-bit PC platforms. Use these commands for two
purposes:
run modules
create log files


Executing TS Quality Modules


All TS Quality modules can be run from the command line.

Before you run a module, make sure that your environment variables are properly set. See Getting Started with TS Quality for instructions.

Syntax
The syntax for command line execution is:

<program_name> <settings_file_name> <log_file_name>

where:

program_name          The module's executable. See Table 15.1 for a complete list of program names.

settings_file_name    The path and name of the module's settings file.

log_file_name         The path and name of the module's log file. A log file displays any processing errors in the program. This argument is optional.

Example
This is a sample command to execute the Transformer:

tranfrmr ..\settings\ustranfrmr.stx ..\data\error.log
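Because the <program> <settings_file> <log_file> pattern is uniform across modules, a small wrapper can run any module and check its exit status. A minimal POSIX shell sketch; here `true` stands in for a real module binary such as tranfrmr, and the paths are illustrative:

```shell
# Run a TS Quality-style module and report a non-zero exit status.
run_module() {
  prog="$1"; settings="$2"; log="$3"
  "$prog" "$settings" "$log"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "module $prog failed with exit code $status" >&2
  fi
  return "$status"
}

# Stand-in call; replace 'true' with a real module executable.
run_module true ../settings/ustranfrmr.stx ../data/error.log && echo "module OK"
```

The same wrapper works for every program name in Table 15.1, since they all accept the settings file and optional log file in the same positions.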


Program Names
Table 15.1 contains a complete list of program names. Use the appropriate program name in your command line:

Table 15.1 Command Line Execution Program Names

Module                    Program Name
Transformer               tranfrmr
Customer Data Parser      cusparse; apparse (China, Japan, Korea, and Taiwan)
Parsing Customization     prcustom; apcustom (China, Japan, Korea, and Taiwan)
Business Data Parser      busparse
Postal Matchers           xxpmatch (xx = country code). Examples: aupmatch (AU), capmatch (CA), depmatch (DE), hkpmatch (HK), gbpmatch (GB), uspmatch (US), tqpmatch (TQ); appmatch (China, Japan, Korea, and Taiwan)
Global Data Router        globrtr
Window Key Generator      winkey
Relationship Linker       rellink
Create Common             common
Data Reconstructor        datarec
File Display Utility      tsqdisp
File Update Utility       fileupdt
Frequency Count Utility   tsqfreq
Merge Split Utility       mrgsplit
Resolve Utility           resolve
Set Selection Utility     tsqsetsl
Sort Utility              tsqsort


CHAPTER 16

Working with the TS Quality Utilities

TS Quality offers a number of utilities to perform specific tasks. All
utilities can be executed from the TS Quality Control Center, in a
batch process, or from the command line. This chapter explains
how to use these utilities:
File Display utility
File Update utility
Frequency Count utility
Merge Split utility
Resolve utility
Set Selection utility
Sort utility
Each utility's basic settings, such as input and output settings, are the same as those of the TS Quality core modules. This chapter focuses on the process settings information (the Advanced Settings window) for each utility.

Refer to the TS Quality Reference Guide for complete settings information on each utility.


File Display Utility


The File Display Utility lets you create a customized display copy of a file without altering the contents of the original file. For example, use the File Display Utility to review your output data after you run the Relationship Linker to determine whether your linking results are accurate, or whether you need to tune your business rules to improve the results.

You can also use the File Display Utility to create a new file that organizes the original file's data to meet specific display requirements.

Input and Output Settings


The File Display Utility can use the input file and input DDL from any
other step. This utility is most often used to display and verify
results from the Relationship Linker. In this case, use the Input File
Name and Input DDL Name from the output of the Relationship
Linker step.
The File Display Utility does not use a DDL for output.

Outer Key and Inner Key


The data to be displayed is grouped by Outer Key and Inner Key.
Outer Key - Creates a group of records that have the same value in the field specified in the Outer Key Field.
Inner Key - Creates a group of records within the outer key group that have the same value in the field specified in the Inner Key Field.
For example, the data can be grouped by LEV1_MATCHED (Outer Key) and LEV2_MATCHED_IN_LEV1_MATCHED (Inner Key), as shown in Figure 16.1:

Figure 16.1 Outer Key Group and Inner Key Group
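The grouping idea can be pictured with a small awk sketch that prints a break line whenever the outer key changes. The data and the `****` separator are illustrative, not File Display Utility output:

```shell
# Print records grouped by the first column (the "outer key"),
# with a separator line between groups.
printf 'A 1\nA 2\nB 3\n' | awk '
  $1 != prev { if (NR > 1) print "****"; prev = $1 }  # break on key change
  { print }                                           # echo the record
'
```

This prints the two A records, a separator line, then the B record; an inner key would add a second, nested level of break lines inside each outer group.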


To specify outer key and inner key fields

1.

From the File Display Utility step, click Advanced and navigate to Input, Settings.

2.

Select the Outer Key Field and Inner Key Field from the drop-down menu.

In addition, you can specify the following settings for the outer key and inner key:

Setting                     Description
Inner Key Field Encoding    Encoding type used by the Inner Key Field. If not specified, the platform's native encoding is used by default.
Inner Key Compare Bytes     Number of Inner Key Field bytes to use. By default, this is the field length of the Inner Key Field.
Outer Key Field Encoding    Encoding type used by the Outer Key Field. If not specified, the platform's native encoding is used by default.
Outer Key Compare Bytes     Number of Outer Key Field bytes to use. By default, this is the field length of the Outer Key Field.

Title and Delimiters


You must specify the title and delimiters for the title section, the outer key lines, and the inner key lines.

Figure 16.2 Title, Outer Key, Inner Key Delimiters

In this example, the lines in the report's title section are separated by a series of forward slashes (/), while the outer lines are separated by an asterisk (*) and the inner lines are separated by a hyphen (-).
To specify the title

A red flag indicates a REQUIRED setting.

1.

Click Advanced and navigate to Output, Settings.

2.

In Title 1, enter the first title line of the report. This line must be enclosed within quotation marks (" ") if there is a space between characters. For example: "Matching Report"

3.

If necessary, specify a second title line in Title 2 and a third title line in Title 3. Those lines must also be enclosed in quotation marks (" ") if there is a space between characters.

4.

In Title 1, 2, 3 Encoding, specify the encoding for each line if necessary. If not specified, the platform's native encoding is used by default.

To specify the delimiters

1.

Click Advanced and navigate to Output, Settings.

2.

In Title Line Delimiter, specify a line delimiter for the title line.

3.

In Outer Key Delimiter, specify a line delimiter for the outer key. A Tab, Space, Semicolon, Comma, Pipe, or any other character may be used. Characters other than those listed must be enclosed in quotation marks.

4.

In Inner Key Delimiter, specify a line delimiter for the inner key. The same characters may be used as for the outer key delimiter.

5.

In the Encoding settings for each line, specify the encoding if necessary. If not specified, the platform's native encoding is used by default.

6.

In addition, you can specify the following settings for the outer key and inner key:

Setting                  Description
Carriage Return          Indicates how the end of a line is marked in the report, which affects how the report displays on different platforms. When checked, a line feed indicates the end of a line (UNIX platforms). If unchecked, a carriage return/line feed is used (all other platforms).
Maximum Lines Per Page   Numeric value that specifies the maximum number of lines to display on a page. The default is 66.
Compress Blank Lines     When checked, compresses blank lines on the report.
Inner Break Spacing      Number of separator lines to print for the break of the inner set. The default is 1.
Outer Break Spacing      Number of separator lines to print for the break of the outer set. The default is 1.
Outer Key Minimum        Numeric value that specifies the minimum number of records a set must contain to be displayed in the report.

To display the records in a format that is easy to read, specify at least Title 1, Title Line Delimiter, Outer Key Delimiter, and Inner Key Delimiter.


Field Settings
The fields that will be displayed in the report should also be
identified.

Figure 16.3 Fields Displayed

This sample report includes LINE_01 to LINE_04, and all fields are displayed on one line.
To specify fields to be displayed

1.

Click Advanced and navigate to Output, Create Report.

2.

Refer to the table below and configure these additional settings. (See the TS Quality Reference Guide and Online Help for complete settings information.)

Setting                 Description
Report Value Type       Defines the type of Report Value entry. Values are: DDL field name, Literal value, or Insert spaces only. If the space option is used, Report Value is not required; however, you must specify the number of spaces with Report Value Length.
Report Value            Specifies either a literal value or a field name to display in the report.
Report Value Encoding   Specifies the encoding used by Report Value. If not specified, the platform's native encoding is the default.
Report Value Length     Specifies a limit for the length of the value displayed in the report. This is useful when a DDL field is very long but might not be completely filled.
Report Line Number      Specifies the line number for the value in Report Value. The same Report Line Number value can be associated with more than one Report Value, but they must be grouped together and remain in numerical order.


File Update Utility


The File Update Utility updates a file, called the master file, with the data contained in another file, called the transaction file. You can update records in the master file based on a specific key set when the keys match between records. The File Update Utility separates records based on match or not-match key conditions and creates separate files for these records for further review or processing.

Match Keys and Fields


The update is applied to the records in the master file based on a Match Key specified by the user. If the match key values are equal in the master and transaction files, the record is considered matched, and field values in the matched records in the master file are updated with the values in the transaction file. Fields to be updated are determined by the output DDLs for the output files. If the key values in the master file and transaction file are not equal, the record is considered unmatched, and the unmatched records are written out.
Match keys are specified with the Match Key setting.
Field names are used for match keys.
The values in the key fields must be equal in both the
master and transaction files to perform the update.
Up to five match keys can be specified.
All input files (master and transaction files) must be sorted
in ascending order by the match key.
Neither field names nor field lengths for match keys need
to be equal for the master and transaction files. The
program will search for a match based on the values in
the key fields specified.
Fields to be updated must be specified in the output DDLs. When
there are common fields between the master and transaction files,
the values in the common fields in the master file will be overwritten (updated) by the values in the same fields in the transaction file.

Example
In these sample master and transaction files, the following fields are used (the Match Key is the Record_Key field):

Master File                  Tran File
Record#                      Record#
Record_Key (Match Key)       Record_Key (Match Key)
Name                         Street
                             City
                             State

If the output DDL has all fields from the master and transaction files, the match master file includes the following fields. The value in the common field, Record#, will be overwritten by the transaction file, and the values of the Street, City, and State fields are inserted from the tran file:

Match Master File
Record#      (overwritten/updated by the tran file)
Record_Key
Name
Street       (inserted from the tran file)
City         (inserted from the tran file)
State        (inserted from the tran file)

If you prefer not to overwrite or update the common field in the master file with information from the transaction file, redefine the field name either for the master or for the transaction file.


Example

Input
In this example, the master file contains the customers' names and the transaction file contains the customers' addresses. The match key is the Record_Key field.

Master Input File (master.dat)

Rec #   Record_Key   Name
1       0001         John Nicoli
2       0001         J Nicoli
3       0002         Mary Rogers
4       0003         Kevin McCarthy

Tran Input File (tran.dat)

Rec #   Record_Key   Street              City        State
100     0001         25 Linnell Circle   Billerica   MA
200     0001         1 Elm St.           Nashua      NH
300     0002         25 Linnell Circle   Billerica   MA
400     0002         12 Oak St.          Waltham     MA
500     0004         3 Royal Court       Boston      MA

Output DDL
The following fields are used in the DDL for ALL master output files (match_master.ddx, match_dup_master.ddx, unmatch_master.ddx):

Rec #
Record_Key
Name
Street
City
State

Output

1. The program searches for records in master.dat that have the same key values as tran.dat. In this case, the records with 0001 and 0002 in the Record_Key field are matched records.

2. Rec #1 in master.dat and Rec #100 in tran.dat are matched records because they matched first. Similarly, Rec #3 and Rec #300 are matched records. Therefore, in the match master file, the values in the Rec # field are updated by tran.dat, and all address-related fields and their values are added from the tran.dat file.

Match Master File (match_master.dat)

Rec #   Record_Key   Name          Street              City        State
100     0001         John Nicoli   25 Linnell Circle   Billerica   MA
300     0002         Mary Rogers   25 Linnell Circle   Billerica   MA

3. Rec #2 in master.dat is a duplicate matched record because it appeared after the first matched record (Rec #1). (Duplicate records are additional records with the same key that appear after the first match occurs.) Therefore, Rec #2 is written out to the match master duplicate file. For the match master duplicate file, the user must select whether or not to update the record with the Update Output Rec setting. See the following cases:

Match Master Duplicate File (match_dup_master.dat)

Case 1: Update the matched duplicate records (UPDATE_OUTPUT_REC ON)

Rec #   Record_Key   Name       Street              City        State
100     0001         J Nicoli   25 Linnell Circle   Billerica   MA

Case 2: Do not update the matched duplicate records (UPDATE_OUTPUT_REC OFF)

Rec #   Record_Key   Name       Street   City   State
        0001         J Nicoli

4. Rec #4 in master.dat is an unmatched record because it does not share the same key value with any records in tran.dat. Therefore, Rec #4 is written out to the unmatch master file.

Unmatch Master File (unmatch_master.dat)

Rec #   Record_Key   Name             Street   City   State
        0003         Kevin McCarthy

5. The matched records in tran.dat are written out to the match_tran.dat file to show which transaction records matched a master record. If there are any matched duplicate, unmatched, or unmatched duplicate records in the transaction file, they are also written out to separate files.

Match Tran File (match_tran.dat)

Rec #   Record_Key   Street              City        State
100     0001         25 Linnell Circle   Billerica   MA
300     0002         25 Linnell Circle   Billerica   MA

Match Tran Duplicate File (match_dup_tran.dat)

Rec #   Record_Key   Street       City      State
200     0001         1 Elm St.    Nashua    NH
400     0002         12 Oak St.   Waltham   MA
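The matched/unmatched separation in this example can be mimicked with the standard POSIX join tool on two files sorted by a shared key. This is an illustrative analogy, not the File Update Utility itself; the data and file paths are made up:

```shell
# Two key-sorted files: master (key + name) and transaction (key + city).
printf '0001 John_Nicoli\n0002 Mary_Rogers\n0003 Kevin_McCarthy\n' > /tmp/master.txt
printf '0001 Billerica\n0002 Billerica\n0004 Boston\n' > /tmp/tran.txt

echo "matched:"
join /tmp/master.txt /tmp/tran.txt        # records whose keys appear in both
echo "unmatched master:"
join -v 1 /tmp/master.txt /tmp/tran.txt   # master records with no tran match
```

Like the File Update Utility, join requires both inputs to be sorted ascending by the key field before the comparison is made.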

Input and Output Settings


For input, you must specify the Master File Name and DDL, and the Transaction File Name and DDL. For output, you can specify various types of Match and Unmatch files.
Master and transaction files must be sorted by the match keys.
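Sorting an input file ascending on its key column can be done with the standard sort tool; a small sketch where the column layout and file paths are illustrative:

```shell
# An unsorted file whose first column is the match key.
printf '0002 b\n0001 a\n0003 c\n' > /tmp/unsorted.txt

# Sort ascending on the first field only, as the update requires.
sort -k1,1 /tmp/unsorted.txt > /tmp/sorted.txt
cat /tmp/sorted.txt
```

The -k1,1 option restricts the comparison to the first field, so trailing data fields do not affect the key order.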

Match Key Settings


The update is applied to the records in the master file based on the Match Key specified by the user. If the match key values are equal between the master and transaction files, the record is considered matched, and field values in the matched records in the master file are updated with the values in the transaction file.

A red flag indicates a REQUIRED setting.

To specify match keys

1.

From the File Update Utility step, click Advanced and navigate to Input, Master Settings.

2.

Specify Match Keys by selecting field names from the drop-down list.

3.

Navigate to Input, Transaction Settings.

4.

Specify the same Match Keys as you specified for Master Settings.

If you specify multiple match keys, separate the keys with commas. For example: Last_Name,First_Name,SS_number.



To enable Table Update
When it is not practical to sort the master file by the match keys due to its size, enable the Table Update function to update the master file without sorting. The program will only update the first matched record in the master file with the contents of the first matched record in any transaction file. Any following matches for that match key will be ignored.

1.

Navigate to Input, Master Settings.

2.

Select Table Update.

If Table Update is turned on in the Master Settings, duplicate records will not be written out because the program will not search for duplicate records. For the match master duplicate file, the user must select whether or not to update the record by using the Update Output Rec setting in the Master Match Dup Settings.

Transaction Output Settings

If you specify transaction output files such as Tran Match File and Tran Unmatch File, you need to set the Transaction File Qualifier in the Output Advanced Settings.

To specify transaction settings

1.

Click Advanced and navigate to Output, Match Tran Settings, or to any of the output transaction file settings for the files you specified on the Output Settings tab.

2.

Select For Tran. This is the value specified in the File Qualifier in Transaction Settings.

(See the TS Quality Reference Guide and Online Help for complete settings information.)


Frequency Count Utility


The Frequency Count Utility analyzes data records to determine the frequency of input fields by counting the occurrences of literal data strings, mask shapes, and blanks. The resulting frequency counts are displayed in the output file.

Example
As shown in this example, the data can be counted by FIRST_NAME, LAST_NAME, and STREET_ADDR. When multiple fields are specified like this, the frequency counts are made on the combined value of the fields, not on the individual fields.
Figure 16.4 Frequency Count
(The figure shows a sample frequency report with COUNT, FIRST_NAME, LAST_NAME, and STREET_ADDR columns; each row gives the number of occurrences of a combined field value.)
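The idea of counting occurrences of a combined field value can be sketched with standard shell tools. This is only an analogy to the utility's literal count, with made-up data:

```shell
# Count identical "FIRST_NAME LAST_NAME" values and list the most
# frequent first, roughly what a literal frequency count produces.
printf 'John Smith\nJohn Smith\nClara Currier\nJohn Nicoli\n' \
  | sort | uniq -c | sort -rn
```

Each output line shows a count followed by the combined value, with the most frequent combination at the top.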


Input and Output Settings


The Frequency Count Utility uses the Input File Name and Input
DDL Name from any other TS Quality step.

Count Settings

To specify fields to count

A red flag indicates a REQUIRED setting.

1.

From the Frequency Count Utility step, click Advanced and navigate to Input, Freq Settings.

2.

Click Entry Settings. Select Field Name from the drop-down list. This is the field that will be counted.

3.

Select either Literal or Mask for Count Type.

4.

Click Field Settings. Select Sort Type (Descending Order or Ascending Order) and Sort Option (Count or Value). The Sort Option specifies whether to sort the results by Count or by field Value.

5.

Optionally, you can check Show All Combinations to display an expanded output view. You can also specify how many records of a given frequency are displayed in the Number of Top Occurrences text box. For example, a value of 100 shows the top 100 most frequently occurring records.

When Show All Combinations is checked, the program counts the number of occurrences of a specified data combination. For example, if a field contains Tom Smith and another record contains Tom, the report shows two occurrences of Tom and one occurrence of Tom Smith.

(See the TS Quality Reference Guide and Online Help for complete settings information.)

Creating and Working with TS Quality Projects

Merge Split Utility


The Merge Split Utility lets you manipulate files with merge keys
and split rules. You can create merge keys to determine how files
will be merged, and create rules to split files into multiple smaller
files or to produce multiple output files from a single input file.

Input and Output Settings


The Merge Split Utility uses the Input File Name and Input DDL
Name from any other TS Quality step.

Using Multiple Input Files to Create an Output DDL

You can specify up to ten (10) input files and their
associated DDLs and use these to create a common output file for
later processing by modules downstream in your workflow. This
process requires that after you specify the input files, you map
input fields from the associated DDLs to a common output DDL file.
To add multiple input files and map fields

1. Double-click a Merge Split Utility step to open the Merge Split Utility window.

2. In the Input Data File field, type or browse to the input file you wish to use.

3. In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.

4. Click Add.

5. Repeat Steps 2-3 until you've added all DDL files you want to use to create the common output format.

6. Click the Define Output DDL button (bottom left). The Define Output DDL dialog appears.


Figure 16.5 Define Output DDL dialog


8. Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.

9. Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:

Add: adds the selected input DDL field to the output DDL list.

Delete: deletes a selected output DDL field from the list.


Move Up: moves the selected field in the output DDL list up one row.

Move Down: moves the selected field in the output DDL list down one row.

Redefine: redefines an input field as a portion of an output field. Use this option to map multiple input fields to the same redefined output field.

Consolidate: consolidates an input field with an existing output field. Use this option when two or more fields have different names but contain the same data, such as zipcode, ZIP5, and postal_code.

For Redefine and Consolidate, make sure that the lengths of the input fields do not exceed the overall length of the redefined or consolidated output DDL field.

10. When you are ready, click Save to save the output DDL field mapping. When the Merge Split Utility step runs, it creates an output DDL file that uses this mapping.

Merge Files

For a merge operation, all input files that will be merged and the output file MUST have the same shape. In other words, they must use the same DDL.

Input files must be sorted by match keys.

Example

In this example, Input 1 will be merged into Input 2 using the Name field as the Match Key.
Input 1

Customer_ID#   Name
0000001        John Nicoli
0000002        Mary Nicoli

Input 2

Customer_ID#   Name
9000001        Alice Rogers
9000002        Kevin McCarthy

The following DDL is used for ALL input files and the output file:

Customer_ID#
Name

On output, the program copies Match Key values from Input 1 and Input 2 along with the other components of the data. The record order is determined by the order of key values. As a result, the total number of records is the sum of the number of records from both Input 1 and Input 2.

Output File

Customer_ID#   Name
9000001        Alice Rogers
0000001        John Nicoli
9000002        Kevin McCarthy
0000002        Mary Nicoli
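The merge behavior can be sketched as follows, using the sample data above (an illustration only; heapq.merge stands in for the utility's merge step, and both inputs are assumed to be pre-sorted on the match key):

```python
# Sketch of a key-ordered merge: two inputs already sorted on the match
# key (Name) are interleaved into one output in key order.
import heapq

input1 = [("0000001", "John Nicoli"), ("0000002", "Mary Nicoli")]
input2 = [("9000001", "Alice Rogers"), ("9000002", "Kevin McCarthy")]

# heapq.merge interleaves the sorted inputs without re-sorting them,
# so the output record count is the sum of the input record counts.
merged = list(heapq.merge(input1, input2, key=lambda rec: rec[1]))
for cust_id, name in merged:
    print(cust_id, name)
```

Run against the sample data, this yields Alice Rogers, John Nicoli, Kevin McCarthy, Mary Nicoli, matching the Output File shown above.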

To merge files

A red flag indicates a REQUIRED setting.

1. From the Merge Split Utility step, click Advanced and navigate to Process, Settings.

2. On the Field Settings tab, select Merge for Process Type from the drop-down list.

3. On the Field Settings tab, select Match Key from the drop-down list. You can specify up to five fields for the Match Key, separated by commas.


Split a File
Splitting a file is useful when your system has a file-size limit or you
want to separate a file into manageable pieces. The pieces can later
be re-assembled using the Merge operation. You can split the input
file by the number of records or bytes per segment.
To split a file

1. Click Advanced and navigate to Process, Settings.

2. On the Field Settings tab, select Split for Process Type from the drop-down list.

3. On the Field Settings tab, select Partition Method from the drop-down list.

Partition Method      Description

Round Robin Number    Split the file by number of records. If this is selected, the Round Robin Number must be specified.

Round Robin Keys      Split the file by the key (field). If this is selected, Match Keys must be specified. The input file should be sorted by the Match Key field.

Ranges                Split the file by a range of values. If this is selected, Range Start and Range End values (Entry Settings tab) must be specified.

Ranges Stable         Split the file by the field name and field length. The field name and field length are specified by Match Key.

Records Per Segment   Split the file by segment. If this is selected, Records Per Segment must be specified.

Bytes Per Segment     Split the file by segment. If this is selected, Bytes Per Segment must be specified.

Segment Per File      Split the file by the defined number of segments. The number of segments is specified by Number of Output Files.

You can specify up to five fields for the Match Key, separated by commas. If the Partition Method is set to Ranges, this setting can contain only one field.
4. On the Field Settings tab, specify Number of Output Files. This is the number of output files to create. The entry in the output file name is used as a base file name; extensions are generated up to the value specified here.

Examples

Round Robin Keys

In this example, the input file will be split into Output 1 and Output 2 using the Round Robin Keys method, with the Lev2_matched field as the Match Key.

Input File

Name               Lev2_matched
B McCarthy         000001
Bob McCarthy       000001
Catherine Rogers   000002
Cathy Rogers       000002

On output, the program splits the input file. The first Lev2_matched group is written to Output 1, and the second Lev2_matched group is written to Output 2.

Output 1

Name           Lev2_matched
B McCarthy     000001
Bob McCarthy   000001

Output 2

Name               Lev2_matched
Catherine Rogers   000002
Cathy Rogers       000002

Round Robin Number

In this example, the input file will be split into Output 1, Output 2, and Output 3 using the Round Robin Number method, with the Round Robin Number set to 1.


Input File

Name               Lev2_matched
B McCarthy         000001
Bob McCarthy       000001
Catherine Rogers   000002
Cathy Rogers       000002

On output, record #1 is written to Output 1, record #2 is written to Output 2, and record #3 is written to Output 3. Since there are only three output files specified, record #4 goes back to Output 1, and the cycle continues in this manner:

Output 1

Name           Lev2_matched
B McCarthy     000001
Cathy Rogers   000002

Output 2

Name           Lev2_matched
Bob McCarthy   000001

Output 3

Name               Lev2_matched
Catherine Rogers   000002
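The Round Robin Number behavior with a value of 1 can be sketched like this (an illustrative sketch using the sample records, not the utility's actual code):

```python
# Round-robin split sketch with Round Robin Number = 1: each record goes
# to the next output file in turn, cycling back after the last one.
records = [
    ("B McCarthy", "000001"),
    ("Bob McCarthy", "000001"),
    ("Catherine Rogers", "000002"),
    ("Cathy Rogers", "000002"),
]

num_outputs = 3
outputs = [[] for _ in range(num_outputs)]
for i, rec in enumerate(records):
    outputs[i % num_outputs].append(rec)  # record i goes to output (i mod 3)

for n, out in enumerate(outputs, start=1):
    print(f"Output {n}: {[name for name, _ in out]}")
```

Record #4 wraps back to the first output, reproducing the three output files shown above.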

Merge and Split Files

To merge and split files

1. Click Advanced and navigate to Process, Settings. On the Field Settings tab, select Both for Process Type from the drop-down list.


2. On the Field Settings tab, select Partition Method from the drop-down list.

For the Both Process Type, Merge is performed first, then Split.

Partition Method      Description

Round Robin Number    Split the file by number of records. If this is selected, a Round Robin Number must be specified.

Round Robin Keys      Split the file by the key (field). The input file must be sorted by this field. If this is selected, Round Robin Keys must be specified.

Ranges                Split the file by a range of values. If this is selected, Range Start and Range End values (Entry Settings tab) must be specified.

Ranges Stable         Split the file by field name and field length. The field name and field length are specified by Match Key.

Records Per Segment   Split the file by segment. If this is selected, Records Per Segment must be specified.

Bytes Per Segment     Split the file by segment. If this is selected, Bytes Per Segment must be specified. The number of segments is specified by Number of Output Files.

Segment Per File      Split the file by the defined number of segments.

3. On the Field Settings tab, select Match Key from the drop-down list. You can specify up to five fields, separated by commas. If the Partition Method is set to Ranges, this setting can contain only one field.

See TS Quality Reference Guide and Online Help for complete settings information.

4. On the Field Settings tab, specify Number of Output Files. This is the number of output files to create. If this value is greater than 1, the entry in the output file name is used as a base file name, and extensions are generated up to the value specified here.


Resolve Utility

See TS Quality Reference Guide and Online Help for complete settings information.

The Resolve Utility resolves transitivity affecting links between transactions. When Window Linking is performed with the Relationship Linker, you can create a link file indicating which records are linked together by common data.

Transitivity occurs when two records are linked together indirectly through a third record in a multi-linking process. For example, record A may have linked to record B in the first run, and record B may link to record C in a subsequent run using a different window key. When this happens, record A has linked to record C through transitivity. Using the MALINK file from the Relationship Linker, the Resolve Utility creates a relationship of the records that can then be used to represent the entire matched record set.

Please contact Trillium Software Customer Support for more information on multi-linking.

Example

Two MALINK files, one from a match on SS# and one from a match on Name, are combined with Merge Files and then passed to Resolve.

The MALINK record layout is:

Recid(20) + Recid(20) + Match Type(1) + Pattern(3)

For example:

00000001   00000002   P   405


Multiple matches are run on Social Security Number and Name. Record A matches to record B, and in the other match record B matches to record C. Each run produces a MALINK file with the matches in it: A -> B and B -> C. The MALINK files are then combined.

Recid   Recid   Type   Pat
0002    0007    P      329
0002    0015    P      210
0007    0009    P      230
0015    0022    P      230

The Resolve Utility processes this file and produces the following output:

Recid   Recid
0007    0002
0015    0002
0009    0002
0022    0002

Transitivity also shows that if record A matched to record B and record B matched to record C, then record A must also match to record C.

The Resolve Utility's output is then used to update the keys on the first linking's output. This is typically done with the File Update Utility: any occurrence of the Recid in column 1 is updated to the value of the Recid in column 2. The File Update Utility's output is then sorted on the updated key to group all recoded records together with their resolved match set.
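This transitivity resolution can be sketched with a union-find pass (a hypothetical illustration of the concept, not the Resolve Utility's actual algorithm):

```python
# Union-find sketch of transitivity resolution: each record id is mapped
# to a canonical (root) id so that indirectly linked records share one key.
links = [("0002", "0007"), ("0002", "0015"), ("0007", "0009"), ("0015", "0022")]

parent = {}

def find(x):
    # Follow parent pointers to the canonical id, compressing the path.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for a, b in links:
    ra, rb = find(a), find(b)
    if ra != rb:
        # Keep the smaller id as the canonical one, as in the example output.
        lo, hi = min(ra, rb), max(ra, rb)
        parent[hi] = lo

resolved = {rec: find(rec) for rec in ("0007", "0015", "0009", "0022")}
print(resolved)
```

For the combined MALINK rows above, every linked record resolves to 0002, matching the example output table.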

Input and Output Settings

For input, the Resolve Utility takes merged MALINK files from the 2nd-Nth Relationship Linkers in a multi-link process. It resolves transitivity issues for matches from the runs that followed the first linking.
The Resolve Utility does not use a DDL for output.

Link Field

To link files

A red flag indicates a REQUIRED setting.

1. From the Resolve Utility step, click Advanced and navigate to Input, Settings.

2. In From Link Field, select the DDL field from the drop-down list. This DDL field contains the starting key of the link (generally located in the left column of the Relationship Linker's link output file).

3. In To Link Field, select the DDL field from the drop-down list. This DDL field contains the ending key of the link (generally located in the right column of the link output file from the Relationship Linker).

In addition, you can specify the following settings:

Setting                 Description

Process Group Records   Number of records to process in a set. When the program reaches this limit, it writes output to the file in a resolved form. Buffers are created and processing continues.

Process Group Memory    Maximum memory to use in a set. This overrides the Process Group Records setting if both are used.


Set Selection Utility

The Set Selection Utility selects data from a file and then skips or selects that data on output. Selection of data occurs based on Match Keys, Select Record Conditions, and Bypass Record Conditions. Field names are used for match keys.

This utility is useful when you select records based on relationship keys (created during the linking process) and you want a set of records to be evaluated against defined criteria.

Example

For example, if the Match Key is the Household_Number field, the program first selects records that have the Household_Number field in the input file.

Assume that, in the Select Record Conditions or Bypass Record Conditions, the condition is set to Household_Number=00001. In this case, the program selects records whose values in the Household_Number field equal 00001.

After running the program, you can verify the results of the select operation by viewing the output file in the Data Browser.

Input and Output Settings


The Set Selection Utility is usually used on the results of the
Relationship Linker. In this case, use the Input File Name and Input
DDL Name from the output of the Relationship Linker step.
Input files must be sorted by match keys.


Select Records

The selection is applied to records in the input file based on the Match Key field specified by the user. Records with the same match key values are selected and written to the output file.

To select records

1. From the Set Selection Utility step, click Advanced and navigate to Input, Settings.

2. In Match Key, select the DDL field from the drop-down list. The program selects records which have this field.

Figure 16.6 Select Match Key


To set a limit on the number of records or key sets

You can specify the maximum number of records to be selected and the minimum number of records per Match Key set (a group of records with the same value) to be selected. These settings can be set separately for input and output.

1. Click Advanced and navigate to Input, Settings for input or Output, Settings for output.

2. Refer to the table below and specify the appropriate values:

Setting                   Description

Maximum Total Records     Numeric value greater than or equal to 1. Specifies the maximum total number of records of a specific key set to process.

Minimum Records Per Set   Numeric value. Any key set with a record count that is less than or equal to this value will be discarded without processing.

Maximum Records Per Set   Numeric value. Any key set with a record count that is equal to or exceeds this value will be discarded without processing.

Maximum Set               Numeric value that limits the total number of key sets to process.

See TS Quality Reference Guide and Online Help for complete settings information.
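The per-set limits can be sketched as a grouping filter over key-sorted input (hypothetical data and limit values; this is not the utility's actual code):

```python
# Sketch of key-set selection limits: records are grouped by match key,
# and whole sets are kept or discarded based on their record counts.
from itertools import groupby

records = [("00001", "a"), ("00001", "b"), ("00002", "c"),
           ("00003", "d"), ("00003", "e"), ("00003", "f")]

min_per_set = 1   # sets with a count <= this value are discarded
max_per_set = 3   # sets with a count >= this value are discarded

kept = []
# Input must be sorted by the match key so groupby sees each set whole.
for key, group in groupby(records, key=lambda rec: rec[0]):
    group = list(group)
    if min_per_set < len(group) < max_per_set:
        kept.extend(group)

print(kept)
```

Here only the two-record 00001 set survives: the 00002 set is at the minimum and the 00003 set reaches the maximum, so both are discarded whole.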

To set a condition to select records

Next, you can specify a specific value of the match key that you want to select. Use Select and Bypass Conditions to do this.

1. Click Advanced and navigate to Input, Settings, Select Record Conditions.

2. Create the desired condition to select records. See Select or Bypass Records on page 5-37 for building a condition.

Figure 16.7 Select Conditions


Sort Utility

See TS Quality Reference Guide and Online Help for complete settings information.

The Sort Utility reads records from input data files and sorts them to produce a single output file. The single output file is created in a common shape with a single associated Data Dictionary Language (DDL) file.

The sort functions support up to 99 sort keys. During the sort step, you can select fields from input records to be written to the output. This process is controlled through the input and output DDL field-mapping function.

See Chapter 9 and Chapter 10 for details of the Sort Utility.


CHAPTER 17

Customizing the Control Center

TS Quality allows you to customize your work area by changing several configuration options:

fonts used within the Control Center
size and style of text
colors used within the Control Center


Changing the Control Center Display Settings

You can change the way that items in the Control Center are displayed by modifying the following settings:

fonts for Menu, Text Viewer, Step Labels, Step Comments, and Project Labels
colors for background and arrows

Any changes made here become the default until more changes are made.

To open the Preference menu

You can also right-click anywhere inside the Data Flow Architect (just not on a specific step module) and select Preference.

1. Select Setup, Preferences. There are two tabs, General and Display.

The General tab allows you to decide which applications or functions launch when the Control Center opens. See Set Up the Control Center on page 2-5 for descriptions of the General tab.

The Display tab lets you select a new default font, style, and size for the text appearing in the Menu, Text Viewer, Step Labels, Step Comments, and Project Labels. You can also choose background and foreground colors for these items.

Figure 17.1 Preference Display Tab


To change the Font Settings

1. Select the Display tab. In the Category box, click the item that you want to change.

2. In Font Selection, select the new font, style, and size from the pull-down menus on the right. As you make your changes, the text in the Sample box reflects the changes.

3. Click OK. The Preferences tab closes and the new font settings are applied to the selected item.

To change the Color Settings

1. Select the Display tab. In the Category box, click the item that you want to change. The Color section becomes active.

2. Click either the Foreground or Background button, depending on which color you want to change. These buttons invoke separate identical windows.
Swatches

The Swatches tab lets you choose a color from a palette of preset colors:

a. Click a color in the palette. The Preview section below displays the selected color scheme. As you select a color from the palette, your choice will be recorded in the Recent: grid on the right. The far-left box in the first row of the grid will be filled with the selected color. Each time you select a color, the rest of the boxes will be filled in.

b. Click OK. The Foreground/Background window closes.

HSB

The HSB tab lets you define the color by Hue (the color's tint), Saturation (the hue's purity), and Brightness (the color's brightness):

a. Select the color component that you want to change by selecting either the Hue, Saturation, or Brightness radio button. The Color box on the left changes based on your selection.

b. Move the slider up or down to shift through the color spectrum. The Color box on the left adjusts accordingly.

c. When you want to see how a color will appear, click the section of the Color box where the color appears. The Preview section displays the selected color scheme.

d. Click OK. The Foreground/Background window closes.

RGB

The RGB tab allows you to define a color as a combination of the Red, Green, and Blue primary colors:

a. Drag the sliders for the respective colors to the left or right. The Preview section updates as the color changes.

b. When you are satisfied with your selection, click OK. The Foreground/Background window closes.

3. Once you are satisfied with the changes, click OK in the Preferences window. The new color settings will be applied.

To change the background color

1. Right-click anywhere inside the Data Flow Architect (not on a specific step module) and select Preference, Background Color.... The Background Color Chooser window opens.

2. Refer to the steps above for changing the color settings.

3. Click OK. The new color settings will be applied.

To change the arrow color

1. Right-click anywhere inside the Data Flow Architect (not on a specific step module) and select Preference, Arrow Color.... The Arrow Color Chooser window opens.

2. Refer to the steps above for changing the color settings.

3. Click OK. The new color settings will be applied.

APPENDIX A

The Data Dictionary Language and DDL Types

The Data Dictionary Language


The Data Dictionary Language (DDL) is a method for defining data
file and record layouts. The Data Dictionary Language file,
commonly referred to as DDL or DDL file, is a collection of keywords
that contains the definitions of the input and output files that are
used by TS Quality. DDL input and output files must be defined for
each module in TS Quality.
See Chapter 2, Working with a Project for structure and
components of DDLs and how to create them.


Data Dictionary Language (DDL) Types

The Type must be specified for every DDL field entry. There are four Type categories: Encoding (code page), Trillium Types, Date Format, and Class Keyword.

Encoding (Code Page)

An encoding is a mapping of binary values to code positions that represent characters of data. It is also called a code page. The following table lists the main character encodings used in TS Quality.

Note that some encodings below may not be available depending on the chosen module or GUI Tool. Contact Customer Support for more information.

Table A-1: DDL Encoding

Type       Description

NOTRANS    NOTRANS means No Translation. The operations will be done in the default encoding for the host computer.
           NOTE: Users need to be careful that the data will not be translated into their native encoding. For example, if a data file from Greece is run on a computer in the US and both the settings files and all of the fields in the DDL are set to NOTRANS, you will likely get a different result than if the same project was run in Greece.

ASCII      American Standard Code for Information Interchange. A 7-bit encoding for representing English characters.

BIG5       Traditional Chinese

CCSID937   Traditional Chinese

CP037      EBCDIC, IBM037

CP1250     Latin 2, Eastern European


CP1251       Cyrillic (Slavic)

CP1252       Latin1 (ANSI)

CP1253       Greek

CP1254       Turkish

CP1255       Hebrew

CP1256       Arabic

CP1257       Baltic

CP1258       Vietnamese

CP932        Microsoft Extended Shift-JIS Japanese

CP936        Simplified Chinese, GBK

CP949        Korean

CP950        Traditional Chinese

EUCCN        Simplified Chinese, Unix, GB2312, EUC-SC

EUCJP        Japanese, Unix, EUC-JP, EUC-J, JEUC, J-EUC, EUCJ

EUCKR        Korean, Unix, EUC-KR, KS_C_5861-1992

EUCTW        Traditional Chinese, Unix, CNS-11643, CNS-11643-1992

GB12345      Traditional Chinese

HZGB2312     Simplified Chinese, HZ-GB-2312

IBM-83-4040, IBM-83-4242    Japanese corporate kanji code

ISO2022JP    Japanese, ISO-2022-JP

ISO-8859-7   Latin/Greek

ISO 8859-9   Latin-1 modification for Turkish (Latin-5)


JEF-83-A1A1, JEF-83-4040, JEF-78-A1A1, JEF-78-4040    Japanese corporate kanji code. Fujitsu.

JOHAB        Korean

KEIS-83-A1A1, KEIS-83-4040, KEIS-78-A1A1, KEIS-78-4040    Japanese corporate kanji code. Hitachi.

LATIN1       ISO 8859-1

LATIN2       ISO 8859-2

LATIN4       Baltic

LATIN7       Baltic

LATIN9       ISO 8859-15, Latin1 + Euro symbol and accented characters

UCS2         The encoding of Unicode as 16-bit values. This is the default transformation format of Unicode. UCS2 is the same as UTF16.

UTF7         The encoding of Unicode as 7-bit values that can be transmitted safely via e-mail (MIME messages).

UTF8         The encoding of Unicode as 8-bit values. In this encoding, all ASCII characters are represented by themselves, and all bytes of multi-byte characters have the eighth bit turned on. UTF8 is the default encoding for XML.

UNICODE20:BIG-ENDIAN      Unicode with the most significant byte first. Other name: big-endian.

UNICODE20:LITTLE-ENDIAN   Unicode with the least significant byte first. Other name: little-endian.


Trillium Types

Trillium Type is the data type of a DDL field. The following table lists the Types used in TS Quality. Many of these Types can be used together (example: PACKED DECIMAL).

Table A-2: Trillium Types

Type            Description

ASCII NUMERIC   Numeric characters in ASCII.

BITFIELD        A BITFIELD is an array of bits embedded within a byte or an array of bytes. They are treated as right-justified unsigned integers. The length is specified by the LENGTH statement. The starting bit position is specified by the POSITION statement. Numbering schemes for identifying the position of bits are: little-endian (the smallest position number is at the far right of the entity) and big-endian (the smallest position number is at the far left of the entity). One is used as the starting counting position.

BOOLEAN         BOOLEAN may also be qualified as INTEGER. Fields are treated as right-justified binary integers; however, fields with the value of zero are considered to be equal to FALSE, while fields with a non-zero value are considered to be TRUE.

BINARY          Binary data type.

INTEGER         INTEGER types may be signed or unsigned. They are treated as right-justified binary integers of the length specified in the LENGTH statement. Integers may also be qualified as BOOLEAN.

PACKED          PACKED types can be signed or unsigned. Packed decimal digits of the length are specified in the LENGTH statement in bytes. They are treated as right-justified. Since packed decimals are stored two digits to a byte, the total number of digits is twice the length for UNSIGNED PACKED and twice the length minus 1 for signed PACKED. For signed values the right-most nibble holds the sign value.


ZONED DECIMAL   The ZONED DECIMAL type is treated as EBCDIC NUMERIC characters with the least significant byte divided into a numeric digit and a sign. The sign occupies the least significant nibble of the byte and follows the conventions for PACKED decimal signs.
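The signed PACKED layout described above can be illustrated with a small decoder sketch (a hypothetical helper, using the common packed-decimal convention that 0xD in the sign nibble marks a negative value):

```python
# Decoder sketch for signed PACKED decimal bytes: two digits per byte,
# with the sign carried in the right-most nibble of the last byte.
def unpack_packed_decimal(data: bytes) -> int:
    digits = []
    for byte in data:
        digits.append(byte >> 4)      # high nibble
        digits.append(byte & 0x0F)    # low nibble
    sign_nibble = digits.pop()        # last nibble is the sign
    value = int("".join(str(d) for d in digits))
    # Conventionally 0xD marks a negative value; 0xC/0xF mark positive.
    return -value if sign_nibble == 0xD else value

# 3 bytes hold 2*3 - 1 = 5 digits plus the sign nibble:
print(unpack_packed_decimal(b"\x12\x34\x5C"))  # 12345
print(unpack_packed_decimal(b"\x12\x34\x5D"))  # -12345
```

This matches the digit count described in the table: a signed PACKED field of length 3 carries five digits, since one nibble is spent on the sign.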


Date Format

A date format is a type of data that may contain only valid dates. The following table lists the valid date formats.

Table A-3: DDL Date Format

Type                           Data Format

ASCII AMERICAN                 MM(/)DD(/)YYYY. 8 or 10 bytes.

ASCII EUROPEAN                 DD(/)MM(/)YYYY. 8 or 10 bytes.

ASCII JULIAN                   (YY)YY(/-)DDD. 5, 7, or 8 bytes.

ASCII LONG JULIAN              YYYY(/-)DDD. 7 or 8 bytes.

ASCII YEAR FIRST               YYYY(/)MM(/)DD. 8 or 10 bytes.

EBCDIC AMERICAN                MM(/)DD(/)YYYY. 8 or 10 bytes.

EBCDIC EUROPEAN                DD(/)MM(/)YYYY. 8 or 10 bytes.

EBCDIC JULIAN                  (YY)YY(/-)DDD. 5, 7, or 8 bytes.

EBCDIC LONG JULIAN             YYYY(/-)DDD. 7 or 8 bytes.

EBCDIC YEAR FIRST              YYYY(/)MM(/)DD. 8 or 10 bytes.

PACKED AMERICAN                0MMDDYYYY. 5 bytes.

PACKED EUROPEAN                0DDMMYYYY. 5 bytes.

PACKED JULIAN                  (YY)YYDDD. 3 or 4 bytes.

PACKED LONG JULIAN             YYYYDDD. 4 bytes.

PACKED YEAR FIRST              0YYYYMMDD. 5 bytes.

UNSIGNED PACKED AMERICAN       MMDDYYYY. 4 bytes.

UNSIGNED PACKED EUROPEAN       DDMMYYYY. 4 bytes.

UNSIGNED PACKED JULIAN         0(YY)YYDDD. 3 or 4 bytes.

UNSIGNED PACKED LONG JULIAN    0YYYYDDD. 4 bytes.


UNSIGNED PACKED YEAR FIRST    YYYYMMDD. 4 bytes.

SJIS IMPERIAL DATE            Japanese date format with imperial calendar. CP932 or Shift-JIS encoding only.
                              You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.

SJIS JAPANESE DATE            Japanese date format with Gregorian calendar. CP932 or Shift-JIS encoding only.
                              You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.

ASCII ROMAJI IMPERIAL DATE    Japanese date format with shortened imperial year. ASCII encoding only.
                              Example: S35-1-1
                              You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.


CLASS Keyword

The CLASS keyword specifies the format to be used for the date field. By using the CLASS keyword, you can convert any 2-digit year into a 4-digit year.

The following table describes all specifications for the CLASS keyword.

Table A-4: DDL CLASS Keyword

Statement       Description

DATE FORWARD    Converts any 2-digit year into a 4-digit year when the data value is equal to, or greater than, the current year.
                Top of date window = current year + 99
                Bottom of date window = current year
                Example: If the current year is 2005:
                Top of date window = 2104 (2005 + 99 = 2104)
                Bottom of date window = 2005

DATE BACKWARD   Converts any 2-digit year into a 4-digit year when the data value is equal to, or less than, the current year.
                Top of date window = current year
                Bottom of date window = current year - 99
                Example: If the current year is 2005:
                Top of date window = 2005
                Bottom of date window = 1906 (2005 - 99 = 1906)


Table A-4: DDL CLASS Keyword (Continued)


Statement

Description

DATE WINDOW
{nnn}

Converts a 2-digit year into a 4-digit year, according to a userspecified date window. You can specify 1 to 4-digit numbers in
{nnn}.
----------------------------------------------------------------Top of date window = if {nnn} >100 and {nnn} > current year, then
{nnn} is the top of the date window.
Bottom of date window = top of the date window - 99
If the current year is 1999: CLASS IS DATE WINDOW 2030
Top of date window = 2030 (2030 > 100 and > the current year)
Bottom of date window = 1931 (2030 99 = 1931)
----------------------------------------------------------------Top of date window = bottom of the date window + 99
Bottom of date window = If {nnn} >100 and {nnn} < current year,
then {nnn} is the bottom of the date window.
If the current year is 1999: CLASS IS DATE WINDOW 1967
Top of date window = 2066
(1967 + 99 = 2066)
Bottom of date window = 1967 (1967 > 100 but < the current year)
----------------------------------------------------------------Top of date window = If {nnn} > 0 and {nnn} < 100, then top of the
date window is current year + nnn
Bottom of date window = current year + nnn -99.
If the current year is 1999: CLASS IS DATE WINDOW 30
Top of date window = 2029
(30 > 0 but < 100, 1999 + 30)
Bottom of date window = 1930 (1999 + 30 -99).
----------------------------------------------------------------Top of date window = current year nnn + 99.
Bottom of date window = If {nnn} < 0, then bottom of date window
is current year nnn
If the current year is 1999:
CLASS IS DATE WINDOW -30
Top of date window = 2068 (1999 - 30 + 99)
Bottom of date window = 1969 (-30 < 0, 1999 - 30).
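The windowing rules above reduce to simple arithmetic. The following Python sketch reproduces that arithmetic for illustration; the function names are hypothetical and not part of TS Quality.

```python
def window_bounds(mode, current_year, nnn=None):
    """Compute (bottom, top) of the date window per the CLASS rules above."""
    if mode == "DATE FORWARD":
        return current_year, current_year + 99
    if mode == "DATE BACKWARD":
        return current_year - 99, current_year
    # DATE WINDOW {nnn}
    if nnn > 100 and nnn > current_year:    # {nnn} is the top of the window
        return nnn - 99, nnn
    if nnn > 100 and nnn < current_year:    # {nnn} is the bottom of the window
        return nnn, nnn + 99
    if 0 < nnn < 100:                       # offset above the current year
        return current_year + nnn - 99, current_year + nnn
    return current_year + nnn, current_year + nnn + 99  # nnn < 0

def expand_year(yy, bottom):
    """Map a 2-digit year onto the 4-digit year that falls inside the window."""
    year = (bottom // 100) * 100 + yy
    return year if year >= bottom else year + 100

# Reproducing the examples from Table A-4:
print(window_bounds("DATE FORWARD", 2005))       # (2005, 2104)
print(window_bounds("DATE WINDOW", 1999, 2030))  # (1931, 2030)
print(expand_year(20, 1931))                     # 2020
```

With the window (1931, 2030), the 2-digit year 45 expands to 1945, while 20 expands to 2020, since 1920 falls below the bottom of the window.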


APPENDIX B

Parser Review Code


Parser Results
The Parser generates Completion Codes and Review Codes to
identify specific conditions that occur for each record being parsed.
You can review these codes to analyze the Parser results.

Parser Completion Codes (CDP/BDP)


Table B-1 shows the return codes that appear in the
pr_completion_code or bp_completion_code field in output from the
Customer Data Parser or Business Data Parser repository. Several
errors (2, 3, 4, 5, 8, D) are caused by inaccuracies in the file path.

Table B-1: Parser (CDP/BDP) Completion Codes

- No error
- Insufficient Storage. When using DDL field sub-segments (line_01a,
  line_01b, etc.) and the sum of the data in these fields exceeds the
  redefine field length, the data is truncated and a value of 1 is
  returned. Processing continues normally for all other lines.
- Table Error. Pattern, Word, and/or City tables not found.
- Log File Error
- Detail File Error
- Pattern-Word-City Tab Error. Pattern, Word, and/or City tables not
  readable.
- Too Many Tokens
- Line Definition Error
- Display File Error
- Invalid Parser Handle
- Invalid Parm File Entry
- Invalid Interface Call Type. Must be either: O=Open, P=Process,
  C=Close
- Invalid Service Call Type. Must be either: D=Send to Display File,
  E=Supply Error Text
- Statistics File Error
- Parser not successfully initialized. The Settings file may not be
  correctly defined. Check the path and file name.

Customer Data Parser Review Code/Review Groups

Review codes are produced for many different data conditions. These
codes can be evaluated in a post-parsing process to trigger specific
record handling or review. For instance, if a business wanted to
review every record that had received a review code of 26 (Unknown
Street Pattern), a subsequent step following the parsing process
could redirect all records with this condition by selecting the
records with this code.
The code values are represented on the record by position in the DDL
field that corresponds to the line type that contained the condition.
The field names that must be used in the CDP output DDL to contain
these codes are:
pr_name_review_codes
pr_street_review_codes
pr_geog_review_codes
pr_misc_review_codes
pr_global_review_codes


For each of these fields, a flag value of '1' is placed in the position in
the field that corresponds to the value of the condition. So in our
earlier example, where a review code of 26 was reported, you would
find a '1' in the field pr_street_review_codes at position 26.
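A post-parsing step might test these positional flags as in the following sketch; this is a hypothetical example (not TS Quality code), assuming unset positions in the field are left blank:

```python
def flagged_codes(review_field):
    """Return the review-code numbers flagged in a positional review-code
    field: position n (1-based) holds '1' when review code n was reported."""
    return [pos for pos, ch in enumerate(review_field, start=1) if ch == "1"]

# Hypothetical pr_street_review_codes value with a flag at position 26:
pr_street_review_codes = " " * 25 + "1" + " " * 10
print(flagged_codes(pr_street_review_codes))  # [26]
```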
Table B-2 lists review codes, review groups, and descriptions for the
Customer Data Parser.

Table B-2: CDP Review Codes and Review Groups

Review codes can belong to multiple review code fields.

Review  Review
Code    Group   Description

000     000     No review code found

Name Codes
        008     Unknown name pattern
        009     Standardized first name too long
        009     Display first name too long
        009     Total number of export names gt max
        009     Standardized middle name too long
        009     Display middle name too long
        009     Too many middle names
        009     Standardized last name too long
        009     Display last name too long
10      009     Standardized title too long
11      009     Display title too long
12      009     Too many titles
13      009     Standardized connector too long
14      009     Display connector too long
15      009     Standardized relation too long
16      009     Display relation too long
17      009     Standardized business too long
18      009     Display business too long
19      009     Derived genders conflict
20      009     Standardized generation too long
21      009     Display generation too long
22      010     More than one middle name

Street Codes
26      011     Unknown street pattern
27      011     Standardized street type too long
28      011     Display street type too long
29      011     Too many street types
30      012     Standardized direction too long
31      012     Display direction too long
32      012     Too many directions
33      013     Standardized street title too long
34      013     Display street title too long
35      013     Standardized complex name too long
36      013     Display complex name too long
37      013     Standardized house number too long
38      013     Display house number too long
39      013     Unusual house number
40      013     Display dwelling too long
41      013     Standardized dwelling too long
42      013     Too many dwellings
43      013     Unusual dwelling value
44      013     Too many dwelling values
45      013     Display box too long
46      013     Standardized box too long
47      013     Unusual box value
48      013     Display route too long
49      013     Standardized route too long
50      013     Standardized route number too long
51      013     Display route number too long
52      013     Unusual route value
53      013     Standardized complex type too long
54      013     Display complex type too long
55      013     Standardized dwelling number too long
56      013     Standardized box number too long
57      013     Display box number too long
58      013     Display dwelling number too long
59      020     Duplicate street line types

Geography Codes
61      014     No city name found in records
62      014     No state found in records
63      014     Standardized city too long
64      014     Display city too long
66      015     Standardized state/province/county too long
67      015     Display state/province/county too long
70      015     Standardized country too long
71      015     Display country too long
72      015     Standardized neighborhood too long
73      015     Display neighborhood too long
74      015     Standardized post code too long
75      015     Display post code too long
76      015     Unusual post code value
77      016     Corrected city name too long
78      000     City name change used for city
79      017     Conflicting geographic types
80      018     Domestic city name present but could not be verified

Global Review Codes
83      001     Unidentified token
84      019     Unidentified line
85      001     Invalid token definitions
86      001     Label or label element too long
87      001     Miscellaneous data for line too long
88      001     Too many categories
89      001     Too many names for export
90      002     Mixed name forms present
91      003     Hold mail element present
92      004     Foreign address element found
93      005     No names identified
94      006     No street identified
95      007     No geography identified
96-99           Currently unassigned

Review Group Hierarchy


Table B-3 displays the default review group hierarchy for the
Customer Data Parser. The review group code is placed in the
PREPOS field: pr_rev_group.
The Review Group Order setting (Process, Settings) can
be used to modify the group hierarchy.
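When a record triggers review codes in more than one group, a post-process can apply the hierarchy to pick the single group reported in pr_rev_group. The following is a minimal sketch, assuming the default order shown in Table B-3 and that the first-listed (highest-priority) group present on the record wins; the function name is illustrative, not TS Quality code.

```python
# Default CDP review group hierarchy, highest priority first (Table B-3 order).
DEFAULT_HIERARCHY = [
    "001", "005", "006", "007", "014", "019", "008", "011", "013", "012",
    "017", "015", "018", "016", "020", "010", "009", "004", "003", "002", "000",
]

def assign_review_group(groups_on_record, hierarchy=DEFAULT_HIERARCHY):
    """Pick the highest-priority review group present on a record."""
    for group in hierarchy:
        if group in groups_on_record:
            return group
    return "000"  # no review code found

print(assign_review_group({"009", "013"}))  # 013
```

A custom Review Group Order setting would correspond to passing a reordered hierarchy list.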

Table B-3: CDP Review Group Hierarchy

001  Unidentified token

005  No names identified
     No name found on the record. For example:
         12 main street
         Boston MA 01123

006  No street identified
     No street information found on the record. For example:
         John Smith
         Boston MA 01123

007  No geography identified
     No geography information found on the record. For example:
         John Smith
         12 main street

014  No city or county identified
     Record did not contain a city or county, or it could not be
     identified.

019  Unidentified Line
     Line type could not be determined, and is set to ?.

008  Unknown name pattern
     Pattern for name format does not exist in the table. For example:
         John Smith B A C D
         12 main street
         Boston MA 01123

011  Unknown street pattern
     Pattern for street format does not exist in the table.

013  Unusual or long address
     The length of the street name exceeds 25 bytes as defined in
     prepos.ddl.

012  Invalid directional
     Direction is inconsistent.

017  Conflicting geography types
     The country default is US and the valid city/state is followed by
     a foreign-type postal code. For example:
         Mr John Smith
         12 main street
         Boston MA A1C 3R4

015  Geography too long
     The length of the geography exceeds 30 bytes as defined in
     prepos.ddl.

018  Unable to verify city name
     City name cannot be identified.

016  Corrected city name too long
     Table entry for a city change recode exceeds 25 bytes as defined
     in prepos.ddl.

020  Multiple street line types
     More than one street line is found on the record.

010  More than one middle name
     Two or more middle names were found on the name line. For example:
         John Adam Wilson Smith
         12 main street
         Boston MA 01123

009  Derived genders conflict
     The title and first name gender values are different. For example:
         Miss John Smith
         12 main street
         Boston MA 01123

004  Foreign address
     The Parser found a geography element outside the country in which
     the Parser is running. For example:
         John Smith
         12 main street
         Boston France 01123

003  Hold mail
     One of the lines on the record is of type H (such as Return Mail).
     For example:
         John Smith
         Return Mail
         12 main street
         Boston MA 01123

002  Mixed name forms
     A business name and a personal name were both found on the record.
     For example:
         John Smith
         ABC corp
         12 main street
         Boston MA 01123

000  No review code found
     No identifiable error on the record. For example:
         John Smith
         12 main street
         Boston MA 01123

Business Data Parser Review Code


Table B-4 lists the review codes and descriptions for the Business
Data Parser.

Table B-4: BDP Review Codes

Review codes can belong to multiple review code fields.

Review
Code    Description

082     Unidentified pattern
087     Miscellaneous line too long
086     Label Line too long
088     Too many categories found
083     Unknown token
090     No data found
000     No targeted conditions found


Customer Data Parser Review Codes/Review Groups for Asia-Pacific
Countries

Review Codes
The Customer Data Parser for China, Korea, and Taiwan generates
review codes for each record to highlight specific conditions
describing how the record was processed. Review codes are written
to the pr_review_code field. The following table lists the individual
review codes.
Review
Code    Description

000     No review codes written.
009     Unknown token remaining after processing.
010     Unknown tokens remaining after matching front and back parts
        of the string.
041     Business branch value too long.
042     Business type value too long.
043     Business name value too long.
044     No business name.
045     Business name and type too long for field.
051     Surname value not found in lookup table.
052     Surname value too long.
053     First (given) name value too long.
054     No surname.
055     No given name.
056     Honorific not found in table.
057     Honorific too long.
058     Unidentified token.
921     Recoded business name according to a word/pattern table entry
        to correct a mistyped or misused word.

Review Groups
Review groups are groups of review codes that illustrate types of
conditions present in the data, whereas review codes describe
actual specific conditions. Thus, review groups provide a way for
users to quickly understand the general types of conditions
occurring in the data. Review groups are written to the
pr_review_group field.
Review
Group   Description

002     Missing name (no business name and missing either first or
        last name).
003     Business name does not contain a business keyword from the
        word/pattern file.
004     There are unknown tokens in the business or personal name.
005     There is a contact name in the business name.

Exception Status
The Customer Data Parser for Japan generates an exception status for
each record to highlight specific conditions. Exception statuses are
written to the pr_status/pr_h_status fields. The following table
shows the values for the exception status.
Value  Mode      Description

00     ALL       No specific condition occurred.
20     ALL       No input string found (including comments deleted).
22     ALL       Unknown token found in the final mask.
30     BNP CLUE  Multiple business types found.
31     BNP CLUE  Only business type found.
32     BNP CLUE  The record consists of branch name or branch name
                 suffix only.
33     BNP CLUE  *All parsed string is in alphabet with word delimiter,
                 and no business clue found.
34     BNP CLUE  All parsed string is in katakana with word delimiter,
                 and no business clue found.
35     BNP CLUE  All parsed string is in hiragana with word delimiter,
                 and no business clue found.
36     BNP CLUE  Multiple words (space delimited) in kanji and no
                 business clue found.
40     PNP       Single person mode is ON and more than two words found.

* Word separation: spaces between characters, and the characters set
  for PNP_DELIMITER.

Index
A
ABSOLUTE
Data Comparison Calculator 11-21
ALPHA
Intrinsic Attributes
Parsing Customization 7-17
Asian Characters
operators 5-33
Associativity
Data Reconstructor 13-6
Asterisks (*)
Parsing Customization 7-21
Attribute
DDL Editor 3-11
Attribute Modifiers
Category 7-11
Function 7-11
Gender 7-11
Parsing Customization 7-10
Recode 7-12
Attributes
Parsing Customization 7-10

B
Batch Script
create a script 14-3
Edit a script 14-4
Run a script 14-5
batch script 14-3
Binary Data Strings
Data Reconstructor 13-9
blanks
Frequency Count Utility 16-17
BNP 6-11
BNP_CLUE 6-11
Build a Conditional Statement 5-35


Business Attribute 6-18


Business Data Parser
BPREPOS 6-29
include unknowns in standard
original field 6-34
populate unknown patterns 6-34
Repository DDL File 6-30
retain original values 6-34
Business Data Parser (BDP) 6-27
Business Data Parser Review Code
B-11

C
Category
Attribute Modifiers 7-11
Character Translation 5-9
City Directory File 6-17
City Name Changes
Locality 7-15
Post Town 7-15
CJKTOARABICNUM operator 5-30
CJKTOFULL operator 5-29
CJKTOHALF operator 5-28
Class
DDL A-10
DDL Editor 3-11
CLASS Keyword A-10
Class keyword
DDL 2-42
collating sequence
Sort Utility 9-5
Collating sequence
ASCII 9-6
EBCDIC 9-6
FOLDED_ASCII 9-6
FOLDED_EBCDIC 9-6
MULTI_NATIONAL 9-6
Command line execution
Program names 15-4


syntax 15-3
Comment
DDL Editor 3-11
Comment Lines
Parsing Customization 7-21
Comments
Data Reconstructor 13-7
Common Fields
Create Common Utility 12-6
Commonization
Create Common Utility 12-3
Comparison Routine
Data Comparison Calculator 11-21
Comparison Routines
Relationship Linker 10-13
Completion Codes
Business Data Parser 6-36
Customer Data Parser 6-25
COMPOSE or COMP 5-30
Conditionals 5-21
Logic Builder 5-35
Operators 5-26
Syntax 5-21
IF/ELSE Statement 5-21
Control Center
Data Flow Architect 2-20
Graphics View 2-21
List View 2-26
Project Panel 2-16
Project Viewer 2-17
Step Viewer 2-19
Conventions
Parsing Customization 7-21
Country Settings 4-7
Create Common
Decision Routines 12-12
Create Common Utility 12-3
Common Fields 12-6
Commonization 12-3

Decision Routines 12-7, 12-8


Match Key Level settings 12-6
Survivor record 12-8
Survivorship 12-3
Create New Project Wizard 2-3
Customer Data Parser
Join name lines 6-22
CTOSIMPCHINESE operator 5-30
CTOTRADCHINESE operator 5-30
Customer Data Parser
Business Attribute 6-18
City Directory File 6-17
INPUT_LINE_01 6-3
Line Definitions 6-19
Name Generation 6-21
Parsing Logic Flow 6-4
PREPOS 6-12
Preprocess House Number 6-18
Repository DDL File 6-14
Review Group Hierarchy B-8
Split address lines 6-23
Word Pattern Definition File 6-17
Customer Data Parser (CDP) 6-3
Customer Data Parser Review Code
B-3

Customer Data Parser Review Groups


B-3

Customer Data Parser


Exceptions File 6-16
Customized Definitions Table
Parsing Customization 7-3
Customizing the Control Center 17-3
Color settings 17-4
Display tab 17-3
Font settings 17-4
General tab 17-3

D
Data Browser 3-3


Field Selection 3-4


Save view 3-6
Data Comparison Calculator 11-21
ABSOLUTE 11-21
Comparison Modifiers 11-21
Comparison Routines 11-21
comparison test 11-21
PARTIAL1 11-21
Score 11-21
Data Dictionary Editor 3-9
Data Dictionary Language (DDL) 2-33
Data Flow Architect
Control Center 2-20
Data Reconstructor 13-3
Associativity 13-6
Binary Data Strings 13-9
Comments 13-7
Conditions 13-11
Fields 13-4
literal values 13-4
operators 13-6
Precedence 13-6
Reserved Words 13-5
rules file 13-3
rule script language 13-4
Rules File
Action Statements 13-15
Logical Operators 13-13
String Variables 13-19
Use Rule 13-22
Data Reconstructor Rules 13-5
Date format
DDL 2-42
DDL
Attributes 2-35
CLASS 2-36
Class keyword 2-42
Comment 2-35
Date format 2-42


DDL Builder 2-37


Default 2-35
Encoding 2-42
Field Name 2-34
Keywords 2-34
Length 2-35
methods of creating 2-33
Record Length 2-34
Record Name 2-34
Redefine 2-34, 2-40
Start Position 2-34
syntax 2-39
text format 2-33
Type 2-34
Type keyword 2-42
XML format 2-33
DDL Editor
Attribute 3-11
Class 3-11
Comment 3-11
Default 3-11
Field Name 3-10
Length 3-10
Record Length 3-10
Record Name 3-10
Redef 3-10
Start Position 3-10
Type 3-10
Update ORIGINAL_RECORD
Length 3-10
DDL Types A-3
Decision Routines
Create Common 12-12
Create Common Utility 12-7, 12-8
Default
DDL Editor 3-11
Delete
Parsing Customization 7-7
Delimited file


creating a DDL 2-38


Delimited Files
DDL considerations 2-33
Delimiters
File Display Utility 16-5
Director
Cleansing Server 14-12
Matching Server 14-12
Real-Time Processing 14-11
Dual Address Information 9-17

E
Encoding
DDL 2-42
Error Report
Field and Pattern Lists 11-18
Exporting projects 14-7

F
Field and Pattern Lists
Error Report 11-18
Field Files
Relationship Linker 10-21
Field List Editor
Comparison Routine 11-14
Description 11-14
Field Name 11-14
Propagation Routine 11-14
Routine Modifier 11-14
Score 11-14
Field Name
DDL Editor 3-10
Field Scanning 5-10
Field Selection
Data Browser 3-4
Field Settings
File Display Utility 16-8
Fields
Data Reconstructor 13-4

File Display Utility 16-3


Delimiters 16-5
Field settings 16-8
Inner Key 16-3
Outer Key 16-3
File Qualifier 5-4
File Update Utility 16-10
master file 16-10
match key 16-10, 16-15
transaction file 16-10
Frequency Count Utility 16-17
Full-width (Zenkaku) and half-width
(Hankaku) Japanese Characters
5-31

Function
Attribute Modifiers 7-11

G
Gender
Attribute Modifiers 7-11
Global Data Router 4-3
Country Rules file 4-6
Country Settings 4-7
DDL Settings 4-9
Fields Settings 4-9
Global Geography Table 4-6
Global Rules file 4-6
NOMATCH file 4-4
Rules Files 4-6
Separate Output 4-3
Single Output 4-4
Global Geography Table 4-6
Grade Pattern Editor
Category 11-14
Field Name Columns 11-15
Pattern ID 11-14
Graphics View
Control Center
Data Flow Architect 2-21


H
Help
Control Center 2-8
HIRAGANASTOL operator 5-30
How to Use Operators for Asian
Characters 5-33
HYPHEN
Intrinsic Attributes
Parsing Customization 7-17

I
IF Statements
Data Reconstructor 13-10
IF/ELSE
Data Reconstructor 13-3
Import projects
Windows to Unix 14-10
Importing projects 14-7, 14-9
Inner Key
File Display Utility 16-3
Insert
Parsing Customization 7-7
Intrinsic Attributes
Parsing Customization 7-17

J
JKANATOROMAN operator 5-29
JROMANTOKANA operator 5-29

K
KTOROMAN operator 5-29

L
Length
DDL Editor 3-10
Line Definitions 6-19
Line Lengths
Parsing Customization 7-21
Line Type


Geography 7-8
Miscellaneous 7-8
Name 7-8
Street 7-8
Line Types
Parsing Customization 7-8
Linking File
Relationship Linker 10-18, 10-25
List View
Control Center
Data Flow Architect 2-26
literal data string
Frequency Count Utility 16-17
Literal Values
Data Reconstructor 13-4

M
MALINK
Resolve Utility 16-27
Mask 5-17
Transformer 5-17
mask shapes
Frequency Count Utility 16-17
Masks
Parsing Customization 7-6
Recodes
Parsing Customization 7-12
master file
File Update Utility 16-10
Match Key
File Update Utility 16-10, 16-15
Set Selection Utility 16-31
Match Key Level Settings
Create Common Utility 12-6
Match Level Codes
Postal Matchers 9-15
Match Master Duplicate File
File Update Utility 16-13
Match Master File


File Update Utility 16-13


Match Tran Duplicate File
File Update Utility 16-15
Match Tran File
File Update Utility 16-14
Matching
TS Quality Analyzer 8-4
Merge Files
Merge Split Utility 16-21
merge keys
Merge Split Utility 16-19
Merge Split Utility 16-19
merge keys 16-19
split rules 16-19
Modify
Parsing Customization 7-7
multi-linking
Resolve Utility 16-27
Multiple Definitions
Parsing Customization 7-15

N
Name and Address Format
project 2-13
NUMERIC
Intrinsic Attributes
Parsing Customization 7-18

O
Operations
Parsing Customization 7-6
Operators
Data Reconstructor 13-6
Operators for Asian Characters 5-28
Outer Key
File Display Utility 16-3

P
Parser Customization Editor

Parsing Customization 7-31


Parser Tables 6-17
Parsing Customization
Attribute Modifiers 7-10
Attributes 7-10
City Name Changes
for non-US cities 7-14
for US Cities 7-14
Comment lines 7-21
Conventions 7-21
Customized Definitions Table 7-3
Delete 7-7
Insert 7-7
Line Lengths 7-21
Line Types 7-8
Masks 7-6
Modify 7-7
Multiple Definitions 7-15
Operations 7-6
Patterns 7-15
Phrase 7-5
Quotation Marks 7-21
Special Entries 7-14
Standard Definitions Table 7-3
Sub-tokens 7-5
Synonym 7-12
Syntax of Definitions 7-4
Tokens 7-4
User-defined Attributes 7-10
PARTIAL1
Data Comparison Calculator 11-21
Partition Method
Merge Split Utility 16-23
Pattern Files
Relationship Linker 10-21
pattern problems
Parsing Customization 7-37
Bad Name Patterns 7-37
Patterns


Parsing Customization 7-15


Phrase
Parsing Customization 7-5
PNP 6-10
Portion of a Field
Data Reconstructor 13-8
Positions
Beginning 7-9
Default 7-9
Ending 7-9
Parsing Customization 7-9
Postal Directory Browser 9-20
City Level 9-20
Street Details 9-22
Street Level 9-21
Postal Matchers 9-9
Census Tables 9-9
DPV Tables 9-9
Match Level Codes 9-15
Postal Base Data File 9-12
Postal Directories 9-9
Postal Form Customer 9-13
Postal Form Database Date 9-13
Postal Form File 9-12
Postal Form Job Number 9-13
Postal Form List 9-13
Postal Level1 Data File 9-12
Postal Level2 Data File 9-12
Prcustom 6-16
Business Data Parser 6-32
Precedence
Data Reconstructor 13-6
Preferences
Customizing the Control Center
17-3

General 2-6, 2-7


Help 2-8
Preprocess House Number 6-18
Program Names


Command line execution 15-4


project
Control Center
project step 2-28
creating 2-9
input data and input DDL 2-12
multi-country 2-11
Name and Address Format 2-13
Properties 2-16
settings 2-10
summary 2-14
type 2-3, 2-10
custom project 2-3
standard project 2-3
Project Panel
Control Center 2-16
Project Step
Control Center 2-28
Project Viewer
Control Center 2-17
Projects
exporting 14-7
importing 14-7

Q
Quotation Marks
Parsing Customization 7-21

R
Real-Time Processing 14-11
Director 14-11
Recode
Attribute Modifiers 7-12
Record Length
DDL Editor 3-10
Record Name
DDL Editor 3-10
Redef
DDL Editor 3-10


Redefine
DDL 2-40
reference file
Relationship Linker 10-24
Reference Level1 Number
Relationship Linker 10-26
Reference Level2 Number
Relationship Linker 10-26
Reference Linking
Relationship Linker 10-13, 10-24
Reference Record ID
Relationship Linker 10-26
Relationship Linker 10-3
Business level 10-13
Comparison Routines 10-13
Consumer level 10-13
Field Files 10-21
Pattern Files 10-21
Reference File 10-24
Reference Level1 Number 10-26
Reference Level2 Number 10-26
Reference Linking 10-13, 10-24
Reference Record ID 10-26
Window Key 10-3
Window Linking 10-13, 10-18
Window Size 10-22
Relationship Linker Results Analyzer
11-2, 11-3
fields to display 11-7
matched records 11-5
records to display 11-9
suspect records 11-5
Relationship Linker Rule Editor 11-12
Field List Editor 11-13, 11-14
Grade Pattern Editor 11-13, 11-14
Reserved Words
Data Reconstructor 13-5
Resolve Utility 16-27
MALINK 16-27

multi-linking 16-27
transitivity 16-27
Review Codes
Business Data Parser 6-36
Customer Data Parser 6-25
Review Groups
Customer Data Parser 6-26
ROMAJITOHIRAGANA or RTH 5-29
Round Robin Keys
Merge Split Utility 16-24
Round Robin Number
Merge Split Utility 16-24
Routine Modifier
Comparison Routine 11-21
Rule Script Language
Data Reconstructor 13-4
Rules File
Data Reconstructor 13-22
rules file
Data Reconstructor 13-3
Rules Files 4-6

S
Save view
Data Browser 3-6
Score
Data Comparison Calculator 11-21
Select and Bypass Records
Data Reconstructor 13-30
Select or Bypass Records 5-37
Select/Bypass Records
Logic Builder 5-37
Set Selection Utility 16-30
Sort Fields
Sort Utility 9-5
Sort Utility 16-33
.srt 9-2
Collating sequence 9-5
for Postal Matchers 9-2


JUST_DUPS 9-7
KEEP_ALL 9-6
KEEP_NONE 9-6
KEEP_ONE 9-6
Sort Fields 9-5
Source Identification
Transformer 13-31
Special Entries
Parsing Customization 7-14
Split a File
Merge Split Utility 16-23
split rules
Merge Split Utility 16-19
Standard Definitions Table
Parsing Customization 7-3
Standardization
TS Quality Analyzer 8-4
Start Pos
DDL Editor 3-10
Step Viewer
Control Center 2-19
Sub-tokens
Parsing Customization 7-5
Survivor record
Create Common Utility 12-8
Survivorship
Create Common Utility 12-3
Synonym
Parsing Customization 7-12
Syntax
Command line execution 15-3
Syntax of Definitions
Parsing Customization 7-4

T
Table Recoding 5-17
Title
File Display Utility 16-5


Tokens
Parsing Customization 7-4
transaction file
File Update Utility 16-10
Transformer 5-2
Character Translation 5-9
Field Scanning 5-10
File Trace Key 5-38
hex conversion 5-9
Source Identification 13-31
Table Recoding 5-17
transitivity
Resolve Utility 16-27
Trillium Types A-6
TS Discovery 3-12
TS Quality Analyzer 8-3
Cleansing 8-4
Master Database 8-8
Matching 8-4, 8-8
Standardization 8-4
Type
DDL Editor 3-10

U
Underscores
in city name changes 7-14
Unmatch Master File
File Update Utility 16-14
Update ORIGINAL_RECORD Length
DDL Editor 3-10
US City Problems
Parsing Customization 7-34
User Rule
Data Reconstructor 13-22
User-Defined Attributes
Parsing Customization 7-10
Using Multiple Input Files to Create an
Output DDL 5-7


V
View input data
Data Browser 3-3

W
Window Key
Sort 10-10
Window Key Field 10-7
Window Key Generator 10-3
Window Key Rule 10-3
Window Keys 10-3
Window Key Rule

Window Key Generator 10-3


Window Key Rules
definition 10-6
Window Keys
Window Key Generator 10-3
Window Linking
Relationship Linker 10-13, 10-18
Window Size
Relationship Linker 10-22
Word Pattern Definition File 6-17
Business Data Parser 6-32
