
Creating and Working with TS Quality Projects

Version 10.5

October 2006

Opening this package indicates your acceptance of the terms and conditions of the Harte-Hanks license agreement. The customer acknowledges and agrees that (a) the System and all related documentation are confidential trade secrets of Harte-Hanks or Harte-Hanks licensors and (b) title to and intellectual property rights in the System and related documentation (including without limitation all copyright, trademark, trade secret and patent rights) are and shall remain the confidential proprietary property and information of Harte-Hanks and Harte-Hanks licensors.

The customer shall use the System only in accordance with this Agreement. The customer shall not disclose, copy, or reproduce any portion of the System or documentation in any form to any third person without the prior written consent of Harte-Hanks, nor allow third parties to do the same. The customer shall keep the System and all confidential information in the strictest confidence.


Trillium Software System is a registered trademark of Harte-Hanks. UNIX is a registered trademark of UNIX System Labs, Inc. AIX, AS/400, CICS, OS/390, RS-6000, and NUMA-Q are registered trademarks of International Business Machines Corporation. HP-UX is a registered trademark of Hewlett-Packard Company. Windows NT, Windows 98, Windows 2000, and Windows XP are registered trademarks of Microsoft Corporation. Solaris and Java are registered trademarks of Sun Microsystems. Unisys is a registered trademark of Unisys Corporation. ZIP Code, ZIP +4 and CASS are registered trademarks of the U.S. Postal Service. PAF is a registered trademark of the Royal Mail. InstallShield is a registered trademark and service mark of InstallShield Corporation. All other brand names and products are trademarks or registered trademarks of their respective companies.

Copyright © 2006 Trillium Software, a division of Harte-Hanks, Inc.


All rights reserved.

Contents


CHAPTER 1

Introduction ................................................................ 1-1


Sample Project......................................................... 1-2

CHAPTER 2

Working with a Project ............................................ 2-1


Types of Projects ...................................................... 2-3
Using the Control Center ........................................... 2-4
Start the Control Center .......................................... 2-4
Set Up the Control Center........................................ 2-5
Creating a Project..................................................... 2-9
Understanding the Control Center Features .................2-16
Project Panel ........................................................2-16
Project Viewer.......................................................2-17
Step Viewer ..........................................................2-19
Using the Data Flow Architect....................................2-20
Graphics View .......................................................2-21
List View ..............................................................2-26
Using a Project Step ................................................2-28
The Data Dictionary Language (DDL) .........................2-33
Methods of Creating a DDL .....................................2-33
Creating a DDL Using the DDL Editor........................2-36
Creating a DDL in a Text Editor ...............................2-39
Type Keyword .......................................................2-42

CHAPTER 3

Investigating Your Data .......................................... 3-1


View Data Using the Data Browser .............................. 3-3
View DDLs Using the DDL Editor ................................. 3-9
Analyze Data Using TS Discovery...............................3-12
Identify the Problems with Data ................................3-13

CHAPTER 4

Using the Global Steps ............................................. 4-1


Using the Global Data Router ..................................... 4-3
Input and Output Settings ....................................... 4-3
Process Settings ..................................................... 4-5
Run the Global Data Router and View Results ............4-11

CHAPTER 5

Cleansing Your Data ................................................. 5-1


Using the Transformer .............................................. 5-3
Input and Output Settings ....................................... 5-3
Using Multiple Input Files to Create an Output DDL ..... 5-7
Process Settings ..................................................... 5-9
Conditionals............................................................5-21
Syntax .................................................................5-21
Operators in Conditional Statements ........................5-26
Operators for Asian Characters................................5-28
Build a Conditional Statement .................................5-35
Select or Bypass Records........................................5-37
Additional Settings.................................................5-38
Run the Transformer and View Results .....................5-39

CHAPTER 6

Standardizing Your Data ......................................... 6-1


Using the Customer Data Parser ................................. 6-3
Understanding Parsing Logic Flow ............................... 6-4
How the Customer Data Parser Identifies Business Names .... 6-5
CDP Parsing Process ............................................... 6-5
Customer Data Parser for China, Japan, Korea, and Taiwan .. 6-8
PREPOS ...............................................................6-12
Input and Output Settings ......................................6-14
Process Settings ....................................................6-16
Additional Settings.................................................6-22
Run the Customer Data Parser and View Results........6-25
Analyze Results .....................................................6-25
Statistics File ........................................................6-26
Using the Business Data Parser .................................6-27
BDP Parsing Process ..............................................6-27
Additional Settings.................................................6-34
Run the Business Data Parser and View Results .........6-36

CHAPTER 7

Tuning the Parsing Rules ........................................ 7-1


Understanding the Parser Definitions Tables ................. 7-3
Standard and User Definitions Tables ........................ 7-3

Syntax of Definitions............................................... 7-4
Synonym..............................................................7-12
Special Entries ......................................................7-14
Conventions in Parsing Customization ........................7-21
How to Customize the Parser Definition Tables for Japan.. 7-23
Clue Table ............................................................7-23
Name Tables.........................................................7-26
jp_bnp_name.txt...................................................7-27
jp_bnp_name_h.txt ...............................................7-28
jp_pnp_name.txt...................................................7-29
Using the Parser Customization Editor ........................7-31
View a Standard Definitions Table ............................7-31
View and Correct City Problems ...............................7-33
View and Correct Pattern Problems ..........................7-37
Save the Entries....................................................7-40
Re-Run Customer Data Parser .................................7-40
View Errors in Parsing Customization........................7-40
CHAPTER 8

Analyzing Single Data .............................................. 8-1


Using the TS Quality Analyzer .................................... 8-3
Start the TS Quality Analyzer ................................... 8-3
Data Entry and Cleansing ........................................ 8-4
Advanced Details.................................................... 8-7
Matching ............................................................... 8-8
Organize Database ................................................8-10

CHAPTER 9

Enriching Your Data .................................................. 9-1


Sorting for the Postal Matcher .................................. 9-2
Input and Output Settings ....................................... 9-2
Process Settings ..................................................... 9-5
Additional Settings.................................................. 9-6
Run the Sorting Utility and Check Results .................. 9-8
Using the Postal Matchers.......................................... 9-9
Input and Output Settings ....................................... 9-9
Process Settings ....................................................9-11
Additional Settings.................................................9-13
Match Levels.........................................................9-15
Dual Address Information .......................................9-17

Browsing the Postal Directory....................................9-20
City Level Directory ...............................................9-20
Street Level Directory ............................................9-21
Street Details........................................................9-22
CHAPTER 10

Linking Your Data .................................................... 10-1


Using the Window Key Generator...............................10-3
Input and Output Settings ......................................10-4
Process Settings ....................................................10-5
Run the Window Key Generator and View Results ......10-9
Sorting the Record by the Window Key ..................... 10-10
Input and Output Settings .................................... 10-10
Process Settings .................................................. 10-11
Run the Sorting Utility and Check Results ............... 10-12
Using Relationship Linker........................................ 10-13
Linking Examples .................................................. 10-14
Window Linking ................................................... 10-18
Run the Relationship Linker and View Results .......... 10-23
Reference Linking ................................................ 10-24
Run the Relationship Linker and View Results .......... 10-29

CHAPTER 11

Tuning the Linking Rules ....................................... 11-1


Using the Relationship Linker Results Analyzer ............11-3
View the Linking Results .........................................11-3
Edit Fields to Display..............................................11-7
Save Fields to Display ............................................11-8
View Records in a Range ........................................11-9
Using the Relationship Linker Rule Editor .................. 11-12
View the Linking Rules ......................................... 11-12
Customize the Field and Pattern Lists ..................... 11-15
Re-Run the Relationship Linker and View Results ..... 11-19
Using the Data Comparison Calculator ...................... 11-21

CHAPTER 12

Selecting the Best Record ..................................... 12-1


Using the Create Common Utility ...............................12-3
Input and Output Settings ......................................12-4
Process Settings ....................................................12-6
Additional Settings............................................... 12-10

Run the Create Common and View Results .............. 12-11
Create Common Decision Routines........................... 12-12
Decision Routine Selections for a Single Field .......... 12-14
CHAPTER 13

Manipulating Your Data ......................................... 13-1


Using the Data Reconstructor ....................................13-3
Rules File .............................................................13-3
Input and Output Settings .................................... 13-20
Settings for the Data Reconstructor ....................... 13-22
Setting the Rules File ........................................... 13-22
Setting the Use Rule ............................................ 13-22
Additional Settings............................................... 13-24
Run Data Reconstruction and View Results.............. 13-26
Bringing the Data Together ..................................... 13-27
Add a Global Transformer step .............................. 13-27
Input and Output Settings .................................... 13-29
Process Settings .................................................. 13-31
Run Transformer and View Results......................... 13-32

CHAPTER 14

Packaging Projects ................................................. 14-1


Batch Script............................................................14-3
Create a Script ......................................................14-3
Edit a Script..........................................................14-4
Run a Script .........................................................14-5
Create Multiple Batch Files ......................................14-6
Exporting/Importing Projects ....................................14-7
Export Projects......................................................14-8
Import Projects .....................................................14-9
Import Projects from Windows to UNIX................... 14-10
Real-Time Processing ............................................. 14-11
The Director ....................................................... 14-11
Moving From Batch to Real-Time ........................... 14-14
Linking Single Record Using the TS Quality Analyzer. 14-14

CHAPTER 15

Working from the Command Line ....................... 15-1


Executing TS Quality Modules....................................15-3
Syntax .................................................................15-3
Program Names ....................................................15-4

CHAPTER 16

Working with the TS Quality Utilities ................ 16-1


File Display Utility....................................................16-3
Input and Output Settings ......................................16-3
Outer Key and Inner Key ........................................16-3
Title and Delimiters................................................16-5
Field Settings ........................................................16-8
File Update Utility .................................................. 16-10
Match Keys and Fields .......................................... 16-10
Input and Output Settings .................................... 16-15
Match Key Settings .............................................. 16-15
Transaction Output Settings.................................. 16-16
Frequency Count Utility .......................................... 16-17
Input and Output Settings .................................... 16-18
Count Settings .................................................... 16-18
Merge Split Utility .................................................. 16-19
Input and Output Settings .................................... 16-19
Using Multiple Input Files to Create an Output DDL .. 16-19
Merge Files ......................................................... 16-21
Split a File .......................................................... 16-23
Merge and Split Files............................................ 16-25
Resolve Utility ....................................................... 16-27
Input and Output Settings .................................... 16-28
Link Field ........................................................... 16-29
Set Selection Utility ............................................... 16-30
Input and Output Settings .................................... 16-30
Select Records .................................................... 16-31
Sort Utility............................................................ 16-33

CHAPTER 17

Customizing the Control Center .......................... 17-1


Changing the Control Center Display Settings..............17-3

APPENDIX A

The Data Dictionary Language and DDL Types ......... A-1


The Data Dictionary Language.................................... A-2
Data Dictionary Language (DDL) Types ....................... A-3
Encoding (Code Page) ............................................. A-3
Trillium Types ........................................................ A-6
Date Format .......................................................... A-8

CLASS Keyword ....................................................A-10
APPENDIX B

Parser Review Code ..................................................B-1


Parser Results .......................................................... B-2
Parser Completion Codes (CDP/BDP) ......................... B-2
Customer Data Parser Review Code/Review Groups .... B-3
Review Group Hierarchy .......................................... B-8
Business Data Parser Review Code...........................B-11
Customer Data Parser Review Codes/Review Groups for Asia-Pacific Countries ... B-12


CHAPTER 1

Introduction

This book is intended for users who wish to learn how to use TS
Quality. It provides step-by-step instructions to set up a project and
process data. The book assumes that the users have installed TS
Quality Server, TS Quality Client, TS Quality Country Template
Projects and Postal Tables according to Installing TS Quality, and
read the introductory book, Getting Started with TS Quality.
This book covers the basic functions of TS Quality, but users should
also consult companion materials, such as TS Quality Reference
Guide and TS Quality Online Help to utilize the full capabilities of
TS Quality.
See Getting Started with TS Quality for the complete
list of TS Quality documentation and materials.

Sample Project
In this book, a global sample project (TMT project) is used to
illustrate various TS Quality functions. The TMT (TrilMedTech)
project contains customer data from the United States, United
Kingdom, Canada and Germany. The record data consists of typical
business database fields:

- Customer business name
- Contact name
- Phone number
- Address information
- Product information
- Various dates
- Account representative
- Account status
- Customer identification numbers

The goal of this sample project is to create a consolidated customer view and to eliminate the poor data quality and redundancy in the sample data. Through this project, you will complete several tasks:
- analyze the data and identify issues
- cleanse and standardize data elements
- enrich address information and identify duplicate records
- link duplicate records
- package processes and create a batch file

At the conclusion of this initial batch process, the output file will contain one contact name per business location.


CHAPTER 2

Working with a Project

In order to process your data, we strongly recommend that you first
create a project. A project includes a set of steps (core modules) for
centralized access and allows you to manage data processing tasks
easily. Projects are created in the Control Center, the graphical user
interface. Within a project, you can run processes, view data, create
and edit DDLs, modify settings, analyze output and tune the overall
process. Projects within the Control Center are mainly used to
create and test batch process flows for later use in a production
environment.
This chapter focuses on these topics:
- Project types
- Starting and setting up the Control Center
- Creating and working with projects

For an overview of the TS Quality Control Center and projects, refer to Getting Started with TS Quality.

Types of Projects
A project is a combination of one or more modules and tasks that
process a particular set of data in a job flow. Each module in a
project is called a step. A project includes all required data files,
DDL files, settings files, output, statistics files, user-defined tables
and batch scripts for modules. Within a project, you can run the
entire job flow, from the Transformer to the Relationship Linker, or
only part of the flow.
There are two types of projects:
- Standard Project - a basic project which includes predefined modules
- Custom Project - a complex project for advanced users

The Create New Project Wizard will guide you through creating a project. You will be prompted to select a type at the beginning of the Wizard. Both standard and custom projects may later be modified by adding and deleting steps, or can be customized by adding user-defined components.
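Conceptually, a project's job flow is an ordered sequence of steps that you can run in full or in part. The sketch below models that idea in Python purely for illustration -- TS Quality projects are defined in the Control Center, not in code, and the step names here are simply module names taken from later chapters:

```python
# Illustrative only: model a project's job flow as an ordered list of steps.
steps = ["Transformer", "Customer Data Parser", "Postal Matcher",
         "Window Key Generator", "Relationship Linker"]

def run_flow(steps, start=None, stop=None):
    """Return the steps that would run: the entire flow, or only a slice."""
    i = steps.index(start) if start else 0
    j = steps.index(stop) + 1 if stop else len(steps)
    return steps[i:j]

print(run_flow(steps))                          # the entire job flow
print(run_flow(steps, start="Postal Matcher"))  # only part of the flow
```

Running only part of the flow is what you do in practice when tuning a single step, such as re-running the parser after customizing its definition tables.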

Using the Control Center


Start the Control Center

Make sure that TS Quality Server has been configured correctly. Refer to Getting Started with TS Quality for server configuration.

To start the Control Center
1. Double-click the TS Quality v10.5 Server icon on the desktop, or select Start, Programs, Trillium Software System, TS Quality, v10.5, Start Server.
2. Double-click the TS Quality v10.5 Control Center icon on the desktop, or select Start, Programs, Trillium Software System, TS Quality, v10.5, Control Center. This starts the TS Quality Client and the Control Center.
3. The Start Up screen appears.

Figure 2.1 Start Up Screen

The Control Center's main window is behind the Start Up screen. The main window contains tool bars and a tools palette to give you quick access to the most commonly used tools and applications.

Refer to Getting Started with TS Quality for an overview of Control Center Tools on the Tools Palette.

Figure 2.2 Control Center Main Window (showing the Main Menu, Tool Bar, Tools Palette, and Work Area)

Set Up the Control Center

When you start the Control Center for the first time, you should set up General Preferences for the basic Control Center settings. General Preferences include several options:
- Startup options
- Default project directories
- Project input staging area
- Location of the Online Help directory and the Web browser path
- Editors and statistics viewer programs
- TS Discovery launch directory
- Text and color used within the Control Center
To set up General Preferences
1. Select Setup from the main menu.
2. Select Preferences. There are two tabs, General and Display.

Figure 2.3 General Preferences


3. The General tab allows you to decide which applications or functions to launch upon starting the Control Center. Select or specify options based on the table below:

On Startup - Determines how the Control Center handles projects upon startup.
  Open the last project - The project that you were working on in your previous session automatically opens upon startup.
  Default - No projects are launched upon startup.

Other Startup Options - Select one or more of these check boxes to determine which applications or features will be displayed upon startup.
  Show Session Viewer - The Session Viewer opens upon startup.
  Show Toolbar - The Toolbar is displayed upon startup.
  Show Tool Palette - The Tool Palette is displayed upon startup.
  Show Startup Page - The Start Up screen is displayed upon startup.
  Automatically Backup Projects (.prj only) - When checked, a backup file of your .PRJ file is automatically created. Checking this option does NOT back up your entire project; it simply creates a copy of your main .PRJ file.

Default Project Directory - Enter the directory where project and step files will be stored.
  Default: C:\TrilliumSoftware\tsq10r5s\mynewdir

Input Staging Directory - Enter the directory where input data files for the project or step will be stored.
  Default: C:\TrilliumSoftware\tsq10r5p

Help Directory - Enter the directory where Help files are stored.
  Default: C:\TrilliumSoftware\tsq10r5c\doc

My Editor - Enter the path and executable file of your text editor to display and edit text files within the Control Center.

My Statistics Viewer - Enter the path and executable file of the application used to display statistics files within the Control Center.

My browser - Enter the path and executable file of your Internet browser, used to display on-line documentation under the Help Menu. In order to access the online manuals, you must specify a default web browser.
  Example: C:\Program Files\Internet Explorer\IEXPLORE.EXE

Discovery Launch Directory - Enter the directory path used to launch TS Discovery.

4. Click OK.

See Changing the Control Center Display Settings on page 17-3 for display settings.
To get Help
Once you have specified a web browser in Control Center Preferences, you may view the online help manuals.
1. Select Setup, Preferences.
2. On the General tab, set My browser to: C:\Program Files\Internet Explorer\IEXPLORE.EXE
3. Select OK to close the Preferences window.
4. From the main menu, select Help. The TS Quality option opens the home page of the TS Quality documentation set.
5. The TS Quality Control Center Help opens the documentation for the Control Center.
6. TS Quality on the Web will automatically connect you to the trilliumsoftware.com website if you are connected to a network. Once on the website, you can access technical support, software upgrades and downloads, educational offerings and more.
7. Program-specific help is also available on the Advanced tab of each program step.

If you are a new user, be sure to register on the www.trilliumsoftware.com website for a wealth of technical user information and support.


Creating a Project
The Control Center allows you to create a standard or a custom
project. The standard project option is recommended for new users
and may later be modified to meet your specific data cleansing
needs. The custom project option is used to create a more complex
project and is recommended for more experienced users. The
Project Wizard will guide you through the project creation
process.
In order to create a TS Quality project you will need certain information:
- The name and location of your input data file(s). The input data file(s) should be either:
  - a fixed field file, or
  - a delimited file
- The name and location of your input Data Dictionary Language (DDL) file(s). The input DDL file(s) should be in either:
  - XML format (.ddx), or
  - text format (.ddt)

See The Data Dictionary Language (DDL) on page 2-33 for detailed information on DDL files.
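For orientation, here is what a small delimited input file with a header row might look like. This is a generic illustration only -- the field names and values are invented, and this is not a Trillium-specific format; your input DDL defines the real record layout. A fixed field file would instead place each field at a fixed byte position.

```python
import csv
import io

# Hypothetical pipe-delimited input with a header row. The header supplies
# field names; a fixed field file would rely entirely on the DDL instead.
sample = (
    "business_name|contact_name|phone|country\n"
    "TrilMedTech Ltd|J. Smith|555-0100|US\n"
    "TrilMedTech GmbH|A. Braun|555-0200|DE\n"
)

reader = csv.DictReader(io.StringIO(sample), delimiter="|")
records = list(reader)
print(len(records))                  # 2
print(records[0]["business_name"])   # TrilMedTech Ltd
```

When a delimited file has a header like this, the Wizard can build the DDL for you automatically, using the header values as field names.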

To create a project, follow this process:
- Select a project type
- Specify project settings
- Specify input data and input DDL files
- Set up name and address format
- Review the project summary
To select a project type
1. From the main menu select File, New Project. The Create New Project Wizard appears.
2. On the Choose Project Type window, select either the Create a standard project option or the Create a custom project option.
3. Select Next.
4. In the Choose Project Option window, select one of the following options:

Standardize - Identifies, verifies and normalizes data.

Standardize and Enrich - Identifies, verifies and normalizes data. Improves data using the Postal Matchers.

Standardize, Enrich and Link - Identifies, verifies and normalizes data. Improves data using the Postal Matchers. Groups data by identifying relationships and by applying specific linking rules.

Other Custom Process - Includes separate components comprising the options above.

To specify project settings
1. Select Next. On the Specify Project Settings window, configure the following settings:

Project Name - Name of the project.

Project Directory Path - Project location on the server. You can create a project anywhere, but it must be located on the server where the TS Quality Server application has been installed.

Single or Multiple Country Project - Specify whether the project contains data from one or multiple countries.

Input Files Country of Origin - Select a country for your input data. If you selected Multiple-Country (Global) Project in the option above, this option is not available.

2. Select Next.

Figure 2.4 Project Settings


Multi-country project
3. If you selected Multiple-Country (Global) Project, follow these steps. If not, go to step 4.
   The Select Global Project Countries window indicates which country template projects are installed on the server. Select all countries you are using, and Add them to the box on the right. Use the CTRL key to make multiple selections.
   Specify whether you are using a single input file or multiple country input files. If you are using multiple country input files, you must select define input files now or define input files later. If you define input files now, provide the input file name, format, and DDL in the Specify Multiple Inputs window. Click Next.

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

Figure 2.5 Select Global Project
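To make the delimiter note concrete, a small sketch (Python for illustration only; the record content is invented):

```python
# The five delimiters accepted by the Wizard, mapped to their characters.
# Any character not in this list must be entered enclosed in quotation marks.
VALID_DELIMITERS = {"Tab": "\t", "Space": " ",
                    "Semicolon": ";", "Comma": ",", "Pipe": "|"}

record = "TrilMedTech GmbH;A. Braun;555-0200"
fields = record.split(VALID_DELIMITERS["Semicolon"])
print(fields)  # ['TrilMedTech GmbH', 'A. Braun', '555-0200']
```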


To specify input data and input DDL
1. On the Specify Input Data and Format window, use the File Chooser to select the input data file name.
2. Specify whether the input file format is Fixed or Delimited. If the file is delimited, select the delimiter from the drop-down list and define whether the input file has a header or not.
   If you don't have a DDL file for your delimited input, it will be created automatically using the header as field names.

If you are using a delimited file as input in the Wizard, the subsequent input files and all output files in the project become fixed field files.

3. Use the File Chooser to enter the Data Dictionary Language (DDL) file name. Select Next.

Figure 2.6 Specify Input Data and Format


4. If you are creating a custom project, the Select Project Components window now appears. Select the desired project components. The order of your selections will determine the sequence of steps in your project. Select Next. (If you are creating a standard project, skip this step and go to the next step.)

To set up name and address format
1. At the end of the previous step, the Set Up Name and Address Format window appears. Here, you can drag and drop name and address field names onto the Name and Address Palette as in a typical mailing label format. The Dictionary Field Names box shows all of the field names found on the input DDL file. Select the field and drag it to the Name and Address Palette. The actual record data is displayed in the Preview Name Address area.
   After dragging selected fields to the palette, you can make multiple fields single-line by editing them in the palette.

The Apply button must be selected for the Control Center to accept your desired name and address format.

Figure 2.7 Set Up Name and Address Format

2. Review your records in the specified format, using the View Records buttons. Click Apply to accept the data format.

To review the project summary

1. Select Next. The Summary window indicates the options that you have selected for this project.

2. If you need to change these options, click Back to return to the appropriate window. Click Finish to accept these settings and create the new project.

3. The status bar at the bottom of the Control Center will indicate that it is copying the appropriate country templates and building the project components.

4. When the process is complete, the Data Flow Architect area of the Control Center will be populated with the new project.

Figure 2.8 Data Flow Architect - New Project

Understanding the Control Center Features
The Control Center consists of three layers:
Project Panel
Project Viewer
Step Viewer

Project Panel
The Project Panel is displayed when the Control Center is opened. Existing projects appear as a suitcase icon labeled with the user's hostname and the project name.

To explore the Project Panel
1. Click the close icon to close an open project and to view the Project Panel.

Figure 2.9 Project Panel


2. Right-click the project icon. From this contextual menu you can Open, Delete, or view the project's Properties. Select Properties. There are two tabs, General and Contents.

   The General tab displays basic information about the project such as Name, Type, Owner, Version, Creation Date, Last Modified, Last Executed, and Location.

   The Contents tab displays content-related information about the project such as Country List, Module List, and Comments.

Figure 2.10 Project Properties

Project Viewer
The Project Viewer displays all modules or steps within a project.

To explore the Project Viewer
1. Double-click the project icon. The Project Viewer opens.

2. The Project Viewer contains three views:

Figure 2.11 Project Viewer

   Project Components View lists the project steps: first by country, and then by steps within that country.
   Graphics View displays steps in order of processing, using a graphical flowchart format.
   List View lists steps in order of processing.

   See Using the Data Flow Architect on page 2-20 for more information about these views.

Step Viewer
In the Step Viewer you can set up the module, specify input and
output files, modify program tasks and conditions, customize rules,
run the module, and view and analyze output files, statistics and
logs.
To open the Step Viewer
1. Double-click either the module icon in the Graphics View, the module in the List View, or the module in the Project Components View.

Figure 2.12 Step Viewer

Using the Data Flow Architect


Once your project has been created, the Data Flow Architect
(DFA) presents your project in the Graphics View. The DFA lets
you review and modify the data quality process. Step modules are
displayed in a flowchart model, with connection arrows used to
identify the flow of data. You can create step connections and job
flows to run in batches. These flow charts can be customized and
printed for easy illustration of the data quality process.

Figure 2.13 Data Flow Architect - Graphics View

Graphics View
In the Graphics View, you can perform various step-specific tasks:
run, rename, and move steps
delete and connect steps
copy steps
change settings files

Figure 2.14 Menu from a Step


To run steps
1. To run a single step, right-click it and select Run Selected from the pop-up menu.

2. To run multiple steps, use CTRL+click to select several steps. Once the steps are selected, right-click and select Run Selected.

3. To run steps that are connected, right-click on the desired starting point and select Select All Downstream, All Dependencies, or Whole Flow. Once you make the appropriate selection, right-click and select Run Selected.
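The Select All Downstream option can be pictured as a walk over the step connections. Below is a minimal sketch in Python; the step names and the edge-map representation are hypothetical, for illustration only:

```python
def all_downstream(step, edges):
    """Collect a step and everything reachable through its outgoing
    connections (a depth-first walk over the flow graph)."""
    selected, stack = set(), [step]
    while stack:
        current = stack.pop()
        if current not in selected:
            selected.add(current)
            stack.extend(edges.get(current, []))
    return selected

# Hypothetical three-step flow: transformer -> parser -> matcher
flow = {"transformer": ["parser"], "parser": ["matcher"]}
print(sorted(all_downstream("transformer", flow)))
# ['matcher', 'parser', 'transformer']
```

Starting the walk from a mid-flow step (for example, "parser") selects only that step and the steps after it, which is the behavior you want when re-running part of a job flow.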
To rename steps
1. Right-click a step and select Rename from the pop-up menu.

2. Enter a unique step name and click OK.

To move steps
1. To move a single step, click and hold the step and drag it to a new location.

2. To move the entire job flow, click on the first step, hold down the CTRL key, and click all the other steps in the job flow. You may now drag the complete flow to a new location. Or, right-click a step and select the Select All Downstream option, then drag it to a new location.

To connect steps
1. To connect two steps, right-click the first step and select Start Connection, then click the second step. Or, click the connection area on the first step and click the second step to connect it.

To remove a connection
1. To remove a connection, right-click the step and select Remove Incoming Connection or Remove Outgoing Connection.

To move a Connection Area
1. Position your cursor over the connection area on the step until it changes to a cross hair. Right-click and select Move to Bottom, Move to Top, Move to Left, or Move to Right.

Figure 2.15 Step Connection Area

To copy a step module
1. To copy a step module, right-click the module and select Copy Selected.

2. In the List View, select the module to copy from the list and click the Copy Selected Step button in the toolbar above.

To change a settings file
1. To change a settings file, right-click a step module and choose Change Settings File.

2. Select the settings file you want to use to replace content in the step's current settings file. You must select a settings file of the same type. For example, if the step is a Transformer step, you must select a transfrmr.stx file.

3. A confirmation dialog will appear. Click Yes to copy the contents of the selected settings file.

Data Flow Architect Settings


In addition to the step-specific tasks, you can make changes to the
Data Flow Architect itself:
Lock steps
Add comment
Select all steps
Add new steps
Print the Data Flow Architect
Set preferences
For Preferences settings, see Set Up the Control Center
on page 2-5. Also see Changing the Control Center
Display Settings on page 17-3.

Figure 2.16 Menu from the DFA


To lock steps
1. Right-click anywhere inside the DFA (except on a specific step) and select Lock. This locks all steps into place. Remember to unlock the DFA if you wish to add, delete, or move a step.

To add a comment
1. Right-click anywhere inside the DFA (except on a specific step) and select Add Comment.

2. Enter a comment in the Edit Comment window and click OK. The comment is inserted in the DFA window. You can also drag this comment to another location.

3. To edit comments, right-click on the comment and select Edit, Resize, Hide, or Delete. You can also select Show All Comments, Show Comment Borders, and Delete All Comments from the DFA menu.

To select all steps
1. Right-click anywhere inside the DFA (except on a specific step) and select Select All Steps. This selects all steps in the DFA.

To add new steps
1. Right-click anywhere inside the DFA (except on a specific step) and select Add New Step from Palette. This opens the Step Palette on the left side of the DFA.

2. Select a step from the Step Palette and drag and drop it on the DFA. Choose a country, provide a name for this step, and then click OK.

To print the Data Flow Architect
1. Right-click anywhere inside the DFA (except on a specific step) and select Print Data Flow Architect. The Page Setup window opens.

2. Specify the page settings and click OK. You have several print options:
   Display landscape paper boundary
   Display portrait paper boundary
   Display architect title imprint

List View
In the List View, you can view steps in the order in which they will
be processed. A step may be opened by double-clicking it. From the
List View you can perform several tasks:
Open, rename, add, delete, and reorder steps
Generate a batch script to run selected steps
For information on batch scripts, see Batch Script on page 14-3.
To open the List View
1. Select the List View tab to view the project steps.

2. Click a step in the List View and the toolbar options become available.
Figure 2.17 List View Tab


To open the step
1. In the List View, highlight a step.

2. Click the open icon on the toolbar.

To rename the step
1. In the List View, highlight a step.

2. Click the rename icon on the toolbar.

3. In the Provide a Unique Step Name box, enter the new name for the step. Click OK.

To add steps
1. In the List View, highlight a step.

2. Click the add icon on the toolbar. The Step Palette appears on the left. Drag and drop the desired step into the List View.

3. In the Choose Country Name box, select a country from the drop-down list. Click OK.

4. The new step is added after the step you highlighted.

To delete steps
1. In the List View, highlight one or more steps.

2. Click the delete icon on the toolbar.

To move steps
1. In the List View, highlight one or more steps.

2. Use the up and down arrow buttons to move the selected steps into the desired order for processing.

Using a Project Step


A project contains a series of steps. The configuration of a step window is the same for all steps, so the following procedures apply to all modules.

To open a step
1. Double-click the step in the Project Steps By Country list, or double-click the step icon in the Data Flow Architect pane. The Step Window appears. The Step Window contains three tabs:
   Input Settings
   Output Settings
   Results

The input, output, and other settings are explained in detail for each step in subsequent chapters. This section provides information on the general procedures for a project step.

Input Settings tab

Use the Input Settings tab to specify the Input File Name and Input DDL Name.

To specify input files
1. Type a file name in the Input File Name and Input DDL Name text boxes. You can use the File Chooser button to select the files.

2. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

To replace the input files
1. Type a file name in the Input File Name and Input DDL Name text boxes. You can use the File Chooser button to select the input files.

2. Click Replace. The file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

To delete the input files
1. Highlight the row in the Input Data File Name and Input DDL Name columns that contains the file names you want to delete.

2. Click Delete.

The Data Browser can be invoked to browse the input file. The Dictionary Editor can be invoked to view or edit the DDL. The Comment icon allows the user to add comments and notes related to the step.

Figure 2.18 Step Window - Input Settings

Output Settings tab

The Output Settings tab lets you specify the Output File Name,
the Output DDL Name, the Statistics File Name, and the
Process Log Name.
To specify output files
1. Type a file name in the Output File Name and Output DDL Name text boxes. You can use the File Chooser button to select the files.

2. Type a file name in the Statistics File Name and Process Log Name text boxes. You can use the File Chooser button to select the files.

Figure 2.19 Step Window - Output Settings


Advanced Settings

Most step configurations are made in the Advanced Settings window. Advanced Settings options allow the user to customize settings for each step. The appearance of the Advanced Settings window varies depending on the step.

To open Advanced Settings
1. Click the Advanced... button from the step.

Figure 2.20 Step Window - Advanced Settings


Results tab

The Results tab displays output information related to the step's execution.
   Statistics - The Statistics tab shows statistics from the run, which may be viewed using the My Statistics Viewer icon or the Spreadsheet Viewer. (You can specify the editor to use as the My Statistics Viewer when setting up Preferences for the Control Center.) The Spreadsheet Viewer displays the statistics in an MS Excel format.
   Process Log - The Process Log tab displays processing statistics from the step run.
   Error Log - The Error Log tab displays any errors encountered during the step run. Process and Error Logs may be viewed using the Text Viewer.

Figure 2.21 Step Window - Results


If the Process Log exceeds the capacity of the window, you
can click the Text Viewer icon to display the entire file in
a separate window.

Save and Run a Step

After you finish configuring your settings, you can save your settings without running the step, or run the step.

To save a step without running
1. Click Save to save your settings.

To run a step
1. Click Run at the bottom of the step, or right-click the step icon and select Run Selected. Clicking the Run button saves your settings by default and then runs the program.

   Select the Save button on a step to save any changes made to the settings if you are not going to Run the step. Changes are automatically saved when a step is Run.

The Data Dictionary Language (DDL)

The Data Dictionary Language (DDL) is a collection of English statements used to define file and record layouts. DDLs are used throughout the TS Quality system. A file that contains DDL components is called a DDL file. DDL files are either in XML format or in text format.

XML Format
File extension is .ddx (example: input.ddx)

Text Format
File extension is .ddt (example: input.ddt)

See Chapter 2 in Getting Started with TS Quality for the location of default DDL files in the directory structure.

Methods of Creating a DDL

You can create DDLs by the following methods:

Data Dictionary Editor (DDL Editor)
You can use the Data Dictionary Editor in the Control Center to create a DDL or modify an existing DDL. The default format for the DDL Editor is XML. Users can convert XML files to text files, or text files to XML files, in the DDL Editor.

Any Text Editor
You can use any text editor to create DDLs in text format. Special text formatting, such as underline or bold, should not be applied because the software will be unable to read it.

Delimited Files Considerations

The input and output files for TS Quality can be fixed-field files or delimited files. Internally, the delimited file's records are put into a fixed format for processing according to the DDL. For delimited files, every field in the DDL should reflect the maximum field length.

For example, if you have a field on input called ADDR_LINE_1 and the value is "10 Main St", then a field length of 10 bytes for that field will be sufficient, but a field length of 8 bytes will truncate it to "10 Main ". If you have that field on output and the line was changed to "10 Main Street" by processing, then a field length of 10 will truncate the output to "10 Main St". Make sure that you have enough field length for each field on the DDL for delimited files.
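The pad-or-truncate behavior described above can be sketched in a few lines of Python. The helper below is illustrative only (it is not part of TS Quality), but it reproduces the ADDR_LINE_1 example:

```python
def to_fixed(values, lengths):
    """Pad or truncate each delimited value to its DDL field length,
    producing one fixed-format record."""
    return "".join(value[:length].ljust(length)
                   for value, length in zip(values, lengths))

# A 10-byte ADDR_LINE_1 holds "10 Main St" exactly...
print(repr(to_fixed(["10 Main St"], [10])))      # '10 Main St'
# ...but silently truncates the longer corrected address.
print(repr(to_fixed(["10 Main Street"], [10])))  # '10 Main St'
```

Because the truncation is silent, sizing every field to its maximum expected length is the only safe choice for delimited input and output.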

Keywords in a DDL
A DDL uses the keywords shown in Table 2.1. Required keywords
are listed in bold.
Table 2.1 DDL Keywords

Record Name - The name of a record in the DDL, 1 to 32 characters long. If it contains embedded spaces, it must be enclosed in double quotes.

Record Length - The total record length in bytes. The total length of the record must be equal to the sum of the lengths of all fields.

Field Name - The name of the field. If it contains embedded spaces, it must be enclosed in double quotes. At least one field statement per file is required. Maximum 32 bytes. Field names should only contain letters, numbers, and underscores.

Type - Data type for the field. You can specify the appropriate character encoding or other type of value. See Type Keyword on page 2-42.

Redefine - Redefine the field to a specific byte position in the record. See The REDEFINE Function on page 2-40.

Start Position - The relative byte position of a field within the record. DDLs are zero-based; therefore, the first field of a record generally begins in column zero.

Length - The length of a field in bytes. The number must be a positive integer greater than zero. If the entity is a field, the length must be less than the Record Length. Two fields cannot occupy the same space, unless one field is a redefinition of the other. If the entity is a subfield, the length must be less than that of the parent field. The sum of all field lengths must equal the length of the record.

Default - The default value for the field. The value must agree in type with the Type. Numbers may be positive or negative. Values:
   SPACES fills the field length with spaces.
   -1 for a numeric with a negative value.
   0 for a numeric.
   '0' for a character field.
   "0" for a string field.

Comment - The comment for the field.

Attributes - Allows data in the field to be passed through a TS Quality step without any data interpretation or translation. Any field type can be used because there will be no data translation. Value:
   NOVALIDATION - data in the field will remain as is.

CLASS - Converts any 2-digit year into a 4-digit year. If used, it must immediately follow a Field statement, and it is required to be on the input DDL. Values:
   DATE BACKWARD
   DATE FORWARD
   DATE WINDOW {nnn}
   See CLASS Keyword on page A-10 to learn more about the Class specifications.
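The 2-digit to 4-digit year expansion performed by CLASS can be illustrated with a short sketch. The pivot value and the century mapping below are assumptions made for illustration; the authoritative DATE BACKWARD, DATE FORWARD, and DATE WINDOW rules are in Appendix A:

```python
def expand_year(yy, rule="WINDOW", pivot=50):
    """Illustrative 2-digit to 4-digit year expansion.
    The pivot and century choices here are assumptions, not the
    documented CLASS behavior."""
    if rule == "BACKWARD":
        return 1900 + yy          # treat every 2-digit year as 19xx
    if rule == "FORWARD":
        return 2000 + yy          # treat every 2-digit year as 20xx
    # WINDOW: years below the pivot go to 20xx, the rest to 19xx
    return 2000 + yy if yy < pivot else 1900 + yy

print(expand_year(3))    # 2003
print(expand_year(75))   # 1975
```

Whatever the exact rule, the point is the same: the input DDL must declare the conversion, because a bare 2-digit year is ambiguous once records from different decades are mixed.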

Creating a DDL Using the DDL Editor


You can create a DDL from a fixed-field data file or a delimited data
file. Complete the following steps to create a new DDL.
To create a DDL from a fixed-field data file
1. Open the DDL Editor from the Control Center. Select New from the File menu. A new empty DDL opens.

2. Select DDL Builder from the Tools menu. In the Select Data File section, enter the file name and record length, and select the encoding for your data file.

Figure 2.22 DDL Builder

3. In the Record section, highlight a portion of the record you want to make a field in the DDL. The Start and End Position automatically appear in the windows.

4. Specify the Field Name and select the Field Type.

5. Click Add to DDL. The new field will be added to the DDL table.

6. Repeat this process until all fields are defined in the DDL.

7. Save the DDL, using a .ddx extension for the dictionary file name.


To create a DDL from a delimited data file

If you are using the Project Wizard to create a project and you don't have a DDL file for delimited input, it will be created automatically using the header as field names.

1. Open the DDL Editor from the Control Center.

2. Select New from the File menu. A new empty DDL opens.

3. Select Tools, Create DDL from Delimited File. Select the delimited filename and delimiter.

4. Specify the output DDL filename. The first part of the delimited file will be displayed in the Sample Data Preview window.

Figure 2.23 Generate Dictionary from Delimited File

5. Click Create. The new DDL will be automatically created. Save the DDL using a .ddx extension for the dictionary file name.

For delimited files, every field in the DDL will reflect the maximum field length.
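The automatic generation described above can be approximated in a short Python sketch. The field-sizing heuristic here (size each field to the longest value observed, including the header) is an assumption for illustration, not necessarily what the DDL Editor does:

```python
import csv
import io

def ddl_from_delimited(text, delimiter=","):
    """Emit a text-format DDL for a delimited file, taking field names
    from the header row and sizing each field to the longest value
    seen in the sample."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    lengths = [max(len(name), *(len(row[i]) for row in data))
               for i, name in enumerate(header)]
    lines = ["Type is FIXED", f"Length is {sum(lengths)}", ""]
    position = 0
    for name, length in zip(header, lengths):
        lines += [f"Field is {name}", "Type is ASCII",
                  f"Starts in column {position}", f"Length is {length}", ""]
        position += length
    return "\n".join(lines)

sample = "NAME,CITY\nAnn Smith,Boston\nBob,Springfield\n"
print(ddl_from_delimited(sample))
```

Note how the record length is simply the sum of the field lengths, and each Starts in column value is zero-based, matching the keyword rules in Table 2.1.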

Creating a DDL in a Text Editor

When creating a DDL in a text editor, make sure to include the keywords and follow the set grammar that must be used in creating DDLs.

Syntax
Use the following syntax:
   Keyword [is, are, in] Parameter

Keywords are case-insensitive.
   For example, the following keywords all mean the same thing: "Field", "FIELD", and "field".

Brackets
   The actual brackets [ ] are not physically entered in a DDL file. Punctuation and noise words such as "is", "are", and "in" can be used; they are highly recommended to make subsequent reading more understandable.

Parameters are case-sensitive.
   All name and string value parameters are case-sensitive. String values are enclosed within double quotes (example: "Hello World").

Tab characters are not allowed in a DDL.

Always define until the last carriage return.

Comments can be enclosed between the string pairs "/*" and "*/", or can be indicated by the prefix string "//".

Example
/* This is a comment that extends over two lines
delimited by the slash and asterisk pairs */
//This is a comment to the end of this line
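A minimal sketch of how a reader might tokenize one such statement, assuming only the grammar rules listed above (case-insensitive keywords, optional is/are/in noise words, and the two comment styles); this parser is illustrative and is not part of TS Quality:

```python
import re

def parse_statement(line):
    """Read one DDL statement: strip /*...*/ and // comments, accept
    case-insensitive keywords, and treat is/are/in as optional."""
    line = re.sub(r"/\*.*?\*/|//.*", "", line)
    m = re.match(r"\s*(\w+)\s+(?:(?:is|are|in)\s+)?(.+?)\s*$", line, re.I)
    return (m.group(1).upper(), m.group(2)) if m else None

print(parse_statement("Field is input_line_1"))  # ('FIELD', 'input_line_1')
print(parse_statement("LENGTH 200  // bytes"))   # ('LENGTH', '200')
```

A comment-only line yields no statement at all, which is why comments can safely be interleaved anywhere in the file.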

DDL Components in Text Format

A text DDL consists of two main sections: Record information and Body information.

Record information:
Type is FIXED
Length is 200

Body information:
Field is input_line_1
Type is ASCII
Starts in column 0
Length is 50

Field is input_line_2
Type is NOTRANS
Starts in column 50
Length is 50
Default is 0

Field is input_line_3
Class is DATE FORWARD
Type is ASCII
Starts in column 100
Length is 50

Field is input_line_4
Type is ASCII
Starts in column 150
Length is 50
Attributes are NOVALIDATION

The REDEFINE Function

By redefining fields with the REDEFINE keyword, you can use part of the field, or the same field with a different name, in the output. Redefining fields requires listing two fields: the field to be redefined, followed by a field listing that is the redefinition.

The Starts in position may be maintained manually. However, automatic renumbering of the Starts in position is facilitated through the //REDEFINE statement. When the Recalculate Positions function in the DDL Editor encounters the string //REDEFINE ahead of a pair of field definitions, it will not increment the Starts in number for the second field definition.

If you are using a delimited file for input, you cannot use the Redefine function on the output DDL.

Type is FIXED
Length is 200

//REDEFINE
Field is ORIGINAL_RECORD
Type is ASCII
Starts in COLUMN 0
Length is 200

Field is input_line_1
Type is ASCII
Starts in COLUMN 0
Length is 100

Field is input_line_2
Type is ASCII
Starts in COLUMN 100
Length is 100
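The renumbering rule can be sketched as follows. The field-list shape (a list of dicts with a redefine flag) is a hypothetical representation chosen for illustration, but the computed positions reproduce the ORIGINAL_RECORD example:

```python
def recalculate_positions(fields):
    """Sketch of Recalculate Positions: a field flagged //REDEFINE is
    assigned the running position but does not advance it, so the
    field(s) that redefine it start at the same byte offset."""
    position = 0
    for field in fields:
        field["start"] = position
        if not field.get("redefine"):
            position += field["length"]
    return fields

layout = recalculate_positions([
    {"name": "ORIGINAL_RECORD", "length": 200, "redefine": True},
    {"name": "input_line_1", "length": 100},
    {"name": "input_line_2", "length": 100},
])
print([(f["name"], f["start"]) for f in layout])
# [('ORIGINAL_RECORD', 0), ('input_line_1', 0), ('input_line_2', 100)]
```

In other words, the redefined field and its first redefinition share a start position, and subsequent fields resume normal incrementing.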

Type Keyword

The Type is required for every field entity. There are two Type categories: encoding (code page) and date format. The following list shows the main values used for the Type keyword.

Encoding (Code Page)
   Encoding is a mapping of binary values to code positions to represent characters of data. It is also called a code page. The main character encodings used in TS Quality include ASCII, Latin1, and Latin2. See Appendix A for the complete list of encodings.

Date Format
   Date format is a type of data which may contain only valid dates. See Appendix A for the complete list of date formats.

Class Keyword
   The Class keyword specifies the format to be used for the date field. By using the Class keyword, you can convert any 2-digit year into a 4-digit year. See Appendix A for the complete list of Class keywords.

CHAPTER 3

Investigating Your Data

After you create a project, you must investigate your data before
working with any processes. Investigation helps you determine how
well your data conforms to rules that govern acceptable limits and
requirements for data elements, and helps you understand what
data quality processes need to be put in place. Investigate your
data with the Data Browser, DDL Editor, and TS Discovery.
This chapter focuses on four tasks:
View data using the Data Browser
View DDL using the DDL Editor
Analyze data using TS Discovery
Identify problems with the data

View Data Using the Data Browser


For detailed
information
on the Data
Browser, see
the Online
Help.

The Data Browser lets you view a data file to verify its format as
described by the data dictionary language (DDL) file. You can verify
the format on either a record-by-record or on a field-by-field
basis.
To open the Data Browser and view the input data
1.

Double-click the projects suitcase icon in the Data Flow


Architect and open the project. Existing projects are shown
as a suitcase icon with the users hostname and project
name.

Figure 3.1 Project in the Data Flow Architect


2. Double-click the first step (for example, inputTransformer) to open the step.

Figure 3.2 inputTransformer Step


3. On the Input Settings tab, select the first entry in the entry listing options. The input file name and corresponding DDL file name will already be populated. These files were specified during the Creating Project Wizard process (see Creating a Project on page 2-9).

Figure 3.3 Input Settings Tab


4. Select the Data Browser icon next to the input file name. The Data Browser opens the input file with its corresponding DDL.

   You can also open the Data Browser from the Tools Palette by double-clicking the Data Browser icon. In this case, you must select the input file and DDL to view. Opening the Data Browser from within a step automatically opens the tool, the file, and its corresponding DDL.

You can sort the fields: click on the Field Name, Start Position, Length, or Type column headers.

5. The Field Selection window opens. This window shows all the fields that exist in the input DDL.

6. Select the fields you want to display in the upper pane and click Add. To select all the fields, click Add All.

7. After the fields appear in the Selected Fields list box, you have several options:
   Clear all the fields by clicking Clear
   Change the position of a field or delete it by selecting the field, then clicking the up or down arrow button to move it, or the Delete button to delete it

   The order of the fields determines the order in which they will be displayed when you browse the records.

   Save the selected fields (called the view) in a file by clicking the Save button. See the next procedure, To save the view, for more details on saving the fields.

Figure 3.4 Field Selection Window


8. Click Display.


9. Browse the data and verify that the field names reflect the data contained within them.

   You can display the data by Record Numbers or Byte Offsets; select either option in the Options menu.

Figure 3.5 Input Data


To save the view
You can save or store a view of data in the Data Browser. If you frequently look at the same fields in a file, saving a view can save time.

1. In the Field Selection window, select the fields you want to view using the CTRL key. For example, select Phone, Country, Start_date, and Product_type.

2. Select Add to add the selected fields.

3. Click Save to save the selected fields.


4. The Save window opens. Name this view and save it in the desired directory. The view file will have the extension .cuv.

Figure 3.6 Save View File


To view a stored view
1. To view a stored view, click Load in the Field Selection window. The Customized View window will show all stored views.

2. Click the view name and select OK. The fields will be loaded in the Selected Fields window. Select Display to view the stored fields.

Figure 3.7 Customized View Window


3. Select File, Exit and close the Data Browser.

View DDLs Using the DDL Editor


For detailed
information
on the DDL
Editor, see
the Online
Help.

The Data Dictionary Editor (DDL Editor) lets you view existing
data dictionary language (DDL) files.
To open the DDL Editor and view a DDL
1.

On the Input Settings tab, click Dictionary Editor


next
to the input DDL name. The DDL Editor will open the input
DDL file. The DDL is displayed in a table.

Figure 3.8 Data Dictionary Editor


2. The upper frame shows the Record Name, Record Length, and Update ORIGINAL_RECORD Length option:

   Record Name - The record's name.
   Record Length - Total length of the record represented by this DDL, in bytes.
   Update ORIGINAL_RECORD Length - The ORIGINAL_RECORD length update option. See the Online Help for details.

3. The lower spreadsheet shows the details of the selected DDL. You can edit all items in the columns of this table; see the Online Help for details. Refer to the following table and verify each item in the DDL:

   Field Name - DDL fields listed row by row, in the order that they appear in the DDL. The standard field names are displayed in blue; other unique field names are displayed in black.
   Type - Field type (encoding). See Encoding (Code Page) on page A-3 for details of encoding.
   Redef (Redefine) - Indicates whether the field is redefined to a specific byte position in the record. Y = field is redefined; blank = field is not redefined.
   Start Pos. - The zero-based byte position where the field begins in the record.
   Length - The length of the field in bytes.
   Default - Default value for the field.
   Comment - The comments for a field.
   Attribute - Indicates whether the data is to be passed through a step without any validation. NOVALIDATION = data in the field remains as is, even if it is in a different field type.
   Class - Converts a 2-digit year into a 4-digit year.

4. Select File, Exit and close the DDL Editor.

5. Click the close button in the upper right-hand corner to close the step.

Analyze Data Using TS Discovery


TS Discovery is a data profiling tool used to discover and analyze data quality. If you want to analyze data in more detail to reveal data anomalies, broken data rules, misaligned data relationships, and other characteristics, we recommend using TS Discovery before running other TS Quality processes.

One (1) license for TS Discovery is included with the TS Quality Client. You can launch TS Discovery by clicking the TS Discovery icon on the Control Center toolbar.

Instructions for TS Discovery are not included in this book. Refer to the TS Discovery manuals for more information.

Figure 3.9 TS Discovery


Identify the Problems with Data


By browsing the input data and input DDL files, you can identify many issues with data, such as misspellings, inconsistent formats, incorrect entries, and duplicate records. Depending on the problems with the data, you must decide which cleansing and standardization processes are necessary.

For example, the following issues have been identified in the data in the sample TMT project:

•  The input file contains data from multiple countries (US, CA, DE, GB)
•  The Phone number field has variations in the phone formats
•  The country names in the Country field are inconsistent
•  The date format in the Start_date field is different from the date format in the Last_contact_date field
•  There are different values for the same products in the Product_type field
•  There appear to be misspelled addresses in the data
•  There are duplicate records across the data

These issues will be corrected in the subsequent chapters of this guide. First, the global data is separated into four (4) input data files. Next, data cleansing and standardization are performed at each country level. After the addresses are validated and corrected, the records are linked to identify the duplicate data. At the end of the process, the best records with the most recent information will be output, and a batch script for the entire process will be created for production use.


CHAPTER 4

Using the Global Steps

After you have investigated the data and identified the issues, you can begin to process the data. First, use the Global Data Router to separate the multi-country input file into country-specific files. One advantage of running the Router step before cleansing and standardizing your data is that it enables data to be standardized at the country level. This ensures that further processing is done at a country-specific level.

In this chapter, you will perform these tasks:

•  Specify the input and output files
•  Identify the rules files used to determine the country of origin
•  Identify the Global Geography table, which contains state, city, locality, post code and word/pattern structures
•  Define the settings for the Global Data Router. These include the ability to:
   -  Use a Country Code field to identify country of origin
   -  Review the country list and determine the countries which are available to the Global Data Router
   -  Modify the default list of fields to scan for the country of origin
•  Run the Global Data Router and view results


Using the Global Data Router


The Global Data Router scans an input file that contains record data from more than one country, identifies the country-specific data, and then creates one output file per country that contains only the data specific to the country you selected.

The Global Data Router uses Rules Files that contain country-related word definitions and tables. These rules specify how many output files to generate and which countries are identified. The Router supports input data from most countries.
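The routing behavior described above can be sketched as follows. This is an illustrative Python sketch, not Trillium code: `detect_country` stands in for the rules-based country identification, and the `<filename>_<country>` bucket naming follows the output naming convention described later in this chapter.

```python
# Illustrative sketch (not Trillium code) of the Global Data Router idea:
# group records into one bucket per detected country; records whose
# country cannot be determined go to a NOMATCH bucket.
def route_records(records, detect_country, base_name="customers"):
    buckets = {}
    for rec in records:
        country = detect_country(rec)          # e.g. "us", "ca", or None
        key = f"{base_name}_{country}" if country else "NOMATCH"
        buckets.setdefault(key, []).append(rec)
    return buckets

# Example: route by an explicit country-code field.
recs = [{"country": "US"}, {"country": "CA"}, {"country": ""}]
out = route_records(recs, lambda r: r["country"].lower() or None)
# out has keys "customers_us", "customers_ca", and "NOMATCH"
```

In the real Router, country detection also scores free-text name and address lines against the rules files, not just an explicit country-code field.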

Input and Output Settings


Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

Since the Global Data Router step is usually the first step in the
project, it uses the Input File Name and Input DDL Name specified
in the Project Wizard as the default inputs.
To specify input and output files

1.  Open the Global Data Router step and click the Input Settings tab.

2.  Enter file names in the Input File Name and Input DDL Name text boxes.

3.  Click the Output Settings tab.

4.  Enter file names in the Output File Name and Output DDL Name text boxes.

    Separate Output
    If you want a separate output file for each country, select Generate a separate output file per country. When this option is selected, an underscore (_) and an asterisk (*) will be added automatically to the file name you specified in the Output File Name text box. After processing, each output file name will include a country suffix in lower case. For example, the US data will be named <filename>_us, and the Canadian data will be named <filename>_ca.


    Single Output
    If you are generating a single output file for all countries, deselect Generate a separate output file per country. In this case, all data, separated by country, will be written to the single output file you specified.

    If you provided Name and Address data in the Project Wizard, the output DDL will contain a series of redefines for the Name and Address data. Redefines are used to map the customer-defined field name to the TS Quality Name and Address reserved DDL field name. The input fields are mapped to the reserved TS Quality name and address field names INPUT_LINE_01 through INPUT_LINE_10. If a name or address line contains multiple fields, the input fields are mapped to INPUT_LINE_02a, INPUT_LINE_02b, etc.

A red flag indicates a REQUIRED field for this operation.

5.  Enter file names in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have a unique file qualifier.

1.  Click Advanced and navigate to Input, Settings.

2.  Select Input Data File Qualifier (default is INPUT).

3.  Click Advanced and navigate to Output, Settings.

4.  Select Output Data File Qualifier (default is OUTPUT).

You may also specify the following settings:



To specify the NOMATCH file

The NOMATCH file contains records where the Global Data Router was not able to determine the country of origin.

1.  Click Advanced and navigate to Process, Settings.

2.  Locate Nomatch File and specify the file.

To specify the starting record

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Start at Record. This specifies the record in the input data file at which the Global Data Router will begin processing (default is 1).

To specify the maximum number of records to process

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every nth record only

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.
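The three input sampling settings (Start at Record, Process a Maximum of, Process Nth Sample) can be combined. The sketch below shows one plausible reading of how they interact; it is an illustration, not Trillium code, and the exact interaction in TS Quality may differ.

```python
# Sketch (not Trillium code) of combining Start at Record,
# Process a Maximum of, and Process Nth Sample.
def sample_records(records, start_at=1, maximum=None, nth=1):
    selected = []
    for i, rec in enumerate(records, start=1):
        if i < start_at:
            continue                      # skip until the starting record
        if (i - start_at) % nth != 0:
            continue                      # keep only every nth record
        selected.append(rec)
        if maximum is not None and len(selected) >= maximum:
            break                         # stop at the maximum
    return selected

# Records 1..10, starting at record 3, every 2nd record, at most 3:
print(sample_records(list(range(1, 11)), start_at=3, nth=2, maximum=3))
# → [3, 5, 7]
```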

To use a delimited file

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed within quotation marks.

If you are using a delimited file for input and/or output, you must specify delimited settings.

1.  Click Advanced and navigate to Input, Settings.

2.  Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3.  For output, click Advanced and navigate to Output, Settings.

4.  Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.
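For readers unfamiliar with delimited data, the sketch below shows what a pipe-delimited file looks like when parsed. This is an illustration only (Python's standard csv module); TS Quality parses delimited input internally according to the settings above.

```python
# Illustration of pipe-delimited data; not how TS Quality reads files.
import csv
import io

data = io.StringIO("name|phone\nJane Smith|207-555-4423\n")
rows = list(csv.reader(data, delimiter="|"))
print(rows[1])   # → ['Jane Smith', '207-555-4423']
```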

Process Settings
Once you have specified input and output files, you are ready to
specify the settings to process your data. Do this in the Advanced
Settings window.


Rules Files

The Global Data Router uses two rules files to determine country of origin. Rules files contain entries that define the resource tables used by the Global Data Router program, as well as country-specific data.

•  Global Rules File: Defines rules that apply to all countries. It also contains translation tables, street types, city definitions, and other rules that require lengthy entries.
•  Country Rules File: Defines rules that apply to specific countries.

See Global Data Router in the TS Quality Reference Guide for details of these rules files.
To specify the Rules Files

1.  Open the Global Data Router step.

2.  Click Advanced and navigate to Process, Settings.

3.  Locate the Global Rules File and Country Rules File and specify the files.

    Default Global Rules File:
    \TrilliumSoftware\tsq10r5s\tables\general_resources\rtrules1.win

    Default Country Rules File:
    \TrilliumSoftware\tsq10r5s\tables\general_resources\rtrules2.win

You can edit the Rules Files. You may also use the Customer Rules File, which allows you to add your own user-defined rules. See Global Data Router in the TS Quality Reference Guide for details.

Global Geography Table

In addition to the Rules Files, the Global Data Router uses a Global Geography Table that contains state, city, locality, post code and word/pattern structures. This table is read-only and may not be changed.


To specify the Global Geography Table

1.  Click Advanced and navigate to Process, Settings.

2.  Locate the Global Geography Table and specify the file.

    Default Global Geography Table:
    TrilliumSoftware\tsq10r5s\tables\general_resources\GLOBRTR.tbl

For China, Japan, Korea, and Taiwan, you must specify the APGLBRTR.tbl geography file using the Global Geog APAC File Name settings. If you want to include other countries such as the US, you must also specify the regular GLOBRTR.tbl geography file.

Country Settings

If the data has a country code field, you must specify the field name for the country code. This ensures that the Global Data Router uses the data in this field to identify and score the country of origin.

To specify a Country Code Field

1.  Click Advanced and navigate to Process, Settings.

2.  Locate the Country Code Field and select the appropriate field name from the drop-down list.


Figure 4.1 Country Code Field


To review the country list

Make sure the Country List identifies the valid country choices for your data.

1.  Navigate to the Country List, Country settings. The Country Names are automatically entered based on your selection in the Project Create Wizard.

2.  Review the list and confirm that the Country List identifies the valid country choices for your data.

Figure 4.2 Country List


Fields Settings

You must tell the Global Data Router which fields contain country of origin data. When there is no valid country code or the country code is suspect, the Field Settings determine which fields the Global Data Router will inspect.

To specify fields to scan for country of origin data

Navigate to Fields, Field. Select the field name that contains information for country of origin. If you have a valid country code field, you can select that field. This means that the program will only scan that field for country of origin data.

Figure 4.3 Field Settings

DDL Settings

If you choose, you can specify separate output DDLs for each country. If this is not specified, the output DDL specified in the Output Settings will be used.

To specify a separate DDL for each country

1.  Click Advanced and navigate to DDL, Settings.

2.  Select the DDL file for each country from the drop-down list.


Additional Settings

You can specify the following additional settings:

See Global Data Router in the TS Quality Reference Guide for the complete settings information.

To enable the debug function

1.  Click Advanced and navigate to Process, Settings.

2.  Select Enable Debug Output.

3.  In the Debug File text box, accept the default path and file name, or enter a new file name. Debugging information will be written to this file.

To count the number of records processed

1.  Click Advanced and navigate to Process, Settings.

2.  Enter a value in the Sample Count text box. This value determines how frequently TS Quality will report while processing data. The number that you enter is the number of records that TS Quality will process before printing a progress report to the screen. For example, if you enter 50, TS Quality will print a message after processing 50, 100, 150 records, and so on.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.
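The Sample Count behavior amounts to a simple modulo check while processing, as the hypothetical helper below sketches. This is not Trillium code; `process_with_progress` and its `report` callback are illustrative names.

```python
# Sketch (not Trillium code) of Sample Count: report progress every
# sample_count records while processing.
def process_with_progress(records, sample_count=1, report=print):
    processed = 0
    for rec in records:
        processed += 1                 # ... process the record here ...
        if processed % sample_count == 0:
            report(f"Processed {processed} records")
    return processed
```

With `sample_count=50`, the report callback fires after records 50, 100, 150, and so on, matching the example in the text.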

To specify settings file encoding

1.  In Settings File Encoding, select the correct encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.


Run the Global Data Router and View Results

To run the Global Data Router and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1.  Click OK to close Advanced Settings.

2.  Click Run to run the Global Data Router. You can also right-click on a step and select Run Selected.

3.  Select OK.

4.  On the Results tab, select the Statistics sub-tab. The Statistics sub-tab will show the number of records included in each country-specific file. The NOMATCH file contains any records where the Global Data Router was unable to determine the country of origin.

Figure 4.4 Global Data Router Statistics



CHAPTER 5

Cleansing Your Data

After you separate the input data into country-specific data, you can start the cleansing process. This chapter explains how to cleanse the data using the Transformer.

In this chapter, you will perform these tasks:

•  Specify the input and output files
•  Use character translation to convert particular hexadecimal values
•  Use field scanning to change field values
•  Use table recoding to recode the values in a field using a literal or mask shape
•  Use conditionals to control the field scan and table recode settings
•  Run the Transformer and review the results


Using the Transformer


The Transformer converts input data from one or more files and formats to a single output, based on fields specified by one or more Data Dictionary Language (DDL) files. The Transformer lets you convert and merge records from up to ten input files into a single, standard format.

The Transformer performs several functions:

•  Scan data records for defined shapes (masks) and literal values, and then move, recode, or delete the data
•  Apply sophisticated conditional logic to perform an unlimited number of data transformations
•  Modify field lengths
•  Recode character fields, based on a user-defined external table
•  Identify and separate records that reject the conversion process so that they can be more closely examined

Input and Output Settings


The Transformer uses the output from the Global Data Router step
as input. If the Transformer step is the first step in your project, it
will use the Input File Name and Input DDL Name specified in the
Project Wizard as the default inputs.
To specify input and output files

1.  Open the Transformer step and select the Input Settings tab.

2.  Specify a file name in the Input File Name and Input DDL Name text boxes.

3.  Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

    - OR -

    Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

    The Transformer can use up to ten input files simultaneously.

4.  Navigate to the Output Settings tab.

5.  Specify a file name in the Output File Name and Output DDL Name text boxes.

6.  Specify a file name in the Statistics File Name and Process Log Name text boxes.

If you provided the Name and Address data during the Project Wizard, the output DDL will contain a series of redefines for the Name and Address data. Redefines map the customer-defined field name to the TS Quality Name and Address reserved DDL field name. The input fields are mapped to reserved TS Quality name and address field names INPUT_LINE_01 through INPUT_LINE_10. If a name or address line consists of multiple fields, the input fields are mapped to INPUT_LINE_02a, INPUT_LINE_02b, etc.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1.  Click Advanced and navigate to Input, Settings.

2.  Specify Input Data File Qualifier (default is INPUT).

3.  Click Advanced and navigate to Output, Settings.

4.  Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:


To specify multiple input files

If you have multiple input files, make sure that the settings will be applied to your desired input file.

1.  Click Advanced and navigate to Input, Settings.

2.  Select the appropriate input file from the Input Files text box at the top.

Figure 5.1 Transformer Multiple Input Files

3.  Specify your settings for the desired input file.

To specify an exceptions file

1.  Click Advanced and navigate to Input, Settings.

2.  In the Exceptions File text box, accept the default file or specify the path and name of the file that contains exception records. Exception records contain data such as incorrect records or field types.

To use a delimited file

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

If you are using a delimited file for input and/or output, you must specify delimited settings.

1.  Click Advanced and navigate to Input, Settings.

2.  Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3.  For output, click Advanced and navigate to Output, Settings.

4.  Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.


To specify the origin of record

The File Source and Source Field work together. If you specify one of these values, you must specify the other value. If you delete one of these values, you must delete the other value.

1.  Click Advanced and navigate to Input, Settings.

2.  In File Source, enter text to specify the origin of the data file.

3.  Select File Source Encoding from the drop-down list.

4.  Navigate to Output, Settings. In Source Field, select the DDL field to receive the origin of record you specified in File Source.

To specify the starting record

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every nth record only

1.  Click Advanced and navigate to Input, Settings.

2.  Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.


Using Multiple Input Files to Create an Output DDL

You can specify up to a maximum of ten (10) input files and their associated DDLs and use these to create a common output file for later processing by modules downstream in your workflow. This process requires that, after you specify the input files, you map input fields from the associated DDLs to a common output DDL file.
To add multiple input files and map fields

1.  Double-click a Transformer step to open the Transformer Input Settings window.

2.  In the Input Data File field, type or browse to the input file you wish to use.

3.  In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.

4.  Click Add.

5.  Repeat Steps 2-4 until you've added all DDL files you want to use to create the common output format.

6.  Click the Define Output DDL button (bottom left).

7.  The Define Output DDL dialog appears.


Figure 5.2 Define Output DDL dialog

8.  Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.

9.  Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:

    •  Add: adds the selected input DDL field to the output DDL list.
    •  Delete: deletes a selected output DDL field from the list.
    •  Move Up: moves the selected field in the output DDL list up one row.
    •  Move Down: moves the selected field in the output DDL list down one row.
    •  Redefine: redefines an input field as a portion of an output field. Use this option to map multiple input fields to the same redefined output field.
    •  Consolidate: consolidates an input field with an existing output field. Use this option when two or more fields have different names but contain the same data, such as zipcode, ZIP5, and postal_code.

    For Redefine and Consolidate, make sure that the lengths of the input fields do not exceed the overall length of the redefined or consolidated output DDL field.

10. When you are ready, click Save to save the output DDL field mapping. When the Transformer step runs, it will create an output DDL file that uses this mapping.
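The Consolidate idea above can be sketched with a simple field-name mapping. This is an illustration of the concept only, not how TS Quality implements it; the `CONSOLIDATION` table and `to_common_format` are hypothetical names.

```python
# Sketch (not Trillium code) of Consolidate: input fields with different
# names but the same meaning map to one output DDL field.
CONSOLIDATION = {
    "zipcode": "postal_code",
    "ZIP5": "postal_code",
    "postal_code": "postal_code",
}

def to_common_format(record):
    """Map one input record (as a dict) onto the common output layout."""
    out = {}
    for name, value in record.items():
        out[CONSOLIDATION.get(name, name)] = value
    return out

# Two input files with different field names yield the same output field:
print(to_common_format({"name": "Ann", "ZIP5": "04401"}))
print(to_common_format({"name": "Bo", "zipcode": "90210"}))
```

Both records end up with a single `postal_code` field, which is what lets downstream steps process the merged output uniformly.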

Process Settings
Once you have specified the input and output files, you can
configure the settings to process your data. The settings for
processing are managed in the Advanced Settings window.

Character Translation

The Transformer lets you convert an original hexadecimal value to another hexadecimal value.

To convert a hex value

1.  Click Advanced and navigate to Input, Character Translation.

2.  Specify a value for Input Field Name. This is the field to which the hex translation is applied.

3.  Specify a value for From Hex Value. This is the original hex value which will be translated to another hex value.

4.  Specify a value for To Hex Value. This is the hex value to which the original value is translated.
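The effect of the From Hex Value / To Hex Value pair can be sketched as a byte substitution within the field. This is an illustrative sketch, not Trillium code.

```python
# Sketch (not Trillium code) of character translation: replace one
# hexadecimal byte value with another within a field.
def translate_hex(field: bytes, from_hex: str, to_hex: str) -> bytes:
    return field.replace(bytes.fromhex(from_hex), bytes.fromhex(to_hex))

# Replace hex 09 (tab) with hex 20 (space):
print(translate_hex(b"A\tB", "09", "20"))   # → b'A B'
```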

Field Scanning

The Field Scanning function converts the values in a field. You can scan values and then Change, Copy, Cut, and Flag the values.
Scan and Change

To scan a field and change its value

A red flag indicates a REQUIRED field for this operation.

1.  Click Advanced and navigate to Output, Field Scanning. Select the Change tab.

2.  Refer to the following table and specify values for Change in the Field Scanning window:

Setting                 Description
Scan Field              Field in the DDL file that specifies the location
                        in which to perform the scan.
Field Justification     Specifies how data contained in the field is
                        aligned:
                        •  Left/Right Adjust - remove all spaces around
                           the value, pad the field with spaces, and
                           change multiple spaces between the values to a
                           single space
                        •  Left/Right Trim - remove all spaces around the
                           value and pad the field with spaces
                        •  Left/Right Pack - remove all spaces, pack
                           left/right, and pad the field with spaces
                        •  No Justification (default) - no action is taken
                        Note for Asian Character Data: There is no
                        distinction between full-width spaces and
                        half-width spaces in the Field Justification
                        operation. Full-width spaces within the text are
                        converted to half-width spaces.
Scan Format             Indicates the format of the value for which to
                        scan: either a Literal value (the actual value) or
                        a Mask value (the shape of the value).
Scan Value              User-defined value for which to scan in a
                        specified scan field.
Change Value            User-defined value that replaces the scan value.
Change Occurrences      Numeric value that indicates how many times to
                        scan for a value in a particular word or field.
Scan Position           The physical location in the field at which to
                        begin scanning for the value: the exact Beginning
                        of the field, anywhere in the field (Default), or
                        the exact End of the field.
Scan Level              Indicates whether to scan for a value at either
                        the Field level or at the Word level.
Scan Direction          Indicates the direction of the scan: Right-to-Left
                        or Left-to-Right.
Between Substring       String of user-defined characters between which to
                        scan.
And Substring           Ending substring between which to scan.
Retain Between          Whether to retain the scanned-for value between
Characters              characters (check box).
Scan Value Encoding     Specifies the code page used by the scan value.
Change Value Encoding   Specifies the code page used by the change value.
Between Substring       Code page used by a string of characters between
Encoding                which to scan.
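The four Field Justification modes can be sketched as string operations on a fixed-width field. This is an illustrative sketch, not Trillium code; only the "left" variants are shown (the "right" variants mirror them with right-padding).

```python
# Sketch (not Trillium code) of the left Field Justification modes
# applied to a fixed-width field.
def left_adjust(value, width):
    # trim, collapse runs of spaces to a single space, pad to width
    return " ".join(value.split()).ljust(width)

def left_trim(value, width):
    # trim surrounding spaces only; inner spacing is kept
    return value.strip().ljust(width)

def left_pack(value, width):
    # remove all spaces, then pad to width
    return value.replace(" ", "").ljust(width)

print(repr(left_adjust("  MAIN   ST ", 10)))  # → 'MAIN ST   '
print(repr(left_pack("  MAIN   ST ", 10)))    # → 'MAINST    '
```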

Example

In this example, the phone number currently has dashes and spaces. To match more accurately, you should remove the dashes and spaces from the phone number. To change the phone number format, scan the Phone field for the Literal value - (a dash) using the following criteria:

Scan Field            Phone
Field Justification   Left Pack
Scan Format           Literal Value
Scan Value            - (a dash)
Scan Position         Default
Scan Level            Field
Change Value          "" (two sets of double quotes)
Change Occurrences    A (for All)

These settings will cause the Transformer to scan the Phone field for the literal value - at the Field level. If the value is found, the Transformer will left-pack the value and change it to nothing.

Phone Field (before)   Phone Field (after)
207-555-4423           2075554423

Two sets of double quotes as the Change Value will change the value to nothing.
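The scan-and-change example above can be sketched in Python. This is an illustration, not Trillium code; the field width of 12 is an assumption standing in for the DDL field length.

```python
# Sketch (not Trillium code) of the example: left-pack the Phone field,
# then change every occurrence of the literal "-" to nothing.
def scan_and_change(field, scan_value, change_value="", width=12):
    packed = field.replace(" ", "")                    # Left Pack
    return packed.replace(scan_value, change_value).ljust(width)

print(scan_and_change("207-555-4423", "-").strip())    # → 2075554423
```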


Scan and Copy/Cut

To scan a field and copy or cut its value

A red flag indicates a REQUIRED field for this operation.

1.  Click Advanced and navigate to Output, Field Scanning. Select the Copy or Cut tab.

2.  Refer to the following table and specify values for Copy or Cut in the Field Scanning window:

Field                 Description
Scan Field            Field in the DDL file that specifies the location in
                      which to scan.
Target Field          Specifies the field in which to store the scan
                      result.
Field Justification   Specifies how data contained in the field is
                      aligned:
                      •  Left/Right Adjust - remove all spaces around the
                         value, pad the field with spaces, and change
                         multiple spaces between the value to a single
                         space
                      •  Left/Right Trim - remove all spaces around the
                         value and pad the field with spaces
                      •  Left/Right Pack - remove all spaces, pack
                         left/right, and pad the field with spaces
                      •  No Justification (default) - no action is taken
                      Note for Asian Character Data: There is no
                      distinction between full-width spaces and half-width
                      spaces in the Field Justification operation.
                      Full-width spaces within the text are converted to
                      half-width spaces.
Scan Format           Indicates the format of the value for which to scan:
                      a Literal value or a Mask value.
Scan Value            User-defined value for which to scan in a specified
                      field.
Retain Scan Value     When checked, retains the scanned-for value in the
                      target field.
Scan Level            Indicates whether to scan at either the Field level
                      or the Word level.
Scan Position         Indicates the physical location in the field at
                      which to begin scanning: the exact Beginning of the
                      field, anywhere in the field (Default), or the exact
                      End of the field.
Scan Direction        Indicates the direction of the scan: Right-to-Left
                      or Left-to-Right.
Scan Capture          Indicates the data to capture, based on the position
                      of the scanned-for value in the word or field.
Word Delimiter        Specifies the delimiter used to separate words
                      within a field.
Between Substring     String of user-defined characters between which to
                      scan.
And Substring         Ending substring between which to scan.
Retain Between        When checked, retains the scanned-for value between
Substring             substrings.
Scan Value Encoding   Specifies the code page used by the scan value.
Word Delimiter        Specifies the code page used by the word delimiter.
Encoding
Between Substring     Code page used by a string of characters between
Encoding              which to scan.

Scan and Flag

To scan a field and flag its value

1.  Click Advanced and navigate to Output, Field Scanning. Select the Flag tab.

2.  Refer to the following table and specify values for Flag in the Field Scanning window:

Setting               Description
Scan Field            Field in the DDL file that specifies the location in
                      which to scan.
Target Field          Field that stores the result of the scan.

A red flag indicates a REQUIRED field for this operation.


Field Justification Specifies how data contained in the field is aligned:


Left/Right Adjust- remove all spaces left/right to
the value, pad the field with spaces, and change
multiple spaces between the value to single space
Left/Right Trim - remove all spaces left/right to
the value and pad the field with spaces
Left/Right Pack - remove all spaces, pack left/
right, and pad the field with spaces
No Justification (default) - no action is taken
Note for Asian Character Data: There is no
distinction between full-width spaces and half-width
spaces in the Field Justification operation. Full-width
spaces within the text are converted to half-width
spaces.
Scan Format - Indicates the format of the value for which to scan: either a Literal value (the actual value) or a Mask value (the shape of the value)

Scan Value - User-defined value for which to scan in a specified field

Retain Scan Value - When checked, retains the scanned-for value in the target field

Scan Level - Indicates whether to scan for a value at the Field level or the Word level

Scan Position - Indicates the physical location in the field at which to begin scanning for the scan value: the exact Beginning of the field, anywhere in the field (Default), or the exact End of the field

Scan Direction - Indicates the direction of the scan, either Right-to-Left or Left-to-Right

Word Delimiter - Specifies the delimiters used to separate words within a field

Flag Value - Specifies the user-defined value for a flag

Between Substring - String of user-defined characters between which to scan

And Substring - Ending substring between which to scan

Retain Between Substring - When checked, retains the scanned-for value between substrings

Scan Value Encoding - Specifies the code page used by the scan value

Cleansing Your Data


Word Delimiter Encoding - Specifies the code page used by the word delimiter

Flag Value Encoding - Indicates the code page used for a flag value

Between Substring Encoding - Code page used by a string of characters between which to scan

Example
For example, to flag the Doctor_flag field in this example, scan the Title field for the Literal value DR using the following criteria. Literal values are always case sensitive.

Scan Field - Title
Target Field - Doctor_flag
Field Justification - No Justification
Scan Format - Literal Value
Scan Value - DR
Retain Scan Value - Checked
Scan Position - Default
Scan Level - Field
Flag Value - Y

These options direct the Transformer to scan the Title field for the literal value DR at the Field level. If the value is found, the Transformer retains the scan value (DR) in the source field and places the flag value Y in the Doctor_flag field.
Title Field: DR
Doctor_flag Field: Y
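The scan-and-flag behavior above can be mimicked in plain Python. This is a sketch of the semantics only; `scan_and_flag` and its parameters are illustrative names, not Transformer APIs.

```python
def scan_and_flag(record, scan_field, target_field, scan_value, flag_value):
    # Scan the source field for the literal value (case sensitive,
    # anywhere in the field, as with the Default scan position).
    if scan_value in record.get(scan_field, ""):
        # Place the flag value in the target field.
        record[target_field] = flag_value
    return record

record = {"Title": "DR", "Doctor_flag": ""}
scan_and_flag(record, "Title", "Doctor_flag", "DR", "Y")
print(record)  # {'Title': 'DR', 'Doctor_flag': 'Y'}
```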


Table Recoding
The Transformer's Table Recoding function converts the values in a field using an external, user-defined recode table. You can recode literal or mask values.

Mask
Masks are character representations of a data value which define each character in the data value as follows:

Code - Represents
a - Any letter (a-z, A-Z)
n - A numeral (0-9)
explicit - Any data value element that is not a number or a letter is shown exactly as it appears in the data value, including spaces.

Value - Pattern shown in TS Quality
Jane Smith - aaaa aaaaa
5.00E+02 - n.nna+nn
$400.00 - $nnn.nn
05/31/2005 - nn/nn/nnnn
jane_smith@abc.com - aaaa_aaaaa@aaa.aaa
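The mask rules above are simple enough to express in a short sketch (plain Python, illustrative only; `mask` is not a TS Quality function):

```python
def mask(value):
    # Build the mask: letters -> 'a', digits -> 'n',
    # anything else (spaces, punctuation) is kept as-is.
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("a")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)

print(mask("Jane Smith"))   # aaaa aaaaa
print(mask("$400.00"))      # $nnn.nn
print(mask("05/31/2005"))   # nn/nn/nnnn
```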

To perform table recoding

1. Create a user-defined recode table. You can create a recode table in any text editor.

2. Table recoding uses a comma-delimited file with one column for the original value and a second column for the recoded value. You can use any filename or suffix that you want, as long as the file itself is comma-delimited.

Example
In this example, the Start_date field has a variety of data shapes and formats, such as 1/1/2005 and 1/01/2005. Create a recode table as shown to change the mask shapes for the Start_date field, so that every Start_date has the format MM-DD-YYYY.

Figure 5.3 Sample Recode Table (columns: Original Mask, Recode Mask; n = numeric)

3. The table requires a DDL that assigns field names to the two columns. Create a DDL file that corresponds to the recode table. For example, a DDL file for the table above would look like this:

Figure 5.4 Sample DDL for the Recode Table

You can specify up to five (5) fields for Lookup Table Fields, Lookup Output Fields, and Recode Output Fields. When specifying multiple fields, separate them with commas.

4. After you create a recode table and associated DDL, click Advanced in the Transformer step and navigate to Output, Table Recoding.

5. Enter a Table Qualifier. A Table Qualifier is a unique name given to a table file. Each table file must have its own unique file qualifier.

6. Enter names for the Table File and Table DDL File.

7. Specify the Table File Delimiter.

8. Specify Lookup Table Fields. These fields are a list of DDL field names in the recode table where the original values are described.

9. Specify Lookup Output Fields. These fields are a list of DDL field names in the output file where the program looks for the original values.

10. Specify the Lookup Output Fields Format: Literal or Mask.

11. Specify Recode Table Fields. These fields are a list of DDL field names in the recode table where the recoded values are described.

12. Specify the Recode Table Fields Format: Literal or Mask.

13. Specify Recode Output Fields. These fields are a list of DDL field names from the output DDL which are used to store the recode value from the recode table.

Below are the sample settings for the Start_date field:

Table Qualifier - TBL1
Table File - datamask.csv
Table DDL File - datamask.ddx
File Delimiter - Comma
Lookup Table Fields - originalmask
Lookup Output Fields - Start_date
Lookup Output Fields Format - Mask
Recode Table Fields - recodemask
Recode Table Fields Format - Mask Value
Recode Output Fields - Start_date

These settings tell the Transformer to scan the Start_date field for its mask and recode the value according to the recode table (datamask.csv). After running the Transformer, every Start_date will have the format MM-DD-YYYY.

The .ddx and .csv suffixes are not required for the files to work; however, we recommend that you use them to avoid confusion.
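The mask-based recode described above can be sketched in plain Python. This is an illustration of the idea, not the Transformer; the recode-table contents and function names are assumptions, and the actual lookup is performed by the product.

```python
import csv, io

# A recode table in the spirit of datamask.csv:
# original mask -> recode mask (illustrative rows).
recode_csv = "n/n/nnnn,nn-nn-nnnn\nn/nn/nnnn,nn-nn-nnnn\nnn/nn/nnnn,nn-nn-nnnn\n"
recode = dict(csv.reader(io.StringIO(recode_csv)))

def mask(value):
    # letters -> 'a', digits -> 'n', everything else kept as-is
    return "".join("a" if c.isalpha() else "n" if c.isdigit() else c
                   for c in value)

def recode_date(value):
    # Look up the value's mask; if it recodes to nn-nn-nnnn,
    # rewrite the date into the MM-DD-YYYY shape.
    if recode.get(mask(value)) == "nn-nn-nnnn":
        mm, dd, yyyy = value.split("/")
        return f"{int(mm):02d}-{int(dd):02d}-{yyyy}"
    return value  # no matching mask: leave the value unchanged

print(recode_date("1/1/2005"))   # 01-01-2005
print(recode_date("1/01/2005"))  # 01-01-2005
```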
14. You can also specify the following setting:

Lookup Fields Case-Sensitive - Enables or disables the case-sensitive table lookup. By default, the lookup is case-insensitive. For example, Rick will match either RICK or riCK.


Conditionals
Conditionals control the flow of TS Quality processes by
performing specific operations on data records, or by running
functions. In the Transformer, the Conditionals function controls all
other functions including character translation, field scanning and
table recoding. The conditionals settings are specified in the
Advanced Settings, Conditionals window. This section explains
the conditionals syntax and sample usage, and then teaches you
how to build a conditional statement.
If you are using translation, recode, and/or scan
functions in the Transformer, you must specify
Conditionals. See Build a Conditional Statement on
page 5-35 for instructions.
In addition to the Transformer, you can use conditional statements
for the following TS Quality modules:
Customer Data Parser
Business Data Parser
File Display Utility
File Update Utility
Set and Selection Utility

Syntax
An IF/ELSE statement is used to describe the condition. The following syntax must be used to build the conditional statement:

IF [condition]
RUN [function1]
SET [function2]
ELSE
RUN [function3]
SET [function4]
ENDIF

The IF keyword allows you to conduct conditional tests on values in the field. When conditions are True, the RUN and/or SET keywords following IF are executed. When conditions are False, the RUN and/or SET keywords following the ELSE keyword are executed. A conditional statement always closes with ENDIF. Refer to the following table for a list of keywords used in conditional statements.
Table 5.1 Keywords of Conditional Statements

Keyword - Description
IF - Begins a statement. When conditions are True, the action statements following the IF keyword are executed. Required.
RUN - Precedes action commands.
SET - Precedes assignment commands.
ELSE - When conditions are False, the action statements following the ELSE keyword are executed.
ELSEIF - When the IF conditions are False, the ELSEIF condition is evaluated.
ENDIF - Closes out a conditional statement. Required.

IF Statement
The IF statement sets the condition. The IF statement is defined by:
DDL field names
Operators (arithmetic/comparison/logical)
Field value(s)

Literal field values such as "Boston" must be enclosed in double quotation marks. Field names and numeric values do not need the quotation marks. If numeric values such as 123 are enclosed in quotation marks, they are read as literal values instead of numeric values.


Example
IF (age > 18 AND state IN ("NY", "MA")) OR first_name LIKE "*ob"

IF statements can be nested as long as a corresponding ENDIF statement closes out each IF statement, as in this nested sample:

IF [condition1]
IF [condition2]
SET [function1]
ELSE
RUN [function2]
ENDIF
SET [function3]
ELSE
RUN [function4]
ENDIF

RUN/SET Statements
The RUN/SET statement contains the function to perform.

RUN
The RUN statement is defined by:
Function names as defined in the Transformer's settings file
Entry IDs (list of entries) to be executed (comma-delimited values or ranges of values)

Example
IF (age > 18)
RUN FIELD_SCANNING(2,3)
RUN CHARACTER_TRANSLATION(3-5)
ENDIF

In the first RUN statement of this example, the numbers in parentheses (2,3) apply to ENTRY_ID 2 and ENTRY_ID 3 under FIELD_SCANNING. In the second RUN statement, the numbers in parentheses (3-5) apply to ENTRY_ID 3, 4, and 5 under CHARACTER_TRANSLATION.

SET
The SET statement takes as arguments:
DDL field names
The equal sign assignment operator (=)
Value or field data arithmetic

Example
IF (age > 18)
SET age = processing_date - birth_date
ENDIF

ELSE Statement
The ELSE statement runs certain statements if a specified condition is False. In other words, you can use an IF/ELSE statement to define two blocks of executable statements: one block to run if the condition is True, the other block to run if the condition is False.

Example
IF (age > 18)
RUN FIELD_SCANNING(2, 3)
SET age = processing_date - birth_date
ELSE
SET record_notes = "Invalid"
ENDIF

In this example, if (age > 18) evaluates as True, RUN FIELD_SCANNING(2, 3) and SET age = processing_date - birth_date are executed. If (age > 18) evaluates as False, the statement SET record_notes = "Invalid" is executed.


ELSEIF Statement
A variation on the IF/ELSE statement allows you to choose from
several alternatives. Adding ELSEIF clauses expands the
functionality of the statement so you can control program flow
based on different possibilities.

Example
IF (age > 21)
RUN FIELD_SCANNING(2,3)
ELSEIF (age > 18)
RUN CHARACTER_TRANSLATION(3-5)
ELSE
RUN FIELD_SCANNING(1)
ENDIF
In this example, if (age > 21) evaluates as True, FIELD_SCANNING
(2, 3) is executed. If (age > 21) evaluates as False, the ELSEIF
(age > 18) condition is performed. If ELSEIF condition (age > 18)
evaluates as True, CHARACTER_TRANSLATION (3-5) is executed. If
all conditions (age > 21) and (age > 18) evaluate as False, then the
statement RUN FIELD_SCANNING (1) is executed.
You can add as many ELSEIF statements as you need to
provide alternative choices. However, note that extensive
use of ELSEIF clauses often becomes cumbersome.
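The control flow above maps directly onto an if/elif/else chain. A plain-Python analogue (the function names merely stand in for the Transformer entries; the return strings are illustrative):

```python
def field_scanning(entries):
    return f"FIELD_SCANNING{entries}"

def character_translation(entries):
    return f"CHARACTER_TRANSLATION{entries}"

def route(age):
    # Mirrors: IF (age > 21) ... ELSEIF (age > 18) ... ELSE ... ENDIF
    if age > 21:
        return field_scanning((2, 3))
    elif age > 18:
        return character_translation((3, 4, 5))
    else:
        return field_scanning((1,))

print(route(30))  # FIELD_SCANNING(2, 3)
print(route(20))  # CHARACTER_TRANSLATION(3, 4, 5)
```

Only the first branch whose condition is True executes, exactly as with ELSEIF.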


Operators in Conditional Statements


The following operators can be used in conditional statements:

Table 5.2 Operators in Conditional Statements

ALL - Performs every defined function entry.
  Example: CHARACTER_TRANSLATION (ALL)
ALWAYS - Always returns True; always performs the specified operation.
  Example: IF (ALWAYS)
AND - Connects two action statements (both must be True).
  Example: IF (age>18 AND gender = "M")
OR - Connects two action statements (at least one must be True).
  Example: IF (age>18 OR year_of_birth > 1987)
UCASE - Converts a literal value or field data to uppercase to evaluate the IF statement.
  Example: IF (UCASE(last_name)="SMITH")
           SET last_name=UCASE(name)
  This example tests the field for the literal of any case combination of SMITH, and if TRUE, it makes the string in the field uppercase.
LCASE - Converts a literal value or field data to lowercase to evaluate the IF statement.
  Example: IF (LCASE(last_name)="smith")
           SET last_name=LCASE(name)
  This example tests the field for the literal of any case combination of smith, and if TRUE, it makes the string in the field lowercase.
= - Is equal to
!=, <> - Is NOT equal to
> - Is greater than
< - Is less than
>= - Is greater than or equal to
<= - Is less than or equal to


Table 5.2 Operators in Conditional Statements (continued)

LIKE - Links a literal with a wildcard asterisk (*) in a field that is used to look for a match. You can place the asterisk before the literal (for example, *LE) to match values that end with the literal, or place it after the literal (for example, LE*) to match values that begin with it. You cannot place an asterisk in the middle of a literal (for example, L*E).
  Example: IF first_name LIKE "*OB"
IN - Means the field value is in the given list.
  Example: IF house_number IN 1,2,3,4
BETWEEN - Means the field value is between the given values.
  Example: IF house_number BETWEEN 12,34
+ - Sum of
- - Difference of
|| - String concatenation
/ - Divided by
* - Multiplied by

Operators for Asian Characters


In addition to the operators in the previous section, TS Quality supports a wide range of operators specific to Asian character data. The following table shows the operators that you can use in conditional logic statements for Asian data.

Table 5.3 Operators for Asian Characters

JTOKATAKANA (Japan) - Transforms Hiragana characters to full-width Katakana.

JTOHIRAGANA (Japan) - Transforms full-width Katakana characters to Hiragana.

CJKTOHALF (China, Japan, Korea, Taiwan) - Transforms full-width characters to their half-width form. This operator automatically processes Katakana accent marks (dakuten and handakuten) appropriately.
  Example: Harte-hanks

CJKTOFULL (China, Japan, Korea, Taiwan) - Transforms half-width characters to their full-width form. This operator automatically processes Katakana accent marks (dakuten and handakuten) appropriately.
  Example: Harte-hanks
JKANATOROMAN (Japan) - Transforms Hiragana and full-width Katakana characters to Hebon-style Romaji.
  Example: jouzousho

JROMANTOKANA (Japan) - Transforms Romaji (Kunrei or Hebon) characters to full-width Katakana.
  Example: haatohankusu

KTOROMAN (Korea) - Transforms Korean Hangul characters to their Romanized forms.
  Example: daechidong


HIRAGANASTOL (Japan) - Transforms small-size yo-on and soku-on characters to their large equivalents, in both Zenkaku (full-width) and Hankaku (half-width) forms.

CTOTRADCHINESE (China, Taiwan) - Transforms all Simplified Chinese characters to their Traditional Chinese equivalents.

CTOSIMPCHINESE (China, Taiwan) - Transforms all Traditional Chinese characters to their Simplified Chinese equivalents.

CJKTOARABICNUM (China, Japan, Korea, Taiwan) - Transforms Chinese number symbols to their Arabic decimal equivalents.
  Example: 150
  Note: Apply this operator only to fields where Chinese number symbols represent actual numbers; otherwise, unintended conversions may occur.


Full-width (Zenkaku) and Half-width (Hankaku) Japanese Characters
The following list shows the Japanese full-width and half-width characters that can be converted using these operators.

Blank character
Romaji: ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz
Number: 0123456789
Symbol: ~ ! @ # $ % ^ & * _ + ` - = { } | [ ] \ : ; < > ? , . /
Katakana
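In plain Python, a comparable full-width-to-half-width conversion for the Romaji, number, and symbol characters listed above can be approximated with Unicode NFKC normalization. This is only an approximation of CJKTOHALF, not the operator itself (NFKC also performs other compatibility mappings).

```python
import unicodedata

def to_half_width(text):
    # NFKC normalization folds full-width (Zenkaku) Latin letters,
    # digits, and symbols to their half-width (Hankaku) forms.
    return unicodedata.normalize("NFKC", text)

print(to_half_width("ＡＢＣ１２３"))  # ABC123
```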

Table of Katakana/Hiragana and Their Hebon/Kunrei Romaji Equivalents
Hebon              Kunrei
a i u e o          a i u e o
ka ki ku ke ko     ka ki ku ke ko
sa shi su se so    sa si su se so
ta chi tsu te to   ta ti tu te to
na ni nu ne no     na ni nu ne no
ha hi fu he ho     ha hi fu he ho
ma mi mu me mo     ma mi mu me mo
ya yu yo           ya yu yo
ra ri ru re ro     ra ri ru re ro
wa n wo            wa n wo
ga gi gu ge go     ga gi gu ge go
za ji zu ze zo     za zi zu ze zo
da di du de do     da di du de do
ba bi bu be bo     ba bi bu be bo
pa pi pu pe po     pa pi pu pe po
sha shu sho        sya syu syo
cha chu cho        tya tyu tyo
ja ju jo           zya zyu zyo

How to Use Operators for Asian Characters


Asian Pacific (APAC) operators:
JTOKATAKANA, JTOHIRAGANA, CJKTOHALF, CJKTOFULL, JKANATOROMAN, JROMANTOKANA, KTOROMAN, HIRAGANASTOL, CTOTRADCHINESE, CTOSIMPCHINESE, CJKTOARABICNUM.

The following are some simple examples of conditional statements for APAC operators.

Syntax 1
This syntax is used to convert a literal value or field data in the DDL field.

IF [condition]
SET [DDL field name] = [Operator](DDL field name)
ENDIF

Example 1
In this example, all full-width characters in the INPUT_LINE_01 field are converted to their half-width form.

IF (ALWAYS)
SET INPUT_LINE_01 = CJKTOHALF(INPUT_LINE_01)
ENDIF


Syntax 2
This syntax is used to convert a literal value or field data in DDL field 2 and compare it against the value in DDL field 1 to evaluate the IF statement.
IF [DDL field name 1] = [Operator] (DDL field name 2)
RUN [function]
ENDIF

Example 2
In this example, the program converts the Traditional Chinese
characters in the CUSTOMER_NAME field to Simplified Chinese, and
compares it against the value in the INPUT_LINE_01 field. If that
value is equal to the value in INPUT_LINE_01, it will run
FIELD_SCANNING. If the value is not equal, it will run the
TABLE_RECODING function.
IF INPUT_LINE_01 = CTOSIMPCHINESE (CUSTOMER_NAME)
RUN FIELD_SCANNING(ALL)
ELSE
RUN TABLE_RECODING(ALL)
ENDIF


Build a Conditional Statement


Conditional statements are built in the Conditionals Logic Builder in Advanced Settings. You can specify conditional settings for your input data or output data.

To build a Conditional Statement

1. Click Advanced and navigate to Output, Conditionals.

2. Click Edit Condition to open the Logic Builder window. Notice that the default setting IF (ALWAYS), RUN FIELD_SCANNING (ALL) has been specified. This means that the Field Scanning function will always run for all records.

Figure 5.5 Conditionals Logic Builder



3. Click the button on the upper right and select your Qualifiers For Input Data Files from the pop-up list.

4. Select the condition encoding from the Condition Encoding drop-down list.

5. In the middle pane, place the cursor after RUN FIELD_SCANNING (ALL).

6. In the Key Words box in the lower pane, double-click RUN. The keyword RUN is inserted into the expression at the cursor location.

7. In the Function box in the lower pane, double-click TABLE_RECODING. The function TABLE_RECODING is inserted into the expression at the cursor location.

8. In the middle pane, place the cursor after TABLE_RECODING and type the opening parenthesis.

9. In the Operators box in the lower pane, double-click ALL and close the parentheses. Your expression should now look like this:

IF (ALWAYS)
RUN FIELD_SCANNING (ALL)
RUN TABLE_RECODING (ALL)
ENDIF

10. Click Apply and Close. In this example, the field scanning and table recoding will always run for all records.


Select or Bypass Records


While the Conditionals function is applied to perform specific operations on a record, the Select/Bypass Records function can be used to either Select or Bypass input or output records under certain conditions. The Select/Bypass function uses the Logic Builder located in the Advanced Settings window. You can use Select/Bypass Conditions with most of the TS Quality modules.

To build a Select/Bypass Condition

1. Click Advanced and navigate to Input or Output, Settings.

2. Select the Select Record Conditions or Bypass Record Conditions tab.

3. Click Edit Condition. The Logic Builder window displays.

Figure 5.6 Select and Bypass Condition Logic Builder


4. In the upper pane, select Condition Encoding from the drop-down list.

5. In the list of DDL Fields in the right pane, double-click a DDL field name. This is the field to which you will apply the select/bypass conditions.

6. In the Operators box in the lower right pane, double-click an operator.

7. When you have finished, click Apply and Close.

In the example above, LINE_01<18 indicates that only records in which the value in the DDL field Line_01 is less than 18 will be included and selected for further processing.

You can use any of the operators for the conditional statements to create a select/bypass definition. See Operators in Conditional Statements on page 5-26 for the conditional operators.
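The select/bypass idea amounts to filtering records on a field condition; a plain-Python sketch of the semantics (the function name and record representation are assumptions for the illustration):

```python
def select_records(records, field, predicate):
    # Keep only records whose field value satisfies the condition
    # (e.g. LINE_01 < 18); all other records are bypassed.
    return [r for r in records if predicate(r.get(field))]

records = [{"LINE_01": 12}, {"LINE_01": 25}, {"LINE_01": 3}]
selected = select_records(records, "LINE_01", lambda v: v < 18)
print(selected)  # [{'LINE_01': 12}, {'LINE_01': 3}]
```

A Bypass condition is simply the complement: the records matching the condition are excluded instead of kept.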

Additional Settings
You can also specify the following additional settings. See Transformer in the TS Quality Reference Guide for the complete settings information.

To include record sequence on output

1. Click Advanced and navigate to Output, Settings.

2. In File Sequence Field, select the DDL field which will receive the record sequence number.

To enable the debug function

1. Click Advanced and navigate to Additional....

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default or specify the path and file name of the debug file.

4. Optionally, you can enable the File Trace Key function. If the File Trace Key is specified (field name), the debug file uses the value of that field when reporting.

5. Click Advanced and navigate to Input, Settings. Specify a DDL field name for File Trace Key.


For example, if a record has a Field Scan performed on it, a line is added to the debug file describing the recode. The value of the specified field is used to identify the record in the report. If this function is not used, each record that is read gets a record number assigned based on the order in which it was read.
To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, specify the path and file name for the mask file.

To count the number of records processed

1. Click Advanced and navigate to Additional....

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify settings file encoding

1. Click Advanced and navigate to Additional....

2. In Settings File Encoding, select the appropriate encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.

Run the Transformer and View Results

To run the Transformer and view results

1. Click OK to close the Advanced settings.

2. Click Run to run the Transformer. When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save. You can also right-click a step and select Run Selected.

3. Select OK.

4. On the Results tab, view the Statistics sub-tab. Notice the records affected by the Field Scan and Table Recode.

5. On the Output Settings tab, use the Data Browser to view the Phone and Start_date fields.

6. Run the inputTransformer step. On the Results>Statistics tab, review the output statistics using My Statistics Viewer and the Spreadsheet Viewer. Both viewers allow the user to print statistics for other applications.

7. View the fields on the Output Settings tab using the Data Browser to be sure the scan and recode occurred.

CHAPTER 6

Standardizing Your Data
In this chapter, you will standardize the name and address elements using the Customer Data Parser, then standardize the non-name-and-address elements using the Business Data Parser. This chapter explains the parsing logic used to standardize data elements. You will perform these tasks:

Specify input and output files
Define the settings for the Customer Data Parser and Business Data Parser
Use name generation to determine how many additional records are generated
Set line definitions for input data
Run the Customer Data Parser and Business Data Parser and view results

For Asia-Pacific countries (China, Japan, Korea, and Taiwan), the Customer Data Parser identifies and standardizes the name elements only. Parsing and standardization of address elements for those countries' data is performed by country-specific Postal Matchers.


Using the Customer Data Parser


The Customer Data Parser (CDP) identifies freeform name and address data. The CDP identifies elements of data from the input data file using the data in the fields INPUT_LINE_01 through INPUT_LINE_10. Only the data in the fields INPUT_LINE_01 through INPUT_LINE_10 will be parsed.

The CDP uses country-specific tables to verify and identify data according to each country's postal rules and idioms. Once the data is identified, output is generated. The following two field data types are output:

original data
recoded or standardized data

The parsing process is highly table-driven. This allows users to customize name and address identification for specific business requirements.

The CDP identifies and standardizes name and address data elements. To parse non-name-and-address data elements, such as product name, use the Business Data Parser (BDP).

If the CDP cannot identify a piece of data in the record, an exception is written to the exception file. This file can then be used to add customized entries to the Parser Definitions Table. See Chapter 7, Tuning the Parsing Rules, for instructions on analyzing exceptions and customizing the Parser Definitions Tables.


Understanding Parsing Logic Flow


The CDP assigns all possible attributes to the input name and address data. Based on the attributes, the CDP identifies line types and assigns final attributes based on known patterns. The CDP's output includes original data as well as recoded or standardized values. The Customer Data Parser follows this process:

1. Assign all possible attributes to the words/phrases (tokens) in the Input Name and Address Area, such as Title-Prefix, Given-Name1, Surname, etc.

2. Identify line types according to attribute weights and counts.

3. Search for known patterns and assign final word/phrase attributes.

4. Generate output.

Example
Assume that you have the following name and address data in an input file:

INPUT_LINE_01 - Lexington Drug
INPUT_LINE_02 - Ben K Pike MD
INPUT_LINE_03 - 10 Lois Lane
INPUT_LINE_04 - Lexington 02420

In the above example, INPUT_LINE_01 will be defined as a BUSINESS NAME line because the word Drug has a BUSINESS definition in the word/pattern table, and because Lexington is the same as the city name.

INPUT_LINE_02 will be defined as a PERSONAL NAME line based on its relation to the other lines in the input area and because of the name definitions found on the line. (A detailed explanation of how this particular line was processed follows this section.)


INPUT_LINE_03 will be defined as a STREET line based on its relation to other lines in the input area and because of the HSNO and STR-TYPE attributes found on that line.

INPUT_LINE_04 will be defined as a GEOGRAPHIC line because of the POST CODE mask found in the line, and because the combination of POST CODE and CITY is found in the parsing city table. In this example, the CDP will add the state abbreviation MA to the output record.

How the Customer Data Parser Identifies Business Names
The Customer Data Parser uses the following criteria to determine if a name is a business name. A line is treated as a business name if it:

Contains at least one word with the attribute BUSINESS
Does not contain a word with an attribute of a personal nature (for example, GIVEN-NAME1 or SURNAME)
Begins with the same value as the city and is not further qualified
Contains a word that uses an apostrophe followed by the letter s (possessive form)
Contains an unidentified word that consists of all consonants and is at least four characters long
Does not pass Name Pattern Validation (it will have a reject name form, but will be stored in the PREPOS business name field)
Contains more than one comma on the name line

Note: Pattern processing provides the final attribute assignment for a line, enabling compound business and personal names to be displayed on the same line.
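The criteria above can be sketched as a simple heuristic in plain Python. This is an illustration only, not the CDP's actual table-driven logic; the word sets, rule ordering, and function name are assumptions.

```python
import re

def looks_like_business(line, business_words, personal_words):
    words = line.upper().split()
    # A word of a personal nature rules out a business name.
    if any(w in personal_words for w in words):
        return False
    # At least one word with the BUSINESS attribute.
    if any(w in business_words for w in words):
        return True
    # Possessive form: a word ending in apostrophe-s.
    if any(w.endswith("'S") for w in words):
        return True
    # Unidentified all-consonant word of four or more characters.
    if any(len(w) >= 4 and not re.search(r"[AEIOUY]", w) for w in words):
        return True
    # More than one comma on the name line.
    if line.count(",") > 1:
        return True
    return False

biz = {"DRUG", "CORP", "INC"}
pers = {"BEN", "PIKE"}
print(looks_like_business("Lexington Drug", biz, pers))  # True
print(looks_like_business("Ben K Pike MD", biz, pers))   # False
```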

CDP Parsing Process
This section details the specific processing for INPUT_LINE_02.

Step 1 - Assign all possible attributes

        BEN                    K        PIKE    MD
Name    GVN-NM1, RELATIONSHIP  1ALPHA   ALPHA   TITLE-SUFFIX
Street  ALPHA                  1ALPHA   TYPE    ALPHA
Geog    COUNTRY                1ALPHA   ALPHA   STATE

First, the CDP assigns all possible attributes to each component of data in INPUT_LINE_02 (Ben K Pike MD).

Step 2 - Line type and specific attribute assignment

        BEN       K        PIKE    MD
Name    GVN-NM1   1ALPHA   ALPHA   TITLE-SUFFIX

The CDP identified this line as a Name line because it had more name definitions than street or geography definitions. BEN is no longer considered a RELATIONSHIP attribute since it is not located at the END of the name line.

Step 3 - Pattern lookup and assign final word/phrase attributes

         BEN       K         PIKE      MD
Before   GVN-NM1   1ALPHA    ALPHA     TITLE-SUFFIX
After    GVN-NM1   GVN-NM2   SURNAME   TITLE-SUFFIX

Once the CDP identified the line types and the attributes on those lines, a pattern was created. The CDP then looks the pattern up in the Parser Definitions Table. If the pattern is found, the recode value is returned, as in this example. If the pattern is not found, the CDP will not be able to recode the unknown attributes and it will send the bad name pattern to the parsing exception file for review.

Entry from the Parser Definitions Table (using allowable abbreviations):

GVN-NM1 1ALPHA ALPHA TITLE-SUFFIX PATTERN NAME
RECODE=GVN-NM1(1) GVN-NM2(1) SURNAME(1) TITLE-SUFFIX(1)

The numbers after the attributes in the recode line are referred to as Name Numbers, indicating that the CDP identified one person on this record.
Step 4 - Generate Output

Original Input Data           Standardized Output Data
BUS-NAME: Lexington Drug      BUS-NAME: LEXINGTON DRUG
GVN-NAME1: Ben                GVN-NM1: BENJAMIN
GVN-NAME2: K                  GVN-NM2: K
SURNAME: Pike                 SURNAME: PIKE
TITLE-SUFFIX: MD              TITLE-SUFFIX: MD
HSNO: 10                      HSNO: 10
STREET-NAME: Lois             STREET-NAME: LOIS
STREET-TYPE: Lane             STREET-TYPE: LN
CITY: Lexington               CITY-NAME: LEXINGTON
STATE:                        STATE: MA
POST CODE: 02420              POST CODE: 02420

The CDP can identify name and address elements for many countries, using country-specific definitions tables. The CDP identifies up to ten lines (100 bytes each) of input Name/Address data. It can also identify up to ten names per input record.

Customer Data Parser for China, Japan, Korea, and Taiwan
The Customer Data Parser (CDP) identifies personal and business
names for China, Japan, Korea, and Taiwan as follows.

Step 1 - Token Identification


The first step in parsing Asian words is to isolate words and phrases
into tokens. Tokens may contain one or more characters (and/or
symbols) that are identifiable as a word or word/phrase element. If
commas or spaces are present, these are used to determine where
one token ends and another begins.
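Conceptually, this initial scan resembles a delimiter-based split. The sketch below is illustrative Python, not the Parser's actual implementation:

```python
import re

def tokenize(line):
    """Split an input line into tokens on commas and spaces, the
    delimiters the Parser uses during its initial scan."""
    return [tok for tok in re.split(r"[,\s]+", line) if tok]

print(tokenize("10 Lois Lane, Lexington, MA 02420"))
# → ['10', 'Lois', 'Lane', 'Lexington', 'MA', '02420']
```

Each resulting token is then scanned against the parsing definition tables, which may split it further (for example, separating a surname from a given name).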

Step 2 - Parsing Definition Table Lookup


The second step is to scan each token against one or more parsing
definition tables (also known as a lookup or word/phrase and
pattern table). This process verifies which tokens are a personal or
business name. It also identifies the surname character(s) and
uncovers new tokens based on the look-up results.

How CDP Identifies Tokens for China, Korea, and Taiwan
The first step in parsing is to isolate all words and phrases by
breaking up the input field(s) into recognizable tokens. During the
initial scan, the Parser uses commas or space characters in the
input field to determine where one token ends and the next begins.

Example - China
Input data: 22 135800
Initial token results: (1 name token)


Example - Korea
Input data: , 973-2 3 , 135-280
Initial token results: (1 name token)

Example - Taiwan
Input data: 2 3 106
Initial token results: (1 name token)

How CDP Identifies Chinese and Korean Names


The Customer Data Parser (CDP) uses parsing definition tables
(also known as word/phrase and pattern tables) to identify each
name element.
After initial tokens are created, the Parser scans each token against
the appropriate parsing definition tables. During this process, all
word elements that can be further identified as part of a name, for
example, a surname and given name, are created as separate
tokens.

Example - China
Token results: 2 tokens
Previous Results / New Results / Reasoning: Based on surname lookup


Example - Korea
Token results: 2 tokens
Previous Results / New Results / Reasoning: Based on surname lookup

Example - Taiwan
Token results: 2 tokens
Previous Results / New Results / Reasoning: Based on surname lookup

How CDP Identifies Names for Japan

The basic functionality of the Parser consists of the following three parsing methods:
Personal name parsing (PNP)
Business name parsing (BNP)
Personal/Business parsing (BNP_CLUE)

Personal Name Parsing (PNP)

PNP separates personal names. The Parser separates the input field
into a last name, a first name, a title, and an honorific. It is assumed
that the input data contains the name of only one person, but you
can create multiple output records when multiple personal names
are encountered.
Input data:


Token results: 3 tokens
Last Name    First Name    Honorific

Business Name Parsing (BNP)

BNP separates business names. The Parser separates the input field
into a business name, a business type, and a branch name. You can
create a consistent business name by registering the business name
pattern in the principal business name table. See "How to
Customize the Parser Definition Tables for Japan" in Chapter 7.
Input data: ( )AB
Token results: 4 tokens
Business Name    Principal Name    Business Type    Branch

Principal business name table: B,AB, ,

BNP_CLUE (Personal/Business Name Parsing)

BNP_CLUE determines whether the input is a personal or business
name and separates the input record accordingly.
Input data 1:
Input data 2: ( )AB
Token results: 7 tokens
Last Name    First Name    Honorific    Business Name    Principal Name    Business Type    Branch


Zenkaku and Hankaku Parse

The Parser for Japan can take both zenkaku (full-width) and
hankaku (half-width) fields as input. The zenkaku and hankaku
input fields are specified by the Pr Inp Field Name (zenkaku) and Pr
H Inp Field Name (hankaku) settings in Advanced Input Settings.
You must have zenkaku data in the zenkaku field and hankaku data
in the hankaku field. You can then specify whether to parse only the
zenkaku field, only the hankaku field, or both fields, using the Field
Type Parsing Mode settings in Advanced Process Settings.

Example:
Zenkaku field: INPUT_LINE_01
Hankaku field: FURIGANA_NAME

Zenkaku/Hankaku Mixed: The Parser cannot process fields in which
zenkaku and hankaku data are mixed (except for spaces).
NULL Mixed: Data with a null value in the input field cannot be
processed correctly.
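The zenkaku/hankaku constraint can be checked before data reaches the tool. The sketch below classifies a field using the Unicode East Asian Width property; it is an approximation of the Parser's check, not its actual logic.

```python
import unicodedata

def field_width(text):
    """Classify a field as 'zenkaku', 'hankaku', 'mixed', or 'empty',
    ignoring spaces (the one mix the Parser tolerates)."""
    widths = set()
    for ch in text:
        if ch in (" ", "\u3000"):   # ASCII space and ideographic space
            continue
        # F/W are full-width; everything else is treated as half-width
        # here, which is a simplifying assumption for ambiguous widths.
        widths.add("zenkaku" if unicodedata.east_asian_width(ch) in ("F", "W")
                   else "hankaku")
    if not widths:
        return "empty"
    return widths.pop() if len(widths) == 1 else "mixed"

print(field_width("アイウ"))   # full-width katakana → zenkaku
print(field_width("ｱｲｳ"))     # half-width katakana → hankaku
```

A field that classifies as "mixed" would need to be cleaned or split before the Parser for Japan can process it.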

PREPOS
The CDP then produces a comprehensive data block called the
PREPOS (Parser Repository). The PREPOS contains fixed-fielded
character data, including error codes, identification indicators, name
information, street information, and geographic information. The
Output DDL determines which of these fields are returned to the
Output file.
See Appendix B of TS Quality Reference Guide for a
complete list of PREPOS fields and descriptions.

Example (PREPOS Fields)

Figure 6.1 Sample PREPOS Fields


Input and Output Settings


The CDP uses the output from the Transformer step as its input.
Tip: You can either edit the file names manually or click the File
Chooser icon to browse for and select the file. To view the contents
of your data file, click the Data Browser icon. Use the Dictionary
Editor to view the contents of the DDL file.

To specify input and output files
1. Open the Customer Data Parser step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Click the Dictionary Editor icon and view the input DDL.
4. Note that the CDP only scans the fields defined as INPUT_LINE_01 through INPUT_LINE_10. These mappings were provided by the Project Wizard when you specified the Name and Address data. These are reserved field names, and they represent the input data to the CDP.
5. Close the DDL Editor.
6. Select the Output Settings tab.
7. Specify a file name in the Output File Name and Output DDL Name text boxes.
8. Enter a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

To check the Repository DDL File
1. Click Advanced and navigate to Additional....
2. In Repository DDL File, make sure the correct repository DDL file is specified. This DDL contains the layout of the PREPOS fields.
Country-specific repository DDLs are provided with the program. We recommend that you not change these default PREPOS DDL files. See Appendix B of the TS Quality Reference Guide for a complete list of PREPOS fields and descriptions.

You can also specify the following settings:

To specify the starting record
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records are processed.

To process every nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every nth record will be processed. By default, all records are processed.
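Taken together, Start at Record, Process a Maximum of, and Process Nth Sample select a subset of the input. The following Python generator is a rough model of that selection; how the three settings interact in the product (in particular whether sampling is relative to the starting record) is an assumption here, so consult the reference guide for exact behavior.

```python
def select_records(records, start_at=1, maximum=None, nth=1):
    """Yield the records a step would process, modeling the Start at
    Record, Process a Maximum of, and Process Nth Sample settings."""
    taken = 0
    for i, rec in enumerate(records, 1):
        # Skip records before the starting point, and off-sample records.
        if i < start_at or (i - start_at) % nth != 0:
            continue
        if maximum is not None and taken >= maximum:
            break
        taken += 1
        yield rec

print(list(select_records(range(1, 11), start_at=3, maximum=3, nth=2)))
# → [3, 5, 7]
```

With the defaults (start at 1, no maximum, every record), all input records pass through unchanged.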

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimiter settings. Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed in quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.
3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.
See Encoding (Code Page) on page A-3 for more information on encoding.

To specify an exceptions file
1. Click Advanced and navigate to Process, Settings.
2. In the Exceptions File text box, enter or specify the path and name of the file that contains exception records. Exception records contain data such as incorrect records or field types.

To specify a mask file
1. Click Advanced and navigate to Process, Settings.
2. In the Mask File text box, enter or select the path and file name for the mask file.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions on how to specify select and bypass definitions.

Process Settings
Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.
The navigation pane of the Advanced Settings window contains two tabs:
Parser
Prcustom
The Parser tab contains settings for the Customer Data Parser. The Prcustom tab is used to define settings for the Parser Customization process. The Parser Customization process is explained in the next chapter.
The settings for China, Japan, Korea, and Taiwan differ slightly from those for other countries. Refer to the online Help or the TS Quality Reference Guide for those countries' settings.

Parser Tables
The Customer Data Parser uses two table files to parse the name and address elements of the input data.
Word Pattern Definition File: Defines word patterns for a given country. It contains standard definitions for words and phrases (tokens), and the patterns associated with each line type.
City Directory File: Defines state names, city names, and postal codes for a given country.
To specify the Parser tables
1. Click Advanced and navigate to Process, Settings.
2. Specify the Word Pattern Definition File and City Directory File.
Default Word Pattern Definition File (US):
\TrilliumSoftware\tsq10r5s\<project>\tables\USCDPDEF.len
Default City Directory File (US):
\TrilliumSoftware\tsq10r5s\<project>\tables\USCITY.len
These files are read-only. These tables and parsing city directories carry a two-letter prefix to indicate the country (US = United States, CA = Canada, GB = United Kingdom, DE = Germany, and so forth).


Business Attribute
You must specify whether to enable or disable the business assignment function.
To specify the business attribute
1. Click Advanced and navigate to Process, Settings.
2. Refer to the table below and select one of these options for Assigned Business Attribute:

Setting                  Description
Automatic Business       For any word assigned a BUSINESS attribute, the entire line becomes BUSINESS.
Business via Pattern     Business, possible-business, business-descriptive, and business-redefine attributes are all turned off. Business names are generated only from patterns.
No Business Assignment   Turns off the setting of token meanings for business attributes and possible-business attributes.

Preprocess House Number

The Parser normally pre-processes house numbers before processing street patterns. You must choose whether to preprocess the house number.
To specify whether house numbers are preprocessed
1. Click Advanced and navigate to Process, Settings.
2. Refer to the table below and select one of these options for Preprocess House Number:

Setting                 Description
No Preprocessing        Disable preprocessing.
Minimum Preprocessing   A fractional number like 1 1/2 becomes 11/2. Note that "1 1/2" becomes a HSNO token (the fraction portion must be 3 characters in length and include the /).
Maximum Preprocessing   Option 1: A number like 1 1/2 becomes 11/2. Note that "1 1/2" becomes a HSNO token (the fraction portion must be 3 characters in length and include the /). Option 2: A number like 2420-36 becomes 2420 36 (this option does not work for New York, New Jersey, and Hawaii).
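The two documented rewrites can be sketched with regular expressions. This is illustrative only; the Parser's actual tokenization rules are more involved than these two substitutions.

```python
import re

def preprocess_house_number(line, level="maximum"):
    """Illustrative sketch of the documented rewrites:
    '1 1/2' -> '11/2' (fraction of 3 characters containing '/'),
    and, at maximum preprocessing, '2420-36' -> '2420 36'."""
    # Join a number and a 3-character fraction into one HSNO token.
    line = re.sub(r"\b(\d+) (\d/\d)\b", r"\1\2", line)
    if level == "maximum":
        # Break a hyphenated house-number range into two tokens.
        line = re.sub(r"\b(\d+)-(\d+)\b", r"\1 \2", line)
    return line

print(preprocess_house_number("1 1/2 LOIS LN"))    # → 11/2 LOIS LN
print(preprocess_house_number("2420-36 LOIS LN"))  # → 2420 36 LOIS LN
```

Note that the hyphen rewrite is applied only at the maximum level, matching the Option 2 behavior in the table above.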

Line Definitions
In this example, the input file has two names on each record: the first input line contains a business name, and the second line contains a personal name (contact name). This is a very common data structure. You can pre-define these two line types to the CDP, thus allowing the CDP to work more efficiently.
To set line definitions
1. On the Advanced settings, navigate to Input, Settings.
2. Select the Line Definitions tab. No Pre-definition is set by default. This setting allows the CDP to determine the line type.
3. For each Input Address Line, choose one of the following options:

Setting                         Description
Name Line                       Input Address Line that contains personal names
Business Name Line              Input Address Line that contains business names
Street Line                     Input Address Line that contains street components
Geography Line                  Input Address Line that contains geography components (neighborhood, city, state, county, and postal code)
No Pre-definition               None; the Parser determines the line type (default)
Prohibit Name Line definition   The Parser will not determine the line type

4. On Input Address Line 1, select Business Name Line from the drop-down list. This pre-defines the line as a business name line.
5. Select Name Line from the drop-down list for Input Address Line 2. This pre-defines the line as a name line.

Figure 6.2 CDP Line Definitions

Generate Name Sections

By default, the CDP is set to generate a record for each name found. If you do not want to generate additional output records but would still like to identify all names found on the input records, you can create an additional name section so that all names are stored in the same record. In this case, the output DDL must be modified to store the information from the second name identified by the CDP: the consumer name. In this example, you will create two name sections.
To create two name sections in the output DDL
1. In the DDL Editor, select Tools, Parser Output DDL Generator.
2. In the Country box, select the appropriate country from the drop-down list.
3. In the Number of Name Segments box, select the number of name sections you want to generate.
4. Specify the ORIGINAL_RECORD DDL and the Output DDL file.

Figure 6.3 Parser Output DDL Generator

5. Select Create.
6. Select Yes to redefine the section.
7. Select Yes to see the update.
8. Select File, Exit to close the DDL Editor.

Name Generation
After the Parser processes the input data, it generates name and address records. This process is called name generation. In many cases, one record in the input data contains multiple business or personal names. You must specify how many records to generate when more than one business or personal name is found in the input data.
To define name generation settings
1. Select Advanced and navigate to Output, Name Generation.
2. The right pane of the Customer Data Parser Output Name Generation window contains two tabbed dialog boxes: Field Settings and Entry Settings.
3. Click the Field Settings tab. Refer to the following table and specify the values for these settings.

Setting                                          Description
Generate Business Records for Additional Names   Numeric value between 0-9 that specifies how many business records to generate when more than one business name is present on the input (whether on the same record or on generated name records).
Generate Personal Records for Additional Names   Numeric value between 0-9 that specifies how many personal records to generate when more than one personal name is present on the input (whether on the same record or on generated name records).
Max Original Lines to Generate Names For         Numeric value that indicates the maximum number of original lines for which to generate names. The default is to process all records.

4. For example, the settings below instruct the CDP not to generate additional records for personal or business names.

Figure 6.4 CDP Field Settings

Additional Settings
You can also specify the following settings. See Customer Data Parser in the TS Quality Reference Guide for the complete settings information.

To join name lines
You can join the second name line (INPUT_LINE_02) to the first name line (INPUT_LINE_01) for re-parsing purposes. Both lines must have a valid pattern identified for this to work.
1. Click Advanced and navigate to Input, Join Lines.
2. Specify the From Line Index and To Line Index. From Line Index is the number (1 to 10) of the name line to be joined. To Line Index is the number (1 to 10) of the name line to which the line specified in From Line Index will be joined.
3. Specify the From Line Begin Value and To Line End Value. From Line Begin Value is the character string that is to be at the beginning of the joined line. To Line End Value is the character string that is to be at the end of the joined line.
4. Select either Literal or Mask for From Line Begin Value Format and To Line End Value Format. These are the formats for the values specified in From Line Begin Value and To Line End Value.

To split address lines
You can split the address line before parsing. The CDP works more efficiently if two addresses are split into two physical lines, rather than stored on one line.
1. Click Advanced and navigate to Input, Split Lines.
2. Select either First Occurrence or Last Occurrence for Split Occurrence. First Occurrence splits on the first occurrence of the matching From Line End Value and To Line Begin Value. Last Occurrence splits on the last occurrence of the matching From Line End Value and To Line Begin Value.
3. Specify the From Line Index and To Line Index. From Line Index is the number (1 to 10) of the address line from which to split. To Line Index is the number (1 to 10) of the line where the new line will be inserted.
4. Specify the From Line End Value and To Line Begin Value. From Line End Value is the character string that is to be at the end of the split line. To Line Begin Value is the character string that is to be at the beginning of the split line.
5. Select either Literal or Mask for From Line End Value Format and To Line Begin Value Format. These are the formats for the values specified in From Line End Value and To Line Begin Value.
If an address has ten lines and a line split is performed, the last line will be dropped.
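The effect of the two operations can be sketched as plain string manipulation. This is a simplified illustration; the begin/end value matching rules and the Literal/Mask formats of the real settings are omitted, and the ten-line cap is modeled as a simple truncation.

```python
def join_lines(lines, from_idx, to_idx):
    """Append line `from_idx` to line `to_idx` (1-based), roughly as
    Join Lines does; value matching is omitted for brevity."""
    lines = list(lines)
    lines[to_idx - 1] = lines[to_idx - 1] + " " + lines[from_idx - 1]
    lines[from_idx - 1] = ""
    return lines

def split_line(lines, from_idx, to_idx, end_value):
    """Split line `from_idx` at the first occurrence of `end_value`,
    inserting the remainder as a new line at position `to_idx`.
    If the result exceeds ten lines, the last line is dropped."""
    lines = list(lines)
    head, sep, tail = lines[from_idx - 1].partition(end_value)
    if sep:  # end value found on the source line
        lines[from_idx - 1] = head + sep
        lines.insert(to_idx - 1, tail.strip())
        lines = lines[:10]
    return lines

print(split_line(["10 LOIS LN APT 2 LEXINGTON MA", "X"], 1, 2, "APT 2"))
# → ['10 LOIS LN APT 2', 'LEXINGTON MA', 'X']
```

As the text above notes, splitting helps the CDP because each physical line can then be classified with a single line type.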
To enable the debug function
1. Click Advanced and navigate to Process, Settings.
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file name, or specify the name of the file to which debugging information will be sent.

To count the number of records processed
1. Click Advanced and navigate to Process, Settings.
2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.
This count is written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is 1.

To specify settings file encoding
1. Click Advanced and navigate to Additional....
2. In Settings File Encoding, select the correct encoding from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.

Run the Customer Data Parser and View Results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.
To run the Customer Data Parser and view results
1. Click OK to close Advanced settings.
2. Click Run to run the CDP. You can also right-click a step and select Run Selected.
3. Select OK.
4. On the Results tab, select the Statistics sub-tab. The Statistics file indicates the number of records read into and out of the CDP and displays name, street, and geographic review information.
5. Review the Statistics file. Verify that no additional names were generated by the CDP. Be sure to use the Statistics Viewer and the Spreadsheet Viewer to review the CDP output statistics.

Analyze Results
After running the CDP, the Parser generates Completion Codes and Review Codes to identify specific conditions that occurred for each record being parsed. You can review those codes to analyze the Parser results.
The completion codes are written to the CDP Repository Output Record (PREPOS) in the following field:
pr_completion_code
The review codes are written to the CDP Repository Output Record (PREPOS) in three-character pairs in the following fields:
pr_name_review_codes
pr_street_review_codes
pr_geog_review_codes
pr_misc_review_codes
pr_global_review_codes

To change the review group order, Review Group Order (Process, Settings) can be used to specify the review group hierarchy.

When a record receives a review code, Review Groups are also written to the following field:
pr_rev_group
For multiple review codes, the review group is determined by a default hierarchy table.
See Appendix B for the complete list of Completion Codes, Review Codes, and Review Groups for the Customer Data Parser.

Statistics File
The Parsing Statistics Report is generated by the CDP and summarizes the number and percentage of records distributed over each review group. A brief description of each review group also appears on the statistics report.

Review Group   # of Records   %       Description
0              945            94.5%   No Targeted Conditions Found
1              0              0.0%    Unidentified Item
2              22             2.2%    Mixed Name Forms
3              0              0.0%    Hold Mail
4              0              0.0%    Foreign Address
5              0              0.0%    No Names Identified
6              0              0.0%    No Street Identified
7              2              0.2%    No Geography Identified
8              4              0.4%    Unknown Name Pattern
9              8              0.8%    Derived Genders Conflict
10             11             1.1%    More Than One Middle Name
11             1              0.1%    Unknown Street Pattern
12             0              0.0%    Invalid Directional
13             0              0.0%    Unusual or Long Address
14             3              0.3%    No City or County Identified

Figure 6.5 Sample CDP Statistics



Using the Business Data Parser

For parsing of names and addresses, use the Customer Data Parser.

The Business Data Parser (BDP) uses pattern-recognition technology to identify, verify, and standardize non-name and address components of free-form text. The parsing process is driven by business rules that you can customize to meet your specific business requirements.
Use the Business Data Parser to perform several tasks:
Identify words and phrases in free-form text
Produce standardized and identified output in useful formats
Use customized user-defined attributes
Gain flexibility through an externally-edited set of tables for business rules
Identify words and phrases by their values or their masks
Correct misspellings and enable word or phrase recodes using external tables
Categorize any unique words and phrases using user-defined conditional text
Identify data for review by numerous methods
Produce standard output, so that applications may easily choose needed data elements
Display results in a log file to use for tuning business rules
Collect run statistics to quickly identify development areas
Produce a log that identifies problems to help refine the external word, phrase, and pattern tables

BDP Parsing Process

The Business Data Parser parses data and identifies patterns based on the following criteria:

Step 1 - Assign all possible attributes
The BDP identifies each word and phrase and compares them to the business rule table supplied by the Parsing Customization process. When the BDP finds a word or phrase in the table, it assigns the associated specific attribute for that table entry. For example:

1995    Toyota    Camry
YEAR    MAKE      MODEL

If a word or phrase isn't specified in the table, the BDP assigns it an intrinsic attribute, such as ALPHA or NUMERIC.

Step 2 - Pattern lookup and assign final word/phrase attributes
The BDP looks up the entire combination of words, called a pattern, in the pattern list.
If a match to the pattern list exists, the BDP assigns a final attribute to all words, based on the pattern.
If no match exists, the BDP writes the pattern details to the log file for further review and tuning.

Step 3 - Line type and specific attribute assignment
Each line is then assigned a line type. The default line type of M (Miscellaneous) is assigned to a line unless both of the following conditions are true:
The line matches a pattern in the word and pattern table, and
A line attribute for that pattern is defined.
For example:

1995    Toyota    Camry
YEAR    MAKE      MODEL


You can assign up to fifty user-defined attributes, named USER-NN, where NN is a numeric value between 1 and 50, inclusive.
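Steps 1 through 3 can be sketched as two table lookups. The tables, the "V" line type, and the year-mask rule below are hypothetical illustrations, not the BDP's actual rule tables or attribute names.

```python
# Hypothetical word and pattern tables for the automobile example.
WORD_TABLE = {"TOYOTA": "MAKE", "CAMRY": "MODEL"}
PATTERN_TABLE = {("YEAR", "MAKE", "MODEL"): "V"}   # "V" line type is assumed

def attribute(word):
    """Step 1: table lookup, a mask-style year rule (assumed), then
    intrinsic fallbacks such as ALPHA or NUMERIC."""
    if word.upper() in WORD_TABLE:
        return WORD_TABLE[word.upper()]
    if word.isdigit() and len(word) == 4:
        return "YEAR"
    return "NUMERIC" if word.isdigit() else "ALPHA"

def parse_line(words, log):
    """Steps 2-3: look up the attribute pattern; unmatched patterns go
    to the log, and the line keeps the default type M (Miscellaneous)."""
    attrs = tuple(attribute(w) for w in words)
    line_type = PATTERN_TABLE.get(attrs)
    if line_type is None:
        log.append(attrs)
        line_type = "M"
    return attrs, line_type

log = []
print(parse_line(["1995", "Toyota", "Camry"], log))
# → (('YEAR', 'MAKE', 'MODEL'), 'V')
```

A line whose pattern is absent from the pattern table is logged for review and tuning, which is how the external tables are refined over time.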

Step 4 - Generate Output
The BDP produces a comprehensive data block called the BPREPOS (Business Data Parser Repository). The BPREPOS consists of fixed-fielded character data, including error codes and identification indicators. The Output DDL determines which of these fields are returned to the Output file, and can be customized by the user.
See Appendix B of the TS Quality Reference Guide for a complete list of BPREPOS fields and descriptions.

Example (BPREPOS Fields)

Figure 6.6 Sample BPREPOS Fields



Input and Output Settings

Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

To specify input and output files
1. Open the Business Data Parser step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Navigate to the Output Settings tab.
4. Specify a file name in the Output File Name and Output DDL Name text boxes.
5. Enter a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

To specify the parser field
You must specify the input DDL field that contains the data to be parsed.
1. Click Advanced and navigate to Input, Settings.
2. Select Parse Field from the drop-down list.

To check the Repository DDL File
1. Click Advanced and navigate to Additional....
2. In Repository DDL File, make sure the correct repository DDL file is specified. This DDL contains the layout of the BPREPOS fields.
The country-specific BPREPOS DDL is provided with the program.
You can also specify the following settings:

To specify the starting record
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records are processed.

To process every nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every nth record will be processed. By default, all records are processed.

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimiter settings. Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed in quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.
3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.
See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions on how to specify select or bypass definitions.

Process Settings
Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.
The navigation pane of the Advanced Settings window contains two tabs:
Parser
Prcustom
Settings for the Business Data Parser are shown on the Parser tab. The Prcustom tab contains settings for the Parser Customization process. The Parser Customization process is explained in the next chapter.

Parser Tables
The Business Data Parser uses the Word Pattern Definition table file to parse the non-name and address elements of the data. The Word Pattern Definition table for the Business Data Parser is created by the Parsing Customization process.
For instructions on the Parsing Customization process, see Chapter 7, Tuning the Parsing Rules, and Appendix B.
Word Pattern Definition File: Defines word patterns for a given country. It contains definitions for words and phrases (tokens), and the patterns associated with each line type. These tables use a two-letter prefix to indicate the country: US = United States, CA = Canada, GB = United Kingdom, DE = Germany, and so forth.
To specify the Parser table
1. Click Advanced and navigate to Process, Settings.
2. Specify the Word Pattern Definition File.
Default Word Pattern Definition File (US):
\TrilliumSoftware\tsq10r5s\<project>\tables\USBDPRUL.win

Example
For example, you can create a Word Pattern Definitions table for automobile classification. At least one definition and one pattern entry must be present in the Word Pattern Definitions table.
Entry from Word Pattern Definitions Table

'ACURA'       INSERT MISC DEF ATT=MAKE
'ALFA'        INSERT MISC DEF ATT=MAKE,RECODE='ALFA ROMEO'
'ALFA ROMEO'  INSERT MISC DEF ATT=MAKE
'AMC'         INSERT MISC DEF ATT=MAKE
'AUDI'        INSERT MISC DEF ATT=MAKE
'BERTONE'     INSERT MISC DEF ATT=MAKE
'BMW'         INSERT MISC DEF ATT=MAKE
'BUICK'       INSERT MISC DEF ATT=MAKE
'CADDY'       INSERT SYNONYM='CADILLAC'
'CADI'        INSERT SYNONYM='CADILLAC'
'CADILLAC'    INSERT MISC DEF ATT=MAKE,RECODE='CADILLAC'
'CADY'        INSERT SYNONYM='CADILLAC'
'CHEVROLET'   INSERT MISC DEF ATT=MAKE
'CHEVY'       INSERT MISC DEF ATT=MAKE,RECODE='CHEVROLET'

Figure 6.7 Sample BDP Word Pattern Definition Table
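The behavior implied by these entries can be sketched as follows. This is an illustration of the SYNONYM and RECODE semantics as described here, not the BDP's actual table engine; the dictionaries mirror a few of the sample entries above.

```python
# Hypothetical model of the sample table: SYNONYM entries map a
# variant to its canonical word before the definition lookup, and
# RECODE rewrites the standardized output value.
DEFS = {
    "ALFA": ("MAKE", "ALFA ROMEO"),
    "CADILLAC": ("MAKE", "CADILLAC"),
    "CHEVY": ("MAKE", "CHEVROLET"),
    "CHEVROLET": ("MAKE", None),
}
SYNONYMS = {"CADDY": "CADILLAC", "CADI": "CADILLAC", "CADY": "CADILLAC"}

def standardize(token):
    """Return (attribute, standardized value) for a token; tokens not
    in the table fall back to the intrinsic ALPHA attribute."""
    word = SYNONYMS.get(token.upper(), token.upper())
    att, recode = DEFS.get(word, ("ALPHA", None))
    return att, recode or word

print(standardize("Caddy"))  # → ('MAKE', 'CADILLAC')
print(standardize("Chevy"))  # → ('MAKE', 'CHEVROLET')
```

Routing the CADDY/CADI/CADY variants through a single SYNONYM target keeps the recode in one place, which is the practical reason to prefer synonyms over repeating a RECODE on every spelling.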

Additional Settings
You can also specify the following settings:
See Business Data Parser in the TS Quality Reference Guide
for complete settings information.

To retain original values
You can specify whether you want to retain the original input
data in the Parser output field. The field must be defined as
ORIGINAL by the output DDL. If this setting is not checked,
the Parser formats the data as uppercase and removes
erroneous punctuation.
1. Click Advanced and navigate to Process, Settings.
2. Select Retain Original Value.

To include Unknowns in Standard Original Field
This setting controls whether unknown or undefined tokens
are populated into label lines. When checked, label lines are
populated with the complete input lines, including unknown
or undefined words/tokens. These tokens are standardized
and appear in the same left-to-right order as in the input
line.
1. Click Advanced and navigate to Process, Settings.
2. Select Include Unknowns in Std Original Field.

To populate unknown patterns
When checked, this setting ensures that the Parser populates
user fields with known attributes, even in the event of a
pattern failure.
1. Click Advanced and navigate to Process, Settings.
2. Select Populate Unknown Patterns.

To enable debug function
1. Click Advanced and navigate to Process, Settings.
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file
   name, or enter a file name where debugging information will
   be written.

To count number of records processed
1. Click Advanced and navigate to Process, Settings.
2. In the Sample Count text box, specify the number that
   indicates the increment sample of records to read and
   attempt to process from an input data file.
   This count will be written to the Process Log file. To
   display the Log file, select the Results tab and
   navigate to the Process Log tab after the program is
   run. The default is 1.

To specify settings file encoding
1. Click Advanced and navigate to Additional....
2. In Settings File Encoding, select the correct encoding from
   the drop-down list.
   See Encoding (Code Page) on page A-3 for more
   information on encoding.


Run the Business Data Parser and View Results


To run the Business Data Parser and view results
When you click Run, TS Quality automatically saves your
settings. To save your settings without running the step,
click Save.
1. Click OK to close the Advanced settings.
2. Click Run to run the BDP.
   You can also right-click on a step and select Run
   Selected.
3. Select OK.

4. On the Results tab, select the Statistics sub-tab. The
   Statistics file indicates the number of records read into and
   out of the BDP. It also displays the number of records that
   contain blank data, other lines, and unknown lines.
5. Review the Statistics file. Be sure to use My Statistics Viewer
   and the Spreadsheet Viewer to review the output statistics
   of the BDP.

After you run the BDP, the Parser generates Completion Codes
and Review Codes to identify specific conditions which occurred
for each record being parsed. You can review those codes to
analyze the parser results.
The completion codes are written to the BDP Repository Output
Record (BPREPOS) in the following field:
bp_completion_code
The review codes are written to the BDP Repository Output Record
(BPREPOS) in three-character pairs in the following fields:
bp_misc_review_codes
See Appendix B for the complete list of Completion
Codes and Review Codes for the Business Data Parser.
There are no Review Groups for the Business Data
Parser.


CHAPTER 7

Tuning the Parsing Rules

If the Customer Data Parser cannot recognize a name or address
component, such as a city name or surname, on a record, an
exception is reported. When that occurs, you must change the
parsing rules using Parsing Customization. To use Parsing
Customization, you must first understand how the parser definition
tables work.
This chapter explains the parser definition tables. You will also
perform these tasks:
View parser exceptions
Identify and create an entry for a misspelled city name
Identify and create an entry for a bad name pattern
Review the new entries in the Customized Definitions file
Run Parsing Customization and re-run the Customer Data Parser
Check errors in the Parsing Customization process
This chapter focuses on the Parsing Customization process
for the Customer Data Parser. See Online Help to tune the
parsing rules for the Business Data Parser.


Understanding the Parser Definitions Tables
Standard and User Definitions Tables
For the Business Data Parser, the default standard definitions
table is empty. You must create a table for the Business Data
Parser to run.
The Parser Definitions tables contain both definitions and
word/phrase pattern information. These files are used by the Parser
to identify the components of the input data.
Standard Definitions Table
Standard Definitions tables include all standard definitions
for titles, first names, business names, street components
(type and direction) as well as patterns and masks for other
name and address components. They are supplied with the
program.
Default Standard Definitions Table:
\TrilliumSoftware\tsq10r5s\tables
\parser_rules\xxCDPRUL.win
(xx = two-letter country code)
Standard Definitions tables are identified by a two-letter
prefix to indicate the country. (Example: US = United
States, CA = Canada, GB = Great Britain, and DE =
Germany)
Customized Definitions Table
Customized Definitions tables contain user-created
definitions.
Default Customized Definitions Table:
\TrilliumSoftware\tsq10r5s\<project>\tables
\xxUSERCDP.win (CDP, xx = two-letter country code)
\xxBDPRUL.win (BDP, xx = two-letter country code)

Syntax of Definitions
Entries in Standard and User Definitions tables require a special
syntax. This section describes the syntax for definition entries.

Syntax
TOKEN [OPERATION] LINE-TYPE [POSITION] KEYWORD=ATTRIBUTE, [ATTRIBUTE MODIFIER]

An entry is composed of Token, Operation, Line-type, Position,
Attributes, and Attribute Modifiers. Brackets [] indicate the
enclosed item is optional. The brackets [] are NOT typed on an
actual entry line.

Example
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F

MARY         Token
INS          Operation
NAME         Line Type
BEG          Position
ATT=GVN-NM1  Keyword=Attribute
GEN=F        Attribute Modifier

Token
A token is any word or phrase in the data, or a mask of any word or
phrase. Tokens are informally called the left side of the equation
in a definition table entry. In this example, the token is MARY.
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F
Token entries can be no more than 100 characters in length.
Tokens cannot wrap to a second line. This also affects
word and phrase definitions, masks, and pattern entries.
The Parser identifies four different types of token structures:
Token
Sub-token
Phrase
Mask
The table below describes the token structures and provides
examples.
Table 7.1 Parser Token Structures

Token
The smallest entity that has a meaning by itself.
A token may or may not contain one or more sub-tokens.
Example: 'PIZZA' NAME ATT=BUS

Sub-token
A string entity that has a meaning within a token (e.g., STRASSE). A sub-token may appear at the
beginning or end of the token.
If your data contained BERGENSTRASSE:
Example: STRASSE STREET END-TKN ATT=STR-TYPE
Where:
STREET        the line type
END-TKN       location of the sub-token within the word (also indicates this is a sub-token)
ATT=STR-TYPE  the attribute assignment for table lookup

Beginning-Token (BEG-TKN)
Used for the sub-token position. This keyword indicates that the sub-token lies at the
beginning of a token.
Example: STRASSE STREET BEG-TKN ATT=STR-TYPE

Ending-Token (END-TKN)
Used for the sub-token position. This keyword indicates that the sub-token lies at the
end of a token.
Example: STRASSE STREET END-TKN ATT=STR-TYPE
BEG-TKN and END-TKN are only allowed on street lines. See line types in the following
section for more information.

Phrase
One or more tokens grouped together that have a meaning.
Example: 'HOLD MAIL' STREET ATT=HOLD

Mask
A mask is a description of a word or phrase, using alpha, numeric or special characters to
represent letters, numbers, and special characters. Masks define characters of data elements
using:
n to represent a number (0-9)
a-z to represent alphabetic letters (lowercase only)
Every character that is not a letter or number is represented by the character itself:
/ (forward slash), @ (at symbol), and so forth.
For example, a mask can define any series of five numerals as a ZIP code, instead of entering
each of the 99,999 possible combinations in the table. This mask entry looks like:
nnnnn  MASK GEOG DEF ATT=POSTCODE
Masks may include special characters if they are part of the word representation. For example,
a mask for a nine-digit ZIP code is:
nnnnn-nnnn  MASK GEOG DEF ATT=POSTCODE
Appendix D in the TS Quality Reference Guide lists the valid token tags for Asia-Pacific
countries.
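Conceptually, a mask is a character-class template. The following Python sketch shows one way such a mask could be interpreted; it is an assumption for illustration, not the actual TS Quality matcher:

```python
import re

def mask_to_regex(mask):
    """Translate a Parser-style mask into a regular expression:
    'n' matches one digit, any other letter matches one alphabetic
    character, and every remaining character matches itself literally.
    (Illustrative model only, not TS Quality's implementation.)"""
    parts = []
    for ch in mask:
        if ch == "n":
            parts.append(r"\d")
        elif ch.isalpha():
            parts.append("[A-Za-z]")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"

# 'nnnnn' covers any five numerals, like the postcode mask above
assert re.match(mask_to_regex("nnnnn"), "01886")
# 'nnnnn-nnnn' covers a nine-digit ZIP with its literal hyphen
assert re.match(mask_to_regex("nnnnn-nnnn"), "01886-1234")
# a letter in the data fails the all-numeric mask
assert not re.match(mask_to_regex("nnnnn"), "0188A")
```

The single mask stands in for every possible five-digit combination, which is exactly the economy the text describes.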

Operations
The Parser identifies three types of operations:
Insert
Modify
Delete
In this example, the operation is INS (INSERT).
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F


Underlined letters indicate allowable abbreviations.


Table 7.2 Parser Operations

INSERT
This operation inserts an entry in a table.
MARY  INSERT NAME BEG ATT=GVN-NM1,GEN=F
If omitted, INSERT is assumed by default.

MODIFY
This operation replaces an existing entry in a table. The original entry is deleted and the
modified entry is inserted.
Example:
MARY  MODIFY NAME BEG ATT=GVN-NM1,GEN=F
Modify is used to change definitions in the standard definition table by creating the
entry in the user definitions file. The Parsing Customization process will combine
entries from the two tables into one output (to be used by the Parser).

DELETE
This operation deletes an entry from a table.
Deleting Definitions:
Example:
MARY  DELETE
Deleting Synonyms: With the SYNONYM keyword, you must enter the actual synonym:
Example:
BV  DELETE SYNONYM=BOULEVARD
Deleting Patterns: You must enter the actual pattern followed by DELETE PATTERN.
Example:
GVN-NM1 1ALPHA ALPHA  DELETE PATTERN
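The three operations can be pictured as edits applied to the standard table when Parsing Customization combines the two files. Here is a simplified Python model; the dict representation and function name are hypothetical, not TS Quality internals:

```python
def apply_user_entries(standard, user_entries):
    """Apply INSERT/MODIFY/DELETE operations from a user definitions
    table to a copy of the standard definitions.
    Conceptual model only: entries are (token, operation, definition)."""
    merged = dict(standard)          # leave the standard table untouched
    for token, operation, definition in user_entries:
        if operation in ("INSERT", "MODIFY"):
            # MODIFY is delete-then-insert, which in a dict model is
            # the same as overwriting the key
            merged[token] = definition
        elif operation == "DELETE":
            merged.pop(token, None)
    return merged

standard = {"MARY": "NAME BEG ATT=GVN-NM1,GEN=F"}
user = [
    ("MARY", "MODIFY", "NAME BEG ATT=GVN-NM1,GEN=N"),
    ("BV", "DELETE", None),
]
print(apply_user_entries(standard, user))
# {'MARY': 'NAME BEG ATT=GVN-NM1,GEN=N'}
```

The key point the model captures is that user entries win: a MODIFY in the user file replaces the standard definition in the combined output the Parser uses.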

Line Types
Each definition entry requires a line type assignment. The Parser
identifies four types of lines:
Name
Street
Geography
Miscellaneous
Note that attributes do not cross line types. For instance, an
attribute of GVN-NM1 cannot be used with a line type of STREET.
In this example, the line type is NAME.
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F

Underlined letters indicate allowable abbreviations.

Line Type      Description

NAME           Name of a person or business. Names are usually the first one or two
               lines in an address record.
               BOOKSTORE  NAME DEF ATT=BUS,CAT=S5942

STREET         All descriptions of streets and numeric addressing, including box
               numbers, rural routes, and apartment numbers. A street line is usually
               in the middle of a record, and may be one or more lines.
               LANE  STREET END ATT=STR-TYPE,REC=LN

GEOGRAPHY      The city, state, postal code, and country in the address. Geography
               line(s) are usually at the end of an address record.
               MASSACHUSETTS  GEOG END ATT=STATE,REC=MA

MISCELLANEOUS  Information that does not fit into the other line types, such as account
               name or a comment.
               HOLD MAIL  INSERT MISC DEF ATT=HOLD


Positions
A token may be defined in relation to its position within the name or
address line. There are three types of positions:
Beginning
Ending
Default
In this example, the position is BEG (BEGINNING).
MARY  INS NAME BEG ATT=GVN-NM1,GEN=F

Underlined letters indicate allowable abbreviations.

BEGINNING
This includes the first word in a line, any word that follows a title, or any words that appear
before a first name, including the first name. For example, consider the line:
MR JOSEPH SMITH
Every word except SMITH is considered to be at the beginning of the line.

DEFAULT (optional)
When the physical location of the word in the line is irrelevant, Default is used.
A default word may appear anywhere on the line, including the beginning or end. If this
keyword is omitted from the entry, Default is assumed.

ENDING
The last word and any further non-alphabetic characters are the ending of a line. For
example, consider the line:
BRIARWOOD ESTATES APT 3
Both APT and the apartment number 3 are considered to be at the end of the line.

Attributes
Attributes (ATT=) are line-specific definitions and assign a specific
meaning to a word or mask shape. The following table lists available
attributes organized by line type.
For the complete list of Attributes, see Appendix D in the
TS Quality Reference Guide.
Note that attributes do not cross line types. For instance, an
attribute of GVN-NM1 cannot be used with a line type of STREET.

Attribute                      Description

Name Line Attributes           Attributes used in NAME lines of patterns.
Street Line Attributes         Attributes used in STREET lines of patterns.
Geography Line Attributes      Attributes used in GEOGRAPHY lines of patterns.
Miscellaneous Line Attributes  Attributes used in MISCELLANEOUS lines of patterns.

User-Defined Attributes
If a particular word or phrase does not meet any of the pre-defined
attributes, you may assign it a user-defined attribute. For example:
n-nn-nan  MASK NAME DEF ATT=USER1
Once a user-defined attribute is assigned in the User Definitions
table, the corresponding field name must be included in the CDP
output DDL. For instance, if a USER1 attribute is assigned a value in
the User Definitions table, the field name PR_USER_FIELD_01 must
be added to the CDP output DDL.

Attribute Modifiers
Attributes can be further described by various Attribute Modifiers.
The following section lists all definition modifiers that can be used
after the attribute assignment. All modifiers must be separated
from the attribute by a comma. Valid attribute modifiers are
Gender, Category, Function and Recode.

Gender
The Gender (GEN=) keyword assigns a gender to a name
component. It applies only to definitions for name lines and is
required if the attribute used is GVN-NM1, 2, 3, or 4.
Valid gender codes:
M = Male
F = Female
N = Neuter (gender unknown)
MARY  NAME BEG ATT=GVN-NM1,GEN=F

Category
The Category (CAT=) keyword is a user-defined, free-form means
of categorizing data elements. Categories should be limited to six
characters (based on assigning multiple categories throughout a
record) with a maximum of 50 bytes per record for all categories.
A category can be any value that may prove useful as a group
during parsing of name and address components. For example,
assigning SIC codes to company names allows the distribution of
customer business verticals to be analyzed after the parsing
process is complete.
BOY SCOUTS  NAME DEF ATT=BUS,CAT=S8641

Function
The Function (FUNC=) keyword is used when special functions
should be performed on the entry. This keyword specifies a certain
subroutine, and the functions of that subroutine act on the entry.
BOY SCOUTS  NAME DEF ATT=BUS,FUNC=BES01
There are Special Functions used with the FUNCTION
keyword. See Appendix D in the TS Quality Reference
Guide.

Recode
The Recode (REC=) keyword is used to recode the value. The
value assigned after REC= is the value the Parser will assign to the
recode output field when the defined word is encountered on input.
ROAD  STREET END ATT=STR-TYPE,REC=RD
In the above example, the parser recodes the word ROAD to RD.
ROAD would be the value stored in the original data field
on Parser output. This is the pr_street_type1_original
field in the Parser repository.
RD would be the value stored in the recoded data field on
Parser output. This is the pr_street_type1_recoded field
in the Parser repository.

Recode for Masks
Masks may be used to introduce and/or exclude literals and special
characters in their recodes. For example, a mask for a telephone
number is entered in this manner:
nnn nnn-nnnn  MASK MISC DEF ATT=IGN,REC=(nnn)nnnnnnn
This entry would recode the entry 978 663-9955 to (978) 6639955.

Synonym
A synonym is a shortcut for defining a token entry with the same
value as a prior entry. For example:
PBOX  SYNONYM=PO BOX
This entry identifies PBOX as a synonym of PO BOX.
Synonyms are used to correct common spelling errors. The two
fields in the Parser affected by synonym entries in the definitions
table are called the original and recoded output fields.

It is important to understand the behavior of synonym entries in
conjunction with the recode entry of the resulting definition in
Parser output. See the example below.

Example
The definitions table contains the following entry:
'CENTRE COMMERCIAL'  STREET DEF ATTRIBUTE=TYPE,
                     REC=CCAL
The Parser knows this entry is a TYPE, with a recode value of
CCAL.
The Parser puts CENTRE COMMERCIAL in the original output
field pr_street_type1_original and puts the recoded value
CCAL in the recoded output field pr_street_type1_recoded.
Now a synonym entry is added to use the original definition entry:
'CENTRE COMMERC'  SYNONYM=CENTRE COMMERCIAL
The Parser knows that this is a synonym for CENTRE
COMMERCIAL. It places CENTRE COMMERCIAL (NOT CENTRE
COMMERC) in the original data output field, and places the recoded
value of CCAL in the recoded data output field. This ensures that
you have the correct spelling in the entry.
Manage this behavior through the Retain Original Data settings
(click Advanced, Process, Settings in the Customer Data Parser
step). If this setting contains a value of 1, the original data output
field would contain the original value (not the synonym value as
shown above). See page 6-32 for information on Retain Original
Data settings.
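The synonym-plus-recode behavior described above can be modeled in a few lines of Python. This sketch is illustrative only; the field names mirror the repository fields mentioned in the text, but the lookup logic itself is an assumption:

```python
# Conceptual model of synonym resolution; not TS Quality internals.
DEFINITIONS = {"CENTRE COMMERCIAL": {"att": "TYPE", "recode": "CCAL"}}
SYNONYMS = {"CENTRE COMMERC": "CENTRE COMMERCIAL"}

def parse_street_type(token):
    """Resolve a token through the synonym table, then return the
    values destined for the original and recoded output fields."""
    canonical = SYNONYMS.get(token, token)
    entry = DEFINITIONS[canonical]
    return {
        # the corrected canonical spelling, not the misspelled input
        "pr_street_type1_original": canonical,
        "pr_street_type1_recoded": entry["recode"],
    }

print(parse_street_type("CENTRE COMMERC"))
# {'pr_street_type1_original': 'CENTRE COMMERCIAL',
#  'pr_street_type1_recoded': 'CCAL'}
```

The essential behavior is that the original field receives the canonical spelling from the definition entry, never the misspelled synonym itself.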


Special Entries
In addition to the basic syntax described in the previous sections,
the Parser uses some special entries. This section explains special
entries including:
US city name changes
Non-US city name changes
Multiple definitions for one entry
Patterns

US City Name Changes
City name change entries are entered with an underscore (_) as
the last character of the entry. This notifies the Parser that this is a
city-change, and tells the program to look up the recoded entry in
the City Directory Table.
This directory is used for city verification and correction, and is
based on a primary geography, secondary geography lookup (such
as state or city).

Example
MABEVERLEY_  GEOG DEF ATT=CITY-CHG,
             REC=MABEVERLY
CASAN FRAN_  GEOG DEF ATT=CITY-CHG,
             REC=CASAN FRANCISCO
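A rough Python model of the underscore convention follows. The directory contents and function name are illustrative only; the real City Directory lookup and verification are considerably more involved:

```python
# Illustrative city-change directory keyed by the entry token
# (state prefix + misspelled city), mapping to the corrected recode.
CITY_CHANGES = {
    "MABEVERLEY": "MABEVERLY",
    "CASAN FRAN": "CASAN FRANCISCO",
}

def resolve_city(token):
    """If a definition token ends in '_', treat it as a city-change
    entry: strip the marker and return the corrected directory value.
    Tokens without the marker pass through unchanged."""
    if token.endswith("_"):
        key = token[:-1]
        return CITY_CHANGES.get(key, key)
    return token

print(resolve_city("CASAN FRAN_"))   # CASAN FRANCISCO
```

The trailing underscore is purely a flag on the table entry; the corrected value always comes from the REC= side of the definition.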

Non-US City Name Changes
Countries other than the US use another level of city-name
changes. It allows for additional city verification and correction
based on a complex City Directory Table. An underscore is required
as the last character of the entry.


Level      Description

Post Town  In this example, the program looks up Cheltenham as a valid post town in
           Gloucestershire county. Note that the recode contains only the corrected
           spelling of the post town.
           GLOUCESTERSHIRE CHELTENHAN_  GEOG DEF ATT=CITY-CHG,
                                        REC=CHELTENHAM

Locality   In this example, the program looks up Gotherington as a valid locality in the
           post town of Cheltenham. Note that the recode contains only the
           corrected spelling of the locality.
           CHELTENHAM GOTHERINGTEN_  GEOG DEF ATT=CITY-CHG,
                                     REC=GOTHERINGTON

Multiple definitions for one entry
Occasionally, an entry may contain multiple meanings. This is often
the case when a word has a meaning for more than one line type.
The first definition is entered in the standard way previously
described. Subsequent definitions must be INDENTED under the
initial operational value.
CENTER  NAME DEF ATT=BUS
        STREET END ATT=SEC-TYPE,REC=CTR
        GEOG DEF REC=CENTER
Note that for Geography definitions, tokens are allowed
without an attribute.

Patterns
A pattern consists of attributes and/or intrinsic attributes, which
include any alpha, numeric, or special character representation of a
data element.
Changes can be made to an existing pattern by adding another tag
to the first line, using the MODIFY operation.
'ALPHA ALPHA' MODIFY PATTERN NAME
REC=GVN-NM1(1) SRNM(1)
See MODIFY on page 7-7 for details.
Token identification is converted into meaningful information
through pattern processing. Patterns are created in the same text
file as the Definition entries. The Parser understands the difference
between a definition and a pattern and processes each
appropriately. Because of this, it is not necessary to create the
various entries in any particular order. For organizational purposes,
however, it makes sense to organize the entries by type.

Pattern Structure
The pattern structure uses one or two lines, using the following
structure.

FIRST LINE:
Inbound combination of tokens
This is the combination of attributes the Parser program will
attempt to find in the table. If the exact match of attribute
combination is found, the program changes the attribute
values on output to match the values defined in the RECODE
portion of the pattern (see the following information on
RECODE).
In this example, two words containing letters only are
present, such as two names. The actual data entry could be
John Smith, and the required association to the pattern
would be:
John   ALPHA
Smith  ALPHA
Here, both words are identified as ALPHA attributes. See the
section Intrinsic Attributes for more information. This
portion of the entry must be enclosed in single quotes:
'ALPHA ALPHA'
Keyword indicating this is a pattern:
'ALPHA ALPHA' PATTERN NAME
Keyword indicating to which line type this pattern entry applies:
'ALPHA ALPHA' PATTERN NAME
Valid line type keywords:
NAME
STREET
MISC
SECOND LINE: (Optional: both sets of elements can be on one line.)
The recode keyword followed by an = symbol
The attribute values that follow this keyword redefine the
tokens from their inbound values.
REC=GVN-NM1(1) SRNM(1)
The outbound pattern recode values
This is the combination of attribute values the Parser will use
on output for the data provided on this line. The values that
follow the recode value must be enclosed in single quotes.
Name lines require the name number following each attribute
name. Please see the section on Constructing Name
Patterns for details.
REC=GVN-NM1(1) SRNM(1)

Intrinsic Attributes
An intrinsic attribute is one that represents an individual entity
that did not have a definition entry in the table. This table lists the
main intrinsic attributes used for patterns.
For the complete list of Intrinsic Attributes, see Appendix D
in the TS Quality Reference Guide.
Only the inbound portion of the pattern entry may contain
intrinsic attributes. All outbound portions (recode line)
must contain only non-intrinsic attribute values.
INTRINSIC ATTRIBUTE  ABBR.  DESCRIPTION
ALPHA                       Letters only
HYPHEN                      A hyphen (-)
NUMERIC                     Numerals only

Controlling meanings when a sub-token is present
Assume your data contains BERGENSTRASSE 12. A definition
entry might exist in this format:
STRASSE  STREET ENDING-TOKEN ATT=STR-TYPE-S
The following pattern is required in order to separate the sub-token
from the word:
'ALPHA STR-TYPE-S NUMERIC' PATTERN STREET
REC=STR-NM STR-TYPE HSNO
Or, the following pattern is required in order to keep the sub-token
attached:
'ALPHA NUMERIC' PATTERN STREET
REC=STR-NM HSNO

Assigning a Line Type Through a Pattern
Line Type A
Apartment or house name lines can be set to line type A by the
Parser to represent an apartment line. This allows separate storage
of street components in the Parser output, such as street name,
house number, and apartment or house name information. Do this
simply by adding ATT=APT on any street pattern definition, as in:
'ALPHA COMPLEX-TYPE ALPHA-1NUMERIC' PATTERN STREET ATT=APT
REC='COMPLEX-NAME COMPLEX-TYPE APT-NUM'

If the above pattern had been entered as just a street pattern
(without using the APT attribute), then the following would have
occurred:

Original data:
HAWTHORNE COTTAGE B1F
10 MAIN STREET

Original street-only pattern:
(Z) HAWTHORNE COTTAGE B1F
(S) 10 MAIN STREET

New pattern:
(A) HAWTHORNE COTTAGE B1F
(S) 10 MAIN STREET

Where the Z line sets all data to IGNORE attributes and no
individual storage of the tokens occurs, the A line identifies the
tokens properly and parses them into the appropriate Parser output
fields:
pr_dwelling1_number
pr_complex1_name_recoded
pr_complex1_type_recoded

Constructing Patterns for Name Lines
Unlike street patterns that simply convert an inbound attribute
combination to another version on output, name patterns perform
an additional function. They often contain multiple individual names
on the same line. In some cases, only one last name may have
been given along with three first names, and it is implied that the
last name should be associated with all three first names.
One of the powerful features of the parsing engine uses parsing
customization pattern structures to understand these relationships.
Assume you have this record:
JOHN SMITH & MARY & ROBERT
There are three individuals given but only one last name. In order
to ensure that each first name receives a last name on output, a
pattern can be constructed to perform this association:
'GVN-NM1 ALPHA CTR GVN-NM1 CTR GVN-NM1' PATTERN NAME
REC=GVN-NM1(1) LAST(123) CTR(2) GVN-NM1(2) CTR(3)
GVN-NM1(3)
The numbers in the parentheses following each attribute value in
the recode line indicate the physical name to which that particular
token value is associated. For the last name attribute, the values in
parentheses indicate that this token is associated with all three
individuals.
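The association logic above can be sketched in Python. Here the recode line is modeled as a list of (attribute, name-numbers) pairs aligned with the input tokens; this representation is hypothetical, not the Parser's internal form:

```python
def distribute_names(tokens, recode):
    """tokens: the input words; recode: (attribute, name_numbers)
    pairs aligned with the tokens, where name_numbers lists the
    individuals each token belongs to. Returns one {attribute: word}
    dict per individual, in name-number order."""
    people = {}
    for word, (attribute, owners) in zip(tokens, recode):
        for n in owners:
            # LAST(123) lands the surname on individuals 1, 2 and 3
            people.setdefault(n, {})[attribute] = word
    return [people[n] for n in sorted(people)]

tokens = ["JOHN", "SMITH", "&", "MARY", "&", "ROBERT"]
recode = [("GVN-NM1", [1]), ("LAST", [1, 2, 3]), ("CTR", [2]),
          ("GVN-NM1", [2]), ("CTR", [3]), ("GVN-NM1", [3])]

for person in distribute_names(tokens, recode):
    print(person)
```

Running this yields three individuals, each carrying the shared surname SMITH alongside its own given name, which is exactly the association the pattern's parenthesized name numbers express.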


Conventions in Parsing Customization
This section lists elements the user needs to be aware of in order to
ensure that the Parsing Customization process functions properly.

Comment Lines
Comment lines are specified in entries in two different ways:
Using an asterisk (*) in column 1:
AARON NAME BEG ATT=GVN-NM1,GEN=M
* Gender is required with a GVN-NM1 attribute.
Using a double forward slash (//) on the same line as the entry:
AARON NAME BEG ATT=GVN-NM1,GEN=M // Gender is required
with a GVN-NM1 attribute.
Everything following the // will be ignored.
There must be a space after the double forward slash for
the comment to be valid.
Comments may only contain alpha-numeric characters.
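A small Python sketch of these two comment conventions (illustrative only, not the actual Parsing Customization scanner):

```python
def strip_comments(line):
    """Apply the two comment conventions described above:
    a line whose column 1 is '*' is entirely a comment, and '// '
    (double slash followed by a space) starts an inline comment.
    Returns the remaining entry text, or '' for comment-only lines."""
    if line.startswith("*"):
        return ""
    idx = line.find("// ")          # the space after // is required
    if idx != -1:
        line = line[:idx]
    return line.rstrip()

assert strip_comments("* Gender is required") == ""
assert strip_comments(
    "AARON NAME BEG ATT=GVN-NM1,GEN=M // Gender is required"
) == "AARON NAME BEG ATT=GVN-NM1,GEN=M"
```

Searching for `"// "` rather than `"//"` mirrors the rule that a space must follow the double slash for the comment to be valid.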

Line Lengths
Table entries longer than one line may span multiple lines. Each
additional line within an entry must be indented. Each new entry
must begin in column 1.
The maximum line length for entries is 189 characters, including the
newline character. The entry definition length may not exceed 100
characters. Components of an entry may not span more than one
line.

Quotation Marks
Entries enclosed by single quotation marks are processed as one
entity. If you wish to include a single quotation mark within a
SYMBOL or VALUE, use double quotation marks.
Double quotation marks (") specified within a SYMBOL or within a
VALUE are converted to single quotation marks (') by the system.
For example:
O"BRIEN NAME END ATT=SRNM
If a recode string contains more than one word, the entire string
must be entered in single quotes.
AS TRUSTEE FOR  SYNONYM='TRUSTEE FOR'
MEBAR HARBER_   GEOG DEF ATT=CITY-CHG,
                REC='MEBAR HARBOR'

How to Customize the Parser Definition Tables for Japan
For Japan, special Parser Definition tables are used by the Parser in
addition to the built-in personal and business name dictionaries.
There are two types of tables (Clue Tables and Name Tables), and
they are stored in the ..\tables\aptables\ directory. If the Customer
Data Parser cannot recognize a name component on a record, you
can create an entry in those tables.

Clue Table
The Clue table (jp_clue.txt) is used to store keywords that the
Parser uses to separate input text into tokens and to determine
business/personal classification. You can customize this table. The
following types of keywords are included in the Clue table.
Table 7.3 Tokens for jp_clue.txt

Token Type  Item                  Description
T           Business Type         Words to describe business type.
            Business Name         Parse as business name if this token is
                                  found at the beginning of the string
                                  (excluding business type).
            Business Name Suffix
D           Branch Name           It can be a branch name by itself.
            Branch Name Suffix    Usually this token is merged into the
                                  previous token and constitutes a branch
                                  name.
            Business Keyword      Words that can be part of a business
                                  name or branch name.
            Honorific             Words for honorific.
            Title (position)      Words for title.
            Region

Format
The table consists of the following 4 items. The delimiter for each
item is a comma. If the format is not correct, that line will be
ignored and the subsequent lines will not be recognized properly.

Table 7.4 Format for jp_clue.txt

Position  Item           NULL (not set)
1         Token type     Not allowed
2         Zenkaku field  Allowed
3         Hankaku field  Allowed
4         User comment   Allowed

Example:
D, , , user comment
T,( ),( )
If the user comment is null, the comma between the third
item and the fourth item can be omitted.
[Figure: a sample input string is separated into a business type keyword
(T), an unknown word, and a branch name keyword (D), which are then
written to the business type, business name, and branch name output
fields.]

In this case, one word in the input text matches a business type keyword
(T type) and another matches a branch name keyword (D type), so the
token type for each of those words was determined.
In the final output, the remaining unknown word was recognized as the
business name, and each word was written out to the proper output field.
If an unregistered keyword is found, you can add that
word to this table.
Duplicate words: When you register a new keyword, try
not to register duplicate words in different types.
Character Code: Use CP932 for registration.
Words with Spaces: The only keyword type that can include
spaces is the N type. If you register a keyword that includes
spaces, delete all spaces before and after the entry and
change all spaces within the entry to one hankaku space.
Ex. N, ,Hart Hanks
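A minimal Python sketch of reading one jp_clue.txt entry under the 4-item format above. This is illustrative only; the real program's handling of a malformed line also affects the lines that follow it, which this sketch does not model:

```python
def parse_clue_line(line):
    """Split a jp_clue.txt entry into its comma-delimited items:
    token type, zenkaku field, hankaku field, optional user comment.
    Returns None for lines without 3 or 4 items (the real program
    ignores malformed lines). Conceptual sketch only."""
    items = line.rstrip("\n").split(",")
    if len(items) == 3:
        items.append("")            # trailing comment comma omitted
    if len(items) != 4:
        return None                 # malformed: would be ignored
    token_type, zenkaku, hankaku, comment = items
    return {"type": token_type, "zenkaku": zenkaku,
            "hankaku": hankaku, "comment": comment}

print(parse_clue_line("N, ,Hart Hanks"))
# {'type': 'N', 'zenkaku': ' ', 'hankaku': 'Hart Hanks', 'comment': ''}
```

In practice the file would be read with the CP932 encoding noted above, e.g. `open("jp_clue.txt", encoding="cp932")`.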

Name Tables
Name tables contain additional personal and business names that
are not included in the personal and business name
dictionaries. They also include principal business names. You can
customize these tables.
Table 7.5  List of Name Tables

File               Description
jp_bnp_name.txt    Contains business principal name patterns and business type standard patterns (for the zenkaku field)
jp_bnp_name_h.txt  Contains business principal name patterns and business type standard patterns (for the hankaku field)
jp_pnp_name.txt    Contains last and first names that are not in the name dictionary (initial status is blank)

jp_bnp_name.txt
This table is used to register principal business names and principal
business types for the zenkaku field. It is not used to separate business
name and business type.
Table 7.6  Tokens for jp_bnp_name.txt

Token Type  Item           Description
B           Business name  Used to obtain the principal business name for the business name field and write the principal business name out in the output field.
T           Business type  Used to obtain the principal business type for the business type field and write the principal business type out in the output field.

Format
The table consists of the following 4 items. The delimiter for each
item is a comma.
Table 7.7 Format for jp_bnp_name.txt
Position  Item            Not set (NULL)
1         Token type      Not allowed
2         Business name   Not allowed
3         Principal name  Not allowed
4         User comment    Allowed

Example:
B,JR ,
T,, ,
If the user comment is null, the comma between the third
item and the fourth item can be omitted.
Input: JR

Output: Business type field, Principal business type field, Business name field (JR), Principal business name field, Branch name field
By standardizing the business data using this table, you can achieve
more accurate matching.
Duplicate words: when you register a new keyword, avoid
registering the same word under more than one token type.
Character Code: use CP932 for registration.

jp_bnp_name_h.txt
This table is used to register principal business names and principal
business types for the hankaku field. It is not used to separate business
name and business type. The usage and function of this table are the
same as for jp_bnp_name.txt, except that the field for this table is
Kana.

jp_pnp_name.txt
This table is used to register additional personal names. If you
find last names or first names that are not in the personal name
dictionary, you can add them to this table.
Table 7.8  Tokens for jp_pnp_name.txt

Item        Description
Last name   Register additional last names. The reading of a Kanji name can be registered.
First name  Register additional first names. The reading of a Kanji name can be registered.

Format
This table consists of the following 5 items. The delimiter for each
item is a comma.
Table 7.9 Format for jp_pnp_name.txt
Position  Item                                       Not set (NULL)
1         Token type                                 Not allowed
2         Last name or first name (zenkaku)          Allowed
3         Last name or first name in Kana (hankaku)  Allowed. When parsing a zenkaku first/last name, this field is used as the reading of the name.
4         Not used                                   Allowed
5         User comment                               Allowed

Example:
F, , ,,user comment
Duplicate words: when you register a new keyword, avoid
registering the same word under more than one token type.
Character Code: use CP932 for registration.
Words with Spaces: for integration purposes, small
characters must be converted to large characters when
adding hankaku kana last and first names.

Using the Parser Customization Editor


Parsing Customization is the process of creating entries for words
and phrases in the Customized Definitions Table. Those entries are
created using the Parser Customization Editor. After the new
entries are created and saved, you must re-run the Customer Data
Parser to apply the new parsing rules.

View a Standard Definitions Table

For detailed information on the Parser Customization Editor, see the Online Help.

Before making entries to the Customized Definitions Table, take a
look at the Standard Definitions Table to see how it is constructed.
Standard Definitions Tables vary from country to country.
To view a standard definitions table
1.

Open the Customer Data Parser step and click


Customization Editor. The Parsing Customization
Editor opens.

Customization
Editor button

Figure 7.1 Opening Customization Editor


2.

Select File, Open Standard Definitions.



3.

Locate the Standard Definitions Table in the Open dialogue


box: for example, c:\TrilliumSoftware\tsq10r5s\tables
\parser_rules\USCDPRUL.win.

4.

Click Open. The Standard Definitions table for the selected


country appears.

Figure 7.2 Standard Definitions Table (US)


5.

From the Main Menu select Search, Find Entry to review the
entries in this file.

6.

Select File, Exit to leave the Customization Editor.

View and Correct City Problems


City problems are reported to the exceptions file any time a US city/
state combination cannot be verified. The usual cause is a
misspelled city name.
To view and correct city problems
1.

Open the Customer Data Parser step and click


Customization Editor. The Parsing Customization
Editor opens. When the Customization Editor opens, the
country-specific Customized Definitions file and the country
specific Word/Pattern Problems file will also open.

2.

The left window of the Customization Editor contains a


Navigation area which allows the user to move from
Customized Definitions to specific Word/Pattern Problems.
Click Customized Definitions in the Navigation area. The
screen will show the current customized definitions file,
which is empty by default.

3.

Click below the line of asterisks. This will position your cursor
to enter customized definitions.
Be sure to position the cursor below the line of
asterisks before applying an entry.


Figure 7.3 Parsing Customization Editor


4.

Click US City Problems in the Navigation area. The screen


will display city problems found in the US data.

Figure 7.4 US City Problems


The Frequency column lists the number of times this city
problem occurred, followed by the percentage of total

occurrences this entry represents. Zip, State, and City data
is listed as it appears on the input record.
In this example, the cities FAIRBANK and BAR HARBER
are misspelled. The right-side window displays the record
number(s) for the selection.
5.

Right-click FAIRBANK to start the new entry process. The


cursor appears in the Input Correct City Name box.

Cursor position

Figure 7.5 New Entry Box


6.

Enter the correct spelling of this city as FAIRBANKS. Click


OK. The two letter state abbreviation followed by the
corrected city name will appear after the RECODE = in the
New Entry box.

Figure 7.6 Input Correct City Name




7.

Click Apply. The entry will be added to the Customized


Definitions file wherever the cursor is positioned in the
Customized Definitions file.

8.

In the Navigation area, click Customized Definitions to


view the new entry. The entry would look like this:

Figure 7.7 US City Entry in the Customized


Definitions File

If you accidentally hit the Apply button or if an entry is incorrect, you can modify or delete entries directly in the Customized Definitions file.

9.

In the Navigation area, click US City Problems. Repeat the


correction steps for the city BAR HARBER.

10.

Click Apply. In the Navigation area, click on Customized


Definitions to view the new entry.

Figure 7.8 Multiple US City Entries in the


Customized Definitions File
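Conceptually, each city RECODE entry behaves like a lookup from a state and misspelled-city pair to the corrected name. A sketch of that idea follows; the state codes are hypothetical, since the manual does not show them, and the dictionary stands in for the actual Customized Definitions file format:

```python
# Hypothetical recode entries: (state, misspelled city) -> corrected city.
city_recodes = {
    ("AK", "FAIRBANK"): "FAIRBANKS",
    ("ME", "BAR HARBER"): "BAR HARBOR",
}

def recode_city(state, city, recodes):
    """Return the corrected city name if a RECODE entry exists for
    this state/city pair; otherwise leave the city unchanged.
    Sketch only -- the real lookup is performed by the Customer Data
    Parser using the Customized Definitions file.
    """
    return recodes.get((state, city), city)
```

For example, `recode_city("AK", "FAIRBANK", city_recodes)` yields `"FAIRBANKS"`, while an already-valid city passes through unchanged.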
View and Correct Pattern Problems


Bad name patterns occur when data that the CDP cannot identify
appears on a name line. Any pattern of data that cannot be
completely identified is written to the exceptions file for review.
To view and correct pattern problems
1.

In the Navigation area, click Bad Name Patterns.

2.

Click on the first Bad Name Pattern. The data appearing in


the lower portion of the screen corresponds to the bad
pattern selected. If the Frequency for the pattern is 2 then
the data for the two corresponding records is displayed in the
Pattern Examples window.

Unknown attribute

Figure 7.9 Bad Name Patterns


3.

To correct the Bad Name Pattern you must change any


unknown attributes to a known attribute. Unknown attributes
are displayed in red.

4.

To change the unknown attribute, right-click on the attribute


name (ALPHA). A pop-up list of possible attributes will
appear.


5.

If there are elements of data that you do not wish to maintain, assign an IGNORE attribute to the piece of data.

Double-click on the desired attribute (for example,


SURNAME) in the list. The ALPHA attribute will be replaced
with SURNAME and will appear in italics and blue.

Corrected attribute

Figure 7.10 Corrected Name Pattern


6.

Click the Confirm button


to verify the entry before it is
placed into the Customized Definitions file.


7.

Click Apply to add this pattern to the Customized Definitions.

Confirm button

Figure 7.11 Name Pattern New Entry


If there are multiple names on one name line, the number in parentheses will determine which data element goes with which person.

8.

In the Navigation area, click on Customized Definitions to


view the new entries.

Figure 7.12 Complete New Entries in the Customized


Definitions File


Save the Entries


After the corrections have been completed, save the entries in the
Customized Definitions file. These entries will be merged with the
Standard Definitions entries before the Parser step is run.
To save the entries
1.

Select File, Save from the Main Menu.

Re-Run Customer Data Parser


After the new entries are created and saved, you must re-run the
Customer Data Parser to apply the new parsing rules.
To apply new parsing rules
1.

Run the Customer Data Parser step. When asked "Would you
like to run parsing customization prior to running the
step?", select Yes.

2.

The Customized Definitions will be merged with the Standard


Definitions and the Parser will run using the complete set of
parsing rules.

3.

When the Parser step has run, click on the Customization


Editor button. Navigate to US City Problems and then to
Bad Name Patterns. Notice that the exceptions are no
longer displayed. The entries in the Customized Definitions
file have instructed the Customer Data Parser on how to
handle these situations.

4.

Close the Customization Editor.

View Errors in Parsing Customization


When the Customer Data Parser has run, any errors in the Parsing
Customization process will be identified with the following message:

Figure 7.13 Parsing Customization Error Message


If you get this message, view and correct the errors using the
following steps:
To view errors in Parsing Customization
1.

Open Customization Editor and select File, Open Error


log.

2.

The log displays the error message and indicates the line
number, as well as the entry where the error occurred. A
sample error log is shown below:

Figure 7.14 Parsing Customization Sample Error Log




3.

To correct errors, edit the entry in the Customized Definitions


file. The error in Figure 7.14 indicates that the entry was
duplicated in the Customized Definitions file. One entry
should be deleted from this file.

4.

Save the Customized Definitions file and re-run the Customer


Data Parser.


CHAPTER 8

Analyzing Single Data

Sometimes users need to test and analyze the results of cleansing,
standardization and linking on a single data record. TS Quality
Analyzer allows the user to parse, geocode, and match name and
address data interactively. It is a useful way to test and view
modifications you make to the parsing rules.
In this chapter, you will perform these tasks:
Start the TS Quality Analyzer
Input name and address data
View the cleansed results
Show details for name/address parsing and standardization
Show details for address validation
Match data against your database
Review results of matching
The TS Quality Analyzer is not available for Asia-Pacific
countries.

Using the TS Quality Analyzer


The TS Quality Analyzer processes a single data record for a
specific country. The Analyzer processes each country's data using
the appropriate parsing and geocoding tables. If you have changed
the Customer Data Parser's parsing rules, using the TS Quality
Analyzer is a particularly effective way to test the new rules.
Use the TS Quality Analyzer for several functions:
Review details for name/address parsing and standardization
Review details for address validation
View Customer Data Parser and Postal Matcher details for
name/address record data
View Customer Data Parser Review Group descriptions
View Postal Matcher Return Code descriptions
Add the record to a database file for interactive reference
matching
Match a transaction record to records in a database file
See Linking Single Record Using the TS Quality Analyzer
on page 14-14 for reference matching processing.

Start the TS Quality Analyzer


To start the TS Quality Analyzer
1.

From the Tools palette, select TS Quality Analyzer.

2.

In the Select a Country window, select the country you wish


to work with. Click OK.



3.

The TS Quality Analyzer application opens.

Main Menu
Tool Bar

Figure 8.1 TS Quality Analyzer


There are two tabs in the TS Quality Analyzer:
Standardization - The Standardization tab is used for
cleansing, parsing and postal matching processes. Customer
Data Parser and Postal Matcher will be automatically run for the
selected record and standardization results will be displayed.
Matching - The Matching tab is used for the matching process.
Relationship Linker will be automatically run for the selected
record and matching results will be displayed.
You must first run the Standardization and then run the
Matching. The Matching process takes as input the
cleansed data generated by the Standardization process.

Data Entry and Cleansing


To enter a new record and cleanse the data
1.

Select the Standardization tab.


2.

Select File from the main menu to choose the input and
output mode for the record. For input, select Input Mode,
then either Input Fields or Free Form Input. For output,
select Output Mode, then either Output Fields or Free Form Input.

3.

If you select Input Fields mode, enter the record line by line.
If you select Free Form mode, you can enter the record in
free text format.

4.

Enter the new record data in the Input frame.

To clear the Input frame, click Clear or select Input from the Reset menu.

Input Fields mode

Free Form Input Mode

Figure 8.2 Input Mode


5.

Click Cleanse to parse and geocode the data.




You can also click the Cleanse button in the tool bar.

6.

The cleansed data will appear in the Cleansed window on
the Standardization tab.

Figure 8.3 Cleansed Data


7.

Look at Customer Parser Message and Postal Matcher


Message under the Cleansed window. These messages
indicate whether the data entered is valid or not.
If the data is valid, you will see the following messages:

If the data is invalid, you will see messages like this:

8.

Click Show Details to see the parsing, standardization, and


validation details. The results of the Customer Data Parser


are shown in the lower left window and the results of the
Postal Matcher are shown in the lower right window.

Figure 8.4 Parsing, Standardization, and Validation


Details

Advanced Details
In addition to the parsing, standardization, and validation details,
you can review advanced details of the Customer Data Parser and
Postal Matcher results.
To review advanced details of data
1.

Click Advanced Detail from the main menu. Refer to the


table below and select the desired information:

Select...                                        To...
Customer Data Parser and Postal Matcher Details  Review the PREPOS information returned from the Customer Data Parser and Postal Matcher
Customer Data Parser Review Group Descriptions   Look up the description of the Customer Data Parser Review Group returned
Postal Matcher Return Code Descriptions          Look up the description of the Postal Matcher Return Code
DPV Return Code Descriptions                     Look up the description of the DPV Return Code


Matching
Once the Cleansing step has run, you can match the record against
records in your database.
To match data against database
1.

Select the Matching tab. Notice the window key for the
cleansed record is shown.

Window Key

Figure 8.5 Matching Tab


2.

Click the Plus sign (+) to show the Master Database. The
records in the database are shown in the lower window.

Figure 8.6 Records in Master Database


3.

Click either Match Individual or Match Household to set


the level of matching.

4.

Click Match. You can also click the Match button in the tool bar.

5.
The match results are displayed in the Window Key


Matched Records from the Master Database and
Relationship Linker Matched Records on the right side of
the window.
Window Key Matched Records from the Master
Database shows all records in the database with the same
window key as the input record.

Figure 8.7 Window Key Matched Records


Relationship Linker Matched Records shows all matched
records from the Window Key Matched records.

Figure 8.8 Relationship Linker Matched Records
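The two result windows reflect a two-stage match: window-key candidates are gathered first, then the linking rules are applied only to those candidates. The flow can be sketched as below; the toy surname rule is hypothetical and stands in for the real Relationship Linker rules:

```python
def window_key_candidates(database, window_key):
    """Stage 1: every database record that shares the input
    record's window key."""
    return [r for r in database if r["window_key"] == window_key]

def linked_matches(candidates, rule):
    """Stage 2: apply the linking rules (here, a toy rule) to the
    window-key candidates only."""
    return [r for r in candidates if rule(r)]

# Hypothetical master database records.
master = [
    {"window_key": "K1", "surname": "SMITH"},
    {"window_key": "K1", "surname": "JONES"},
    {"window_key": "K2", "surname": "SMITH"},
]
candidates = window_key_candidates(master, "K1")
matches = linked_matches(candidates, lambda r: r["surname"] == "SMITH")
```

Restricting stage 2 to the window-key candidates is what keeps single-record matching fast: the linking rules never see records outside the input record's window.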


To view and edit linking rules
If you want to see and edit the field and/or pattern list files for
this matching, launch the Relationship Linker Rule Editor within
the TS Quality Analyzer.

1.

Select Match Rules from the Tuning menu.

2.

Select either Consumer or Business, and then select either


Level1 or Level2.

3.

The Relationship Linker Rule Editor opens with the field and/
or pattern files for this matching process.

4.

Review, edit and save the field and/or pattern files.

5.

To match the data again using the updated field and/or


pattern files, go back to the Standardization tab and cleanse
the data again.

6.

Go to the Matching tab and re-run matching by clicking


Match.

Organize Database
To add data to database
At this point, if you decide to keep the input record in the
database, you can add the record.
1.

Click Add to DB. The cleansed and matched input record is


added to your database.

To remove data from database


1.

In the master database, highlight the data you want to


remove.

2.

Click

to remove that data.

To reset database
1.

Select Master Database from the Reset menu.

2.

At the confirmation message, select Yes.


CHAPTER 9

Enriching Your Data




Once the name and address data is parsed, the address data must
be verified and enriched by the Postal Matchers. With the Postal
Matchers, data is matched to directories and appropriate
geographic fields are populated with postal geocoding data. The
Postal Matchers help you locate customers, verify address data, and
improve that data. All Postal Matchers rely on output from the
parsing process to provide addresses for linking purposes.
In this chapter, you will perform these tasks:
Sort the output file from the Customer Data Parser
Specify input, output, and the postal tables for the Postal
Matcher
Run the Postal Matcher and view results
Identify the match level code for a record
View the record and analyze the match to the Postal
Directory
Browse the postal directories for each country
We strongly recommend that the output file from the CDP
be sorted by geographic fields so that the records will be in
geographic order to permit the Postal Matchers to work
most efficiently.

Sorting for the Postal Matcher


The Postal Matchers use output from the parsing process as inputs.
To obtain optimum performance, the input files to the Postal
Matchers must first be sorted in geographic order, using the Sort
Utility. The output file will have the extension .srt to indicate that
the data have been sorted.

Input and Output Settings


The Sort Utility uses the output from the Customer Data Parser step
as input.


To specify input and output files

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

1.

Open the Sorting Utility step and click the Input Settings
tab.

2.

Enter file names in the Input File Name and Input DDL
Name text boxes.

3.

Click Add. The file name is dynamically added to the table in


the Input Data File Name and Input DDL Name columns.
OR
Click Replace. The default file names in the Input Data File
Name and Input DDL Name column are replaced with the
files you just specified.

4.

Select the Output Settings tab.

5.

Enter the Output File Name and Output DDL Name file
names. The Output File Name must have the extension .srt
to indicate this is a sorted file.

6.

Enter file names in the Statistics File Name and Process


Log Name text boxes.

To specify the output file qualifier


A File Qualifier is a unique name given to a data file. For
the Sort Utility, the output data file must have its own unique file
qualifier.
1.

Click Advanced and navigate to Output, Settings.

2.

Select Output Data File Qualifier (default is OUTPUT).

You may also specify the following settings:


To specify the starting record
1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Start at Record. This value


determines the record in the input data file at which the Sort
Utility will begin processing (default is 1).



To specify the maximum number of records to process
1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process a Maximum of. This


value specifies the maximum number of records to process.
By default, all records will be processed.

To process every nth record only

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process Nth Sample. This value


specifies that only every Nth record will be processed. By
default, all records will be processed.
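Taken together, Start at Record, Process a Maximum of, and Process Nth Sample behave roughly as sketched below. The exact way the utility combines the three settings is an assumption here; the sketch assumes the values are 1-based, as described above:

```python
def select_records(records, start_at=1, maximum=None, nth=1):
    """Sketch of record selection: skip records before position
    `start_at`, keep every `nth` record from that point on, and stop
    after `maximum` records have been kept (None = no limit).
    Sketch only -- not the Sort Utility's actual implementation.
    """
    out = []
    for position, record in enumerate(records, start=1):
        if position < start_at:
            continue
        if (position - start_at) % nth != 0:  # every Nth from the start point
            continue
        out.append(record)
        if maximum is not None and len(out) >= maximum:
            break
    return out
```

For example, with `start_at=3`, `nth=2`, and `maximum=3` over records 1 to 10, the sketch keeps records 3, 5, and 7.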

To use a delimited file


If you are using a delimited file for input and/or output, you
must specify delimited settings.
1.

Click Advanced and navigate to Input, Settings.

2.

Select Input Data File Delimiter Encoding and Input


Data File Delimiter from the drop-down list.

3.

For output, click Advanced and navigate to Output,


Settings.

4.

Select Output Data File Delimiter Encoding and Output


Data File Delimiter from the drop-down list.
See Encoding (Code Page) on page A-3 for more
information on encoding.

You can specify records to either Select or Bypass under


certain conditions in both input and output files. See
Select or Bypass Records on page 5-37 for instructions
on how to specify select/bypass definitions.

Process Settings
Once you have identified the input and output files, you are ready
to specify the settings used to process your data. The settings for
processing are managed in the Advanced Settings window.

Sort Fields
To specify sort fields

A red flag indicates a REQUIRED field for this operation.

1.

Click Advanced and navigate to Process, Settings.

2.

Click the Entry Settings tab.

3.

Select the input DDL fields from the drop-down list in the
Key box. These are the fields used in the sort process.
Sort fields are pre-determined according to the
country-specific step. You can change the default
fields by selecting different sort fields.

4.

Select the sort order from the drop-down list in the Order
box. Values are either Ascending Order or Descending
Order.

Geographic fields used in the sort process

Figure 9.1 Sort Entry Settings


To specify collating sequence
You can specify the collating sequence for the sort order. This is
optional.

1.

Click Advanced and navigate to Process, Settings.

2.

Click the Entry Settings tab.

3.

In the Collating Sequence box, specify the collating


sequence. Values are ASCII, EBCDIC, FOLDED_ASCII,
FOLDED_EBCDIC, or MULTI_NATIONAL. If omitted, the
default collating sequence defined by the operating system is
used.
For detailed information on the collating sequence, see
the Sort Utility's Online Help.
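The difference between a byte-order sequence such as ASCII and a case-folded sequence such as FOLDED_ASCII can be illustrated with Python's built-in sort. This is an analogy only, not the Sort Utility's implementation:

```python
names = ["delta", "Alpha", "Charlie", "bravo"]

# ASCII-style: code-point order, so all uppercase letters sort
# before all lowercase letters.
ascii_order = sorted(names)

# FOLDED_*-style: case is ignored when comparing keys.
folded_order = sorted(names, key=str.lower)
```

Here `ascii_order` interleaves nothing (`Alpha`, `Charlie`, `bravo`, `delta`), while `folded_order` orders the names as if case did not exist (`Alpha`, `bravo`, `Charlie`, `delta`).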

Additional Settings
You can specify the following additional settings.
See Sort in the TS Quality Reference Guide for the complete settings information.

To retain the order of same-key records


If you want output data to retain the order of same-key records,
use Stable Sort.
1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

Select Stable Sort.
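What Stable Sort guarantees can be illustrated with Python's `sorted()`, which is itself stable: records with equal keys keep their original input order. The sample records are hypothetical:

```python
# Hypothetical records: (ZIP key, surname), with two equal keys.
records = [("02134", "ADAMS"), ("10001", "BAKER"), ("02134", "CLARK")]

# A stable sort keeps ADAMS before CLARK within the "02134" key group,
# because that was their order in the input.
stable_by_zip = sorted(records, key=lambda r: r[0])
```

Without the stability guarantee, same-key records could come out in any relative order.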

To specify how equal-keyed records are handled


You can specify how duplicate records are handled when there
are duplicate keys.
1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

In the Duplicates box, select an option from the drop-down


list. Values are:
KEEP_ALL - Keeps all the records.
KEEP_ONE - Keeps one record. It does not guarantee
that a particular record within the duplicate set will be
retained.
KEEP_NONE - Keeps none of the records.


JUST_DUPS - Keeps just the duplicates.
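The four Duplicates options behave roughly as sketched below on an already-sorted input. Two assumptions are made here: KEEP_NONE is read as keeping only unique-keyed records, and KEEP_ONE keeps the first record of each set (the manual explicitly does not guarantee which one the utility retains):

```python
from collections import Counter

def handle_duplicates(records, key, mode):
    """Behavioral sketch of the Sort Utility's Duplicates options.
    `key` extracts the sort key; `mode` is KEEP_ALL, KEEP_ONE,
    KEEP_NONE, or JUST_DUPS. Inferred semantics, not product code.
    """
    counts = Counter(key(r) for r in records)
    if mode == "KEEP_ALL":
        return list(records)
    if mode == "KEEP_ONE":
        seen, out = set(), []
        for r in records:
            if key(r) not in seen:   # first of each set; the real
                seen.add(key(r))     # utility may keep a different one
                out.append(r)
        return out
    if mode == "KEEP_NONE":
        # Assumed: drop every record whose key appears more than once.
        return [r for r in records if counts[key(r)] == 1]
    if mode == "JUST_DUPS":
        return [r for r in records if counts[key(r)] > 1]
    raise ValueError(mode)

rows = [("02134", "A"), ("02134", "B"), ("10001", "C")]
```

On `rows`, KEEP_ONE yields one record per key, KEEP_NONE leaves only the unduplicated `10001` record, and JUST_DUPS leaves only the two `02134` records.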


To enable debug function
1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

Select Enable Debug Output.

4.

In the Debug File text box, accept the default path and file
name, or enter a new file name to receive debugging
information.

To count number of records processed


1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

In the Sample Count text box, enter the number that


indicates the increment sample of records to read and
attempt to process from an input data file.
This count will be written to the Process Log file. To
display the Log file, select the Results tab and
navigate to the Process Log tab after the program is
run. The default is always 1.

To specify settings file encoding


1.

Click Advanced and navigate to Process, Settings.

2.

Click the Main Settings tab.

3.

In Settings File Encoding, select the appropriate encoding


from the drop-down list.
See Encoding (Code Page) on page A-3 for more
information on encoding.


Run the Sorting Utility and Check Results


To run the Sort Utility and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1.

Click OK to close the Advanced Settings.

2.

Click Run to run the Sorting Utility.


You can also right-click a step and select Run
Selected.

3.

Click OK.

4.

In the Results window, you will see the Statistics sub-tab.


The Sort Key summary is shown on this sub-tab.
TS Quality offers a number of utilities to perform
specific tasks. See Chapter 16, Utilities, for a review
of these tools.


Using the Postal Matchers


Postal Matchers match your data to the country-specific TS
Quality Postal Directories and return address details and
database matches.
Postal Matchers perform these functions:
Verify and assign postal codes to name and address data
Assign delivery point identifier (DPID)
Standardize and correct address components
Provide linked data in a presentation form that meets the
country addressing standards
The TS Quality Postal Directories are included in the
package. Country-specific directories were installed
during the TS Quality installation process. You can
browse the postal directories using the Postal
Directory Browser. See Browsing the Postal
Directory on page 9-20.

Input and Output Settings


The Postal Matcher uses the output from the Sort Utility step as
input to this step.
To specify input and output files
1.

Open the Postal Matcher step and select the Input


Settings tab. Specify the Input File Name and Input DDL
Name.

2.

If you are using the Census tables and/or DPV tables, select
the Include Census Tables or Include DPV Tables box.

3.

Select the Output Settings tab. Specify the Output File


Name and Output DDL Name.



4.

Enter file names in the Statistics File Name and Process


Log Name text boxes.

To specify the input/output file qualifiers


A File Qualifier is a unique name given to a data file. Each
input and output data file must have its own unique file qualifier.
1.

Click Advanced and navigate to Input, Settings.

2.

Select Input Data File Qualifier (default is INPUT).

3.

Click Advanced and navigate to Output, Settings.

4.

Select Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:


To specify the starting record
1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Start at Record. This identifies the


record in the input data file at which the Postal Matcher will
begin processing (default is 1).

To specify the maximum number of records to process


1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process a Maximum of. This


specifies the maximum number of records to process. By
default, all records will be processed.

To process every nth record only


1.

Click Advanced and navigate to Input, Settings.

2.

Enter a numeric value in Process Nth Sample. This


specifies that only every Nth record will be processed. By
default, all records will be processed.

To use a delimited file


If you are using a delimited file for input and/or output, you
must specify delimited settings.


Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.


1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions on select and bypass definitions.

Process Settings
Once you have identified input and output files, you are ready to
specify settings to process your data. The settings for processing
are managed in the Advanced Settings window.

Postal Directories
The country-specific postal directories are included in TS Quality
and were installed when you installed the software. These
directories must be accessible to all projects.
See Installing TS Quality for a complete list of postal
directories for all countries and the locations of those
tables.
To specify postal directories

1. Click Advanced in the Postal Matcher step and navigate to Process, Settings.

The Process Settings window varies from country to country. See the TS Quality Reference Guide for a complete list of settings for each country.

2. If you are using the US Postal Matcher, the settings are displayed in Figure 9.2:

A red flag indicates a REQUIRED field for this operation.

Figure 9.2 Postal Matcher Settings (US)


3. Refer to the table below to define each setting.

Setting                      Description
Postal Base Data File        The file that contains street detail information: for example, USBASE.tbl.
Postal Level1 Data File      The file that contains level 1 street name information: for example, USINDEX1.tbl.
Postal Level2 Data File      The file that contains level 2 city information: for example, USINDEX2.tbl.
Postal Form File             The file that contains the postal certification report. Required for the USPS form.
Postal Form Database Date    Format of the date to display on the report: for example, 'MMM YYYY'.
Postal Form List             Name of the list to be matched against the US tables: for example, 'DATA FILE'.
Postal Form Customer         Client name to display on the report: for example, 'CUSTOMER NAME'.
Postal Form Job Number       The job number to print on the form: for example, 99999.

4. If you have checked the Include Census Tables and/or Include DPV Tables box on the Input Settings tab, the Census Settings and/or DPV Settings window will be enabled under Process. In this case, you must select your census/DPV tables in each window.

Additional Settings
You can also specify the following additional settings.
See Postal Matchers in the TS Quality Reference Guide for the complete settings information.

To enable the debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter a different file name to receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Process, Settings.

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.
To specify the settings file encoding

1. Click the Advanced button and navigate to Process, Settings.

2. In Settings File Encoding, select the correct encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information.

Run the Postal Matcher and View Results


To run the Postal Matcher and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close Advanced Settings.

2. Click Run to run the Postal Matcher.

3. Select OK.

4. On the Results tab, the Statistics subtab appears. Record Matches, Processing, Changes, and Failures are shown on this tab, as seen in Figure 9.3.

Figure 9.3 Postal Matcher Statistics


After running the Postal Matchers, Match Level Codes are generated to identify specific conditions that occur for each record being processed. You should review these codes to analyze the Postal Matcher results.

Match Levels
The Match Level Codes indicate the accuracy of the match between the country geography data and the appropriate postal table. The match level codes are written to the output record in the xx_gout_match_level field.

In actual use, the xx in the description above is replaced with a two-letter country code (for example, US = United States, CA = Canada, GB = Great Britain, and DE = Germany). Thus, xx_gout_match_level becomes US_gout_match_level for United States data.

Figure 9.4 Match Level Codes

There are several Match Level Codes. Some common codes include:

A 0 in the US_GOUT_MATCH_LEVEL field indicates that the input data successfully matched to the directory.

A Y in the US_GOUT_STREET_NAME_CHANGE field indicates that the street name was changed; for example, a misspelled street name was corrected, or the abbreviated street name was expanded to the full street name.

See the TS Quality Reference Guide for a complete list of Match Level Codes for the Postal Matchers.
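As a sketch of how downstream code might inspect these output fields, consider the following Python fragment. The record layout (a dict of output fields) is hypothetical; only the two field names and their meanings come from the text above:

```python
# Hypothetical Postal Matcher output record, represented as a dict of fields.
record = {
    "US_GOUT_MATCH_LEVEL": "0",          # 0 = matched to the directory
    "US_GOUT_STREET_NAME_CHANGE": "Y",   # Y = street name was changed
}

matched = record["US_GOUT_MATCH_LEVEL"] == "0"
street_changed = record["US_GOUT_STREET_NAME_CHANGE"] == "Y"
```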


Dual Address Information


Dual Address On the Same Line
In accordance with CASS requirements, if there are two addresses on the same line (referred to as a dual address), the US Postal Matcher may require both addresses for lookup. Therefore, the Customer Data Parser (CDP) needs to pass both addresses to the US Postal Matcher.

Dual address information is passed to the US Postal Matcher from the CDP using the us_gin and us_gout areas. The following rules describe how a dual address is handled:
1. Of the two addresses in a dual address, if one address is general delivery, then us_gin_street_name will contain the other address, and a G is set in the first position of us_gout_secondary_type.

2. If one of the addresses is a post office box (PO box), then us_gin_street_name will contain the other address, and a P is set in the first position of us_gout_secondary_type. The PO box number is also stored, starting at the second position of us_gout_secondary_type.

3. If the dual address contains both a general delivery address and a PO box number, then PO BOX is stored in us_gin_street_name, and a G is set in the first position of us_gout_secondary_type.

4. If the dual address contains both a street name and a rural route, then the street name is stored in us_gin_street_name, and an R is stored in the first position of us_gout_secondary_type. In addition, the route number is stored starting at the second position of us_gout_secondary_type, and the box number is stored starting at the second position of us_gout_secondary_number.

Currently, the Customer Data Parser handles the following dual address cases:

1. street name / general delivery
2. general delivery / street name
3. street name / PO box
4. PO box / street name
5. general delivery / PO box
6. PO box / general delivery
7. rural route / general delivery
8. general delivery / rural route
9. rural route / PO box
10. PO box / rural route
11. street name / rural route
12. rural route / street name
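The routing logic of rules 1-4 above can be sketched in Python. This is an illustration of the decision order only, not the CDP implementation; the address-kind labels ('street', 'general', 'pobox', 'rural') and the function itself are hypothetical:

```python
def route_dual_address(addr1, addr2, kind1, kind2):
    """Return (value for us_gin_street_name, flag for the first position of
    us_gout_secondary_type) per rules 1-4 above. Illustrative sketch only;
    kind1/kind2 are hypothetical classification labels."""
    kinds = {kind1: addr1, kind2: addr2}
    if "general" in kinds and "pobox" in kinds:
        return kinds["pobox"], "G"   # rule 3: the PO box address is kept, flag G
    if "general" in kinds:
        # rule 1: the non-general-delivery address is kept, flag G
        other = addr1 if kind2 == "general" else addr2
        return other, "G"
    if "pobox" in kinds:
        # rule 2: the non-PO-box address is kept, flag P
        other = addr1 if kind1 != "pobox" else addr2
        return other, "P"
    if "street" in kinds and "rural" in kinds:
        return kinds["street"], "R"  # rule 4: the street name is kept, flag R
    return addr1, " "                # not a recognized dual address case
```

For example, for case 3 (street name / PO box), the street name lands in us_gin_street_name with a P flag. Storing the PO box and route numbers (rules 2 and 4) is omitted from this sketch.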

Dual Address Information Handling


The following table shows where dual address information is stored for the above cases. The Dual Addr Flag is the value set in the first position of us_gout_secondary_type; the last two columns show what is stored starting at the second position of us_gout_secondary_type and us_gout_secondary_number, respectively.

Table 9.1 Dual Address Information Handling

Case  Dual Address                     us_gin_street_name   Flag   us_gout_secondary_type[1]   us_gout_secondary_number[1]
 1    street name / general delivery   street name          G      -                           -
 2    general delivery / street name   street name          G      -                           -
 3    street name / PO box             street name          P      PO box number               -
 4    PO box / street name             street name          P      PO box number               -
 5    general delivery / PO box        PO box               G      -                           -
 6    PO box / general delivery        PO box               G      -                           -
 7    rural route / general delivery   rural route          G      -                           -
 8    general delivery / rural route   rural route          G      -                           -
 9    rural route / PO box             rural route          P      PO box number               -
10    PO box / rural route             rural route          P      PO box number               -
11    street name / rural route        street name          R      route number                PO box number
12    rural route / street name        street name          R      route number                PO box number

The maximum length of the PO box number in cases 3, 4, 9, and 10 is 9; it extends into us_gout_secondary_number. The maximum length of the PO box number in cases 11 and 12 is 6.

Dual Address On Different Lines

When a dual address occurs on different lines, the address closest to the geography line is passed to the US Postal Matcher.

Changes to the PREPOS

If an address contains both a PO box and a rural route with a PO box number, there is no room to store the second PO box number. Therefore, the literal PO BOX is stored in pr_dwelling3_name_recoded and the PO box number is stored in pr_dwelling3_number.


Browsing the Postal Directory


You can browse the postal directories using the Postal Directory
Browser. The Postal Directory Browser contains separate
interactive browsers to view the postal directories for all countries
included in the package. There are three levels for browsing: City
Level, Street Level, and Street Detail.
The Postal Directory Browser is not available for Asia-Pacific (APAC) countries.

City Level Directory

To browse a city level directory

For detailed information on the Postal Directory Browser, see the Online Help.

1. Select Postal Directory Browser on the Tools Palette. The Configuration Dialog box appears.

Figure 9.5 Postal Directory Browser Configuration Dialog

2. From the drop-down menu, select the country whose postal directory you want to browse.

3. Select the directory containing your pdb_settings directory.

4. Click OK. The City Level window for the selected country opens. This window lists cities, ZIP Codes, and finance codes.

Figure 9.6 City Level Directory (US)

5. To search for a particular city, use one of the search boxes in the upper part of the window. For the US, the search boxes are CITY, STATE, ZIPCODE, FINANCE CODE, and US Census Search.

6. As you enter data in a search box, the program searches for your entry. You need only enter information in one of the search boxes for the program to determine the others.

7. To clear the search boxes, click Clear.

Street Level Directory

To browse a street level directory

1. Once you have selected a city, double-click the entry or click Run to bring up the Street Level window. For the US, the Street Level window contains all the street names for the selected city.

Figure 9.7 Street Level Directory (US)

2. To search for a certain street, use the search box. As you enter information in the search box, the program searches for the appropriate entry.

3. To clear the search box, click Clear.

Street Details

To browse the street details

1. Once you have selected a street name, double-click the entry or click Run to bring up the Street Level Details window.

2. The Street Level Details window displays street details under the fields for the selected street.

3. These fields vary from country to country. For example, the US fields would look like this:

Figure 9.8 Street Name (US)

For detailed information on the country fields, see the Postal Directory Browser Online Help.

4. The Postal Directory Browser displays the Street Detail. Scroll to view all data presented by the Postal Directory Browser.

Figure 9.9 Street Details (US)


CHAPTER 10

Linking Your Data

This chapter explains how to link your data. Linking is the process of
identifying records with a matching relationship (consumer/
business) in a file or duplicates in several files. Linking compares
records to determine the level of similarity between them.
The result of the comparisons is categorized as either a passed,
suspect, or failed match, based on the similarity of data elements in
the records, as well as the assigned score of their exceptions.
Data linking involves three steps:
Create window keys using the Window Key Generator
Sort records by the window key using the Sort Utility
Match records using the Relationship Linker


Using the Window Key Generator


The Window Key Generator lets you create window keys that
are used to match records in the Relationship Linker. The
Relationship Linker tries to match records in the same window
key set so that it does not need to compare every record in the
database to every other record.
A window key is constructed from elements of input fields, such as
the first character of a business name and the first five characters
from a postal code field. To generate a window key, you must first
create a Window Key Rule that defines which part of each
element to include in the key. You can use one or more keys to filter
selected records for comparison.

Example
Input Records:
CENTER HOSPITAL
25 BRATTLE LN
ARLINGTON MA 02476

CHEMIST ASSOCIATES
12 BRANTWOOD RD
ARLINGTON MA 02476

Window keys are generated from one of the window key rules provided by the Window Key Generator. For example, Key_List_10 is set to generate the window key as follows:

Key_List_10 rule:
Use the first three characters of the postal code.
Append to this the first character of the business name.
Append to this the first character and subsequent consonants of the street name.
Append to this a 1 if this is a personal name and a 2 if this is a business name.


Window key that is generated:
024CBR2

024CBR2

The same window key is generated for both records, bringing them
into the same match window for comparison purposes. Subsequent
matching rules will indicate that these records are not matches.
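The Key_List_10 rule above can be sketched in Python. This is only an illustration of the idea, not the Window Key Generator's implementation; in particular, the two-character limit on the street-name portion is our assumption, chosen so the sketch reproduces the 024CBR2 keys shown above:

```python
def window_key(postal, name, street, is_business, street_len=2):
    """Build a Key_List_10-style window key (illustrative sketch)."""
    # First character of the street name plus its subsequent consonants,
    # truncated to street_len characters (assumed limit).
    consonants = [c for c in street[1:].upper() if c.isalpha() and c not in "AEIOU"]
    street_part = (street[0].upper() + "".join(consonants))[:street_len]
    # First 3 of postal code + first char of name + street part + type digit.
    return postal[:3] + name[0].upper() + street_part + ("2" if is_business else "1")
```

Both sample records yield the same key, so they fall into the same match window even though their names differ.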

Input and Output Settings

The Window Key Generator uses the Postal Matcher output as input.

To specify input and output files

1. Open the Window Key Generator step and click the Input Settings tab.

2. Enter the Input File Name and Input DDL Name.

3. Click the Output Settings tab and enter the Output File Name and Output DDL Name.

4. Enter a file name in the Statistics File Name and Process Log Name text boxes.

You can also specify these additional settings:

To specify the starting record

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.


To process every nth record only

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This specifies that only every nth record will be processed. By default, all records will be processed.

To use a delimited file

If you are using a delimited file for input and/or output, you must specify the delimiter settings.

Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.

1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.

Process Settings
Once you have specified input and output files, you can define the
settings to process your data. The settings for processing are
specified in the Advanced Settings window.


Create Window Key Rules


The Window Key is generated from a window key rule selected from
the Key_List. Before you can apply key rules, you must first
construct them.
To define window key rules

You can create up to 30 window keys. The maximum window key length is 50 bytes.

1. Click Advanced. Navigate to Window Key Rules.

2. Select a key file from the list of Key_List_01-30.

3. In Primary Field Name, select from the drop-down list the primary field name you want to use in building the window key.

4. In Number Characters Primary Field, specify the number of characters to use from the Primary Field Name.

5. In Primary Field Winkey Code, select from the drop-down list the conditions you want to apply to the primary field.

Figure 10.1 Window Key Rules

A red flag indicates a REQUIRED field for this operation.

In this example, the Key_List_10 rule is used to generate the window key as follows:
Use the first three characters of the postal code.
Append the first character of the business name.
Append the first character and subsequent consonants of the street name.
Append a 1 if this is a personal name, and a 2 if this is a business name.

You can also specify a secondary window key. The secondary window key will be used if the conditions in Field Value Invoke Secondary Field are met.

6. Review the list of the fields, the number of characters, and the window key codes used in the generation of the window key.

Specify the Window Key Field

The Window Key Field determines where the generated window key will be placed on the output record.

To set window key fields

1. Navigate to Keys, Keys Settings, Source Key. Under Source Key, click a cell and select the name of a Key_List from the drop-down list that appears (Key_List_01 - 30).

2. In Window Key Field Name, select the field name from the drop-down list. The generated window key will be placed into that field on the output record. In this example, the generated window key from KEY_LIST_10 will be placed into the field named WINDOW_KEY_01:

Figure 10.2 Window Key Field

Additional Settings
You can also specify these additional settings:
To enable the debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter the name of a file to receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Process, Settings.

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.

This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify the settings file encoding

See Window Key Generator in the TS Quality Reference Guide for complete settings information.

1. Click Advanced and navigate to Process, Settings.

2. In Settings File Encoding, select the encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.

To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter the path and file name for the mask file.


Run the Window Key Generator and View Results

To run the Window Key Generator and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.

2. Click Run to run the Window Key Generator step.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears.

5. Navigate to the Output Settings tab and click the Data Browser icon to view the WINDOW_KEY_01 field.

Figure 10.3 Window Key Generated

6. Notice that all of the generated window keys end with a 2. This means all of the records have been designated as business records.


Sorting the Record by the Window Key

After creating the window keys, but before running the Relationship Linker, the input records must be sorted by the window key. The Sort Utility is used to sort a file into the desired order. In this example, the output file from the Window Key Generator will be sorted by the WINDOW_KEY_01 field. The output file will have the extension .srt to indicate that the file has been sorted.
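Conceptually, this step is just an ascending sort on the window key field. A minimal Python sketch (the tuple layout is an assumption for illustration):

```python
# Hypothetical records as (name, window_key) tuples, as if produced by the
# Window Key Generator.
records = [
    ("Vals Lube Co", "018VA MAI2"),
    ("John C Nicoli", "018NICLIN1"),
    ("Vasco Laboratories", "018VA MAI2"),
    ("J C Nicoli", "018NICLIN1"),
]

# Sort ascending by the window key field (WINDOW_KEY_01 in this example).
records.sort(key=lambda r: r[1])
```

After the sort, records that share a window key sit next to each other, which is exactly what the Relationship Linker's window matching needs.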

Input and Output Settings

The Sort Utility uses the output from the Window Key Generator step as input.

To specify input and output files

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

1. Open the Sorting Utility step.

2. Select the Input Settings tab.

3. Enter file names in the Input File Name and Input DDL Name text boxes.

4. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.
   OR
   Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

5. Navigate to the Output Settings tab.

6. Enter the Output File Name and Output DDL Name. The Output File Name should have the extension .srt to indicate that it is a sorted file.

7. Enter file names in the Statistics File Name and Process Log Name text boxes.


To specify the output file qualifier

The File Qualifier is a unique name given to a data file. For the Sort Utility, the output data file must have a unique file qualifier (with the .srt suffix).

1. Click Advanced and navigate to Output, Settings.

2. Specify the Output Data File Qualifier. The default is OUTPUT.

See Input and Output Settings on page 9-2 for the optional input and output settings for the Sort Utility.

Process Settings
Once you have identified input and output files, you are ready to
define the settings to process your data. The settings for processing
are managed in the Advanced Settings window.

Specify Sort Fields

To specify sort fields and sort order

A red flag indicates a REQUIRED field for this operation.

1. Click Advanced and navigate to Process, Settings.

2. Click Entry Settings.

3. Select the input DDL fields from the drop-down list in the Key box.

4. Select the sort order from the drop-down list in the Order box. Values are either Ascending Order or Descending Order.

Figure 10.4 Sort Field for Window Key

See Additional Settings on page 9-6 for the additional settings for the Sort Utility.

Run the Sorting Utility and Check Results


To run the Sort Utility and view results

When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.

2. Click the Run button to run the Sorting Utility. You can also right-click a step and select Run Selected.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears. The Sort Key Summary is shown on this tab.

Be sure the file to be used in the Relationship Linking step is sorted by the appropriate window key.


Using Relationship Linker


The Relationship Linker step identifies the relationships between records in a file at the business and consumer level. It can also identify whether duplicates exist in several files.

The Relationship Linker uses Comparison Routines to determine the level of similarity between records. The result of the comparisons is categorized as either Pass, Suspect, or Fail, based on the similarity of data elements.

There are two types of linking functions:
Window Linking - compares records to other records in the same file
Reference Linking - compares records in the input file to an existing reference file

For each linking function, there are two levels of matching:
Consumer
  Consumer Level 1 - Household level matching
  Consumer Level 2 - Individual level matching
Business
  Business Level 1 - Company level matching
  Business Level 2 - Contact level matching

Comparison Routines are used to compare a variety of types of data, including business names, personal names, and geographic components. For example, the ABSOLUTE routine compares two fields and looks for an exact match.

The next chapter explains how to change and tune the comparison routines. See the TS Quality Reference Guide, Appendix C, for a detailed description of Relationship Linker routines and their associated scoring values.
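As an illustration of these two ideas, exact-field comparison and Pass/Suspect/Fail categorization, consider the sketch below. It is not the product's implementation: the whitespace/case handling and the threshold values are invented for the example, while real scoring is configured in the Relationship Linker.

```python
def absolute(field_a, field_b):
    """Exact-match comparison in the spirit of the ABSOLUTE routine
    (illustrative; case and surrounding whitespace are ignored here)."""
    return field_a.strip().upper() == field_b.strip().upper()

def categorize(score, pass_threshold=90, suspect_threshold=70):
    """Map a combined similarity score to Pass/Suspect/Fail.
    Threshold values are invented for this sketch."""
    if score >= pass_threshold:
        return "Pass"
    if score >= suspect_threshold:
        return "Suspect"
    return "Fail"
```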


Linking Examples
This section contains detailed examples for each stage of matching,
beginning with input data.

Example 1: Sample Input Data


Assume that you have the following input data:
------------------------------------------------------------------
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879
Vals Lubrication     Main St          Tyngsboro   Ma   01879
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862
John Nicole          91 Linnell Cir   Billerica   Ma   01862
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879
------------------------------------------------------------------

Example 2: Data With Appended Window Key


Create a window key (the last field) using Key_List_10. (See Create Window Key Rules on page 10-6 for the rules of Key_List_10.)
------------------------------------------------------------------------------
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------


Example 3: Data Sorted By Window Key


The input records must be sorted by the window key. The Relationship Linker will match records in the same window key set.
------------------------------------------------------------------------------
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------

Example 4: Data Grouped by Matched Level 1 (Households)


After running the Relationship Linker, matched households would look like this:
------------------------------------------------------------------------------
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------


Example 5: Data Grouped by Matched Level 2 (Individuals) in Matched Level 1 (Households)

After running the Relationship Linker, matched individuals in matched households would look like this:
------------------------------------------------------------------------------
*John C Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
 J C Nicoli          25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
*Chris J Nicoli      25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
 C J Nicoli          25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
*John Nicole         91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
*Vals Lube Co        105 Main St      Tyngsboro   Ma   01879   018VA MAI2
 Vals Lubrication    Main St          Tyngsboro   Ma   01879   018VA MAI2
 Vals Lube & Repair  105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------
*Vasco Laboratories  13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------

* Indicates the best record or 'survivor' of the match. See Using the Create Common Utility on page 12-3 to learn more about the best record and survivor record.

Example 6: Data Grouped by Suspect Level 1 (Households)


After running the Relationship Linker, suspect households would look like this:
------------------------------------------------------------------------------
John C Nicoli        25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
Chris J Nicoli       25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
J C Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
C J Nicoli           25 Linnell Cir   Billerica   Ma   01862   018NICLIN1
John Nicole          91 Linnell Cir   Billerica   Ma   01862   018NICLIN1
------------------------------------------------------------------------------
Vals Lube Co         105 Main St      Tyngsboro   Ma   01879   018VA MAI2
Vals Lubrication     Main St          Tyngsboro   Ma   01879   018VA MAI2
Vals Lube & Repair   105 Main St      Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------
Vasco Laboratories   13 Main St       Tyngsboro   Ma   01879   018VA MAI2
------------------------------------------------------------------------------


Example 7: Data Grouped by Suspect Level 2 (Individuals) within Suspect Level 1 (Households)

After running the Relationship Linker, suspect individuals in suspect households would look like this:
-----------------------------------------------------------------------------
*John C Nicoli       25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
 J C Nicoli          25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
 John Nicole         91 Linnell Cir   Billerica   Ma   01862  018NICLIN1
*Chris J Nicoli      25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
 C J Nicoli          25 Linnell Cir   Billerica   Ma   01862  018NICLIN1
-----------------------------------------------------------------------------
*Vals Lube Co        105 Main St      Tyngsboro   Ma   01879  018VA MAI2
 Vals Lubrication    Main St          Tyngsboro   Ma   01879  018VA MAI2
 Vals Lube & Repair  105 Main St      Tyngsboro   Ma   01879  018VA MAI2
-----------------------------------------------------------------------------
*Vasco Laboratories  13 Main St       Tyngsboro   Ma   01879  018VA MAI2
-----------------------------------------------------------------------------

Linking Your Data


Window Linking
Window Linking compares records to other records in the same
file. A group of records is matched to each other, one window key
set at a time.
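The window-linking idea, comparing records only within groups that share a window key, can be sketched in a few lines of Python. This is an illustrative sketch of the general technique only, not the Relationship Linker's actual implementation; the record layout, the `is_match` stand-in for field/pattern grading, and the sample data are assumptions:

```python
from itertools import groupby

def window_link(records, window_key, is_match):
    """Group records by window key and compare each pair within a window.

    records    -- list of dicts, pre-sorted by the window key (as after a sort step)
    window_key -- field name used to form candidate windows
    is_match   -- pairwise comparison function (stands in for field/pattern grading)
    """
    links = []
    for key, group in groupby(records, key=lambda r: r[window_key]):
        window = list(group)
        # Only records sharing a window key are ever compared to each other.
        for i in range(len(window)):
            for j in range(i + 1, len(window)):
                if is_match(window[i], window[j]):
                    links.append((window[i]["id"], window[j]["id"]))
    return links

recs = [
    {"id": 1, "wkey": "018NICLIN1", "name": "John C Nicoli"},
    {"id": 2, "wkey": "018NICLIN1", "name": "John Nicoli"},
    {"id": 3, "wkey": "018VA MAI2", "name": "Vals Lube Co"},
]
same_last = lambda a, b: a["name"].split()[-1] == b["name"].split()[-1]
print(window_link(recs, "wkey", same_last))  # [(1, 2)]
```

Because `groupby` only groups adjacent records, the input must already be sorted on the window key, which is why window linking consumes the output of a sort step.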

Input and Output Settings


The Relationship Linker uses the output from the Sort Utility 2 step as input.

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it.

To specify input and output files

1. Open the Relationship Linker step and select the Input Settings tab.

2. Specify a file name in the Input File Name and Input DDL Name text boxes.

3. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

   - OR -

   Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

Tip: To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

4. Navigate to the Output Settings tab.

5. Enter file names in the Output File Name and Output DDL Name text boxes.

6. Optionally, specify a Linking File. A linking file indicates which matched records are linked together with common data. If you want to produce a linking file, identify the Linking Data File and Linking DDL File.

7. Enter file names in the Statistics File Name and Process Log Name text boxes.


To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Input, Settings.

2. Specify Input Data File Qualifier (default is INPUT).

3. Click Advanced and navigate to Output, Settings.

4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings.

To specify the starting record

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Start at Record. This specifies the record in the input data file at which to begin processing (default is 1).

To specify the maximum number of records to process

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every Nth record only

1. Click Advanced and navigate to Input, Settings.

2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.
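Taken together, Start at Record, Process a Maximum of, and Process Nth Sample act as a simple filter over the input stream. The sketch below illustrates one plausible reading of how the three settings combine; the exact order in which the product applies them is an assumption:

```python
def select_records(records, start_at=1, maximum=None, nth=1):
    """Illustrative filter: begin at record start_at (1-based),
    keep every nth record from there, and stop after maximum records."""
    selected = []
    for offset, rec in enumerate(records[start_at - 1:]):
        if offset % nth != 0:
            continue  # skip records that are not on the Nth sample
        selected.append(rec)
        if maximum is not None and len(selected) >= maximum:
            break
    return selected

data = list(range(1, 11))                          # records 1..10
print(select_records(data))                        # defaults: all records
print(select_records(data, start_at=3, nth=2))     # [3, 5, 7, 9]
print(select_records(data, maximum=4))             # [1, 2, 3, 4]
```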

To use a delimited file

If you are using a delimited file for input and/or output, you must specify delimiter settings.

Note: Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed in quotation marks.

1. Click Advanced and navigate to Input, Settings.

2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.

4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.

See Encoding (Code Page) on page A-3 for more information on encoding.

You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.

Basic Settings

You must specify the match method and the name form field.

Note: A red flag indicates a REQUIRED field for this operation.

To select the match method

1. Click Advanced and navigate to Process, Settings.

2. In Match Method, select Window Matching from the drop-down list.

To specify the name form field

1. Click Advanced and navigate to Process, Settings.

2. In Name Form Field, select the name form field from the drop-down list. The Name Form Field contains the Consumer/Business flag. This field is created by the Transformer or Customer Data Parser, and is used by the Relationship Linker to distinguish between consumer and business records.

The flag values are Consumer and Business. The consumer/business flag within the matching window must be the same.

Field and Pattern Files

The Relationship Linker uses Field Files and Pattern Files in the linking process. The default files for your country are included in the TS Quality package.

Field Files - contain the fields to compare in the linking process.

Default Field Files:
\TrilliumSoftware\tsq10r5s\<project>\settings\
  xxbus1fld.stx (business level 1)
  xxbus2fld.stx (business level 2)
  xxcon1fld.stx (consumer level 1)
  xxcon2fld.stx (consumer level 2)
  (xx = 2-digit country code)

Pattern Files - contain the patterns, or report cards, used to determine the level of similarity between the records in the linking process. Each pattern is assigned a number and designated as pass, suspect, or fail.

Default Pattern Files:
\TrilliumSoftware\tsq10r5s\<project>\settings\
  xxbus1pat.stx (business level 1)
  xxbus2pat.stx (business level 2)
  xxcon1pat.stx (consumer level 1)
  xxcon2pat.stx (consumer level 2)
  (xx = 2-digit country code)
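The default file names above follow a fixed scheme (country code, then bus or con, then level 1 or 2, then fld or pat, then .stx), so they can be composed mechanically. A small illustrative helper; the "us" country code in the example is only an assumption:

```python
def default_settings_files(country, kind):
    """Compose default field/pattern file names of the form
    xx<bus|con><1|2><fld|pat>.stx, where xx is the 2-digit country code
    and kind is 'fld' (field file) or 'pat' (pattern file)."""
    return [f"{country}{entity}{level}{kind}.stx"
            for entity in ("bus", "con") for level in (1, 2)]

print(default_settings_files("us", "fld"))
# ['usbus1fld.stx', 'usbus2fld.stx', 'uscon1fld.stx', 'uscon2fld.stx']
```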


To specify field and pattern settings

1. Click Advanced and navigate to Process, Field Pattern Settings.

2. Accept the default country-specific settings files or select a customized settings file from the drop-down list.

See Using the Relationship Linker Rule Editor on page 11-12 to learn more about customizing the field and pattern files.

Window Key Field

The Relationship Linker tries to match records in the same window key set. Therefore, you must specify the window key field.

To specify the window key field

1. Navigate to Process, Transaction Window Settings.

2. In Window Key Field, select the window key field you are using for matching. In this example, Window Key Field is set to WINDOW_KEY_01.

Figure 10.5 Window Key Field

Window Size

You can control how many records are added to the match window. If there are more records with one window key than the value specified, additional windows are created for the remaining records. For example, if you have 1000 records with the same window key and set the value to 500, the records are split into two match windows of 500 records each.
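The window-size cap can be pictured as chunking each window-key group. An illustrative sketch only; whether the product overlaps adjacent windows is not stated here, so this simple non-overlapping split is an assumption:

```python
def split_windows(group, max_size):
    """Split one window-key group into match windows of at most max_size records."""
    return [group[i:i + max_size] for i in range(0, len(group), max_size)]

group = [f"rec{i}" for i in range(1000)]
windows = split_windows(group, 500)
print(len(windows), [len(w) for w in windows])  # 2 [500, 500]
```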


To specify the maximum window size

1. Click Advanced and navigate to Process, Transaction Window Settings.

2. In Maximum Window Size, specify a numeric value.

For the additional settings for window linking, see Additional Settings on page 10-28.

Run the Relationship Linker and View Results

To run the Relationship Linker and view results

Note: When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings and then click Run to run the Relationship Linker. You can also right-click on a step and select Run Selected.

2. Select OK.

3. On the Results tab, the Statistics sub-tab appears. Review the statistics for the Relationship Linker on the sub-tab.

4. Click Results Analyzer to view the record detail for the linking process. The output data set is sorted by matched individual number within matched household number within the window key.

The Results Analyzer allows the user to view the actual data and match results. We will explain this tool in detail in the next chapter.


Reference Linking

Reference Linking compares records in your input file to an existing reference file. It is mainly used to update an existing master file in the database with new records.

For example, suppose you've received a new set of records after running the initial linking. In this case, you would take the new records as your input file and the initial matched records as your reference file. You can compare the input file with the reference file to check whether the new records already exist in the reference file, and update the file if necessary.

If a match is found, a matching key number is copied from the reference record to the input record. If no match is found, a new key number is generated and appended to the input record. The number of output records in reference linking is the same as the number of input records. You can use the matching key numbers to update the reference file.

See Relationship Linker in the TS Quality Reference Guide for detailed information on Reference Linking.
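The key-number behavior described above, copying the key on a match, generating a new one otherwise, and producing exactly one output record per input record, can be sketched as follows. This is an illustrative sketch only: real reference matching uses window keys plus field/pattern grading rather than the exact-key lookup shown, and the field names (wkey, lev1_number) are assumptions:

```python
import itertools

def reference_link(input_recs, reference, match_key, next_key):
    """For each input record: copy the matching key number from the reference
    file, or call next_key() to generate a new one if nothing matches.
    The output has exactly one record per input record."""
    by_key = {r[match_key]: r for r in reference}
    out = []
    for rec in input_recs:
        ref = by_key.get(rec[match_key])
        if ref is not None:
            rec = {**rec, "lev1_number": ref["lev1_number"]}   # copy existing key
        else:
            rec = {**rec, "lev1_number": next_key()}           # assign new key
        out.append(rec)
    return out

ref = [{"wkey": "A1", "lev1_number": "0500001"}]
inp = [{"wkey": "A1"}, {"wkey": "B2"}]
counter = itertools.count(1)
new_key = lambda: f"NM{next(counter):06d}"
print(reference_link(inp, ref, "wkey", new_key))
```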

Input and Output Settings

To specify input and output files

1. Open the Relationship Linker step and select the Input Settings tab.

2. Enter file names in the Input File Name and Input DDL Name text boxes.

3. Click Add. The file name is dynamically added to the table in the Input Data File Name and Input DDL Name columns.

   - OR -

   Click Replace. The default file names in the Input Data File Name and Input DDL Name columns are replaced with the files you just specified.

Tip: You can either edit the file names manually or click the File Chooser icon to browse to the appropriate file and select it. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

4. Click the Reference Match checkbox to enable reference matching. When this checkbox is checked, the Reference File and Reference DDL options become enabled.

5. Specify the Reference File and Reference DDL.

6. Navigate to the Output Settings tab.

7. Enter file names in the Output File Name and Output DDL Name text boxes.

8. You may also specify a Linking File. A linking file indicates which matched records are linked together with common data. If you want to produce a linking file, specify the Linking Data File and Linking DDL File.

9. Enter file names in the Statistics File Name and Process Log Name text boxes.

To specify a second output file

A second output file for reference linking contains all records from the reference file that had a matching record in the input file.

1. Click Advanced and navigate to Reference, Output Settings.

2. Specify the Reference Output Data File and Reference Output DDL File.

To specify the input/output file qualifiers

A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.

1. Click Advanced and navigate to Reference, Input Settings.

2. Specify the Reference Input Data File Qualifier.

3. Click Advanced and navigate to Reference, Output Settings.

4. Specify the Reference Output Data File Qualifier.

See the Window Linking section for the optional input and output settings.

Basic Settings

The steps for setting the Match Method, Name Form Field, Field Pattern, and Window Key are the same as those for window linking. See Basic Settings on page 10-20 for details.

Specify Matching Numbers

If a match is determined, a matching key number is copied from the reference record to the input record. You must specify the fields where those matching numbers are stored.

Reference Level 1 Number - Identifies the field in the reference file where existing level 1 numbers are stored. For records matched at level 1, this number in the reference file is copied to the input file.

Reference Level 2 Number - Identifies the field in the reference file where existing level 2 numbers are stored. For records matched at level 2, this number in the reference file is copied to the input file.

Reference Record ID - Identifies the field where record IDs are stored. This value must be unique between the reference file and the input file.

You must add these fields to the DDL file prior to attempting reference linking.
To specify matching numbers

1. Click Advanced and navigate to Process, Reference Matching. Enter the Reference Level1 Number field.

2. Enter the Reference Level2 Number field.

3. Enter the Reference Record ID field.

To specify numbers when there is no match

If an input record does not match any record in the reference file at level 1, it is assigned a number built from the Number Generation Start and Number Generation Cycle values.

1. Click Advanced and navigate to Process, Reference Matching.

2. In Number Generation Start, enter a starting number for unmatched new records, such as 0. The starting number will be this value plus 1.

3. In Number Generation Cycle, enter a text or numeric string which will be added to the beginning of the Number Generation Start value, as in NM. If you do not specify a value, the default will be used. The default is YYDDD, where YY is the last 2 digits of the year, and DDD is the day number counted from 1/1.
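Number generation for unmatched records combines the cycle prefix with a counter that starts at the Number Generation Start value plus 1. An illustrative sketch (the exact formatting and padding used by the product are assumptions):

```python
from datetime import date

def key_generator(start=0, cycle=None):
    """Yield key numbers for unmatched records: cycle prefix + counter,
    counting from start + 1. The default cycle is YYDDD, built from the
    last 2 digits of the year and the day-of-year."""
    if cycle is None:
        today = date.today()
        cycle = f"{today.year % 100:02d}{today.timetuple().tm_yday:03d}"
    n = start
    while True:
        n += 1
        yield f"{cycle}{n}"

gen = key_generator(start=0, cycle="NM")
print(next(gen), next(gen), next(gen))  # NM1 NM2 NM3
```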

You can also specify the following additional settings.

To match all reference records

You can control whether to identify all matches when an input record matches more than one record in the reference file.

1. Click Advanced and navigate to Process, Reference Matching.

2. To enable all matches, check the box next to Reference File Match All. If this box is not checked, the Relationship Linker does not attempt to match any additional records in the reference file after matching one record.

To specify the maximum window size

You can control how many records are added to the match window. If there are more records with one window key than the value specified, additional windows are created for the remaining records. For example, if you have 1000 records and set this value to 500, additional match windows are created for the remaining records.

1. Click Advanced and navigate to Process, Reference Matching.

2. In Maximum Window Size, specify a numeric value.


Display Match/Suspect Pattern IDs

If you want to display matched or suspect household/individual pattern IDs in the output, you can specify the fields that store those IDs.

To specify fields for match/suspect pattern IDs

1. Click Advanced and navigate to Process, Reference Matching.

2. In Reference Level1 Pass (Suspect) Pattern Field, specify a DDL field where Level 1 pattern IDs are written out for output.

3. In Reference Level2 Pass (Suspect) Pattern Field, specify a DDL field where Level 2 pattern IDs are written out for output.

Additional Settings

For both Window Linking and Reference Linking, you can configure these additional settings.

See Relationship Linker in the TS Quality Reference Guide for the complete settings information.

To enable the debug function

1. Click Advanced and navigate to Process, Settings.

2. Select Enable Debug Output.

3. In the Debug File text box, accept the default path and file name, or enter the name of the file which will receive debugging information.

To count the number of records processed

1. Click Advanced and navigate to Process, Settings.

2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file. The default is 1.

This count is written to the Process Log file. To display the log file, select the Results tab and navigate to the Process Log tab after the program is run.
To specify the settings file encoding

1. Click Advanced and navigate to Process, Settings.

2. In Settings File Encoding, select the encoding from the drop-down list.

See Encoding (Code Page) on page A-3 for more information on encoding.

To specify a mask file

1. Click Advanced and navigate to Process, Settings.

2. In the Mask File text box, enter the path and file name for the mask file.

Run the Relationship Linker and View Results

To run the Relationship Linker and view results

Note: When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.

2. Click the Run button to run the Relationship Linker. You can also right-click on a step and select Run Selected.

3. Select OK.

4. On the Results tab, the Statistics sub-tab appears. Review the statistics for the Relationship Linker on the sub-tab.

5. Click the Results Analyzer button to view the record detail for the linking process.

The Results Analyzer tool allows the user to view the actual data and match results. We will explain this tool in detail in the next chapter.


CHAPTER 11

Tuning the Linking Rules

The output of the Relationship Linking process is displayed in the
Relationship Linker Results Analyzer. This tool allows you to
view and analyze linked results. After viewing these results, you can
determine if there is a need to customize the rules of the link
process to meet your business requirements.
In this chapter, you will perform these tasks:

- Use the Results Analyzer to view and analyze the results of the Relationship Linker process
- Use the Rule Editor to analyze the linking rules and add a field to compare in the link process
- Customize the field and pattern lists by adding fields and patterns to the process
- Re-run the Relationship Linker using the new linking rules and view results
- Use the Data Comparison Calculator to test the comparison routine and appropriate score


Using the Relationship Linker Results Analyzer

The Relationship Linker Results Analyzer displays linked records in a spreadsheet format. Once you have run the Relationship Linker, you can display the match results using this tool. You can browse the matched results and examine the data to see how records were initially matched. You can then decide if it is necessary to change the business rules to meet your requirements.

View the Linking Results

To start the Results Analyzer

1. Open the Relationship Linker step, and click the Results Analyzer button.

Figure 11.1 Launch Results Analyzer

2. The Relationship Linker Results Analyzer opens.




In the Results Analyzer, each column is titled with either a field
from one of the match comparison list files or a key field from the
DDL file. The individual record data is displayed in the horizontal
row. View linked data by clicking the appropriate tab.

Click on
the tab and
view data

Matched
records

Match key

Figure 11.2 Relationship Linker Results View


Linked records are grouped together by color, alternating between
blue and white. If a record is by itself and not above or below
another record of the same color, then that record did not match
any other record.
If all of the records are business records, as in the
example above, no Consumer_Lev1 or Consumer_Lev2
records are displayed.
Records are displayed based on a specific key, depending on the currently-selected tab. For example, if you are on the Business_Lev1 tab, looking at Business_Lev1 matches, then matches are displayed based on the lev1_matched field.


When viewing the relationship linking results in the Results
Analyzer, you can see Matched records as well as Suspect
records:
The field with
the match key
will be in bold,
and highlighted
in red, to show
that this key is
being used to
show matches.

Matched Displays data with exact matches between


records. All records met the requirements for pass patterns
(patterns that begin with P).
Suspect Lists data with the most likely matches between
records. All records met the requirements for suspect
patterns (patterns that begin with S).
To view matched and suspect records

1. Matched records at the Consumer/Business Level_1 and Level_2 are displayed by default.

2. To view Suspect records at the Consumer/Business Level_1, click the Suspect radio button. To execute this view, click the red exclamation point.

Figure 11.3 Switch Matched and Suspect Records

3. When you view suspect matches, the field for the matched level is highlighted in red and italicized, in addition to the field that contains the match key (highlighted in bold and in red). This shows how the matches reflect in a suspect level versus those in a matched level.

Figure 11.4 Suspect View with Matched Level


4.

To return to the Matched record view at the Consumer/


Business Level_1, click the Matched radio button. To
execute this view, click the red exclamation point.

5.

If you want to review the Suspect records for Level_2,


select the Suspect radio button next to Level_2. To

Creating and Working with TS Quality Projects

Edit Fields to Display

11-7

execute this view click on the red exclamation point, and


then select the Business_Lev2 tab.

Figure 11.5 Business Level View

Edit Fields to Display

In the Results Analyzer view, you can select and delete fields to display.

To select and delete fields to display

Note: If you select Show Standard Fields in the Format menu, it displays only the standard DDL fields.

1. Select Tools, Browse More Fields.

2. The left window shows all Available Fields. Any field can be highlighted and dragged into the Selected Fields window. A field can also be highlighted and moved by clicking Add. If you want to move all fields, click Add All.

3. Click Show. Every field that is shown in the window will appear as a column in the main viewer.

4. To delete fields from the display, select those fields in the Selected Fields window.



5. Click Delete, then Show, to update the display.

Figure 11.6 Select Fields to Display

6. To search for a field, enter the field name in the Search text box. Click Show.

Save Fields to Display

You can also save a view of fields in this window. If you frequently look at the same fields in a file, saving a view can save time.

To save a view of fields

1. In the Browse More Fields window, select fields to display.

2. Click the Save button.

3. In the Save window, name the view, and then identify the desired location for the file.

Figure 11.7 Save Fields to Display

4. To view a stored view, select the name of the view from the drop-down menu in Select a Selected Customized View. The fields will be loaded in the Selected Fields window. Click Show to view the stored fields.

Tip: You can use Back and Forward on the Tools menu to display the previous or next view.

View Records in a Range

If your output file is very large, it is a good idea to search smaller subsets of your file. This will make the program run more quickly. You can select a range of records within the file to view.

To view records in a range

1. Enter a starting record number in the Browse Records From text box and an ending number in the To text box.



2. Click Go. Only records in the specified range will be displayed.

Figure 11.8 View Records in a Range

You can also use the Previous Block and Next Block buttons to browse the data. The program browses in blocks, based on the entered range. For example, if you entered a range of Records 6-10, then click Next Block, the program displays Records 11-15. If you click Previous Block, the program displays Records 1-5. There are also Previous Block and Next Block buttons displayed at the top and bottom of the vertical scroll bar on the right.

Note: If you notice breaks in the record number sequence, it is because each record is either a Consumer or Business level record.

To view records by group size or pattern ID

You can view records in a group, either by group size or pattern ID.

1. Enter values in the Minimum Number of Members and/or Pattern Number text boxes.

Figure 11.9 Record Group Size or Pattern ID

2. Only matched groups that correspond to those values will be displayed. For example, if you enter 2 for the Minimum Number of Members and 100 for the Pattern Number, only matches in groups of two or more with the pattern number 100 will be displayed.

Figure 11.10 View Records by Pattern Number

For more detailed information about the Results Analyzer, see the Online Help.


Using the Relationship Linker Rule Editor

Once you have reviewed the data, you may want to add or change a field and a pattern in the match rules to meet your business requirements. For example, any records at the Contact level (Business_Lev2) that have the same Last_name and Account_number fields should be positively matched together. You can use the Relationship Linker Rule Editor to change the match rules to achieve that goal.

The field and pattern list files used in the Relationship Linker process are displayed in the Relationship Linker Rule Editor.

View the Linking Rules

To start the Rule Editor

1. From the Relationship Linker step, click the Rules Editor button on the bottom left.

Figure 11.11 Launch Relationship Linker Rule Editor

2. The Relationship Linker Rules Editor opens.


To view the Linking Rules

1. When you open existing field and pattern files, the Field List Editor (upper pane) and Grade Pattern Editor (lower pane) open automatically.

2. Select Tile Horizontally or Tile Vertically from the Window menu to view both the field and pattern lists. You can view the Consumer or Business, Level 1 or Level 2 lists by clicking the appropriate tab.

Tip: Click a column heading and drag it to the desired location to rearrange the columns.

Figure 11.12 Relationship Linker Rule Editor



The following table lists the columns in the Field List Editor window and the Grade Pattern Editor window.

Field List Editor

Description - Describes all fields in the field settings file. Double-click the cell to edit it.

Score A - E - Specify up to 5 grade thresholds. For example, the first score is the threshold for grade A and the second score is the threshold for grade B. A through D must be positive; E can be positive or negative.

Comparison Routine - The Linker calls this routine to perform the field comparison. Double-click the cell, select the desired routine from the list, and click OK.

Propagation Routine - The Linker calls this routine to perform the comparison propagation for this field. Double-click the cell and select a routine from the drop-down list.

Field Name 1 - 3 - Specify up to three fields for linking. Double-click the cell to open the field name list and double-click the desired field name.

Routine Modifier - Specify a value passed to a comparison routine. Each routine uses a different number of modifiers; some use none. Double-click the cell to open the list and double-click a modifier.

Grade Pattern Editor

Category - Lists the pattern category: P (Pass), F (Fail), or S (Suspect). Click inside the cell and select the pattern from the drop-down menu.

Pattern ID - The pattern ID is a number ranging from 0 to 999. No duplicates are allowed.

Field Name Columns - The remaining column headings take their names from the Description column in the Field List Editor window. The valid grades are A, B, C, D, and E. The hyphen (-) represents a wildcard character. Click inside the cell and select the grade from the drop-down menu.
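The mechanics the two editors describe, scores mapped to grades A through E by per-field thresholds, and the resulting grade strings matched against patterns with '-' as a wildcard, can be sketched as follows. This is illustrative only; the product's actual comparison routines and grading rules are more involved:

```python
def grade(score, thresholds):
    """Map a comparison score to a grade using thresholds for A..E;
    scores below every threshold get no grade ('-')."""
    for letter, threshold in zip("ABCDE", thresholds):
        if score >= threshold:
            return letter
    return "-"

def match_pattern(grades, pattern):
    """A pattern matches when every non-wildcard position agrees."""
    return all(p == "-" or p == g for g, p in zip(grades, pattern))

thresholds = [100, 90, 80, 70, 60]
scores = [100, 100]          # e.g. last_name and account_number both score 100
grades = "".join(grade(s, thresholds) for s in scores)
print(grades, match_pattern(grades, "AA"))  # AA True
```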

Customize the Field and Pattern Lists

The Field List contains the fields which are compared in the link process. The Pattern List contains patterns used to determine the degree of similarity between records.

You can customize the linking process by adding fields and/or patterns to the process. For example, you can add a field and a pattern to the link rules so that any records at the Contact level (Business_Lev2) with the same Last_name and Account_number fields will be positively matched together.
To add a field to the field list

1. In the Field List Editor, click last_name in the Description column.

2. Select Edit, Insert After Selected Row to add a row for a new field.

Note: If you insert a new row in the Field List Editor, a new column is automatically inserted into the Grade Pattern Editor. Conversely, if you delete a row in the Field List Editor, the corresponding column in the Grade Pattern Editor is also deleted.

3. Double-click the Description column and add a description of account_number for this row.

4. Double-click the Score A column and add 100. This means that you want to compare the Account_numbers for two records and they must match at 100%.



5. Double-click in the Comparison Routine column and select partial1. partial1 is the routine used to compare the actual field data in the Account_number fields.

6. Double-click the Field Name 1 column and select Account_number as the field for comparison.

Figure 11.13 Account_number Field Added


To add a pattern to the pattern list

1. In the Grade Pattern Editor, click Pattern ID 128 and select Edit, Insert Before Selected Row. The new pattern row is added.

2. In the Category column, select P for a positive match pattern (Pass). In the Pattern ID column, give the pattern the number 400, as this is very different from the other patterns in the list.

Note: The Pattern IDs 128 and 400 have no special meaning. They are used here as examples only.

3. Select an A for the grade for the last_name field and for the grade for the account_number field. The grade A means Score A (100) for those fields.

Figure 11.14 Pattern ID 400 Added

4. Select File, Save to save the file. When asked Do you want to continue? select Yes. When asked Do you want to delete subsequent duplicate patterns? select No.

5. Close the Relationship Linker Rule Editor.

6. Close the Results Analyzer.

7. See Checking Errors in the Field and Pattern Lists on page 11-18 to verify any errors in the changes you have made.

Checking Errors in the Field and Pattern Lists


If you have made changes to the field and/or pattern file, make
sure to run the Error Report. The program displays a message if it
discovers a problem in the file, such as missing routines or duplicate
pattern IDs. For example, the following grade pattern file has a
duplicated pattern ID:

Figure 11.15 Duplicate Pattern ID (Pattern ID 102 is duplicated)


To check errors in the field and pattern lists
1. After the changes have been made, select Error Report from the Tools menu. If an error is found, you will receive an error message.

Figure 11.16 Error Message for Single Error

2. The message prompts you to continue. If you click Yes, the error checking continues. You may see another error such as the one below.

Figure 11.17 Error Message for Additional Errors


3. This message tells you that some grade patterns are duplicates. Click Yes to remove all duplicates. Click No to leave the duplicates in the file.
4. Once you have deleted these duplicate patterns, a message appears confirming the deletion.
5. Select Save from the File menu to save the file.

For more detailed information about the Relationship Linker Rule Editor, see the Online Help.

Re-Run the Relationship Linker and View Results

Once a change is made to the field or pattern list, the Relationship Linker process must be re-run. At this time, the Relationship Linker will use the new linking rules you defined.
To run the Relationship Linker with the new rules
1. Open the Relationship Linker step and click Run.
   You can also right-click a step and select Run Selected. This is an alternate way to run the step.

2. Click Results Analyzer to view the new results.
3. Click the Business_Lev2 tab to view the results of the new contact matching.
4. In the lower left corner, type 400 in the Pattern Number box and click OK. This shows only records that were matched using Pattern ID 400. Review the records. Notice that this new field and pattern were able to link records that use nicknames in the first name field.

Figure 11.18 New Matching Results


Using the Data Comparison Calculator


The Data Comparison Calculator can help you determine the
correct comparison routine and appropriate score for fields that you
add to the match process. For example, you can test the difference
between the ABSOLUTE and PARTIAL1 comparison routines and
decide which routine you want to use.
The steps for testing the routines are the same for most of the
comparison routines; the exceptions are SUBSTRING, DATE,
ARRAY1, ARRAY2 and MXDNAME. This section shows the general
steps for using these routines.
See the TS Quality Reference Guide, Appendix C for a
detailed description of Relationship Linker routines and
their associated scoring values.
To perform a comparison test
Check the Match Case box if you are performing a case-sensitive comparison.
1. From the Relationship Linker Results Analyzer, select Tools, Invoke Data Comparison Calculator. The Data Comparison Calculator opens.
2. Enter a value for the first field in the Record 1, Field 1 text box, and then enter a value for the second field in the Record 2, Field 1 text box.
3. Highlight a routine in the Comparison Routines list. If the routine uses modifiers, they will appear in the Routine Modifiers box. Select a modifier from the list or highlight (none) (default).
4. Click Compare. The score appears in the Score box.

Example
In this example, two values in the Account_number field are compared using the ABSOLUTE and PARTIAL1 routines.
ABSOLUTE compares two fields and looks for an exact match. Score 100 is an exact match, including blank vs. blank.
PARTIAL1 compares two fields and looks for an exact match, but applies different scores for blanks. Score 100 is an exact match excluding blank vs. blank, 75 is blank field vs. non-blank field, and 65 is blank field vs. blank field.
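The documented score table for these two routines can be re-expressed as a short sketch. This is not the TS Quality code; the 100/75/65 scores come from the description above, while the score of 0 for a plain mismatch is an assumption made for illustration.

```python
# Illustrative sketch of the ABSOLUTE and PARTIAL1 scoring behavior
# described above; only the documented blank-handling scores are modeled.

def absolute(a, b):
    """Exact match scores 100, including blank vs. blank."""
    return 100 if a == b else 0   # assumption: non-matches score 0 here

def partial1(a, b):
    """Exact match scores 100, but blanks are scored differently."""
    a_blank, b_blank = a.strip() == "", b.strip() == ""
    if a_blank and b_blank:
        return 65                 # blank field vs. blank field
    if a_blank or b_blank:
        return 75                 # blank field vs. non-blank field
    return 100 if a == b else 0   # assumption: non-matches score 0 here

print(absolute("", ""))          # 100 -- two blank account numbers "match"
print(partial1("", ""))          # 65  -- blanks no longer force a match
print(partial1("A123", ""))      # 75
print(partial1("A123", "A123"))  # 100
```

The difference matters in practice: under ABSOLUTE, two records with empty Account_number fields score a perfect 100, while PARTIAL1 keeps them from positive-matching on blanks alone.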
To run the ABSOLUTE and PARTIAL1 comparison routines
1. Type a sample Account_number into the Record 1 and Record 2 Field 1 boxes. Select the PARTIAL1 Comparison Routine from the Comparison Routines list.

Figure 11.19 Data Comparison Calculator

2. Click Compare. The score is 100. Change the Comparison Routine to ABSOLUTE and click Compare. The score is again 100.

3. Clear the Record 1 and Record 2 Field 1 boxes. Now click Compare. The score is again 100. Change the Comparison Routine to PARTIAL1. The score of a blank field to a blank field using PARTIAL1 is 65. This is an important distinction. We did not want two records with blank Account_number fields to positive-match together.

For more detailed information on the Data Comparison Calculator, see the Online Help.


CHAPTER 12

Selecting the Best Record

The Create Common Utility lets you select the best record of a matched set of records (called the survivor), and copy data from one record into a field across the matched set of records. This selection process is defined by decision routines. You can commonize data in the current field or in a new field, using data that originates in another field.
In this chapter, you will perform these tasks:
Understand commonization and survivorship
Determine match key level settings
Identify common fields
Assign a survivor record
Run Create Common and view its results
Use the Data Browser to view the actual record data


Using the Create Common Utility


You can use up to ten levels of output data from the Relationship Linker.

The Create Common Utility allows you to set options that copy data across a linked record set. This module has two major functions:
Commonization: Copy data in one field to other fields in records linked by a match key. You can commonize data in an existing field or in a new field. You can also commonize data sourced from another field.
Survivorship: Select a user-defined survivor record among a group of records, using survivor selection rules. This function flags a single record at any level, indicating the best record of the linked set.
The input data file must be sorted by match keys (such as LEV1_MATCHED) prior to being processed by this module. If you run this module right after the Relationship Linker step, the input file is automatically sorted by the match keys. If you run this module separately, be sure to sort the input file by match keys.

Example
Assume that the best record is determined according to the most
recent date in the Last_contact_date field. In this example, you
want to copy the account representative information with the most
recent contact date to the set of linked records, and then identify
one account representative per business.
Commonize the account representative from the record that
has the most recent Last_contact_date field.
Once the data is copied, place an indicator of 1 into the
Survivor_flag field for the record that has the most recent
Last_contact_date.
This indicator will be used later to select the best records
from the file.


Input and Output Settings


The Create Common Utility uses the output from the Relationship
Linker step as input to this step.
Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.

To specify input and output files
1. Open the Create Common step, and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
3. Select the Output Settings tab.
4. Specify the Output File Name and Output DDL Name.
5. Specify a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:

To specify maximum array records
You can specify the number of records held in memory for the Match Key Level 1 setting. (The Match Key Level settings are described later in this chapter.) The default is 10000.
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Maximum Array Records.


To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every Nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimited settings.
Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down list.
3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.
You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.


Process Settings
Once you have specified input and output files, you can specify the
settings to process your data. The settings for processing are
managed in the Advanced Settings window.

Match Key Level Settings

Match Key Level settings specify the field that holds the match key used to group records for evaluation. For example, business records that were matched together usually have the same LEV1_MATCHED number. Only records in the same group will be compared and evaluated.
To specify match key level setting
1. Click Advanced and navigate to Output, Level Settings.
2. In Key Field, select the match key from the drop-down list of DDL fields.

Figure 12.1 Match Key Level Settings

Common Fields

The Common Fields designate the decision routines used to copy data from one field into other fields in the records linked by a common key.
A red flag indicates a REQUIRED field for this operation.
To specify common field
1. Navigate to Output, Common Fields.

2. Specify values for the following settings:

Level ID: Numeric value that specifies the level of commonization, for up to 10 levels of data hierarchy. For example: Level 1=business, Level 2=location, Level 3=contact, and so on.
Test Field: Field that contains information necessary to commonize data across records. Works in conjunction with the decision routines.
Decision Routine Encoding: Type of encoding used by the Decision Routine.
Decision Routine: Defines what data is processed and how.
From Field: Field that contains data which is modified or moved to the Target Field.
Target Field: Field used to store data from the source field, based on the decision routine.

Example
This example uses a decision routine called HIGHCHAR_NBNZ.
The HIGHCHAR_NBNZ routine commonizes the highest value (non-blank, non-zero) that occurs in the Last_contact_date field of all records at a record level of 1.
It copies the value in the Acct_rep field with the most recent Last_contact_date (HIGHCHAR_NBNZ) and puts this value into the Common_rep field.

Figure 12.2 Common Fields Settings

Record 1 contains the highest value in the Last_contact_date field. The data in Acct_rep in Record 1 is commonized into the Common_rep field:


Record     LEV1_MATCHED   Last_contact_date   Acct_rep (Input)   Common_rep (Output)
Record 1   00000013       2005-03-17          JLS                JLS
Record 2   00000013       2003-01-07          BPL                JLS
Record 3   00000013       2004-02-08          JCN                JLS
Record 4   00000014       2005-01-18          KJP                KJP
Record 5   00000014       2003-11-09          MMR                KJP
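The commonization rule in this example can be sketched as follows. This is an illustration of the documented behavior using the sample records above, not the Create Common code; the blank/zero test on the date field is a simplifying approximation.

```python
# Illustrative sketch of HIGHCHAR_NBNZ commonization: within each
# match-key group, find the record with the highest non-blank/non-zero
# value in the test field and copy its Acct_rep into every record's
# Common_rep. The input must already be sorted by the match key.
from itertools import groupby

records = [
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2005-03-17", "Acct_rep": "JLS"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2003-01-07", "Acct_rep": "BPL"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2004-02-08", "Acct_rep": "JCN"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2005-01-18", "Acct_rep": "KJP"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2003-11-09", "Acct_rep": "MMR"},
]

def commonize(recs, key="LEV1_MATCHED", test="Last_contact_date",
              source="Acct_rep", target="Common_rep"):
    for _, grp in groupby(recs, lambda r: r[key]):
        grp = list(grp)
        # keep only non-blank, non-zero test values
        candidates = [r for r in grp if r[test].strip() and r[test].strip("0")]
        best = max(candidates, key=lambda r: r[test])  # highest character value
        for r in grp:
            r[target] = best[source]
    return recs

print([r["Common_rep"] for r in commonize(records)])
# ['JLS', 'JLS', 'JLS', 'KJP', 'KJP']
```

Because the dates use a year-month-day layout, the "highest character value" is also the most recent date, which is why HIGHCHAR_NBNZ works for this field.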

Survivor Record

You can designate a survivor record from a group of records linked by a match key. Any record flagged as the survivor is assigned a flag number. The Assign Survivor function defines the test field, decision routine, and target field for survivor identification.
A red flag indicates a REQUIRED field for this operation.
To assign survivor
1. Navigate to Output, Assign Survivor.
2. Specify values for the following settings:

Level ID: Numeric value that specifies the level of commonization, for up to 10 levels of data hierarchy. For example: Level 1=business, Level 2=location, Level 3=contact, and so on.
Test Field: Field that contains information necessary to commonize data across records. Works in conjunction with the decision routines.
Decision Routine: Specifies which decision routine to use for the survivorship function.
Decision Routine Encoding: Type of encoding used by the Decision Routine.
Target Field: Field used to store data from the source field when the create common rule is satisfied.
Assigned Value: Numeric value that is assigned to the survivor record.

Example
This example uses a decision routine called HIGHCHAR_NBNZ. Assume that the best record is the one with the most recent date (HIGHCHAR_NBNZ) in the Last_contact_date field. This record needs a survivor flag of 1 in the Survivor_flag field to identify it as the best record for the LEV1_MATCHED grouping.

Figure 12.3 Survivor Settings

The HIGHCHAR_NBNZ routine looks for the highest character value (non-blank, non-zero) that occurs in the Last_contact_date field of all records at a record level of 1. In this case, Records 1 and 4 contain the most recent contact date within their groups, so the program takes those records as survivors. As a result, their Survivor_flag fields are flagged with a 1.
Record     LEV1_MATCHED   Last_contact_date   Acct_rep (Input)   Common_rep   Survivor_flag (Output)
Record 1   00000013       2005-03-17          JLS                JLS          1
Record 2   00000013       2003-01-07          BPL                JLS
Record 3   00000013       2004-02-08          JCN                JLS
Record 4   00000014       2005-01-18          KJP                KJP          1
Record 5   00000014       2003-11-09          MMR                KJP
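The survivorship rule described above can be sketched the same way as commonization, except that only the winning record is marked. This is an illustration of the documented behavior, not the Create Common code; the empty-string default for non-survivors is an assumption for this sketch.

```python
# Illustrative sketch of Assign Survivor with a HIGHCHAR-style pick:
# within each match-key group, the record with the highest value in the
# test field receives the Assigned Value ("1") in the target field.
from itertools import groupby

records = [
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2005-03-17"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2003-01-07"},
    {"LEV1_MATCHED": "00000013", "Last_contact_date": "2004-02-08"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2005-01-18"},
    {"LEV1_MATCHED": "00000014", "Last_contact_date": "2003-11-09"},
]

def assign_survivor(recs, key="LEV1_MATCHED", test="Last_contact_date",
                    target="Survivor_flag", assigned_value="1"):
    for _, grp in groupby(recs, lambda r: r[key]):  # input sorted by key
        grp = list(grp)
        best = max(grp, key=lambda r: r[test])      # highest character value
        for r in grp:
            r[target] = assigned_value if r is best else ""
    return recs

print([r["Survivor_flag"] for r in assign_survivor(records)])
# ['1', '', '', '1', '']
```

Exactly one record per LEV1_MATCHED group ends up flagged, matching the table above where Records 1 and 4 are the survivors.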


Additional Settings
You can also specify the following settings:
See Create Common in the TS Quality Reference Guide for complete settings information.

To enable debug function
1. Click Advanced and navigate to Additional....
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file name, or specify a different file to receive debugging information.

To count the number of records processed
1. Click Advanced and navigate to Additional....
2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.
This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify settings file encoding
1. Click Advanced and navigate to Additional....
2. In Settings File Encoding, select the encoding from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.


Run the Create Common and View Results


To run the Create Common and view results
When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.
1. Click OK to close the Advanced Settings.
2. Click Run to run the Create Common Utility.
   You can also right-click a step and select Run Selected.
3. Select OK.
4. On the Results tab, the Statistics sub-tab appears.
5. Navigate to the Output Settings tab and click the Data Browser button next to the Output File Name.
6. In the Field Selection window, select the fields you used for the Create Common process, such as LEV1_MATCHED, Acct_rep, Last_contact_date, Survivor_flag, and Common_rep.
7. Click Display to see the records.
8. Notice that for one Business household, the Active record with the most recent Last_contact_date has a 1 in the Survivor_flag field. All records in a Business household have the Acct_rep from the record with the most recent Last_contact_date copied into the Common_rep field.

Figure 12.4 Create Common Results Displayed


Create Common Decision Routines


Decision Routines are the program rules and instructions used in
the Create Common Utility. They control two functions:
How data is searched for and how commonization will
function within the program
How records will be set up for survivorship
Routines marked For commonization only cant be used
to determine a surviving record.

Table 12-1: Create Common Decision Routines

LOWEST: Lowest numeric value for selected data field
LOWEST_NB: Lowest non-blank numeric value for selected data field
LOWEST_NZ: Lowest non-zero numeric value for selected data field
LOWEST_NBNZ: Lowest non-blank/non-zero numeric value for selected data field
HIGHEST: Highest numeric value for selected data field
HIGHEST_NB: Highest non-blank numeric value for selected data field
HIGHEST_NZ: Highest non-zero numeric value for selected data field
HIGHEST_NBNZ: Highest non-blank/non-zero numeric value for selected data field
LOWCHAR: Lowest character value for selected data field
LOWCHAR_NB: Lowest non-blank character value for selected data field
LOWCHAR_NZ: Lowest non-zero character value for selected data field
LOWCHAR_NBNZ: Lowest non-blank/non-zero character value for selected data field
HIGHCHAR: Highest character value for selected data field
HIGHCHAR_NB: Highest non-blank character value for selected data field
HIGHCHAR_NZ: Highest non-zero character value for selected data field
HIGHCHAR_NBNZ: Highest non-blank/non-zero character value for selected data field
LEAST: Least occurring value for selected field
LEAST_NB: Least occurring non-blank value for selected field
LEAST_NZ: Least occurring non-zero value for selected field
LEAST_NBNZ: Least occurring non-blank/non-zero value for selected field
LITERAL: The specified value of a Selected Data Field; the value is given in parentheses (For Commonization Only). This example searches for the literal value 978-436-8900:
    LITERAL (978-436-8900)
The literal value must be the same length as the test field. If spaces are required in the literal string, the entire LITERAL decision routine must be enclosed in quotes. In the line below, the literal value 978-436-8900 is preceded by four blanks, so the entire routine must be enclosed in quotes:
    "LITERAL (    978-436-8900)"
LONGEST: Compares the length of the test field data on one record against the length of the data in the same field on another record. The system commonizes the longer of the two fields. For example, with Field1 = Smith and Field2 = Smit, the contents of the test field, Smith (the longer of the two), is commonized.
MOST: Most occurring value for selected data field
MOST_NB: Most occurring non-blank value for selected data field
MOST_NZ: Most occurring non-zero value for selected data field
MOST_NBNZ: Most occurring non-blank/non-zero value for selected data field
SHORTEST: Compares the length of the test field data on one record against the length of the data in the same field on another record. The system commonizes the shorter of the two fields. For example, with test fields Smith and Smit, Smit (the shorter of the two) is commonized.
SURVIVOR: Survivor value found in list (For Commonization Only)

Decision Routine Selections for a Single Field

In the examples below, we will consider 10 records, and how the content of those records applies to ten different decision routines.

Record 1: 123
Record 2: 123
Record 3: 456
Record 4: ___
Record 5: ___
Record 6: ___
Record 7: 000
Record 8: 000
Record 9: 000
Record 10: 000

The following table shows sample decision routine results:
Routine        Searches for the...                          To commonize field (Records)
HIGHEST        Highest numeric value                        456 (Record 3)
LOWEST         Lowest numeric value                         ___ (Records 4, 5 and 6)
LOWEST_NB      Lowest, non-blank numeric value              000 (Records 7-10)
LOWEST_NZ      Lowest, non-zero numeric value               ___ (Records 4, 5 and 6)
LOWEST_NBNZ    Lowest, non-blank, non-zero numeric value    123 (Records 1 and 2)
LEAST          Least occurring value                        456 (Record 3)
MOST           Most occurring value                         000 (Records 7-10)
MOST_NZ        Most occurring non-zero value                ___ (Records 4, 5 and 6)
MOST_NBNZ      Most occurring non-blank, non-zero value     123 (Records 1 and 2)
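The selection rules in the table above can be sketched generically. This is an illustrative re-implementation of the documented _NB/_NZ/_NBNZ filtering over the ten sample records (blanks are shown as spaces), not the product code; tie-breaking behavior is not specified here and may differ in the real utility.

```python
# Illustrative sketch of a few decision routines from Table 12-1,
# applied to the ten sample records above ("   " stands for blanks).

values = ["123", "123", "456", "   ", "   ", "   ",
          "000", "000", "000", "000"]

def is_blank(v):
    return v.strip() == ""

def is_zero(v):
    return v.strip("0") == "" and not is_blank(v)

def pick(vals, routine):
    # apply the non-blank / non-zero filters implied by the suffix
    if routine.endswith("_NBNZ"):
        vals = [v for v in vals if not is_blank(v) and not is_zero(v)]
    elif routine.endswith("_NB"):
        vals = [v for v in vals if not is_blank(v)]
    elif routine.endswith("_NZ"):
        vals = [v for v in vals if not is_zero(v)]
    base = routine.split("_")[0]
    if base == "HIGHEST":
        return max(vals)
    if base == "LOWEST":
        return min(vals)
    counts = {v: vals.count(v) for v in vals}
    if base == "MOST":
        return max(counts, key=counts.get)
    if base == "LEAST":
        return min(counts, key=counts.get)

print(pick(values, "HIGHEST"))      # '456'
print(pick(values, "LOWEST"))       # '   '  (blanks sort lowest here)
print(pick(values, "LOWEST_NBNZ"))  # '123'
print(pick(values, "MOST"))         # '000'
print(pick(values, "LEAST"))        # '456'
```

Running the sketch reproduces the table row for row, which makes it a convenient way to predict what a routine will commonize before configuring it.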


CHAPTER 13

Manipulating Your Data

In some cases you may want to manipulate and reconstruct data
elements at certain stages of data processing. Use the Data
Reconstructor to manage various data manipulation tasks. The
Data Reconstructor is particularly useful when global data needs to
be standardized into an identical format at the end of a project.
This chapter explains how to use the Data Reconstructor. You will
perform these tasks:
Specify input, output, and DDL files for the Data
Reconstruction step
Define specific Data Reconstruction rules for each country
Set the Use Rule
Run Data Reconstruction and view results
Use the Data Browser to view the reconstructed data
Generate a single file of all your global data


Using the Data Reconstructor


The Data Reconstructor is a flexible, rule-based data
reconstruction program. It features a rich scripting language with
conditional IF/ELSE capabilities and text manipulation. This
scripting feature enables you to apply rule-based logic at any point
in a job stream or real-time process.
The Data Reconstructor reconstructs addresses from a combination
of data, elements, and postal matcher output fields. Reconstruction
rules can be used to create an input file for a database or to create
delivery address fields with specific size constraints.

Rules File
The Rules file is a plain text file that contains data reconstruction
rules, which are constructed with a special scripting language.
Country-specific rules files are included in the installation package.
These rules use nested IF/ELSE logic that includes selection and
conditional data reconstruction features.
A rules file can contain a single rule or many rules; however, only
one rule can be executed at a time.
Default Rules Files:
C:\TrilliumSoftware\tsq10r5s\<project name>\settings\xxdrrules.sto
where xx is a two-letter country code such as ca, de, gb, or us.



A sample usdrrules.sto file might look like this:

rule label_line
#---------------------------------------#
# Output Alignment Section
#---------------------------------------#
if(out.NEWADDRL4(1:5) = "     ") then
    move out.NEWADDRL5, NEWADDRL4;
    move "     ", NEWADDRL5;
endif;
endrule

The rule keyword begins the rule, label_line is the rule name, and the endrule keyword ends it.

Rule Script Language


The Data Reconstructor provides a rich script language to use when
writing data reconstruction rules. You can combine existing data
elements and literal values to create new data elements, based on
markers you find within the record (such as Parser and Postal
Matcher type fields and flag fields). You can use conditional logic to
accommodate special factors when reconstructing your data. Rules
can be either simple or complex, depending on your business,
country, and language requirements.

Fields

Fields are used in the script language to reference input or output data fields (defined in the DDL files) and literal values. When used to refer to a data field, the field name must exactly match the spelling and case of the name in the corresponding DDL file.

Syntax:
    An optional in., out., IN., or OUT. prefix, followed by the field name, optionally followed by a substring such as [n:n], [n:*], (n:n), or (n:*).

Literal Values:
    A literal value, or one of the keywords BLANKS, ZEROS, or NULLS.

Literal values are string constants that consist of any combination of characters enclosed within either double-quotation marks (") or single-quotation marks (').

    'TS Quality'        or        "TS Quality"
    "Mary said ""you can quote me!"""        or        "This is what Mary's friend said"

A literal string must begin and end with the same type of quotation mark. If you need to include an actual quote character in the string, you can either enter it twice in a row or quote the entire string with the other quote character.
Although there is no practical limitation to the length of a literal value, this version of the Data Reconstructor limits the total combined size of all literals to 100 KBytes.

Data Reconstructor Rules


Reserved Words
The following words are reserved words; they have special meaning
and cannot be used except for their intended purpose:
alphabetic, alphanumeric, and, AND, append, append:0spaces, append:2spaces, append:pack, append_pack, BLANKS, contains, CONTAINS, copy, copy_all, else, endif, endrule, ends_with, ENDS_WITH, EQ, GE, GT, if, in, IN, is, LE, left_justify, left_justify:full, lower_case, LT, move, NE, NULLS, numeric, or, OR, Out, OUT, pack, perform, proper_case, proper_case:a, proper_case:A, proper_case:anyline, proper_case:g, proper_case:G, proper_case:geography, proper_case:n, proper_case:N, proper_case:name, proper_case:s, proper_case:S, proper_case:street, right_justify, right_justify:full, rule, STARTS_WITH, then, title_case, upper_case, ZEROS

Precedence
Precedence controls which operators are executed first in an expression. Operators are grouped into the following levels (from highest to lowest):

Relational operators: GT, LT, GE, LE, <, >, <=, >=
Equality operators: EQ, NE, =, !=, <>, ==
String operators: contains, starts_with, ends_with, CONTAINS, STARTS_WITH, ENDS_WITH, IS
Logical AND operator: AND, and, &, &&
Logical OR operator: OR, or, ||

Example
In the following expression, relational operations are performed first (==, >= and <), followed by the logical AND operation, and finally the logical OR operation:

if(state == "CA" or zip_code >= 10000 AND zip_code < 20000)
    //statement(s);
endif;
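Most general-purpose languages apply the same relative grouping, so the condition can be checked outside the rule script. A minimal Python sketch with hypothetical sample values, for illustration only:

```python
# AND binds tighter than OR, so the condition above groups as:
#   state == "CA"  OR  (zip_code >= 10000 AND zip_code < 20000)
# Python's `and`/`or` have the same relative precedence.

def matches(state, zip_code):
    return state == "CA" or zip_code >= 10000 and zip_code < 20000

print(matches("CA", 99999))  # True  (left side of the OR succeeds)
print(matches("NY", 15000))  # True  (the AND range test succeeds)
print(matches("NY", 25000))  # False (neither side succeeds)
```

If the intent were instead (state == "CA" OR zip_code >= 10000) AND zip_code < 20000, explicit parentheses would be required.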

Associativity
Associativity controls how operators at the same precedence level
are grouped. All operations have left-to-right associativity.

Examples
The following expressions perform the same action:


Example 1

if(prov == "NL" OR prov == "NS" OR prov == "PE" OR prov == "NB")
    move "Atlantic", out.region;

Example 2

if(((prov == "NL" OR prov == "NS") OR prov == "PE") OR prov == "NB")
    move "Atlantic", out.region;

Comments

The Data Reconstructor recognizes three styles of comments:

C Style

Begin with /*, end with */ and include all characters in between.
Comments can span multiple lines.
/*
#... Example of C style comments.
*/
Only C style comments can be embedded in the middle of a line.

C++ Style

Begin with // and extend to the end of the line. If multi-line comments
are required, the comment portion of each line must begin with //.
//
//... This is an example of C++ style comments.
//

Shell Style

Similar to C++ style comments except that # is used instead of //.


Comments begin with # and extend to the end of the line.
#
#... This is an example of shell style comments.
#


Input or Output Dictionary?

By default, the source_field in an action statement and the first field in an IF condition are assumed to be input fields (as defined in the input dictionary). Also, the destination_field in an action statement and the second field in an IF condition are assumed to be output fields (as defined in the output dictionary).
It is possible to override these assumptions in the following ways:
Prefix an input field with in. or IN.
Prefix an output field with out. or OUT.

Example
move out.newline2, OUT.newline1;

You may declare your fields explicitly as input or output by always including the IN. or OUT. prefix (this will also improve your script's readability):

if(in.gout_fail_level != "0") then
    move in.line1, out.line1;
    move in.line2, out.line2;
    move in.line3, out.line3;
    move in.line4, out.line4;
endif;

Selecting a Portion of a Field

The language has a built-in substring capability that allows you to select a portion of an input or output field by specifying a position and length after the field as [n:n].
The first n is the beginning position of the substring. The first character in a field is considered to be in position 1.
The second n is the length of the substring. Length can be specified as * to indicate the remainder of the field.
For example, each of these statements does the same thing:

move "CANADA", OUT.newline4;
move "CANADA", OUT.newline4[1:*];


Substring notation can only be used with DDL fields and cannot be
used with literal values. For example, each of these statements will
generate an error message:
move BLANKS[1:10] , OUT.newline1; // will generate an error
move "CANADA"[2:*], OUT.newline2; // will generate an error
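The position-and-length convention differs from the zero-based slices most languages use, so it is worth pinning down. The helper below is an illustrative sketch of the [n:n] semantics described above (1-based start, length rather than end position, * for the remainder), not part of the rule language itself.

```python
# Illustrative sketch of the [n:n] substring notation: position is
# 1-based, the second number is a length, and "*" means "the remainder
# of the field". Python slices are 0-based, so the helper converts.

def substring(field, start, length="*"):
    if length == "*":                       # [n:*] -- rest of the field
        return field[start - 1:]
    return field[start - 1:start - 1 + length]

city = "Halifax".ljust(30)                  # a 30-byte fixed-width field
print(repr(substring(city, 2, 10)))         # like city[2:10] in the rules
print(substring("CANADA", 1, "*"))          # like [1:*] -- 'CANADA'
```

For example, city[2:10] selects the 10 characters starting at position 2, which is why the text above describes it as a 10-byte substring of the 30-byte city field.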

Binary Data Strings

A binary string constant can be either octal or hexadecimal.

Hexadecimal: the first quote character must be preceded immediately by an upper or lower case x, and each character is represented by its equivalent two-digit hexadecimal value (range 00 - FF). A special case is made for x"CR" (carriage return), which is considered equivalent to x"0D", and x"LF" (line feed), which is considered equivalent to x"0A". For example: X'5368656C646F6E' or x"CRLF".

Octal: the first quote character must be preceded immediately by an upper or lower case o, and each character is represented by its equivalent three-digit octal value (range 000 - 377). For example: O"110141162164154151156147" or o'015012'.

Concatenating Literal Values

Literal values can be joined together using a plus sign (+) as an operator. This can be useful when you need to create a very long literal string or to make your scripts easier to understand.

Example
move "----------------------------------"
   + "-------------------------------------"
   + "--------------------------------", dashed_line_120ch;

move 'Network Pathways Inc., '
   + 'Suite 100-401, '
   + '1600 Bedford Hwy, '
   + 'Bedford, NS, '
   + 'Canada B4A 1E8', return_address;
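The hexadecimal and octal constant formats described under Binary Data Strings can be sanity-checked with a short Python sketch (a hypothetical decoder written for illustration, not part of TS Quality):

```python
def binary_literal(text: str, base: str) -> bytes:
    """Decode a binary string constant: base "x" takes two hex digits
    per byte (with CR/LF shorthand forms), base "o" takes three octal
    digits per byte."""
    if base == "x":
        text = text.replace("CR", "0D").replace("LF", "0A")  # shorthand
        return bytes(int(text[i:i + 2], 16) for i in range(0, len(text), 2))
    return bytes(int(text[i:i + 3], 8) for i in range(0, len(text), 3))

assert binary_literal("5368656C646F6E", "x") == b"Sheldon"
assert binary_literal("CRLF", "x") == b"\r\n"
assert binary_literal("015012", "o") == b"\r\n"
```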

BLANKS, ZEROS and NULLS

The BLANKS, ZEROS and NULLS keywords can be used to set a field entirely to blanks, zeros or binary-zeros. They can also be used to test whether a field contains only blanks, zeros or binary zeros.
Whenever these keywords are used, a literal value is created
dynamically with exactly the right number of blanks, zeros or
NULLS to match the size of the other fields used in the expression.
If, for some reason, all fields in an expression are BLANKS, ZEROS
or NULLS keywords, the length of the resulting literal values will be
one.

Examples

In this example, all fields used within the IF-conditions are one byte
long:
If(BLANKS == BLANKS) then
// always true
endif;
If(BLANKS == ZEROS) then
// always false
endif;
In this example, the length of the BLANKS literal will be ten bytes to
match the 10-byte substring selected from the 30-byte city field
using the city[2:10] notation.
If(city[2:10] == BLANKS) then
// characters 2 through 11 of city are blank
endif;
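The dynamic sizing of these keywords can be made concrete with a small Python sketch (a hypothetical helper, not part of TS Quality):

```python
def blanks_for(other_len: int = 0) -> str:
    """Model of the BLANKS keyword: materialise a blank literal sized
    to the other field in the expression, or length 1 when every
    operand in the expression is a keyword."""
    return " " * other_len if other_len > 0 else " "

city = "B" + " " * 29          # a 30-byte city field, blank after position 1
sub = city[1:11]               # city[2:10] in script notation (1-based)
assert sub == blanks_for(len(sub))   # characters 2 through 11 are blank
assert blanks_for() == " "           # BLANKS == BLANKS compares 1-byte literals
```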

IF Statements

These statements allow you to add conditional logic in your scripts to choose between two or more options. For instance, you can choose to build an output address from Postal Matcher fields or from original input data based on Parser and Postal Matcher flags.

Syntax
IF (condition [AND/OR condition]) [THEN]
    action_statement;
[ELSE action_statement;]
ENDIF;
IF statements consist of three parts:


- the condition(s) to be evaluated
- the action_statement(s) to execute when the condition(s) are TRUE
- the action_statement(s) to execute when the condition(s) are FALSE
When conditions evaluate as TRUE, the action_statement(s) following the conditions are executed; otherwise, the action_statement(s) that follow the else keyword are executed. Conditions must be enclosed in round brackets. The then keyword is optional and can be omitted, or included to improve readability.
When two fields of unequal lengths are compared, the
comparison is made as if the shorter of the two fields
was padded with blanks to match the length of the larger
field.

Example

If the field urban_city_name was 20 bytes long, the following two conditions would be the same:
if(urban_city_name == "BOSTON")
if(urban_city_name == "BOSTON              ")
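The padding rule can be modeled with a short Python sketch (a hypothetical helper, not the product code):

```python
def fields_equal(a: str, b: str) -> bool:
    """Compare two fields the way the rules language does: the shorter
    value is treated as blank-padded to the length of the longer one."""
    width = max(len(a), len(b))
    return a.ljust(width) == b.ljust(width)

urban_city_name = "BOSTON".ljust(20)          # a 20-byte field
assert fields_equal(urban_city_name, "BOSTON")
assert not fields_equal(urban_city_name, "BOSTONIA")
```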

Conditions

The program conditions include four relational conditions, two equality conditions and six string conditions, as shown in Table 13.1:
Table 13.1 Data Reconstructor Rules Conditions

Relational Conditions

field1 GT field2, field1 > field2
    Greater Than. True if field1 is greater than field2.

field1 GE field2, field1 >= field2
    Greater Than Or Equal To. True if field1 is greater than field2 or field1 is equal to field2.

field1 LT field2, field1 < field2
    Less Than. True if field1 is less than field2.

field1 LE field2, field1 <= field2
    Less Than Or Equal To. True if field1 is less than field2 or field1 is equal to field2.


Equality Conditions

field1 EQ field2, field1 == field2, field1 = field2
    Equal To. True if field1 is equal to field2.

field1 NE field2, field1 != field2, field1 <> field2
    Not Equal To. True if field1 is not equal to field2.

String Conditions

field1 is numeric
    String is Numeric. True if field1 contains only numerics. Leading and trailing blanks are trimmed from the field before making the comparison. The IS_DIGIT_FNAME table indicates the numeric characters.

field1 is alphabetic
    String is Alphabetic. True if field1 contains only alphas. Leading and trailing blanks are trimmed from the field before the comparison is made. The IS_ALPHA_FNAME table indicates the alphabetic characters.

field1 is alphanumeric
    String is Alphanumeric. True if field1 contains only alphas or numerics. Leading and trailing blanks are trimmed from the field before the comparison. IS_ALPHA_FNAME and IS_DIGIT_FNAME specify the alphabetic and numeric characters.

field1 CONTAINS field2, field1 contains field2, field1 ~= field2
    String Contains. True if field2 is found anywhere within field1. Leading and trailing blanks are trimmed from both fields before comparisons.

field1 STARTS_WITH field2, field1 starts_with field2, field1 ~< field2
    String Starts With. True if field1 starts with field2. Leading and trailing blanks are trimmed from both fields before comparisons.

field1 ENDS_WITH field2, field1 ends_with field2, field1 ~> field2
    String Ends With. True if field1 ends with field2. Leading and trailing blanks are trimmed from both fields before comparisons.


Example
This is an example using all twelve conditions:
if(zip_code GT "10000"            AND
   zip_code LT "50000"            AND
   pr_rev_group GE "008"          AND
   pr_rev_group LE "010"          AND
   pr_gout_fail_level == "0"      AND
   state != "NY"                  AND
   first_name starts_with "PH"    AND
   last_name ends_with "ING"      AND
   company_name contains "TAXI"   AND
   in.birth_date is numeric       AND
   postal_code[1:1] is alphabetic AND
   company_name is alphanumeric) then
    move "1", flag;
else
    move "0", flag;
endif;

Logical Operators

IF conditions can be combined using logical AND and OR operators to create compound conditions.

Table 13-2: Data Reconstructor Logical Operators

condition1 AND condition2, condition1 and condition2, condition1 && condition2
    Logical AND. TRUE only if both condition1 and condition2 are TRUE.

condition1 OR condition2, condition1 or condition2, condition1 || condition2
    Logical OR. TRUE if either condition1 or condition2 is TRUE.

The order of evaluation of compound conditions is described in the section Precedence and Associativity. (See Rules File on page 13-3.) The usual order can be altered using brackets to group the conditions to be evaluated first. See the following example.

Example
if((pr_rev_group == "000" OR pr_rev_group == "009") AND
pr_gout_fail_level == "0") then
    /* Construct a new address from postal matcher output fields */

Nested IF Statements

You can create nested IF statements in which one IF statement is embedded within another.
IF [condition1]
    IF [condition2]
        [Action statement1]
    ELSE
        [Action statement2]
    ENDIF
ELSE
    [Action statement3]
ENDIF

Example
rule LABEL1
if(gb_out_match_level = "0") then
    if(gb_out_dpndthorough_name <> BLANKS) then
        move   gb_out_house_number,      nwaddrl3;
        append gb_out_dpndthorough_name, nwaddrl3;
        append gb_out_dpndthorough_desc, nwaddrl3;
        append gb_out_thorough_name,     nwaddrl4;
        append gb_out_thorough_desc,     nwaddrl4;
    else
        move   gb_out_house_number,      nwaddrl3;
        append gb_out_thorough_name,     nwaddrl3;
        append gb_out_thorough_desc,     nwaddrl3;
    endif;
endif;
endrule
The sample shows one rule definition called LABEL1, which will either populate output fields nwaddrl3 or nwaddrl4, depending on whether the field gb_out_dpndthorough_name is blank or not, as long as the record had a match level of 0. Both nwaddrl3 and nwaddrl4 fields are populated if there was data in the dependent thoroughfare name field.

Action Statements

Syntax
verb[:modifier] [source_field] [,] destination_field;
- or -
perform rule_name;
Some action statements may include a modifier that changes their operation slightly. The modifier must immediately follow the verb and be delimited from it with a single colon.
For example, the append:2spaces statement works like the append statement, with the exception that two spaces are used for a delimiter instead of one. The comma separating the source-field from the destination-field is optional.
Specific action statements take either no, one, or two arguments, as described in the following sections.

Action Statements Requiring No Arguments

There is one action statement that requires no arguments:

copy_all
    Copies all corresponding input fields to output fields. Fields are considered to correspond if they have the same name in both input and output DDL files. Any output fields that do not correspond to input fields are reset to blanks.
    When DDL names are the same, copy_all moves the entire input record to the output record instead of performing field-by-field moves, as an optimization.

The copy_all statement resets to blanks any output field that has no corresponding input field. For this reason, it should always be used at the beginning of your script.
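Treating records as name-to-value maps, copy_all behaves like this rough Python sketch (field names here are invented for the example; a real record is a fixed-layout buffer, not a dict):

```python
def copy_all(in_record: dict, out_field_names: list) -> dict:
    """Sketch of copy_all: every output field takes the value of the
    input field with the same name; unmatched output fields are reset
    to blanks (a single blank here, for brevity)."""
    return {name: in_record.get(name, " ") for name in out_field_names}

record = {"line1": "ACME INC", "line2": "BOSTON"}     # hypothetical input
out = copy_all(record, ["line1", "line2", "line3"])   # line3 has no match
assert out == {"line1": "ACME INC", "line2": "BOSTON", "line3": " "}
```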

Action Statements Requiring One Argument

The script language has six action statements that require a single argument. In each of these statements, the lone argument is used to specify a destination field and cannot be a literal value.

pack
    Removes all blank characters from the destination field.

upper_case
    Converts all of the characters in the destination field to upper case.

lower_case
    Converts all of the characters in the destination field to lower case.

title_case
    Converts all characters in the destination field to a mix of upper and lower case. The first alphabetic character (and any alphabetic that follows a non-alphabetic one) are converted to upper case; the remaining characters are converted to lower case.
    A special exception is made for apostrophe-s, which is converted to lower case. For example, "MARY-JANE'S BAKERY" would be changed to "Mary-Jane's Bakery".

right_justify
    Right-justifies the contents of the destination field. Removes any trailing blanks from the contents of the field.

left_justify
    Left-justifies the contents of the destination field. Removes any leading blanks.

right_justify:full
    Right-justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, given a 20-character field containing the value "EXPIRY  20001127    ", right_justify:full produces "     EXPIRY 20001127".

left_justify:full
    Left-justifies the contents of the destination field and converts each occurrence of multiple blanks to a single blank. For example, for a 20-character field containing the value "   THE  PIT STOP    ", left_justify:full produces "THE PIT STOP        ".

proper_case
    Converts all characters in the destination field to a mix of upper and lower case using an external UPLOW table. When no corresponding entries are found in the UPLOW table, the destination field is still converted to mixed upper/lower case using title_case logic.

proper_case:a, proper_case:A, proper_case:anyline
    Indicates that the proper_case statement is not for any specific line type. Only the ("A") line-type entries in the UPLOW table will be searched. This is the default operation when no modifier is specified.

proper_case:n, proper_case:N, proper_case:name
    proper_case for a field containing name information. Searches the ("N") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "N" entries.

proper_case:s, proper_case:S, proper_case:street
    proper_case for a field containing street information. Searches the ("S") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "S" entries.

proper_case:g, proper_case:G, proper_case:geography
    proper_case for a field containing geography information. Searches the ("G") line-type entries in the UPLOW table, followed by the ("A") line-type entries if a match was not found in the "G" entries.

perform
    Causes all statements in another rule to be executed. Example:
    perform fix_name_line
    This will execute all statements in the previously defined fix_name_line rule in the same rule file. You can only perform a rule that has already been defined within the rule file.
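Several of the case and justification statements above have simple reference semantics. This Python sketch (an approximation written for illustration, not the product code) models three of them:

```python
def title_case(value: str) -> str:
    """Sketch of title_case: uppercase each alphabetic character that
    starts the field or follows a non-alphabetic one; lowercase the
    rest, then force apostrophe-s to lower case."""
    out, prev_alpha = [], False
    for ch in value:
        if ch.isalpha():
            out.append(ch.lower() if prev_alpha else ch.upper())
            prev_alpha = True
        else:
            out.append(ch)
            prev_alpha = False
    return "".join(out).replace("'S", "'s")

def left_justify_full(field: str) -> str:
    """Sketch of left_justify:full: collapse runs of blanks to one
    blank and left-justify within the field's fixed width."""
    return " ".join(field.split()).ljust(len(field))

def right_justify_full(field: str) -> str:
    """Sketch of right_justify:full: same collapsing, right-justified."""
    return " ".join(field.split()).rjust(len(field))

assert title_case("MARY-JANE'S BAKERY") == "Mary-Jane's Bakery"
assert left_justify_full("   THE  PIT STOP    ") == "THE PIT STOP        "
assert right_justify_full("EXPIRY  20001127    ") == "     EXPIRY 20001127"
```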


Action Statements Requiring Two Arguments

The first argument specifies a source-field or a literal value, and the second argument specifies the destination-field.

copy
    Copies the contents of one field to another, adjusting the data type (if necessary) to match the description of the output field in the DDL. If the first argument is a literal, a move operation is performed instead of a copy.

move
    Moves one text field to another. Unlike copy, no conversion from one data type to another is attempted.
    If source-field is longer than destination-field, it is truncated during the move. If source-field is shorter than destination-field, the destination-field is padded with blanks after the move.

append
    Appends the contents of one field to the end of the contents of another field, after first adding a single blank character as a separator.
    If the destination-field is currently empty (all blanks), then a move operation is performed instead of an append. This makes it possible to perform a series of append operations on the same destination-field without creating unwanted blanks at the beginning of the field.
    If there is not enough room at the end of the destination-field, the source-field will be truncated to fit. There must be at least 2 blanks at the end of the destination-field before an append operation will be attempted.

append_pack, append:pack, append:0spaces
    Works like the append statement, but without the blank separator. Appends the contents of one field directly to the end of the contents of another field.
    There must be at least 1 blank at the end of the destination-field before this operation will be attempted.


append:2spaces
    Appends the contents of one field to the end of the contents of another field after first adding two blank characters as a separator. May be required in some countries (e.g. Canada) to separate the postal-code from the remainder of the line.
    If there is not enough room at the end of the destination-field, the source-field will be truncated to fit. There must be at least 3 blanks at the end of the destination-field before an append:2spaces operation will be attempted.
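The append behavior can be approximated in Python as follows (a simplified sketch; fixed-length fields are modeled as blank-padded strings, and the room check is an approximation of the trailing-blanks rule):

```python
def append(source: str, dest: str, sep: str = " ") -> str:
    """Sketch of append: separator-joined, with a plain move when the
    destination is empty and truncation when room runs out."""
    width = len(dest)
    if dest.strip() == "":                 # empty destination -> move
        return source.strip()[:width].ljust(width)
    base = dest.rstrip()
    room = width - len(base) - len(sep)
    if room < 1:                           # needs blanks for separator + data
        return dest
    return (base + sep + source.strip()[:room]).ljust(width)

line = append("MAIN", " " * 20)            # behaves like a move
line = append("ST", line)                  # blank separator added
assert line == "MAIN ST" + " " * 13
```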

Overlapping Fields

Move, append and append_pack/append:pack operations with source and destination-fields that overlap in memory are fully supported. These operations are completed as if a temporary copy of the source-field had been made before the operation started.

Example
move "TRILLIUM", out.temp;
move out.temp[2:4], out.temp[1:4]; // following this move the out.temp
                                   // field will contain "RILLIUM"
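The temporary-copy semantics can be sketched in Python (a hypothetical helper and field, written only to illustrate the overlap rule):

```python
def move_overlapping(field: str, src: tuple, dst: tuple) -> str:
    """Sketch of an overlapping substring move: positions are 1-based
    (pos, length) pairs, and a temporary copy of the source is taken
    before writing, as the manual describes."""
    (s_pos, s_len), (d_pos, d_len) = src, dst
    temp = field[s_pos - 1:s_pos - 1 + s_len]        # temporary copy first
    chars = list(field)
    chars[d_pos - 1:d_pos - 1 + d_len] = list(temp[:d_len].ljust(d_len))
    return "".join(chars)

# hypothetical field: shift a 3-character window one position left
assert move_overlapping("ABCDEF", (2, 3), (1, 3)) == "BCDDEF"
```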

String Variables

String variables must be declared with a STRING keyword before they are used. String variables can be declared in two places:
- At the beginning of the rules file, before any rules are defined
- At the beginning of a specific rule, before the first action statement
String variables have specific properties:
- Names begin with a dollar sign: $NAME
- They are case-sensitive ($NAME vs. $name vs. $Name)
- They have a default length of 256 characters, unless a different length is specified at the time they're first declared

String variables may be used any place in a rule that a DDL field name can be used.

Example
STRING $LAST_NAME[30];      // 30ch long
STRING $last_name;          // 256ch long
STRING $BigBuffer[10000];   // 10,000ch long

rule sample1
    STRING $name[50];
    move in.first_name, $name;
    append in.last_name, $name;
    move $name, out.full_name;
endrule;

Input and Output Settings

In this example, the Data Reconstructor uses the output from the Create Common Utility step as input. By using the same output DDL (global DDL) for all country-specific data, you can standardize the global data into the same format.
The global DDL contains these fields:
- Original name and address fields
- Country-specific Postal Matcher match level codes
- Reconstructed name and address fields based on new standardized, enriched and linked data fields (NEWADDRESSL1 - 10)
- Original user-defined fields
To specify input and output files
1. Open the Data Reconstructor step and select the Input Settings tab.
2. Specify a file name in the Input File Name and Input DDL Name text boxes.
Tip: You can either edit the file names manually or click the File Chooser icon to browse for and select the file. To view the contents of your data file, click the Data Browser icon. Use the Dictionary Editor to view the contents of the DDL file.
3. Select the Output Settings tab.
4. Specify the Output File Name and Output DDL Name. Enter a file name in the Statistics File Name and Process Log Name text boxes.

To specify the input/output file qualifiers
A File Qualifier is a unique name given to a data file. Each input and output data file must have its own unique file qualifier.
1. Click Advanced and navigate to Input, Settings.
2. Specify Input Data File Qualifier (default is INPUT).
3. Click Advanced and navigate to Output, Settings.
4. Specify Output Data File Qualifier (default is OUTPUT).

You can also specify the following settings:

To specify the maximum number of records to process
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process a Maximum of. This specifies the maximum number of records to process. By default, all records will be processed.

To process every Nth record only
1. Click Advanced and navigate to Input, Settings.
2. Enter a numeric value in Process Nth Sample. This specifies that only every Nth record will be processed. By default, all records will be processed.

To use a delimited file
If you are using a delimited file for input and/or output, you must specify delimited settings. Valid delimiters are Tab, Space, Semicolon, Comma, and Pipe. Characters other than those listed must be enclosed by quotation marks.
1. Click Advanced and navigate to Input, Settings.
2. Select Input Data File Delimiter Encoding and Input Data File Delimiter from the drop-down lists.

3. For output, click Advanced and navigate to Output, Settings.
4. Select Output Data File Delimiter Encoding and Output Data File Delimiter from the drop-down lists.
See Encoding (Code Page) on page A-3 for more information on encoding.
You can specify records to either Select or Bypass under certain conditions in both input and output files. See Select or Bypass Records on page 5-37 for instructions.

Settings for the Data Reconstructor

Once you have specified input and output files, you can specify the settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Setting the Rules File

To specify the rules file
1. Click Advanced and navigate to Process, Settings.
2. Specify the Rules File. Example:
   C:\TrilliumSoftware\tsq10r5s\tmt\settings\usdrrules.sto

Setting the Use Rule

Each rules file can contain a number of rules available for use. Each rule begins with the rule keyword and ends with the endrule keyword. You must specify which rule you are using.

Example
A sample usdrrules.sto file might look like this:
rule label_line
#---------------------------------------#
# Output Alignment Section
#---------------------------------------#
if(out.NEWADDRL4[1:5] = "     ") then
    move out.NEWADDRL5, NEWADDRL4;
    move " ", NEWADDRL5;
endif;
endrule

To specify the Rule
1. Click Advanced and navigate to Process, Settings.
2. Specify the rule name in the Use Rule field. For the example above, enter label_line. If you are using multiple rules in the Rules File, place a comma after each rule name.

Figure 13.1 Rule Settings


Additional Settings
You can also specify the following settings. See Data Reconstructor in the TS Quality Reference Guide for complete settings information.

To use an alphabetic characters table
You can include a table which identifies characters that are alphabetic characters. This setting may be required for the special characters found in many languages. When this table is not specified, your operating system's default settings will be used.
This table is used by the is alphabetic, is alphanumeric, proper_case and title_case rules.
1. Click Advanced and navigate to Process, Settings.
2. In Alpha Defines Table, enter the path and file name of your alphabetic character table.

To use a numeric digits table
You can include a table which identifies characters that are numbers. This setting may be required for the special characters found in many languages. When it is not specified, your operating system's default settings will be used.
This table is used by the is numeric and is alphanumeric rules.
1. Click Advanced and navigate to Process, Settings.
2. In Numeric Defines Table, enter the path and file name of your numeric digit table.

To use a lowercase/uppercase translation table
You can include a table used to translate characters to all lower or upper case. This setting may be required for the special characters found in many languages, or to convert from one code page to another. When this table is not specified, your operating system's default settings will be used.
This table is used by the proper_case, title_case, and lower_case/upper_case rules.
1. Click Advanced and navigate to Process, Settings.
2. In Lowercase Translation File or Uppercase Translation File, enter the path and file name of your table.

To enable the debug function
1. Click Advanced and navigate to Process, Settings.
2. Select Enable Debug Output.
3. In the Debug File text box, accept the default path and file name, or specify the file to which debugging information will be written.

To count the number of records processed
1. Click Advanced and navigate to Process, Settings.
2. In the Sample Count text box, specify the number that indicates the increment sample of records to read and attempt to process from an input data file.
This count will be written to the Process Log file. To display the Log file, select the Results tab and navigate to the Process Log tab after the program is run. The default is always 1.

To specify settings file encoding
1. Click Advanced and navigate to Process, Settings.
2. In Settings File Encoding, select the encoding from the drop-down list.
See Encoding (Code Page) on page A-3 for more information on encoding.

Run Data Reconstruction and View Results


To run Data Reconstructor and view results
When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.

1. Click OK to close the Advanced Settings.
2. Click Run to run the Data Reconstruction step. You can also right-click on a step and select Run Selected.
3. Select OK at the Message box.
4. On the Results tab, the Statistics sub-tab appears.
5. Navigate to the Output Settings tab and click the Data Browser button next to the Output File Name.
6. In the Field Selection window, select the fields you used for the Data Reconstruction process, such as NEWADDRL1 - NEWADDRL5.
7. Click Display to review the data and ensure that it has been reconstructed properly.


Bringing the Data Together

The final step in the project uses the Transformer to merge multiple files into a single output file of global data. On input, select only the survivor records for inclusion in the single output data file. On output, you will have a file of survivor records with the accurate account representative assignment and the format required for each country.

Add a Global Transformer step

To merge data from multiple countries together, you must add a Global Transformer step to the project.
To add a Transformer step to the project
If the Data Flow Architect area is locked, unlock it by right-clicking in the Data Flow Architect and selecting Lock.
1. From the Main menu, click Edit, Add new project step from palette. The palette displays all the available steps. You can also select the List View tab and click Add New Step from Palette, or right-click anywhere in the Data Flow Architect area and select Add New Step from Palette.

2. Under the Standardization category, select Transformer. Drag and drop this step onto the Data Flow Architect area.
3. In the Choose Country Name box, select Global.
4. In the Provide a Unique Step Name box, enter a step name: for example, Transformer at End.
5. Drag and drop this Transformer at End step to the end of the country flows and attach the Data Reconstruction steps.
6. To connect the steps, click the right connection area on the Data Reconstruction step, then click anywhere on the Transformer at End step. Alternatively, you can right-click the steps and select Start Connection or End Connection from the pop-up menu.

Figure 13.2 Connecting Steps in Data Flow Architect

See Using the Data Flow Architect on page 2-20 for detailed instructions on how to connect steps.

Input and Output Settings

To specify input and output files
1. Open the Transformer at End step and click Input Settings.
2. Select the Input Data and Input DDL files for the first country (for example, Canada). Click Add.

Figure 13.3 Input File Settings
3. Add the other countries' input data files and DDLs.
Figure 13.4 Input File Settings for All Countries
4. Select the Output Settings tab.
5. Select the Output File Name and Output DDL Name.
Figure 13.5 Output File Settings
6. Enter a file name in the Statistics File Name and Process Log Name text boxes.


Select and Bypass Records

The final step should use only survivor records as input; that is, records with a 1 in the Survivor_flag field. This will create a final output file with one contact per business. This record will also have the commonized account representative data. To achieve this result, use the Select and Bypass Condition function on the input files.
To build a Select/Bypass Condition
See Select or Bypass Records on page 5-37 for more information.

1. Click Advanced and navigate to Input, Settings, Select Record Conditions.
2. Click the first condition row and select Edit Condition. The Logic Builder window appears.
3. Double-click Survivor_flag in the DDL Fields list.
4. Double-click EQUAL TO in the Operators list. Enter 1. This operation will select only records that have a 1 in the Survivor_flag field.
5. Click Apply and Close. Remain on the Advanced Settings window.
Figure 13.6 Select Bypass Logic Builder
6. Select the next input file name under Input Files and repeat the steps.
7. Repeat for all remaining input files.

Process Settings
Once you have specified input and output files, you can specify settings used to process your data. The settings for processing are managed in the Advanced Settings window.

Source Identification
Since you are merging multiple records into one file, an input source identifier should be applied. This indicator designates the origin of the record.
To specify source identification
The File Source field is set to four characters in the standard DDL.

1. Click Additional Settings.
2. Enter File Source to indicate the input origin of the record. For example, use CA for the records from the Canadian file. Remain on the Advanced Settings window.
Figure 13.7 Source Identification
3. Select the next input file name under Input Files and repeat the steps.
4. Repeat for all remaining input files.

Run Transformer and View Results

To run Transformer and view results
When you click Run, TS Quality automatically saves your settings. To save your settings without running the step, click Save.
1. Click OK to close the Advanced Settings.
2. Click Run to run the Transformer at End step. You can also right-click on a step and select Run Selected.
3. Select OK.
4. Summary results appear on the Results tab of this step.
Figure 13.8 Summary Results



5. Notice that only selected records from each input file are included.
6. Navigate to Output Settings and click the Data Browser button next to the Output File Name.
7. In the Field Selection window, click Add All and Display.
8. Review the records and fields selected by this process. The final records would look like this:
Figure 13.9 Final Record Output


CHAPTER 14

Packaging Projects
Most users create a script from the steps in the project. This chapter describes how to create and run a script, and provides a summary of Real-Time Processing. Real-Time processing can ensure that new data entering the database is transformed, cleansed, enriched and linked at the point of entry. The TS Quality Analyzer tool can demonstrate a Real-Time processing environment.
This chapter focuses on several tasks:
- Use the List View to order and select steps for inclusion in a batch script
- Generate a script to run all selected steps
- Save, view and run the script
- Export and import projects
- Understand the Director architecture and the role of the cleansing/matching servers
- Move from a batch environment to a real-time environment
- Understand the role of the business rules
- Use the TS Quality Analyzer to sample real-time cleansing and matching


Batch Script
You can combine project steps into a batch script that can be run on UNIX or Windows platforms. The Windows interface lets you set up a project that exists on UNIX (by using mapped network drives), but that can be run using the local Windows client.

Create a Script
Before generating a script to run your project, you must select the steps to be included.
To select steps
1. In the List View in the Control Center's main window, use CTRL+click and select the desired steps.
If you want to select a block of contiguous steps, click on the first step desired and then SHIFT+click on the last step desired.
If you want to select all the steps in the List View, click Select All Steps on the tool bar.

To generate a batch script

All batch files have the extension .bat.

1.

Click Generate Script to Run on the tool bar. The Batch Process window will appear.

Figure 14.1 Batch Process Window


2.

On the Save As tab, specify the name and location of the script.

3.

Click Save to save your steps for a batch process.

Edit a Script

To edit a batch script

1.

Select the Modify tab. The Modify tab allows you to view and edit the batch file. The steps in the batch file are listed on the left and the file contents appear on the right. If you click a step on the left, the right pane automatically pages down to that step's section in the file. Any remarks are preceded with a *rem*.

Figure 14.2 Batch Script - Modify Tab


2.

To edit the script, click Edit. This opens the script in the WordPad editor.

3.

Make your changes and save the file.

4.

Click Refresh to update the file.

Run a Script

To run a batch script

1.

Select the Run tab. From this tab, choose one of the following methods to run the batch file:
Run as an attached process - If you run the batch file as an attached process, the Control Center must remain open. The Notify when job is done running check box is active only when this option is selected.
Run as a detached process - If you run the batch file as a detached process, the process runs independently of the Control Center.

Figure 14.3 Batch Script - Run Tab

2.

Select the appropriate radio button, and select Notify when job is done running if necessary.



3.

Click Run to start the process. If you run the batch as an attached process, you will see the following Message box when the process is complete.

Figure 14.4 Batch Script - Complete Message

4.

Click OK. Review the Console Results. If there are no messages in the Console Results, the batch job ran successfully.
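The attached/detached distinction corresponds to foreground versus background execution at the operating-system level. A minimal POSIX shell sketch of that difference; the commands and file paths here are illustrative, not TS Quality components:

```shell
# Attached: the caller waits for the job to finish before continuing,
# the way the Control Center must stay open for an attached run.
sh -c 'echo attached > /tmp/attached.txt'
echo "attached job finished"

# Detached: the job runs in the background and the caller continues
# immediately, like a batch file that runs independently of the caller.
sh -c 'echo detached > /tmp/detached.txt' &
job=$!
echo "detached job started"
wait "$job"   # the demo waits only so both files exist before inspection
```

In a truly detached run you would omit the final wait; it is included here only so the demonstration is deterministic.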

Create Multiple Batch Files


After running the batch process, the Batch Process window remains
open. If you want to run another batch process, you must close
the window from the first batch process that was run. A new
batch file requires a fresh batch window.


Exporting/Importing Projects
The Export and Import functions take a Control Center project and relocate it to a new release directory, or to another machine, without loss of project steps. This feature lets you move project contents to different platforms and drive locations.

Whether an upgrade is taking place or a project is being moved from one licensed server to another, the Import/Export procedures make it easy to move your data quality process from one location to another. This feature also allows projects from previous versions of TS Quality to be migrated successfully into the current environment.

Projects created with TS Quality are fully exportable and can be imported without loss of user-defined steps. These projects can be exported into a TS Quality directory.

To export and import a project to another physical machine, the project directory and its accompanying subdirectories must be physically copied to a media device and transferred to the other machine.
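Copying a project directory together with its subdirectories is commonly done with a tar archive. A minimal sketch with a made-up directory layout; a real project path and contents will differ:

```shell
# Build a throwaway project layout to archive (illustrative only).
mkdir -p /tmp/demo_project/ddl /tmp/demo_project/settings
echo "sample" > /tmp/demo_project/ddl/input.ddl

# Archive the whole tree, preserving subdirectories, then list the contents.
tar -czf /tmp/demo_project.tar.gz -C /tmp demo_project
tar -tzf /tmp/demo_project.tar.gz
```

The resulting .tar.gz can then be moved to the target machine and unpacked with tar -xzf, keeping the subdirectory structure intact.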


Export Projects
The first step of the export/import procedure is to export the
project.
To export a project

1.

In the Control Center main window, double-click the project that you want to export.

2.

Select File, Export Project...

Figure 14.5 Export Project

3.

In the Export Project window, accept the default exported project file name, or provide the correct file name. The default file name adds a .zip extension to the current file name.

4.

Click Start Export. The export process starts.

5.

Select OK.


Import Projects
The second step of the export/import procedure is to import the
project into the current environment.
To import a project

1.

In the Control Center main window, select File, Import Project...

Figure 14.6 Import Project

2.

Enter file names for the Project to Import (the .zip file), New Project Directory, and New Project Name. Use the navigation buttons to browse for the file and directory.

3.

Under Original Path, you should see the old path for the project, postal table, census table, and software. If you want to change this information, specify the new location under the New Path area and click Substitute.

4.

Click Start Import. The import process starts.

5.

Select OK. The old project will be re-created in the new location.

Note that the imported project is set up exactly as it appeared before it was exported:

All steps are included, and every step module is in the same place in the Data Flow Architect window as it was before the project was exported.

Settings files for each step retain their information. The file contents themselves are unchanged, except when the platform the project is imported on differs from the platform on which it was exported. In that case, the slashes in paths change to the direction that is correct for the new platform. If relative paths were used within the settings files, this feature saves some effort in changing path locations within the files.

Import Projects from Windows to UNIX

If the original platform was Windows and the new platform is UNIX, the slash direction will be reversed. Paths that include drive letters will need to be corrected manually to specify the UNIX location.

Using relative paths (for example, ..\ddl\input.ddl) will save work if you know in advance that your project will be moved from Windows to UNIX. It is the user's responsibility to modify all path definitions in every file and each run command correctly once the import is complete.

We recommend testing the export process by moving or removing some data files from the original project before exporting the project or moving it onto a portable media device.
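If settings files still contain Windows drive-letter paths after the move, they can be corrected with sed. A hedged sketch where the file name, the path, and the UNIX target location are all illustrative:

```shell
# A settings line with a Windows-style path (contents are made up).
printf 'INPUT_FILE C:\\proj\\data\\input.dat\n' > /tmp/step.stx

# Flip backslashes to forward slashes, then swap the drive-letter
# prefix for a UNIX location.
sed -e 's|\\|/|g' -e 's|C:/proj|/home/user/proj|' /tmp/step.stx
```

Running sed with -i (or redirecting to a new file) would apply the change in place; the substitution rules must be adapted to your actual drive letters and directories.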


Real-Time Processing
Now that the batch process is in place, you can leverage the business rules you developed in batch and use them in a real-time environment. By implementing a real-time solution, your data can be transformed, cleansed, enriched, and linked at the point of entry. This section provides an overview of TS Quality real-time processing with the Director. The Director for TS Quality is an optional application. For information on the Director for TS Quality, please contact your Trillium Software sales representative.

The Director
The Director acts as a registry for cleansing and matching servers
that are made available to the calling environment.

Figure 14.7 Director Architecture Overview



Cleansing Server
TS Quality uses a single API approach to simplify the task of moving
data through the various country specific modules. The simple API
eliminates the need for the programmer to know the internal
workings of each TS Quality function. This interface uses a single
configuration file to enable simple construction of complex
transactional data quality processing. The business rules developed
in the batch application are used in this configuration file.

Matching Server
The Matching Server supports reference matching, allowing you to
compare an incoming record to the database of existing records.
Match results are returned to the calling application.

Figure 14.8 Director Architecture - Startup Process


Real-Time Transaction Processing


Once the Director and Cleansing/Matching servers are started, a
transaction record can be processed by the calling application.
Initially, the client makes a request for the cleansing services. The
Director provides a handle to the cleansing server so that the calling
application can communicate directly with the cleansing server. The
application sends the data to the cleansing server and the cleansed
data is returned to the calling application. The handle is then
released back to the Director. The process is repeated for matching.

Figure 14.9 Director Architecture - Record Processing


Through the reference match function in the Relationship Linker,
duplicate records can be identified so that they do not enter the
database.


Moving From Batch to Real-Time


Business rules designed for the batch process can also be used as resources for real-time processing. Batch process DDLs and settings files can be reused in real time. An XML configuration file, specific to the Director Architecture, is used to control the real-time process modules. Arguments in this file make individual modules available to the calling application.

Linking a Single Record Using the TS Quality Analyzer
You can use the TS Quality Analyzer to watch real-time processing.
The TS Quality Analyzer application is written in C# and acts as the
transaction broker for the TS Quality real-time interface.

Process Method
When a record is entered into the data entry window, it is collected and sent to the real-time cleansing engine when the user clicks the Cleanse button. The cleansed results are displayed on the screen.

Next, the cleansed transaction record can be compared to records in the master database with the Relationship Linker's reference match function. Candidate records are retrieved from the database for the given window key. The transaction record and the retrieved records are then sent to the reference match function for comparison.

The calling application must retrieve the records from the database using the window key from the transaction record. The retrieval of records is not a function of TS Quality.

If a match is found, the matched record is displayed. If no match is found, a message is displayed on the screen.

See Matching on page 8-8 for instructions on how to run the Matching function using the TS Quality Analyzer.

CHAPTER 15

Working from the Command Line

This chapter describes commands that run the TS Quality modules
on UNIX and 32-bit PC platforms. Use these commands for two
purposes:
run modules
create log files


Executing TS Quality Modules


All TS Quality modules can be run from the command line.

Before you run a module, make sure that your environment variables are properly set. See Getting Started with TS Quality for instructions.

Syntax
The syntax for command line execution is:

<program_name> <settings_file_name> <log_file_name>

where:

program_name          The module's executable. See Table 15.1 for a complete list of program names.

settings_file_name    The path and name of the module's settings file.

log_file_name         The path and name of the module's log file. A log file displays any processing errors in the program. This argument is optional.

Example
This is a sample command to execute the Transformer:

tranfrmr ..\settings\ustranfrmr.stx ..\data\error.log
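Because the <program> <settings_file> <log_file> pattern is uniform across modules, a small wrapper can run any module and check its exit status. A minimal POSIX shell sketch; here `true` stands in for a real module binary such as tranfrmr, and the paths are illustrative:

```shell
# Run a TS Quality-style module and report a non-zero exit status.
run_module() {
  prog="$1"; settings="$2"; log="$3"
  "$prog" "$settings" "$log"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "module $prog failed with exit code $status" >&2
  fi
  return "$status"
}

# Stand-in call; replace 'true' with a real module executable.
run_module true ../settings/ustranfrmr.stx ../data/error.log && echo "module OK"
```

The same wrapper works for every program name in Table 15.1, since they all accept the settings file and optional log file in the same positions.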


Program Names
Table 15.1 contains a complete list of program names. Use the appropriate program name in your command line:

Table 15.1 Command Line Execution Program Names

Module                    Program Name
Transformer               tranfrmr
Customer Data Parser      cusparse; apparse (China, Japan, Korea, and Taiwan)
Parsing Customization     prcustom; apcustom (China, Japan, Korea, and Taiwan)
Business Data Parser      busparse
Postal Matchers           xxpmatch (xx = country code). Examples: aupmatch (AU), capmatch (CA), depmatch (DE), hkpmatch (HK), gbpmatch (GB), uspmatch (US), tqpmatch (TQ); appmatch (China, Japan, Korea, and Taiwan)
Global Data Router        globrtr
Window Key Generator      winkey
Relationship Linker       rellink
Create Common             common
Data Reconstructor        datarec
File Display Utility      tsqdisp
File Update Utility       fileupdt
Frequency Count Utility   tsqfreq
Merge Split Utility       mrgsplit
Resolve Utility           resolve
Set Selection Utility     tsqsetsl
Sort Utility              tsqsort


CHAPTER 16

Working with the TS Quality Utilities

TS Quality offers a number of utilities to perform specific tasks. All
utilities can be executed from the TS Quality Control Center, in a
batch process, or from the command line. This chapter explains
how to use these utilities:
File Display utility
File Update utility
Frequency Count utility
Merge Split utility
Resolve utility
Set Selection utility
Sort utility
Each utility's basic settings, such as input and output settings, are the same as those of the TS Quality core modules. This chapter focuses on the process settings information (the Advanced Settings window) for each utility.

Refer to the TS Quality Reference Guide for complete settings information on each utility.


File Display Utility


The File Display Utility lets you create a customized display copy of a file without altering the contents of the original file. For example, use the File Display Utility to review your output data after you run the Relationship Linker to determine whether your linking results are accurate, or whether you need to tune your business rules to improve the results.

You can also use the File Display Utility to create a new file that organizes the original file's data to meet specific display requirements.

Input and Output Settings


The File Display Utility can use the input file and input DDL from any
other step. This utility is most often used to display and verify
results from the Relationship Linker. In this case, use the Input File
Name and Input DDL Name from the output of the Relationship
Linker step.
The File Display Utility does not use a DDL for output.

Outer Key and Inner Key


The data to be displayed is grouped by Outer Key and Inner Key.
Outer Key - Creates a group of records that have the same value in the field specified in the Outer Key Field.
Inner Key - Creates a group of records within the outer key group that have the same value in the field specified in the Inner Key Field.
For example, the data can be grouped by LEV1_MATCHED (Outer Key) and LEV2_MATCHED_IN_LEV1_MATCHED (Inner Key), as shown in Figure 16.1:

Figure 16.1 Outer Key Group and Inner Key Group
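The grouping idea can be pictured with a small awk sketch that prints a break line whenever the outer key changes. The data and the `****` separator are illustrative, not File Display Utility output:

```shell
# Print records grouped by the first column (the "outer key"),
# with a separator line between groups.
printf 'A 1\nA 2\nB 3\n' | awk '
  $1 != prev { if (NR > 1) print "****"; prev = $1 }  # break on key change
  { print }                                           # echo the record
'
```

This prints the two A records, a separator line, then the B record; an inner key would add a second, nested level of break lines inside each outer group.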


To specify outer key and inner key fields

1.

From the File Display Utility step, click Advanced and navigate to Input, Settings.

2.

Select the Outer Key Field and Inner Key Field from the drop-down menu.

In addition, you can specify the following settings for the outer key and inner key:

Setting                     Description
Inner Key Field Encoding    Encoding type used by the Inner Key Field. If not specified, the platform's native encoding is used by default.
Inner Key Compare Bytes     Number of Inner Key Field bytes to use. By default, this is the field length of the Inner Key Field.
Outer Key Field Encoding    Encoding type used by the Outer Key Field. If not specified, the platform's native encoding is used by default.
Outer Key Compare Bytes     Number of Outer Key Field bytes to use. By default, this is the field length of the Outer Key Field.

Title and Delimiters


You must specify the title and delimiters for the title section, the outer key lines, and the inner key lines.

Figure 16.2 Title, Outer Key, Inner Key Delimiters

In this example, the lines in the report's title section are separated by a series of forward slashes (/), while the outer lines are separated by an asterisk (*) and the inner lines are separated by a hyphen (-).
To specify the title

A red flag indicates a REQUIRED setting.

1.

Click Advanced and navigate to Output, Settings.

2.

In Title 1, enter the first title line of the report. This line must be enclosed within quotation marks (" ") if there is a space between characters. For example: "Matching Report"

3.

If necessary, specify a second title line in Title 2 and a third title line in Title 3. Those lines must also be enclosed in quotation marks (" ") if there is a space between characters.

4.

In Title 1, 2, 3 Encoding, specify the encoding for each line if necessary. If not specified, the platform's native encoding is used by default.

To specify the delimiters

1.

Click Advanced and navigate to Output, Settings.

2.

In Title Line Delimiter, specify a line delimiter for the title line.

3.

In Outer Key Delimiter, specify a line delimiter for the outer key. A Tab, Space, Semicolon, Comma, Pipe, or any other character may be used. Characters other than those listed must be enclosed in quotation marks.

4.

In Inner Key Delimiter, specify a line delimiter for the inner key. The same characters may be used as for the outer key delimiter.

5.

In the Encoding settings for each line, specify the encoding if necessary. If not specified, the platform's native encoding is used by default.

6.

In addition, you can specify the following settings for the outer key and inner key:

Setting                  Description
Carriage Return          Indicates how the end of a line is marked in the report, which affects how the report displays on different platforms. When checked, a line feed indicates the end of a line (UNIX platforms). If unchecked, a carriage return/line feed is used (all other platforms).
Maximum Lines Per Page   Numeric value that specifies the maximum number of lines to display on a page. The default is 66.
Compress Blank Lines     When checked, compresses blank lines on the report.
Inner Break Spacing      Number of separator lines to print for the break of the inner set. The default is 1.
Outer Break Spacing      Number of separator lines to print for the break of the outer set. The default is 1.
Outer Key Minimum        Numeric value that specifies the minimum number of records a set must contain to be displayed in the report.

To display the records in a format that is easy to read, specify at least Title 1, Title Line Delimiter, Outer Key Delimiter, and Inner Key Delimiter.


Field Settings
The fields that will be displayed in the report should also be
identified.

Figure 16.3 Fields Displayed

This sample report includes LINE_01 to LINE_04, and all fields are displayed on one line.
To specify fields to be displayed

1.

Click Advanced and navigate to Output, Create Report.

2.

Refer to the table below and configure these additional settings. (See the TS Quality Reference Guide and Online Help for complete settings information.)

Setting                 Description
Report Value Type       Defines the type of Report Value entry. Values are: DDL field name, Literal value, or Insert spaces only. If the space option is used, Report Value is not required; however, you must specify the number of spaces with Report Value Length.
Report Value            Specifies either a literal value or a field name to display in the report.
Report Value Encoding   Specifies the encoding used by Report Value. If not specified, the platform's native encoding is the default.
Report Value Length     Specifies a limit for the length of the value displayed in the report. This is useful when a DDL field is very long but might not be completely filled.
Report Line Number      Specifies the line number for the value in Report Value. The same Report Line Number value can be associated with more than one Report Value, but they must be grouped together and remain in numerical order.


File Update Utility


The File Update Utility updates a file, called the master file, with the data contained in another file, called the transaction file. You can update records in the master file based on a specific key set when the keys match between records. The File Update Utility separates records based on match or not-match key conditions and creates separate files for these records for further review or processing.

Match Keys and Fields


The update is applied to the records in the master file based on a Match Key specified by the user. If the match key values are equal in the master and transaction files, the record is considered matched, and field values in the matched records in the master file are updated with the values in the transaction file. Fields to be updated are determined by the output DDLs for the output files. If the key values in the master file and transaction file are not equal, the record is considered unmatched, and the unmatched records are written out.
Match keys are specified with the Match Key setting.
Field names are used for match keys.
The values in the key fields must be equal in both the
master and transaction files to perform the update.
Up to five match keys can be specified.
All input files (master and transaction files) must be sorted
in ascending order by the match key.
Neither field names nor field lengths for match keys need
to be equal for the master and transaction files. The
program will search for a match based on the values in
the key fields specified.
Fields to be updated must be specified in the output DDLs. When
there are common fields between the master and transaction files,
the values in the common fields in the master file will be overwritten (updated) by the values in the same fields in the transaction file.

Example
In these sample master and transaction files, the following fields are used (the Match Key is the Record_Key field):

Master File                  Tran File
Record#                      Record#
Record_Key (Match Key)       Record_Key (Match Key)
Name                         Street
                             City
                             State

If the output DDL has all fields from the master and transaction files, the match master file includes the following fields. The value in the common field, Record#, will be overwritten by the transaction file, and the values of the Street, City, and State fields are inserted from the tran file:

Match Master File
Record#      (overwritten/updated by the tran file)
Record_Key
Name
Street       (inserted from the tran file)
City         (inserted from the tran file)
State        (inserted from the tran file)

If you prefer not to overwrite or update the common field in the master file with information from the transaction file, redefine the field name either for the master or for the transaction file.


Example

Input
In this example, the master file contains the customers' names and the transaction file contains the customers' addresses. The match key is the Record_Key field.

Master Input File (master.dat)

Rec #   Record_Key   Name
1       0001         John Nicoli
2       0001         J Nicoli
3       0002         Mary Rogers
4       0003         Kevin McCarthy

Tran Input File (tran.dat)

Rec #   Record_Key   Street              City        State
100     0001         25 Linnell Circle   Billerica   MA
200     0001         1 Elm St.           Nashua      NH
300     0002         25 Linnell Circle   Billerica   MA
400     0002         12 Oak St.          Waltham     MA
500     0004         3 Royal Court       Boston      MA

Output DDL
The following fields are used in the DDL for ALL master output files (match_master.ddx, match_dup_master.ddx, unmatch_master.ddx):

Rec #
Record_Key
Name
Street
City
State

Output

1. The program searches for records in master.dat that have the same key values as tran.dat. In this case, the records with 0001 and 0002 in the Record_Key field are matched records.

2. Rec #1 in master.dat and Rec #100 in tran.dat are matched records because they matched first. Similarly, Rec #3 and Rec #300 are matched records. Therefore, in the match master file, the values in the Rec # field are updated by tran.dat, and all address-related fields and their values are added from the tran.dat file.

Match Master File (match_master.dat)

Rec #   Record_Key   Name          Street              City        State
100     0001         John Nicoli   25 Linnell Circle   Billerica   MA
300     0002         Mary Rogers   25 Linnell Circle   Billerica   MA

3. Rec #2 in master.dat is a duplicate matched record because it appeared after the first matched record (Rec #1). (Duplicate records are additional records with the same key that appear after the first match occurs.) Therefore, Rec #2 is written out to the match master duplicate file. For the match master duplicate file, the user must select whether or not to update the record with the Update Output Rec setting. See the following cases:

Match Master Duplicate File (match_dup_master.dat)

Case 1: Update the matched duplicate records (UPDATE_OUTPUT_REC ON)

Rec #   Record_Key   Name       Street              City        State
100     0001         J Nicoli   25 Linnell Circle   Billerica   MA

Case 2: Do not update the matched duplicate records (UPDATE_OUTPUT_REC OFF)

Rec #   Record_Key   Name       Street   City   State
        0001         J Nicoli

4. Rec #4 in master.dat is an unmatched record because it does not share the same key value with any records in tran.dat. Therefore, Rec #4 is written out to the unmatch master file.

Unmatch Master File (unmatch_master.dat)

Rec #   Record_Key   Name             Street   City   State
        0003         Kevin McCarthy

5. The matched records in tran.dat are written out to the match_tran.dat file to show which transaction records matched a master record. If there are any matched duplicate, unmatched, or unmatched duplicate records in the transaction file, they are also written out to separate files.

Match Tran File (match_tran.dat)

Rec #   Record_Key   Street              City        State
100     0001         25 Linnell Circle   Billerica   MA
300     0002         25 Linnell Circle   Billerica   MA

Match Tran Duplicate File (match_dup_tran.dat)

Rec #   Record_Key   Street       City      State
200     0001         1 Elm St.    Nashua    NH
400     0002         12 Oak St.   Waltham   MA
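The matched/unmatched separation in this example can be mimicked with the standard POSIX join tool on two files sorted by a shared key. This is an illustrative analogy, not the File Update Utility itself; the data and file paths are made up:

```shell
# Two key-sorted files: master (key + name) and transaction (key + city).
printf '0001 John_Nicoli\n0002 Mary_Rogers\n0003 Kevin_McCarthy\n' > /tmp/master.txt
printf '0001 Billerica\n0002 Billerica\n0004 Boston\n' > /tmp/tran.txt

echo "matched:"
join /tmp/master.txt /tmp/tran.txt        # records whose keys appear in both
echo "unmatched master:"
join -v 1 /tmp/master.txt /tmp/tran.txt   # master records with no tran match
```

Like the File Update Utility, join requires both inputs to be sorted ascending by the key field before the comparison is made.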

Input and Output Settings


For input, you must specify the Master File Name and DDL, and the Transaction File Name and DDL. For output, you can specify various types of Match and Unmatch files.
Master and transaction files must be sorted by the match keys.
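Sorting an input file ascending on its key column can be done with the standard sort tool; a small sketch where the column layout and file paths are illustrative:

```shell
# An unsorted file whose first column is the match key.
printf '0002 b\n0001 a\n0003 c\n' > /tmp/unsorted.txt

# Sort ascending on the first field only, as the update requires.
sort -k1,1 /tmp/unsorted.txt > /tmp/sorted.txt
cat /tmp/sorted.txt
```

The -k1,1 option restricts the comparison to the first field, so trailing data fields do not affect the key order.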

Match Key Settings


The update is applied to the records in the master file based on the Match Key specified by the user. If the match key values are equal between the master and transaction files, the record is considered matched, and field values in the matched records in the master file are updated with the values in the transaction file.

A red flag indicates a REQUIRED setting.

To specify match keys

1.

From the File Update Utility step, click Advanced and navigate to Input, Master Settings.

2.

Specify Match Keys by selecting field names from the drop-down list.

3.

Navigate to Input, Transaction Settings.

4.

Specify the same Match Keys as you specified for Master Settings.

If you specify multiple match keys, separate the keys with commas. For example: Last_Name,First_Name,SS_number.



To enable Table Update
When it is not practical to sort the master file by the match keys due to its size, enable the Table Update function to update the master file without sorting. The program will only update the first matched record in the master file with the contents of the first matched record in any transaction file. Any following matches for that match key will be ignored.

1.

Navigate to Input, Master Settings.

2.

Select Table Update.

If Table Update is turned on in the Master Settings, duplicate records will not be written out because the program will not search for duplicate records. For the match master duplicate file, the user must select whether or not to update the record by using the Update Output Rec setting in the Master Match Dup Settings.

Transaction Output Settings

If you specify transaction output files such as Tran Match File and Tran Unmatch File, you need to set the Transaction File Qualifier in the Output Advanced Settings.

To specify transaction settings

1.

Click Advanced and navigate to Output, Match Tran Settings, or to any of the output transaction file settings for the files you specified on the Output Settings tab.

2.

Select For Tran. This is the value specified in the File Qualifier in Transaction Settings.

(See the TS Quality Reference Guide and Online Help for complete settings information.)


Frequency Count Utility


The Frequency Count Utility analyzes data records to determine the frequency of input fields by counting the occurrences of literal data strings, mask shapes, and blanks. The resulting frequency counts are displayed in the output file.

Example
As shown in this example, the data can be counted by FIRST_NAME, LAST_NAME, and STREET_ADDR. When multiple fields are specified like this, the frequency counts are made on the combined value of the fields, not on the individual fields.
Figure 16.4 Frequency Count
(The figure shows a sample frequency report with COUNT, FIRST_NAME, LAST_NAME, and STREET_ADDR columns; each row gives the number of occurrences of a combined field value.)
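The idea of counting occurrences of a combined field value can be sketched with standard shell tools. This is only an analogy to the utility's literal count, with made-up data:

```shell
# Count identical "FIRST_NAME LAST_NAME" values and list the most
# frequent first, roughly what a literal frequency count produces.
printf 'John Smith\nJohn Smith\nClara Currier\nJohn Nicoli\n' \
  | sort | uniq -c | sort -rn
```

Each output line shows a count followed by the combined value, with the most frequent combination at the top.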


Input and Output Settings


The Frequency Count Utility uses the Input File Name and Input
DDL Name from any other TS Quality step.

Count Settings

To specify fields to count

A red flag indicates a REQUIRED setting.

1.

From the Frequency Count Utility step, click Advanced and navigate to Input, Freq Settings.

2.

Click Entry Settings. Select Field Name from the drop-down list. This is the field that will be counted.

3.

Select either Literal or Mask for Count Type.

4.

Click Field Settings. Select Sort Type (Descending Order or Ascending Order) and Sort Option (Count or Value). The Sort Option specifies whether to sort the results by Count or by field Value.

5.

Optionally, you can check Show All Combinations to display an expanded output view. You can also specify how many records of a given frequency are displayed in the Number of Top Occurrences text box. For example, a value of 100 shows the top 100 most frequently occurring records.

When Show All Combinations is checked, the program counts the number of occurrences of a specified data combination. For example, if a field contains Tom Smith and another record contains Tom, the report shows two occurrences of Tom and one occurrence of Tom Smith.

(See the TS Quality Reference Guide and Online Help for complete settings information.)

Creating and Working with TS Quality Projects

Merge Split Utility


The Merge Split Utility lets you manipulate files with merge keys
and split rules. You can create merge keys to determine how files
will be merged, and create rules to split files into multiple smaller
files or to produce multiple output files from a single input file.

Input and Output Settings


The Merge Split Utility uses the Input File Name and Input DDL
Name from any other TS Quality step.

Using Multiple Input Files to Create an Output DDL

You can specify up to ten (10) input files and their
associated DDLs and use these to create a common output file for
later processing by modules downstream in your workflow. This
process requires that after you specify the input files, you map
input fields from the associated DDLs to a common output DDL file.
To add multiple input files and map fields

1. Double-click a Merge Split Utility step to open the Merge Split Utility window.

2. In the Input Data File field, type or browse to the input file you wish to use.

3. In the Input DDL File field, type or browse to the input DDL file associated with the input data file you specified in Step 2.

4. Click Add.

5. Repeat Steps 2-3 until you've added all DDL files you want to use to create the common output format.

6. Click the Define Output DDL button (bottom left). The Define Output DDL dialog appears.


Figure 16.5 Define Output DDL dialog


8. Use the Input DDL drop-down menu to select the DDL file you want to use to map fields to an output DDL file. The input DDL fields appear in the left pane and the final output DDL fields appear in the right pane.

9. Use the buttons in the center panel to refine the output DDL list of fields. You can choose from these options:

Add: adds the selected input DDL field to the output DDL list.

Delete: deletes a selected output DDL field from the list.


Move Up: moves the selected field in the output DDL list up one row.

Move Down: moves the selected field in the output DDL list down one row.

Redefine: redefines an input field as a portion of an output field. Use this option to map multiple input fields to the same redefined output field.

Consolidate: consolidates an input field with an existing output field. Use this option when two or more fields have different names but contain the same data, such as zipcode, ZIP5, and postal_code.

For Redefine and Consolidate, make sure that the lengths of the input fields do not exceed the overall length of the redefined or consolidated output DDL field.

10. When you are ready, click Save to save the output DDL field mapping. When the Merge Split Utility step runs, it creates an output DDL file that uses this mapping.

Merge Files

For a merge operation, all input files that will be merged and the output file MUST have the same shape. In other words, they must use the same DDL.

Input files must be sorted by match keys.

Example

In this example, Input 1 will be merged into Input 2 using the Name field as the Match Key.
Input 1

Customer_ID#   Name
0000001        John Nicoli
0000002        Mary Nicoli

Input 2

Customer_ID#   Name
9000001        Alice Rogers
9000002        Kevin McCarthy

The following DDL is used for ALL input files and the output file:

Customer_ID#
Name

On output, the program copies Match Key values from Input 1 and Input 2 along with the other components of the data. The record order is determined by the order of key values. As a result, the total number of records is the sum of the number of records from both Input 1 and Input 2.

Output File

Customer_ID#   Name
9000001        Alice Rogers
0000001        John Nicoli
9000002        Kevin McCarthy
0000002        Mary Nicoli
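The merge behavior can be sketched as follows, using the sample data above (an illustration only; heapq.merge stands in for the utility's merge step, and both inputs are assumed to be pre-sorted on the match key):

```python
# Sketch of a key-ordered merge: two inputs already sorted on the match
# key (Name) are interleaved into one output in key order.
import heapq

input1 = [("0000001", "John Nicoli"), ("0000002", "Mary Nicoli")]
input2 = [("9000001", "Alice Rogers"), ("9000002", "Kevin McCarthy")]

# heapq.merge interleaves the sorted inputs without re-sorting them,
# so the output record count is the sum of the input record counts.
merged = list(heapq.merge(input1, input2, key=lambda rec: rec[1]))
for cust_id, name in merged:
    print(cust_id, name)
```

Run against the sample data, this yields Alice Rogers, John Nicoli, Kevin McCarthy, Mary Nicoli, matching the Output File shown above.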

To merge files

A red flag indicates a REQUIRED setting.

1. From the Merge Split Utility step, click Advanced and navigate to Process, Settings.

2. On the Field Settings tab, select Merge for Process Type from the drop-down list.

3. On the Field Settings tab, select Match Key from the drop-down list. You can specify up to five fields for the Match Key, separated by commas.


Split a File
Splitting a file is useful when your system has a file-size limit or you
want to separate a file into manageable pieces. The pieces can later
be re-assembled using the Merge operation. You can split the input
file by the number of records or bytes per segment.
To split a file

1. Click Advanced and navigate to Process, Settings.

2. On the Field Settings tab, select Split for Process Type from the drop-down list.

3. On the Field Settings tab, select Partition Method from the drop-down list.

Partition Method      Description

Round Robin Number    Split the file by number of records. If this is selected, the Round Robin Number must be specified.

Round Robin Keys      Split the file by the key (field). If this is selected, Match Keys must be specified. The input file should be sorted by the Match Key field.

Ranges                Split the file by a range of values. If this is selected, Range Start and Range End values (Entry Settings tab) must be specified.

Ranges Stable         Split the file by the field name and field length. The field name and field length are specified by Match Key.

Records Per Segment   Split the file by segment. If this is selected, Records Per Segment must be specified.

Bytes Per Segment     Split the file by segment. If this is selected, Bytes Per Segment must be specified.

Segment Per File      Split the file by the defined number of segments. The number of segments is specified by Number of Output Files.

You can specify up to five fields for the Match Key, separated by commas. If the Partition Method is set to Ranges, this setting can contain only one field.
4. On the Field Settings tab, specify Number of Output Files. This is the number of output files to create. The entry in the output file name is used as a base file name; extensions are generated up to the value specified here.

Examples

Round Robin Keys

In this example, the input file will be split into Output 1 and Output 2 using the Round Robin Keys method, with the Lev2_matched field as the Match Key.

Input File

Name               Lev2_matched
B McCarthy         000001
Bob McCarthy       000001
Catherine Rogers   000002
Cathy Rogers       000002

On output, the program splits the input file. The first Lev2_matched group is written to Output 1, and the second Lev2_matched group is written to Output 2.

Output 1

Name           Lev2_matched
B McCarthy     000001
Bob McCarthy   000001

Output 2

Name               Lev2_matched
Catherine Rogers   000002
Cathy Rogers       000002

Round Robin Number

In this example, the input file will be split into Output 1, Output 2, and Output 3 using the Round Robin Number method, with the Round Robin Number set to 1.


Input File

Name               Lev2_matched
B McCarthy         000001
Bob McCarthy       000001
Catherine Rogers   000002
Cathy Rogers       000002

On output, record #1 is written to Output 1, record #2 is written to Output 2, and record #3 is written to Output 3. Since there are only three output files specified, record #4 goes back to Output 1, and the cycle continues in this manner:

Output 1

Name           Lev2_matched
B McCarthy     000001
Cathy Rogers   000002

Output 2

Name           Lev2_matched
Bob McCarthy   000001

Output 3

Name               Lev2_matched
Catherine Rogers   000002
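The Round Robin Number behavior with a value of 1 can be sketched like this (an illustrative sketch using the sample records, not the utility's actual code):

```python
# Round-robin split sketch with Round Robin Number = 1: each record goes
# to the next output file in turn, cycling back after the last one.
records = [
    ("B McCarthy", "000001"),
    ("Bob McCarthy", "000001"),
    ("Catherine Rogers", "000002"),
    ("Cathy Rogers", "000002"),
]

num_outputs = 3
outputs = [[] for _ in range(num_outputs)]
for i, rec in enumerate(records):
    outputs[i % num_outputs].append(rec)  # record i goes to output (i mod 3)

for n, out in enumerate(outputs, start=1):
    print(f"Output {n}: {[name for name, _ in out]}")
```

Record #4 wraps back to the first output, reproducing the three output files shown above.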

Merge and Split Files

To merge and split files

1. Click Advanced and navigate to Process, Settings. On the Field Settings tab, select Both for Process Type from the drop-down list.


2. On the Field Settings tab, select Partition Method from the drop-down list.

For the Both Process Type, Merge is performed first, then Split.

Partition Method      Description

Round Robin Number    Split the file by number of records. If this is selected, a Round Robin Number must be specified.

Round Robin Keys      Split the file by the key (field). The input file must be sorted by this field. If this is selected, Round Robin Keys must be specified.

Ranges                Split the file by a range of values. If this is selected, Range Start and Range End values (Entry Settings tab) must be specified.

Ranges Stable         Split the file by field name and field length. The field name and field length are specified by Match Key.

Records Per Segment   Split the file by segment. If this is selected, Records Per Segment must be specified.

Bytes Per Segment     Split the file by segment. If this is selected, Bytes Per Segment must be specified. The number of segments is specified by Number of Output Files.

Segment Per File      Split the file by the defined number of segments.

3. On the Field Settings tab, select Match Key from the drop-down list. You can specify up to five fields, separated by commas. If the Partition Method is set to Ranges, this setting can contain only one field.

See TS Quality Reference Guide and Online Help for complete settings information.

4. On the Field Settings tab, specify Number of Output Files. This is the number of output files to create. If this value is greater than 1, the entry in the output file name is used as a base file name, and extensions are generated up to the value specified here.


Resolve Utility

See TS Quality Reference Guide and Online Help for complete settings information.

The Resolve Utility resolves transitivity affecting links between transactions. When Window Linking is performed with the Relationship Linker, you can create a link file indicating which records are linked together by common data.

Transitivity occurs when two records are linked together indirectly through a third record in a multi-linking process. For example, record A may have linked to record B in the first run, and record B may link to record C in a subsequent run using a different window key. When this happens, record A has linked to record C through transitivity. Using the MALINK file from the Relationship Linker, the Resolve Utility creates a relationship of the records that can then be used to represent the entire matched record set.

Please contact Trillium Software Customer Support for more information on multi-linking.

Example

Two MALINK files, one from a match on SS# and one from a match on Name, are combined with Merge Files and then passed to Resolve.

The MALINK record layout is:

Recid(20) + Recid(20) + Match Type(1) + Pattern(3)

For example:

00000001   00000002   P   405


Multiple matches are run on Social Security Number and Name. Record A matches to record B, and in the other match record B matches to record C. Each run produces a MALINK file with the matches in it: A -> B and B -> C. The MALINK files are then combined.

Recid   Recid   Type   Pat
0002    0007    P      329
0002    0015    P      210
0007    0009    P      230
0015    0022    P      230

The Resolve Utility processes this file and produces the following output:

Recid   Recid
0007    0002
0015    0002
0009    0002
0022    0002

Transitivity also shows that if record A matched to record B and record B matched to record C, then record A must also match to record C.

The Resolve Utility's output is then used to update the keys on the first linking's output. This is typically done with the File Update Utility: any occurrence of the Recid in column 1 is updated to the value of the Recid in column 2. The File Update Utility's output is then sorted on the updated key to group all recoded records together with their resolved match set.
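This transitivity resolution can be sketched with a union-find pass (a hypothetical illustration of the concept, not the Resolve Utility's actual algorithm):

```python
# Union-find sketch of transitivity resolution: each record id is mapped
# to a canonical (root) id so that indirectly linked records share one key.
links = [("0002", "0007"), ("0002", "0015"), ("0007", "0009"), ("0015", "0022")]

parent = {}

def find(x):
    # Follow parent pointers to the canonical id, compressing the path.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

for a, b in links:
    ra, rb = find(a), find(b)
    if ra != rb:
        # Keep the smaller id as the canonical one, as in the example output.
        lo, hi = min(ra, rb), max(ra, rb)
        parent[hi] = lo

resolved = {rec: find(rec) for rec in ("0007", "0015", "0009", "0022")}
print(resolved)
```

For the combined MALINK rows above, every linked record resolves to 0002, matching the example output table.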

Input and Output Settings

For input, the Resolve Utility takes merged MALINK files from the 2nd-Nth Relationship Linkers in a multi-link process. It resolves transitivity issues for matches from the runs that followed the first linking.
The Resolve Utility does not use a DDL for output.

Link Field

To link files

A red flag indicates a REQUIRED setting.

1. From the Resolve Utility step, click Advanced and navigate to Input, Settings.

2. In From Link Field, select the DDL field from the drop-down list. This DDL field contains the starting key of the link (generally located in the left column of the Relationship Linker's link output file).

3. In To Link Field, select the DDL field from the drop-down list. This DDL field contains the ending key of the link (generally located in the right column of the link output file from the Relationship Linker).

In addition, you can specify the following settings:

Setting                 Description

Process Group Records   Number of records to process in a set. When the program reaches this limit, it writes output to the file in a resolved form. Buffers are created and processing continues.

Process Group Memory    Maximum memory to use in a set. This overrides the Process Group Records setting if both are used.


Set Selection Utility

The Set Selection Utility selects data from a file and then skips or selects that data on output. Selection of data occurs based on Match Keys, Select Record Conditions, and Bypass Record Conditions. Field names are used for match keys.

This utility is useful when you select records based on relationship keys (created during the linking process) and you want a set of records to be evaluated against defined criteria.

Example

For example, if the Match Key is the Household_Number field, the program first selects records that have the Household_Number field in the input file.

Assume that, in the Select Record Conditions or Bypass Record Conditions, the condition is set to Household_Number=00001. In this case, the program selects records whose values in the Household_Number field equal 00001.

After running the program, you can verify the results of the select operation by viewing the output file in the Data Browser.

Input and Output Settings


The Set Selection Utility is usually used on the results of the
Relationship Linker. In this case, use the Input File Name and Input
DDL Name from the output of the Relationship Linker step.
Input files must be sorted by match keys.


Select Records

The selection is applied to records in the input file based on the Match Key field specified by the user. Records with the same match key values are selected and written to the output file.

To select records

1. From the Set Selection Utility step, click Advanced and navigate to Input, Settings.

2. In Match Key, select the DDL field from the drop-down list. The program selects records which have this field.

Figure 16.6 Select Match Key


To set a limit on the number of records or key sets

You can specify the maximum number of records to be selected and the minimum number of records per Match Key set (a group of records with the same value) to be selected. These settings can be set separately for input and output.

1. Click Advanced and navigate to Input, Settings for input or Output, Settings for output.

2. Refer to the table below and specify the appropriate values:

Setting                   Description

Maximum Total Records     Numeric value greater than or equal to 1. Specifies the maximum total number of records of a specific key set to process.

Minimum Records Per Set   Numeric value. Any key set with a record count that is less than or equal to this value will be discarded without processing.

Maximum Records Per Set   Numeric value. Any key set with a record count that is equal to or exceeds this value will be discarded without processing.

Maximum Set               Numeric value that limits the total number of key sets to process.

See TS Quality Reference Guide and Online Help for complete settings information.
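The per-set limits can be sketched as a grouping filter over key-sorted input (hypothetical data and limit values; this is not the utility's actual code):

```python
# Sketch of key-set selection limits: records are grouped by match key,
# and whole sets are kept or discarded based on their record counts.
from itertools import groupby

records = [("00001", "a"), ("00001", "b"), ("00002", "c"),
           ("00003", "d"), ("00003", "e"), ("00003", "f")]

min_per_set = 1   # sets with a count <= this value are discarded
max_per_set = 3   # sets with a count >= this value are discarded

kept = []
# Input must be sorted by the match key so groupby sees each set whole.
for key, group in groupby(records, key=lambda rec: rec[0]):
    group = list(group)
    if min_per_set < len(group) < max_per_set:
        kept.extend(group)

print(kept)
```

Here only the two-record 00001 set survives: the 00002 set is at the minimum and the 00003 set reaches the maximum, so both are discarded whole.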

To set a condition to select records

Next, you can specify a specific value of the match key that you want to select. Use Select and Bypass Conditions to do this.

1. Click Advanced and navigate to Input, Settings, Select Record Conditions.

2. Create the desired condition to select records. See Select or Bypass Records on page 5-37 for building a condition.

Figure 16.7 Select Conditions


Sort Utility

See TS Quality Reference Guide and Online Help for complete settings information.

The Sort Utility reads records from input data files and sorts them to produce a single output file. The single output file is created in a common shape with a single associated Data Dictionary Language (DDL) file.

The sort functions support up to 99 sort keys. During the sort step, you can select fields from input records to be written to the output. This process is controlled through the input and output DDL field-mapping function.

See Chapter 9 and Chapter 10 for details of the Sort Utility.


CHAPTER 17

Customizing the Control Center

TS Quality allows you to customize your work area by changing several configuration options:

fonts used within the Control Center
size and style of text
colors used within the Control Center


Changing the Control Center Display Settings

You can change the way that items in the Control Center are displayed by modifying the following settings:

fonts for Menu, Text Viewer, Step Labels, Step Comments, and Project Labels
colors for background and arrows

Any changes made here become the default until more changes are made.

To open the Preference menu

You can also right-click anywhere inside the Data Flow Architect (just not on a specific step module) and select Preference.

1. Select Setup, Preferences. There are two tabs, General and Display.

The General tab allows you to decide which applications or functions launch when the Control Center opens. See Set Up the Control Center on page 2-5 for descriptions of the General tab.

The Display tab lets you select a new default font, style, and size for the text appearing in the Menu, Text Viewer, Step Labels, Step Comments, and Project Labels. You can also choose background and foreground colors for these items.

Figure 17.1 Preference Display Tab


To change the Font Settings

1. Select the Display tab. In the Category box, click the item that you want to change.

2. In Font Selection, select the new font, style, and size from the pull-down menus on the right. As you make your changes, the text in the Sample box reflects the changes.

3. Click OK. The Preferences tab closes and the new font settings are applied to the selected item.

To change the Color Settings

1. Select the Display tab. In the Category box, click the item that you want to change. The Color section becomes active.

2. Click either the Foreground or Background button, depending on which color you want to change. These buttons invoke separate identical windows.
Swatches

The Swatches tab lets you choose a color from a palette of preset colors:

a. Click a color in the palette. The Preview section below displays the selected color scheme. As you select a color from the palette, your choice will be recorded in the Recent: grid on the right. The far-left box in the first row of the grid will be filled with the selected color. Each time you select a color, the rest of the boxes will be filled in.

b. Click OK. The Foreground/Background window closes.

HSB

The HSB tab lets you define the color by Hue (the color's tint), Saturation (the hue's purity), and Brightness (the color's brightness):

a. Select the color component that you want to change by selecting either the Hue, Saturation, or Brightness radio button. The Color box on the left changes based on your selection.

b. Move the slider up or down to shift through the color spectrum. The Color box on the left adjusts accordingly.

c. When you want to see how a color will appear, click the section of the Color box where the color appears. The Preview section displays the selected color scheme.

d. Click OK. The Foreground/Background window closes.

RGB

The RGB tab allows you to define a color as a combination of the Red, Green, and Blue primary colors:

a. Drag the sliders for the respective colors to the left or right. The Preview section updates as the color changes.

b. When you are satisfied with your selection, click OK. The Foreground/Background window closes.

3. Once you are satisfied with the changes, click OK in the Preferences window. The new color settings will be applied.

To change the background color

1. Right-click anywhere inside the Data Flow Architect (not on a specific step module) and select Preference, Background Color.... The Background Color Chooser window opens.

2. Refer to the steps above for changing the color settings.

3. Click OK. The new color settings will be applied.

To change the arrow color

1. Right-click anywhere inside the Data Flow Architect (not on a specific step module) and select Preference, Arrow Color.... The Arrow Color Chooser window opens.

2. Refer to the steps above for changing the color settings.

3. Click OK. The new color settings will be applied.

APPENDIX A

The Data Dictionary Language and DDL Types

The Data Dictionary Language


The Data Dictionary Language (DDL) is a method for defining data
file and record layouts. The Data Dictionary Language file,
commonly referred to as DDL or DDL file, is a collection of keywords
that contains the definitions of the input and output files that are
used by TS Quality. DDL input and output files must be defined for
each module in TS Quality.
See Chapter 2, Working with a Project for structure and
components of DDLs and how to create them.


Data Dictionary Language (DDL) Types

The Type must be specified for every DDL field entry. There are four Type categories: Encoding (code page), Trillium Types, Date Format, and Class Keyword.

Encoding (Code Page)

An encoding is a mapping of binary values to code positions that represent characters of data. It is also called a code page. The following table lists the main character encodings used in TS Quality.

Note that some encodings below may not be available depending on the chosen module or GUI Tool. Contact Customer Support for more information.

Table A-1: DDL Encoding

Type       Description

NOTRANS    NOTRANS means No Translation. The operations will be done in the default encoding for the host computer.
           NOTE: Users need to be careful that the data will not be translated into their native encoding. For example, if a data file from Greece is run on a computer in the US and both the settings files and all of the fields in the DDL are set to NOTRANS, you will likely get a different result than if the same project was run in Greece.

ASCII      American Standard Code for Information Interchange. A 7-bit encoding for representing English characters.

BIG5       Traditional Chinese

CCSID937   Traditional Chinese

CP037      EBCDIC, IBM037

CP1250     Latin 2, Eastern European


CP1251       Cyrillic (Slavic)

CP1252       Latin1 (ANSI)

CP1253       Greek

CP1254       Turkish

CP1255       Hebrew

CP1256       Arabic

CP1257       Baltic

CP1258       Vietnamese

CP932        Microsoft Extended Shift-JIS Japanese

CP936        Simplified Chinese, GBK

CP949        Korean

CP950        Traditional Chinese

EUCCN        Simplified Chinese, Unix, GB2312, EUC-SC

EUCJP        Japanese, Unix, EUC-JP, EUC-J, JEUC, J-EUC, EUCJ

EUCKR        Korean, Unix, EUC-KR, KS_C_5861-1992

EUCTW        Traditional Chinese, Unix, CNS-11643, CNS-11643-1992

GB12345      Traditional Chinese

HZGB2312     Simplified Chinese, HZ-GB-2312

IBM-83-4040, IBM-83-4242    Japanese corporate kanji code

ISO2022JP    Japanese, ISO-2022-JP

ISO-8859-7   Latin/Greek

ISO 8859-9   Latin-1 modification for Turkish (Latin-5)


JEF-83-A1A1, JEF-83-4040, JEF-78-A1A1, JEF-78-4040    Japanese corporate kanji code. Fujitsu.

JOHAB        Korean

KEIS-83-A1A1, KEIS-83-4040, KEIS-78-A1A1, KEIS-78-4040    Japanese corporate kanji code. Hitachi.

LATIN1       ISO 8859-1

LATIN2       ISO 8859-2

LATIN4       Baltic

LATIN7       Baltic

LATIN9       ISO 8859-15, Latin1 + Euro symbol and accented characters

UCS2         The encoding of Unicode as 16-bit values. This is the default transformation format of Unicode. UCS2 is the same as UTF16.

UTF7         The encoding of Unicode as 7-bit values that can be transmitted safely via e-mail (MIME messages).

UTF8         The encoding of Unicode as 8-bit values. In this encoding, all ASCII characters are represented by themselves, and all bytes of multi-byte characters have the eighth bit turned on. UTF8 is the default encoding for XML.

UNICODE20:BIG-ENDIAN      Unicode with the most significant byte first. Other name: big-endian.

UNICODE20:LITTLE-ENDIAN   Unicode with the least significant byte first. Other name: little-endian.


Trillium Types

Trillium Type is the data type of a DDL field. The following table lists the Types used in TS Quality. Many of these Types can be used together (example: PACKED DECIMAL).

Table A-2: Trillium Types

Type            Description

ASCII NUMERIC   Numeric characters in ASCII.

BITFIELD        A BITFIELD is an array of bits embedded within a byte or an array of bytes. They are treated as right-justified unsigned integers. The length is specified by the LENGTH statement. The starting bit position is specified by the POSITION statement. Numbering schemes for identifying the position of bits are: little-endian (the smallest position number is at the far right of the entity) and big-endian (the smallest position number is at the far left of the entity). One is used as the starting counting position.

BOOLEAN         BOOLEAN may also be qualified as INTEGER. Fields are treated as right-justified binary integers; however, fields with the value of zero are considered to be equal to FALSE, while fields with a non-zero value are considered to be TRUE.

BINARY          Binary data type.

INTEGER         INTEGER types may be signed or unsigned. They are treated as right-justified binary integers of the length specified in the LENGTH statement. Integers may also be qualified as BOOLEAN.

PACKED          PACKED types can be signed or unsigned. Packed decimal digits of the length are specified in the LENGTH statement in bytes. They are treated as right-justified. Since packed decimals are stored two digits to a byte, the total number of digits is twice the length for UNSIGNED PACKED and twice the length minus 1 for signed PACKED. For signed values the right-most nibble holds the sign value.


ZONED DECIMAL   The ZONED DECIMAL type is treated as EBCDIC NUMERIC characters with the least significant byte divided into a numeric digit and a sign. The sign occupies the least significant nibble of the byte and follows the conventions for PACKED decimal signs.
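The signed PACKED layout described above can be illustrated with a small decoder sketch (a hypothetical helper, using the common packed-decimal convention that 0xD in the sign nibble marks a negative value):

```python
# Decoder sketch for signed PACKED decimal bytes: two digits per byte,
# with the sign carried in the right-most nibble of the last byte.
def unpack_packed_decimal(data: bytes) -> int:
    digits = []
    for byte in data:
        digits.append(byte >> 4)      # high nibble
        digits.append(byte & 0x0F)    # low nibble
    sign_nibble = digits.pop()        # last nibble is the sign
    value = int("".join(str(d) for d in digits))
    # Conventionally 0xD marks a negative value; 0xC/0xF mark positive.
    return -value if sign_nibble == 0xD else value

# 3 bytes hold 2*3 - 1 = 5 digits plus the sign nibble:
print(unpack_packed_decimal(b"\x12\x34\x5C"))  # 12345
print(unpack_packed_decimal(b"\x12\x34\x5D"))  # -12345
```

This matches the digit count described in the table: a signed PACKED field of length 3 carries five digits, since one nibble is spent on the sign.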


Date Format

A date format is a type of data that may contain only valid dates. The following table lists the valid date formats.

Table A-3: DDL Date Format

Type                           Data Format

ASCII AMERICAN                 MM(/)DD(/)YYYY. 8 or 10 bytes.

ASCII EUROPEAN                 DD(/)MM(/)YYYY. 8 or 10 bytes.

ASCII JULIAN                   (YY)YY(/-)DDD. 5, 7, or 8 bytes.

ASCII LONG JULIAN              YYYY(/-)DDD. 7 or 8 bytes.

ASCII YEAR FIRST               YYYY(/)MM(/)DD. 8 or 10 bytes.

EBCDIC AMERICAN                MM(/)DD(/)YYYY. 8 or 10 bytes.

EBCDIC EUROPEAN                DD(/)MM(/)YYYY. 8 or 10 bytes.

EBCDIC JULIAN                  (YY)YY(/-)DDD. 5, 7, or 8 bytes.

EBCDIC LONG JULIAN             YYYY(/-)DDD. 7 or 8 bytes.

EBCDIC YEAR FIRST              YYYY(/)MM(/)DD. 8 or 10 bytes.

PACKED AMERICAN                0MMDDYYYY. 5 bytes.

PACKED EUROPEAN                0DDMMYYYY. 5 bytes.

PACKED JULIAN                  (YY)YYDDD. 3 or 4 bytes.

PACKED LONG JULIAN             YYYYDDD. 4 bytes.

PACKED YEAR FIRST              0YYYYMMDD. 5 bytes.

UNSIGNED PACKED AMERICAN       MMDDYYYY. 4 bytes.

UNSIGNED PACKED EUROPEAN       DDMMYYYY. 4 bytes.

UNSIGNED PACKED JULIAN         0(YY)YYDDD. 3 or 4 bytes.

UNSIGNED PACKED LONG JULIAN    0YYYYDDD. 4 bytes.


UNSIGNED PACKED YEAR FIRST    YYYYMMDD. 4 bytes.

SJIS IMPERIAL DATE            Japanese date format with imperial calendar. CP932 or Shift-JIS encoding only.
                              You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.

SJIS JAPANESE DATE            Japanese date format with Gregorian calendar. CP932 or Shift-JIS encoding only.
                              You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.

ASCII ROMAJI IMPERIAL DATE    Japanese date format with shortened imperial year. ASCII encoding only.
                              Example: S35-1-1
                              You must use valid month/day combinations. If the month/day is invalid, the output data is blanked out.


CLASS Keyword

The CLASS keyword specifies the format to be used for the date field. By using the CLASS keyword, you can convert any 2-digit year into a 4-digit year.

The following table describes all specifications for the CLASS keyword.

Table A-4: DDL CLASS Keyword

Statement       Description

DATE FORWARD    Converts any 2-digit year into a 4-digit year when the data value is equal to, or greater than, the current year.
                Top of date window = current year + 99
                Bottom of date window = current year
                Example: If the current year is 2005:
                Top of date window = 2104 (2005 + 99 = 2104)
                Bottom of date window = 2005

DATE BACKWARD   Converts any 2-digit year into a 4-digit year when the data value is equal to, or less than, the current year.
                Top of date window = current year
                Bottom of date window = current year - 99
                Example: If the current year is 2005:
                Top of date window = 2005
                Bottom of date window = 1906 (2005 - 99 = 1906)


Table A-4: DDL CLASS Keyword (Continued)


Statement

Description

DATE WINDOW
{nnn}

Converts a 2-digit year into a 4-digit year, according to a userspecified date window. You can specify 1 to 4-digit numbers in
{nnn}.
----------------------------------------------------------------Top of date window = if {nnn} >100 and {nnn} > current year, then
{nnn} is the top of the date window.
Bottom of date window = top of the date window - 99
If the current year is 1999: CLASS IS DATE WINDOW 2030
Top of date window = 2030 (2030 > 100 and > the current year)
Bottom of date window = 1931 (2030 99 = 1931)
----------------------------------------------------------------Top of date window = bottom of the date window + 99
Bottom of date window = If {nnn} >100 and {nnn} < current year,
then {nnn} is the bottom of the date window.
If the current year is 1999: CLASS IS DATE WINDOW 1967
Top of date window = 2066
(1967 + 99 = 2066)
Bottom of date window = 1967 (1967 > 100 but < the current year)
----------------------------------------------------------------Top of date window = If {nnn} > 0 and {nnn} < 100, then top of the
date window is current year + nnn
Bottom of date window = current year + nnn -99.
If the current year is 1999: CLASS IS DATE WINDOW 30
Top of date window = 2029
(30 > 0 but < 100, 1999 + 30)
Bottom of date window = 1930 (1999 + 30 -99).
----------------------------------------------------------------Top of date window = current year nnn + 99.
Bottom of date window = If {nnn} < 0, then bottom of date window
is current year nnn
If the current year is 1999:
CLASS IS DATE WINDOW -30
Top of date window = 2068 (1999 - 30 + 99)
Bottom of date window = 1969 (-30 < 0, 1999 - 30).
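The windowing rules above reduce to simple arithmetic. The following Python sketch reproduces that arithmetic for illustration; the function names are hypothetical and not part of TS Quality.

```python
def window_bounds(mode, current_year, nnn=None):
    """Compute (bottom, top) of the date window per the CLASS rules above."""
    if mode == "DATE FORWARD":
        return current_year, current_year + 99
    if mode == "DATE BACKWARD":
        return current_year - 99, current_year
    # DATE WINDOW {nnn}
    if nnn > 100 and nnn > current_year:    # {nnn} is the top of the window
        return nnn - 99, nnn
    if nnn > 100 and nnn < current_year:    # {nnn} is the bottom of the window
        return nnn, nnn + 99
    if 0 < nnn < 100:                       # offset above the current year
        return current_year + nnn - 99, current_year + nnn
    return current_year + nnn, current_year + nnn + 99  # nnn < 0

def expand_year(yy, bottom):
    """Map a 2-digit year onto the 4-digit year that falls inside the window."""
    year = (bottom // 100) * 100 + yy
    return year if year >= bottom else year + 100

# Reproducing the examples from Table A-4:
print(window_bounds("DATE FORWARD", 2005))       # (2005, 2104)
print(window_bounds("DATE WINDOW", 1999, 2030))  # (1931, 2030)
print(expand_year(20, 1931))                     # 2020
```

With the window (1931, 2030), the 2-digit year 45 expands to 1945, while 20 expands to 2020, since 1920 falls below the bottom of the window.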


APPENDIX B

Parser Review Code


Parser Results
The Parser generates Completion Codes and Review Codes to
identify specific conditions that occur for each record being parsed.
You can review these codes to analyze the Parser results.

Parser Completion Codes (CDP/BDP)


Table B-1 shows the return codes that appear in the
pr_completion_code or bp_completion_code field in output from the
Customer Data Parser or Business Data Parser repository. Several
errors (2, 3, 4, 5, 8, D) are caused by inaccuracies in the file path.

Table B-1: Parser (CDP/BDP) Completion Codes

- No error
- Insufficient Storage. When using DDL field sub-segments (line_01a,
  line_01b, etc.) and the sum of the data in these fields exceeds the
  redefine field length, the data is truncated and a value of 1 is
  returned. Processing continues normally for all other lines.
- Table Error. Pattern, Word, and/or City tables not found.
- Log File Error
- Detail File Error
- Pattern-Word-City Tab Error. Pattern, Word, and/or City tables not
  readable.
- Too Many Tokens
- Line Definition Error
- Display File Error
- Invalid Parser Handle
- Invalid Parm File Entry
- Invalid Interface Call Type. Must be either: O=Open, P=Process,
  C=Close
- Invalid Service Call Type. Must be either: D=Send to Display File,
  E=Supply Error Text
- Statistics File Error
- Parser not successfully initialized. The Settings file may not be
  correctly defined. Check the path and file name.

Customer Data Parser Review Code/Review Groups

Review codes are produced for many different data conditions. These
codes can be evaluated in a post-parsing process to trigger specific
record handling or review. For instance, if a business wanted to
review every record that had received a review code of 26 (Unknown
Street Pattern), a subsequent step following the parsing process
could redirect all records with this condition by selecting the
records with this code.
The code values are represented on the record by position in the DDL
field that corresponds to the line type that contained the condition.
The field names that must be used in the CDP output DDL to contain
these codes are:
pr_name_review_codes
pr_street_review_codes
pr_geog_review_codes
pr_misc_review_codes
pr_global_review_codes


For each of these fields, a flag value of '1' is placed in the position in
the field that corresponds to the value of the condition. So in our
earlier example, where a review code of 26 was reported, you would
find a '1' in the field pr_street_review_codes at position 26.
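A post-parsing step might test these positional flags as in the following sketch; this is a hypothetical example (not TS Quality code), assuming unset positions in the field are left blank:

```python
def flagged_codes(review_field):
    """Return the review-code numbers flagged in a positional review-code
    field: position n (1-based) holds '1' when review code n was reported."""
    return [pos for pos, ch in enumerate(review_field, start=1) if ch == "1"]

# Hypothetical pr_street_review_codes value with a flag at position 26:
pr_street_review_codes = " " * 25 + "1" + " " * 10
print(flagged_codes(pr_street_review_codes))  # [26]
```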
Table B-2 lists review codes, review groups, and descriptions for the
Customer Data Parser.

Table B-2: CDP Review Codes and Review Groups

Review codes can belong to multiple review code fields.

Review  Review
Code    Group   Description

000     000     No review code found

Name Codes
        008     Unknown name pattern
        009     Standardized first name too long
        009     Display first name too long
        009     Total number of export names gt max
        009     Standardized middle name too long
        009     Display middle name too long
        009     Too many middle names
        009     Standardized last name too long
        009     Display last name too long
10      009     Standardized title too long
11      009     Display title too long
12      009     Too many titles
13      009     Standardized connector too long
14      009     Display connector too long
15      009     Standardized relation too long
16      009     Display relation too long
17      009     Standardized business too long
18      009     Display business too long
19      009     Derived genders conflict
20      009     Standardized generation too long
21      009     Display generation too long
22      010     More than one middle name

Street Codes
26      011     Unknown street pattern
27      011     Standardized street type too long
28      011     Display street type too long
29      011     Too many street types
30      012     Standardized direction too long
31      012     Display direction too long
32      012     Too many directions
33      013     Standardized street title too long
34      013     Display street title too long
35      013     Standardized complex name too long
36      013     Display complex name too long
37      013     Standardized house number too long
38      013     Display house number too long
39      013     Unusual house number
40      013     Display dwelling too long
41      013     Standardized dwelling too long
42      013     Too many dwellings
43      013     Unusual dwelling value
44      013     Too many dwelling values
45      013     Display box too long
46      013     Standardized box too long
47      013     Unusual box value
48      013     Display route too long
49      013     Standardized route too long
50      013     Standardized route number too long
51      013     Display route number too long
52      013     Unusual route value
53      013     Standardized complex type too long
54      013     Display complex type too long
55      013     Standardized dwelling number too long
56      013     Standardized box number too long
57      013     Display box number too long
58      013     Display dwelling number too long
59      020     Duplicate street line types

Geography Codes
61      014     No city name found in records
62      014     No state found in records
63      014     Standardized city too long
64      014     Display city too long
66      015     Standardized state/province/county too long
67      015     Display state/province/county too long
70      015     Standardized country too long
71      015     Display country too long
72      015     Standardized neighborhood too long
73      015     Display neighborhood too long
74      015     Standardized post code too long
75      015     Display post code too long
76      015     Unusual post code value
77      016     Corrected city name too long
78      000     City name change used for city
79      017     Conflicting geographic types
80      018     Domestic city name present but could not be verified

Global Review Codes
83      001     Unidentified token
84      019     Unidentified line
85      001     Invalid token definitions
86      001     Label or label element too long
87      001     Miscellaneous data for line too long
88      001     Too many categories
89      001     Too many names for export
90      002     Mixed name forms present
91      003     Hold mail element present
92      004     Foreign address element found
93      005     No names identified
94      006     No street identified
95      007     No geography identified
96-99           Currently unassigned

Review Group Hierarchy


Table B-3 displays the default review group hierarchy for the
Customer Data Parser. The review group code is placed in the
PREPOS field: pr_rev_group.
The Review Group Order setting (Process, Settings) can
be used to modify the group hierarchy.
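When a record triggers review codes in more than one group, a post-process can apply the hierarchy to pick the single group reported in pr_rev_group. The following is a minimal sketch, assuming the default order shown in Table B-3 and that the first-listed (highest-priority) group present on the record wins; the function name is illustrative, not TS Quality code.

```python
# Default CDP review group hierarchy, highest priority first (Table B-3 order).
DEFAULT_HIERARCHY = [
    "001", "005", "006", "007", "014", "019", "008", "011", "013", "012",
    "017", "015", "018", "016", "020", "010", "009", "004", "003", "002", "000",
]

def assign_review_group(groups_on_record, hierarchy=DEFAULT_HIERARCHY):
    """Pick the highest-priority review group present on a record."""
    for group in hierarchy:
        if group in groups_on_record:
            return group
    return "000"  # no review code found

print(assign_review_group({"009", "013"}))  # 013
```

A custom Review Group Order setting would correspond to passing a reordered hierarchy list.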

Table B-3: CDP Review Group Hierarchy

001  Unidentified token

005  No names identified
     No name found on the record. For example:
         12 main street
         Boston MA 01123

006  No street identified
     No street information found on the record. For example:
         John Smith
         Boston MA 01123

007  No geography identified
     No geography information found on the record. For example:
         John Smith
         12 main street

014  No city or county identified
     Record did not contain a city or county, or it could not be
     identified.

019  Unidentified Line
     Line type could not be determined, and is set to ?.

008  Unknown name pattern
     Pattern for name format does not exist in the table. For example:
         John Smith B A C D
         12 main street
         Boston MA 01123

011  Unknown street pattern
     Pattern for street format does not exist in the table.

013  Unusual or long address
     The length of the street name exceeds 25 bytes as defined in
     prepos.ddl.

012  Invalid directional
     Direction is inconsistent.

017  Conflicting geography types
     The country default is US and the valid city/state is followed by
     a foreign-type postal code. For example:
         Mr John Smith
         12 main street
         Boston MA A1C 3R4

015  Geography too long
     The length of the geography exceeds 30 bytes as defined in
     prepos.ddl.

018  Unable to verify city name
     City name cannot be identified.

016  Corrected city name too long
     Table entry for a city change recode exceeds 25 bytes as defined
     in prepos.ddl.

020  Multiple street line types
     More than one street line is found on the record.

010  More than one middle name
     Two or more middle names were found on the name line. For example:
         John Adam Wilson Smith
         12 main street
         Boston MA 01123

009  Derived genders conflict
     The title and first name gender values are different. For example:
         Miss John Smith
         12 main street
         Boston MA 01123

004  Foreign address
     The Parser found a geography element outside the country in which
     the Parser is running. For example:
         John Smith
         12 main street
         Boston France 01123

003  Hold mail
     One of the lines on the record is of type H (such as Return Mail).
     For example:
         John Smith
         Return Mail
         12 main street
         Boston MA 01123

002  Mixed name forms
     A business name and a personal name were both found on the record.
     For example:
         John Smith
         ABC corp
         12 main street
         Boston MA 01123

000  No review code found
     No identifiable error on the record. For example:
         John Smith
         12 main street
         Boston MA 01123

Business Data Parser Review Code


Table B-4 lists the review codes and descriptions for the Business
Data Parser.

Table B-4: BDP Review Codes

Review codes can belong to multiple review code fields.

Review
Code    Description

082     Unidentified pattern
087     Miscellaneous line too long
086     Label Line too long
088     Too many categories found
083     Unknown token
090     No data found
000     No targeted conditions found


Customer Data Parser Review Codes/Review Groups for Asia-Pacific
Countries

Review Codes
The Customer Data Parser for China, Korea, and Taiwan generates
review codes for each record to highlight specific conditions
describing how the record was processed. Review codes are written
to the pr_review_code field. The following table lists the individual
review codes.
Review
Code    Description

000     No review codes written.
009     Unknown token remaining after processing.
010     Unknown tokens remaining after matching front and back parts
        of the string.
041     Business branch value too long.
042     Business type value too long.
043     Business name value too long.
044     No business name.
045     Business name and type too long for field.
051     Surname value not found in lookup table.
052     Surname value too long.
053     First (given) name value too long.
054     No surname.
055     No given name.
056     Honorific not found in table.
057     Honorific too long.
058     Unidentified token.
921     Recoded business name according to a word/pattern table entry
        to correct a mistyped or misused word.

Review Groups
Review groups are groups of review codes that illustrate types of
conditions present in the data, whereas review codes describe
actual specific conditions. Thus, review groups provide a way for
users to quickly understand the general types of conditions
occurring in the data. Review groups are written to the
pr_review_group field.
Review
Group   Description

002     Missing name (no business name and missing either first or
        last name).
003     Business name does not contain a business keyword from the
        word/pattern file.
004     There are unknown tokens in the business or personal name.
005     There is a contact name in the business name.

Exception Status
The Customer Data Parser for Japan generates an exception status for
each record to highlight specific conditions. Exception statuses are
written to the pr_status/pr_h_status fields. The following table
shows the values for the exception status.
Value  Mode      Description

00     ALL       No specific condition occurred.
20     ALL       No input string found (including comments deleted).
22     ALL       Unknown token found in the final mask.
30     BNP CLUE  Multiple business types found.
31     BNP CLUE  Only business type found.
32     BNP CLUE  The record consists of branch name or branch name
                 suffix only.
33     BNP CLUE  *All parsed string is in alphabet with word delimiter,
                 and no business clue found.
34     BNP CLUE  All parsed string is in katakana with word delimiter,
                 and no business clue found.
35     BNP CLUE  All parsed string is in hiragana with word delimiter,
                 and no business clue found.
36     BNP CLUE  Multiple words (space delimited) in kanji and no
                 business clue found.
40     PNP       Single person mode is ON and more than two words found.

* Word separation: spaces between characters, and the characters set
  for PNP_DELIMITER.

Index
A
ABSOLUTE
Data Comparison Calculator 11-21
ALPHA
Intrinsic Attributes
Parsing Customization 7-17
Asian Characters
operators 5-33
Associativity
Data Reconstructor 13-6
Asterisks (*)
Parsing Customization 7-21
Attribute
DDL Editor 3-11
Attribute Modifiers
Category 7-11
Function 7-11
Gender 7-11
Parsing Customization 7-10
Recode 7-12
Attributes
Parsing Customization 7-10

B
Batch Script
create a script 14-3
Edit a script 14-4
Run a script 14-5
batch script 14-3
Binary Data Strings
Data Reconstructor 13-9
blanks
Frequency Count Utility 16-17
BNP 6-11
BNP_CLUE 6-11
Build a Conditional Statement 5-35


Business Attribute 6-18


Business Data Parser
BPREPOS 6-29
include unknowns in standard
original field 6-34
populate unknown patterns 6-34
Repository DDL File 6-30
retain original values 6-34
Business Data Parser (BDP) 6-27
Business Data Parser Review Code
B-11

C
Category
Attribute Modifiers 7-11
Character Translation 5-9
City Directory File 6-17
City Name Changes
Locality 7-15
Post Town 7-15
CJKTOARABICNUM operator 5-30
CJKTOFULL operator 5-29
CJKTOHALF operator 5-28
Class
DDL A-10
DDL Editor 3-11
CLASS Keyword A-10
Class keyword
DDL 2-42
collating sequence
Sort Utility 9-5
Collating sequence
ASCII 9-6
EBCDIC 9-6
FOLDED_ASCII 9-6
FOLDED_EBCDIC 9-6
MULTI_NATIONAL 9-6
Command line execution
Program names 15-4


syntax 15-3
Comment
DDL Editor 3-11
Comment Lines
Parsing Customization 7-21
Comments
Data Reconstructor 13-7
Common Fields
Create Common Utility 12-6
Commonization
Create Common Utility 12-3
Comparison Routine
Data Comparison Calculator 11-21
Comparison Routines
Relationship Linker 10-13
Completion Codes
Business Data Parser 6-36
Customer Data Parser 6-25
COMPOSE or COMP 5-30
Conditionals 5-21
Logic Builder 5-35
Operators 5-26
Syntax 5-21
IF/ELSE Statement 5-21
Control Center
Data Flow Architect 2-20
Graphics View 2-21
List View 2-26
Project Panel 2-16
Project Viewer 2-17
Step Viewer 2-19
Conventions
Parsing Customization 7-21
Country Settings 4-7
Create Common
Decision Routines 12-12
Create Common Utility 12-3
Common Fields 12-6
Commonization 12-3

Decision Routines 12-7, 12-8


Match Key Level settings 12-6
Survivor record 12-8
Survivorship 12-3
Create New Project Wizard 2-3
Customer Data Parser
Join name lines 6-22
CTOSIMPCHINESE operator 5-30
CTOTRADCHINESE operator 5-30
Customer Data Parser
Business Attribute 6-18
City Directory File 6-17
INPUT_LINE_01 6-3
Line Definitions 6-19
Name Generation 6-21
Parsing Logic Flow 6-4
PREPOS 6-12
Preprocess House Number 6-18
Repository DDL File 6-14
Review Group Hierarchy B-8
Split address lines 6-23
Word Pattern Definition File 6-17
Customer Data Parser (CDP) 6-3
Customer Data Parser Review Code
B-3

Customer Data Parser Review Groups


B-3

Customer Data Parser


Exceptions File 6-16
Customized Definitions Table
Parsing Customization 7-3
Customizing the Control Center 17-3
Color settings 17-4
Display tab 17-3
Font settings 17-4
General tab 17-3

D
Data Browser 3-3


Field Selection 3-4


Save view 3-6
Data Comparison Calculator 11-21
ABSOLUTE 11-21
Comparison Modifiers 11-21
Comparison Routines 11-21
comparison test 11-21
PARTIAL1 11-21
Score 11-21
Data Dictionary Editor 3-9
Data Dictionary Language (DDL) 2-33
Data Flow Architect
Control Center 2-20
Data Reconstructor 13-3
Associativity 13-6
Binary Data Strings 13-9
Comments 13-7
Conditions 13-11
Fields 13-4
literal values 13-4
operators 13-6
Precedence 13-6
Reserved Words 13-5
rules file 13-3
rule script language 13-4
Rules File
Action Statements 13-15
Logical Operators 13-13
String Variables 13-19
Use Rule 13-22
Data Reconstructor Rules 13-5
Date format
DDL 2-42
DDL
Attributes 2-35
CLASS 2-36
Class keyword 2-42
Comment 2-35
Date format 2-42


DDL Builder 2-37


Default 2-35
Encoding 2-42
Field Name 2-34
Keywords 2-34
Length 2-35
methods of creating 2-33
Record Length 2-34
Record Name 2-34
Redefine 2-34, 2-40
Start Position 2-34
syntax 2-39
text format 2-33
Type 2-34
Type keyword 2-42
XML format 2-33
DDL Editor
Attribute 3-11
Class 3-11
Comment 3-11
Default 3-11
Field Name 3-10
Length 3-10
Record Length 3-10
Record Name 3-10
Redef 3-10
Start Position 3-10
Type 3-10
Update ORIGINAL_RECORD
Length 3-10
DDL Types A-3
Decision Routines
Create Common 12-12
Create Common Utility 12-7, 12-8
Default
DDL Editor 3-11
Delete
Parsing Customization 7-7
Delimited file


creating a DDL 2-38


Delimited Files
DDL considerations 2-33
Delimiters
File Display Utility 16-5
Director
Cleansing Server 14-12
Matching Server 14-12
Real-Time Processing 14-11
Dual Address Information 9-17

E
Encoding
DDL 2-42
Error Report
Field and Pattern Lists 11-18
Exporting projects 14-7

F
Field and Pattern Lists
Error Report 11-18
Field Files
Relationship Linker 10-21
Field List Editor
Comparison Routine 11-14
Description 11-14
Field Name 11-14
Propagation Routine 11-14
Routine Modifier 11-14
Score 11-14
Field Name
DDL Editor 3-10
Field Scanning 5-10
Field Selection
Data Browser 3-4
Field Settings
File Display Utility 16-8
Fields
Data Reconstructor 13-4

File Display Utility 16-3


Delimiters 16-5
Field settings 16-8
Inner Key 16-3
Outer Key 16-3
File Qualifier 5-4
File Update Utility 16-10
master file 16-10
match key 16-10, 16-15
transaction file 16-10
Frequency Count Utility 16-17
Full-width (Zenkaku) and half-width
(Hankaku) Japanese Characters
5-31

Function
Attribute Modifiers 7-11

G
Gender
Attribute Modifiers 7-11
Global Data Router 4-3
Country Rules file 4-6
Country Settings 4-7
DDL Settings 4-9
Fields Settings 4-9
Global Geography Table 4-6
Global Rules file 4-6
NOMATCH file 4-4
Rules Files 4-6
Separate Output 4-3
Single Output 4-4
Global Geography Table 4-6
Grade Pattern Editor
Category 11-14
Field Name Columns 11-15
Pattern ID 11-14
Graphics View
Control Center
Data Flow Architect 2-21


H
Help
Control Center 2-8
HIRAGANASTOL operator 5-30
How to Use Operators for Asian
Characters 5-33
HYPHEN
Intrinsic Attributes
Parsing Customization 7-17

I
IF Statements
Data Reconstructor 13-10
IF/ELSE
Data Reconstructor 13-3
Import projects
Windows to Unix 14-10
Importing projects 14-7, 14-9
Inner Key
File Display Utility 16-3
Insert
Parsing Customization 7-7
Intrinsic Attributes
Parsing Customization 7-17

J
JKANATOROMAN operator 5-29
JROMANTOKANA operator 5-29

K
KTOROMAN operator 5-29

L
Length
DDL Editor 3-10
Line Definitions 6-19
Line Lengths
Parsing Customization 7-21
Line Type


Geography 7-8
Miscellaneous 7-8
Name 7-8
Street 7-8
Line Types
Parsing Customization 7-8
Linking File
Relationship Linker 10-18, 10-25
List View
Control Center
Data Flow Architect 2-26
literal data string
Frequency Count Utility 16-17
Literal Values
Data Reconstructor 13-4

M
MALINK
Resolve Utility 16-27
Mask 5-17
Transformer 5-17
mask shapes
Frequency Count Utility 16-17
Masks
Parsing Customization 7-6
Recodes
Parsing Customization 7-12
master file
File Update Utility 16-10
Match Key
File Update Utility 16-10, 16-15
Set Selection Utility 16-31
Match Key Level Settings
Create Common Utility 12-6
Match Level Codes
Postal Matchers 9-15
Match Master Duplicate File
File Update Utility 16-13
Match Master File


File Update Utility 16-13


Match Tran Duplicate File
File Update Utility 16-15
Match Tran File
File Update Utility 16-14
Matching
TS Quality Analyzer 8-4
Merge Files
Merge Split Utility 16-21
merge keys
Merge Split Utility 16-19
Merge Split Utility 16-19
merge keys 16-19
split rules 16-19
Modify
Parsing Customization 7-7
multi-linking
Resolve Utility 16-27
Multiple Definitions
Parsing Customization 7-15

N
Name and Address Format
project 2-13
NUMERIC
Intrinsic Attributes
Parsing Customization 7-18

O
Operations
Parsing Customization 7-6
Operators
Data Reconstructor 13-6
Operators for Asian Characters 5-28
Outer Key
File Display Utility 16-3

P
Parser Customization Editor

Parsing Customization 7-31


Parser Tables 6-17
Parsing Customization
Attribute Modifiers 7-10
Attributes 7-10
City Name Changes
for non-US cities 7-14
for US Cities 7-14
Comment lines 7-21
Conventions 7-21
Customized Definitions Table 7-3
Delete 7-7
Insert 7-7
Line Lengths 7-21
Line Types 7-8
Masks 7-6
Modify 7-7
Multiple Definitions 7-15
Operations 7-6
Patterns 7-15
Phrase 7-5
Quotation Marks 7-21
Special Entries 7-14
Standard Definitions Table 7-3
Sub-tokens 7-5
Synonym 7-12
Syntax of Definitions 7-4
Tokens 7-4
User-defined Attributes 7-10
PARTIAL1
Data Comparison Calculator 11-21
Partition Method
Merge Split Utility 16-23
Pattern Files
Relationship Linker 10-21
pattern problems
Parsing Customization 7-37
Bad Name Patterns 7-37
Patterns


Parsing Customization 7-15


Phrase
Parsing Customization 7-5
PNP 6-10
Portion of a Field
Data Reconstructor 13-8
Positions
Beginning 7-9
Default 7-9
Ending 7-9
Parsing Customization 7-9
Postal Directory Browser 9-20
City Level 9-20
Street Details 9-22
Street Level 9-21
Postal Matchers 9-9
Census Tables 9-9
DPV Tables 9-9
Match Level Codes 9-15
Postal Base Data File 9-12
Postal Directories 9-9
Postal Form Customer 9-13
Postal Form Database Date 9-13
Postal Form File 9-12
Postal Form Job Number 9-13
Postal Form List 9-13
Postal Level1 Data File 9-12
Postal Level2 Data File 9-12
Prcustom 6-16
Business Data Parser 6-32
Precedence
Data Reconstructor 13-6
Preferences
Customizing the Control Center
17-3

General 2-6, 2-7


Help 2-8
Preprocess House Number 6-18
Program Names


Command line execution 15-4


project
Control Center
project step 2-28
creating 2-9
input data and input DDL 2-12
multi-country 2-11
Name and Address Format 2-13
Properties 2-16
settings 2-10
summary 2-14
type 2-3, 2-10
custom project 2-3
standard project 2-3
Project Panel
Control Center 2-16
Project Step
Control Center 2-28
Project Viewer
Control Center 2-17
Projects
exporting 14-7
importing 14-7

Q
Quotation Marks
Parsing Customization 7-21

R
Real-Time Processing 14-11
Director 14-11
Recode
Attribute Modifiers 7-12
Record Length
DDL Editor 3-10
Record Name
DDL Editor 3-10
Redef
DDL Editor 3-10


Redefine
DDL 2-40
reference file
Relationship Linker 10-24
Reference Level1 Number
Relationship Linker 10-26
Reference Level2 Number
Relationship Linker 10-26
Reference Linking
Relationship Linker 10-13, 10-24
Reference Record ID
Relationship Linker 10-26
Relationship Linker 10-3
Business level 10-13
Comparison Routines 10-13
Consumer level 10-13
Field Files 10-21
Pattern Files 10-21
Reference File 10-24
Reference Level1 Number 10-26
Reference Level2 Number 10-26
Reference Linking 10-13, 10-24
Reference Record ID 10-26
Window Key 10-3
Window Linking 10-13, 10-18
Window Size 10-22
Relationship Linker Results Analyzer
11-2, 11-3
fields to display 11-7
matched records 11-5
records to display 11-9
suspect records 11-5
Relationship Linker Rule Editor 11-12
Field List Editor 11-13, 11-14
Grade Pattern Editor 11-13, 11-14
Reserved Words
Data Reconstructor 13-5
Resolve Utility 16-27
MALINK 16-27

multi-linking 16-27
transitivity 16-27
Review Codes
Business Data Parser 6-36
Customer Data Parser 6-25
Review Groups
Customer Data Parser 6-26
ROMAJITOHIRAGANA or RTH 5-29
Round Robin Keys
Merge Split Utility 16-24
Round Robin Number
Merge Split Utility 16-24
Routine Modifier
Comparison Routine 11-21
Rule Script Language
Data Reconstructor 13-4
Rules File
Data Reconstructor 13-22
rules file
Data Reconstructor 13-3
Rules Files 4-6

S
Save view
Data Browser 3-6
Score
Data Comparison Calculator 11-21
Select and Bypass Records
Data Reconstructor 13-30
Select or Bypass Records 5-37
Select/Bypass Records
Logic Builder 5-37
Set Selection Utility 16-30
Sort Fields
Sort Utility 9-5
Sort Utility 16-33
.srt 9-2
Collating sequence 9-5
for Postal Matchers 9-2


JUST_DUPS 9-7
KEEP_ALL 9-6
KEEP_NONE 9-6
KEEP_ONE 9-6
Sort Fields 9-5
Source Identification
Transformer 13-31
Special Entries
Parsing Customization 7-14
Split a File
Merge Split Utility 16-23
split rules
Merge Split Utility 16-19
Standard Definitions Table
Parsing Customization 7-3
Standardization
TS Quality Analyzer 8-4
Start Pos
DDL Editor 3-10
Step Viewer
Control Center 2-19
Sub-tokens
Parsing Customization 7-5
Survivor record
Create Common Utility 12-8
Survivorship
Create Common Utility 12-3
Synonym
Parsing Customization 7-12
Syntax
Command line execution 15-3
Syntax of Definitions
Parsing Customization 7-4

T
Table Recoding 5-17
Title
File Display Utility 16-5


Tokens
Parsing Customization 7-4
transaction file
File Update Utility 16-10
Transformer 5-2
Character Translation 5-9
Field Scanning 5-10
File Trace Key 5-38
hex conversion 5-9
Source Identification 13-31
Table Recoding 5-17
transitivity
Resolve Utility 16-27
Trillium Types A-6
TS Discovery 3-12
TS Quality Analyzer 8-3
Cleansing 8-4
Master Database 8-8
Matching 8-4, 8-8
Standardization 8-4
Type
DDL Editor 3-10

U
Underscores
in city name changes 7-14
Unmatch Master File
File Update Utility 16-14
Update ORIGINAL_RECORD Length
DDL Editor 3-10
US City Problems
Parsing Customization 7-34
User Rule
Data Reconstructor 13-22
User-Defined Attributes
Parsing Customization 7-10
Using Multiple Input Files to Create an
Output DDL 5-7


V
View input data
Data Browser 3-3

W
Window Key
Sort 10-10
Window Key Field 10-7
Window Key Generator 10-3
Window Key Rule 10-3
Window Keys 10-3
Window Key Rule

Window Key Generator 10-3


Window Key Rules
definition 10-6
Window Keys
Window Key Generator 10-3
Window Linking
Relationship Linker 10-13, 10-18
Window Size
Relationship Linker 10-22
Word Pattern Definition File 6-17
Business Data Parser 6-32
