Technical Brief: Customizing the Database of a V8 Profile

SGOS 4, 5 Series

Customizing the Database of a V8 Profile


Introduction
By default, Reporter V8 profiles store a fixed set of log fields in the database generated during log processing.
Also by default, V8 profiles build a set of pair and triplet datasets across subsets of these fields during log
processing. These datasets improve report generation performance for reports that summarize counter data
across combinations of database fields (a Sites by User report, for example). V8 profiles can be customized
to store additional log fields in the database, to create additional pair and triplet datasets that support
custom reports on new fields or new combinations of existing fields, or both. The query and path strings,
which by default are stored as string bag fields, can also be changed to dataset fields. To customize a V8
profile database, edit the profile's .cfg configuration file, found under the LogAnalysisInfo/Profiles folder,
in a text editor. This document explains how to make these changes.

Limitations
All profile database configuration changes must be made before any data is processed for a profile. If data has
already been processed for a profile that you want to customize, you must first delete the profile's database
folder, found under the LogAnalysisInfo/Databases folder. This deletes all data that has previously been
processed for the profile. After making the database customizations, new log data may be processed, or log
data that was previously processed and deleted may be reprocessed.

Adding new dataset fields to the database


As an example, we will modify a profile to store the s-supplier-name field as a dataset. The first step is to add
the log field definition. The profile's .cfg file is organized hierarchically into nodes delimited by curly braces ({
and } characters). The name of each node, followed by an equal sign, precedes the node's opening brace. Log
fields are defined in the profile's .cfg file in the main profile section (which is named after the profile), under the
log node in the fields section. Here is the default log field definition for the cs-username field:
log = {
.
.
.
fields = {
cs_username = {
label = "$lang_stats.field_labels.cs_username"
type = "flat"
index = "0"
name = "cs-username"
db_field = "cs_username"
} # cs_username

To add s-supplier-name, we need to create a similar node for it. At the end of the nodes under the log fields
node, we would add the following new node:

s_supplier_name = {
label = "Supplier Name"
type = "flat"
index = "0"
name = "s-supplier-name"
db_field = "s_supplier_name"
} # s_supplier_name

The name of the node does not really matter, though it makes sense to choose a name that matches the new
field we are adding. The label value should be set to the desired text for the field being added. The name value
must match the name of the field found in the log header. The db_field value must match the name of the new
database field node that we will add in the next step. This tells Reporter to associate the value found for this log
field with the new database field.
The next step is to add a new database field node under the fields node in the database section of the
main profile node. Here is the default database node for the cs_username database field:
database = {
.
.
.
fields = {
cs_username = {
label = "$lang_stats.db_field_labels_v8.cs_username"
label_plural = "$lang_stats.db_field_labels_v8.cs_usernames"
log_field = "cs_username"
type = "string"
dataset_field = "true"
initial_dataset_size = "8192"
case_insensitive = "true"
} # cs_username

To add a new database field for supplier name, we would add a similar node at the end of the existing nodes
under the fields node of the database node:
s_supplier_name = {
label = "Supplier"
label_plural = "Suppliers"
log_field = "s_supplier_name"
type = "string"
dataset_field = "true"
initial_dataset_size = "8192"
case_insensitive = "false"
} # s_supplier_name

The label and label_plural values define the text shown in Reporter for this new field. The log_field value must
match the name of the new log field node created in the previous step. Because we are creating a new dataset
field, the type value should always be string, and the dataset_field value should be true. The
initial_dataset_size value should be set to a power of 2, based on the expected number of different values to
be found in the new field. Choose the maximum of 8192 for fields in which a large number of distinct values is
expected. For some types of fields, it may make sense to choose a smaller power of two. For example, the
default initial_dataset_size value for the cs_uri_scheme field is 8, because few distinct scheme values are
expected in this field (http, https, ftp, tcp, etc.). Set the case_insensitive value to true only if the field will
contain values that differ only by case, such as Bill and bill, and that should be treated as the same value.
Otherwise, set this value to false.
After completing these changes and saving the new profile .cfg file, the values found for s-supplier-name in
the logs will now be stored in the database under the field Supplier. This new field may be added to reports
using the report editor.
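
As a recap of the two steps above, the following fragment sketches where the two new nodes sit and how they
refer to each other. It is not a complete file: the outer node name my_profile is a hypothetical stand-in for the
node named after your profile, the "existing nodes" comment lines stand for content that is already in the file,
and the sketch assumes that comments on their own lines are accepted in the same way as the trailing #
comments in the default file. Indentation is added here only for readability.

```cfg
my_profile = {
  log = {
    fields = {
      # ... existing log field nodes ...
      s_supplier_name = {
        label = "Supplier Name"
        type = "flat"
        index = "0"
        # name must match the field name in the log header
        name = "s-supplier-name"
        # db_field must match the database field node name below
        db_field = "s_supplier_name"
      } # s_supplier_name
    } # fields
  } # log
  database = {
    fields = {
      # ... existing database field nodes ...
      s_supplier_name = {
        label = "Supplier"
        label_plural = "Suppliers"
        # log_field must match the log field node name above
        log_field = "s_supplier_name"
        type = "string"
        dataset_field = "true"
        initial_dataset_size = "8192"
        case_insensitive = "false"
      } # s_supplier_name
    } # fields
  } # database
} # my_profile
```

If the new field does not appear in the report editor, these three cross-references (name, db_field, and
log_field) are the first values to check.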

Adding new dataset pairs and triplets


When Reporter generates reports that summarize counter data across multiple fields, such as hosts and
users, or users and categories, report generation performance will be much better if Reporter is able to use
pre-computed counter summary dataset files. By default, during log processing Reporter generates pair and
triplet dataset files to support the default reports. If new multi-field reports are created that either use new
dataset fields or new combinations of fields not found in the default reports, report generation performance
will be significantly slower because Reporter will have to compute the cross-field counter summary values
on the fly during report generation. To improve report generation performance for such cases, it is possible to
change a profile to generate additional pair and triplet cross-field summary files. For example, if after adding
the new Supplier field, we would like to generate a report that shows suppliers by category and host, we can
add a new triplet dataset to support this report. The dataset pairs and triplets that Reporter builds during log
processing are specified under the multi-datasets node under the database node under the main profile
node in the profile's .cfg file. To add a new dataset triplet, we can simply copy one of the existing triplets and
then change the dataset number and field names as needed. In this case, we need to add a new node that looks
like this to the end of the child nodes under multi-datasets:
25 = {
field_one = "s_supplier_name"
field_two = "cs_host"
field_three = "sc_filter_category"
} # 25

While adding new dataset pairs and triplets will speed up report generation for summary reports that use them,
there is a tradeoff in that each new dataset added incrementally slows down log processing performance. It is
also possible to remove unneeded dataset pairs and triplets. All changes to these multi-field datasets must be
made before any logs are processed for a profile.
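
A dataset pair can presumably be added in the same way as the triplet above. The following is a minimal sketch
for a report that shows suppliers by user. It rests on two assumptions that are not confirmed here: that a pair
node simply omits the field_three line, and that 26 is the next unused dataset number in your profile. Copy an
existing pair from the multi-datasets node of your own profile to confirm the exact form before relying on it.

```cfg
26 = {
  field_one = "s_supplier_name"
  field_two = "cs_username"
} # 26
```

Both fields named in a pair or triplet must already exist as dataset fields under the database fields node.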

Changing a string bag field to a dataset field


By default, Reporter stores the path and query strings of a URL differently from other string values in the
logs. The path and query strings are stored in .bag files, rather than .set files. This behavior is designed to
optimize performance, based on the expectation that path and query values are rarely repeated in the logs.
Reporter can show values from the .bag files only in detail reports, and is not able to filter on these fields.
It is possible to change the path and/or the query strings to be stored as dataset fields, enabling them to be
shown in summary reports and enabling filtering on these fields. The tradeoff is that as dataset fields, these
values may tend to require a lot of memory and lower the total number of log lines that can be successfully
processed for the profile. These changes are made by modifying the field definitions found under the fields
node, under the database node, under the main profile node in the profile's .cfg file. For example, here is the
default node for the query field:
cs_uri_query = {
label = "$lang_stats.db_field_labels_v8.cs_uri_query"
label_plural = "$lang_stats.db_field_labels_v8.cs_uri_queries"
log_field = "cs_uri_query"
type = "string"
omit_from_log_detail = "true"
filter_field = "false"
string_bag_field = "true"
} # cs_uri_query

To store the query field as a dataset field, we would remove the filter_field and string_bag_field lines and add
the dataset_field and initial_dataset_size lines, so that the node looks like this:
cs_uri_query = {
label = "$lang_stats.db_field_labels_v8.cs_uri_query"
label_plural = "$lang_stats.db_field_labels_v8.cs_uri_queries"
log_field = "cs_uri_query"
type = "string"
omit_from_log_detail = "true"
dataset_field = "true"
initial_dataset_size = "8192"
} # cs_uri_query

A profile with this change will store the query value as a dataset field so that reports may be filtered by the
query value and the query value may be added to summary reports. Following the steps above, it is also
possible to create dataset pairs and triplets that include the query value.
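
For instance, once cs_uri_query is stored as a dataset field, a triplet that includes it could be sketched as
follows for a report on queries by user and host. The dataset number 27 is hypothetical; use the next unused
number under the multi-datasets node of your profile. Because query values tend to be highly varied, a triplet
like this may add noticeably to memory use and log processing time.

```cfg
27 = {
  field_one = "cs_username"
  field_two = "cs_host"
  field_three = "cs_uri_query"
} # 27
```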

Blue Coat Systems, Inc.


www.bluecoat.com

Corporate Headquarters

Sunnyvale, CA USA // +1.408.220.2200

EMEA Headquarters

Hampshire, UK // +44.1252.554600

APAC Headquarters

Hong Kong // +852.3476.1000

Copyright 2009 Blue Coat Systems, Inc. All rights reserved worldwide. No part of this document may be reproduced by any means nor translated to any electronic medium without the written consent of Blue Coat Systems, Inc. Specifications
are subject to change without notice. Information contained in this document is believed to be accurate and reliable, however, Blue Coat Systems, Inc. assumes no responsibility for its use. Blue Coat, ProxySG, PacketShaper, ProxyClient and
BlueSource are registered trademarks of Blue Coat Systems, Inc. in the U.S. and worldwide. All other trademarks mentioned in this document are the property of their respective owners. v.TB-CUSTOMIZING_DB_V8PROFILE-v3-0409
