ReferenceImport

The ReferenceImport utility builds Loqate reference datasets. Running the utility with no parameters will display the following information:

Usage: referenceimport <Source data file> <Source metadata file> <Table name> <Unicode table path>
Source data file format is tab-delimited UTF8 text.
Source metadata file format is tab-delimited UTF8 text containing:
- Field Name
- Field Type (Blank means same as Field Name)
- Content Type (Text,Range,Addition,Interpolate,InterpolateRange)
- Confidence
The first line contains the record type and overall confidence


Source Metadata File Syntax

This file is tab-delimited UTF-8 text. The syntax is as follows:

LINE 1: RECORD TYPE (Optional)<TAB>DATASET CONFIDENCE (Optional)

Where:

  • RECORD TYPE is the data type to be assigned to a complete record within the dataset. Setting this value specifies that a single record within the dataset completely and uniquely describes an instance of this data type (e.g. Address).
  • DATASET CONFIDENCE is the default confidence value to be assigned to interpretations suggested by this dataset. Valid values are 1-255, where 255 signifies that complete confidence can be assigned to the correctness of the data within this dataset. The default is 255.

LINE 2+: FIELD NAME<TAB>COMPONENT NAME<TAB>DATA TYPE<TAB>FIELD CONFIDENCE (Optional)

Where:

  • FIELD NAME is the local or colloquial name for this field, e.g. Municipality in Canada or Prefecture in Japan. This value can be left blank. A value of ‘null’ indicates that the field should be ignored and not compiled into the resultant dataset.
  • COMPONENT NAME is the Loqate field type of this field.
  • DATA TYPE is one of the following values:
    • Text (or the deprecated TextFullField) means that a full text index will be built for this field (this should be used for most fields)
    • Range means that the field is a compressed range format (see below for syntax)
    • InterpolateRange means that the field is a compressed range format that can be used to generate interpolated values for Interpolate fields
    • Interpolate means this field is a pipe-delimited list of numeric values to be used in conjunction with an InterpolateRange field to generate an interpolated value
    • Addition means this field is added to any matched records, but is never used during the search and matching process
  • FIELD CONFIDENCE is an optional value from 1-255 which overrides the DATASET CONFIDENCE for individual fields.

Example:

Address<TAB>200
Zip<TAB>PostalCodePrimary<TAB>Text
State<TAB>AdministrativeArea<TAB>Text<TAB>205
City<TAB>Locality<TAB>Text<TAB>220
Street<TAB>Thoroughfare<TAB>Text<TAB>220
<TAB>Premise<TAB>Range
<TAB>SubBuilding<TAB>Range
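
The metadata syntax above can be checked before running the utility. The following is a minimal Python sketch, not part of ReferenceImport itself: the names parse_metadata and FieldSpec are hypothetical, and it assumes that data type names are matched exactly as written above and that the tab on line 1 is present even when the record type is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Data types listed in the metadata syntax (assumed to be case-sensitive).
DATA_TYPES = {"Text", "TextFullField", "Range",
              "InterpolateRange", "Interpolate", "Addition"}


@dataclass
class FieldSpec:
    field_name: str            # local name; may be blank or 'null'
    component_name: str        # Loqate field type
    data_type: str             # one of DATA_TYPES
    confidence: Optional[int]  # None means "use the dataset confidence"


def _parse_confidence(text: str) -> int:
    value = int(text)
    if not 1 <= value <= 255:
        raise ValueError(f"confidence must be 1-255, got {value}")
    return value


def parse_metadata(text: str) -> Tuple[Optional[str], int, List[FieldSpec]]:
    """Return (record_type, dataset_confidence, fields) for a metadata file."""
    lines = text.splitlines()
    header = lines[0].split("\t") if lines else [""]
    record_type = header[0] or None
    # The dataset confidence defaults to 255 when it is not supplied.
    dataset_conf = (_parse_confidence(header[1])
                    if len(header) > 1 and header[1] else 255)
    fields = []
    for number, line in enumerate(lines[1:], start=2):
        if not line.strip():
            continue  # skip empty lines
        cols = line.split("\t")
        if len(cols) < 3:
            raise ValueError(f"line {number}: expected at least 3 columns")
        if cols[2] not in DATA_TYPES:
            raise ValueError(f"line {number}: unknown data type {cols[2]!r}")
        conf = _parse_confidence(cols[3]) if len(cols) > 3 and cols[3] else None
        fields.append(FieldSpec(cols[0], cols[1], cols[2], conf))
    return record_type, dataset_conf, fields


# The example metadata file from above, with <TAB> written as "\t".
EXAMPLE = (
    "Address\t200\n"
    "Zip\tPostalCodePrimary\tText\n"
    "State\tAdministrativeArea\tText\t205\n"
    "City\tLocality\tText\t220\n"
    "Street\tThoroughfare\tText\t220\n"
    "\tPremise\tRange\n"
    "\tSubBuilding\tRange\n"
)
```

Calling parse_metadata(EXAMPLE) yields the record type Address, a dataset confidence of 200, and six field specifications, of which Zip, Premise and SubBuilding carry no field-level confidence.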


Source Data File Syntax

This file is tab-delimited UTF-8 text. It reflects the format specified by the associated metadata file.
‘Text’ fields may contain a list of aliases, delimited by the pipe character (|). An alias prefixed with $ is an invalid alias: it can be used for matching, but will be replaced by the primary alias (the first entry) in the supplied field data.
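
As an illustration of the alias rule, the sketch below splits a ‘Text’ field into its primary alias, valid aliases and invalid aliases. The function name parse_aliases is hypothetical, and the sketch assumes that the first entry is always the primary alias and that there is no escape mechanism for a literal pipe or $.

```python
from typing import List, Tuple


def parse_aliases(field: str) -> Tuple[str, List[str], List[str]]:
    """Split a Text field into (primary, valid_aliases, invalid_aliases)."""
    entries = field.split("|")
    primary = entries[0]  # the first entry is the primary alias
    valid: List[str] = []
    invalid: List[str] = []
    for entry in entries[1:]:
        if entry.startswith("$"):
            # Matchable, but replaced by the primary alias in the field data.
            invalid.append(entry[1:])
        else:
            valid.append(entry)
    return primary, valid, invalid


print(parse_aliases("SAN FRANCISCO|$FRISCO|$SF"))
# → ('SAN FRANCISCO', [], ['FRISCO', 'SF'])
```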
‘Range’ and ‘InterpolateRange’ fields have the following syntax:
  • VAL1[[%|*]VAL2][{PREFIX][}POSTFIX][|…]
  • VAL1 is the start value of the range. The value can be either numeric or a single alpha character.
  • VAL2 is the end value of the range. The value can be either numeric or a single alpha character, and must match the VAL1 type. Either VAL2 > VAL1 or VAL1 > VAL2 is acceptable.
  • The range step indicator is either ‘%’, which indicates a step of 2, or ‘*’, which indicates a step of 1. For example, ‘1%5’ matches the numbers 1, 3 and 5, whereas ‘1*5’ matches the numbers 1, 2, 3, 4 and 5.
  • PREFIXes and POSTFIXes are optional, and are preceded by ‘{’ or ‘}’ respectively.
  • Multiple ranges can be chained together using the pipe character as a delimiter.
For example, a line matching the metadata example above could look like:
94158<TAB>CA<TAB>SAN FRANCISCO|$FRISCO|$SF<TAB>BERRY ST<TAB>300%399<TAB>1201*1245{STE |1301*1345{STE
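
To make the range syntax concrete, the following Python sketch expands a ‘Range’ value into the individual values it describes. It is an illustration only, not the utility's own implementation: the function name expand_range is hypothetical, and the sketch assumes that stepping starts from VAL1, that the prefix and postfix are attached directly to each value, and that a VAL1 without a VAL2 denotes a single value. Leading zeros in numeric values are not preserved.

```python
import re
from typing import List

# VAL1[[%|*]VAL2][{PREFIX][}POSTFIX] for one link of a pipe-delimited chain.
_RANGE = re.compile(
    r"(?P<v1>\d+|[A-Za-z])"
    r"(?:(?P<step>[%*])(?P<v2>\d+|[A-Za-z]))?"
    r"(?:\{(?P<prefix>[^}]*))?"
    r"(?:\}(?P<postfix>.*))?"
)


def expand_range(spec: str) -> List[str]:
    """Expand a compressed range such as '1%5' into its member values."""
    values: List[str] = []
    for part in spec.split("|"):  # chained ranges are pipe-delimited
        m = _RANGE.fullmatch(part)
        if m is None:
            raise ValueError(f"invalid range: {part!r}")
        v1 = m["v1"]
        v2 = m["v2"] or v1  # assumption: no VAL2 means a single value
        if v1.isdigit() != v2.isdigit():
            raise ValueError(f"VAL1 and VAL2 types differ: {part!r}")
        step = 2 if m["step"] == "%" else 1  # '%' is step 2, '*' is step 1
        prefix = m["prefix"] or ""
        postfix = m["postfix"] or ""
        if v1.isdigit():
            start, end, to_text = int(v1), int(v2), str
        else:
            start, end, to_text = ord(v1), ord(v2), chr
        direction = 1 if end >= start else -1  # VAL1 > VAL2 is allowed
        for n in range(start, end + direction, step * direction):
            values.append(f"{prefix}{to_text(n)}{postfix}")
    return values


print(expand_range("1%5"))             # → ['1', '3', '5']
print(expand_range("1*5"))             # → ['1', '2', '3', '4', '5']
print(expand_range("1201*1203{STE "))  # → ['STE 1201', 'STE 1202', 'STE 1203']
```

Under these assumptions, the SubBuilding value in the line above expands to the suites 1201 through 1245 and 1301 through 1345, each preceded by its STE prefix.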