Miscellaneous

recordlinkage.index_split(index, chunks)

Function to split pandas.Index and pandas.MultiIndex objects.

Split pandas.Index and pandas.MultiIndex objects into chunks. This function is based on numpy.array_split().

Parameters:
  • index (pandas.Index, pandas.MultiIndex) – A pandas.Index or pandas.MultiIndex to split into chunks.

  • chunks (int) – The number of parts to split the index into.

Returns:

list – A list with chunked pandas.Index or pandas.MultiIndex objects.
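The function is described as building on numpy.array_split(). If it follows that rule, an index whose length is not divisible by chunks is split so that the first parts each hold one extra element. The sketch below reproduces this behaviour for illustration and assumes pandas and NumPy are installed. split_index_sketch is a hypothetical helper, not the toolkit's implementation. In practice you would call recordlinkage.index_split(pairs, 5) directly.

```python
import numpy as np
import pandas as pd


def split_index_sketch(index, chunks):
    """Hypothetical re-implementation of index_split, for illustration only.

    numpy.array_split() divides the positions 0..len(index)-1 into `chunks`
    nearly equal parts (the first len(index) % chunks parts get one extra
    element). Each part of positions then selects a slice of the index.
    """
    positions = np.arange(len(index))
    return [index[part] for part in np.array_split(positions, chunks)]


# 3 x 4 = 12 candidate record pairs
pairs = pd.MultiIndex.from_product([[0, 1, 2], [0, 1, 2, 3]],
                                   names=["rec_a", "rec_b"])
parts = split_index_sketch(pairs, 5)
print([len(p) for p in parts])  # [3, 3, 2, 2, 2]
```

Splitting candidate pairs this way lets you, for example, compare and classify a large set of record pairs in batches instead of all at once.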

recordlinkage.get_option(pat)

Retrieves the value of the specified option.

The available options with their descriptions:

classification.return_type : str

The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]

indexing.pairs : str

Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).

Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
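The three values of classification.return_type describe the same prediction in different shapes. The sketch below uses made-up candidate pairs and assumes pandas and NumPy. It converts a MultiIndex of predicted matches (the ‘index’ form) into the ‘series’ and ‘array’ forms. It only illustrates the formats and is not the toolkit's own conversion code.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate record pairs, e.g. the output of an indexing step
candidates = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0), (1, 1)], names=["rec_a", "rec_b"])

# 'index' form: only the pairs predicted to be matches
matches = pd.MultiIndex.from_tuples([(0, 0), (1, 1)], names=["rec_a", "rec_b"])

# 'series' form: one entry per candidate pair, 1 = match, 0 = distinct
as_series = pd.Series(candidates.isin(matches).astype(int), index=candidates)

# 'array' form: the same zeros and ones without the pair labels
as_array = as_series.to_numpy()

print(as_series.tolist())  # [1, 0, 0, 1]
```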

Parameters:

pat (str) – Regexp which should match a single option. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.

Returns:

The current value of the option.

Raises:

OptionError if no such option exists
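The partial-match behaviour described for pat can be illustrated with a small pure-Python registry. In this registry the pattern is searched in every option key, and exactly one key must match. The registry, the OptionError class defined here and the get_option_sketch helper are hypothetical stand-ins. They are modelled on the description above, with option names and defaults taken from it, and are not the toolkit's code. Treating a pattern that matches several keys as an error is an assumption based on the requirement that pat match a single option.

```python
import re


class OptionError(KeyError):
    """Hypothetical stand-in for the toolkit's OptionError."""


# Hypothetical registry holding the documented defaults
_OPTIONS = {
    "classification.return_type": "index",
    "indexing.pairs": "multiindex",
}


def _resolve_key(pat):
    """Return the single option key matched (case-insensitively) by regex `pat`."""
    keys = [k for k in _OPTIONS if re.search(pat, k, re.IGNORECASE)]
    if not keys:
        raise OptionError(f"No such option: {pat!r}")
    if len(keys) > 1:
        raise OptionError(f"Pattern {pat!r} matches multiple options: {keys}")
    return keys[0]


def get_option_sketch(pat):
    return _OPTIONS[_resolve_key(pat)]


print(get_option_sketch("classification.return_type"))  # index
print(get_option_sketch("return_type"))  # index (partial match)
```

The second call works today but, as the pat description warns, a short pattern like "return_type" could become ambiguous if a similarly named option were added later.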

recordlinkage.set_option(pat, value)

Sets the value of the specified option.

The available options with their descriptions:

classification.return_type : str

The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]

indexing.pairs : str

Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).

Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]

Parameters:
  • pat (str) – Regexp which should match a single option. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.

  • value – The new value of the option.

Returns:

None

Raises:

OptionError if no such option exists

recordlinkage.reset_option(pat)

Reset one or more options to their default value.

Pass “all” as the argument to reset all options.

The available options with their descriptions:

classification.return_type : str

The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]

indexing.pairs : str

Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).

Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]

Parameters:

pat (str/regex) – If specified, only options matching prefix* will be reset. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.

Returns:

None

recordlinkage.describe_option(pat='', _print_desc=True)

Prints the description for one or more registered options.

Call with no arguments to get a listing of all registered options.

The available options with their descriptions:

classification.return_type : str

The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]

indexing.pairs : str

Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).

Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]

Parameters:
  • pat (str) – Regexp pattern. All matching keys will have their description displayed.

  • _print_desc (bool, default True) – If True (default) the description(s) will be printed to stdout. Otherwise, the description(s) will be returned as a unicode string (for testing).

Returns:

None by default; the description(s) as a unicode string if _print_desc is False.
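The print-or-return switch controlled by _print_desc can be sketched as follows. The registry and describe_option_sketch are hypothetical, and the output layout is not the toolkit's exact format. As the description above states, an empty pattern matches every option, and _print_desc defaults to True.

```python
import re

# Hypothetical registry: option key -> (default value, short description)
_DOCS = {
    "classification.return_type": ("index", "Format of the classification result."),
    "indexing.pairs": ("multiindex", "Format in which record pairs are stored."),
}


def describe_option_sketch(pat="", _print_desc=True):
    """Collect descriptions of all keys matching `pat`; print or return them."""
    lines = []
    for key, (default, doc) in _DOCS.items():
        if re.search(pat, key, re.IGNORECASE):  # "" matches every key
            lines.append(f"{key} : {type(default).__name__}\n"
                         f"    {doc} [default: {default}]")
    text = "\n".join(lines)
    if _print_desc:
        print(text)
        return None
    return text


describe_option_sketch("indexing")  # prints the indexing.pairs entry
desc = describe_option_sketch(_print_desc=False)  # all options, as a string
print(desc.count("[default:"))  # 2
```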