Miscellaneous
-
recordlinkage.index_split(index, chunks)
Split pandas.Index and pandas.MultiIndex objects into chunks. This function is based on numpy.array_split().
Parameters:
- index (pandas.Index, pandas.MultiIndex) – A pandas.Index or pandas.MultiIndex to split into chunks.
- chunks (int) – The number of parts to split the index into.
Returns: list – A list with chunked pandas.Index or pandas.MultiIndex objects.
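Since the function is described as based on numpy.array_split(), the chunk sizes presumably follow the same rule: when the length does not divide evenly, the leading chunks each receive one extra element. The sketch below reproduces that rule in plain Python on a list of tuples standing in for record pairs. Note that split_into_chunks is a hypothetical helper written for illustration, not the library's implementation.

```python
def split_into_chunks(seq, chunks):
    """Split a sliceable sequence into `chunks` nearly equal parts.

    Hypothetical helper: mirrors the size rule of numpy.array_split,
    where the first len(seq) % chunks parts get one extra element.
    """
    chunks = int(chunks)
    if chunks <= 0:
        raise ValueError("chunks must be a positive integer")
    base, extras = divmod(len(seq), chunks)
    parts, start = [], 0
    for i in range(chunks):
        size = base + 1 if i < extras else base
        parts.append(seq[start:start + size])
        start += size
    return parts


# Ten (left, right) tuples standing in for candidate record pairs.
pairs = [(i, j) for i in range(2) for j in range(5)]
parts = split_into_chunks(pairs, 3)
sizes = [len(p) for p in parts]
print(sizes)  # [4, 3, 3]
```

Asking for more chunks than there are elements yields empty trailing chunks under this rule, so callers that process chunks in a loop may want to skip empty ones.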
-
recordlinkage.get_option(pat)
Retrieves the value of the specified option.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
Parameters:
- pat (str) – Regexp which should match a single option. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.
Returns: result – the value of the option.
Raises: OptionError if no such option exists.
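The warning about partial matches can be illustrated with a small stand-in registry. This is a minimal sketch, assuming the pattern is applied as a case-insensitive regular-expression search over option names and that an exact name always takes precedence; the library's actual lookup may differ. The find_option helper, the use of KeyError in place of OptionError, and the option name indexing.return_type are all hypothetical.

```python
import re

# Stand-in registry holding the two options listed above with their defaults.
OPTIONS = {
    "classification.return_type": "index",
    "indexing.pairs": "multiindex",
}


def find_option(pat):
    """Return the single option name matched by `pat`, else raise KeyError."""
    if pat in OPTIONS:  # an exact, full name is never ambiguous
        return pat
    hits = [name for name in sorted(OPTIONS) if re.search(pat, name, re.I)]
    if len(hits) != 1:
        raise KeyError(f"{pat!r} matched {len(hits)} options: {hits}")
    return hits[0]


full = find_option("classification.return_type")
partial = find_option("return_type")  # resolves while only one name matches

# Simulate a later release that adds a similarly named (hypothetical) option.
OPTIONS["indexing.return_type"] = "multiindex"
try:
    find_option("return_type")
    ambiguous = False
except KeyError:
    ambiguous = True  # the partial pattern now matches two names

still_ok = find_option("classification.return_type")  # full name is unaffected
```

The full option name keeps resolving after the new option appears, while the partial pattern starts failing, which is the situation the note above warns about.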
-
recordlinkage.set_option(pat, value)
Sets the value of the specified option.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
Parameters: - pat (str) – Regexp which should match a single option. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.
- value – new value of option.
Returns: None
Raises: OptionError if no such option exists
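Because set_option changes a global setting, it can be convenient to restore the previous value once a piece of work is done. The sketch below shows one way to do that with a context manager that only relies on the documented (pat) and (pat, value) call signatures. The temporary_option helper is hypothetical, and a dict stands in for the library so the example runs on its own.

```python
from contextlib import contextmanager


@contextmanager
def temporary_option(get_option, set_option, name, value):
    """Set an option inside a with-block and restore it afterwards.

    Hypothetical helper: `get_option` and `set_option` are any callables
    with the (pat) and (pat, value) signatures documented above.
    """
    previous = get_option(name)
    set_option(name, value)
    try:
        yield
    finally:
        # Restore the earlier value even if the block raised.
        set_option(name, previous)


# Dict-backed stand-ins so the sketch runs without the library installed.
_store = {"classification.return_type": "index"}

with temporary_option(_store.__getitem__, _store.__setitem__,
                      "classification.return_type", "series"):
    inside = _store["classification.return_type"]

after = _store["classification.return_type"]
print(inside, after)  # series index
```

With the library available, the same helper could presumably be called with recordlinkage.get_option and recordlinkage.set_option as the first two arguments, using the full option name to avoid ambiguous matches.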
-
recordlinkage.reset_option(pat)
Reset one or more options to their default value.
Pass “all” as argument to reset all options.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
Parameters:
- pat (str/regex) – If specified, only options matching prefix* will be reset. Note: partial matches are supported for convenience, but unless you use the full option name (e.g. x.y.z.option_name), your code may break in future versions if new options with similar names are introduced.
Returns: None
-
recordlinkage.describe_option(pat, _print_desc=False)
Prints the description for one or more registered options.
Call with no arguments to get a listing for all registered options.
The available options with their descriptions:
- classification.return_type : str
The format of the classification result. The value ‘index’ returns the classification result as a pandas.MultiIndex containing the predicted matching record pairs. The value ‘series’ returns a pandas.Series with zeros (distinct) and ones (matches). The value ‘array’ returns a numpy.ndarray with zeros and ones. [default: index] [currently: index]
- indexing.pairs : str
Specifies the format in which record pairs are stored. By default, record pairs generated by the toolkit are returned in a pandas.MultiIndex object (the ‘multiindex’ option).
Valid values: ‘multiindex’ [default: multiindex] [currently: multiindex]
Returns: None by default, the description(s) as a unicode string if _print_desc is False.