API Reference

The package is divided into two independent subpackages: rdm.db and rdm.wrappers.

Database interaction

Databases can be accessed via different so-called data sources. You can add your own data source by subclassing the base rdm.db.datasource.DataSource class.
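
For illustration, a minimal (hypothetical) subclass might look as follows; only a few of the methods documented below are sketched, and the class, table and column names are made up:

>>> from rdm.db.datasource import DataSource
>>> class MyCSVDataSource(DataSource):
...     def tables(self):
...         return ['train', 'car']                      # table names known to this source
...     def table_columns(self, table_name):
...         return {'train': ['id', 'direction'],
...                 'car': ['id', 'train_id', 'shape']}[table_name]
...     def fetch(self, table, cols):
...         return []                                    # rows as a list of tuples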

Base DataSource

class rdm.db.datasource.DataSource[source]

A data abstraction layer for accessing datasets.

This layer is typically hidden from end-users, as they only access the database through DBConnection and DBContext objects.

column_values(table, col)[source]

Returns a list of distinct values for the given table and column.

param table:target table
param col:target column
connect()[source]
Returns:a connection object.
Return type:DBConnection
connected(tables, cols, find_connections=False)[source]

Returns a list of tuples of connected table pairs.

param tables:a list of table names
param cols:a list of column names
param find_connections:
 set this to True to detect relationships from column names.
return:a tuple (connected, pkeys, fkeys, reverse_fkeys)
fetch(table, cols)[source]

Fetches rows for the given table and columns.

param table:target table
param cols:list of columns to select
return:rows from the given table and columns
rtype:list
fetch_types(table, cols)[source]

Returns a dictionary of field types for the given table and columns.

param table:target table
param cols:list of columns to select
return:a dictionary of types for each attribute
rtype:dict
foreign_keys()[source]
Returns:a list of foreign key relations in the form (table_name, column_name, referenced_table_name, referenced_column_name).
Return type:list
select_where(table, cols, pk_att, pk)[source]

Select with where clause.

param table:target table
param cols:list of columns to select
param pk_att:attribute for the where clause
param pk:the id that the pk_att should match
return:rows from the given table and cols, with the condition pk_att==pk
rtype:list
table_column_names()[source]
Returns:a list of table / column names in the form (table, col_name).
Return type:list
table_columns(table_name)[source]
Parameters:table_name – table name for which to retrieve column names
Returns:a list of columns for the given table.
Return type:list
table_primary_key(table_name)[source]

Returns the primary key attribute name for the given table.

param table_name:
 table name string
tables()[source]
Returns:a list of table names.
Return type:list

MySQLDataSource

class rdm.db.datasource.MySQLDataSource(connection)[source]

A DataSource implementation for accessing datasets from a MySQL DBMS.

__init__(connection)[source]
Parameters:connection – a DBConnection instance.
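
Example usage (a brief sketch; the connection object and the returned table names are illustrative):

>>> from rdm.db.datasource import MySQLDataSource
>>> source = MySQLDataSource(connection)   # connection: a DBConnection to a MySQL database
>>> source.tables()
['cars', 'trains']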

PgSQLDataSource

class rdm.db.datasource.PgSQLDataSource(connection)[source]

A DataSource implementation for accessing datasets from a PostgreSQL DBMS.

__init__(connection)[source]
Parameters:connection – a DBConnection instance.

Database Context

A DBContext object represents a view of a particular data source that can be used for learning. Typical uses include selecting only particular tables and columns and specifying a target attribute for learning.
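
Example usage (a minimal sketch; the connection parameters, table and attribute names are placeholders):

>>> from rdm.db import DBConnection, DBContext
>>> connection = DBConnection('user', 'password', 'db_host', 'database_name')  # user, password, host, database
>>> context = DBContext(connection, target_table='trains', target_att='direction')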

class rdm.db.context.DBContext(connection, target_table=None, target_att=None, find_connections=False, in_memory=True)[source]
__init__(connection, target_table=None, target_att=None, find_connections=False, in_memory=True)[source]

Initializes a new DBContext object from the given DBConnection.

Parameters:
  • connection – a DBConnection instance
  • target_table – set a target table for learning
  • target_att – set a target table attribute for learning
  • find_connections – set to True if you want to detect relationships based on attribute and table names, e.g., train_id is the foreign key referring to id in table train.
  • in_memory – Load the database into main memory (currently required for most approaches and pre-processing)
copy()[source]

Makes a deep copy of the DBContext object (e.g., for creating cross-validation folds).

returns:a deep copy of self.
rtype:DBContext
fetch(table, cols)[source]

Fetches rows from the db.

param table:table name to select
param cols:list of columns to select
return:list of rows
rtype:list
fetch_types(table, cols)[source]

Returns a dictionary of field types for the given table and columns.

param table:target table
param cols:list of columns to select
return:a dictionary of types for each attribute
rtype:dict
rows(table, cols)[source]

Fetches rows from the local cache or from the db if there’s no cache.

param table:table name to select
param cols:list of columns to select
return:list of rows
rtype:list
select_where(table, cols, pk_att, pk)[source]

SELECT with WHERE clause.

param table:target table
param cols:list of columns to select
param pk_att:attribute for the where clause
param pk:the id that the pk_att should match
return:rows from the given table and cols, with the condition pk_att==pk
rtype:list

Database converters

Converters transform the input database representation into the native input format of a particular algorithm.
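
For example, an RSDConverter produces prolog facts and background knowledge from a database context (a brief sketch; context is a DBContext instance as constructed above):

>>> from rdm.db.converters import RSDConverter
>>> conv = RSDConverter(context)
>>> examples = conv.all_examples()            # prolog facts, one per example
>>> bk = conv.background_knowledge()          # prolog background knowledge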

class rdm.db.converters.Converter(dbcontext)[source]

Base class for converters.

__init__(dbcontext)[source]

Base class for converting DBContext objects into the input formats of various relational learning systems.

param dbcontext:
 DBContext object for a learning problem
class rdm.db.converters.ILPConverter(*args, **kwargs)[source]

Base class for converting a given database context (selected tables, columns, etc.) into inputs acceptable to a specific ILP system.

param discr_intervals:
 (optional) discretization intervals in the form:
>>> {'table1': {'att1': [0.4, 1.0], 'att2': [0.1, 2.0, 4.5]}, 'table2': {'att2': [0.02]}}

given these intervals, att1, for example, would be discretized into three intervals: att1 =< 0.4, 0.4 < att1 =< 1.0, and att1 > 1.0

param settings:dictionary of setting: value pairs
mode(predicate, args, recall=1, head=False)[source]

Emits mode declarations in Aleph-like format.

param predicate:
 predicate name
param args:predicate arguments with input/output specification, e.g.:
>>> [('+', 'train'), ('-', 'car')]
param recall:recall setting (see Aleph manual)
param head:set to True for head clauses
user_settings()[source]

Emits prolog code for algorithm settings, e.g., :- set(minpos, 5).

class rdm.db.converters.RSDConverter(*args, **kwargs)[source]

Converts the database context to RSD inputs.

Inherits from ILPConverter.

all_examples(pred_name=None)[source]

Emits all examples in prolog form for RSD.

param pred_name:
 override for the emitted predicate name
background_knowledge()[source]

Emits the background knowledge in prolog form for RSD.

class rdm.db.converters.AlephConverter(*args, **kwargs)[source]

Converts the database context to Aleph inputs.

Inherits from ILPConverter.

__init__(*args, **kwargs)[source]
Parameters:discr_intervals – (optional) discretization intervals in the form:
>>> {'table1': {'att1': [0.4, 1.0], 'att2': [0.1, 2.0, 4.5]}, 'table2': {'att2': [0.02]}}

given these intervals, att1, for example, would be discretized into three intervals: att1 =< 0.4, 0.4 < att1 =< 1.0, and att1 > 1.0

Parameters:
  • settings – dictionary of setting: value pairs
  • target_att_val – target attribute value for learning.
background_knowledge()[source]

Emits the background knowledge in prolog form for Aleph.

negative_examples()[source]

Emits the negative examples in prolog form for Aleph.

positive_examples()[source]

Emits the positive examples in prolog form for Aleph.

class rdm.db.converters.OrangeConverter(*args, **kwargs)[source]

Converts the selected tables in the given context to Orange example tables.

convert_table(table_name, cls_att=None)[source]

Returns the specified table as an Orange example table.

param table_name:
 table name to convert
param cls_att:class attribute name
rtype:orange.ExampleTable
orng_type(table_name, col)[source]

Returns an Orange datatype for a given MySQL column.

param table_name:
 target table name
param col:column to determine the Orange datatype
other_Orange_tables()[source]

Returns the related tables as Orange example tables.

Return type:list
target_Orange_table()[source]

Returns the target table as an Orange example table.

rtype:orange.ExampleTable
class rdm.db.converters.TreeLikerConverter(*args, **kwargs)[source]

Converts a db context to the TreeLiker dataset format.

param discr_intervals:
 (optional) discretization intervals in the form:
>>> {'table1': {'att1': [0.4, 1.0], 'att2': [0.1, 2.0, 4.5]}, 'table2': {'att2': [0.02]}}

given these intervals, att1, for example, would be discretized into three intervals: att1 =< 0.4, 0.4 < att1 =< 1.0, and att1 > 1.0

dataset()[source]

Returns the DBContext as a list of interpretations, i.e., a list of facts true for each example in the format for TreeLiker.

default_template()[source]

Default learning template for TreeLiker.

Algorithm wrappers

The rdm.wrappers module provides classes for working with the various algorithm wrappers.

Aleph

This is a wrapper for the very popular ILP system Aleph. Aleph is an ILP toolkit with many modes of functionality: learning theories, feature construction, incremental learning, etc. Aleph uses mode declarations to define the syntactic bias. Input relations are Prolog clauses, defined either extensionally or intensionally.

Official documentation.

See Getting started for an example of using Aleph in your python code.
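
A brief sketch along those lines (the DBContext named context and the target class value 'east' are illustrative):

>>> from rdm.db.converters import AlephConverter
>>> from rdm.wrappers import Aleph
>>> conv = AlephConverter(context, target_att_val='east')
>>> aleph = Aleph()
>>> theory = aleph.induce('induce', conv.positive_examples(),
...                       conv.negative_examples(), conv.background_knowledge())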

class rdm.wrappers.Aleph(verbosity=0)[source]

Aleph python wrapper.

__init__(verbosity=0)[source]

Creates an Aleph object.

param verbosity:Can be DEBUG, INFO or NOTSET (default).

This controls the verbosity of the output.

induce(mode, pos, neg, b, filestem='default', printOutput=False)[source]

Induce a theory or features in ‘mode’.

param filestem:The base name of this experiment.
param mode:In which mode to induce rules/features.
param pos:String of positive examples.
param neg:String of negative examples.
param b:String of background knowledge.
return:The theory as a string or an arff dataset in induce_features mode.
rtype:str
set(name, value)[source]

Sets the value of setting ‘name’ to ‘value’.

param name:Name of the setting
param value:Value of the setting
setPostScript(goal, script)[source]

After learning call the given script using ‘goal’.

param goal:goal name
param script:prolog script to call
settingsAsFacts(settings)[source]

Parses a string of settings.

param settings:String of settings in the form:

set(name1, val1), set(name2, val2)...

RSD

RSD is a relational subgroup discovery algorithm (Zelezny et al, 2001) composed of two main steps: the propositionalization step and the (optional) subgroup discovery step. RSD effectively produces an exhaustive list of first-order features that comply with the user-defined mode constraints, similar to those of Progol (Muggleton, 1995) and Aleph.

See Example use case for an example of using RSD in your code.
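
A brief sketch along those lines (context is a DBContext instance; see Database Context above):

>>> from rdm.db.converters import RSDConverter
>>> from rdm.wrappers import RSD
>>> conv = RSDConverter(context)
>>> rsd = RSD()
>>> features, weka_arff, rules = rsd.induce(conv.background_knowledge(),
...                                         examples=conv.all_examples(), cn2sd=True)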

class rdm.wrappers.RSD(verbosity=0)[source]

RSD python wrapper.

__init__(verbosity=0)[source]

Creates an RSD object.

param verbosity:Can be DEBUG, INFO or NOTSET (default).

This controls the verbosity of the output.

induce(b, filestem='default', examples=None, pos=None, neg=None, cn2sd=True, printOutput=False)[source]

Generate features and find subgroups.

param filestem:The base name of this experiment.
param examples:Classified examples; can be used instead of separate pos / neg files below.
param pos:String of positive examples.
param neg:String of negative examples.
param b:String with background knowledge.
param cn2sd:if True, find subgroups after feature construction
return:a tuple (features, weka, rules), where:
  • features is a set of prolog clauses of generated features,
  • weka is the propositional form of the input data,
  • rules is a set of generated cn2sd subgroup descriptions; this will be an empty string if cn2sd is set to False.
rtype:tuple
set(name, value)[source]

Sets the value of setting ‘name’ to ‘value’.

param name:Name of the setting
param value:Value of the setting
settingsAsFacts(settings)[source]

Parses a string of settings.

param settings:String of settings in the form:

set(name1, val1), set(name2, val2)...

TreeLiker

TreeLiker (by Ondrej Kuzelka et al.) is a suite of several algorithms, selected via the algorithm setting, namely RelF, Poly and HiFi:

RelF constructs a set of tree-like relational features by combining smaller conjunctive blocks. The novelty is that RelF preserves the monotonicity of feature reducibility and redundancy (instead of the typical monotonicity of frequency), which allows the algorithm to scale far better than other state-of-the-art propositionalization algorithms.

HiFi is a propositionalization approach that constructs first-order features with hierarchical structure. Due to this property, the algorithm performs the transformation in time polynomial in the maximum feature length. Furthermore, the resulting features are the smallest in their semantic equivalence class.

Official website

Example usage:

>>> context = DBContext(...)
>>> conv = TreeLikerConverter(context)
>>> treeliker = TreeLiker(conv.dataset(), conv.default_template())   # Runs RelF by default
>>> arff, _ = treeliker.run()
class rdm.wrappers.TreeLiker(dataset, template, test_dataset=None, settings={})[source]

TreeLiker python wrapper.

__init__(dataset, template, test_dataset=None, settings={})[source]
Parameters:
  • dataset – dataset in TreeLiker format
  • template – feature template
  • test_dataset – (optional) test dataset to transform with the features from the training set
  • settings – dictionary of settings (see TreeLiker documentation)
run(cleanup=True, printOutput=False)[source]

Runs TreeLiker with the given settings.

param cleanup:deletes temporary files after completion
param printOutput:
 print algorithm output to the terminal

Wordification

Wordification (Perovsek et al., 2015) is a propositionalization method inspired by text mining that can be viewed as a transformation of a relational database into a corpus of text documents. Wordification constructs simple, easily interpretable features that act as words in the transformed bag-of-words representation.

Example usage:

>>> context = DBContext(...)
>>> orange = OrangeConverter(context)
>>> wordification = Wordification(orange.target_Orange_table(), orange.other_Orange_tables(), context)
>>> wordification.run(1)
>>> wordification.calculate_weights()
>>> arff = wordification.to_arff()
class rdm.wrappers.Wordification(target_table, other_tables, context, word_att_length=1, idf=None)[source]
__init__(target_table, other_tables, context, word_att_length=1, idf=None)[source]

Wordification object constructor.

param target_table:
 Orange ExampleTable, representing the primary table
param other_tables:
 secondary tables, Orange ExampleTables
att_to_s(att)[source]

Constructs a “wordification” word for the given attribute.

param att:Orange attribute
calculate_weights(measure='tfidf')[source]

Counts word frequency and calculates tf-idf values for words in every document.

param measure:weighting measure to use (one of tfidf, binary, tf).
prune(minimum_word_frequency_percentage=1)[source]

Filters out words whose frequency is below the given minimum word frequency percentage.

param minimum_word_frequency_percentage:
 minimum frequency of words to keep
run(num_of_processes=4)[source]

Applies the wordification methodology to the target table.

param num_of_processes:
 number of processes
to_arff()[source]

Returns the “wordified” representation in ARFF.

rtype:str
wordify()[source]

Constructs string of all documents.

return:document representation of the dataset, one line per document
rtype:str

Proper

class rdm.wrappers.Proper(input_dict, is_relaggs)[source]
__init__(input_dict, is_relaggs)[source]
init_args_list(input_dict, is_relaggs)[source]
parse_excluded_fields(context)[source]
run()[source]

Tertius

class rdm.wrappers.Tertius(input_dict)[source]
__init__(input_dict)[source]
init_args_list(input_dict)[source]
run()[source]

OneBC

class rdm.wrappers.OneBC(input_dict, is1BC2)[source]
__init__(input_dict, is1BC2)[source]
init_args_list(input_dict)[source]
run()[source]

Caraf

class rdm.wrappers.Caraf(input_dict)[source]
__init__(input_dict)[source]
run()[source]

Utilities

This section documents helper utilities provided by the python-rdm package that are useful in various scenarios.

Mapping unseen examples into propositional feature space

When testing classifiers (or in a real-world scenario) you’ll need to map unseen (or new) examples into the feature space used by the classifier. To do this, use the rdm.db.mapper.domain_map function.

See Example use case for usage in a cross-validation setting.

rdm.db.mapper.domain_map(features, feature_format, train_context, test_context, intervals={}, format='arff', positive_class=None)[source]

Use the features returned by a propositionalization method to map unseen test examples into the new feature space.

param features:string of features as returned by RSD, Aleph or TreeLiker
param feature_format:
 ‘rsd’, ‘aleph’, ‘treeliker’
param train_context:
 DBContext with training examples
param test_context:
 DBContext with test examples
param intervals:
 discretization intervals (optional)
param format:output format (currently only 'arff' is supported)
param positive_class:
 required for aleph
return:returns the test examples in propositional form
rtype:str
Example:
>>> test_arff = mapper.domain_map(features, 'rsd', train_context, test_context)

Validation

Python-rdm provides a helper function for splitting a dataset into folds for cross-validation.

See Example use case for a cross-validation example using RSD.

rdm.validation.cv_split(context, folds=10, random_seed=None)[source]

Returns a list of pairs (train_context, test_context), one for each cross-validation fold.

The split is stratified.

param context:DBContext to be split
param folds:number of folds
param random_seed:
 random seed to be used
return:returns a list of (train_context, test_context) pairs
rtype:list
Example:
>>> for train_context, test_context in cv_split(context, folds=10, random_seed=0):
...     pass  # Your CV loop