Methods

class gatree.gatree.GATree(max_depth=None, random=None, fitness_function=None, n_jobs=1, random_state=None)

Bases: BaseEstimator

Evolutionary decision tree classifier. GATree trains a decision tree with a genetic algorithm: a population of trees evolves over multiple generations, and a fitness function scores each tree so that the best trees are selected for crossover and mutation.

Parameters:
  • max_depth (int, optional) – Maximum depth of the tree.

  • random (Random, optional) – Random number generator.

  • fitness_function (function, optional) – Fitness function for the genetic algorithm.

  • n_jobs (int, optional) – Number of jobs to run in parallel.

  • random_state (int, optional) – Seed for reproducibility.
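
As a small illustration of the random and random_state parameters, the sketch below uses only Python's standard random.Random to show why a fixed seed gives repeatable runs. How GATree combines the two parameters internally is not stated in this section, so the helper make_rng (and its argument names rng and seed) is a hypothetical stand-in, not the library's code.

```python
from random import Random


def make_rng(rng=None, seed=None):
    """Hypothetical helper: reuse an explicit generator, else seed a new one."""
    if rng is not None:
        return rng
    # Random(None) seeds from the operating system, so runs differ without a seed.
    return Random(seed)


# Two generators built from the same seed produce the same sequence.
a = make_rng(seed=42)
b = make_rng(seed=42)
seq_a = [a.randint(0, 9) for _ in range(5)]
seq_b = [b.randint(0, 9) for _ in range(5)]
print(seq_a == seq_b)
```

Passing an existing generator instead of a seed lets several components share one random stream.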

max_depth

Maximum depth of the tree.

Type:

int, optional

random

Random number generator.

Type:

Random

X

Training data.

Type:

pandas.DataFrame

y

Target values.

Type:

pandas.Series

att_indexes

Array of attribute indexes.

Type:

numpy.ndarray

att_values

Dictionary of attribute values.

Type:

dict

class_count

Number of classes.

Type:

int

fitness_function

Fitness function for the genetic algorithm.

Type:

function

n_jobs

Number of jobs to run in parallel.

Type:

int

random_state

Seed for reproducibility.

Type:

int

_tree

The fitted tree.

Type:

Node

_best_fitness

List of best fitness values for each iteration.

Type:

list

_avg_fitness

List of average fitness values for each iteration.

Type:

list

static default_fitness_function(root, **fitness_function_kwargs)

Default fitness function for the genetic algorithm.

Parameters:

root (Node) – Root node of the tree.

Returns:

The fitness value.

Return type:

float
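
The signature above suggests that a custom fitness function receives the root of a candidate tree plus any keyword arguments forwarded from fit, and returns a float. A minimal sketch of that shape follows. ToyNode, tree_size, predict_one and the keyword names rows, labels and size_penalty are all assumptions made for illustration; the real Node API, and whether lower or higher fitness counts as better, are not specified in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    """Stand-in for a tree node (NOT gatree's Node class)."""
    feature: Optional[int] = None      # index of the attribute tested; None for a leaf
    threshold: float = 0.0             # split value for internal nodes
    left: Optional["ToyNode"] = None   # followed when row[feature] <= threshold
    right: Optional["ToyNode"] = None  # followed otherwise
    label: Optional[int] = None        # class predicted by a leaf


def tree_size(node):
    """Count every node in the subtree."""
    if node is None:
        return 0
    return 1 + tree_size(node.left) + tree_size(node.right)


def predict_one(node, row):
    """Walk from the root to a leaf and return its label."""
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.label


def size_penalised_fitness(root, **fitness_function_kwargs):
    """Error rate plus a small penalty per node (lower is better in this toy)."""
    rows = fitness_function_kwargs["rows"]
    labels = fitness_function_kwargs["labels"]
    penalty = fitness_function_kwargs.get("size_penalty", 0.01)
    errors = sum(predict_one(root, r) != y for r, y in zip(rows, labels))
    return errors / len(labels) + penalty * tree_size(root)


stump = ToyNode(feature=0, threshold=0.5,
                left=ToyNode(label=0), right=ToyNode(label=1))
score = size_penalised_fitness(
    stump,
    rows=[[0.0], [1.0], [0.2], [0.9]],
    labels=[0, 1, 0, 0],
    size_penalty=0.01,
)
print(score)
```

Penalising size is one common way to discourage overly deep trees; other trade-offs are equally valid.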

fit(X, y, population_size=150, max_iter=2000, mutation_probability=0.1, elite_size=1, selection_tournament_size=2, fitness_function_kwargs={})

Fit a tree to a training set. The population size, maximum iterations, mutation probability, elite size, and selection tournament size can be specified.

Parameters:
  • X (pandas.DataFrame) – Training data.

  • y (pandas.Series) – Target values.

  • population_size (int, optional) – Size of the population.

  • max_iter (int, optional) – Maximum number of iterations.

  • mutation_probability (float, optional) – Probability of mutation.

  • elite_size (int, optional) – Number of elite trees.

  • selection_tournament_size (int, optional) – Number of trees in tournament.

  • fitness_function_kwargs (dict, optional) – Additional kwargs to be passed to the fitness_function.

Returns:

The fitted tree.

Return type:

Node
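
To make the roles of the fit hyperparameters concrete, here is a toy genetic algorithm that reuses their names. It is a sketch under stated assumptions, not GATree's implementation: individuals are bit lists rather than trees, the objective (count of 1-bits, higher is better) is invented, and genome_length and seed are hypothetical extras.

```python
import random


def evolve(population_size=30, max_iter=40, mutation_probability=0.1,
           elite_size=1, selection_tournament_size=2,
           genome_length=16, seed=0):
    rng = random.Random(seed)
    fitness = sum  # toy objective: number of 1-bits in the genome
    population = [[rng.randint(0, 1) for _ in range(genome_length)]
                  for _ in range(population_size)]
    best_fitness, avg_fitness = [], []  # one entry per iteration

    def tournament():
        # Draw a few random individuals and keep the fittest of them.
        contenders = rng.sample(population, selection_tournament_size)
        return max(contenders, key=fitness)

    for _ in range(max_iter):
        population.sort(key=fitness, reverse=True)
        scores = [fitness(ind) for ind in population]
        best_fitness.append(scores[0])
        avg_fitness.append(sum(scores) / len(scores))

        # Elitism: the top individuals survive unchanged.
        next_population = [ind[:] for ind in population[:elite_size]]
        while len(next_population) < population_size:
            mother, father = tournament(), tournament()
            cut = rng.randrange(1, genome_length)   # one-point crossover
            child = mother[:cut] + father[cut:]
            if rng.random() < mutation_probability:
                flip = rng.randrange(genome_length)  # flip a single bit
                child[flip] = 1 - child[flip]
            next_population.append(child)
        population = next_population

    return max(population, key=fitness), best_fitness, avg_fitness


best, best_history, avg_history = evolve()
print(sum(best), best_history[0], best_history[-1])
```

Because at least one elite individual is copied forward each round, the recorded best fitness never gets worse from one iteration to the next.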

plot(node=None, prefix='')

Plot the decision tree with nodes and leaves.

Parameters:
  • node (Node, optional) – Current node to plot.

  • prefix (str, optional) – Prefix for the current node.
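
The node and prefix parameters suggest a recursive, text-based drawing in which each level extends the prefix. The sketch below shows that general technique with a stand-in ToyNode class and a hypothetical render function; the actual output format of plot is not described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyNode:
    """Stand-in for a tree node (NOT gatree's Node class)."""
    text: str                          # split description or leaf label
    left: Optional["ToyNode"] = None
    right: Optional["ToyNode"] = None


def render(node, prefix=""):
    """Return the lines of an indented text drawing of the subtree."""
    if node is None:
        return []
    lines = [prefix + node.text]
    child_prefix = prefix + "    "     # children are indented one level deeper
    lines += render(node.left, child_prefix)
    lines += render(node.right, child_prefix)
    return lines


tree = ToyNode(
    "petal_length <= 2.5",
    ToyNode("class 0"),
    ToyNode("petal_width <= 1.7", ToyNode("class 1"), ToyNode("class 2")),
)
lines = render(tree)
print("\n".join(lines))
```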

predict(X)

Predict classes for the given data.

Parameters:

X (pandas.DataFrame) – Data to predict.

Returns:

Predicted classes.

Return type:

list

set_fit_request(*, elite_size: bool | None | str = '$UNCHANGED$', fitness_function_kwargs: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', mutation_probability: bool | None | str = '$UNCHANGED$', population_size: bool | None | str = '$UNCHANGED$', selection_tournament_size: bool | None | str = '$UNCHANGED$') → GATree

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • elite_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for elite_size parameter in fit.

  • fitness_function_kwargs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fitness_function_kwargs parameter in fit.

  • max_iter (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for max_iter parameter in fit.

  • mutation_probability (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for mutation_probability parameter in fit.

  • population_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for population_size parameter in fit.

  • selection_tournament_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for selection_tournament_size parameter in fit.

Returns:

self – The updated object.

Return type:

object
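
The four request options described for set_fit_request can be mimicked with a toy model. This is not scikit-learn's routing code: update_request and forward are hypothetical helpers, and ValueError stands in for whatever error the real meta-estimator raises.

```python
UNCHANGED = "$UNCHANGED$"  # the default shown in the signature


def update_request(current, new):
    """Keep the existing request when a parameter is left UNCHANGED."""
    return current if new == UNCHANGED else new


def forward(request, name, provided):
    """Return the keyword arguments a meta-estimator would hand to fit."""
    if request is True:
        # Requested: pass it along if the user supplied it, else ignore.
        return {name: provided[name]} if name in provided else {}
    if request is False:
        # Not requested: never passed on.
        return {}
    if request is None:
        # Not requested, and supplying it is treated as an error.
        if name in provided:
            raise ValueError(f"{name} was provided but not requested")
        return {}
    # Any other value is an alias: look the metadata up under that name.
    return {name: provided[request]} if request in provided else {}


print(forward(True, "max_iter", {"max_iter": 50}))
print(forward("n_generations", "max_iter", {"n_generations": 50}))
```

In this model, calling update_request with UNCHANGED for some parameters and a new value for others changes only the latter, which matches the behaviour described for the default.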

class gatree.methods.gatreeclassifier.GATreeClassifier(max_depth=None, random=None, fitness_function=None, n_jobs=1, random_state=None)

Bases: ClassifierMixin, GATree

Evolutionary decision tree classifier. GATreeClassifier trains a decision tree with a genetic algorithm: a population of trees evolves over multiple generations, and a fitness function scores each tree so that the best trees are selected for crossover and mutation.

Parameters:
  • max_depth (int, optional) – Maximum depth of the tree.

  • random (Random, optional) – Random number generator.

  • fitness_function (function, optional) – Fitness function for the genetic algorithm.

  • n_jobs (int, optional) – Number of jobs to run in parallel.

  • random_state (int, optional) – Seed for reproducibility.

max_depth

Maximum depth of the tree.

Type:

int, optional

random

Random number generator.

Type:

Random

X

Training data.

Type:

pandas.DataFrame

y

Target values.

Type:

pandas.Series

att_indexes

Array of attribute indexes.

Type:

numpy.ndarray

att_values

Dictionary of attribute values.

Type:

dict

class_count

Number of classes.

Type:

int

fitness_function

Fitness function for the genetic algorithm.

Type:

function

n_jobs

Number of jobs to run in parallel.

Type:

int

random_state

Seed for reproducibility.

Type:

int

_tree

The fitted tree.

Type:

Node

_best_fitness

List of best fitness values for each iteration.

Type:

list

_avg_fitness

List of average fitness values for each iteration.

Type:

list

static default_fitness_function(root, **fitness_function_kwargs)

Default fitness function for the genetic algorithm.

Parameters:

root (Node) – Root node of the tree.

Returns:

The fitness value.

Return type:

float

fit(X, y, population_size=150, max_iter=2000, mutation_probability=0.1, elite_size=1, selection_tournament_size=2, fitness_function_kwargs={})

Fit a tree to a training set. The population size, maximum iterations, mutation probability, elite size, and selection tournament size can be specified.

Parameters:
  • X (pandas.DataFrame) – Training data.

  • y (pandas.Series) – Target values.

  • population_size (int, optional) – Size of the population.

  • max_iter (int, optional) – Maximum number of iterations.

  • mutation_probability (float, optional) – Probability of mutation.

  • elite_size (int, optional) – Number of elite trees.

  • selection_tournament_size (int, optional) – Number of trees in tournament.

  • fitness_function_kwargs (dict, optional) – Additional kwargs to be passed to the fitness_function.

Returns:

The fitted tree.

Return type:

Node
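
A short usage sketch based on the signatures documented above. The data set, column names and hyperparameter values are made up for illustration, and the import guard (gatree_available) is only there so the snippet degrades gracefully when gatree or pandas is not installed.

```python
import importlib.util


def gatree_available():
    """True when both gatree and pandas can be imported."""
    return all(importlib.util.find_spec(name) is not None
               for name in ("gatree", "pandas"))


def fit_and_predict():
    import pandas as pd
    from gatree.methods.gatreeclassifier import GATreeClassifier

    # Tiny, made-up training set: two numeric features, binary target.
    X = pd.DataFrame({
        "width":  [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
        "height": [0.5, 0.4, 0.6, 0.5, 1.5, 1.6, 1.4, 1.5],
    })
    y = pd.Series([0, 0, 0, 0, 1, 1, 1, 1], name="target")

    clf = GATreeClassifier(max_depth=3, random_state=0)
    # Small population and few iterations keep the demo fast.
    clf.fit(X, y, population_size=20, max_iter=10)
    return clf.predict(X)


if gatree_available():
    preds = fit_and_predict()
    print(len(preds))
else:
    preds = None
    print("gatree or pandas not installed; skipping the demo")
```

Larger values of population_size and max_iter explore more candidate trees at the cost of longer training time.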

set_fit_request(*, elite_size: bool | None | str = '$UNCHANGED$', fitness_function_kwargs: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', mutation_probability: bool | None | str = '$UNCHANGED$', population_size: bool | None | str = '$UNCHANGED$', selection_tournament_size: bool | None | str = '$UNCHANGED$') → GATreeClassifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • elite_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for elite_size parameter in fit.

  • fitness_function_kwargs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fitness_function_kwargs parameter in fit.

  • max_iter (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for max_iter parameter in fit.

  • mutation_probability (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for mutation_probability parameter in fit.

  • population_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for population_size parameter in fit.

  • selection_tournament_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for selection_tournament_size parameter in fit.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GATreeClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.

Returns:

self – The updated object.

Return type:

object

class gatree.methods.gatreeclustering.GATreeClustering(max_depth=None, random=None, fitness_function=None, n_jobs=1, random_state=None, min_clusters=2, max_clusters=10)

Bases: ClusterMixin, GATree

Evolutionary decision tree clustering. GATreeClustering builds a clustering decision tree with a genetic algorithm: a population of trees evolves over multiple generations, and a fitness function scores each tree so that the best trees are selected for crossover and mutation.

Parameters:
  • max_depth (int, optional) – Maximum depth of the tree.

  • random (Random, optional) – Random number generator.

  • fitness_function (function, optional) – Fitness function for the genetic algorithm.

  • n_jobs (int, optional) – Number of jobs to run in parallel.

  • random_state (int, optional) – Seed for reproducibility.

  • min_clusters (int, optional) – Minimum number of clusters.

  • max_clusters (int, optional) – Maximum number of clusters.

max_depth

Maximum depth of the tree.

Type:

int, optional

random

Random number generator.

Type:

Random

X

Training data.

Type:

pandas.DataFrame

y

Target values.

Type:

pandas.Series

att_indexes

Array of attribute indexes.

Type:

numpy.ndarray

att_values

Dictionary of attribute values.

Type:

dict

class_count

Number of classes.

Type:

int

fitness_function

Fitness function for the genetic algorithm.

Type:

function

n_jobs

Number of jobs to run in parallel.

Type:

int

random_state

Seed for reproducibility.

Type:

int

min_clusters

Minimum number of clusters.

Type:

int

max_clusters

Maximum number of clusters.

Type:

int

_tree

The fitted tree.

Type:

Node

_best_fitness

List of best fitness values for each iteration.

Type:

list

_avg_fitness

List of average fitness values for each iteration.

Type:

list

static default_fitness_function(root, **fitness_function_kwargs)

Default fitness function for the genetic algorithm.

Parameters:

root (Node) – Root node of the tree.

Returns:

The fitness value.

Return type:

float

fit(X, population_size=150, max_iter=2000, mutation_probability=0.1, elite_size=1, selection_tournament_size=2, fitness_function_kwargs={})

Fit a tree to a training set. The population size, maximum iterations, mutation probability, elite size, and selection tournament size can be specified.

Parameters:
  • X (pandas.DataFrame) – Training data.

  • population_size (int, optional) – Size of the population.

  • max_iter (int, optional) – Maximum number of iterations.

  • mutation_probability (float, optional) – Probability of mutation.

  • elite_size (int, optional) – Number of elite trees.

  • selection_tournament_size (int, optional) – Number of trees in tournament.

  • fitness_function_kwargs (dict, optional) – Additional kwargs to be passed to the fitness_function.

Returns:

The fitted tree.

Return type:

Node

set_fit_request(*, elite_size: bool | None | str = '$UNCHANGED$', fitness_function_kwargs: bool | None | str = '$UNCHANGED$', max_iter: bool | None | str = '$UNCHANGED$', mutation_probability: bool | None | str = '$UNCHANGED$', population_size: bool | None | str = '$UNCHANGED$', selection_tournament_size: bool | None | str = '$UNCHANGED$') → GATreeClustering

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:
  • elite_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for elite_size parameter in fit.

  • fitness_function_kwargs (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for fitness_function_kwargs parameter in fit.

  • max_iter (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for max_iter parameter in fit.

  • mutation_probability (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for mutation_probability parameter in fit.

  • population_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for population_size parameter in fit.

  • selection_tournament_size (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for selection_tournament_size parameter in fit.

Returns:

self – The updated object.

Return type:

object