A transformer (or feature) recipe is a collection of programmatic steps, the same steps a data scientist would code to build a column transformation. The recipe makes it possible to engineer the transformer both in training and in production. The transformer recipe, and recipes in general, give a data scientist the power to enhance the strengths of H2O DriverlessAI with custom code. These custom recipes can bring in nuanced knowledge about particular domains, e.g. financial crimes, cybersecurity, anomaly detection, etc. They also provide the ability to extend DriverlessAI with custom solutions for time-series problems.
The structure of a recipe that works with DriverlessAI is quite straightforward.
CustomTransformer
The base class that needs to be extended in order to write a recipe. The CustomTransformer class provides the ability to add a customized transformation function. In the following example, we are going to create a transformer that transforms a column into the log10 of that same column. The new, log10-transformed column will be returned to DriverlessAI as a new column to be used for modeling. ExampleLogTransformer is the name of the class for the newly created transformer, and in the parentheses the CustomTransformer class is being extended.
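A minimal sketch of that class declaration, assuming the CustomTransformer base class is imported from h2oaicore.transformer_utils as in H2O's open-source recipe examples:

```python
# Sketch: declaring the recipe class (import path assumed to match
# H2O's open-source driverlessai-recipes examples).
from h2oaicore.transformer_utils import CustomTransformer


class ExampleLogTransformer(CustomTransformer):
    # The transformation logic (capability flags, properties, fit_transform
    # and transform) is filled in over the rest of this post.
    pass
```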
Depending on what kind of outcome the custom transformer is solving for, each of the supported problem types (regression, binary classification, multiclass classification) needs to be enabled or disabled. The following example shows how this can be done.
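A sketch of how those switches might look, assuming the class-level flags _regression, _binary, and _multiclass used by DriverlessAI transformer recipes:

```python
from h2oaicore.transformer_utils import CustomTransformer


class ExampleLogTransformer(CustomTransformer):
    # Problem types this transformer can be applied to; setting a flag to
    # False tells DriverlessAI to skip this transformer for that problem type.
    _regression = True
    _binary = True
    _multiclass = True
```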
In the above example, we are building a log10 transformer, and this transformer is applicable to regression, binary, and multiclass problems. Therefore, we set all of those flags to True.
In this example, we enable the acceptance test by returning True from the do_acceptance_test function, as sketched below.
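A sketch of that function, under the assumption that it is declared as a static method on the recipe class:

```python
from h2oaicore.transformer_utils import CustomTransformer


class ExampleLogTransformer(CustomTransformer):
    @staticmethod
    def do_acceptance_test():
        # Ask DriverlessAI to run its acceptance test against this recipe
        # when the recipe is uploaded.
        return True
```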
The column type, or col_type, can take one of nine different column data types. Please note that if col_type is set to col_type="all", then all the columns in the dataframe are provided to this transformer and no selection of columns occurs.
The min_cols and max_cols parameters take either integers or the string values "all" and "any". If "all" or "any" is used, it should coincide with a col_type of "all" or "any", respectively.
The relative_importance parameter takes a positive value. If the value is greater than 1, the transformer is likely to be used more often than other transformers in the specific experiment (over-representation); if it is less than 1, it is less likely to be used (under-representation); and if it is set to 1 (the default), it is equally likely to be used as other transformers, provided those transformers are also set to a relative importance of 1.
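Putting the three settings together, a sketch of the properties for the log10 transformer, assuming they are declared through the get_default_properties static method used in H2O's open-source recipe examples:

```python
from h2oaicore.transformer_utils import CustomTransformer


class ExampleLogTransformer(CustomTransformer):
    @staticmethod
    def get_default_properties():
        # One numeric input column, weighted equally with other transformers.
        return dict(col_type="numeric",
                    min_cols=1,
                    max_cols=1,
                    relative_importance=1)
```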
In the above example, as we are dealing with a numeric column (recall that we are calculating the log10 of a given column), we set the col_type to numeric. We set min_cols and max_cols to 1, as we need only one column, and the relative_importance to 1.
fit_transform
This function is used to fit the transformation on the training dataset and returns the output column.
transform
This function is used to transform the testing or production dataset, and is always applied after fit_transform.
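A sketch of the two functions for the log10 transformer, assuming numpy is used for the log10 itself and that X arrives as a datatable frame exposing to_pandas():

```python
import numpy as np

from h2oaicore.transformer_utils import CustomTransformer


class ExampleLogTransformer(CustomTransformer):
    def fit_transform(self, X, y=None):
        # Fit on the training data; the response y is available here,
        # although this particular transformer does not need it.
        return self.transform(X)

    def transform(self, X):
        # Convert the datatable frame to pandas and return the log10
        # of the selected column as the new engineered column.
        return np.log10(X.to_pandas())
```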
In the above example, we compose fit_transform and transform for training and testing data, respectively. In fit_transform, the response variable y is available. Here our dataframe is named X. X is converted to a pandas frame using the to_pandas() function, and then the log10 of the column is computed and returned. The to_pandas() conversion is shown here for ease of understanding. A real-world implementation of a log transformer is available at the following link: HyperLink to LogTransformer
example_transform.py
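Assembled, a sketch of what that file might contain, combining the pieces above (the import path, static methods, and flag names are the same assumptions made earlier):

```python
# example_transform.py - sketch of the complete log10 transformer recipe.
import numpy as np

from h2oaicore.transformer_utils import CustomTransformer


class ExampleLogTransformer(CustomTransformer):
    _regression = True
    _binary = True
    _multiclass = True

    @staticmethod
    def do_acceptance_test():
        return True

    @staticmethod
    def get_default_properties():
        return dict(col_type="numeric", min_cols=1, max_cols=1,
                    relative_importance=1)

    def fit_transform(self, X, y=None):
        return self.transform(X)

    def transform(self, X):
        return np.log10(X.to_pandas())
```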
Predict is chosen by right-clicking the dataset. Following this, a target or response variable is set. Then Expert Settings is chosen, and under the recipes settings, example_transform.py is ingested. Want to give it a try? Check out a free demo with the tutorials.