It is well known throughout the data science community that data preparation, pre-processing, and feature engineering are among the most cumbersome parts of the data science workload. So as we continue to innovate here at H2O.ai with our end-to-end automated machine learning (AutoML) capabilities, we challenged ourselves to evolve the process of feature engineering into more robust feature transformation.
The H2O AI Cloud enables exploratory data analysis through automated visualizations and insights. This core functionality lets data scientists see feature information, interactions, charts, and plots so they can rapidly understand the dataset they are working with, which accelerates the discovery of the signal-driving features and interactions hidden in their data. The H2O AI Cloud extracts these dynamic interactions and feature transformations automatically, while giving users the ability to turn any of them on or off as they choose, for full control of the evolutionary algorithm driving data intelligence.
Running underneath the surface of H2O.ai’s AutoML is a proprietary evolutionary algorithm that quantitatively experiments with and tests hundreds of feature transformations and their combinations to find signal in the noise. Some of the robust automatic feature engineering capabilities include:
Numeric Transformers: Interactions, Binning, Clustering, Target Encoding, Weight of Evidence, Truncated SVD, DBSCAN, t-SNE, UMAP
Time Series: Date & DateTime, Exponentially Weighted Moving Averages, Lags, Interactions, and Aggregations
Categorical: One Hot Encoding, Cross Validation Target Encoding & Numeric Encoding, Weight of Evidence
Text Transformers: BERT, BiGRU, Text CNN, CharCNN, TF-IDF
Time Transformers: Dates (Days, Months, Years, Seconds, etc.), Holidays
Image Models and Transformers: Image AutoML, Image Vectorizer
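Several of the numeric and categorical encoders listed above rely on target statistics. The following minimal Python sketch shows two of them for a single categorical column: cross-validated (out-of-fold) target encoding and Weight of Evidence. It is not H2O.ai's implementation; the function names, the round-robin fold assignment, the smoothing toward the global mean, and the WoE sign convention are illustrative assumptions.

```python
import math
from collections import defaultdict

# Hypothetical helpers for illustration only; not an H2O.ai API.


def kfold_target_encode(categories, targets, n_folds=3, smoothing=1.0):
    """Out-of-fold mean target encoding.

    Each row is encoded with target statistics computed only from the
    other folds, so a row's own label never leaks into its feature.
    Rows go to folds round-robin (row i -> fold i % n_folds), an
    assumption made here for simplicity.
    """
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        sums, counts = defaultdict(float), defaultdict(int)
        train_targets = []
        # Accumulate per-category statistics from the out-of-fold rows.
        for i in range(n):
            if i % n_folds != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                train_targets.append(targets[i])
        prior = sum(train_targets) / len(train_targets) if train_targets else 0.0
        # Encode the held-out rows, shrinking each category mean toward the
        # prior; categories unseen in the other folds get the prior itself.
        for i in range(n):
            if i % n_folds == fold:
                c = categories[i]
                encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded


def weight_of_evidence(categories, targets, eps=0.5):
    """WoE per category for a binary target (1 = event).

    Uses ln(share of events / share of non-events); some references use
    the inverse ratio, which only flips the sign. eps is added to every
    count so categories with zero events or non-events stay finite.
    """
    events, non_events = defaultdict(float), defaultdict(float)
    for c, y in zip(categories, targets):
        if y == 1:
            events[c] += 1
        else:
            non_events[c] += 1
    cats = set(categories)
    total_events = sum(events.values()) + eps * len(cats)
    total_non_events = sum(non_events.values()) + eps * len(cats)
    return {
        c: math.log(
            ((events[c] + eps) / total_events)
            / ((non_events[c] + eps) / total_non_events)
        )
        for c in cats
    }
```

For example, `kfold_target_encode(["a", "a", "b", "b", "a", "b"], [1, 1, 0, 0, 1, 0])` gives every `"a"` row a higher value than every `"b"` row, and no row's value depends on its own label. Computing the encoding out of fold is the design choice that matters here: encoding a row with a mean that includes its own target would leak the label into the feature.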
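The time series and time transformers in the list (lags, exponentially weighted moving averages, date parts) can be sketched in a similar way. This minimal Python example is an illustration under our own assumptions, not H2O.ai's implementation: the helper names `lag`, `ewma`, and `date_parts` are hypothetical, `None` marks rows without enough history, and holiday features are omitted because they need an external holiday calendar.

```python
from datetime import date

# Hypothetical helpers for illustration only; not an H2O.ai API.


def lag(series, k):
    """Value observed k steps earlier; None where no history exists yet."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return [series[i - k] if i >= k else None for i in range(len(series))]


def ewma(series, alpha):
    """Exponentially weighted moving average.

    s_0 = x_0 and s_t = alpha * x_t + (1 - alpha) * s_{t-1}; a larger
    alpha weights recent observations more heavily.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in series:
        smoothed.append(x if not smoothed else alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def date_parts(d):
    """Calendar features from a date; weekday() counts Monday as 0."""
    return {"year": d.year, "month": d.month, "day": d.day, "weekday": d.weekday()}
```

When these features feed a forecasting model, the value at time t should only use information up to t - 1, so a smoothed feature is typically shifted as well, for example `lag(ewma(series, 0.5), 1)`.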
The H2O AI Cloud accelerates how quickly data scientists and data engineers can analyze and prepare a dataset for modeling, enabling them to build models with more accuracy, speed, and transparency.
Learn more about the latest release, H2O AI Cloud 21.10.