Index · BetaML.jl Documentation

Welcome to the documentation of the Beta Machine Learning toolkit.

About

The BetaML toolkit provides machine learning algorithms written in the Julia programming language.

Aside from the algorithms themselves, BetaML provides many "utility" functions. Because the algorithms are all self-contained in the library itself (you are invited to explore their source code by typing @edit functionOfInterest(par1,par2,...)), the utility functions have APIs that are coordinated with the algorithms, facilitating the preparation of the data for the analysis, the choice of the hyper-parameters and the evaluation of the models. Most models have an interface for the MLJ framework.

Besides Julia, BetaML can be accessed from R using JuliaCall and from Python using PyJulia. See the tutorial for details.

Installation

The BetaML package is included in the standard Julia registry; install it with:

  • ] add BetaML
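
Equivalently, in a script or non-interactive session, the package manager can be invoked programmatically (a standard Julia pattern, not specific to BetaML):

```julia
using Pkg
Pkg.add("BetaML") # same effect as `] add BetaML` at the REPL prompt
```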

Available modules

While BetaML is split into several (sub)modules, all of them are re-exported at the root module level. This means that you can access their functionality by simply typing using BetaML:

```julia
using BetaML
myLayer = DenseLayer(2,3)          # DenseLayer is defined in the Nn submodule
res = KernelPerceptronClassifier() # KernelPerceptronClassifier is defined in the Perceptron submodule
@edit DenseLayer(2,3)              # Open a text editor at the relevant source code
```

Each module is documented on the links below (you can also use the inline Julia help system: just press the question mark ? and then, on the special help prompt help?>, type the function name):

  • BetaML.Perceptron: The Perceptron, Kernel Perceptron and Pegasos classification algorithms;
  • BetaML.Trees: The Decision Trees and Random Forests algorithms for classification or regression (with missing values supported);
  • BetaML.Nn: Implementation of Artificial Neural Networks;
  • BetaML.Clustering: (Hard) clustering algorithms (K-Means, K-Medoids);
  • BetaML.GMM: Various algorithms (clustering, regression, missing imputation / collaborative filtering / recommendation systems) that use a Generative (Gaussian) Mixture Model probabilistic fitter, fitted using an EM algorithm (a minimal usage sketch follows this list);
  • BetaML.Imputation: Imputation algorithms;
  • BetaML.Utils: Various utility functions (scale, one-hot, distances, kernels, PCA, autoencoder, predictions analysis, feature importance, ...).
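
As a minimal sketch of how the GMM module looks in practice (the toy data here is made up, and the `n_classes` hyper-parameter value is an illustrative assumption):

```julia
using BetaML

# Nine made-up 2-dimensional records loosely forming three groups
x = [1.0 10.5; 1.5 10.8; 1.8 8.0; 1.7 15.0; 3.2 40.0;
     3.6 32.0; 3.3 38.0; 5.1 -2.3; 5.2 -2.4]

m = GaussianMixtureClusterer(n_classes=3) # model construction
probs = fit!(m, x) # EM fitting; returns each record's estimated probability of belonging to each class
```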

Available models

Currently BetaML provides the following models:

| BetaML name | MLJ Interface | Category* |
| --- | --- | --- |
| PerceptronClassifier | PerceptronClassifier | Supervised classifier |
| KernelPerceptronClassifier | KernelPerceptronClassifier | Supervised classifier |
| PegasosClassifier | PegasosClassifier | Supervised classifier |
| DecisionTreeEstimator | DecisionTreeClassifier, DecisionTreeRegressor | Supervised regressor and classifier |
| RandomForestEstimator | RandomForestClassifier, RandomForestRegressor | Supervised regressor and classifier |
| NeuralNetworkEstimator | NeuralNetworkRegressor, MultitargetNeuralNetworkRegressor, NeuralNetworkClassifier | Supervised regressor and classifier |
| GaussianMixtureRegressor | GaussianMixtureRegressor, MultitargetGaussianMixtureRegressor | Supervised regressor |
| GaussianMixtureRegressor2 | | Supervised regressor |
| KMeansClusterer | KMeansClusterer | Unsupervised hard clusterer |
| KMedoidsClusterer | KMedoidsClusterer | Unsupervised hard clusterer |
| GaussianMixtureClusterer | GaussianMixtureClusterer | Unsupervised soft clusterer |
| SimpleImputer | SimpleImputer | Unsupervised missing data imputer |
| GaussianMixtureImputer | GaussianMixtureImputer | Unsupervised missing data imputer |
| RandomForestImputer | RandomForestImputer | Unsupervised missing data imputer |
| GeneralImputer | GeneralImputer | Unsupervised missing data imputer |
| MinMaxScaler | | Data transformer |
| StandardScaler | | Data transformer |
| Scaler | | Data transformer |
| PCAEncoder | | Unsupervised dimensionality reduction |
| AutoEncoder | AutoEncoder | Unsupervised non-linear dimensionality reduction |
| OneHotEncoder | | Data transformer |
| OrdinalEncoder | | Data transformer |
| ConfusionMatrix | | Predictions analysis |
| FeatureRanker | | Predictions analysis |

* There is no formal distinction in BetaML between a transformer (or, more generally, a model to assess predictions) and an unsupervised model. They are all treated as unsupervised models that, given some data, learn how to return some useful information, whether a class grouping, a specific transformation or a quality evaluation, as the sketch below shows.
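
For example, a data transformer and a predictions-analysis "model" are both fitted with the same fit! verb (a minimal sketch using calls that also appear in the examples further below; the toy data is made up):

```julia
using BetaML

# A data transformer: `fit!` learns the scaling parameters and directly returns the scaled data
xs = fit!(Scaler(), [1.0 100.0; 2.0 200.0; 3.0 300.0])

# A predictions-analysis "model": `fit!` takes the actual and the predicted labels
cm = ConfusionMatrix()
fit!(cm, ["a","b","a"], ["a","b","b"])
print(cm)
```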

Usage

New to BetaML or even to Julia / Machine Learning altogether? Start from the tutorial!

All models support the (a) model construction (where hyper-parameters and options are chosen), (b) fitting and (c) prediction paradigm. A few models support inverse_predict, for example to go back from the one-hot encoded columns to the original categorical variable (factor).

This paradigm is described in detail in the API V2 page.
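
A minimal sketch of the three steps with a supervised model (the toy data and the `max_depth` hyper-parameter value are illustrative assumptions):

```julia
using BetaML

X = [1.0 2.0; 1.2 1.8; 3.1 3.2; 2.9 3.0] # made-up features
y = ["a", "a", "b", "b"]                 # made-up labels

m = DecisionTreeEstimator(max_depth=2) # (a) construction: hyper-parameters and options
fit!(m, X, y)                          # (b) fitting (returns the in-sample predictions)
ŷ = predict(m, [1.1 1.9])              # (c) prediction on new data
```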

Quick examples

(see the tutorial for a more detailed, step-by-step guide to the examples below and to other examples)

  • Using an Artificial Neural Network for multinomial categorisation

In this example we see how to train a neural network model to predict the species name (5th column) given the sepal and petal measurements (first 4 columns) in the famous iris flower dataset.

```julia
# Load modules
using DelimitedFiles, Random
using Pipe, Plots, BetaML # Load BetaML and other auxiliary modules
Random.seed!(123);        # Fix the random seed (to obtain reproducible results)

# Load the data
iris = readdlm(joinpath(dirname(Base.find_package("BetaML")),"..","test","data","iris.csv"),',',skipstart=1)
x = convert(Array{Float64,2}, iris[:,1:4])
y = convert(Array{String,1}, iris[:,5])

# Encode the categories (levels) of y using a separate column per each category (aka "one-hot" encoding)
ohmod = OneHotEncoder()
y_oh  = fit!(ohmod,y)

# Split the data in training/testing sets
((xtrain,xtest),(ytrain,ytest),(ytrain_oh,ytest_oh)) = partition([x,y,y_oh],[0.8,0.2])
(ntrain, ntest) = size.([xtrain,xtest],1)

# Define the Artificial Neural Network model
l1 = DenseLayer(4,10,f=relu) # The activation function is `ReLU`
l2 = DenseLayer(10,3)        # The activation function is `identity` by default
l3 = VectorFunctionLayer(3,f=softmax) # Add a (parameterless) layer whose activation function (`softmax` in this case) is defined over all its nodes at once
mynn = NeuralNetworkEstimator(layers=[l1,l2,l3],loss=crossentropy,descr="Multinomial logistic regression Model Sepal", batch_size=2, epochs=200) # Build the NN and use the cross-entropy as error function
# Alternatively, switch to hyper-parameters auto-tuning with `autotune=true` instead of specifying `batch_size` and `epochs` manually

# Train the model (using the ADAM optimizer by default)
res = fit!(mynn,fit!(Scaler(),xtrain),ytrain_oh) # Fit the model to the (scaled) data

# Obtain predictions and test them against the ground truth observations
ŷtrain = @pipe predict(mynn,fit!(Scaler(),xtrain)) |> inverse_predict(ohmod,_) # Note the scaling and reverse one-hot encoding functions
ŷtest  = @pipe predict(mynn,fit!(Scaler(),xtest))  |> inverse_predict(ohmod,_)
train_accuracy = accuracy(ŷtrain,ytrain) # 0.975
test_accuracy  = accuracy(ŷtest,ytest)   # 0.96

# Analyse the model performances
cm = ConfusionMatrix()
fit!(cm,ytest,ŷtest)
print(cm)
```
```text
A ConfusionMatrix BetaMLModel (fitted)

-----------------------------------------------------------------
*** CONFUSION MATRIX ***
Scores actual (rows) vs predicted (columns):

4×4 Matrix{Any}:
 "Labels"       "virginica"   "versicolor"   "setosa"
 "virginica"   8             1              0
 "versicolor"  0            14              0
 "setosa"      0             0              7
Normalised scores actual (rows) vs predicted (columns):

4×4 Matrix{Any}:
 "Labels"       "virginica"   "versicolor"   "setosa"
 "virginica"   0.888889      0.111111       0.0
 "versicolor"  0.0           1.0            0.0
 "setosa"      0.0           0.0            1.0

 *** CONFUSION REPORT ***

- Accuracy:               0.9666666666666667
- Misclassification rate: 0.033333333333333326
- Number of classes:      3

  N Class      precision   recall  specificity  f1score  actual_count  predicted_count
                             TPR       TNR                  support

  1 virginica      1.000    0.889        1.000    0.941            9               8
  2 versicolor     0.933    1.000        0.938    0.966           14              15
  3 setosa         1.000    1.000        1.000    1.000            7               7

- Simple   avg.    0.978    0.963        0.979    0.969
- Weigthed avg.    0.969    0.967        0.971    0.966
```
```julia
ϵ = info(mynn)["loss_per_epoch"]
plot(1:length(ϵ), ϵ, xlabel="epochs", ylabel="avg. error", legend=nothing, title="Avg. error per epoch on the Sepal dataset")
heatmap(info(cm)["categories"],info(cm)["categories"],info(cm)["normalised_scores"],c=cgrad([:white,:blue]),xlabel="Predicted",ylabel="Actual",title="Confusion Matrix")
```

[Figures: average error per epoch on the Sepal dataset; confusion matrix heatmap]

  • Using Random forests for regression

In this example we predict, using another classical ML dataset, the miles per gallon of various car models.

Note in particular:

  • (a) how easy it is in Julia to import remote data, even cleaning it without ever saving a local file to disk;
  • (b) how Random Forest models can work directly on data with missing values and on categorical and other non-numerical data in general, without any preprocessing.
```julia
# Load modules
using Random, HTTP, CSV, DataFrames, BetaML, Plots
import Pipe: @pipe
Random.seed!(123)

# Load data
urlData = "https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
data = @pipe HTTP.get(urlData).body                                                       |>
             replace!(_, UInt8('\t') => UInt8(' '))                                       |>
             CSV.File(_, delim=' ', missingstring="?", ignorerepeated=true, header=false) |>
             DataFrame;

# Preprocess data
X = Matrix(data[:,2:8]) # cylinders, displacement, horsepower, weight, acceleration, model year, origin
y = data[:,1]           # miles per gallon
(xtrain,xtest),(ytrain,ytest) = partition([X,y],[0.8,0.2])

# Model definition, hyper-parameters auto-tuning, training and prediction
m = RandomForestEstimator(autotune=true)
ŷtrain = fit!(m,xtrain,ytrain) # shortcut for `fit!(m,xtrain,ytrain); ŷtrain = predict(m,xtrain)`
ŷtest  = predict(m,xtest)

# Prediction assessment
relative_mean_error_train = relative_mean_error(ytrain,ŷtrain) # 0.039
relative_mean_error_test  = relative_mean_error(ytest,ŷtest)   # 0.076
scatter(ytest,ŷtest,xlabel="Actual",ylabel="Estimated",label=nothing,title="Est vs. obs MPG (test set)")
```

[Figure: estimated vs. observed MPG on the test set]

  • Further examples

Finally, you may want to take a look at the "test" folder. While the primary objective of the scripts under the "test" folder is to provide automatic testing of the BetaML toolkit, they can also be used to see how functions should be called, as virtually all functions provided by BetaML are tested there.

Benchmarks

A page summarising some basic benchmarks for BetaML and other leading Julia ML libraries is available here.

Acknowledgements

The development of this package at the Bureau d'Economie Théorique et Appliquée (BETA, Nancy) was supported by the French National Research Agency through the Laboratory of Excellence ARBRE, a part of the “Investissements d'Avenir” Program (ANR 11 – LABX-0002-01).

