OTB.TrainRegression: Train a classifier from multiple images to perform regression.

This application trains a classifier from multiple input images or a CSV file in order to perform regression. Predictors are composed of the pixel values in each band, optionally centered and reduced using an XML statistics file produced by the ComputeImagesStatistics application. The output value for each predictor is assumed to be the last band (or the last column for CSV files). Training and validation predictor lists are built so that their sizes stay below the maximum bounds given by the user (sample.mt and sample.mv), and their proportion follows the balance parameter (sample.vtr). Several classifier parameters can be set depending on the chosen classifier. In the validation process, the mean square error is computed between the ground truth and the values predicted by the estimated model. This application is based on LibSVM and on OpenCV Machine Learning classifiers, and is compatible with OpenCV 2.3.1 and later.
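As a rough illustration of the predictor layout described above, the sketch below splits one multi-band pixel (or one CSV row) into predictors and target, and optionally centers and reduces the predictors. The function name make_sample and the plain-list inputs are assumptions made for this example; the application itself reads the mean and standard deviation from the XML file passed as io.imstat.

```python
from typing import List, Optional, Sequence, Tuple


def make_sample(
    pixel: Sequence[float],
    mean: Optional[Sequence[float]] = None,
    std: Optional[Sequence[float]] = None,
) -> Tuple[List[float], float]:
    """Split one pixel (or CSV row) into (predictors, target).

    The last value is the target; all preceding values are predictors.
    When per-band mean and std are supplied, each predictor is centered
    (mean subtracted) and reduced (divided by the standard deviation).
    """
    *predictors, target = pixel
    if mean is not None and std is not None:
        predictors = [(x - m) / s for x, m, s in zip(predictors, mean, std)]
    return list(predictors), target


# Hypothetical 3-band pixel: two predictor bands and one target band.
features, target = make_sample([10.0, 20.0, 3.5], mean=[8.0, 16.0], std=[2.0, 4.0])
```
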

Inputs

A list of input images. The first (n-1) bands should contain the predictors. The last band should contain the output value to predict.

format

href

Please set a value for io.il.

Input CSV file containing the predictors, with the output values in the last column. Only used when no input image is given.

format

href

Please set a value for io.csv.

Input XML file containing the mean and the standard deviation of the input images.

format

href

Please set a value for io.imstat.

Maximum number of training predictors (default = 1000) (no limit = -1).

integer

Please set a value for sample.mt.

Maximum number of validation predictors (default = 1000) (no limit = -1).

integer

Please set a value for sample.mv.

Ratio between training and validation samples (0.0 = all training, 1.0 = all validation) (default = 0.5).

number

Please set a value for sample.vtr.
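One possible reading of how sample.mt, sample.mv and sample.vtr interact is sketched below. This is an assumption for illustration, not the sampling code used by the application: the function name split_samples, the shuffling step and the rounding are all hypothetical choices.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(
    samples: Sequence[T],
    max_train: int = 1000,    # sample.mt, -1 means no limit
    max_valid: int = 1000,    # sample.mv, -1 means no limit
    valid_ratio: float = 0.5, # sample.vtr, 0.0 = all training, 1.0 = all validation
    seed: int = 0,
) -> Tuple[List[T], List[T]]:
    """Shuffle the samples, split them by ratio, then cap each list."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(round(len(shuffled) * valid_ratio))
    n_train = len(shuffled) - n_valid
    if max_valid >= 0:
        n_valid = min(n_valid, max_valid)
    if max_train >= 0:
        n_train = min(n_train, max_train)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    return train, valid


train, valid = split_samples(range(5000))
```

With the default bounds, both lists are capped at 1000 entries even though 5000 samples are available.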

Choice of the classifier to use for the training.

string

Please set a value for classifier.

SVM Kernel Type.

string

Please set a value for classifier.libsvm.k.

Type of SVM formulation.

string

Please set a value for classifier.libsvm.m.

SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins.

number

Please set a value for classifier.libsvm.c.

Cost parameter Nu, in the range 0..1. The larger the value, the smoother the decision.

number

Please set a value for classifier.libsvm.nu.

The distance between feature vectors from the training set and the fitting hyper-plane must be less than Epsilon. For outliers, the penalty multiplier is set by C.

number

Please set a value for classifier.libsvm.opt.
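The role of Epsilon and C can be illustrated with the standard epsilon-insensitive loss used in epsilon-SVR. The sketch below is a textbook formulation with hypothetical function names; it is not taken from the LibSVM source, and it ignores the regularisation term of the full objective.

```python
from typing import Sequence


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.001) -> float:
    """Zero inside the epsilon tube, linear in the excess distance outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def svr_penalty(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    c: float = 1.0,
    epsilon: float = 0.001,
) -> float:
    """Total penalty for points outside the tube, scaled by the cost C."""
    return c * sum(
        epsilon_insensitive_loss(t, p, epsilon) for t, p in zip(y_true, y_pred)
    )


# Only the first point lies outside the tube of half-width 0.5.
penalty = svr_penalty([2.0, 1.0], [1.0, 1.0], c=2.0, epsilon=0.5)
```
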

The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned.

integer

Please set a value for classifier.dt.max.

If the number of samples in a node is smaller than this parameter, then this node will not be split.

integer

Please set a value for classifier.dt.min.

If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split further.

number

Please set a value for classifier.dt.ra.
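The three decision tree stopping rules above (maximum depth, minimum sample count, regression accuracy) can be combined as in the following sketch. The function name may_split is hypothetical, and using the mean of the targets as the node estimate is an assumption; the actual OpenCV implementation may differ in detail.

```python
from statistics import mean
from typing import Sequence


def may_split(
    targets: Sequence[float],
    depth: int,
    max_depth: int = 10,         # classifier.dt.max
    min_samples: int = 10,       # classifier.dt.min
    reg_accuracy: float = 0.01,  # classifier.dt.ra
) -> bool:
    """Return True if none of the three stopping rules forbids a split."""
    if depth >= max_depth:
        return False
    if len(targets) < min_samples:
        return False
    estimate = mean(targets)  # assumption: the node estimate is the mean target
    if all(abs(estimate - t) < reg_accuracy for t in targets):
        return False
    return True


homogeneous_node = may_split([1.0] * 20, depth=0)
mixed_node = may_split([0.0, 1.0] * 10, depth=0)
```
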

Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.

integer

Please set a value for classifier.dt.cat.

If cv_folds > 1, the tree is pruned using K-fold cross-validation, where K is equal to cv_folds.

integer

Please set a value for classifier.dt.f.

Type of training method for the multilayer perceptron (MLP) neural network.

string

Please set a value for classifier.dt.r.

The number of neurons in each intermediate layer (excluding input and output layers).

string

Please set a value for classifier.ann.sizes.

This function determines whether the output of the node is positive or not, depending on the output of the transfer function.

string

Please set a value for classifier.ann.f.

Alpha parameter of the activation function (used only with sigmoid and gaussian functions).

number

Please set a value for classifier.ann.a.

Beta parameter of the activation function (used only with sigmoid and gaussian functions).

number

Please set a value for classifier.ann.b.
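To show where Alpha and Beta enter, the sketch below evaluates the three activation choices (ident, sig, gau). The formulas follow the symmetric sigmoid and Gaussian functions as described in the OpenCV neural network documentation; that this service uses exactly these forms is an assumption, and the function name activation is hypothetical.

```python
import math


def activation(x: float, kind: str = "sig", alpha: float = 1.0, beta: float = 1.0) -> float:
    """Evaluate one of the three activation functions at x."""
    if kind == "ident":
        # Identity: alpha and beta are not used.
        return x
    if kind == "sig":
        # Symmetric sigmoid: beta * (1 - exp(-alpha*x)) / (1 + exp(-alpha*x))
        e = math.exp(-alpha * x)
        return beta * (1.0 - e) / (1.0 + e)
    if kind == "gau":
        # Gaussian: beta * exp(-alpha * x^2)
        return beta * math.exp(-alpha * x * x)
    raise ValueError(f"unknown activation: {kind}")


sig_at_one = activation(1.0, "sig")
```

With alpha = beta = 1, the symmetric sigmoid equals tanh(x / 2), so its output ranges over (-1, 1) rather than (0, 1).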

Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1.

number

Please set a value for classifier.ann.bpdw.

Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. A value of about 0.1 is usually sufficient.

number

Please set a value for classifier.ann.bpms.
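The two BACKPROP parameters can be read as the coefficients of a gradient step with momentum. The sketch below applies that textbook update rule to a single weight; the function name backprop_step and the toy objective are assumptions for illustration, not the OpenCV training loop.

```python
from typing import Tuple


def backprop_step(
    weight: float,
    gradient: float,
    previous_delta: float,
    bpdw: float = 0.1,  # strength of the weight gradient term
    bpms: float = 0.1,  # strength of the momentum term
) -> Tuple[float, float]:
    """Apply one weight update and return (new_weight, applied_delta)."""
    delta = -bpdw * gradient + bpms * previous_delta
    return weight + delta, delta


# Toy example: minimise f(w) = w**2, whose gradient is 2*w, starting at w = 1.
w, delta = 1.0, 0.0
for _ in range(100):
    w, delta = backprop_step(w, 2.0 * w, delta)
```
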

Initial value Delta_0 of update-values Delta_

number

Please set a value for classifier.ann.rdw.

Update-values lower limit Delta_

number

Please set a value for classifier.ann.rdwm.

Termination criteria.

string

Please set a value for classifier.ann.term.

Epsilon value used in the Termination criteria.

number

Please set a value for classifier.ann.eps.

Maximum number of iterations used in the Termination criteria.

integer

Please set a value for classifier.ann.iter.
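The three termination parameters can be combined as in the sketch below. Treating "all" as "stop as soon as either criterion is met" is an assumption based on the usual OpenCV convention for combined criteria; the function name should_stop and the notion of a scalar change per iteration are hypothetical.

```python
def should_stop(
    iteration: int,
    change: float,
    term: str = "all",      # classifier.ann.term: "iter", "eps" or "all"
    max_iter: int = 1000,   # classifier.ann.iter
    epsilon: float = 0.01,  # classifier.ann.eps
) -> bool:
    """Decide whether training should stop after the given iteration."""
    hit_iter = iteration >= max_iter
    hit_eps = change < epsilon
    if term == "iter":
        return hit_iter
    if term == "eps":
        return hit_eps
    if term == "all":
        return hit_iter or hit_eps
    raise ValueError(f"unknown termination criteria: {term}")


stop_early = should_stop(iteration=10, change=0.001)
```
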

The depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.

integer

Please set a value for classifier.rf.max.

If the number of samples in a node is smaller than this parameter, then the node will not be split. A reasonable value is a small percentage of the total data e.g. 1 percent.

integer

Please set a value for classifier.rf.min.

If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split.

number

Please set a value for classifier.rf.ra.

Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.

integer

Please set a value for classifier.rf.cat.

The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). If you set it to 0, then the size will be set to the square root of the total number of features.

integer

Please set a value for classifier.rf.var.
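The effect of this parameter on feature selection at a node can be sketched as follows. The function name feature_subset is hypothetical, and truncating the square root to an integer is an assumption; the underlying library may round differently.

```python
import math
import random
from typing import List, Optional


def feature_subset(
    n_features: int,
    var: int = 0,  # classifier.rf.var, 0 means "use the square root"
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Pick the indices of the features considered for one split."""
    rng = rng or random.Random()
    size = var if var > 0 else max(1, int(math.sqrt(n_features)))
    size = min(size, n_features)
    return sorted(rng.sample(range(n_features), size))


default_subset = feature_subset(16, rng=random.Random(42))
```
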

The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Keep in mind that increasing the number of trees also increases the prediction time linearly.

integer

Please set a value for classifier.rf.nbtrees.

Sufficient accuracy (OOB error).

number

Please set a value for classifier.rf.acc.

The number of neighbors to use.

integer

Please set a value for classifier.knn.k.

Decision rule for regression output.

string

Please set a value for classifier.knn.rule.
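The two KNN parameters can be illustrated with a brute-force nearest-neighbour regressor. This is a minimal sketch assuming Euclidean distance; the function name knn_predict and the sample data are hypothetical.

```python
import math
from statistics import mean, median
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (predictors, target)


def knn_predict(train: List[Sample], query: Sequence[float], k: int = 32, rule: str = "mean") -> float:
    """Predict a value from the k training samples closest to the query."""
    if rule not in ("mean", "median"):
        raise ValueError(f"unknown decision rule: {rule}")
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    values = [target for _, target in nearest]
    return mean(values) if rule == "mean" else median(values)


train_set = [([0.0], 1.0), ([1.0], 2.0), ([2.0], 9.0), ([10.0], 100.0)]
by_mean = knn_predict(train_set, [0.5], k=3, rule="mean")
by_median = knn_predict(train_set, [0.5], k=3, rule="median")
```

The median rule is less sensitive to an outlying neighbour value such as 9.0 in this toy data.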

Set a specific seed with an integer value.

integer

Outputs

Output file containing the model estimated (.txt format).

format

transmission

Mean square error computed with the validation predictors.

transmission
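For reference, the mean square error reported as io.mse is the average of the squared differences between the ground truth and the predicted values. The sketch below is a plain implementation of that definition; the function name and sample values are hypothetical.

```python
from typing import Sequence


def mean_square_error(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between ground truth and predictions."""
    if not truth or len(truth) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)


mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```
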

Execution options

successUri

inProgressUri

failedUri

format

mode

Execute End Point

View the execution endpoint of a process.

View the alternative version in HTML.
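A request to the execute endpoint could be assembled as in the sketch below. The body layout (inputs, outputs, response) and the Prefer header follow the OGC API Processes conventions; whether this server accepts exactly this payload is an assumption, the image URL is a hypothetical placeholder, and the request is only built here, not sent.

```python
import json
import urllib.request
from typing import List

EXECUTE_URL = "http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainRegression/execution"


def build_execute_request(
    image_urls: List[str],
    classifier: str = "rf",
    asynchronous: bool = True,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the execute endpoint."""
    body = {
        "inputs": {
            # io.il accepts a list of images, passed here by reference.
            "io.il": [{"href": url, "type": "image/tiff"} for url in image_urls],
            "classifier": classifier,
            "sample.vtr": 0.5,
        },
        "outputs": {"io.out": {"transmissionMode": "reference"}},
        "response": "document",
    }
    headers = {"Content-Type": "application/json"}
    if asynchronous:
        # Ask the server to run the job in async-execute mode.
        headers["Prefer"] = "respond-async"
    return urllib.request.Request(
        EXECUTE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical input image URL, for illustration only.
request = build_execute_request(["https://example.com/hypothetical_scene.tif"])
# urllib.request.urlopen(request) would submit the job; it is not called here.
```
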

{"id": "OTB.TrainRegression", "title": "Train a classifier from multiple images to perform regression.", "description": "This application trains a classifier from multiple input images or a csv file, in order to perform regression. Predictors are composed of pixel values in each band optionally centered and reduced using an XML statistics file produced by the ComputeImagesStatistics application. The output value for each predictor is assumed to be the last band (or the last column for CSV files). Training and validation predictor lists are built such that their size is inferior to maximum bounds given by the user, and the proportion corresponds to the balance parameter. Several classifier parameters can be set depending on the chosen classifier. In the validation process, the mean square error is computed between the ground truth and the estimated model. This application is based on LibSVM and on OpenCV Machine Learning classifiers, and is compatible with OpenCV 2.3.1 and later.", "version": "1.0.0", "jobControlOptions": ["sync-execute", "async-execute", "dismiss"], "outputTransmission": ["value", "reference"], "links": [{"rel": "http://www.opengis.net/def/rel/ogc/1.0/execute", "type": "application/json", "title": "Execute End Point", "href": "http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainRegression/execution"}, {"rel": "alternate", "type": "text/html", "title": "Execute End Point", "href": "http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainRegression/execution.html"}], "inputs": {"io.il": {"title": "A list of input images. First (n-1) bands should contain the predictor. The last band should contain the output value to predict.", "description": "A list of input images. First (n-1) bands should contain the predictor. 
The last band should contain the output value to predict.", "maxOccurs": 1024, "extended-schema": {"type": "array", "minItems": 1, "maxItems": 1024, "items": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["image/tiff", "image/jpeg", "image/png"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "image/tiff"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/png"}]}}}]}}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "image/tiff"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/png"}]}, "id": "io.il"}, "io.csv": {"title": "Input CSV file containing the predictors, and the output values in last column. Only used when no input image is given", "description": "Input CSV file containing the predictors, and the output values in last column. 
Only used when no input image is given", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["image/tiff", "image/jpeg", "image/png"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "image/tiff"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/png"}]}}}], "nullable": true}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "image/tiff"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/png"}]}, "id": "io.csv"}, "io.imstat": {"title": "Input XML file containing the mean and the standard deviation of the input images.", "description": "Input XML file containing the mean and the standard deviation of the input images.", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/xml"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}]}}}], "nullable": true}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}]}, "id": "io.imstat"}, "sample.mt": {"title": "Maximum number of training predictors (default = 1000) (no limit = -1).", "description": "Maximum number of training predictors (default = 1000) (no limit = -1).", "schema": {"type": "integer", "default": 1000}, "id": "sample.mt"}, "sample.mv": {"title": "Maximum number of validation predictors (default = 1000) (no limit = -1).", "description": "Maximum number of validation predictors (default = 1000) (no limit = -1).", "schema": 
{"type": "integer", "default": 1000}, "id": "sample.mv"}, "sample.vtr": {"title": "Ratio between training and validation samples (0.0 = all training, 1.0 = all validation) (default = 0.5).", "description": "Ratio between training and validation samples (0.0 = all training, 1.0 = all validation) (default = 0.5).", "schema": {"type": "number", "default": 0.5, "format": "double"}, "id": "sample.vtr"}, "classifier": {"title": "Choice of the classifier to use for the training.", "description": "Choice of the classifier to use for the training.", "schema": {"type": "string", "default": "libsvm", "enum": ["libsvm", "dt", "ann", "rf", "knn"]}, "id": "classifier"}, "classifier.libsvm.k": {"title": "SVM Kernel Type.", "description": "SVM Kernel Type.", "schema": {"type": "string", "default": "linear", "enum": ["linear", "rbf", "poly", "sigmoid"]}, "id": "classifier.libsvm.k"}, "classifier.libsvm.m": {"title": "Type of SVM formulation.", "description": "Type of SVM formulation.", "schema": {"type": "string", "default": "epssvr", "enum": ["epssvr", "nusvr"]}, "id": "classifier.libsvm.m"}, "classifier.libsvm.c": {"title": "SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins.", "description": "SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins.", "schema": {"type": "number", "default": 1, "format": "double"}, "id": "classifier.libsvm.c"}, "classifier.libsvm.nu": {"title": "Cost parameter Nu, in the range 0..1, the larger the value, the smoother the decision.", "description": "Cost parameter Nu, in the range 0..1, the larger the value, the smoother the decision.", "schema": {"type": "number", "default": 0.5, "format": "double"}, "id": "classifier.libsvm.nu"}, "classifier.libsvm.opt": {"title": "The distance between feature vectors from the training set and the fitting hyper-plane must be less than Epsilon. 
For outliersthe penalty mutliplier is set by C.", "description": "The distance between feature vectors from the training set and the fitting hyper-plane must be less than Epsilon. For outliersthe penalty mutliplier is set by C.", "schema": {"type": "number", "default": 0.001, "format": "double"}, "id": "classifier.libsvm.opt"}, "classifier.dt.max": {"title": "The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned.", "description": "The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned.", "schema": {"type": "integer", "default": 10}, "id": "classifier.dt.max"}, "classifier.dt.min": {"title": "If the number of samples in a node is smaller than this parameter, then this node will not be split.", "description": "If the number of samples in a node is smaller than this parameter, then this node will not be split.", "schema": {"type": "integer", "default": 10}, "id": "classifier.dt.min"}, "classifier.dt.ra": {"title": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split further.", "description": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split further.", "schema": {"type": "number", "default": 0.01, "format": "double"}, "id": "classifier.dt.ra"}, "classifier.dt.cat": {"title": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "description": "Cluster possible values of a categorical 
variable into K <= cat clusters to find a suboptimal split.", "schema": {"type": "integer", "default": 10}, "id": "classifier.dt.cat"}, "classifier.dt.f": {"title": "If cv_folds > 1, then it prunes a tree with K-fold cross-validation where K is equal to cv_folds.", "description": "If cv_folds > 1, then it prunes a tree with K-fold cross-validation where K is equal to cv_folds.", "schema": {"type": "integer", "default": 0}, "id": "classifier.dt.f"}, "classifier.dt.r": {"title": "Type of training method for the multilayer perceptron (MLP) neural network.", "description": "Type of training method for the multilayer perceptron (MLP) neural network.", "schema": {"type": "string", "default": "reg", "enum": ["back", "reg"]}, "id": "classifier.dt.r"}, "classifier.ann.sizes": {"title": "The number of neurons in each intermediate layer (excluding input and output layers).", "description": "The number of neurons in each intermediate layer (excluding input and output layers).", "maxOccurs": 1024, "schema": {"type": "string", "default": "Any value"}, "id": "classifier.ann.sizes"}, "classifier.ann.f": {"title": "This function determine whether the output of the node is positive or not depending on the output of the transfert function.", "description": "This function determine whether the output of the node is positive or not depending on the output of the transfert function.", "schema": {"type": "string", "default": "sig", "enum": ["ident", "sig", "gau"]}, "id": "classifier.ann.f"}, "classifier.ann.a": {"title": "Alpha parameter of the activation function (used only with sigmoid and gaussian functions).", "description": "Alpha parameter of the activation function (used only with sigmoid and gaussian functions).", "schema": {"type": "number", "default": 1, "format": "double"}, "id": "classifier.ann.a"}, "classifier.ann.b": {"title": "Beta parameter of the activation function (used only with sigmoid and gaussian functions).", "description": "Beta parameter of the activation 
function (used only with sigmoid and gaussian functions).", "schema": {"type": "number", "default": 1, "format": "double"}, "id": "classifier.ann.b"}, "classifier.ann.bpdw": {"title": "Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1.", "description": "Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1.", "schema": {"type": "number", "default": 0.1, "format": "double"}, "id": "classifier.ann.bpdw"}, "classifier.ann.bpms": {"title": "Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. The value 0.1 or so is good enough.", "description": "Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. 
The value 0.1 or so is good enough.", "schema": {"type": "number", "default": 0.1, "format": "double"}, "id": "classifier.ann.bpms"}, "classifier.ann.rdw": {"title": "Initial value Delta_0 of update-values Delta_", "description": "Initial value Delta_0 of update-values Delta_", "schema": {"type": "number", "default": 0.1, "format": "double"}, "id": "classifier.ann.rdw"}, "classifier.ann.rdwm": {"title": "Update-values lower limit Delta_", "description": "Update-values lower limit Delta_", "schema": {"type": "number", "default": 1e-07, "format": "double"}, "id": "classifier.ann.rdwm"}, "classifier.ann.term": {"title": "Termination criteria.", "description": "Termination criteria.", "schema": {"type": "string", "default": "all", "enum": ["iter", "eps", "all"]}, "id": "classifier.ann.term"}, "classifier.ann.eps": {"title": "Epsilon value used in the Termination criteria.", "description": "Epsilon value used in the Termination criteria.", "schema": {"type": "number", "default": 0.01, "format": "double"}, "id": "classifier.ann.eps"}, "classifier.ann.iter": {"title": "Maximum number of iterations used in the Termination criteria.", "description": "Maximum number of iterations used in the Termination criteria.", "schema": {"type": "integer", "default": 1000}, "id": "classifier.ann.iter"}, "classifier.rf.max": {"title": "The depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.", "description": "The depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.", "schema": {"type": "integer", "default": 5}, "id": "classifier.rf.max"}, "classifier.rf.min": {"title": "If the number of samples in a node is smaller than this parameter, then the node will not be split. 
A reasonable value is a small percentage of the total data e.g. 1 percent.", "description": "If the number of samples in a node is smaller than this parameter, then the node will not be split. A reasonable value is a small percentage of the total data e.g. 1 percent.", "schema": {"type": "integer", "default": 10}, "id": "classifier.rf.min"}, "classifier.rf.ra": {"title": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split.", "description": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split.", "schema": {"type": "number", "default": 0, "format": "double"}, "id": "classifier.rf.ra"}, "classifier.rf.cat": {"title": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "description": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "schema": {"type": "integer", "default": 10}, "id": "classifier.rf.cat"}, "classifier.rf.var": {"title": "The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). If you set it to 0, then the size will be set to the square root of the total number of features.", "description": "The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). If you set it to 0, then the size will be set to the square root of the total number of featurLiteralData", "schema": {"type": "integer", "default": 0}, "id": "classifier.rf.var"}, "classifier.rf.nbtrees": {"title": "The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. 
However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Also to keep in mind, increasing the number of trees increases the prediction time linearly.", "description": "The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Also to keep in mind, increasing the number of trees increases the prediction time linearly.", "schema": {"type": "integer", "default": 100}, "id": "classifier.rf.nbtrees"}, "classifier.rf.acc": {"title": "Sufficient accuracy (OOB error).", "description": "Sufficient accuracy (OOB error).", "schema": {"type": "number", "default": 0.01, "format": "double"}, "id": "classifier.rf.acc"}, "classifier.knn.k": {"title": "The number of neighbors to use.", "description": "The number of neighbors to use.", "schema": {"type": "integer", "default": 32}, "id": "classifier.knn.k"}, "classifier.knn.rule": {"title": "Decision rule for regression output", "description": "Decision rule for regression output", "schema": {"type": "string", "default": "mean", "enum": ["mean", "median"]}, "id": "classifier.knn.rule"}, "rand": {"title": "Set specific seed. with integer value.", "description": "Set specific seed. 
with integer value.", "schema": {"type": "integer", "nullable": true}, "id": "rand"}}, "outputs": {"io.out": {"title": "Output file containing the model estimated (.txt format).", "description": "Output file containing the model estimated (.txt format).", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/xml", "text/plain"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/plain"}]}}}]}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/plain"}]}, "id": "io.out"}, "io.mse": {"title": "Mean square error computed with the validation predictors", "description": "Mean square error computed with the validation predictors", "schema": {"type": "number"}, "id": "io.mse"}}}


http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainRegression.html

Last modified: Sat Dec 4 00:09:36 CET 2021
