OTB.TrainImagesClassifier: Train a classifier from multiple pairs of images and training vector data.
This application trains a classifier from multiple pairs of input images and training vector data. Samples are composed of the pixel values in each band, optionally centered and reduced using an XML statistics file produced by the ComputeImagesStatistics application. The training vector data must contain polygons with a positive integer field representing the class label; the name of this field can be set with the "Class label field" parameter.
Training and validation sample lists are built so that each class is equally represented in both lists. One parameter controls the ratio between the number of samples in the training and validation sets, and two parameters limit the size of the training and validation sets per class and per image. Further parameters are available depending on the chosen classifier.
During validation, the confusion matrix is organized as follows: rows = reference labels, columns = produced labels. In the header of the optional confusion matrix output file, the validation (reference) and predicted (produced) class labels are listed in the same order as the rows and columns of the confusion matrix. This application is based on LibSVM, OpenCV Machine Learning (2.3.1 and later), and Shark ML. Its output is a text model file whose format depends on the chosen ML model type; no image or vector data is output.
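The confusion matrix convention stated in the description (rows = reference labels, columns = produced labels) can be illustrated with a small sketch. This is not the application's own code: the function name `confusion_matrix`, the sorted label ordering, and the sample labels are assumptions made for illustration only.

```python
def confusion_matrix(reference, produced):
    """Count label pairs with rows = reference labels, columns = produced labels."""
    if len(reference) != len(produced):
        raise ValueError("reference and produced must have the same length")
    # Assumption: labels are ordered by increasing value; the same order
    # is used for both the rows and the columns.
    labels = sorted(set(reference) | set(produced))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for ref, prod in zip(reference, produced):
        matrix[index[ref]][index[prod]] += 1
    return labels, matrix


# Hypothetical validation samples: true class vs. class predicted by the model.
reference = [1, 1, 2, 2, 3, 3]
produced = [1, 2, 2, 2, 3, 1]

labels, matrix = confusion_matrix(reference, produced)
for label, row in zip(labels, matrix):
    print(f"reference {label}: {row}")

# Overall accuracy is the diagonal sum divided by the number of samples.
correct = sum(matrix[i][i] for i in range(len(labels)))
print("overall accuracy:", correct / len(reference))
```

Reading along a row shows how the samples of one reference class were distributed over the produced classes, so off-diagonal entries in a row are omissions for that class.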
{"id": "OTB.TrainImagesClassifier", "title": "Train a classifier from multiple pairs of images and training vector data.", "description": "This application performs a classifier training from multiple pairs of input images and training vector data. Samples are composed of pixel values in each band optionally centered and reduced using an XML statistics file produced by the ComputeImagesStatistics application. The training vector data must contain polygons with a positive integer field representing the class label. The name of this field can be set using the \"Class label field\" parameter. Training and validation sample lists are built such that each class is equally represented in both lists. One parameter allows controlling the ratio between the number of samples in training and validation sets. Two parameters allow managing the size of the training and validation sets per class and per image. Several classifier parameters can be set depending on the chosen classifier. In the validation process, the confusion matrix is organized the following way: rows = reference labels, columns = produced labels. In the header of the optional confusion matrix output file, the validation (reference) and predicted (produced) class labels are ordered according to the rows/columns of the confusion matrix. This application is based on LibSVM, OpenCV Machine Learning (2.3.1 and later), and Shark ML. The output of this application is a text model file, whose format corresponds to the ML model type chosen. 
There is no image nor vector data output.", "version": "1.0.0", "jobControlOptions": ["sync-execute", "async-execute", "dismiss"], "outputTransmission": ["value", "reference"], "links": [{"rel": "http://www.opengis.net/def/rel/ogc/1.0/execute", "type": "application/json", "title": "Execute End Point", "href": "http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainImagesClassifier/execution"}, {"rel": "alternate", "type": "text/html", "title": "Execute End Point", "href": "http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainImagesClassifier/execution.html"}], "inputs": {"io.il": {"title": "A list of input images.", "description": "A list of input images.", "maxOccurs": 1024, "extended-schema": {"type": "array", "minItems": 1, "maxItems": 1024, "items": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["image/tiff", "image/jpeg", "image/png"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "image/tiff"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/png"}]}}}]}}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "image/tiff"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/jpeg"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "image/png"}]}, "id": "io.il"}, "io.vd": {"title": "A list of vector data to select the training samples.", "description": "A list of vector data to select the training samples.", "maxOccurs": 1024, "extended-schema": {"type": "array", "minItems": 1, "maxItems": 1024, "items": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/xml", "application/vnd.google-earth.kml+xml", "application/json", 
"application/zip"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "application/vnd.google-earth.kml+xml"}, {"type": "object"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "application/zip"}]}}}]}}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "application/vnd.google-earth.kml+xml"}, {"type": "object"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "application/zip"}]}, "id": "io.vd"}, "io.valid": {"title": "A list of vector data to select the validation samples.", "description": "A list of vector data to select the validation samples.", "maxOccurs": 1024, "extended-schema": {"type": "array", "minItems": 0, "maxItems": 1024, "items": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/xml", "application/vnd.google-earth.kml+xml", "application/json", "application/zip"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "application/vnd.google-earth.kml+xml"}, {"type": "object"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "application/zip"}]}}}]}, "nullable": true}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "application/vnd.google-earth.kml+xml"}, {"type": "object"}, {"type": "string", "contentEncoding": "base64", "contentMediaType": "application/zip"}]}, "id": "io.valid"}, "io.imstat": {"title": "XML file containing mean 
and variance of each feature.", "description": "XML file containing mean and variance of each feature.", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/xml"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}]}}}], "nullable": true}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}]}, "id": "io.imstat"}, "cleanup": {"title": "Maximum size per class (in pixels) of the training sample list (default = 1000) (no limit = -1). If equal to -1, then the maximal size of the available training sample list per class will be equal to the surface area of the smallest class multiplied by the training sample ratio.", "description": "Maximum size per class (in pixels) of the training sample list (default = 1000) (no limit = -1). If equal to -1, then the maximal size of the available training sample list per class will be equal to the surface area of the smallest class multiplied by the training sample ratio.", "schema": {"type": "integer", "default": 1000}, "id": "cleanup"}, "sample.mv": {"title": "Maximum size per class (in pixels) of the validation sample list (default = 1000) (no limit = -1). If equal to -1, then the maximal size of the available validation sample list per class will be equal to the surface area of the smallest class multiplied by the validation sample ratio.", "description": "Maximum size per class (in pixels) of the validation sample list (default = 1000) (no limit = -1). 
If equal to -1, then the maximal size of the available validation sample list per class will be equal to the surface area of the smallest class multiplied by the validation sample ratio.", "schema": {"type": "integer", "default": 1000}, "id": "sample.mv"}, "sample.bm": {"title": "Bound the number of samples for each class by the number of available samples by the smaller class. Proportions between training and validation are respected. Default is true (=1).", "description": "Bound the number of samples for each class by the number of available samples by the smaller class. Proportions between training and validation are respected. Default is true (=1).", "schema": {"type": "integer", "default": 1}, "id": "sample.bm"}, "sample.vtr": {"title": "Ratio between training and validation samples (0.0 = all training, 1.0 = all validation) (default = 0.5).", "description": "Ratio between training and validation samples (0.0 = all training, 1.0 = all validation) (default = 0.5).", "schema": {"type": "number", "default": 0.5, "format": "double"}, "id": "sample.vtr"}, "sample.vfn": {"title": "Field containing the class id for supervision. The values in this field shall be cast into integers.", "description": "Field containing the class id for supervision. The values in this field shall be cast into integers.", "maxOccurs": 1024, "schema": {"type": "string", "default": "Any value"}, "id": "sample.vfn"}, "ram": {"title": "Available memory for processing (in MB)", "description": "Available memory for processing (in MB)", "schema": {"type": "integer", "default": 128, "nullable": true}, "id": "ram"}, "elev.dem": {"title": "This parameter allows selecting a directory containing Digital Elevation Model files. Note that this directory should contain only DEM files. Unexpected behaviour might occurs if other images are found in this directory.", "description": "This parameter allows selecting a directory containing Digital Elevation Model files. 
Note that this directory should contain only DEM files. Unexpected behaviour might occurs if other images are found in this directory.", "schema": {"type": "string", "default": "Any value", "nullable": true}, "id": "elev.dem"}, "elev.geoid": {"title": "Use a geoid grid to get the height above the ellipsoid in case there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles. A version of the geoid can be found on the OTB website(https://gitlab.orfeo-toolbox.org/orfeotoolbox/otb-data/blob/master/Input/DEM/egm96.grd).", "description": "Use a geoid grid to get the height above the ellipsoid in case there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles. A version of the geoid can be found on the OTB website(https://gitlab.orfeo-toolbox.org/orfeotoolbox/otb-data/blob/master/Input/DEM/egm96.grd).", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["application/octet-stream"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "application/octet-stream"}]}}}], "nullable": true}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "base64", "contentMediaType": "application/octet-stream"}]}, "id": "elev.geoid"}, "elev.default": {"title": "This parameter allows setting the default height above ellipsoid when there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles, and no geoid file has been set. This is also used by some application as an average elevation value.", "description": "This parameter allows setting the default height above ellipsoid when there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles, and no geoid file has been set. 
This is also used by some application as an average elevation value.", "schema": {"type": "number", "default": 0, "format": "double"}, "id": "elev.default"}, "classifier": {"title": "Choice of the classifier to use for the training.", "description": "Choice of the classifier to use for the training.", "schema": {"type": "string", "default": "libsvm", "enum": ["libsvm", "boost", "dt", "ann", "bayes", "rf", "knn"]}, "id": "classifier"}, "classifier.libsvm.k": {"title": "SVM Kernel Type.", "description": "SVM Kernel Type.", "schema": {"type": "string", "default": "linear", "enum": ["linear", "rbf", "poly", "sigmoid"]}, "id": "classifier.libsvm.k"}, "classifier.libsvm.m": {"title": "Type of SVM formulation.", "description": "Type of SVM formulation.", "schema": {"type": "string", "default": "csvc", "enum": ["csvc", "nusvc", "oneclass"]}, "id": "classifier.libsvm.m"}, "classifier.libsvm.c": {"title": "SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins.", "description": "SVM models have a cost parameter C (1 by default) to control the trade-off between training errors and forcing rigid margins.", "schema": {"type": "number", "default": 1, "format": "double"}, "id": "classifier.libsvm.c"}, "classifier.libsvm.nu": {"title": "Cost parameter Nu, in the range 0..1, the larger the value, the smoother the decision.", "description": "Cost parameter Nu, in the range 0..1, the larger the value, the smoother the decision.", "schema": {"type": "number", "default": 0.5, "format": "double"}, "id": "classifier.libsvm.nu"}, "classifier.libsvm.opt": {"title": "Type of Boosting algorithm.", "description": "Type of Boosting algorithm.", "schema": {"type": "string", "default": "real", "enum": ["discrete", "real", "logit", "gentle"]}, "id": "classifier.libsvm.opt"}, "classifier.boost.w": {"title": "The number of weak classifiers.", "description": "The number of weak classifiers.", "schema": {"type": "integer", 
"default": 100}, "id": "classifier.boost.w"}, "classifier.boost.r": {"title": "A threshold between 0 and 1 used to save computational time. Samples with summary weight <= (1 - weight_trim_rate) do not participate in the next iteration of training. Set this parameter to 0 to turn off this functionality.", "description": "A threshold between 0 and 1 used to save computational time. Samples with summary weight <= (1 - weight_trim_rate) do not participate in the next iteration of training. Set this parameter to 0 to turn off this functionality.", "schema": {"type": "number", "default": 0.95, "format": "double"}, "id": "classifier.boost.r"}, "classifier.boost.m": {"title": "Maximum depth of the tree.", "description": "Maximum depth of the tree.", "schema": {"type": "integer", "default": 1}, "id": "classifier.boost.m"}, "classifier.dt.max": {"title": "The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned.", "description": "The training algorithm attempts to split each node while its depth is smaller than the maximum possible depth of the tree. 
The actual depth may be smaller if the other termination criteria are met, and/or if the tree is pruned.", "schema": {"type": "integer", "default": 10}, "id": "classifier.dt.max"}, "classifier.dt.min": {"title": "If the number of samples in a node is smaller than this parameter, then this node will not be split.", "description": "If the number of samples in a node is smaller than this parameter, then this node will not be split.", "schema": {"type": "integer", "default": 10}, "id": "classifier.dt.min"}, "classifier.dt.ra": {"title": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split further.", "description": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split further.", "schema": {"type": "number", "default": 0.01, "format": "double"}, "id": "classifier.dt.ra"}, "classifier.dt.cat": {"title": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "description": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "schema": {"type": "integer", "default": 10}, "id": "classifier.dt.cat"}, "classifier.dt.f": {"title": "If cv_folds > 1, then it prunes a tree with K-fold cross-validation where K is equal to cv_folds.", "description": "If cv_folds > 1, then it prunes a tree with K-fold cross-validation where K is equal to cv_folds.", "schema": {"type": "integer", "default": 0}, "id": "classifier.dt.f"}, "classifier.dt.r": {"title": "Type of training method for the multilayer perceptron (MLP) neural network.", "description": "Type of training method for the multilayer perceptron (MLP) neural network.", "schema": {"type": "string", "default": "reg", "enum": ["back", "reg"]}, "id": 
"classifier.dt.r"}, "classifier.ann.sizes": {"title": "The number of neurons in each intermediate layer (excluding input and output layers).", "description": "The number of neurons in each intermediate layer (excluding input and output layers).", "maxOccurs": 1024, "schema": {"type": "string", "default": "Any value"}, "id": "classifier.ann.sizes"}, "classifier.ann.f": {"title": "This function determine whether the output of the node is positive or not depending on the output of the transfert function.", "description": "This function determine whether the output of the node is positive or not depending on the output of the transfert function.", "schema": {"type": "string", "default": "sig", "enum": ["ident", "sig", "gau"]}, "id": "classifier.ann.f"}, "classifier.ann.a": {"title": "Alpha parameter of the activation function (used only with sigmoid and gaussian functions).", "description": "Alpha parameter of the activation function (used only with sigmoid and gaussian functions).", "schema": {"type": "number", "default": 1, "format": "double"}, "id": "classifier.ann.a"}, "classifier.ann.b": {"title": "Beta parameter of the activation function (used only with sigmoid and gaussian functions).", "description": "Beta parameter of the activation function (used only with sigmoid and gaussian functions).", "schema": {"type": "number", "default": 1, "format": "double"}, "id": "classifier.ann.b"}, "classifier.ann.bpdw": {"title": "Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1.", "description": "Strength of the weight gradient term in the BACKPROP method. The recommended value is about 0.1.", "schema": {"type": "number", "default": 0.1, "format": "double"}, "id": "classifier.ann.bpdw"}, "classifier.ann.bpms": {"title": "Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. 
It can vary from 0 (the feature is disabled) to 1 and beyond. The value 0.1 or so is good enough.", "description": "Strength of the momentum term (the difference between weights on the 2 previous iterations). This parameter provides some inertia to smooth the random fluctuations of the weights. It can vary from 0 (the feature is disabled) to 1 and beyond. The value 0.1 or so is good enough.", "schema": {"type": "number", "default": 0.1, "format": "double"}, "id": "classifier.ann.bpms"}, "classifier.ann.rdw": {"title": "Initial value Delta_0 of update-values Delta_", "description": "Initial value Delta_0 of update-values Delta_", "schema": {"type": "number", "default": 0.1, "format": "double"}, "id": "classifier.ann.rdw"}, "classifier.ann.rdwm": {"title": "Update-values lower limit Delta_", "description": "Update-values lower limit Delta_", "schema": {"type": "number", "default": 1e-07, "format": "double"}, "id": "classifier.ann.rdwm"}, "classifier.ann.term": {"title": "Termination criteria.", "description": "Termination criteria.", "schema": {"type": "string", "default": "all", "enum": ["iter", "eps", "all"]}, "id": "classifier.ann.term"}, "classifier.ann.eps": {"title": "Epsilon value used in the Termination criteria.", "description": "Epsilon value used in the Termination criteria.", "schema": {"type": "number", "default": 0.01, "format": "double"}, "id": "classifier.ann.eps"}, "classifier.ann.iter": {"title": "Maximum number of iterations used in the Termination criteria.", "description": "Maximum number of iterations used in the Termination criteria.", "schema": {"type": "integer", "default": 1000}, "id": "classifier.ann.iter"}, "classifier.rf.max": {"title": "The depth of the tree. A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.", "description": "The depth of the tree. 
A low value will likely underfit and conversely a high value will likely overfit. The optimal value can be obtained using cross validation or other suitable methods.", "schema": {"type": "integer", "default": 5}, "id": "classifier.rf.max"}, "classifier.rf.min": {"title": "If the number of samples in a node is smaller than this parameter, then the node will not be split. A reasonable value is a small percentage of the total data e.g. 1 percent.", "description": "If the number of samples in a node is smaller than this parameter, then the node will not be split. A reasonable value is a small percentage of the total data e.g. 1 percent.", "schema": {"type": "integer", "default": 10}, "id": "classifier.rf.min"}, "classifier.rf.ra": {"title": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split.", "description": "If all absolute differences between an estimated value in a node and the values of the train samples in this node are smaller than this regression accuracy parameter, then the node will not be split.", "schema": {"type": "number", "default": 0, "format": "double"}, "id": "classifier.rf.ra"}, "classifier.rf.cat": {"title": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "description": "Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split.", "schema": {"type": "integer", "default": 10}, "id": "classifier.rf.cat"}, "classifier.rf.var": {"title": "The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). If you set it to 0, then the size will be set to the square root of the total number of features.", "description": "The size of the subset of features, randomly selected at each tree node, that are used to find the best split(s). 
If you set it to 0, then the size will be set to the square root of the total number of features.", "schema": {"type": "integer", "default": 0}, "id": "classifier.rf.var"}, "classifier.rf.nbtrees": {"title": "The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Also to keep in mind, increasing the number of trees increases the prediction time linearly.", "description": "The maximum number of trees in the forest. Typically, the more trees you have, the better the accuracy. However, the improvement in accuracy generally diminishes and reaches an asymptote for a certain number of trees. Also to keep in mind, increasing the number of trees increases the prediction time linearly.", "schema": {"type": "integer", "default": 100}, "id": "classifier.rf.nbtrees"}, "classifier.rf.acc": {"title": "Sufficient accuracy (OOB error).", "description": "Sufficient accuracy (OOB error).", "schema": {"type": "number", "default": 0.01, "format": "double"}, "id": "classifier.rf.acc"}, "classifier.knn.k": {"title": "The number of neighbors to use.", "description": "The number of neighbors to use.", "schema": {"type": "integer", "default": 32}, "id": "classifier.knn.k"}, "rand": {"title": "Set specific seed. with integer value.", "description": "Set specific seed. 
with integer value.", "schema": {"type": "integer", "nullable": true}, "id": "rand"}}, "outputs": {"io.out": {"title": "Output file containing the model estimated (.txt format).", "description": "Output file containing the model estimated (.txt format).", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/xml", "text/plain"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/plain"}]}}}]}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/xml"}, {"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/plain"}]}, "id": "io.out"}, "io.confmatout": {"title": "Output file containing the confusion matrix or contingency table (.csv format).The contingency table is output when we unsupervised algorithms is used otherwise the confusion matrix is output.", "description": "Output file containing the confusion matrix or contingency table (.csv format).The contingency table is output when we unsupervised algorithms is used otherwise the confusion matrix is output.", "extended-schema": {"oneOf": [{"allOf": [{"$ref": "http://zoo-project.org/dl/link.json"}, {"type": "object", "properties": {"type": {"enum": ["text/csv"]}}}]}, {"type": "object", "required": ["value"], "properties": {"value": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/csv"}]}}}]}, "schema": {"oneOf": [{"type": "string", "contentEncoding": "utf-8", "contentMediaType": "text/csv"}]}, "id": "io.confmatout"}}}
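The description above can be turned into an execute request body. The following is a minimal sketch, assuming the usual OGC API Processes request shape (`inputs` and `outputs` objects, with `transmissionMode` per output) and assuming that inputs passed by reference use an `href` key alongside the `type` listed in the schema. The helper name `build_execute_request`, the `example.com` URLs, and the `Class` field name are hypothetical. The sketch only builds and prints the body; it does not contact the server.

```python
import json

# Execution endpoint copied from the "links" section of the description.
EXECUTE_URL = (
    "http://tb17.geolabs.fr:8111/ogc-api/processes/"
    "OTB.TrainImagesClassifier/execution"
)

# Values allowed by the "classifier" input enum in the description.
CLASSIFIERS = {"libsvm", "boost", "dt", "ann", "bayes", "rf", "knn"}


def build_execute_request(image_urls, vector_urls, label_field,
                          classifier="libsvm", extra_inputs=None):
    """Assemble a request body (hypothetical helper, not part of any library)."""
    if not image_urls or not vector_urls:
        raise ValueError("io.il and io.vd each need at least one item")
    if classifier not in CLASSIFIERS:
        raise ValueError(f"unknown classifier: {classifier!r}")
    inputs = {
        # Assumption: by-reference inputs are link objects with href and type.
        "io.il": [{"href": url, "type": "image/tiff"} for url in image_urls],
        "io.vd": [{"href": url, "type": "application/zip"} for url in vector_urls],
        "sample.vfn": label_field,
        "classifier": classifier,
    }
    if extra_inputs:
        inputs.update(extra_inputs)
    return {
        "inputs": inputs,
        "outputs": {
            "io.out": {"transmissionMode": "reference"},
            "io.confmatout": {"transmissionMode": "reference"},
        },
    }


request_body = build_execute_request(
    ["https://example.com/data/scene.tif"],              # hypothetical image
    ["https://example.com/data/training_polygons.zip"],  # hypothetical shapefile archive
    label_field="Class",                                 # hypothetical field name
    classifier="rf",
    extra_inputs={"classifier.rf.nbtrees": 100, "classifier.rf.max": 5},
)
print("POST", EXECUTE_URL)
print(json.dumps(request_body, indent=2))
```

Input identifiers are taken verbatim from the description, so any key passed through `extra_inputs` should be checked against the ids listed there rather than against the option names of the command-line application.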
http://tb17.geolabs.fr:8111/ogc-api/processes/OTB.TrainImagesClassifier.html
Last modified: Sat Dec 4 00:09:36 CET 2021