LogisticRegressionWithLBFGS

class pyspark.mllib.classification.LogisticRegressionWithLBFGS

Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS.

Standard feature scaling and L2 regularization are used by default.

New in version 1.2.0.

Methods

train(data[, iterations, initialWeights, …])

Train a logistic regression model on the given data.

Methods Documentation

classmethod train(data, iterations=100, initialWeights=None, regParam=0.0, regType='l2', intercept=False, corrections=10, tolerance=1e-06, validateData=True, numClasses=2)

Train a logistic regression model on the given data.

New in version 1.2.0.

Parameters:

data : pyspark.RDD

The training data, an RDD of pyspark.mllib.regression.LabeledPoint.

iterations : int, optional

The number of iterations. (default: 100)

initialWeights : pyspark.mllib.linalg.Vector or convertible, optional

The initial weights. (default: None)

regParam : float, optional

The regularization parameter. (default: 0.0)

regType : str, optional

The type of regularizer used for training our model. Supported values:

“l1” for using L1 regularization
“l2” for using L2 regularization (default)
None for no regularization
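To make the two penalty types concrete, the following is a minimal plain-Python sketch of the textbook L1 and L2 penalty terms that a regularizer adds to the training loss. The helper names `l1_penalty` and `l2_penalty` are hypothetical and not part of PySpark, and the 0.5 factor on the L2 term is a common convention assumed here; the exact scaling Spark applies internally may differ.

```python
def l1_penalty(weights, reg_param):
    """Textbook L1 penalty: reg_param * sum(|w_i|)."""
    return reg_param * sum(abs(w) for w in weights)


def l2_penalty(weights, reg_param):
    """Textbook L2 penalty: 0.5 * reg_param * sum(w_i ** 2).

    The 0.5 factor is an assumed convention, not taken from this page.
    """
    return 0.5 * reg_param * sum(w * w for w in weights)


weights = [3.0, -4.0]
print(l1_penalty(weights, 0.5))  # 0.5 * (3 + 4)
print(l2_penalty(weights, 0.5))  # 0.5 * 0.5 * (9 + 16)
```

With `reg_param` set to 0.0 both terms vanish, which corresponds to training without any penalty.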

intercept : bool, optional

Whether to use the augmented representation for the training data (i.e., whether a bias feature is added so that an intercept term is fitted). (default: False)
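The augmented representation can be pictured with a small plain-Python sketch: a constant 1.0 is appended to each feature vector, so the extra weight plays the role of the intercept. The helpers `augment` and `predict_probability` are hypothetical and only illustrate the idea; placing the bias feature at the end of the vector is an assumption, not something this page specifies.

```python
import math


def augment(features):
    """Append a constant 1.0 so that the last weight acts as the intercept."""
    return list(features) + [1.0]


def predict_probability(weights, features, intercept=False):
    """Binary logistic probability, optionally on the augmented features."""
    x = augment(features) if intercept else list(features)
    if len(x) != len(weights):
        raise ValueError("weights and features must have the same length")
    margin = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-margin))


# Two feature weights plus one intercept weight (assumed to come last).
print(augment([0.0, 1.0]))
print(predict_probability([2.0, -2.0, 0.5], [1.0, 0.0], intercept=True))
```

Without the bias feature, the decision boundary is forced through the origin of the feature space.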

corrections : int, optional

The number of corrections used in the LBFGS update. If a known updater is used for binary classification, it calls the ml implementation and this parameter will have no effect. (default: 10)

tolerance : float, optional

The convergence tolerance of iterations for L-BFGS. (default: 1e-6)

validateData : bool, optional

Boolean parameter which indicates if the algorithm should validate data before training. (default: True)

numClasses : int, optional

The number of classes (i.e., outcomes) a label can take in Multinomial Logistic Regression. (default: 2)
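As an illustration of how numClasses relates to the labels, the sketch below checks that every label is a whole number in the range 0 to numClasses - 1. This is the kind of check that data validation could perform before training; the helper `labels_are_valid` is hypothetical, and the checks Spark actually runs when validateData is True are not described on this page.

```python
def labels_are_valid(labels, num_classes=2):
    """Return True if every label is a whole number in [0, num_classes)."""
    return all(
        float(label).is_integer() and 0 <= label < num_classes
        for label in labels
    )


print(labels_are_valid([0.0, 1.0]))                 # binary labels
print(labels_are_valid([0.0, 2.0]))                 # 2.0 is out of range for 2 classes
print(labels_are_valid([0.0, 2.0], num_classes=3))  # fine with 3 classes
```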

Examples

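>>> from pyspark.mllib.classification import LogisticRegressionWithLBFGS
>>> from pyspark.mllib.regression import LabeledPoint
>>> # sc is assumed to be an existing SparkContext.
>>> data = [
...     LabeledPoint(0.0, [0.0, 1.0]),
...     LabeledPoint(1.0, [1.0, 0.0]),
... ]
>>> lrm = LogisticRegressionWithLBFGS.train(sc.parallelize(data), iterations=10)
>>> lrm.predict([1.0, 0.0])
1
>>> lrm.predict([0.0, 1.0])
0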
