LogisticRegressionWithLBFGS

class pyspark.mllib.classification.LogisticRegressionWithLBFGS

Train a classification model for Multinomial/Binary Logistic Regression using Limited-memory BFGS.

Standard feature scaling and L2 regularization are used by default.

New in version 1.2.0.

Methods

train(data[, iterations, initialWeights, …])

Train a logistic regression model on the given data.

Methods Documentation

classmethod train(data, iterations=100, initialWeights=None, regParam=0.0, regType='l2', intercept=False, corrections=10, tolerance=1e-06, validateData=True, numClasses=2)

Train a logistic regression model on the given data.

New in version 1.2.0.

Parameters:
data : pyspark.RDD

The training data, an RDD of pyspark.mllib.regression.LabeledPoint.

iterations : int, optional

The number of iterations. (default: 100)

initialWeights : pyspark.mllib.linalg.Vector or convertible, optional

The initial weights. (default: None)

regParam : float, optional

The regularization parameter. (default: 0.0)

regType : str, optional

The type of regularizer used for training the model. Supported values:

  • “l1” for using L1 regularization

  • “l2” for using L2 regularization (default)

  • None for no regularization

intercept : bool, optional

Whether to use the augmented representation for the training data (i.e., whether bias features are activated). (default: False)

corrections : int, optional

The number of corrections used in the LBFGS update. If a known updater is used for binary classification, it calls the ml implementation and this parameter will have no effect. (default: 10)

tolerance : float, optional

The convergence tolerance of iterations for L-BFGS. (default: 1e-6)

validateData : bool, optional

Whether the algorithm should validate the data before training. (default: True)

numClasses : int, optional

The number of classes (i.e., outcomes) a label can take in Multinomial Logistic Regression. (default: 2)
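As a rough illustration of how regParam, regType, and intercept fit together, the sketch below computes the penalty term for each regType value and the margin of a feature vector with and without the augmented (bias) representation. It is plain Python, not the MLlib implementation. The helper names `penalty`, `augment`, and `margin`, the 0.5 factor on the L2 term, and the choice to append the bias feature last are all assumptions made for illustration.

```python
def penalty(weights, reg_param, reg_type):
    """Regularization term added to the loss (illustrative, not MLlib code)."""
    if reg_type is None:
        return 0.0
    if reg_type == "l1":
        # L1: regParam times the sum of absolute weights.
        return reg_param * sum(abs(w) for w in weights)
    if reg_type == "l2":
        # L2: assumed here to carry the conventional 1/2 factor.
        return 0.5 * reg_param * sum(w * w for w in weights)
    raise ValueError("unsupported regType: %r" % (reg_type,))


def augment(features):
    """Append a constant bias feature (assumed to go last)."""
    return list(features) + [1.0]


def margin(weights, features):
    """Dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


weights = [3.0, -4.0]
print(penalty(weights, 0.5, "l1"))
print(penalty(weights, 0.5, "l2"))
print(penalty(weights, 0.5, None))

x = [1.0, 0.0]
# Without an intercept the margin uses the raw features only.
print(margin([2.0, -1.0], x))
# With an intercept the extra weight multiplies the constant 1.0.
print(margin([2.0, -1.0, 0.5], augment(x)))
```

With regParam=0.0, the default shown in the signature, both penalties are zero regardless of regType.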

Examples

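>>> from pyspark.mllib.classification import LogisticRegressionWithLBFGS
>>> from pyspark.mllib.regression import LabeledPoint
>>> data = [
...     LabeledPoint(0.0, [0.0, 1.0]),
...     LabeledPoint(1.0, [1.0, 0.0]),
... ]
>>> # sc is assumed to be an existing SparkContext
>>> lrm = LogisticRegressionWithLBFGS.train(sc.parallelize(data), iterations=10)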
>>> lrm.predict([1.0, 0.0])
1
>>> lrm.predict([0.0, 1.0])
0
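The example above covers the binary case. For numClasses greater than 2, one common way to picture prediction is a pivot-class formulation: class 0 has an implicit margin of 0, each remaining class has its own weight row, and the predicted label is the class with the largest margin. The plain-Python sketch below shows that idea. The helper `predict_class` and the weight layout are hypothetical, and whether MLlib stores its weights this way is an assumption, not something this page states.

```python
def predict_class(weight_rows, features):
    """Pick the class with the largest margin.

    weight_rows holds one row per non-pivot class (classes 1..K-1);
    class 0 is the pivot with a fixed margin of 0.0 (assumed layout).
    """
    best_class, best_margin = 0, 0.0
    for i, row in enumerate(weight_rows):
        m = sum(w * x for w, x in zip(row, features))
        if m > best_margin:
            best_class, best_margin = i + 1, m
    return best_class


# Hypothetical weights for numClasses=3 and three features.
rows = [
    [-1.0, 2.0, -1.0],  # class 1
    [-1.0, -1.0, 2.0],  # class 2
]
print(predict_class(rows, [1.0, 0.0, 0.0]))
print(predict_class(rows, [0.0, 1.0, 0.0]))
print(predict_class(rows, [0.0, 0.0, 1.0]))
```

In this layout a model with K classes needs only K-1 weight rows, since every margin is measured relative to the pivot class.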