Modulo (a.k.a.
Modulo (a.k.a. remainder) expression.
1.3.0
Boolean AND.
Boolean AND.
// Scala: The following selects people that are in school and employed at the same time. people.select( people("inSchool") && people("isEmployed") ) // Java: people.select( people("inSchool").and(people("isEmployed")) );
1.3.0
Multiplication of this expression and another expression.
Multiplication of this expression and another expression.
// Scala: The following multiplies a person's height by their weight. people.select( people("height") * people("weight") ) // Java: people.select( people("height").multiply(people("weight")) );
1.3.0
Sum of this expression and another expression.
Sum of this expression and another expression.
// Scala: The following selects the sum of a person's height and weight. people.select( people("height") + people("weight") ) // Java: people.select( people("height").plus(people("weight")) );
1.3.0
Subtraction.
Subtraction. Subtract the other expression from this expression.
// Scala: The following selects the difference between people's height and their weight. people.select( people("height") - people("weight") ) // Java: people.select( people("height").minus(people("weight")) );
1.3.0
Division this expression by another expression.
Division this expression by another expression.
// Scala: The following divides a person's height by their weight. people.select( people("height") / people("weight") ) // Java: people.select( people("height").divide(people("weight")) );
1.3.0
Less than.
Less than.
// Scala: The following selects people younger than 21. people.select( people("age") < 21 ) // Java: people.select( people("age").lt(21) );
1.3.0
Less than or equal to.
Less than or equal to.
// Scala: The following selects people age 21 or younger than 21. people.select( people("age") <= 21 ) // Java: people.select( people("age").leq(21) );
1.3.0
Equality test that is safe for null values.
Equality test that is safe for null values.
1.3.0
Inequality test.
Inequality test.
// Scala: df.select( df("colA") =!= df("colB") ) df.select( !(df("colA") === df("colB")) ) // Java: import static org.apache.spark.sql.functions.*; df.filter( col("colA").notEqual(col("colB")) );
2.0.0
Equality test.
Equality test.
// Scala: df.filter( df("colA") === df("colB") ) // Java import static org.apache.spark.sql.functions.*; df.filter( col("colA").equalTo(col("colB")) );
1.3.0
Greater than.
Greater than.
// Scala: The following selects people older than 21. people.select( people("age") > 21 ) // Java: import static org.apache.spark.sql.functions.*; people.select( people("age").gt(21) );
1.3.0
Greater than or equal to an expression.
Greater than or equal to an expression.
// Scala: The following selects people age 21 or older than 21. people.select( people("age") >= 21 ) // Java: people.select( people("age").geq(21) )
1.3.0
Gives the column an alias.
Gives the column an alias. Same as as
.
// Renames colA to colB in select output. df.select($"colA".alias("colB"))
1.4.0
Boolean AND.
Boolean AND.
// Scala: The following selects people that are in school and employed at the same time. people.select( people("inSchool") && people("isEmployed") ) // Java: people.select( people("inSchool").and(people("isEmployed")) );
1.3.0
Extracts a value or values from a complex type.
Extracts a value or values from a complex type. The following types of extraction are supported:
1.4.0
Gives the column an alias with metadata.
Gives the column an alias with metadata.
val metadata: Metadata = ... df.select($"colA".as("colB", metadata))
1.3.0
Gives the column an alias.
Gives the column an alias.
// Renames colA to colB in select output. df.select($"colA".as('colB))
If the current column has metadata associated with it, this metadata will be propagated
to the new column. If this not desired, use as
with explicitly empty metadata.
1.3.0
Assigns the given aliases to the results of a table generating function.
Assigns the given aliases to the results of a table generating function.
// Renames colA to colB in select output. df.select(explode($"myMap").as("key" :: "value" :: Nil))
1.4.0
(Scala-specific) Assigns the given aliases to the results of a table generating function.
(Scala-specific) Assigns the given aliases to the results of a table generating function.
// Renames colA to colB in select output. df.select(explode($"myMap").as("key" :: "value" :: Nil))
1.4.0
Gives the column an alias.
Gives the column an alias.
// Renames colA to colB in select output. df.select($"colA".as("colB"))
If the current column has metadata associated with it, this metadata will be propagated
to the new column. If this not desired, use as
with explicitly empty metadata.
1.3.0
Provides a type hint about the expected return value of this column.
Provides a type hint about the expected return value of this column. This information can
be used by operations such as select
on a Dataset to automatically convert the
results into the correct JVM types.
1.6.0
Returns a sort expression based on ascending order of the column.
Returns a sort expression based on ascending order of the column.
// Scala: sort a DataFrame by age column in ascending order. df.sort(df("age").asc) // Java df.sort(df.col("age").asc());
1.3.0
Returns a sort expression based on ascending order of the column, and null values return before non-null values.
Returns a sort expression based on ascending order of the column, and null values return before non-null values.
// Scala: sort a DataFrame by age column in ascending order and null values appearing first. df.sort(df("age").asc_nulls_last) // Java df.sort(df.col("age").asc_nulls_last());
2.1.0
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
Returns a sort expression based on ascending order of the column, and null values appear after non-null values.
// Scala: sort a DataFrame by age column in ascending order and null values appearing last. df.sort(df("age").asc_nulls_last) // Java df.sort(df.col("age").asc_nulls_last());
2.1.0
True if the current column is between the lower bound and upper bound, inclusive.
True if the current column is between the lower bound and upper bound, inclusive.
1.4.0
Compute bitwise AND of this expression with another expression.
Compute bitwise AND of this expression with another expression.
df.select($"colA".bitwiseAND($"colB"))
1.4.0
Compute bitwise OR of this expression with another expression.
Compute bitwise OR of this expression with another expression.
df.select($"colA".bitwiseOR($"colB"))
1.4.0
Compute bitwise XOR of this expression with another expression.
Compute bitwise XOR of this expression with another expression.
df.select($"colA".bitwiseXOR($"colB"))
1.4.0
Casts the column to a different data type, using the canonical string representation of the type.
Casts the column to a different data type, using the canonical string representation
of the type. The supported types are: string
, boolean
, byte
, short
, int
, long
,
float
, double
, decimal
, date
, timestamp
.
// Casts colA to integer. df.select(df("colA").cast("int"))
1.3.0
Casts the column to a different data type.
Casts the column to a different data type.
// Casts colA to IntegerType. import org.apache.spark.sql.types.IntegerType df.select(df("colA").cast(IntegerType)) // equivalent to df.select(df("colA").cast("int"))
1.3.0
Contains the other element.
Contains the other element. Returns a boolean column based on a string match.
1.3.0
Returns a sort expression based on the descending order of the column.
Returns a sort expression based on the descending order of the column.
// Scala df.sort(df("age").desc) // Java df.sort(df.col("age").desc());
1.3.0
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
Returns a sort expression based on the descending order of the column, and null values appear before non-null values.
// Scala: sort a DataFrame by age column in descending order and null values appearing first. df.sort(df("age").desc_nulls_first) // Java df.sort(df.col("age").desc_nulls_first());
2.1.0
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
Returns a sort expression based on the descending order of the column, and null values appear after non-null values.
// Scala: sort a DataFrame by age column in descending order and null values appearing last. df.sort(df("age").desc_nulls_last) // Java df.sort(df.col("age").desc_nulls_last());
2.1.0
Division this expression by another expression.
Division this expression by another expression.
// Scala: The following divides a person's height by their weight. people.select( people("height") / people("weight") ) // Java: people.select( people("height").divide(people("weight")) );
1.3.0
String ends with another string literal.
String ends with another string literal. Returns a boolean column based on a string match.
1.3.0
String ends with.
String ends with. Returns a boolean column based on a string match.
1.3.0
Equality test that is safe for null values.
Equality test that is safe for null values.
1.3.0
Equality test.
Equality test.
// Scala: df.filter( df("colA") === df("colB") ) // Java import static org.apache.spark.sql.functions.*; df.filter( col("colA").equalTo(col("colB")) );
1.3.0
Prints the expression to the console for debugging purposes.
Prints the expression to the console for debugging purposes.
1.3.0
Greater than or equal to an expression.
Greater than or equal to an expression.
// Scala: The following selects people age 21 or older than 21. people.select( people("age") >= 21 ) // Java: people.select( people("age").geq(21) )
1.3.0
An expression that gets a field by name in a StructType
.
An expression that gets a field by name in a StructType
.
1.3.0
An expression that gets an item at position ordinal
out of an array,
or gets a value by key key
in a MapType
.
An expression that gets an item at position ordinal
out of an array,
or gets a value by key key
in a MapType
.
1.3.0
Greater than.
Greater than.
// Scala: The following selects people older than 21. people.select( people("age") > lit(21) ) // Java: import static org.apache.spark.sql.functions.*; people.select( people("age").gt(21) );
1.3.0
True if the current expression is NaN.
True if the current expression is NaN.
1.5.0
True if the current expression is NOT null.
True if the current expression is NOT null.
1.3.0
True if the current expression is null.
True if the current expression is null.
1.3.0
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
A boolean expression that is evaluated to true if the value of this expression is contained by the evaluated values of the arguments.
1.5.0
Less than or equal to.
Less than or equal to.
// Scala: The following selects people age 21 or younger than 21. people.select( people("age") <= 21 ) // Java: people.select( people("age").leq(21) );
1.3.0
SQL like expression.
SQL like expression. Returns a boolean column based on a SQL LIKE match.
1.3.0
Less than.
Less than.
// Scala: The following selects people younger than 21. people.select( people("age") < 21 ) // Java: people.select( people("age").lt(21) );
1.3.0
Subtraction.
Subtraction. Subtract the other expression from this expression.
// Scala: The following selects the difference between people's height and their weight. people.select( people("height") - people("weight") ) // Java: people.select( people("height").minus(people("weight")) );
1.3.0
Modulo (a.k.a.
Modulo (a.k.a. remainder) expression.
1.3.0
Multiplication of this expression and another expression.
Multiplication of this expression and another expression.
// Scala: The following multiplies a person's height by their weight. people.select( people("height") * people("weight") ) // Java: people.select( people("height").multiply(people("weight")) );
1.3.0
Gives the column a name (alias).
Gives the column a name (alias).
// Renames colA to colB in select output. df.select($"colA".name("colB"))
If the current column has metadata associated with it, this metadata will be propagated
to the new column. If this not desired, use as
with explicitly empty metadata.
2.0.0
Inequality test.
Inequality test.
// Scala: df.select( df("colA") !== df("colB") ) df.select( !(df("colA") === df("colB")) ) // Java: import static org.apache.spark.sql.functions.*; df.filter( col("colA").notEqual(col("colB")) );
1.3.0
Boolean OR.
Boolean OR.
// Scala: The following selects people that are in school or employed. people.filter( people("inSchool") || people("isEmployed") ) // Java: people.filter( people("inSchool").or(people("isEmployed")) );
1.3.0
Evaluates a list of conditions and returns one of multiple possible result expressions.
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer. // Scala: people.select(when(people("gender") === "male", 0) .when(people("gender") === "female", 1) .otherwise(2)) // Java: people.select(when(col("gender").equalTo("male"), 0) .when(col("gender").equalTo("female"), 1) .otherwise(2))
1.4.0
Defines an empty analytic clause.
Defines an empty analytic clause. In this case the analytic function is applied and presented for all rows in the result set.
df.select( sum("price").over(), avg("price").over() )
2.0.0
Defines a windowing column.
Defines a windowing column.
val w = Window.partitionBy("name").orderBy("id") df.select( sum("price").over(w.rangeBetween(Window.unboundedPreceding, 2)), avg("price").over(w.rowsBetween(Window.currentRow, 4)) )
1.4.0
Sum of this expression and another expression.
Sum of this expression and another expression.
// Scala: The following selects the sum of a person's height and weight. people.select( people("height") + people("weight") ) // Java: people.select( people("height").plus(people("weight")) );
1.3.0
SQL RLIKE expression (LIKE with Regex).
SQL RLIKE expression (LIKE with Regex). Returns a boolean column based on a regex match.
1.3.0
String starts with another string literal.
String starts with another string literal. Returns a boolean column based on a string match.
1.3.0
String starts with.
String starts with. Returns a boolean column based on a string match.
1.3.0
An expression that returns a substring.
An expression that returns a substring.
starting position.
length of the substring.
1.3.0
An expression that returns a substring.
An expression that returns a substring.
expression for the starting position.
expression for the length of the substring.
1.3.0
Inversion of boolean expression, i.e.
Inversion of boolean expression, i.e. NOT.
// Scala: select rows that are not active (isActive === false) df.filter( !df("isActive") ) // Java: import static org.apache.spark.sql.functions.*; df.filter( not(df.col("isActive")) );
1.3.0
Unary minus, i.e.
Unary minus, i.e. negate the expression.
// Scala: select the amount column and negates all values. df.select( -df("amount") ) // Java: import static org.apache.spark.sql.functions.*; df.select( negate(col("amount") );
1.3.0
Evaluates a list of conditions and returns one of multiple possible result expressions.
Evaluates a list of conditions and returns one of multiple possible result expressions. If otherwise is not defined at the end, null is returned for unmatched conditions.
// Example: encoding gender string column into integer. // Scala: people.select(when(people("gender") === "male", 0) .when(people("gender") === "female", 1) .otherwise(2)) // Java: people.select(when(col("gender").equalTo("male"), 0) .when(col("gender").equalTo("female"), 1) .otherwise(2))
1.4.0
Boolean OR.
Boolean OR.
// Scala: The following selects people that are in school or employed. people.filter( people("inSchool") || people("isEmployed") ) // Java: people.filter( people("inSchool").or(people("isEmployed")) );
1.3.0
Inequality test.
Inequality test.
// Scala: df.select( df("colA") !== df("colB") ) df.select( !(df("colA") === df("colB")) ) // Java: import static org.apache.spark.sql.functions.*; df.filter( col("colA").notEqual(col("colB")) );
(Since version 2.0.0) !== does not have the same precedence as ===, use =!= instead
1.3.0
A column that will be computed based on the data in a
DataFrame
.A new column can be constructed based on the input columns present in a DataFrame:
Column objects can be composed to form complex expressions:
1.3.0
The internal Catalyst expression can be accessed via expr, but this method is for debugging purposes only and can change in any future Spark releases.