public class functions
extends Object
Constructor and Description |
---|
functions() |
Modifier and Type | Method and Description |
---|---|
static Column |
abs(Column e)
Computes the absolute value.
|
static Column |
acos(Column e)
Computes the cosine inverse of the given value; the returned angle is in the range
0.0 through pi.
|
static Column |
acos(String columnName)
Computes the cosine inverse of the given column; the returned angle is in the range
0.0 through pi.
|
static Column |
add_months(Column startDate,
int numMonths)
Returns the date that is numMonths after startDate.
|
static Column |
approx_count_distinct(Column e)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_count_distinct(Column e,
double rsd)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_count_distinct(String columnName)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approx_count_distinct(String columnName,
double rsd)
Aggregate function: returns the approximate number of distinct items in a group.
|
static Column |
approxCountDistinct(Column e)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
approxCountDistinct(Column e,
double rsd)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
approxCountDistinct(String columnName)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
approxCountDistinct(String columnName,
double rsd)
Deprecated.
Use approx_count_distinct. Since 2.1.0.
|
static Column |
array_contains(Column column,
Object value)
Returns null if the array is null, true if the array contains
value , and false otherwise. |
static Column |
array(Column... cols)
Creates a new array column.
|
static Column |
array(scala.collection.Seq<Column> cols)
Creates a new array column.
|
static Column |
array(String colName,
scala.collection.Seq<String> colNames)
Creates a new array column.
|
static Column |
array(String colName,
String... colNames)
Creates a new array column.
|
static Column |
asc_nulls_first(String columnName)
Returns a sort expression based on ascending order of the column,
and null values return before non-null values.
|
static Column |
asc_nulls_last(String columnName)
Returns a sort expression based on ascending order of the column,
and null values appear after non-null values.
|
static Column |
asc(String columnName)
Returns a sort expression based on ascending order of the column.
|
static Column |
ascii(Column e)
Computes the numeric value of the first character of the string column, and returns the
result as an int column.
|
static Column |
asin(Column e)
Computes the sine inverse of the given value; the returned angle is in the range
-pi/2 through pi/2.
|
static Column |
asin(String columnName)
Computes the sine inverse of the given column; the returned angle is in the range
-pi/2 through pi/2.
|
static Column |
atan(Column e)
Computes the tangent inverse of the given value.
|
static Column |
atan(String columnName)
Computes the tangent inverse of the given column.
|
static Column |
atan2(Column l,
Column r)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(Column l,
double r)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(Column l,
String rightName)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(double l,
Column r)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(double l,
String rightName)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(String leftName,
Column r)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(String leftName,
double r)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
atan2(String leftName,
String rightName)
Returns the angle theta from the conversion of rectangular coordinates (x, y) to
polar coordinates (r, theta).
|
static Column |
avg(Column e)
Aggregate function: returns the average of the values in a group.
|
static Column |
avg(String columnName)
Aggregate function: returns the average of the values in a group.
|
static Column |
base64(Column e)
Computes the BASE64 encoding of a binary column and returns it as a string column.
|
static Column |
bin(Column e)
An expression that returns the string representation of the binary value of the given long
column.
|
static Column |
bin(String columnName)
An expression that returns the string representation of the binary value of the given long
column.
|
static Column |
bitwiseNOT(Column e)
Computes bitwise NOT.
|
static <T> Dataset<T> |
broadcast(Dataset<T> df)
Marks a DataFrame as small enough for use in broadcast joins.
|
static Column |
bround(Column e)
Returns the value of the column
e rounded to 0 decimal places with HALF_EVEN round mode. |
static Column |
bround(Column e,
int scale)
Round the value of
e to scale decimal places with HALF_EVEN round mode
if scale is greater than or equal to 0 or at integral part when scale is less than 0. |
static Column |
callUDF(String udfName,
Column... cols)
Call an user-defined function.
|
static Column |
callUDF(String udfName,
scala.collection.Seq<Column> cols)
Call an user-defined function.
|
static Column |
cbrt(Column e)
Computes the cube-root of the given value.
|
static Column |
cbrt(String columnName)
Computes the cube-root of the given column.
|
static Column |
ceil(Column e)
Computes the ceiling of the given value.
|
static Column |
ceil(String columnName)
Computes the ceiling of the given column.
|
static Column |
coalesce(Column... e)
Returns the first column that is not null, or null if all inputs are null.
|
static Column |
coalesce(scala.collection.Seq<Column> e)
Returns the first column that is not null, or null if all inputs are null.
|
static Column |
col(String colName)
Returns a
Column based on the given column name. |
static Column |
collect_list(Column e)
Aggregate function: returns a list of objects with duplicates.
|
static Column |
collect_list(String columnName)
Aggregate function: returns a list of objects with duplicates.
|
static Column |
collect_set(Column e)
Aggregate function: returns a set of objects with duplicate elements eliminated.
|
static Column |
collect_set(String columnName)
Aggregate function: returns a set of objects with duplicate elements eliminated.
|
static Column |
column(String colName)
Returns a
Column based on the given column name. |
static Column |
concat_ws(String sep,
Column... exprs)
Concatenates multiple input string columns together into a single string column,
using the given separator.
|
static Column |
concat_ws(String sep,
scala.collection.Seq<Column> exprs)
Concatenates multiple input string columns together into a single string column,
using the given separator.
|
static Column |
concat(Column... exprs)
Concatenates multiple input string columns together into a single string column.
|
static Column |
concat(scala.collection.Seq<Column> exprs)
Concatenates multiple input string columns together into a single string column.
|
static Column |
conv(Column num,
int fromBase,
int toBase)
Convert a number in a string column from one base to another.
|
static Column |
corr(Column column1,
Column column2)
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
|
static Column |
corr(String columnName1,
String columnName2)
Aggregate function: returns the Pearson Correlation Coefficient for two columns.
|
static Column |
cos(Column e)
Computes the cosine of the given value.
|
static Column |
cos(String columnName)
Computes the cosine of the given column.
|
static Column |
cosh(Column e)
Computes the hyperbolic cosine of the given value.
|
static Column |
cosh(String columnName)
Computes the hyperbolic cosine of the given column.
|
static Column |
count(Column e)
Aggregate function: returns the number of items in a group.
|
static TypedColumn<Object,Object> |
count(String columnName)
Aggregate function: returns the number of items in a group.
|
static Column |
countDistinct(Column expr,
Column... exprs)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
countDistinct(Column expr,
scala.collection.Seq<Column> exprs)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
countDistinct(String columnName,
scala.collection.Seq<String> columnNames)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
countDistinct(String columnName,
String... columnNames)
Aggregate function: returns the number of distinct items in a group.
|
static Column |
covar_pop(Column column1,
Column column2)
Aggregate function: returns the population covariance for two columns.
|
static Column |
covar_pop(String columnName1,
String columnName2)
Aggregate function: returns the population covariance for two columns.
|
static Column |
covar_samp(Column column1,
Column column2)
Aggregate function: returns the sample covariance for two columns.
|
static Column |
covar_samp(String columnName1,
String columnName2)
Aggregate function: returns the sample covariance for two columns.
|
static Column |
crc32(Column e)
Calculates the cyclic redundancy check value (CRC32) of a binary column and
returns the value as a bigint.
|
static Column |
cume_dist()
Window function: returns the cumulative distribution of values within a window partition,
i.e.
|
static Column |
current_date()
Returns the current date as a date column.
|
static Column |
current_timestamp()
Returns the current timestamp as a timestamp column.
|
static Column |
date_add(Column start,
int days)
Returns the date that is
days days after start |
static Column |
date_format(Column dateExpr,
String format)
Converts a date/timestamp/string to a value of string in the format specified by the date
format given by the second argument.
|
static Column |
date_sub(Column start,
int days)
Returns the date that is
days days before start |
static Column |
datediff(Column end,
Column start)
Returns the number of days from
start to end . |
static Column |
dayofmonth(Column e)
Extracts the day of the month as an integer from a given date/timestamp/string.
|
static Column |
dayofyear(Column e)
Extracts the day of the year as an integer from a given date/timestamp/string.
|
static Column |
decode(Column value,
String charset)
Computes the first argument into a string from a binary using the provided character set
(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
|
static Column |
degrees(Column e)
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
|
static Column |
degrees(String columnName)
Converts an angle measured in radians to an approximately equivalent angle measured in degrees.
|
static Column |
dense_rank()
Window function: returns the rank of rows within a window partition, without any gaps.
|
static Column |
desc_nulls_first(String columnName)
Returns a sort expression based on the descending order of the column,
and null values appear before non-null values.
|
static Column |
desc_nulls_last(String columnName)
Returns a sort expression based on the descending order of the column,
and null values appear after non-null values.
|
static Column |
desc(String columnName)
Returns a sort expression based on the descending order of the column.
|
static Column |
encode(Column value,
String charset)
Computes the first argument into a binary from a string using the provided character set
(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16').
|
static Column |
exp(Column e)
Computes the exponential of the given value.
|
static Column |
exp(String columnName)
Computes the exponential of the given column.
|
static Column |
explode_outer(Column e)
Creates a new row for each element in the given array or map column.
|
static Column |
explode(Column e)
Creates a new row for each element in the given array or map column.
|
static Column |
expm1(Column e)
Computes the exponential of the given value minus one.
|
static Column |
expm1(String columnName)
Computes the exponential of the given column.
|
static Column |
expr(String expr)
Parses the expression string into the column that it represents, similar to
DataFrame.selectExpr
|
static Column |
factorial(Column e)
Computes the factorial of the given value.
|
static Column |
first(Column e)
Aggregate function: returns the first value in a group.
|
static Column |
first(Column e,
boolean ignoreNulls)
Aggregate function: returns the first value in a group.
|
static Column |
first(String columnName)
Aggregate function: returns the first value of a column in a group.
|
static Column |
first(String columnName,
boolean ignoreNulls)
Aggregate function: returns the first value of a column in a group.
|
static Column |
floor(Column e)
Computes the floor of the given value.
|
static Column |
floor(String columnName)
Computes the floor of the given column.
|
static Column |
format_number(Column x,
int d)
Formats numeric column x to a format like '#,###,###.##', rounded to d decimal places
with HALF_EVEN round mode, and returns the result as a string column.
|
static Column |
format_string(String format,
Column... arguments)
Formats the arguments in printf-style and returns the result as a string column.
|
static Column |
format_string(String format,
scala.collection.Seq<Column> arguments)
Formats the arguments in printf-style and returns the result as a string column.
|
static Column |
from_json(Column e,
DataType schema)
Parses a column containing a JSON string into a
StructType or ArrayType of StructType s
with the specified schema. |
static Column |
from_json(Column e,
DataType schema,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Parses a column containing a JSON string into a
StructType or ArrayType
of StructType s with the specified schema. |
static Column |
from_json(Column e,
DataType schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a JSON string into a
StructType or ArrayType
of StructType s with the specified schema. |
static Column |
from_json(Column e,
String schema,
java.util.Map<String,String> options)
Parses a column containing a JSON string into a
StructType or ArrayType of StructType s
with the specified schema. |
static Column |
from_json(Column e,
StructType schema)
Parses a column containing a JSON string into a
StructType with the specified schema. |
static Column |
from_json(Column e,
StructType schema,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Parses a column containing a JSON string into a
StructType with the
specified schema. |
static Column |
from_json(Column e,
StructType schema,
java.util.Map<String,String> options)
(Java-specific) Parses a column containing a JSON string into a
StructType with the
specified schema. |
static Column |
from_unixtime(Column ut)
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
|
static Column |
from_unixtime(Column ut,
String f)
Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
representing the timestamp of that moment in the current system time zone in the given
format.
|
static Column |
from_utc_timestamp(Column ts,
String tz)
Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp
that corresponds to the same time of day in the given timezone.
|
static Column |
get_json_object(Column e,
String path)
Extracts json object from a json string based on json path specified, and returns json string
of the extracted json object.
|
static Column |
greatest(Column... exprs)
Returns the greatest value of the list of values, skipping null values.
|
static Column |
greatest(scala.collection.Seq<Column> exprs)
Returns the greatest value of the list of values, skipping null values.
|
static Column |
greatest(String columnName,
scala.collection.Seq<String> columnNames)
Returns the greatest value of the list of column names, skipping null values.
|
static Column |
greatest(String columnName,
String... columnNames)
Returns the greatest value of the list of column names, skipping null values.
|
static Column |
grouping_id(scala.collection.Seq<Column> cols)
Aggregate function: returns the level of grouping, equals to
|
static Column |
grouping_id(String colName,
scala.collection.Seq<String> colNames)
Aggregate function: returns the level of grouping, equals to
|
static Column |
grouping(Column e)
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
or not, returns 1 for aggregated or 0 for not aggregated in the result set.
|
static Column |
grouping(String columnName)
Aggregate function: indicates whether a specified column in a GROUP BY list is aggregated
or not, returns 1 for aggregated or 0 for not aggregated in the result set.
|
static Column |
hash(Column... cols)
Calculates the hash code of given columns, and returns the result as an int column.
|
static Column |
hash(scala.collection.Seq<Column> cols)
Calculates the hash code of given columns, and returns the result as an int column.
|
static Column |
hex(Column column)
Computes hex value of the given column.
|
static Column |
hour(Column e)
Extracts the hours as an integer from a given date/timestamp/string.
|
static Column |
hypot(Column l,
Column r)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(Column l,
double r)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(Column l,
String rightName)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(double l,
Column r)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(double l,
String rightName)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(String leftName,
Column r)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(String leftName,
double r)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
hypot(String leftName,
String rightName)
Computes
sqrt(a^2^ + b^2^) without intermediate overflow or underflow. |
static Column |
initcap(Column e)
Returns a new string column by converting the first letter of each word to uppercase.
|
static Column |
input_file_name()
Creates a string column for the file name of the current Spark task.
|
static Column |
instr(Column str,
String substring)
Locate the position of the first occurrence of substr column in the given string.
|
static Column |
isnan(Column e)
Return true iff the column is NaN.
|
static Column |
isnull(Column e)
Return true iff the column is null.
|
static Column |
json_tuple(Column json,
scala.collection.Seq<String> fields)
Creates a new row for a json column according to the given field names.
|
static Column |
json_tuple(Column json,
String... fields)
Creates a new row for a json column according to the given field names.
|
static Column |
kurtosis(Column e)
Aggregate function: returns the kurtosis of the values in a group.
|
static Column |
kurtosis(String columnName)
Aggregate function: returns the kurtosis of the values in a group.
|
static Column |
lag(Column e,
int offset)
Window function: returns the value that is
offset rows before the current row, and
null if there is less than offset rows before the current row. |
static Column |
lag(Column e,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows before the current row, and
defaultValue if there is less than offset rows before the current row. |
static Column |
lag(String columnName,
int offset)
Window function: returns the value that is
offset rows before the current row, and
null if there is less than offset rows before the current row. |
static Column |
lag(String columnName,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows before the current row, and
defaultValue if there is less than offset rows before the current row. |
static Column |
last_day(Column e)
Given a date column, returns the last day of the month which the given date belongs to.
|
static Column |
last(Column e)
Aggregate function: returns the last value in a group.
|
static Column |
last(Column e,
boolean ignoreNulls)
Aggregate function: returns the last value in a group.
|
static Column |
last(String columnName)
Aggregate function: returns the last value of the column in a group.
|
static Column |
last(String columnName,
boolean ignoreNulls)
Aggregate function: returns the last value of the column in a group.
|
static Column |
lead(Column e,
int offset)
Window function: returns the value that is
offset rows after the current row, and
null if there is less than offset rows after the current row. |
static Column |
lead(Column e,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows after the current row, and
defaultValue if there is less than offset rows after the current row. |
static Column |
lead(String columnName,
int offset)
Window function: returns the value that is
offset rows after the current row, and
null if there is less than offset rows after the current row. |
static Column |
lead(String columnName,
int offset,
Object defaultValue)
Window function: returns the value that is
offset rows after the current row, and
defaultValue if there is less than offset rows after the current row. |
static Column |
least(Column... exprs)
Returns the least value of the list of values, skipping null values.
|
static Column |
least(scala.collection.Seq<Column> exprs)
Returns the least value of the list of values, skipping null values.
|
static Column |
least(String columnName,
scala.collection.Seq<String> columnNames)
Returns the least value of the list of column names, skipping null values.
|
static Column |
least(String columnName,
String... columnNames)
Returns the least value of the list of column names, skipping null values.
|
static Column |
length(Column e)
Computes the length of a given string or binary column.
|
static Column |
levenshtein(Column l,
Column r)
Computes the Levenshtein distance of the two given string columns.
|
static Column |
lit(Object literal)
Creates a
Column of literal value. |
static Column |
locate(String substr,
Column str)
Locate the position of the first occurrence of substr.
|
static Column |
locate(String substr,
Column str,
int pos)
Locate the position of the first occurrence of substr in a string column, after position pos.
|
static Column |
log(Column e)
Computes the natural logarithm of the given value.
|
static Column |
log(double base,
Column a)
Returns the first argument-base logarithm of the second argument.
|
static Column |
log(double base,
String columnName)
Returns the first argument-base logarithm of the second argument.
|
static Column |
log(String columnName)
Computes the natural logarithm of the given column.
|
static Column |
log10(Column e)
Computes the logarithm of the given value in base 10.
|
static Column |
log10(String columnName)
Computes the logarithm of the given value in base 10.
|
static Column |
log1p(Column e)
Computes the natural logarithm of the given value plus one.
|
static Column |
log1p(String columnName)
Computes the natural logarithm of the given column plus one.
|
static Column |
log2(Column expr)
Computes the logarithm of the given column in base 2.
|
static Column |
log2(String columnName)
Computes the logarithm of the given value in base 2.
|
static Column |
lower(Column e)
Converts a string column to lower case.
|
static Column |
lpad(Column str,
int len,
String pad)
Left-pad the string column with pad to a length of len.
|
static Column |
ltrim(Column e)
Trim the spaces from left end for the specified string value.
|
static Column |
map(Column... cols)
Creates a new map column.
|
static Column |
map(scala.collection.Seq<Column> cols)
Creates a new map column.
|
static Column |
max(Column e)
Aggregate function: returns the maximum value of the expression in a group.
|
static Column |
max(String columnName)
Aggregate function: returns the maximum value of the column in a group.
|
static Column |
md5(Column e)
Calculates the MD5 digest of a binary column and returns the value
as a 32 character hex string.
|
static Column |
mean(Column e)
Aggregate function: returns the average of the values in a group.
|
static Column |
mean(String columnName)
Aggregate function: returns the average of the values in a group.
|
static Column |
min(Column e)
Aggregate function: returns the minimum value of the expression in a group.
|
static Column |
min(String columnName)
Aggregate function: returns the minimum value of the column in a group.
|
static Column |
minute(Column e)
Extracts the minutes as an integer from a given date/timestamp/string.
|
static Column |
monotonically_increasing_id()
A column expression that generates monotonically increasing 64-bit integers.
|
static Column |
monotonicallyIncreasingId()
Deprecated.
Use monotonically_increasing_id(). Since 2.0.0.
|
static Column |
month(Column e)
Extracts the month as an integer from a given date/timestamp/string.
|
static Column |
months_between(Column date1,
Column date2)
Returns number of months between dates
date1 and date2 . |
static Column |
nanvl(Column col1,
Column col2)
Returns col1 if it is not NaN, or col2 if col1 is NaN.
|
static Column |
negate(Column e)
Unary minus, i.e.
|
static Column |
next_day(Column date,
String dayOfWeek)
Given a date column, returns the first date which is later than the value of the date column
that is on the specified day of the week.
|
static Column |
not(Column e)
Inversion of boolean expression, i.e.
|
static Column |
ntile(int n)
Window function: returns the ntile group id (from 1 to
n inclusive) in an ordered window
partition. |
static Column |
percent_rank()
Window function: returns the relative rank (i.e.
|
static Column |
pmod(Column dividend,
Column divisor)
Returns the positive value of dividend mod divisor.
|
static Column |
posexplode_outer(Column e)
Creates a new row for each element with position in the given array or map column.
|
static Column |
posexplode(Column e)
Creates a new row for each element with position in the given array or map column.
|
static Column |
pow(Column l,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(Column l,
double r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(Column l,
String rightName)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(double l,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(double l,
String rightName)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(String leftName,
Column r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(String leftName,
double r)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
pow(String leftName,
String rightName)
Returns the value of the first argument raised to the power of the second argument.
|
static Column |
quarter(Column e)
Extracts the quarter as an integer from a given date/timestamp/string.
|
static Column |
radians(Column e)
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
|
static Column |
radians(String columnName)
Converts an angle measured in degrees to an approximately equivalent angle measured in radians.
|
static Column |
rand()
Generate a random column with independent and identically distributed (i.i.d.) samples
from U[0.0, 1.0].
|
static Column |
rand(long seed)
Generate a random column with independent and identically distributed (i.i.d.) samples
from U[0.0, 1.0].
|
static Column |
randn()
Generate a column with independent and identically distributed (i.i.d.) samples from
the standard normal distribution.
|
static Column |
randn(long seed)
Generate a column with independent and identically distributed (i.i.d.) samples from
the standard normal distribution.
|
static Column |
rank()
Window function: returns the rank of rows within a window partition.
|
static Column |
regexp_extract(Column e,
String exp,
int groupIdx)
Extract a specific group matched by a Java regex, from the specified string column.
|
static Column |
regexp_replace(Column e,
Column pattern,
Column replacement)
Replace all substrings of the specified string value that match regexp with rep.
|
static Column |
regexp_replace(Column e,
String pattern,
String replacement)
Replace all substrings of the specified string value that match regexp with rep.
|
static Column |
repeat(Column str,
int n)
Repeats a string column n times, and returns it as a new string column.
|
static Column |
reverse(Column str)
Reverses the string column and returns it as a new string column.
|
static Column |
rint(Column e)
Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.
|
static Column |
rint(String columnName)
Returns the double value that is closest in value to the argument and
is equal to a mathematical integer.
|
static Column |
round(Column e)
Returns the value of the column
e rounded to 0 decimal places with HALF_UP round mode. |
static Column |
round(Column e,
int scale)
Round the value of
e to scale decimal places with HALF_UP round mode
if scale is greater than or equal to 0 or at integral part when scale is less than 0. |
static Column |
row_number()
Window function: returns a sequential number starting at 1 within a window partition.
|
static Column |
rpad(Column str,
int len,
String pad)
Right-pad the string column with pad to a length of len.
|
static Column |
rtrim(Column e)
Trim the spaces from right end for the specified string value.
|
static Column |
second(Column e)
Extracts the seconds as an integer from a given date/timestamp/string.
|
static Column |
sha1(Column e)
Calculates the SHA-1 digest of a binary column and returns the value
as a 40 character hex string.
|
static Column |
sha2(Column e,
int numBits)
Calculates the SHA-2 family of hash functions of a binary column and
returns the value as a hex string.
|
static Column |
shiftLeft(Column e,
int numBits)
Shift the given value numBits left.
|
static Column |
shiftRight(Column e,
int numBits)
(Signed) shift the given value numBits right.
|
static Column |
shiftRightUnsigned(Column e,
int numBits)
Unsigned shift the given value numBits right.
|
static Column |
signum(Column e)
Computes the signum of the given value.
|
static Column |
signum(String columnName)
Computes the signum of the given column.
|
static Column |
sin(Column e)
Computes the sine of the given value.
|
static Column |
sin(String columnName)
Computes the sine of the given column.
|
static Column |
sinh(Column e)
Computes the hyperbolic sine of the given value.
|
static Column |
sinh(String columnName)
Computes the hyperbolic sine of the given column.
|
static Column |
size(Column e)
Returns length of array or map.
|
static Column |
skewness(Column e)
Aggregate function: returns the skewness of the values in a group.
|
static Column |
skewness(String columnName)
Aggregate function: returns the skewness of the values in a group.
|
static Column |
sort_array(Column e)
Sorts the input array for the given column in ascending order,
according to the natural ordering of the array elements.
|
static Column |
sort_array(Column e,
boolean asc)
Sorts the input array for the given column in ascending or descending order,
according to the natural ordering of the array elements.
|
static Column |
soundex(Column e)
* Return the soundex code for the specified expression.
|
static Column |
spark_partition_id()
Partition ID.
|
static Column |
split(Column str,
String pattern)
Splits str around pattern (pattern is a regular expression).
|
static Column |
sqrt(Column e)
Computes the square root of the specified float value.
|
static Column |
sqrt(String colName)
Computes the square root of the specified float value.
|
static Column |
stddev_pop(Column e)
Aggregate function: returns the population standard deviation of
the expression in a group.
|
static Column |
stddev_pop(String columnName)
Aggregate function: returns the population standard deviation of
the expression in a group.
|
static Column |
stddev_samp(Column e)
Aggregate function: returns the sample standard deviation of
the expression in a group.
|
static Column |
stddev_samp(String columnName)
Aggregate function: returns the sample standard deviation of
the expression in a group.
|
static Column |
stddev(Column e)
Aggregate function: alias for
stddev_samp . |
static Column |
stddev(String columnName)
Aggregate function: alias for
stddev_samp . |
static Column |
struct(Column... cols)
Creates a new struct column.
|
static Column |
struct(scala.collection.Seq<Column> cols)
Creates a new struct column.
|
static Column |
struct(String colName,
scala.collection.Seq<String> colNames)
Creates a new struct column that composes multiple input columns.
|
static Column |
struct(String colName,
String... colNames)
Creates a new struct column that composes multiple input columns.
|
static Column |
substring_index(Column str,
String delim,
int count)
Returns the substring from string str before count occurrences of the delimiter delim.
|
static Column |
substring(Column str,
int pos,
int len)
Substring starts at
pos and is of length len when str is String type or
returns the slice of byte array that starts at pos in byte and is of length len
when str is Binary type |
static Column |
sum(Column e)
Aggregate function: returns the sum of all values in the expression.
|
static Column |
sum(String columnName)
Aggregate function: returns the sum of all values in the given column.
|
static Column |
sumDistinct(Column e)
Aggregate function: returns the sum of distinct values in the expression.
|
static Column |
sumDistinct(String columnName)
Aggregate function: returns the sum of distinct values in the expression.
|
static Column |
tan(Column e)
Computes the tangent of the given value.
|
static Column |
tan(String columnName)
Computes the tangent of the given column.
|
static Column |
tanh(Column e)
Computes the hyperbolic tangent of the given value.
|
static Column |
tanh(String columnName)
Computes the hyperbolic tangent of the given column.
|
static Column |
to_date(Column e)
Converts the column into DateType.
|
static Column |
to_date(Column e,
String fmt)
Converts the column into a DateType with a specified format
(see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
return null if fail.
|
static Column |
to_json(Column e)
Converts a column containing a
StructType or ArrayType of StructType s into a JSON string
with the specified schema. |
static Column |
to_json(Column e,
scala.collection.immutable.Map<String,String> options)
(Scala-specific) Converts a column containing a
StructType or ArrayType of StructType s
into a JSON string with the specified schema. |
static Column |
to_json(Column e,
java.util.Map<String,String> options)
(Java-specific) Converts a column containing a
StructType or ArrayType of StructType s
into a JSON string with the specified schema. |
static Column |
to_timestamp(Column s)
Convert time string to a Unix timestamp (in seconds).
|
static Column |
to_timestamp(Column s,
String fmt)
Convert time string to a Unix timestamp (in seconds) with a specified format
(see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
to Unix timestamp (in seconds), return null if fail.
|
static Column |
to_utc_timestamp(Column ts,
String tz)
Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
another timestamp that corresponds to the same time of day in UTC.
|
static Column |
toDegrees(Column e)
Deprecated.
Use degrees. Since 2.1.0.
|
static Column |
toDegrees(String columnName)
Deprecated.
Use degrees. Since 2.1.0.
|
static Column |
toRadians(Column e)
Deprecated.
Use radians. Since 2.1.0.
|
static Column |
toRadians(String columnName)
Deprecated.
Use radians. Since 2.1.0.
|
static Column |
translate(Column src,
String matchingString,
String replaceString)
Translate any character in the src by a character in replaceString.
|
static Column |
trim(Column e)
Trim the spaces from both ends for the specified string column.
|
static Column |
trunc(Column date,
String format)
Returns date truncated to the unit specified by the format.
|
static <T> Column |
typedLit(T literal,
scala.reflect.api.TypeTags.TypeTag<T> evidence$1)
Creates a
Column of literal value. |
static <RT> UserDefinedFunction |
udf(scala.Function0<RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$2)
Defines a user-defined function of 0 arguments as user-defined function (UDF).
|
static <RT,A1> UserDefinedFunction |
udf(scala.Function1<A1,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$3,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$4)
Defines a user-defined function of 1 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10> |
udf(scala.Function10<A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$57,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$58,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$59,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$60,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$61,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$62,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$63,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$64,
scala.reflect.api.TypeTags.TypeTag<A8> evidence$65,
scala.reflect.api.TypeTags.TypeTag<A9> evidence$66,
scala.reflect.api.TypeTags.TypeTag<A10> evidence$67)
Defines a user-defined function of 10 arguments as user-defined function (UDF).
|
static <RT,A1,A2> UserDefinedFunction |
udf(scala.Function2<A1,A2,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$5,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$6,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$7)
Defines a user-defined function of 2 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3> |
udf(scala.Function3<A1,A2,A3,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$8,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$9,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$10,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$11)
Defines a user-defined function of 3 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4> |
udf(scala.Function4<A1,A2,A3,A4,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$12,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$13,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$14,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$15,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$16)
Defines a user-defined function of 4 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5> |
udf(scala.Function5<A1,A2,A3,A4,A5,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$17,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$18,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$19,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$20,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$21,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$22)
Defines a user-defined function of 5 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6> |
udf(scala.Function6<A1,A2,A3,A4,A5,A6,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$23,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$24,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$25,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$26,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$27,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$28,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$29)
Defines a user-defined function of 6 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7> |
udf(scala.Function7<A1,A2,A3,A4,A5,A6,A7,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$30,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$31,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$32,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$33,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$34,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$35,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$36,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$37)
Defines a user-defined function of 7 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7,A8> |
udf(scala.Function8<A1,A2,A3,A4,A5,A6,A7,A8,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$38,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$39,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$40,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$41,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$42,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$43,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$44,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$45,
scala.reflect.api.TypeTags.TypeTag<A8> evidence$46)
Defines a user-defined function of 8 arguments as user-defined function (UDF).
|
static <RT,A1,A2,A3,A4,A5,A6,A7,A8,A9> |
udf(scala.Function9<A1,A2,A3,A4,A5,A6,A7,A8,A9,RT> f,
scala.reflect.api.TypeTags.TypeTag<RT> evidence$47,
scala.reflect.api.TypeTags.TypeTag<A1> evidence$48,
scala.reflect.api.TypeTags.TypeTag<A2> evidence$49,
scala.reflect.api.TypeTags.TypeTag<A3> evidence$50,
scala.reflect.api.TypeTags.TypeTag<A4> evidence$51,
scala.reflect.api.TypeTags.TypeTag<A5> evidence$52,
scala.reflect.api.TypeTags.TypeTag<A6> evidence$53,
scala.reflect.api.TypeTags.TypeTag<A7> evidence$54,
scala.reflect.api.TypeTags.TypeTag<A8> evidence$55,
scala.reflect.api.TypeTags.TypeTag<A9> evidence$56)
Defines a user-defined function of 9 arguments as user-defined function (UDF).
|
static UserDefinedFunction |
udf(Object f,
DataType dataType)
Defines a user-defined function (UDF) using a Scala closure.
|
static Column |
unbase64(Column e)
Decodes a BASE64 encoded string column and returns it as a binary column.
|
static Column |
unhex(Column column)
Inverse of hex.
|
static Column |
unix_timestamp()
Gets current Unix timestamp in seconds.
|
static Column |
unix_timestamp(Column s)
Converts time string in format yyyy-MM-dd HH:mm:ss to Unix timestamp (in seconds),
using the default timezone and the default locale, return null if fail.
|
static Column |
unix_timestamp(Column s,
String p)
Convert time string with given pattern
(see [http://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html])
to Unix time stamp (in seconds), return null if fail.
|
static Column |
upper(Column e)
Converts a string column to upper case.
|
static Column |
var_pop(Column e)
Aggregate function: returns the population variance of the values in a group.
|
static Column |
var_pop(String columnName)
Aggregate function: returns the population variance of the values in a group.
|
static Column |
var_samp(Column e)
Aggregate function: returns the unbiased variance of the values in a group.
|
static Column |
var_samp(String columnName)
Aggregate function: returns the unbiased variance of the values in a group.
|
static Column |
variance(Column e)
Aggregate function: alias for
var_samp . |
static Column |
variance(String columnName)
Aggregate function: alias for
var_samp . |
static Column |
weekofyear(Column e)
Extracts the week number as an integer from a given date/timestamp/string.
|
static Column |
when(Column condition,
Object value)
Evaluates a list of conditions and returns one of multiple possible result expressions.
|
static Column |
window(Column timeColumn,
String windowDuration)
Generates tumbling time windows given a timestamp specifying column.
|
static Column |
window(Column timeColumn,
String windowDuration,
String slideDuration)
Bucketize rows into one or more time windows given a timestamp specifying column.
|
static Column |
window(Column timeColumn,
String windowDuration,
String slideDuration,
String startTime)
Bucketize rows into one or more time windows given a timestamp specifying column.
|
static Column |
year(Column e)
Extracts the year as an integer from a given date/timestamp/string.
|
public static Column countDistinct(Column expr, Column... exprs)
expr
- (undocumented)exprs
- (undocumented)public static Column countDistinct(String columnName, String... columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column array(Column... cols)
cols
- (undocumented)public static Column array(String colName, String... colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column map(Column... cols)
cols
- (undocumented)public static Column coalesce(Column... e)
For example, coalesce(a, b, c)
will return a if a is not null,
or b if a is null and b is not null, or c if both a and b are null but c is not null.
e
- (undocumented)public static Column struct(Column... cols)
DataFrame
, or a derived column expression
that is named (i.e. aliased), its name would be remained as the StructField's name,
otherwise, the newly generated StructField's name would be auto generated as
col
with a suffix index + 1
, i.e. col1, col2, col3, ...
cols
- (undocumented)public static Column struct(String colName, String... colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column greatest(Column... exprs)
exprs
- (undocumented)public static Column greatest(String columnName, String... columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column least(Column... exprs)
exprs
- (undocumented)public static Column least(String columnName, String... columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column hash(Column... cols)
cols
- (undocumented)public static Column concat(Column... exprs)
exprs
- (undocumented)public static Column concat_ws(String sep, Column... exprs)
sep
- (undocumented)exprs
- (undocumented)public static Column format_string(String format, Column... arguments)
format
- (undocumented)arguments
- (undocumented)public static Column json_tuple(Column json, String... fields)
json
- (undocumented)fields
- (undocumented)public static Column callUDF(String udfName, Column... cols)
import org.apache.spark.sql._
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val spark = df.sparkSession
spark.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", callUDF("simpleUDF", $"value"))
udfName
- (undocumented)cols
- (undocumented)public static Column col(String colName)
Column
based on the given column name.
colName
- (undocumented)public static Column column(String colName)
Column
based on the given column name. Alias of col
.
colName
- (undocumented)public static Column lit(Object literal)
Column
of literal value.
The passed in object is returned directly if it is already a Column
.
If the object is a Scala Symbol, it is converted into a Column
also.
Otherwise, a new Column
is created to represent the literal value.
literal
- (undocumented)public static <T> Column typedLit(T literal, scala.reflect.api.TypeTags.TypeTag<T> evidence$1)
Column
of literal value.
The passed in object is returned directly if it is already a Column
.
If the object is a Scala Symbol, it is converted into a Column
also.
Otherwise, a new Column
is created to represent the literal value.
The difference between this function and lit
is that this function
can handle parameterized scala types e.g.: List, Seq and Map.
literal
- (undocumented)evidence$1
- (undocumented)public static Column asc(String columnName)
df.sort(asc("dept"), desc("age"))
columnName
- (undocumented)public static Column asc_nulls_first(String columnName)
df.sort(asc_nulls_last("dept"), desc("age"))
columnName
- (undocumented)public static Column asc_nulls_last(String columnName)
df.sort(asc_nulls_last("dept"), desc("age"))
columnName
- (undocumented)public static Column desc(String columnName)
df.sort(asc("dept"), desc("age"))
columnName
- (undocumented)public static Column desc_nulls_first(String columnName)
df.sort(asc("dept"), desc_nulls_first("age"))
columnName
- (undocumented)public static Column desc_nulls_last(String columnName)
df.sort(asc("dept"), desc_nulls_last("age"))
columnName
- (undocumented)public static Column approxCountDistinct(Column e)
e
- (undocumented)public static Column approxCountDistinct(String columnName)
columnName
- (undocumented)public static Column approxCountDistinct(Column e, double rsd)
e
- (undocumented)rsd
- (undocumented)public static Column approxCountDistinct(String columnName, double rsd)
columnName
- (undocumented)rsd
- (undocumented)public static Column approx_count_distinct(Column e)
e
- (undocumented)public static Column approx_count_distinct(String columnName)
columnName
- (undocumented)public static Column approx_count_distinct(Column e, double rsd)
rsd
- maximum estimation error allowed (default = 0.05)
e
- (undocumented)public static Column approx_count_distinct(String columnName, double rsd)
rsd
- maximum estimation error allowed (default = 0.05)
columnName
- (undocumented)public static Column avg(Column e)
e
- (undocumented)public static Column avg(String columnName)
columnName
- (undocumented)public static Column collect_list(Column e)
e
- (undocumented)public static Column collect_list(String columnName)
columnName
- (undocumented)public static Column collect_set(Column e)
e
- (undocumented)public static Column collect_set(String columnName)
columnName
- (undocumented)public static Column corr(Column column1, Column column2)
column1
- (undocumented)column2
- (undocumented)public static Column corr(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column count(Column e)
e
- (undocumented)public static TypedColumn<Object,Object> count(String columnName)
columnName
- (undocumented)public static Column countDistinct(Column expr, scala.collection.Seq<Column> exprs)
expr
- (undocumented)exprs
- (undocumented)public static Column countDistinct(String columnName, scala.collection.Seq<String> columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column covar_pop(Column column1, Column column2)
column1
- (undocumented)column2
- (undocumented)public static Column covar_pop(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column covar_samp(Column column1, Column column2)
column1
- (undocumented)column2
- (undocumented)public static Column covar_samp(String columnName1, String columnName2)
columnName1
- (undocumented)columnName2
- (undocumented)public static Column first(Column e, boolean ignoreNulls)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)ignoreNulls
- (undocumented)public static Column first(String columnName, boolean ignoreNulls)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)ignoreNulls
- (undocumented)public static Column first(Column e)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)public static Column first(String columnName)
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)public static Column grouping(Column e)
e
- (undocumented)public static Column grouping(String columnName)
columnName
- (undocumented)public static Column grouping_id(scala.collection.Seq<Column> cols)
(grouping(c1) <<; (n-1)) + (grouping(c2) <<; (n-2)) + ... + grouping(cn)
cols
- (undocumented)public static Column grouping_id(String colName, scala.collection.Seq<String> colNames)
(grouping(c1) <<; (n-1)) + (grouping(c2) <<; (n-2)) + ... + grouping(cn)
colName
- (undocumented)colNames
- (undocumented)public static Column kurtosis(Column e)
e
- (undocumented)public static Column kurtosis(String columnName)
columnName
- (undocumented)public static Column last(Column e, boolean ignoreNulls)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)ignoreNulls
- (undocumented)public static Column last(String columnName, boolean ignoreNulls)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)ignoreNulls
- (undocumented)public static Column last(Column e)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
e
- (undocumented)public static Column last(String columnName)
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
columnName
- (undocumented)public static Column max(Column e)
e
- (undocumented)public static Column max(String columnName)
columnName
- (undocumented)public static Column mean(Column e)
e
- (undocumented)public static Column mean(String columnName)
columnName
- (undocumented)public static Column min(Column e)
e
- (undocumented)public static Column min(String columnName)
columnName
- (undocumented)public static Column skewness(Column e)
e
- (undocumented)public static Column skewness(String columnName)
columnName
- (undocumented)public static Column stddev(Column e)
stddev_samp
.
e
- (undocumented)public static Column stddev(String columnName)
stddev_samp
.
columnName
- (undocumented)public static Column stddev_samp(Column e)
e
- (undocumented)public static Column stddev_samp(String columnName)
columnName
- (undocumented)public static Column stddev_pop(Column e)
e
- (undocumented)public static Column stddev_pop(String columnName)
columnName
- (undocumented)public static Column sum(Column e)
e
- (undocumented)public static Column sum(String columnName)
columnName
- (undocumented)public static Column sumDistinct(Column e)
e
- (undocumented)public static Column sumDistinct(String columnName)
columnName
- (undocumented)public static Column variance(Column e)
var_samp
.
e
- (undocumented)public static Column variance(String columnName)
var_samp
.
columnName
- (undocumented)public static Column var_samp(Column e)
e
- (undocumented)public static Column var_samp(String columnName)
columnName
- (undocumented)public static Column var_pop(Column e)
e
- (undocumented)public static Column var_pop(String columnName)
columnName
- (undocumented)public static Column cume_dist()
N = total number of rows in the partition
cumeDist(x) = number of values before (and including) x / N
public static Column dense_rank()
The difference between rank and dense_rank is that denseRank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank would give me sequential numbers, making the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the DENSE_RANK function in SQL.
public static Column lag(Column e, int offset)
offset
rows before the current row, and
null
if there is less than offset
rows before the current row. For example,
an offset
of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
e
- (undocumented)offset
- (undocumented)public static Column lag(String columnName, int offset)
offset
rows before the current row, and
null
if there is less than offset
rows before the current row. For example,
an offset
of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
columnName
- (undocumented)offset
- (undocumented)public static Column lag(String columnName, int offset, Object defaultValue)
offset
rows before the current row, and
defaultValue
if there is less than offset
rows before the current row. For example,
an offset
of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
columnName
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lag(Column e, int offset, Object defaultValue)
offset
rows before the current row, and
defaultValue
if there is less than offset
rows before the current row. For example,
an offset
of one will return the previous row at any given point in the window partition.
This is equivalent to the LAG function in SQL.
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lead(String columnName, int offset)
offset
rows after the current row, and
null
if there is less than offset
rows after the current row. For example,
an offset
of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
columnName
- (undocumented)offset
- (undocumented)public static Column lead(Column e, int offset)
offset
rows after the current row, and
null
if there is less than offset
rows after the current row. For example,
an offset
of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
e
- (undocumented)offset
- (undocumented)public static Column lead(String columnName, int offset, Object defaultValue)
offset
rows after the current row, and
defaultValue
if there is less than offset
rows after the current row. For example,
an offset
of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
columnName
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column lead(Column e, int offset, Object defaultValue)
offset
rows after the current row, and
defaultValue
if there is less than offset
rows after the current row. For example,
an offset
of one will return the next row at any given point in the window partition.
This is equivalent to the LEAD function in SQL.
e
- (undocumented)offset
- (undocumented)defaultValue
- (undocumented)public static Column ntile(int n)
n
inclusive) in an ordered window
partition. For example, if n
is 4, the first quarter of the rows will get value 1, the second
quarter will get 2, the third quarter will get 3, and the last quarter will get 4.
This is equivalent to the NTILE function in SQL.
n
- (undocumented)public static Column percent_rank()
This is computed by:
(rank of row in its partition - 1) / (number of rows in the partition - 1)
This is equivalent to the PERCENT_RANK function in SQL.
public static Column rank()
The difference between rank and dense_rank is that dense_rank leaves no gaps in ranking sequence when there are ties. That is, if you were ranking a competition using dense_rank and had three people tie for second place, you would say that all three were in second place and that the next person came in third. Rank would give me sequential numbers, making the person that came in third place (after the ties) would register as coming in fifth.
This is equivalent to the RANK function in SQL.
public static Column row_number()
public static Column abs(Column e)
e
- (undocumented)public static Column array(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column array(String colName, scala.collection.Seq<String> colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column map(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static <T> Dataset<T> broadcast(Dataset<T> df)
The following example marks the right DataFrame for broadcast hash join using joinKey
.
// left and right are DataFrames
left.join(broadcast(right), "joinKey")
df
- (undocumented)public static Column coalesce(scala.collection.Seq<Column> e)
For example, coalesce(a, b, c)
will return a if a is not null,
or b if a is null and b is not null, or c if both a and b are null but c is not null.
e
- (undocumented)public static Column input_file_name()
public static Column isnan(Column e)
e
- (undocumented)public static Column isnull(Column e)
e
- (undocumented)public static Column monotonicallyIncreasingId()
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame
with two partitions, each with 3 records.
This expression would return the following IDs:
0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
public static Column monotonically_increasing_id()
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As an example, consider a DataFrame
with two partitions, each with 3 records.
This expression would return the following IDs:
0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594.
public static Column nanvl(Column col1, Column col2)
Both inputs should be floating point columns (DoubleType or FloatType).
col1
- (undocumented)col2
- (undocumented)public static Column negate(Column e)
// Select the amount column and negates all values.
// Scala:
df.select( -df("amount") )
// Java:
df.select( negate(df.col("amount")) );
e
- (undocumented)public static Column not(Column e)
// Scala: select rows that are not active (isActive === false)
df.filter( !df("isActive") )
// Java:
df.filter( not(df.col("isActive")) );
e
- (undocumented)public static Column rand(long seed)
seed
- (undocumented)public static Column rand()
public static Column randn(long seed)
seed
- (undocumented)public static Column randn()
public static Column spark_partition_id()
public static Column sqrt(Column e)
e
- (undocumented)public static Column sqrt(String colName)
colName
- (undocumented)public static Column struct(scala.collection.Seq<Column> cols)
DataFrame
, or a derived column expression
that is named (i.e. aliased), its name would be remained as the StructField's name,
otherwise, the newly generated StructField's name would be auto generated as
col
with a suffix index + 1
, i.e. col1, col2, col3, ...
cols
- (undocumented)public static Column struct(String colName, scala.collection.Seq<String> colNames)
colName
- (undocumented)colNames
- (undocumented)public static Column when(Column condition, Object value)
// Example: encoding gender string column into integer.
// Scala:
people.select(when(people("gender") === "male", 0)
.when(people("gender") === "female", 1)
.otherwise(2))
// Java:
people.select(when(col("gender").equalTo("male"), 0)
.when(col("gender").equalTo("female"), 1)
.otherwise(2))
condition
- (undocumented)value
- (undocumented)public static Column bitwiseNOT(Column e)
e
- (undocumented)public static Column expr(String expr)
// get the number of words of each length
df.groupBy(expr("length(word)")).count()
expr
- (undocumented)public static Column acos(Column e)
e
- (undocumented)public static Column acos(String columnName)
columnName
- (undocumented)public static Column asin(Column e)
e
- (undocumented)public static Column asin(String columnName)
columnName
- (undocumented)public static Column atan(Column e)
e
- (undocumented)public static Column atan(String columnName)
columnName
- (undocumented)public static Column atan2(Column l, Column r)
l
- (undocumented)r
- (undocumented)public static Column atan2(Column l, String rightName)
l
- (undocumented)rightName
- (undocumented)public static Column atan2(String leftName, Column r)
leftName
- (undocumented)r
- (undocumented)public static Column atan2(String leftName, String rightName)
leftName
- (undocumented)rightName
- (undocumented)public static Column atan2(Column l, double r)
l
- (undocumented)r
- (undocumented)public static Column atan2(String leftName, double r)
leftName
- (undocumented)r
- (undocumented)public static Column atan2(double l, Column r)
l
- (undocumented)r
- (undocumented)public static Column atan2(double l, String rightName)
l
- (undocumented)rightName
- (undocumented)public static Column bin(Column e)
e
- (undocumented)public static Column bin(String columnName)
columnName
- (undocumented)public static Column cbrt(Column e)
e
- (undocumented)public static Column cbrt(String columnName)
columnName
- (undocumented)public static Column ceil(Column e)
e
- (undocumented)public static Column ceil(String columnName)
columnName
- (undocumented)public static Column conv(Column num, int fromBase, int toBase)
num
- (undocumented)fromBase
- (undocumented)toBase
- (undocumented)public static Column cos(Column e)
e
- (undocumented)public static Column cos(String columnName)
columnName
- (undocumented)public static Column cosh(Column e)
e
- (undocumented)public static Column cosh(String columnName)
columnName
- (undocumented)public static Column exp(Column e)
e
- (undocumented)public static Column exp(String columnName)
columnName
- (undocumented)public static Column expm1(Column e)
e
- (undocumented)public static Column expm1(String columnName)
columnName
- (undocumented)public static Column factorial(Column e)
e
- (undocumented)public static Column floor(Column e)
e
- (undocumented)public static Column floor(String columnName)
columnName
- (undocumented)public static Column greatest(scala.collection.Seq<Column> exprs)
exprs
- (undocumented)public static Column greatest(String columnName, scala.collection.Seq<String> columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column hex(Column column)
column
- (undocumented)public static Column unhex(Column column)
column
- (undocumented)public static Column hypot(Column l, Column r)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
l
- (undocumented)r
- (undocumented)public static Column hypot(Column l, String rightName)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
l
- (undocumented)rightName
- (undocumented)public static Column hypot(String leftName, Column r)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
leftName
- (undocumented)r
- (undocumented)public static Column hypot(String leftName, String rightName)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
leftName
- (undocumented)rightName
- (undocumented)public static Column hypot(Column l, double r)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
l
- (undocumented)r
- (undocumented)public static Column hypot(String leftName, double r)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
leftName
- (undocumented)r
- (undocumented)public static Column hypot(double l, Column r)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
l
- (undocumented)r
- (undocumented)public static Column hypot(double l, String rightName)
sqrt(a^2^ + b^2^)
without intermediate overflow or underflow.
l
- (undocumented)rightName
- (undocumented)public static Column least(scala.collection.Seq<Column> exprs)
exprs
- (undocumented)public static Column least(String columnName, scala.collection.Seq<String> columnNames)
columnName
- (undocumented)columnNames
- (undocumented)public static Column log(Column e)
e
- (undocumented)public static Column log(String columnName)
columnName
- (undocumented)public static Column log(double base, Column a)
base
- (undocumented)a
- (undocumented)public static Column log(double base, String columnName)
base
- (undocumented)columnName
- (undocumented)public static Column log10(Column e)
e
- (undocumented)public static Column log10(String columnName)
columnName
- (undocumented)public static Column log1p(Column e)
e
- (undocumented)public static Column log1p(String columnName)
columnName
- (undocumented)public static Column log2(Column expr)
expr
- (undocumented)public static Column log2(String columnName)
columnName
- (undocumented)public static Column pow(Column l, Column r)
l
- (undocumented)r
- (undocumented)public static Column pow(Column l, String rightName)
l
- (undocumented)rightName
- (undocumented)public static Column pow(String leftName, Column r)
leftName
- (undocumented)r
- (undocumented)public static Column pow(String leftName, String rightName)
leftName
- (undocumented)rightName
- (undocumented)public static Column pow(Column l, double r)
l
- (undocumented)r
- (undocumented)public static Column pow(String leftName, double r)
leftName
- (undocumented)r
- (undocumented)public static Column pow(double l, Column r)
l
- (undocumented)r
- (undocumented)public static Column pow(double l, String rightName)
l
- (undocumented)rightName
- (undocumented)public static Column pmod(Column dividend, Column divisor)
dividend
- (undocumented)divisor
- (undocumented)public static Column rint(Column e)
e
- (undocumented)public static Column rint(String columnName)
columnName
- (undocumented)public static Column round(Column e)
e
rounded to 0 decimal places with HALF_UP round mode.
e
- (undocumented)public static Column round(Column e, int scale)
e
to scale
decimal places with HALF_UP round mode
if scale
is greater than or equal to 0 or at integral part when scale
is less than 0.
e
- (undocumented)scale
- (undocumented)public static Column bround(Column e)
e
rounded to 0 decimal places with HALF_EVEN round mode.
e
- (undocumented)public static Column bround(Column e, int scale)
e
to scale
decimal places with HALF_EVEN round mode
if scale
is greater than or equal to 0 or at integral part when scale
is less than 0.
e
- (undocumented)scale
- (undocumented)public static Column shiftLeft(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftRight(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column shiftRightUnsigned(Column e, int numBits)
e
- (undocumented)numBits
- (undocumented)public static Column signum(Column e)
e
- (undocumented)public static Column signum(String columnName)
columnName
- (undocumented)public static Column sin(Column e)
e
- (undocumented)public static Column sin(String columnName)
columnName
- (undocumented)public static Column sinh(Column e)
e
- (undocumented)public static Column sinh(String columnName)
columnName
- (undocumented)public static Column tan(Column e)
e
- (undocumented)public static Column tan(String columnName)
columnName
- (undocumented)public static Column tanh(Column e)
e
- (undocumented)public static Column tanh(String columnName)
columnName
- (undocumented)public static Column toDegrees(Column e)
e
- (undocumented)public static Column toDegrees(String columnName)
columnName
- (undocumented)public static Column degrees(Column e)
e
- (undocumented)public static Column degrees(String columnName)
columnName
- (undocumented)public static Column toRadians(Column e)
e
- (undocumented)public static Column toRadians(String columnName)
columnName
- (undocumented)public static Column radians(Column e)
e
- (undocumented)public static Column radians(String columnName)
columnName
- (undocumented)public static Column md5(Column e)
e
- (undocumented)public static Column sha1(Column e)
e
- (undocumented)public static Column sha2(Column e, int numBits)
e
- column to compute SHA-2 on.numBits
- one of 224, 256, 384, or 512.
public static Column crc32(Column e)
e
- (undocumented)public static Column hash(scala.collection.Seq<Column> cols)
cols
- (undocumented)public static Column ascii(Column e)
e
- (undocumented)public static Column base64(Column e)
e
- (undocumented)public static Column concat(scala.collection.Seq<Column> exprs)
exprs
- (undocumented)public static Column concat_ws(String sep, scala.collection.Seq<Column> exprs)
sep
- (undocumented)exprs
- (undocumented)public static Column decode(Column value, String charset)
value
- (undocumented)charset
- (undocumented)public static Column encode(Column value, String charset)
value
- (undocumented)charset
- (undocumented)public static Column format_number(Column x, int d)
If d is 0, the result has no decimal point or fractional part. If d is less than 0, the result will be null.
x
- (undocumented)d
- (undocumented)public static Column format_string(String format, scala.collection.Seq<Column> arguments)
format
- (undocumented)arguments
- (undocumented)public static Column initcap(Column e)
For example, "hello world" will become "Hello World".
e
- (undocumented)public static Column instr(Column str, String substring)
str
- (undocumented)substring
- (undocumented)public static Column length(Column e)
e
- (undocumented)public static Column lower(Column e)
e
- (undocumented)public static Column levenshtein(Column l, Column r)
l
- (undocumented)r
- (undocumented)public static Column locate(String substr, Column str)
substr
- (undocumented)str
- (undocumented)public static Column locate(String substr, Column str, int pos)
substr
- (undocumented)str
- (undocumented)pos
- (undocumented)public static Column lpad(Column str, int len, String pad)
str
- (undocumented)len
- (undocumented)pad
- (undocumented)public static Column ltrim(Column e)
e
- (undocumented)public static Column regexp_extract(Column e, String exp, int groupIdx)
e
- (undocumented)exp
- (undocumented)groupIdx
- (undocumented)public static Column regexp_replace(Column e, String pattern, String replacement)
e
- (undocumented)pattern
- (undocumented)replacement
- (undocumented)public static Column regexp_replace(Column e, Column pattern, Column replacement)
e
- (undocumented)pattern
- (undocumented)replacement
- (undocumented)public static Column unbase64(Column e)
e
- (undocumented)public static Column rpad(Column str, int len, String pad)
str
- (undocumented)len
- (undocumented)pad
- (undocumented)public static Column repeat(Column str, int n)
str
- (undocumented)n
- (undocumented)public static Column reverse(Column str)
str
- (undocumented)public static Column rtrim(Column e)
e
- (undocumented)public static Column soundex(Column e)
e
- (undocumented)public static Column split(Column str, String pattern)
str
- (undocumented)pattern
- (undocumented)public static Column substring(Column str, int pos, int len)
pos
and is of length len
when str is String type or
returns the slice of byte array that starts at pos
in byte and is of length len
when str is Binary type
str
- (undocumented)pos
- (undocumented)len
- (undocumented)public static Column substring_index(Column str, String delim, int count)
str
- (undocumented)delim
- (undocumented)count
- (undocumented)public static Column translate(Column src, String matchingString, String replaceString)
matchingString
.
src
- (undocumented)matchingString
- (undocumented)replaceString
- (undocumented)public static Column trim(Column e)
e
- (undocumented)public static Column upper(Column e)
e
- (undocumented)public static Column add_months(Column startDate, int numMonths)
startDate
- (undocumented)numMonths
- (undocumented)public static Column current_date()
public static Column current_timestamp()
public static Column date_format(Column dateExpr, String format)
A pattern could be for instance dd.MM.yyyy
and could return a string like '18.03.1993'. All
pattern letters of java.text.SimpleDateFormat
can be used.
dateExpr
- (undocumented)format
- (undocumented)year
. These benefit from a
specialized implementation.
public static Column date_add(Column start, int days)
days
days after start
start
- (undocumented)days
- (undocumented)public static Column date_sub(Column start, int days)
days
days before start
start
- (undocumented)days
- (undocumented)public static Column datediff(Column end, Column start)
start
to end
.end
- (undocumented)start
- (undocumented)public static Column year(Column e)
e
- (undocumented)public static Column quarter(Column e)
e
- (undocumented)public static Column month(Column e)
e
- (undocumented)public static Column dayofmonth(Column e)
e
- (undocumented)public static Column dayofyear(Column e)
e
- (undocumented)public static Column hour(Column e)
e
- (undocumented)public static Column last_day(Column e)
e
- (undocumented)public static Column minute(Column e)
e
- (undocumented)public static Column months_between(Column date1, Column date2)
date1
and date2
.date1
- (undocumented)date2
- (undocumented)public static Column next_day(Column date, String dayOfWeek)
For example, next_day('2015-07-27', "Sunday")
returns 2015-08-02 because that is the first
Sunday after 2015-07-27.
Day of the week parameter is case insensitive, and accepts: "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun".
date
- (undocumented)dayOfWeek
- (undocumented)public static Column second(Column e)
e
- (undocumented)public static Column weekofyear(Column e)
e
- (undocumented)public static Column from_unixtime(Column ut)
ut
- (undocumented)public static Column from_unixtime(Column ut, String f)
ut
- (undocumented)f
- (undocumented)public static Column unix_timestamp()
public static Column unix_timestamp(Column s)
s
- (undocumented)public static Column unix_timestamp(Column s, String p)
s
- (undocumented)p
- (undocumented)public static Column to_timestamp(Column s)
s
- (undocumented)public static Column to_timestamp(Column s, String fmt)
s
- (undocumented)fmt
- (undocumented)public static Column to_date(Column e)
e
- (undocumented)public static Column to_date(Column e, String fmt)
e
- (undocumented)fmt
- (undocumented)public static Column trunc(Column date, String format)
format:
- 'year', 'yyyy', 'yy' for truncate by year,
or 'month', 'mon', 'mm' for truncate by month
date
- (undocumented)public static Column from_utc_timestamp(Column ts, String tz)
ts
- (undocumented)tz
- (undocumented)public static Column to_utc_timestamp(Column ts, String tz)
ts
- (undocumented)tz
- (undocumented)public static Column window(Column timeColumn, String windowDuration, String slideDuration, String startTime)
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"time", "1 minute", "10 seconds", "5 seconds"), $"stockId")
.agg(mean("price"))
The windows will look like:
09:00:05-09:01:05
09:00:15-09:01:15
09:00:25-09:01:25 ...
For a streaming query, you may use the function current_timestamp
to generate windows on
processing time.
timeColumn
- The column or the expression to use as the timestamp for windowing by time.
The time column must be of TimestampType.windowDuration
- A string specifying the width of the window, e.g. 10 minutes
,
1 second
. Check org.apache.spark.unsafe.types.CalendarInterval
for
valid duration identifiers. Note that the duration is a fixed length of
time, and does not vary over time according to a calendar. For example,
1 day
always means 86,400,000 milliseconds, not a calendar day.slideDuration
- A string specifying the sliding interval of the window, e.g. 1 minute
.
A new window will be generated every slideDuration
. Must be less than
or equal to the windowDuration
. Check
org.apache.spark.unsafe.types.CalendarInterval
for valid duration
identifiers. This duration is likewise absolute, and does not vary
according to a calendar.startTime
- The offset with respect to 1970-01-01 00:00:00 UTC with which to start
window intervals. For example, in order to have hourly tumbling windows that
start 15 minutes past the hour, e.g. 12:15-13:15, 13:15-14:15... provide
startTime
as 15 minutes
.
public static Column window(Column timeColumn, String windowDuration, String slideDuration)
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"time", "1 minute", "10 seconds"), $"stockId")
.agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:00:10-09:01:10
09:00:20-09:01:20 ...
For a streaming query, you may use the function current_timestamp
to generate windows on
processing time.
timeColumn
- The column or the expression to use as the timestamp for windowing by time.
The time column must be of TimestampType.windowDuration
- A string specifying the width of the window, e.g. 10 minutes
,
1 second
. Check org.apache.spark.unsafe.types.CalendarInterval
for
valid duration identifiers. Note that the duration is a fixed length of
time, and does not vary over time according to a calendar. For example,
1 day
always means 86,400,000 milliseconds, not a calendar day.slideDuration
- A string specifying the sliding interval of the window, e.g. 1 minute
.
A new window will be generated every slideDuration
. Must be less than
or equal to the windowDuration
. Check
org.apache.spark.unsafe.types.CalendarInterval
for valid duration
identifiers. This duration is likewise absolute, and does not vary
according to a calendar.
public static Column window(Column timeColumn, String windowDuration)
val df = ... // schema => timestamp: TimestampType, stockId: StringType, price: DoubleType
df.groupBy(window($"time", "1 minute"), $"stockId")
.agg(mean("price"))
The windows will look like:
09:00:00-09:01:00
09:01:00-09:02:00
09:02:00-09:03:00 ...
For a streaming query, you may use the function current_timestamp
to generate windows on
processing time.
timeColumn
- The column or the expression to use as the timestamp for windowing by time.
The time column must be of TimestampType.windowDuration
- A string specifying the width of the window, e.g. 10 minutes
,
1 second
. Check org.apache.spark.unsafe.types.CalendarInterval
for
valid duration identifiers.
public static Column array_contains(Column column, Object value)
value
, and false otherwise.column
- (undocumented)value
- (undocumented)public static Column explode(Column e)
e
- (undocumented)public static Column explode_outer(Column e)
e
- (undocumented)public static Column posexplode(Column e)
e
- (undocumented)public static Column posexplode_outer(Column e)
e
- (undocumented)public static Column get_json_object(Column e, String path)
e
- (undocumented)path
- (undocumented)public static Column json_tuple(Column json, scala.collection.Seq<String> fields)
json
- (undocumented)fields
- (undocumented)public static Column from_json(Column e, StructType schema, scala.collection.immutable.Map<String,String> options)
StructType
with the
specified schema. Returns null
, in the case of an unparseable string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. Accepts the same options as the
json data source.
public static Column from_json(Column e, DataType schema, scala.collection.immutable.Map<String,String> options)
StructType
or ArrayType
of StructType
s with the specified schema. Returns null
, in the case of an unparseable
string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. accepts the same options and the
json data source.
public static Column from_json(Column e, StructType schema, java.util.Map<String,String> options)
StructType
with the
specified schema. Returns null
, in the case of an unparseable string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. accepts the same options and the
json data source.
public static Column from_json(Column e, DataType schema, java.util.Map<String,String> options)
StructType
or ArrayType
of StructType
s with the specified schema. Returns null
, in the case of an unparseable
string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json stringoptions
- options to control how the json is parsed. accepts the same options and the
json data source.
public static Column from_json(Column e, StructType schema)
StructType
with the specified schema.
Returns null
, in the case of an unparseable string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json string
public static Column from_json(Column e, DataType schema)
StructType
or ArrayType
of StructType
s
with the specified schema. Returns null
, in the case of an unparseable string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json string
public static Column from_json(Column e, String schema, java.util.Map<String,String> options)
StructType
or ArrayType
of StructType
s
with the specified schema. Returns null
, in the case of an unparseable string.
e
- a string column containing JSON data.schema
- the schema to use when parsing the json string as a json string. In Spark 2.1,
the user-provided schema has to be in JSON format. Since Spark 2.2, the DDL
format is also supported for the schema.
options
- (undocumented)public static Column to_json(Column e, scala.collection.immutable.Map<String,String> options)
StructType
or ArrayType
of StructType
s
into a JSON string with the specified schema. Throws an exception, in the case of an
unsupported type.
e
- a column containing a struct or array of the structs.options
- options to control how the struct column is converted into a json string.
accepts the same options and the json data source.
public static Column to_json(Column e, java.util.Map<String,String> options)
StructType
or ArrayType
of StructType
s
into a JSON string with the specified schema. Throws an exception, in the case of an
unsupported type.
e
- a column containing a struct or array of the structs.options
- options to control how the struct column is converted into a json string.
accepts the same options and the json data source.
public static Column to_json(Column e)
StructType
or ArrayType
of StructType
s into a JSON string
with the specified schema. Throws an exception, in the case of an unsupported type.
e
- a column containing a struct or array of the structs.
public static Column size(Column e)
e
- (undocumented)public static Column sort_array(Column e)
e
- (undocumented)public static Column sort_array(Column e, boolean asc)
e
- (undocumented)asc
- (undocumented)public static <RT> UserDefinedFunction udf(scala.Function0<RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$2)
f
- (undocumented)evidence$2
- (undocumented)public static <RT,A1> UserDefinedFunction udf(scala.Function1<A1,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$3, scala.reflect.api.TypeTags.TypeTag<A1> evidence$4)
f
- (undocumented)evidence$3
- (undocumented)evidence$4
- (undocumented)public static <RT,A1,A2> UserDefinedFunction udf(scala.Function2<A1,A2,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$5, scala.reflect.api.TypeTags.TypeTag<A1> evidence$6, scala.reflect.api.TypeTags.TypeTag<A2> evidence$7)
f
- (undocumented)evidence$5
- (undocumented)evidence$6
- (undocumented)evidence$7
- (undocumented)public static <RT,A1,A2,A3> UserDefinedFunction udf(scala.Function3<A1,A2,A3,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$8, scala.reflect.api.TypeTags.TypeTag<A1> evidence$9, scala.reflect.api.TypeTags.TypeTag<A2> evidence$10, scala.reflect.api.TypeTags.TypeTag<A3> evidence$11)
f
- (undocumented)evidence$8
- (undocumented)evidence$9
- (undocumented)evidence$10
- (undocumented)evidence$11
- (undocumented)public static <RT,A1,A2,A3,A4> UserDefinedFunction udf(scala.Function4<A1,A2,A3,A4,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$12, scala.reflect.api.TypeTags.TypeTag<A1> evidence$13, scala.reflect.api.TypeTags.TypeTag<A2> evidence$14, scala.reflect.api.TypeTags.TypeTag<A3> evidence$15, scala.reflect.api.TypeTags.TypeTag<A4> evidence$16)
f
- (undocumented)evidence$12
- (undocumented)evidence$13
- (undocumented)evidence$14
- (undocumented)evidence$15
- (undocumented)evidence$16
- (undocumented)public static <RT,A1,A2,A3,A4,A5> UserDefinedFunction udf(scala.Function5<A1,A2,A3,A4,A5,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$17, scala.reflect.api.TypeTags.TypeTag<A1> evidence$18, scala.reflect.api.TypeTags.TypeTag<A2> evidence$19, scala.reflect.api.TypeTags.TypeTag<A3> evidence$20, scala.reflect.api.TypeTags.TypeTag<A4> evidence$21, scala.reflect.api.TypeTags.TypeTag<A5> evidence$22)
f
- (undocumented)evidence$17
- (undocumented)evidence$18
- (undocumented)evidence$19
- (undocumented)evidence$20
- (undocumented)evidence$21
- (undocumented)evidence$22
- (undocumented)public static <RT,A1,A2,A3,A4,A5,A6> UserDefinedFunction udf(scala.Function6<A1,A2,A3,A4,A5,A6,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$23, scala.reflect.api.TypeTags.TypeTag<A1> evidence$24, scala.reflect.api.TypeTags.TypeTag<A2> evidence$25, scala.reflect.api.TypeTags.TypeTag<A3> evidence$26, scala.reflect.api.TypeTags.TypeTag<A4> evidence$27, scala.reflect.api.TypeTags.TypeTag<A5> evidence$28, scala.reflect.api.TypeTags.TypeTag<A6> evidence$29)
f
- (undocumented)evidence$23
- (undocumented)evidence$24
- (undocumented)evidence$25
- (undocumented)evidence$26
- (undocumented)evidence$27
- (undocumented)evidence$28
- (undocumented)evidence$29
- (undocumented)public static <RT,A1,A2,A3,A4,A5,A6,A7> UserDefinedFunction udf(scala.Function7<A1,A2,A3,A4,A5,A6,A7,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$30, scala.reflect.api.TypeTags.TypeTag<A1> evidence$31, scala.reflect.api.TypeTags.TypeTag<A2> evidence$32, scala.reflect.api.TypeTags.TypeTag<A3> evidence$33, scala.reflect.api.TypeTags.TypeTag<A4> evidence$34, scala.reflect.api.TypeTags.TypeTag<A5> evidence$35, scala.reflect.api.TypeTags.TypeTag<A6> evidence$36, scala.reflect.api.TypeTags.TypeTag<A7> evidence$37)
f
- (undocumented)evidence$30
- (undocumented)evidence$31
- (undocumented)evidence$32
- (undocumented)evidence$33
- (undocumented)evidence$34
- (undocumented)evidence$35
- (undocumented)evidence$36
- (undocumented)evidence$37
- (undocumented)public static <RT,A1,A2,A3,A4,A5,A6,A7,A8> UserDefinedFunction udf(scala.Function8<A1,A2,A3,A4,A5,A6,A7,A8,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$38, scala.reflect.api.TypeTags.TypeTag<A1> evidence$39, scala.reflect.api.TypeTags.TypeTag<A2> evidence$40, scala.reflect.api.TypeTags.TypeTag<A3> evidence$41, scala.reflect.api.TypeTags.TypeTag<A4> evidence$42, scala.reflect.api.TypeTags.TypeTag<A5> evidence$43, scala.reflect.api.TypeTags.TypeTag<A6> evidence$44, scala.reflect.api.TypeTags.TypeTag<A7> evidence$45, scala.reflect.api.TypeTags.TypeTag<A8> evidence$46)
f
- (undocumented)evidence$38
- (undocumented)evidence$39
- (undocumented)evidence$40
- (undocumented)evidence$41
- (undocumented)evidence$42
- (undocumented)evidence$43
- (undocumented)evidence$44
- (undocumented)evidence$45
- (undocumented)evidence$46
- (undocumented)public static <RT,A1,A2,A3,A4,A5,A6,A7,A8,A9> UserDefinedFunction udf(scala.Function9<A1,A2,A3,A4,A5,A6,A7,A8,A9,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$47, scala.reflect.api.TypeTags.TypeTag<A1> evidence$48, scala.reflect.api.TypeTags.TypeTag<A2> evidence$49, scala.reflect.api.TypeTags.TypeTag<A3> evidence$50, scala.reflect.api.TypeTags.TypeTag<A4> evidence$51, scala.reflect.api.TypeTags.TypeTag<A5> evidence$52, scala.reflect.api.TypeTags.TypeTag<A6> evidence$53, scala.reflect.api.TypeTags.TypeTag<A7> evidence$54, scala.reflect.api.TypeTags.TypeTag<A8> evidence$55, scala.reflect.api.TypeTags.TypeTag<A9> evidence$56)
f
- (undocumented)evidence$47
- (undocumented)evidence$48
- (undocumented)evidence$49
- (undocumented)evidence$50
- (undocumented)evidence$51
- (undocumented)evidence$52
- (undocumented)evidence$53
- (undocumented)evidence$54
- (undocumented)evidence$55
- (undocumented)evidence$56
- (undocumented)public static <RT,A1,A2,A3,A4,A5,A6,A7,A8,A9,A10> UserDefinedFunction udf(scala.Function10<A1,A2,A3,A4,A5,A6,A7,A8,A9,A10,RT> f, scala.reflect.api.TypeTags.TypeTag<RT> evidence$57, scala.reflect.api.TypeTags.TypeTag<A1> evidence$58, scala.reflect.api.TypeTags.TypeTag<A2> evidence$59, scala.reflect.api.TypeTags.TypeTag<A3> evidence$60, scala.reflect.api.TypeTags.TypeTag<A4> evidence$61, scala.reflect.api.TypeTags.TypeTag<A5> evidence$62, scala.reflect.api.TypeTags.TypeTag<A6> evidence$63, scala.reflect.api.TypeTags.TypeTag<A7> evidence$64, scala.reflect.api.TypeTags.TypeTag<A8> evidence$65, scala.reflect.api.TypeTags.TypeTag<A9> evidence$66, scala.reflect.api.TypeTags.TypeTag<A10> evidence$67)
f
- (undocumented)evidence$57
- (undocumented)evidence$58
- (undocumented)evidence$59
- (undocumented)evidence$60
- (undocumented)evidence$61
- (undocumented)evidence$62
- (undocumented)evidence$63
- (undocumented)evidence$64
- (undocumented)evidence$65
- (undocumented)evidence$66
- (undocumented)evidence$67
- (undocumented)public static UserDefinedFunction udf(Object f, DataType dataType)
f
- A closure in ScaladataType
- The output data type of the UDF
public static Column callUDF(String udfName, scala.collection.Seq<Column> cols)
import org.apache.spark.sql._
val df = Seq(("id1", 1), ("id2", 4), ("id3", 5)).toDF("id", "value")
val spark = df.sparkSession
spark.udf.register("simpleUDF", (v: Int) => v * v)
df.select($"id", callUDF("simpleUDF", $"value"))
udfName
- (undocumented)cols
- (undocumented)