Module leapyear.functions

Functions for Attributes.

Datetime functions

Time functions for Attributes.

Missing functions compared to spark: * current_date * current_timestamp * date_format * from_unixtime * from_utc_timestamp * unix_timestamp

leapyear.functions.time.add_months(start_date, num_months)

Return the date that is num_months after start_date.

NOTE: This function can take a datetime as input, but produces a date output regardless.

Examples

>>> str(dt)
'2004-02-29 23:59:59'
>>> add_months(dt, 1)
'2004-03-29'
>>> add_months(dt, 13)
'2005-03-01'
Return type

Attribute

leapyear.functions.time.date_add(start, days)

Return the date that is days days after start.

NOTE: This function can take a datetime as input, but produces a date output regardless.

Examples

>>> str(dt)
'2004-02-28 23:59:59'
>>> date_add(dt, 1)
'2004-02-29'
>>> date_add(dt, 2)
'2004-03-01'
Return type

Attribute

leapyear.functions.time.date_sub(start, days)

Return the date that is days days before start.

NOTE: This function can take a datetime as input, but produces a date output regardless.

Examples

>>> str(dt)
'2004-03-01 23:59:59'
>>> date_sub(dt, 1)
'2004-02-29'
>>> date_sub(dt, 2)
'2004-02-29'
Return type

Attribute

leapyear.functions.time.datediff(end, start)

Return the number of days from start to end.

Examples

1. Create date diff column ‘date_sub’ that is a difference between datetime column ‘col1’ and datetime column ‘col2’ in ds1:

>>> import leapyear.functions as f
>>> ds2 = ds1.with_attributes({'date_sub':f.time.datediff(f.col('col1'),f.col('col2'))})
Return type

Attribute

leapyear.functions.time.dayofmonth(e)

Extract the day of the month as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.dayofyear(e)

Extract the day of the year as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.hour(e)

Extract the hours as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.last_day(e)

Return the last day of the month which the given date belongs to.

NOTE: This function can take a datetime as input, but produces a date output regardless.

Examples

>>> str(dt)
'2004-02-01 23:59:59'
>>> last_day(dt)
'2004-02-29'
Return type

Attribute

leapyear.functions.time.minute(e)

Extract the minutes as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.month(e)

Extract the month as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.months_between(date1, date2)

Return number of months between dates date1 and date2.

The result will be negative if date1 is >=1 month before date2

Examples

>>> str(date1)
'2004-01-01'
>>> str(date2)
'2004-02-01'
>>> months_between(date1, date2)
-1
>>> months_between(date1, date2)
1
Return type

Attribute

leapyear.functions.time.next_day(date, day_of_week)

Return the next date that falls on the specified day of the week.

NOTE: This function can take a datetime as input, but produces a date output regardless.

Examples

>>> str(dt)
'2004-02-23 23:59:59'
>>> next_day(dt, "Sunday")
'2004-02-29'
Return type

Attribute

leapyear.functions.time.quarter(e)

Extract the quarter as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.second(e)

Extract the seconds as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.to_date(e)

Convert the column into Date type.

Examples

  1. Create truncated column ‘date_trunc’ from a datetime column ‘col1’ in ds1:

>>> import leapyear.functions as f
>>> ds2 = ds1.with_attributes({'date_trunc':f.time.to_date(f.col('col1'))})
Return type

Attribute

leapyear.functions.time.to_datetime(ts)

Convert the column into Datetime type.

Return type

Attribute

leapyear.functions.time.trunc(date, fmt)

Return date truncated to the unit specified by the format.

fmt can be one of: “year”, “month”, “day”

NOTE: This function can take a datetime as input, but produces a date output regardless.

Examples

>>> str(dt)
'2004-02-29 23:59:59'
>>> trunc(dt, "month")
'2004-02-01'
Return type

Attribute

leapyear.functions.time.weekofyear(e)

Extract the week number as an integer from a given date/datetime attribute.

Week numbers start at 1, and new weeks start on Monday.

Return type

Attribute

leapyear.functions.time.year(e)

Extract the year as an integer from a given date/datetime attribute.

Return type

Attribute

leapyear.functions.time.dayofweek(e)

Extract the day of the week number as an integer from a given date/datetime attribute, where Monday = 1, …, Sunday = 7.

Return type

Attribute

leapyear.functions.time.parse_clamped_time(e, bounds, *, fmt='yyyy-MM-dd HH:mm:ss')

Parse a Text column and return a DateTime column using the given format clamped to bounds.

Format specification follows: https://docs.oracle.com/javase/tutorial/i18n/format/simpleDateFormat.html

Examples

>>> import datetime.datetime as dt
>>> parse_clamped_time(f.col("date"), bounds=(dt(2000, 1, 1), dt(2020, 12, 31)), fmt="yyyMMdd HHmmss")
Return type

Attribute

Math functions

Math functions for Attributes.

leapyear.functions.math.acos(e)

Compute the cosine inverse of the given value.

The returned angle is in the range 0 through \(\pi\).

Return type

Attribute

leapyear.functions.math.asin(e)

Compute the sine inverse of the given value.

The returned angle is in the range -pi/2 through pi/2.

Return type

Attribute

leapyear.functions.math.atan(e)

Compute the tangent inverse of the given value.

Return type

Attribute

leapyear.functions.math.cbrt(e)

Compute the cube-root of the given value.

Return type

Attribute

leapyear.functions.math.ceil(e)

Compute the ceiling of the given value.

Return type

Attribute

leapyear.functions.math.cos(e)

Compute the cosine of the given value.

Return type

Attribute

leapyear.functions.math.cosh(e)

Compute the hyperbolic cosine of the given value.

Return type

Attribute

leapyear.functions.math.degrees(e)

Convert an angle measured in radians to an equivalent angle measured in degrees.

Return type

Attribute

leapyear.functions.math.exp(e)

Compute the exponential of the given value.

Return type

Attribute

leapyear.functions.math.expm1(e)

Compute the exponential of the given value minus one.

Return type

Attribute

leapyear.functions.math.floor(e)

Compute the floor of the given value.

Return type

Attribute

leapyear.functions.math.hypot(x, y)

Compute sqrt(x^2 + y^2) without intermediate overflow or underflow.

Return type

Attribute

leapyear.functions.math.log(e)

Compute the natural logarithm of the given value.

Return type

Attribute

leapyear.functions.math.log10(e)

Compute the logarithm of the given value in base 10.

Return type

Attribute

leapyear.functions.math.log1p(e)

Compute the natural logarithm of the given value plus one.

Return type

Attribute

leapyear.functions.math.log2(e)

Compute the logarithm of the given column in base 2.

Return type

Attribute

leapyear.functions.math.sigmoid(e)

Compute the sigmoid of the given value.

Return type

Attribute

leapyear.functions.math.pow(l, r)

Return the value of the first argument raised to the power of the second argument.

Return type

Attribute

leapyear.functions.math.radians(e)

Convert an angle measured in degrees to an equivalent angle measured in radians.

Return type

Attribute

leapyear.functions.math.round(e, scale=0)

Round the value of e to scale decimal places.

Examples

1. Create a new attribute ‘col1_trn’ which is derived by rounding ‘col1’ to 2 digits:

>>> import leapyear.functions as f
>>> ds2 = ds1.with_attributes({'col1_trn': f.math.round(f.col('col1'),2)})
Return type

Attribute

leapyear.functions.math.signum(e)

Compute the signum of the given value.

Return type

Attribute

leapyear.functions.math.sin(e)

Compute the sine of the given value.

Return type

Attribute

leapyear.functions.math.sinh(e)

Compute the hyperbolic sine of the given value.

Return type

Attribute

leapyear.functions.math.sqrt(e)

Compute the square root of the specified float value.

Return type

Attribute

leapyear.functions.math.tan(e)

Compute the tangent of the given value.

Return type

Attribute

leapyear.functions.math.tanh(e)

Compute the hyperbolic tangent of the given value.

Return type

Attribute

leapyear.functions.math.erf(e)

Compute the error function on the given value.

Return type

Attribute

leapyear.functions.math.erfc(e)

Compute the error function on the given value.

Return type

Attribute

leapyear.functions.math.inverf(e)

Compute the error function on the given value.

Return type

Attribute

leapyear.functions.math.inverfc(e)

Compute the error function on the given value.

Return type

Attribute

Non-aggregate functions

Non-aggregate functions for Attributes.

leapyear.functions.non_aggregate.all(*exprs)

Return True if all columns are True.

Return type

Attribute

leapyear.functions.non_aggregate.attr_in(input_attr_name, in_list)

Expression for filtering rows based on attribute values that are IN a list of values.

Parameters
  • input_attr_name (String) – Name of Attribute based on which to filter

  • in_list (Iterable) – List values for filtering

Returns

Boolean Attribute

Return type

Attribute

Examples

  1. Filtering based on an attribute taking a value from a list of values:

>>> import leapyear.functions as f
>>> ds2 = ds.where(f.attr_in('col1',in_list=[val1,val2,val3]))
leapyear.functions.non_aggregate.attr_not_in(input_attr_name, not_in_list)

Expression for filtering rows based on attribute values that are IN a list of values.

Parameters
  • input_attr_name (str) – Name of Attribute based on which to filter

  • not_in_list (Iterable[+T_co]) – List values for filtering

Returns

Boolean Attribute

Return type

Attribute

Examples

  1. Filtering based on an attribute NOT taking a value from a list of values:

>>> import leapyear.functions as f
>>> ds2 = ds.where(f.attr_not_in('col1',not_in_list=[val1,val2,val3]))
leapyear.functions.non_aggregate.any(*exprs)

Return True if any column is True.

Return type

Attribute

leapyear.functions.non_aggregate.abs(e)

Compute the absolute value.

Return type

Attribute

leapyear.functions.non_aggregate.col(col_name)

Return an Attribute based on the given name.

Return type

Attribute

leapyear.functions.non_aggregate.column(col_name)

Return an Attribute based on the given name.

Return type

Attribute

leapyear.functions.non_aggregate.greatest(*exprs)

Return the greatest value of the list of column names, skipping null values.

Return type

Attribute

leapyear.functions.non_aggregate.isnull(e)

Return true iff the column is null.

Return type

Attribute

leapyear.functions.non_aggregate.least(*exprs)

Return the greatest value of the list of column names, skipping null values.

Return type

Attribute

leapyear.functions.non_aggregate.lit(literal)

Create an Attribute of literal value.

Return type

Attribute

leapyear.functions.non_aggregate.negate(e)

Unary minus.

Return type

Attribute

leapyear.functions.non_aggregate.not_(e)

Inversion of boolean expression.

Return type

Attribute

leapyear.functions.non_aggregate.when(condition, value)

Evaluate a list of conditions and returns one of multiple possible result expressions.

If otherwise is not defined at the end, null is returned for unmatched conditions.

Example

Encoding gender string column into an integer.

>>> df.select(when(col("gender") == "male", 0)
...          .when(col("gender") == "female", 1)
...          .otherwise(2))
Return type

Attribute

leapyear.functions.non_aggregate.to_text(e)

Convert an attribute to a string representation.

Return type

Attribute

String functions

Math functions for Attributes.

leapyear.functions.string.ascii(e)

Convert the first character of the string to its ASCII value.

Return type

Attribute

leapyear.functions.string.concat(*exprs, sep=None)

Concatenation of strings with optional separator.

Concatenating factors {‘a’, ‘b’} and {‘c’, ‘d’} gives {‘ac’, ‘ad’, ‘bc’, ‘cd’}. Including the separator ” ” gives {‘a c’, ‘a d’, ‘b c’, ‘c d’}.

If any of the expressions are type TEXT, then the result will be TEXT.

Examples

1. Create new column ‘col1_trn’ which concatenation of ‘col1’ and ‘col2’ with separator ‘-’. It can be used to contruct a date column from day and month columns:

>>> import leapyear.functions as f
>>> ds2 = ds1.with_attributes({'col1_trn':f.concat(ds1['col1'],ds1['col2'],sep='-')})
Return type

Attribute

leapyear.functions.string.instr(attr, substr)

Give the position of substring in the Attribute, otherwise 0.

Return type

Attribute

leapyear.functions.string.length(attr)

Return the length of the string.

Return type

Attribute

leapyear.functions.string.levenshtein(attr1, attr2)

Compute the Levenshtein distance between strings.

Return type

Attribute

leapyear.functions.string.locate(attr, substr, pos=None)

Give the position of substr in the Attribute, optionally after pos.

Position is returned as a positive integer starting at 1 if the substring is found, and 0 if the substring is not found.

leapyear.functions.string.lpad(attr, len_, pad)

Pad the string on the left with pad to make the total length len_.

When pad is an empty string, the string is returned without modification, or truncated to len_.

Return type

Attribute

leapyear.functions.string.ltrim(attr)

Remove whitespace characters on the beginning of the string.

Return type

Attribute

leapyear.functions.string.reverse(attr)

Reverse the string.

Return type

Attribute

leapyear.functions.string.repeat(attr, n)

Repeat the string n times.

Return type

Attribute

leapyear.functions.string.rpad(attr, len_, pad)

Pad the string on the right with pad to make the total length len_.

When pad is an empty string, the string is returned without modification, or truncated to len_.

Return type

Attribute

leapyear.functions.string.rtrim(attr)

Remove whitespace characters at the end of the string.

Return type

Attribute

leapyear.functions.string.soundex(attr)

Return the soundex code for the specified expression.

Return type

Attribute

leapyear.functions.string.substring(attr, start, len_)

Return the substring of length len_ starting at start.

Return type

Attribute

leapyear.functions.string.substring_index(attr, delim, len_)

Return the substring from string str before count occurrences of the delimiter delim.

If count is positive, everything the left of the final delimiter (counting from left) is returned. If count is negative, every to the right of the final delimiter (counting from the right) is returned. substring_index performs a case-sensitive match when searching for delim.

Return type

Attribute

leapyear.functions.string.translate(attr, match, replace)

Translate any character in the src by a character in replace.

The characters in replace correspond to the characters in matching. The translate will happen when any character in the string matches the character in the match.

Return type

Attribute

leapyear.functions.string.trim(attr)

Remove the whitespace from the beginning and end of the string.

Return type

Attribute

leapyear.functions.string.lex_lt(attr1, attr2)

Lexicographical less-than operation.

Return type

Attribute

leapyear.functions.string.lex_lte(attr1, attr2)

Lexicographical less-than-or-equal operation.

Return type

Attribute

leapyear.functions.string.lex_gt(attr1, attr2)

Lexicographical greater-than operation.

Return type

Attribute

leapyear.functions.string.lex_gte(attr1, attr2)

Lexicographical greater-than-or-equal operation.

Return type

Attribute

leapyear.functions.string.remove_accents(attr)

Remove all accents from the string.

Return type

Attribute

leapyear.functions.string.lower(attr)

Convert strings to lowercase.

Return type

Attribute

leapyear.functions.string.upper(attr)

Convert strings to uppercase.

Return type

Attribute

leapyear.functions.string.regex_extract(attr, pattern, group_idx)

Use a regular expression pattern to extract a part of the string.

This function always results in a nullable Text type.

Return type

Attribute

leapyear.functions.string.regex_replace(attr, pattern, replace)

Use a regular expression pattern to replace a part of the string.

This function always results in a nullable Text type.

Return type

Attribute

Windowing functions

Window functions for Attributes.

leapyear.functions.window.lead(e, i)

Compute the lead value.

Return type

WindowAttribute

leapyear.functions.window.lag(e, i)

Compute the lag value.

Return type

WindowAttribute

leapyear.functions.window.first(e)

Compute the first non-null value.

Return type

WindowAttribute

leapyear.functions.window.last(e)

Compute the last non-null value.

Return type

WindowAttribute

leapyear.functions.window.count(e=None)

Compute the number of non-null entries.

Return type

WindowAttribute

leapyear.functions.window.approx_count_distinct(attrs)

Compute the number of distinct entries (using an approximate streaming algorithm).

Return type

WindowAttribute

leapyear.functions.window.mean(e)

Compute the mean value.

Examples

1. Create a window specification that partitions based on ‘col1’ and orders by a ‘date_col’ and picks past 14 rows from current row 0. Then, we can compute mean of ‘col2’ over this window:

>>> import leapyear.functions as f
>>> from leapyear.dataset import Window
>>> ws1 = Window.partition_by(f.col('col1'))
                .order_by(f.col('date_col').asc())
                .rows_between(start=-14,end=0)
>>> ds2 = ds1.project(['date_col','col1','col2'])
             .with_attribute('mean_col2', f.window.mean(f.col('col2')).over(ws1))
Return type

WindowAttribute

leapyear.functions.window.avg(e)

Compute the mean value.

Examples

1. Create a window specification that partitions based on ‘col1’ and orders by a ‘date_col’ and picks past 14 rows from current row 0. Then, we can compute mean of ‘col2’ over this window:

>>> import leapyear.functions as f
>>> from leapyear.dataset import Window
>>> ws1 = Window.partition_by(f.col('col1'))
                .order_by(f.col('date_col').asc())
                .rows_between(start=-14,end=0)
>>> ds2 = ds1.project(['date_col','col1','col2'])
             .with_attribute('mean_col2', f.window.mean(f.col('col2')).over(ws1))
Return type

WindowAttribute

leapyear.functions.window.sum(e)

Compute the sum.

Return type

WindowAttribute

leapyear.functions.window.or_(e)

Compute the or of boolean values.

Return type

WindowAttribute

leapyear.functions.window.and_(e)

Compute the and of boolean values.

Return type

WindowAttribute

leapyear.functions.window.min(e)

Compute the min value.

Return type

WindowAttribute

leapyear.functions.window.max(e)

Compute the max value.

Return type

WindowAttribute

leapyear.functions.window.stddev(e)

Compute the sample standard deviation.

Return type

WindowAttribute

leapyear.functions.window.stddev_samp(e)

Compute the sample standard deviation.

Return type

WindowAttribute

leapyear.functions.window.stddev_pop(e)

Compute the population standard deviation.

Return type

WindowAttribute

leapyear.functions.window.variance(e)

Compute the sample variance.

Return type

WindowAttribute

leapyear.functions.window.variance_samp(e)

Compute the sample variance.

Return type

WindowAttribute

leapyear.functions.window.variance_pop(e)

Compute the population variance.

Return type

WindowAttribute

leapyear.functions.window.skewness(e)

Compute the skewness.

Return type

WindowAttribute

leapyear.functions.window.kurtosis(e)

Compute the kurtosis.

Return type

WindowAttribute

leapyear.functions.window.covar_samp(e1, e2)

Compute the sample covariance.

Return type

WindowAttribute

leapyear.functions.window.covar_pop(e1, e2)

Compute the population covariance.

Return type

WindowAttribute

leapyear.functions.window.corr(e1, e2)

Compute the correlation.

Return type

WindowAttribute