Module leapyear.ext

Quick client

leapyear.ext.user.client(config_file=PosixPath('~/.leapyear_client.ini'), debug=False, **kwargs)

Use environment variables or a configuration file to quickly connect to LeapYear.

This function combines values found in keyword arguments, environment variables, a config file, and built-in defaults to try to establish a connection to a LeapYear server. In order of precedence, values are taken from the keyword arguments of this function, then from environment variables, then from the config file, and finally from the default values, if they exist.

By default, the config file is ~/.leapyear_client.ini. The config file must contain a [leapyear.io] section that holds the values.
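
The precedence order can be illustrated with a minimal sketch using only the standard library. The helper resolve_settings is hypothetical and is not part of leapyear.ext; it only mimics the described lookup order (keyword arguments, then environment variables, then the [leapyear.io] section, then defaults) for three of the documented keys, and the LY_ prefix mapping is inferred from the documented variable names.

```python
import configparser
from collections import ChainMap

# Built-in defaults for a subset of the documented keys.
DEFAULTS = {
    "url": "http://localhost:4408",
    "username": None,
    "password": None,
}


def resolve_settings(ini_text, environ, **kwargs):
    """Hypothetical helper: merge sources from highest to lowest precedence."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    from_file = {}
    if parser.has_section("leapyear.io"):
        from_file = dict(parser["leapyear.io"])
    # Map LY_URL -> url, LY_USERNAME -> username, and so on.
    from_env = {
        key: environ["LY_" + key.upper()]
        for key in DEFAULTS
        if "LY_" + key.upper() in environ
    }
    # ChainMap returns the value from the first mapping that has the key.
    return dict(ChainMap(kwargs, from_env, from_file, DEFAULTS))


ini_text = """
[leapyear.io]
username = alice
url = http://leapyear-core.domain.com:4408
"""
settings = resolve_settings(
    ini_text,
    environ={"LY_USERNAME": "bob"},  # stand-in for os.environ
    password="example-password",     # keyword argument wins over everything
)
print(settings)
```

In this sketch the username comes from the environment (overriding the file), the url comes from the file (overriding the default), and the password comes from the keyword argument.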

The following table lists the name of each value under each method of specifying it, along with the default that is used if no other value can be determined.

environment variable                     ini key/keyword argument              default value
---------------------------------------  ------------------------------------  -----------------------
LY_URL                                   url                                   'http://localhost:4408'
LY_USERNAME                              username                              None
LY_PASSWORD                              password                              None
LY_DEFAULT_ANALYSIS_CACHING              default_analysis_caching              True
LY_DEFAULT_ALLOW_MAX_BUDGET_ALLOCATION   default_allow_max_budget_allocation   True
LY_LOGGING_LEVEL                         logging_level                         'NOTSET'

At least username and password must be supplied to establish a connection to the LeapYear server.

logging_level should be the name of a logging level defined in Python's logging module, such as 'DEBUG' or 'INFO'.
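
As a rough illustration of how a level name relates to the numeric levels in the logging module, the sketch below converts a name to its number. The helper level_from_name is hypothetical and only shows the standard-library mapping; it is not a description of how the client parses the value.

```python
import logging


def level_from_name(name):
    """Hypothetical helper: map a level name such as 'DEBUG' to its number."""
    value = getattr(logging, name.upper(), None)
    if not isinstance(value, int):
        raise ValueError(f"unknown logging level: {name!r}")
    return value


print(level_from_name("NOTSET"))  # 0
print(level_from_name("DEBUG"))   # 10
```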

Example

Contents of ~/.leapyear_client.ini are

[leapyear.io]
username = alice
password = lihjAgsd324$
url = http://leapyear-core.domain.com:4408

Next, we execute a basic test with debug=True to see the values that are passed to the Client constructor.

>>> from leapyear.ext.user import client
>>> import logging
>>> logging.basicConfig()
>>> c = client(debug=True)
DEBUG:leapyear.ext.user:Found config file with [leapyear.io] section.
DEBUG:leapyear.ext.user:Resolved the following values:
DEBUG:leapyear.ext.user:  url <- <str: 'http://leapyear-core.domain.com:4408'>
DEBUG:leapyear.ext.user:  username <- <str: 'alice'>
DEBUG:leapyear.ext.user:  password <- <str: 'lihjAgsd324$'>
DEBUG:leapyear.ext.user:  default_analysis_caching <- <bool: True>
DEBUG:leapyear.ext.user:  default_allow_max_budget_allocation <- <bool: True>
DEBUG:leapyear.ext.user:  logging_level <- <int: 0>
>>> print(c.connected)
True
>>> c.close()

Parameters
  • config_file (pathlib.Path) – Specify an alternate config file.

  • debug (bool) – Set to True to enable extra debugging information. Calling logging.basicConfig() beforehand may be useful so that the debugging information is printed; by default it goes to the standard error stream rather than stdout.

Return type

Client

Pandas utilities

These are utilities that allow the manipulation of pandas DataFrames or external data sets.

leapyear.ext.pandas.left_join_df(ly_dataset, pandas_df, ly_key_column_name, pandas_key_column_name, pandas_value_column_names, output_value_column_names=None)

Left join a LeapYear DataSet with a pandas DataFrame.

This function allows a user to perform a left join of a LeapYear DataSet with a pandas DataFrame. See the Example for more information.

Note

Currently, only a single column can be used as the key in the join.

Parameters
  • ly_dataset (DataSet) – The input LeapYear DataSet (left).

  • pandas_df (pd.DataFrame) – The pandas DataFrame to join with (right).

  • ly_key_column_name (str) – The name of the column with the join key in the LeapYear DataSet.

  • pandas_key_column_name (str) – The name of the column with the join key in the pandas DataFrame.

  • pandas_value_column_names (List[str]) – A list of the names of the columns in the pandas DataFrame with the values to join.

  • output_value_column_names (Optional[List[str]]) – A list of the names of the columns in the output DataSet that contain the joined input values. This must have the same length as pandas_value_column_names, if provided. If not provided, the column names default to pandas_value_column_names.

Returns

A LeapYear DataSet with new columns, named according to output_value_column_names, containing the values obtained by joining the ly_key_column_name column to the pandas_key_column_name column.

Return type

DataSet

Example

Suppose that we have a table with a Sex column containing the values male and female:

Sex     Age
------  ---
male    22
female  38
female  26
female  35
male    35

We’d like to encode the abbreviations (coming from the first letter) as a new column, coming from a pandas DataFrame of the following form:

Sex     first_letter
------  ------------
male    m
female  f

To do so, we call this function, assuming the LeapYear DataSet is called ds, and we wish to call the new column sex_first_letter:

new_ds = left_join_df(ly_dataset=ds,
                      pandas_df=encoding_df,
                      pandas_key_column_name='Sex',
                      pandas_value_column_names=['first_letter'],
                      ly_key_column_name='Sex',
                      output_value_column_names=['sex_first_letter'])

This will produce a DataSet which looks like:

Sex     Age  sex_first_letter
------  ---  ----------------
male    22   m
female  38   f
female  26   f
female  35   f
male    35   m
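
The example above assumes that encoding_df already exists. The following sketch shows one way it could be built, and reproduces the expected result locally in plain pandas to illustrate the left-join semantics. The ds_like DataFrame is only a local stand-in for the LeapYear DataSet, and the merge call is an analogy, not a description of how left_join_df is implemented.

```python
import pandas as pd

# Local stand-in for the LeapYear DataSet (values from the example table).
ds_like = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
})

# The lookup table that the example passes as pandas_df.
encoding_df = pd.DataFrame({
    "Sex": ["male", "female"],
    "first_letter": ["m", "f"],
})

# Left join on the key column, then rename the joined value column.
joined = ds_like.merge(
    encoding_df[["Sex", "first_letter"]],
    how="left",
    on="Sex",
).rename(columns={"first_letter": "sex_first_letter"})
print(joined)
```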

leapyear.ext.pandas.left_join_csv_with_header(ly_dataset, csv_location, csv_key_column_name, csv_value_column_names, ly_key_column_name, output_value_column_names=None, **kwargs)

Left join a LeapYear DataSet with a CSV that has a header row.

This function allows a user to perform a left join of a LeapYear DataSet with a CSV that has a header row. The user must specify names for the CSV columns that contain the join key and values. The key column must have unique values.

Note

Currently, only a single column can be used as the key in the join.

Parameters
  • ly_dataset (DataSet) – The LeapYear DataSet to join on.

  • csv_location (str) – The path to the CSV.

  • csv_key_column_name (str) – The name of the key column in the CSV.

  • csv_value_column_names (List[str]) – A list of the names of the value columns in the CSV.

  • ly_key_column_name (str) – The column name to join on.

  • output_value_column_names (Optional[List[str]]) – A list of the names of the columns in the output DataSet that contain the joined input values. This must have the same length as csv_value_column_names, if provided. If not provided, the column names default to csv_value_column_names.

  • kwargs (Any) – The kwargs are passed to the pandas read_csv function.

Returns

A LeapYear DataSet with new columns, named according to output_value_column_names, containing the values obtained by joining the ly_key_column_name column to the csv_key_column_name column.

Return type

DataSet

Example

Suppose that we have a table with a Sex column containing the values male and female:

Sex     Age
------  ---
male    22
female  38
female  26
female  35
male    35

We’d like to encode the abbreviations (coming from the first letter) as a new column, coming from a CSV (with a header) of the following form:

Sex     first_letter
------  ------------
male    m
female  f

To do so, we call this function, assuming that the CSV is found at csv_location, the LeapYear DataSet is called ds, and we wish to call the new column sex_first_letter:

new_ds = left_join_csv_with_header(ly_dataset=ds,
                                   csv_location=csv_location,
                                   csv_key_column_name='Sex',
                                   csv_value_column_names=['first_letter'],
                                   ly_key_column_name='Sex',
                                   output_value_column_names=['sex_first_letter'])

This will produce a DataSet which looks like:

Sex     Age  sex_first_letter
------  ---  ----------------
male    22   m
female  38   f
female  26   f
female  35   f
male    35   m
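
Because the key column of the CSV must have unique values, it can help to check the file before joining. The sketch below is one possible pre-check in plain pandas; key_is_unique is a hypothetical helper, and the in-memory text stands in for the file at csv_location.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file at csv_location.
csv_text = "Sex,first_letter\nmale,m\nfemale,f\n"
lookup = pd.read_csv(io.StringIO(csv_text))


def key_is_unique(df, key_column):
    """Hypothetical pre-check: True when no key value repeats."""
    return bool(df[key_column].is_unique)


print(key_is_unique(lookup, "Sex"))

# Appending a copy of the first row introduces a repeated key.
duplicated = pd.concat([lookup, lookup.iloc[[0]]], ignore_index=True)
print(key_is_unique(duplicated, "Sex"))
```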

leapyear.ext.pandas.left_join_csv_no_header(ly_dataset, csv_location, csv_key_column_index, csv_value_column_indices, ly_key_column_name, output_value_column_names=None, **kwargs)

Left join a LeapYear DataSet with a CSV that has no header row.

This function allows a user to perform a left join of a LeapYear DataSet with a CSV that has no header row. The user must specify the index (column number) of the CSV column that contains the join key and the indices of the columns that contain the values. The key column must have unique values.

Note

Currently, only a single column can be used as the key in the join.

Parameters
  • ly_dataset (DataSet) – The LeapYear DataSet to join on.

  • csv_location (str) – The path to the CSV.

  • csv_key_column_index (int) – The index of the key column in the CSV (0-indexed).

  • csv_value_column_indices (List[int]) – A list of the indices of the value columns in the CSV (0-indexed).

  • ly_key_column_name (str) – The column name to join on.

  • output_value_column_names (Optional[List[str]]) – The desired names for the value columns in the output DataSet. This must have the same length as csv_value_column_indices, if provided. If not provided, each column name defaults to the index of the corresponding column.

  • kwargs (Any) – The kwargs are passed to the pandas read_csv function.

Returns

A LeapYear DataSet with new columns, named according to output_value_column_names, containing the values obtained by joining the ly_key_column_name column to the CSV column in position csv_key_column_index.

Return type

DataSet

Example

Suppose that we have a table with a Sex column containing the values male and female:

Sex     Age
------  ---
male    22
female  38
female  26
female  35
male    35

We’d like to encode the abbreviations (coming from the first letter) as a new column, coming from a CSV (with no header) of the following form:

male    m
female  f

To do so, we call this function, assuming that the CSV is found at csv_location, the LeapYear DataSet is called ds, and we wish to call the new column sex_first_letter:

new_ds = left_join_csv_no_header(ly_dataset=ds,
                                 csv_location=csv_location,
                                 csv_key_column_index=0,
                                 csv_value_column_indices=[1],
                                 ly_key_column_name='Sex',
                                 output_value_column_names=['sex_first_letter'])

This will produce a DataSet which looks like:

Sex     Age  sex_first_letter
------  ---  ----------------
male    22   m
female  38   f
female  26   f
female  35   f
male    35   m
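
To see why 0-based indices identify the columns of a header-less CSV, the sketch below reads such a file with pandas and looks values up by position. It only illustrates how pandas labels columns when header=None is given; whether left_join_csv_no_header reads the file this way internally is an assumption.

```python
import io

import pandas as pd

# In-memory stand-in for the header-less CSV at csv_location.
csv_text = "male,m\nfemale,f\n"

# header=None tells pandas the first row is data; columns are labelled 0, 1, ...
lookup = pd.read_csv(io.StringIO(csv_text), header=None)

key_index, value_indices = 0, [1]
# Build a key -> value mapping from the chosen column positions.
mapping = lookup.set_index(key_index)[value_indices[0]].to_dict()

print(list(lookup.columns))
print(mapping)
```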