Module leapyear.ext
Quick client
leapyear.ext.user.client(config_file=PosixPath('~/.leapyear_client.ini'), debug=False, **kwargs)
Use environment variables or a configuration file to quickly connect to LeapYear.
This function uses values found in environment variables, a config file, keyword arguments, and built-in defaults to try to establish a connection to a LeapYear server. In order of precedence, values are taken from the kwargs of this function, then environment variables, then the config file, and finally default values, if they exist.
By default, the config file is ~/.leapyear_client.ini. The config file requires a [leapyear.io] section where values can be found.
The following table lists the name of each value under each method, and the default value used if no other value can be determined.
environment variable                    ini key/keyword argument             default value
LY_URL                                  url                                  'http://localhost:4401'
LY_USERNAME                             username                             None
LY_PASSWORD                             password                             None
LY_DEFAULT_ANALYSIS_CACHING             default_analysis_caching             True
LY_DEFAULT_ALLOW_MAX_BUDGET_ALLOCATION  default_allow_max_budget_allocation  True
LY_LOGGING_LEVEL                        logging_level                        'NOTSET'
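The precedence order described above (kwargs, then environment variables, then the config file, then defaults) can be sketched as follows. This is an illustration only, not the library's implementation: resolve_setting, DEFAULTS, fake_env, and the example URLs are hypothetical names and values introduced here, and only three of the table's keys are covered.

```python
import configparser

# Hypothetical defaults table; mirrors three rows of the table above.
DEFAULTS = {
    "url": "http://localhost:4401",
    "username": None,
    "password": None,
}


def resolve_setting(key, kwargs, environ, ini_section):
    """Return the first value found: kwargs, environment, ini section, default."""
    if key in kwargs:
        return kwargs[key]
    env_name = "LY_" + key.upper()  # e.g. url -> LY_URL
    if env_name in environ:
        return environ[env_name]
    if key in ini_section:
        return ini_section[key]
    return DEFAULTS.get(key)


# Parse an in-memory config laid out like the [leapyear.io] section.
parser = configparser.ConfigParser()
parser.read_string(
    "[leapyear.io]\n"
    "username = alice\n"
    "url = http://ini.example:4401\n"
)
section = parser["leapyear.io"]

fake_env = {"LY_URL": "http://env.example:4401"}  # stand-in for os.environ
call_kwargs = {"password": "from-kwargs"}         # stand-in for **kwargs

resolved = {
    key: resolve_setting(key, call_kwargs, fake_env, section)
    for key in DEFAULTS
}
print(resolved)
```

In this sketch the url comes from the environment (which outranks the ini file), the username from the ini file, and the password from the keyword arguments.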
At least username and password must be supplied to establish a connection to the LeapYear server. logging_level should be the name of a logging level in the logging module.
Example
Contents of ~/.leapyear_client.ini are:
    [leapyear.io]
    username = alice
    password = lihjAgsd324$
    url = http://api.leapyear.domain.com:4401
Next, we execute a basic test with debug=True to see the values that are passed to the Client constructor.
    >>> from leapyear.ext.user import client
    >>> import logging
    >>> logging.basicConfig()
    >>> c = client(debug=True)
    DEBUG:leapyear.ext.user:Found config file with [leapyear.io] section.
    DEBUG:leapyear.ext.user:Resolved the following values:
    DEBUG:leapyear.ext.user: url <- <str: 'http://api.leapyear.domain.com:4401'>
    DEBUG:leapyear.ext.user: username <- <str: 'alice'>
    DEBUG:leapyear.ext.user: password <- <str: 'lihjAgsd324$'>
    DEBUG:leapyear.ext.user: default_analysis_caching <- <bool: True>
    DEBUG:leapyear.ext.user: default_allow_max_budget_allocationt <- <bool: True>
    DEBUG:leapyear.ext.user: logging_level <- <int: 0>
    >>> print(c.connected)
    True
    >>> c.close()
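The debug output above shows the logging_level name 'NOTSET' resolved to the integer 0. How the library performs that conversion is not documented here; the following is one plausible way to map a level name to its numeric value using only the standard logging module. The helper level_from_name is a hypothetical name introduced for this sketch.

```python
import logging


def level_from_name(name):
    """Map a level name such as 'DEBUG' to its numeric value in logging."""
    value = getattr(logging, name.upper(), None)
    if not isinstance(value, int):
        raise ValueError(f"{name!r} is not a logging level name")
    return value


print(level_from_name("NOTSET"))  # 0
print(level_from_name("DEBUG"))   # 10
```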
- Parameters
config_file (pathlib.Path) – Specify an alternate config file.
debug (bool) – Set to True to enable extra debugging information. It may be useful to run logging.basicConfig() first so that the debugging information is printed (to stderr by default).
- Return type
Pandas utilities
These are utilities that allow the manipulation of pandas DataFrames or external data sets.
leapyear.ext.pandas.left_join_df(ly_dataset, pandas_df, ly_key_column_name, pandas_key_column_name, pandas_value_column_names, output_value_column_names=None)
Left join a LeapYear DataSet with a pandas DataFrame.
This function allows a user to perform a left join of a LeapYear DataSet with a pandas DataFrame. See the Example for more information.
Note
Currently, only a single column can be used as the key in the join.
- Parameters
ly_dataset (DataSet) – The input LeapYear DataSet (left).
pandas_df (pd.DataFrame) – The pandas DataFrame to join with (right).
ly_key_column_name (str) – The name of the column with the join key in the LeapYear DataSet.
pandas_key_column_name (str) – The name of the column with the join key in the pandas DataFrame.
pandas_value_column_names (List[str]) – A list of the names of the columns in the pandas DataFrame with the values to join.
output_value_column_names (Optional[List[str]]) – A list of the names of the columns in the output DataSet that contain the joined input values. This must have the same length as pandas_value_column_names, if provided. If not provided, the column names default to pandas_value_column_names.
- Returns
A LeapYear DataSet with new columns, named by output_value_column_names, containing the result of joining the ly_key_column_name column to the pandas_key_column_name column.
- Return type
Example
Suppose that we have a table with a Sex column containing the values male and female:
    Sex     Age
    male    22
    female  38
    female  26
    female  35
    male    35
We’d like to encode the abbreviations (coming from the first letter) as a new column, coming from a pandas DataFrame of the following form:
    Sex     first_letter
    male    m
    female  f
To do so, we call this function, assuming the LeapYear DataSet is called ds, the pandas DataFrame is called encoding_df, and we wish to call the new column sex_first_letter:
    new_ds = left_join_df(ly_dataset=ds,
                          pandas_df=encoding_df,
                          pandas_key_column_name='Sex',
                          pandas_value_column_names=['first_letter'],
                          ly_key_column_name='Sex',
                          output_value_column_names=['sex_first_letter'])
This will produce a DataSet which looks like:
    Sex     Age  sex_first_letter
    male    22   m
    female  38   f
    female  26   f
    female  35   f
    male    35   m
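To check the expected result without a LeapYear server, the same left join can be reproduced on two ordinary pandas DataFrames. This is an analogy built on pandas alone, not the implementation of left_join_df: the variable left is a plain DataFrame standing in for the LeapYear DataSet ds, and right and joined are names introduced for this sketch.

```python
import pandas as pd

# Plain pandas stand-in for the LeapYear DataSet `ds`.
left = pd.DataFrame({
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
})

# The lookup table from the example above.
encoding_df = pd.DataFrame({
    "Sex": ["male", "female"],
    "first_letter": ["m", "f"],
})

# Keep the key and value columns, then rename the value column
# to the desired output name before joining.
right = encoding_df[["Sex", "first_letter"]].rename(
    columns={"first_letter": "sex_first_letter"}
)

# A left join keeps every row of `left`, in its original order.
joined = left.merge(right, how="left", on="Sex")
print(joined)
```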
leapyear.ext.pandas.left_join_csv_with_header(ly_dataset, csv_location, csv_key_column_name, csv_value_column_names, ly_key_column_name, output_value_column_names=None, **kwargs)
Left join a LeapYear DataSet with a CSV that has a header row.
This function allows a user to perform a left join of a LeapYear DataSet with a CSV that has a header row. The user must specify names for the CSV columns that contain the join key and values. The key column must have unique values.
Note
Currently, only a single column can be used as the key in the join.
- Parameters
ly_dataset (DataSet) – The LeapYear DataSet to join on.
csv_location (str) – The path to the CSV.
csv_key_column_name (str) – The name of the key column in the CSV.
csv_value_column_names (List[str]) – A list of the names of the value columns in the CSV.
ly_key_column_name (str) – The column name to join on.
output_value_column_names (Optional[List[str]]) – A list of the names of the columns in the output DataSet that contain the joined input values. This must have the same length as csv_value_column_names, if provided. If not provided, the column names default to csv_value_column_names.
kwargs (Any) – The kwargs are passed to the pandas read_csv function.
- Returns
A LeapYear DataSet with new columns, named by output_value_column_names, containing the result of joining the ly_key_column_name column to the csv_key_column_name column.
- Return type
Example
Suppose that we have a table with a Sex column containing the values male and female:
    Sex     Age
    male    22
    female  38
    female  26
    female  35
    male    35
We’d like to encode the abbreviations (coming from the first letter) as a new column, coming from a CSV (with a header) of the following form:
    Sex     first_letter
    male    m
    female  f
To do so, we call this function, assuming that the CSV is found at csv_location, the LeapYear DataSet is called ds, and we wish to call the new column sex_first_letter:
    new_ds = left_join_csv_with_header(ly_dataset=ds,
                                       csv_location=csv_location,
                                       csv_key_column_name='Sex',
                                       csv_value_column_names=['first_letter'],
                                       ly_key_column_name='Sex',
                                       output_value_column_names=['sex_first_letter'])
This will produce a DataSet which looks like:
    Sex     Age  sex_first_letter
    male    22   m
    female  38   f
    female  26   f
    female  35   f
    male    35   m
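Because the key column of the CSV must have unique values, it can help to check the file with pandas before attempting the join. The sketch below reads an in-memory copy of the example CSV with pandas read_csv and tests the key column for duplicates. The names csv_text, lookup, and dup are introduced here for illustration, and whether the library performs a similar check itself is not stated in this documentation.

```python
import io

import pandas as pd

# In-memory copy of the example CSV (with a header row).
csv_text = "Sex,first_letter\nmale,m\nfemale,f\n"

lookup = pd.read_csv(io.StringIO(csv_text))
key_is_unique = lookup["Sex"].is_unique
print(key_is_unique)

# A CSV that repeats a key value would not satisfy the requirement.
dup = pd.read_csv(io.StringIO(csv_text + "male,x\n"))
dup_is_unique = dup["Sex"].is_unique
print(dup_is_unique)
```

The first check reports a unique key column; the second, with a repeated male row, does not.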
leapyear.ext.pandas.left_join_csv_no_header(ly_dataset, csv_location, csv_key_column_index, csv_value_column_indices, ly_key_column_name, output_value_column_names=None, **kwargs)
Left join a LeapYear DataSet with a CSV that has no header row.
This function allows a user to perform a left join of a LeapYear DataSet with a CSV that has no header row. The user must specify an index (column number) for the CSV column that contains the join key, and indices for the columns that contain the values. The key column must have unique values.
Note
Currently, only a single column can be used as the key in the join.
- Parameters
ly_dataset (DataSet) – The LeapYear DataSet to join on.
csv_location (str) – The path to the CSV.
csv_key_column_index (int) – The index of the key column in the CSV (0-indexed).
csv_value_column_indices (List[int]) – A list of the indices of the value columns in the CSV (0-indexed).
ly_key_column_name (str) – The column name to join on.
output_value_column_names (Optional[List[str]]) – The desired names for the value columns in the output DataSet. Defaults to the indices of the columns. This must have the same length as csv_value_column_indices, if provided.
kwargs (Any) – The kwargs are passed to the pandas read_csv function.
- Returns
A LeapYear DataSet with new columns, named by output_value_column_names, containing the result of joining the ly_key_column_name column to the column in position csv_key_column_index.
- Return type
Example
Suppose that we have a table with a Sex column containing the values male and female:
    Sex     Age
    male    22
    female  38
    female  26
    female  35
    male    35
We’d like to encode the abbreviations (coming from the first letter) as a new column, coming from a CSV (with no header) of the following form:
    male    m
    female  f
To do so, we call this function, assuming that the CSV is found at csv_location, the LeapYear DataSet is called ds, and we wish to call the new column sex_first_letter:
    new_ds = left_join_csv_no_header(ly_dataset=ds,
                                     csv_location=csv_location,
                                     csv_key_column_index=0,
                                     csv_value_column_indices=[1],
                                     ly_key_column_name='Sex',
                                     output_value_column_names=['sex_first_letter'])
This will produce a DataSet which looks like:
    Sex     Age  sex_first_letter
    male    22   m
    female  38   f
    female  26   f
    female  35   f
    male    35   m
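The 0-based column indices used above correspond to how pandas labels columns when a CSV is read without a header row. The sketch below shows this with pandas alone, assuming the file is read with header=None; the names csv_text, lookup, key_index, value_indices, and mapping are introduced here for illustration, and this is not the library's implementation.

```python
import io

import pandas as pd

# In-memory copy of the example CSV (no header row).
csv_text = "male,m\nfemale,f\n"

# With header=None, pandas labels the columns 0, 1, ... by position.
lookup = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(lookup.columns))

key_index = 0        # plays the role of csv_key_column_index
value_indices = [1]  # plays the role of csv_value_column_indices

# Build a key -> value mapping from the positional columns.
mapping = lookup.set_index(key_index)[value_indices[0]].to_dict()
print(mapping)
```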