API documentation

Astraea.FLICKERinstall()

Installs the FLICKER software, which is used to calculate Flicker values from light curves. Documentation: https://flicker.readthedocs.io.

Astraea.getFlicker(t, sig)

Calculates the Flicker value

Parameters:

t – Time [days]
sig – Flux

Returns:	Flicker ([float]): Flicker value

Astraea.getKeplerProt(X_pred)

Predict rotation period from trained models.

This function predicts rotation periods for stars in the Kepler field. The models are trained on rotation periods from McQuillan et al. (2014), Santos et al. (2019) and Garcia et al. (2014). If the models are not already downloaded, this tool will download them, which might take a couple of minutes. It first passes the stars through a classifier, which identifies which stars have measurable rotation periods. It then uses two regressor models (one with 1 estimator and another with 100 estimators) to predict rotation periods. If the column “Prot” exists, it will also output the true periods associated with the predicted periods. The light curve feature “flicker” can be calculated using the FLICKER software.

Parameters: X_pred ([Pandas DataFrame]) – DataFrame containing all required variables; run Astraea.getTrainF() to print the requirements

Returns:

Containing:

Prot_prediction_1est:
	Period prediction with 1 estimator
Prot_prediction_100est:
	Period prediction with 100 estimators
TrueProt:
	True rotation period (if available)

Return type: <pandas.DataFrame> or <pandas.Series>

Astraea.getRvar(Flux)

Calculates light curve Rvar

Parameters:	Flux – The light curve flux in ppm
Returns:	The variability of the light curve
Return type:	Rvar ([float])
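
As a rough illustration, Rvar is commonly defined as the difference between the 95th and 5th percentiles of the flux. The sketch below assumes that definition; `rvar_sketch` is a hypothetical helper written for this example, not Astraea's implementation, which may differ in detail.

```python
import numpy as np

def rvar_sketch(flux):
    """Hypothetical helper: 95th minus 5th percentile of the flux (assumed definition)."""
    flux = np.asarray(flux, dtype=float)
    flux = flux[np.isfinite(flux)]  # ignore NaN/inf samples
    p5, p95 = np.percentile(flux, [5, 95])
    return float(p95 - p5)

# Synthetic light curve in ppm: a sinusoid with a 1000 ppm amplitude.
t = np.linspace(0.0, 30.0, 3000)
flux_ppm = 1000.0 * np.sin(2.0 * np.pi * t / 10.0)
rvar = rvar_sketch(flux_ppm)
print(f"Rvar = {rvar:.0f} ppm")
```

Using percentiles rather than the full range keeps the measure insensitive to a few outlying points such as flares or cosmic-ray hits.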

Astraea.getLGpeak(t, sig, sig_err)

Calculates the location (period) and the power of the highest peak in the Lomb-Scargle periodogram.

Parameters:

t – Time [days]
sig – Flux
sig_err – Flux error

Returns:

LG_Prot ([float]): The period calculated from Lomb-Scargle
LG_peaks ([float]): The maximum peak height
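
To show what "highest peak" means here, the following sketch evaluates a classical Lomb-Scargle periodogram on a grid of trial periods and returns the period and power of the maximum. It is a minimal NumPy version under stated assumptions: `ls_peak_sketch` and the period grid are hypothetical, the flux errors are ignored, and Astraea's own implementation and normalization may differ.

```python
import numpy as np

def ls_peak_sketch(t, sig, periods):
    """Hypothetical helper: classical Lomb-Scargle power on a grid of trial periods."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(sig, dtype=float)
    y = y - y.mean()  # remove the mean before fitting sinusoids
    periods = np.asarray(periods, dtype=float)
    power = np.empty(periods.size)
    for i, period in enumerate(periods):
        omega = 2.0 * np.pi / period
        # Time offset that makes the sine and cosine terms orthogonal.
        tau = np.arctan2(np.sin(2 * omega * t).sum(),
                         np.cos(2 * omega * t).sum()) / (2 * omega)
        c = np.cos(omega * (t - tau))
        s = np.sin(omega * (t - tau))
        power[i] = 0.5 * ((y @ c) ** 2 / (c @ c) + (y @ s) ** 2 / (s @ s))
    power /= y.var()  # normalize by the sample variance
    best = int(np.argmax(power))
    return float(periods[best]), float(power[best])

# Unevenly sampled synthetic light curve with a 12.5-day signal plus noise.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 90.0, 1500))
sig = np.sin(2.0 * np.pi * t / 12.5) + 0.1 * rng.normal(size=t.size)
trial_periods = np.linspace(1.0, 30.0, 2901)
lg_prot, lg_peak = ls_peak_sketch(t, sig, trial_periods)
print(f"period = {lg_prot:.2f} d, peak power = {lg_peak:.1f}")
```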

Astraea.getTrainF()

Prints the features needed by the model, in order.

Astraea.getVs(df)

Calculates the tangential velocity (v_tan) and an approximation of the vertical velocity (v_b).

Parameters:	df ([Pandas DataFrame]) – DataFrame containing the columns ‘parallax’, ‘pmra’, ‘pmdec’, ‘ra’ and ‘dec’, which are the parallax, proper motion in right ascension, proper motion in declination, right ascension and declination, respectively

Returns:

v_t ([array-like]): Tangential velocity
v_b ([array-like]): Proxy for vertical velocity
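
For orientation, the tangential velocity follows from the standard relation v_t = 4.74047 × μ / ϖ, with the total proper motion μ in mas/yr and the parallax ϖ in mas, giving km/s. The sketch below assumes Gaia-style units and that ‘pmra’ already includes the cos(dec) factor; `v_tan_sketch` is a hypothetical helper, and the v_b proxy (which needs a transformation to Galactic coordinates) is not reproduced here.

```python
import numpy as np

# 1 au/yr expressed in km/s; converts (mas/yr) / mas to km/s.
K_KM_S = 4.74047

def v_tan_sketch(parallax_mas, pmra_mas_yr, pmdec_mas_yr):
    """Hypothetical helper: tangential velocity in km/s (assumed units: mas, mas/yr)."""
    parallax = np.asarray(parallax_mas, dtype=float)
    pmra = np.asarray(pmra_mas_yr, dtype=float)
    pmdec = np.asarray(pmdec_mas_yr, dtype=float)
    mu = np.hypot(pmra, pmdec)  # total proper motion in mas/yr
    return K_KM_S * mu / parallax

# A star at 100 pc (parallax 10 mas) moving 50 mas/yr on the sky.
v_t = v_tan_sketch([10.0], [30.0], [40.0])
print(v_t)
```

Non-positive or very uncertain parallaxes give meaningless velocities, so such rows would need to be filtered before applying the relation.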

Astraea.RFclassifier(df, testF, modelout=False, traind=0.8, ID_on='KID', X_train_ind=[], X_test_ind=[], target_var='Prot_flag', n_estimators=100, criterion='gini', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=None, random_state=None, verbose=0, warm_start=False, class_weight=None)

Train RF classifier model and predict values for cross-validation dataset.

It uses the scikit-learn Random Forest classifier model. All default hyper-parameters are taken from the scikit-learn model, and the user can change them through the optional inputs. For more details on the hyper-parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html. To use the module to train an RF model to predict rotation periods, input a pandas DataFrame with column names as well as a list of attribute names.

Parameters:

df ([Pandas DataFrame]) – DataFrame containing all variables needed
testF ([string list]) – List of feature names used to train
modelout (Optional [bool]) – Whether to only output the trained model
traind (Optional [float]) – Fraction of the data used to train; the rest will be used to perform the cross-validation test (default 0.8)
ID_on (Optional [string]) – Name of the star identifier column (default ‘KID’). If the specified ID column does not exist, the index is used as the ID
X_train_ind (Optional [list]) – List of ID_on values for the training set; if not specified, a random traind fraction of the indexes from the ID_on column is taken
X_test_ind (Optional [list]) – List of ID_on values for the testing set; if not specified, the remaining (1-traind) fraction of the indexes from the ID_on column that is not in the training set (X_train_ind) is taken
target_var (Optional [string]) – Label column name (default ‘Prot_flag’)

Returns:

regr:

scikit-learn RF classifier model (for its attributes, see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

<pandas.Series> containing:

actrualF ([string list]):
	Actual features used
importance ([float list]):
	Impurity-based feature importances, in the same order as actrualF
ID_train ([list]):
	List of ID_on used for training set
ID_test ([list]):
	List of ID_on used for testing set
predictp ([float list]):
	List of prediction on testing set
X_test ([matrix]):
	Matrix used to predict label values for testing set
y_test ([array-like]):
	Array of true label values of testing set
X_train ([matrix]):
	Matrix used to predict label values for training set
y_train ([array-like]):
	Array of true label values of training set
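
The default behaviour described for X_train_ind and X_test_ind (a random traind fraction of the IDs for training, the remainder for testing) can be sketched as follows. This is an illustration only: `split_ids_sketch` and its `seed` argument are hypothetical, the IDs are assumed to be unique, and the way Astraea draws its random sample may differ.

```python
import numpy as np

def split_ids_sketch(ids, traind=0.8, seed=None):
    """Hypothetical helper: random ID-based train/test split (IDs assumed unique)."""
    ids = np.asarray(ids)
    rng = np.random.default_rng(seed)
    n_train = int(round(traind * ids.size))
    # Draw the training IDs without replacement.
    id_train = rng.choice(ids, size=n_train, replace=False)
    # Everything not drawn for training goes to the testing set.
    id_test = ids[~np.isin(ids, id_train)]
    return id_train, id_test

kids = np.arange(1000, 1010)  # ten made-up star identifiers
id_train, id_test = split_ids_sketch(kids, traind=0.8, seed=42)
print(len(id_train), len(id_test))
```

Splitting on identifiers rather than row positions makes it possible to reuse exactly the same training and testing stars across models by passing the two lists back in.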

Astraea.RFregressor(df, testF, modelout=False, traind=0.8, ID_on='KID', X_train_ind=[], X_test_ind=[], target_var='Prot', target_var_err='Prot_err', chisq_out=False, MREout=False, n_estimators=100, criterion='mse', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features='auto', max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, bootstrap=True, oob_score=False, n_jobs=1, random_state=None, verbose=0, warm_start=False)

Train RF regression model and perform cross-validation test.

It uses the scikit-learn Random Forest regressor model. All default hyper-parameters are taken from the scikit-learn model, and the user can change them through the optional inputs. For more details on the hyper-parameters, see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html. To use the module to train an RF model to predict rotation periods, input a pandas DataFrame with column names as well as a list of attribute names.

Parameters:

df ([Pandas DataFrame]) – DataFrame containing all variables needed
testF ([string list]) – List of feature names used to train
modelout (Optional [bool]) – Whether to only output the trained model
traind (Optional [float]) – Fraction of the data used to train; the rest will be used to perform the cross-validation test (default 0.8)
ID_on (Optional [string]) – Name of the star identifier column (default ‘KID’). If the specified ID column does not exist, the index is used as the ID
X_train_ind (Optional [list]) – List of ID_on values for the training set; if not specified, a random traind fraction of the indexes from the ID_on column is taken
X_test_ind (Optional [list]) – List of ID_on values for the testing set; if not specified, the remaining (1-traind) fraction of the indexes from the ID_on column that is not in the training set (X_train_ind) is taken
target_var (Optional [string]) – Label column name (default ‘Prot’)
target_var_err (Optional [string]) – Label error column name (default ‘Prot_err’)
chisq_out (Optional [bool]) – If true, only output the average chisq value
MREout (Optional [bool]) – If true, only output the median relative error. If both chisq_out and MREout are true, only these two values are output

Returns:

regr:

scikit-learn RF regressor model (for its attributes, see https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)

<pandas.Series> containing:

actrualF ([string list]):
	Actual features used
importance ([float list]):
	Impurity-based feature importances, in the same order as actrualF
ID_train ([list]):
	List of ID_on used for training set
ID_test ([list]):
	List of ID_on used for testing set
predictp ([float list]):
	List of prediction on testing set
ave_chi ([float]):
	Average chisq on cross-validation (testing) set
MRE_val ([float]):
	Median relative error on cross-validation (testing) set
X_test ([matrix]):
	Matrix used to predict label values for testing set
y_test ([array-like]):
	Array of true label values of testing set
X_train ([matrix]):
	Matrix used to predict label values for training set
y_train ([array-like]):
	Array of true label values of training set
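
The two summary statistics in the output, ave_chi and MRE_val, can be illustrated with the definitions they most plausibly correspond to: the mean of squared residuals divided by the squared label errors, and the median of |predicted − true| / true. Both formulas and both helper names below are assumptions made for this sketch; the exact expressions used by Astraea are not stated in this documentation.

```python
import numpy as np

def average_chisq_sketch(pred, true, err):
    """Hypothetical helper: mean of (pred - true)^2 / err^2 (assumed definition)."""
    pred, true, err = (np.asarray(a, dtype=float) for a in (pred, true, err))
    return float(np.mean((pred - true) ** 2 / err ** 2))

def median_relative_error_sketch(pred, true):
    """Hypothetical helper: median of |pred - true| / true (assumed definition)."""
    pred, true = (np.asarray(a, dtype=float) for a in (pred, true))
    return float(np.median(np.abs(pred - true) / true))

# Made-up true periods, their errors and predictions, in days.
y_test = np.array([10.0, 20.0, 40.0])
y_test_err = np.array([1.0, 2.0, 4.0])
predictp = np.array([11.0, 18.0, 40.0])

ave_chi = average_chisq_sketch(predictp, y_test, y_test_err)
mre_val = median_relative_error_sketch(predictp, y_test)
print(ave_chi, mre_val)
```

The median relative error is insensitive to a handful of badly predicted stars, whereas the average chisq weights each star by its quoted label error.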

Astraea.load_RF()

Load the random forest classifier and regressors from zenodo.org.

Two regressors will be loaded, one with 1 estimator and one with 100 estimators. The model with 1 estimator minimizes bias (systematic offset) at the cost of high variance (scatter), while the model with 100 estimators minimizes variance at the cost of higher bias. To predict rotation periods from Kepler light curves, it is best to use the model with 100 estimators; otherwise, use the model with 1 estimator. Models trained on TESS light curves are still being developed.
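
The variance side of this trade-off can be seen on synthetic data with scikit-learn directly. The sketch below is not the Astraea models: the data, the split and the random seeds are made up for illustration, and it only compares the scatter of the test residuals for forests with 1 and 100 estimators.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Noisy synthetic regression problem (made up for this illustration).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(2000, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.3, size=2000)

# Simple 80/20 split of the already shuffled samples.
X_train, y_train = X[:1600], y[:1600]
X_test, y_test = X[1600:], y[1600:]

scatter = {}
for n_est in (1, 100):
    model = RandomForestRegressor(n_estimators=n_est, random_state=0)
    model.fit(X_train, y_train)
    residuals = model.predict(X_test) - y_test
    scatter[n_est] = float(np.std(residuals))  # spread of the prediction errors

print(scatter)
```

Averaging many trees smooths out the noise that a single fully grown tree memorizes, which is why the 100-estimator forest shows the smaller scatter here.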

Astraea.plot_corr(df, y_vars, x_var='Prot', logplotarg=[], logarg=[], MS=1)

Plot correlations between one variable and other variables specified by the user.

Parameters:	df – DataFrame contains all variables needed
Returns:	plots for feature correlations
Return type:	<matplotlib.plot>

Astraea.plot_result(actrualF, importance, prediction, y_test, y_test_err=[], topn=20, MS=3, labelName='Period')

Plot impurity-based feature importance as well as predicted values vs true values for a random forest model

Parameters:

actrualF ([array-like]) – Features used (from the output of RFregressor())
importance ([array-like]) – Feature importances of the model (from the output of RFregressor())
prediction ([array-like]) – Predicted values (from the output of RFregressor())
y_test ([array-like]) – True values (from the output of RFregressor())
y_test_err (Optional [array-like]) – Errors for the true values (from the output of RFregressor())
topn (Optional [int]) – How many of the most important features to plot
MS (Optional [int]) – Marker size for plotting true vs predicted values
labelName (Optional [string]) – Label name

Returns:	importance plot as well as true vs prediction plot
Return type:	<matplotlib.plot>