sner.models package

Submodules

sner.models.ner module

NER Model

sner.models.ner.assess_strength(rules, corpus, config)
Evaluates how accurate the strength ratings of the passed-in rules are. Because the NER model generates rules in an unsupervised fashion, this function is used to evaluate the performance of that process.
Parameters:
  • rules (set) – A set of Rule objects to be evaluated
  • corpus (set) – A set of Token objects, representing the entire Garshana corpus.
Returns:

None

Raises:

None
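
One way to picture such a check, as a minimal sketch and not the package's actual implementation: compare each rule's stated strength with the fraction of matching tokens that are annotated as names. The Token and Rule classes, the "PN" annotation value, and the prefix-matching behaviour below are all hypothetical stand-ins, and the sample words are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:  # hypothetical stand-in for the package's Token
    word: str
    annotation: str  # assumption: "PN" marks a name


@dataclass(frozen=True)
class Rule:  # hypothetical stand-in for the package's Rule
    contents: str    # assumption: the rule matches words with this prefix
    strength: float  # stated probability that a matching token is a name


def observed_strength(rule, corpus):
    """Fraction of tokens matched by the rule that are annotated as names."""
    matched = [t for t in corpus if t.word.startswith(rule.contents)]
    if not matched:
        return None  # the rule never fires, so there is nothing to measure
    hits = sum(1 for t in matched if t.annotation == "PN")
    return hits / len(matched)


corpus = {
    Token("ur-mes", "PN"),
    Token("ur-gu", "PN"),
    Token("ur-bi", "-"),
    Token("lugal", "-"),
}
rule = Rule("ur-", 0.9)
observed = observed_strength(rule, corpus)
# Gap between the stated rating and what the annotations support.
error = abs(rule.strength - observed)
```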

sner.models.ner.get_new_names(corpus, names, rules)

Uses the provided rule set to scan the corpus for new names and returns them; the returned names are then used to generate more rules.

It collects all tokens in the corpus that match the given rules and returns them as a set. The names parameter specifies tokens that are already recognized as names, so that only newly found names are returned.
Parameters:
  • corpus (set) – Set of Token objects representing the entire Garshana corpus.
  • names (set) – Set of Tokens already recognized as names.
  • rules (set) – Set of Rule objects used to find new names
Returns:

Set of Token objects

Return type:

new_names (set)

Raises:

None
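
The behaviour described above amounts to a filter followed by a set difference. The following is a minimal sketch under stated assumptions: Token and Rule are hypothetical stand-ins, and the prefix-based matches method is invented for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:  # hypothetical stand-in for the package's Token
    word: str


@dataclass(frozen=True)
class Rule:  # hypothetical stand-in for the package's Rule
    prefix: str

    def matches(self, token):
        # Assumption: a rule fires when the word starts with its prefix.
        return token.word.startswith(self.prefix)


def get_new_names_sketch(corpus, names, rules):
    """Return tokens matched by any rule that are not already known names."""
    matched = {t for t in corpus if any(r.matches(t) for r in rules)}
    return matched - names


corpus = {Token("ur-mes"), Token("ur-gu"), Token("dumu")}
names = {Token("ur-mes")}
rules = {Rule("ur-")}
new_names = get_new_names_sketch(corpus, names, rules)
```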

sner.models.ner.import_corpus(corpus_path, display)
Imports the corpus for the ner model, from a format of our own design. The format is a CSV containing individual words in the corpus, with columns containing the Tablet ID, the line number within that tablet, the word number within that line, the word itself, and any annotation associated with that word.
Parameters:
  • corpus_path (str) – Path of the corpus file.
  • display (Display) – Utility object used to print the progress of scanning the corpus file.
Returns:
Set of Token objects, properly initialized from the input data.
Return type:corpus (set)
Raises:None
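
To make the five-column layout concrete, here is a minimal sketch of reading such a file with the standard csv module. It assumes a header row with the column names shown, which the description above does not confirm; the Token tuple is a hypothetical stand-in, the sample rows are made up, and progress reporting through display is left out.

```python
import csv
import io
from typing import NamedTuple


class Token(NamedTuple):  # hypothetical stand-in for the package's Token
    tablet_id: str
    line_no: int
    word_no: int
    word: str
    annotation: str


def read_corpus(handle):
    """Build a set of Token tuples from a CSV file-like object."""
    reader = csv.DictReader(handle)  # assumption: the first row is a header
    return {
        Token(
            row["tablet_id"],
            int(row["line_no"]),
            int(row["word_no"]),
            row["word"],
            row["annotation"],
        )
        for row in reader
    }


# Made-up sample rows, not taken from the real corpus.
sample = io.StringIO(
    "tablet_id,line_no,word_no,word,annotation\n"
    "T001,1,1,ur-mes,PN\n"
    "T001,1,2,dumu,-\n"
)
corpus = read_corpus(sample)
```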
sner.models.ner.import_seed_rules(seed_rules_path, display)

Reads the seed rule file and returns its contents as a set. The file is a CSV with columns containing the rule type, the contents of the rule, and its strength rating (the probability that a token is a name if the rule applies to said token).
Parameters:seed_rules_path (str) – Location of seed rules file.
Returns:
Set of Rule objects corresponding to the rules in the seed rules file.
Return type:rules (set)
Raises:None
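
As an illustration of the three-column layout, the sketch below parses rule type, contents, and strength from a CSV without a header row. The Rule tuple, the "prefix" and "suffix" rule types, the sample rows, and the range check on the strength are all assumptions made for this example.

```python
import csv
import io
from typing import NamedTuple


class Rule(NamedTuple):  # hypothetical stand-in for the package's Rule
    rule_type: str
    contents: str
    strength: float


def read_seed_rules(handle):
    """Build a set of Rule tuples from a headerless three-column CSV."""
    rules = set()
    for rule_type, contents, strength in csv.reader(handle):
        value = float(strength)
        # The strength is a probability, so it should lie in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"strength out of range: {value}")
        rules.add(Rule(rule_type, contents, value))
    return rules


# Made-up sample rows; the real rule types may differ.
sample = io.StringIO("prefix,ur-,0.9\nsuffix,-mes,0.75\n")
seed_rules = read_seed_rules(sample)
```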
sner.models.ner.main(config)

rules and names are lists of RuleSets and TokenSets, respectively. Each list holds the results of successive iterations of the algorithm. Index 0 of rules is the first rule set (the seed rules), and index 1 is the first rule set generated and used by the algorithm itself. Likewise, index 0 of names holds the names that came from the seed rules, index 1 the names that came from rule set 1, and so on.

Parameters:config (dict) – Dictionary containing configuration information, such as the location of the input files, as well as various flags and runtime parameters (defined in sner.py).
Returns:None
Raises:None
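
The index alignment between the two lists can be sketched with a toy loop. This is only an illustration of the bookkeeping, not the package's rule-generation logic: plain Python sets stand in for RuleSets and TokenSets, and the find_names and derive_rules helpers, the (kind, text) rule tuples, and the sample words are all hypothetical.

```python
def matches(rule, word):
    # Toy rules are (kind, text) pairs: a prefix test or a suffix test.
    kind, text = rule
    return word.startswith(text) if kind == "prefix" else word.endswith(text)


def find_names(rule_set, corpus):
    """Words in the corpus matched by at least one rule in the set."""
    return {w for w in corpus if any(matches(r, w) for r in rule_set)}


def derive_rules(name_set):
    """Toy rule generation: turn each name's last element into a suffix rule."""
    return {("suffix", "-" + n.rsplit("-", 1)[-1]) for n in name_set}


corpus = {"ur-mes", "ur-gu", "lu-gu", "lu-bi", "dumu"}

rules = [{("prefix", "ur-")}]             # rules[0]: the seed rules
names = [find_names(rules[0], corpus)]    # names[0]: names found by the seed rules

for i in range(1, 3):
    rules.append(derive_rules(names[i - 1]))    # rules[i] comes from names[i - 1]
    names.append(find_names(rules[i], corpus))  # names[i] comes from rules[i]
```

After the loop, rules[i] and names[i] always describe the same iteration, which is the property the description above relies on.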
sner.models.ner.print_precision_and_recall(selected_elements, relevant_elements, i, log)
Prints the precision, recall, and F1 score of the algorithm. selected_elements holds the tokens that the algorithm considers to be names; relevant_elements is the total number of names in the corpus.
Parameters:
  • selected_elements (set) – Set of Token objects representing names as identified by the algorithm.
  • relevant_elements (int) – Total number of names that exist in the corpus.
  • i (int) – Index of current log entry.
  • log (pandas.DataFrame) – Data structure containing logs.
Returns:

None

Raises:

None
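
For reference, the three metrics can be computed as in the minimal sketch below. It assumes each selected token can be checked against a gold annotation; the is_name callback, the (word, annotation) tuples, and the "PN" label are hypothetical, and logging to the DataFrame is left out.

```python
def precision_recall_f1(selected, relevant_count, is_name):
    """Compute precision, recall, and F1 for a set of selected tokens.

    selected: tokens the algorithm labelled as names.
    relevant_count: total number of true names in the corpus.
    is_name: callable telling whether a token is truly a name (assumption).
    """
    true_positives = sum(1 for t in selected if is_name(t))
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / relevant_count if relevant_count else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up (word, annotation) pairs; "PN" is an assumed label for names.
selected = {("ur-mes", "PN"), ("ur-gu", "PN"), ("lu-bi", "PN"), ("dumu", "-")}
precision, recall, f1 = precision_recall_f1(
    selected, relevant_count=6, is_name=lambda t: t[1] == "PN"
)
```

With three of the four selected tokens being true names and six names in total, precision is 3/4 and recall is 3/6.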

sner.models.sklearn_launcher module

Launcher for scikit-learn models

sner.models.sklearn_launcher.dec_model(params)

Initiates model as decision tree classifier.

Parameters:params (dict) – Dictionary of hyperparameters.
Returns:Selected model.
Return type:model (sklearn.tree.DecisionTreeClassifier)
Raises:None
sner.models.sklearn_launcher.forest_model(params)

Initiates model as Extra Trees Classifier.

Parameters:params (dict) – Dictionary of hyperparameters.
Returns:Selected model.
Return type:model (sklearn.ensemble.ExtraTreesClassifier)
Raises:None
sner.models.sklearn_launcher.main(config)

Loads and trains a scikit-learn model as specified by the user in the configuration or command-line arguments. Tests the model's performance after training and then prints performance information (precision, accuracy, recall).

Parameters:config (dict) – Configuration flags and values.
Returns:None
Raises:None
sner.models.sklearn_launcher.nbc_model(params)

Initiates model as multinomial Naive Bayes classifier.

Parameters:params (dict) – Dictionary of hyperparameters.
Returns:Selected model.
Return type:model (sklearn.naive_bayes.MultinomialNB)
Raises:None
sner.models.sklearn_launcher.rdf_model(params)

Initiates model as random forest classifier.

Parameters:params (dict) – Dictionary of hyperparameters.
Returns:Selected model.
Return type:model (sklearn.ensemble.RandomForestClassifier)
Raises:None
sner.models.sklearn_launcher.runModel(config)
sner.models.sklearn_launcher.sgd_model(params)

Initiates model as linear classifier with stochastic gradient descent.

Parameters:params (dict) – Dictionary of hyperparameters.
Returns:Selected model.
Return type:model (sklearn.linear_model.SGDClassifier)
Raises:None
sner.models.sklearn_launcher.svc_model(params)

Initiates model as C-support vector classifier.

Parameters:params (dict) – Dictionary of hyperparameters.
Returns:Selected model.
Return type:model (sklearn.svm.SVC)
Raises:None

Module contents