sner.models package
Submodules
sner.models.ner module
NER Model
sner.models.ner.assess_strength(rules, corpus, config)
Evaluates the accuracy of the strength ratings of the passed-in rules. Because the NER model generates rules in an unsupervised fashion, this function is used to evaluate how well that process performs.
Parameters:
- rules (set) – A set of Rule objects to be evaluated.
- corpus (set) – A set of Token objects representing the entire Garshana corpus.
Returns: None
Raises: None
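One plausible way to judge a strength rating is to compare it with the fraction of matching tokens that really are names. The sketch below illustrates that idea only; this reference does not say how assess_strength measures accuracy. The Token and Rule types, the prefix-matching logic, and the "PN" annotation tag are all hypothetical stand-ins.

```python
from collections import namedtuple

# Hypothetical stand-ins for the package's Token and Rule classes.
Token = namedtuple("Token", ["word", "annotation"])
Rule = namedtuple("Rule", ["contents", "strength"])


def observed_strength(rule, corpus, name_tag="PN"):
    """Fraction of tokens matched by the rule that are annotated as names."""
    # Assumed matching logic: a rule matches words that start with its contents.
    matches = [tok for tok in corpus if tok.word.startswith(rule.contents)]
    if not matches:
        return None  # the rule never fires, so there is nothing to compare
    return sum(tok.annotation == name_tag for tok in matches) / len(matches)


def strength_errors(rules, corpus):
    """Map each rule to |stated strength - observed strength|."""
    errors = {}
    for rule in rules:
        observed = observed_strength(rule, corpus)
        if observed is not None:
            errors[rule] = abs(rule.strength - observed)
    return errors


# Made-up sample data for illustration.
corpus = {Token("ur-nammu", "PN"), Token("ur-sag", "-"), Token("lugal", "-")}
rules = {Rule("ur-", 0.75)}
errors = strength_errors(rules, corpus)
```

A small error means the rating assigned during unsupervised rule generation agrees with the annotated data; a large one flags a rule whose rating is over- or under-confident.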
sner.models.ner.get_new_names(corpus, names, rules)
Uses the provided rule set to scan the corpus for new names, then returns those names so they can be used to generate more rules.
In practice, it collects every token in the corpus that matches the given rules and returns them as a set. The names parameter specifies tokens that are already recognized as names, so that only new names are returned.
Parameters:
- corpus (set) – Set of Token objects representing the entire Garshana corpus.
- names (set) – Set of Token objects already recognized as names.
- rules (set) – Set of Rule objects used to find new names.
Returns: Set of Token objects
Return type: new_names (set)
Raises: None
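The selection described above amounts to a filter followed by a set difference. The following is a minimal sketch, not the package's implementation: Token and Rule are hypothetical stand-ins, and a rule is assumed to match when its contents equal the token's word.

```python
from collections import namedtuple

# Hypothetical stand-ins for the package's Token and Rule classes.
Token = namedtuple("Token", ["word", "annotation"])
Rule = namedtuple("Rule", ["contents", "strength"])


def rule_matches(rule, token):
    """Assumed matching logic: exact match between rule contents and the word."""
    return rule.contents == token.word


def get_new_names_sketch(corpus, names, rules):
    """Return tokens matched by any rule, excluding already-known names."""
    matched = {tok for tok in corpus if any(rule_matches(r, tok) for r in rules)}
    return matched - names


# Made-up sample data for illustration.
corpus = {Token("ur-nammu", "PN"), Token("lugal", "-"), Token("shulgi", "PN")}
names = {Token("ur-nammu", "PN")}
rules = {Rule("ur-nammu", 0.9), Rule("shulgi", 0.8)}
new_names = get_new_names_sketch(corpus, names, rules)
```

Because both inputs are sets, removing the known names is a single difference operation and the result contains no duplicates.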
sner.models.ner.import_corpus(corpus_path, display)
Imports the corpus for the NER model from a custom CSV format. Each row holds one word of the corpus, with columns for the tablet ID, the line number within that tablet, the word number within that line, the word itself, and any annotation associated with that word.
Parameters:
- corpus_path (str) – Path of the corpus file.
- display (Display) – Utility object used to print the progress of scanning the corpus file.
Returns: Set of Token objects, properly initialized from the input data.
Return type: corpus (set)
Raises: None
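A five-column file of this kind can be read with the standard csv module. The sketch below assumes the columns appear in the order listed above and that the file has no header row; neither detail is confirmed by this reference. The Token layout and the sample rows are hypothetical, and the sketch reads from an open file-like object rather than a path so that it runs without a file on disk.

```python
import csv
import io
from collections import namedtuple

# Hypothetical Token layout mirroring the five columns described above.
Token = namedtuple("Token", ["tablet_id", "line_no", "word_no", "word", "annotation"])


def import_corpus_sketch(handle):
    """Read five-column CSV rows from a file-like object into a set of Tokens."""
    corpus = set()
    for row in csv.reader(handle):
        if len(row) != 5:
            continue  # skip blank or malformed rows
        tablet_id, line_no, word_no, word, annotation = row
        corpus.add(Token(tablet_id, int(line_no), int(word_no), word, annotation))
    return corpus


# Made-up sample rows; real corpus contents are not shown in this reference.
sample = io.StringIO("P100001,1,1,ur-nammu,PN\nP100001,1,2,lugal,-\n")
corpus = import_corpus_sketch(sample)
```

To read from disk, the same function could be called with `open(corpus_path, newline="")`, which is the mode the csv module documentation recommends.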
sner.models.ner.import_seed_rules(seed_rules_path, display)
Reads the seed rule file and returns its contents as a set. The file is a CSV with columns for the rule type, the contents of the rule, and its strength rating (the probability that a token is a name if the rule applies to it).
Parameters: seed_rules_path (str) – Location of the seed rules file.
Returns: Set of Rule objects corresponding to the rules in the seed rules file.
Return type: rules (set)
Raises: None
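As an illustration of the three-column layout, the sketch below parses rule rows and checks that each strength is a valid probability. The Rule layout, the column order, the absence of a header row, and the rule type names in the sample ("spelling", "context") are assumptions made for the example.

```python
import csv
import io
from collections import namedtuple

# Hypothetical Rule layout mirroring the three columns described above.
Rule = namedtuple("Rule", ["rule_type", "contents", "strength"])


def import_seed_rules_sketch(handle):
    """Read three-column CSV rows from a file-like object into a set of Rules."""
    rules = set()
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed rows
        rule_type, contents, strength_text = row
        strength = float(strength_text)
        # The strength is described as a probability, so it must lie in [0, 1].
        if not 0.0 <= strength <= 1.0:
            raise ValueError(f"strength out of range: {strength}")
        rules.add(Rule(rule_type, contents, strength))
    return rules


# Made-up sample rows for illustration.
sample = io.StringIO("spelling,ur-nammu,0.9\ncontext,dumu,0.6\n")
seed_rules = import_seed_rules_sketch(sample)
```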
sner.models.ner.main(config)
Rules and names are lists of RuleSets and TokenSets, respectively, holding the results of successive iterations of the algorithm. Index 0 of rules is the first rule set (the seed rules), and index 1 holds the first rules generated and used by the algorithm itself. Index 0 of names holds the names found by the seed rules, index 1 the names found by rule set 1, and so on.
Parameters: config (dict) – Dictionary containing configuration information, such as the locations of the input files, along with various flags and runtime parameters (defined in sner.py).
Returns: None
Raises: None
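The indexing scheme described above can be sketched as a loop that alternates between finding names and generating rules. This shows only the bookkeeping: the bootstrap function, its callable arguments, and the toy prefix rules are hypothetical, and plain Python sets stand in for RuleSets and TokenSets.

```python
def bootstrap(seed_rules, find_names, make_rules, iterations):
    """Alternate between finding names and generating rules, keeping every round."""
    rules = [set(seed_rules)]  # rules[0] is the seed rule set
    names = []                 # names[i] holds the names found by rules[i]
    for i in range(iterations):
        known = set().union(*names)  # everything found in earlier rounds
        names.append(find_names(rules[i], known))
        rules.append(make_rules(names[i]))
    return rules, names


# Toy stand-ins: rules are word prefixes; new rules are two-letter name prefixes.
VOCAB = {"ur-nammu", "ur-sag", "shulgi"}


def find_names(rule_set, known):
    """Words matching any prefix rule, minus those already known."""
    return {w for w in VOCAB if any(w.startswith(r) for r in rule_set)} - known


def make_rules(name_set):
    """Derive one prefix rule from each name."""
    return {w[:2] for w in name_set}


rules, names = bootstrap({"ur-n"}, find_names, make_rules, iterations=2)
```

Keeping one entry per iteration, rather than a single merged set, makes it possible to see which round introduced a given rule or name.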
sner.models.ner.print_precision_and_recall(selected_elements, relevant_elements, i, log)
Prints the precision, recall, and F1 score of the algorithm. Pass the tokens that the algorithm considers to be names as selected_elements; relevant_elements is the total number of names in the corpus.
Parameters:
- selected_elements (set) – Set of Token objects representing names as identified by the algorithm.
- relevant_elements (int) – Total number of names that exist in the corpus.
- i (int) – Index of the current log entry.
- log (pandas.DataFrame) – Data structure containing logs.
Returns: None
Raises: None
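For reference, the three scores follow the standard definitions: precision is the share of selected tokens that are true names, recall is the share of all names that were selected, and F1 is their harmonic mean. The sketch below computes them under the assumption that a true name can be recognized from a token's annotation. The Token type and the "PN" tag are hypothetical, and the logging arguments (i, log) are left out.

```python
from collections import namedtuple

# Hypothetical stand-in for the package's Token class.
Token = namedtuple("Token", ["word", "annotation"])


def precision_recall_f1(selected_elements, relevant_elements, name_tag="PN"):
    """Compute precision, recall, and F1 from selected tokens and a name count."""
    true_positives = sum(1 for tok in selected_elements if tok.annotation == name_tag)
    precision = true_positives / len(selected_elements) if selected_elements else 0.0
    recall = true_positives / relevant_elements if relevant_elements else 0.0
    if precision + recall == 0:
        f1 = 0.0  # avoid dividing by zero when nothing was found correctly
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up sample: four tokens selected, two of them real names, four names overall.
selected = {
    Token("ur-nammu", "PN"),
    Token("shulgi", "PN"),
    Token("lugal", "-"),
    Token("dumu", "-"),
}
precision, recall, f1 = precision_recall_f1(selected, relevant_elements=4)
```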
sner.models.sklearn_launcher module
Launcher for scikit-learn models
sner.models.sklearn_launcher.dec_model(params)
Initializes the model as a decision tree classifier.
Parameters: params (dict) – Dictionary of hyperparameters.
Returns: Selected model.
Return type: model (sklearn.tree.DecisionTreeClassifier)
Raises: None
sner.models.sklearn_launcher.forest_model(params)
Initializes the model as an Extra Trees classifier.
Parameters: params (dict) – Dictionary of hyperparameters.
Returns: Selected model.
Return type: model (sklearn.ensemble.ExtraTreesClassifier)
Raises: None
sner.models.sklearn_launcher.main(config)
Loads and trains a scikit-learn model as specified by the user in the configuration or command-line arguments, tests the model's performance after training, and then prints performance information (precision, accuracy, recall).
Parameters: config (dict) – Configuration flags and values.
Returns: None
Raises: None
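The train, test, and report cycle described above might look like the following in plain scikit-learn. This is a sketch under stated assumptions: the toy count features and labels are invented for the example, and the way the launcher actually loads data and reads its configuration is not shown in this reference.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.naive_bayes import MultinomialNB

# Toy count features (hypothetical): one row per token, one column per feature.
X_train = [[2, 0], [3, 0], [0, 2], [0, 3]]
y_train = [1, 1, 0, 0]  # 1 = name, 0 = not a name
X_test = [[4, 0], [0, 4]]
y_test = [1, 0]

# Train the model, then predict labels for the held-out rows.
model = MultinomialNB()
model.fit(X_train, y_train)
predicted = model.predict(X_test)

# Collect the three metrics that the launcher is described as printing.
scores = {
    "precision": precision_score(y_test, predicted),
    "accuracy": accuracy_score(y_test, predicted),
    "recall": recall_score(y_test, predicted),
}
for metric, value in scores.items():
    print(f"{metric}: {value:.3f}")
```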
sner.models.sklearn_launcher.nbc_model(params)
Initializes the model as a multinomial Naive Bayes classifier.
Parameters: params (dict) – Dictionary of hyperparameters.
Returns: Selected model.
Return type: model (sklearn.naive_bayes.MultinomialNB)
Raises: None
sner.models.sklearn_launcher.rdf_model(params)
Initializes the model as a random forest classifier.
Parameters: params (dict) – Dictionary of hyperparameters.
Returns: Selected model.
Return type: model (sklearn.ensemble.RandomForestClassifier)
Raises: None
sner.models.sklearn_launcher.runModel(config)
sner.models.sklearn_launcher.sgd_model(params)
Initializes the model as a linear classifier trained with stochastic gradient descent.
Parameters: params (dict) – Dictionary of hyperparameters.
Returns: Selected model.
Return type: model (sklearn.linear_model.SGDClassifier)
Raises: None
sner.models.sklearn_launcher.svc_model(params)
Initializes the model as a C-support vector classifier.
Parameters: params (dict) – Dictionary of hyperparameters.
Returns: Selected model.
Return type: model (sklearn.svm.SVC)
Raises: None
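Each of the model functions above takes a dictionary of hyperparameters and returns a scikit-learn estimator. One way such factories can be written is to unpack the dictionary into the estimator's constructor, as sketched below. The build_model helper, the MODEL_CLASSES table, and its short keys (borrowed from the function name prefixes) are hypothetical; the module's own selection logic is not shown in this reference.

```python
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical lookup table keyed by the prefixes of the function names above.
MODEL_CLASSES = {
    "dec": DecisionTreeClassifier,
    "forest": ExtraTreesClassifier,
    "nbc": MultinomialNB,
    "rdf": RandomForestClassifier,
    "sgd": SGDClassifier,
    "svc": SVC,
}


def build_model(name, params):
    """Instantiate the estimator registered under name with the given hyperparameters."""
    try:
        model_class = MODEL_CLASSES[name]
    except KeyError:
        raise ValueError(f"unknown model name: {name!r}") from None
    # Keyword unpacking passes each dictionary entry as a constructor argument.
    return model_class(**params)


model = build_model("dec", {"max_depth": 3})
```

Unpacking with `**params` lets scikit-learn itself reject misspelled hyperparameter names, since an unexpected keyword raises a TypeError at construction time.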