sner.scripts.ner package¶

Submodules¶

sner.scripts.ner.contextualfromnames module¶

sner.scripts.ner.contextualfromnames.main(corpus, rules, names, max_rules, iteration, options, display)¶

Generates contextual rules from a set of identified name tokens. It needs the corpus as well as the name set in order to assess the performance of any rules it derives from the names.

Parameters:	corpus (set) – Set of all Token objects in the corpus. rules (set) – Set of all Rule objects that have been found so far. names (set) – Set of Token objects to derive new context rules from. max_rules (int) – Maximum number of rules to be accepted each iteration. iteration (int) – The iteration the algorithm is currently on. options (Options) – Collection of configuration options.
Returns:	Set of Rule objects.
Return type:	new_rules (set)
Raises:	`None`
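The idea of deriving context rules from known names and scoring them against the whole corpus can be sketched as follows. This is only an illustration under stated assumptions: the `Token` tuple, its `left` field, the `contextual_candidates` helper, and the scoring ratio are all hypothetical and are not taken from the package.

```python
from collections import Counter, namedtuple

# Hypothetical Token: `word` is the token text, `left` is the word before it.
# The package's real Token class is not shown in this reference.
Token = namedtuple("Token", ["word", "left"])

def contextual_candidates(corpus, names, max_rules):
    """Rank left-context words by how often the token they precede is a name."""
    seen = Counter(t.left for t in corpus)   # every use of each context
    hits = Counter(t.left for t in names)    # uses that precede a known name
    scored = [(hits[ctx] / seen[ctx], ctx) for ctx in hits if seen[ctx]]
    # Highest score first; equal scores are ordered alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [ctx for _, ctx in scored[:max_rules]]

corpus = {
    Token("anna", "mr"), Token("bob", "mr"), Token("house", "the"),
    Token("carl", "the"), Token("tree", "the"),
}
names = {t for t in corpus if t.word in {"anna", "bob", "carl"}}
best = contextual_candidates(corpus, names, 1)
```

Here the context "mr" precedes a name in every occurrence, while "the" does so in only one of three, so "mr" is the single candidate kept.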

sner.scripts.ner.namesfromrule module¶

Names from rule.

sner.scripts.ner.namesfromrule.main(corpus, rule)¶

Takes the overall corpus as input, as well as a single Rule object. Will return a Set of tokens satisfying the passed-in Rule, referred to in the code as ‘names’.

Parameters:	corpus (set) – Set of all Token objects in the corpus. rule (Rule) – Rule object used to identify results.
Returns:	Set of Token objects that match the passed-in Rule.
Return type:	names (set)
Raises:	`TypeError`
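A minimal sketch of the filtering step described above, assuming a hypothetical spelling-style rule that matches a substring of the token text. The `Token` and `Rule` tuples and the `names_from_rule` helper are stand-ins for illustration; the package's own classes and matching logic may differ.

```python
from collections import namedtuple

# Hypothetical stand-ins; the real Token and Rule classes are not shown here.
Token = namedtuple("Token", ["word"])
Rule = namedtuple("Rule", ["substring"])  # assumed: matches a substring of the word

def names_from_rule(corpus, rule):
    """Return the set of tokens in corpus that satisfy rule."""
    if not isinstance(rule, Rule):
        # Mirrors the documented TypeError; the real trigger is an assumption.
        raise TypeError("rule must be a Rule")
    return {token for token in corpus if rule.substring in token.word}

corpus = {Token("anna"), Token("hanna"), Token("bob")}
names = names_from_rule(corpus, Rule("ann"))
```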

sner.scripts.ner.rulefilter module¶

sner.scripts.ner.rulefilter.main(ruleset, maxrules)¶

Determines which rules will be accepted by an iteration of the algorithm.

Selects the top n rules, where n = maxrules. Rules are sorted by strength, with ties broken alphabetically.

Parameters:	ruleset – Set of all known rules. maxrules – Integer value of the maximum number of rules we can accept.
Returns:	The set of rules that are to be accepted by the next iteration.
Raises:	`None`
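The selection described above (strongest rules first, ties broken alphabetically) could look roughly like this. The `Rule` tuple with `text` and `strength` fields and the `top_rules` helper are assumptions made for the example, not the package's actual definitions.

```python
from collections import namedtuple

# Hypothetical stand-in for the package's Rule class; the real class may
# expose different attribute names.
Rule = namedtuple("Rule", ["text", "strength"])

def top_rules(ruleset, maxrules):
    """Return the maxrules strongest rules; ties are broken alphabetically."""
    ordered = sorted(ruleset, key=lambda r: (-r.strength, r.text))
    return set(ordered[:maxrules])

rules = {Rule("ab", 0.9), Rule("aa", 0.9), Rule("zz", 0.5), Rule("mm", 0.95)}
accepted = top_rules(rules, 2)
```

Sorting on the tuple `(-strength, text)` gives descending strength and ascending text in a single pass, so "aa" is preferred over "ab" when both have strength 0.9.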

sner.scripts.ner.rulesperformance module¶

sner.scripts.ner.rulesperformance.main(corpus, rules, options, iteration, display)¶

Finds all tokens that match a given rule and uses them to rate the rule's performance. Performance is rated by totalling the tokens that match the rule and comparing that total to how many of those tokens are considered to be a PN.

Parameters:	corpus – Set of all lines from the corpus. rules – RuleSet object of all currently used rules. options – Values pulled from the configuration file.

Returns:	None
Raises:	`None`
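As a rough illustration of the rating described above, the sketch below scores a rule as the share of its matched tokens that are marked as PNs. The `Token` tuple, its `is_pn` flag, and the plain ratio are assumptions; the documented `rateRulePerformance` signature also takes `alpha`, `k`, and `accept_threshold`, whose roles are not described here and are not modelled.

```python
from collections import namedtuple

# Hypothetical Token with a flag saying whether it is considered a PN.
Token = namedtuple("Token", ["word", "is_pn"])

def rate_rule(matched_tokens):
    """Fraction of the matched tokens that are marked as PNs (0.0 if none matched)."""
    total = len(matched_tokens)
    if total == 0:
        return 0.0
    pn_count = sum(1 for t in matched_tokens if t.is_pn)
    return pn_count / total

matched = {
    Token("anna", True), Token("bob", True),
    Token("house", False), Token("tree", False),
}
score = rate_rule(matched)
```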

sner.scripts.ner.rulesperformance.rateRulePerformance(results, rule, alpha, k, accept_threshold)¶

sner.scripts.ner.spellingfromnames module¶

sner.scripts.ner.spellingfromnames.getKgrams(names, k)¶

Produces lists of grams, from monograms up to k-grams. Duplicate found in /scripts/readnames.py. Example use to get monograms, bigrams, and trigrams: getKgrams(getPNs(), 3)

Parameters:	names (dict) – Occurrences }. k – Special values: k = -1 returns a dictionary of monograms, and k = -2 returns a dictionary of monograms followed by a dictionary of bigrams.

Returns:	A dictionary of all grams up to order k
Raises:	`None`
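Collecting grams from monograms up to k-grams can be sketched as below. The sketch assumes that `names` maps each name string to its number of occurrences and that gram counts are weighted by those occurrences; both are guesses from the parameter description. The `kgrams_up_to` helper is hypothetical and does not model the special negative values of k.

```python
from collections import Counter

def kgrams_up_to(names, k):
    """Map each order n in 1..k to a Counter of the n-grams found in names."""
    grams = {n: Counter() for n in range(1, k + 1)}
    for name, occurrences in names.items():
        for n in range(1, k + 1):
            # Slide a window of width n across the name.
            for i in range(len(name) - n + 1):
                grams[n][name[i:i + n]] += occurrences
    return grams

counts = kgrams_up_to({"anna": 2, "an": 1}, 2)
```

With this input, `counts[1]` holds the monogram totals and `counts[2]` the bigram totals; for example, the bigram "an" appears once in each name and is counted three times because "anna" occurs twice.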

sner.scripts.ner.spellingfromnames.gramsToRules(kgrams, allrules, iteration)¶

sner.scripts.ner.spellingfromnames.main(corpus, allrules, names, maxrules, iteration, options, display)¶

Produces lists of grams, from monograms up to k-grams. Duplicate found in /scripts/readnames.py. Example use to get monograms, bigrams, and trigrams: getKgrams(getPNs(), 3)

Parameters:	corpus (set). allrules (set). names (set) – Set of tokens that will be used to generate new spelling rules. maxrules (int). iteration (int). options (Options).

Returns:	Set of spelling rules generated from names
Raises:	`None`

sner.scripts.ner.updatetokenstrength module¶

sner.scripts.ner.updatetokenstrength.main(tokens, rules)¶

Updates the strength of every token it is given, using the rules it is given.

Parameters:	tokens (set) – Set object of Token objects. rules (set) – Set object of Rule objects.
Returns:	None
Raises:	`ValueError`


Module contents¶
