sner.scripts.ner package

Submodules

sner.scripts.ner.contextualfromnames module

sner.scripts.ner.contextualfromnames.main(corpus, rules, names, max_rules, iteration, options, display)

This is meant to generate contextual rules from a set of identified name tokens. It needs both the corpus and the name set in order to assess the performance of any rules it finds from the names.

Parameters:
  • corpus (set) – Set of all Token objects in the corpus.
  • rules (set) – Set of all Rule objects that have been found so far.
  • names (set) – Set of Token objects to derive new context rules from.
  • max_rules (int) – Maximum number of rules to be accepted in each iteration.
  • iteration (int) – The iteration the algorithm is currently on.
  • options (Options) – Collection of configuration options.
Returns:

Set of Rule objects.

Return type:

new_rules (set)

Raises:

None
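The following is a minimal sketch of the idea, not the package's implementation. The Token and Rule classes are hypothetical stand-ins, a contextual rule is assumed to test the word immediately to the left or right of a token, and a rule's strength is assumed to be the share of corpus tokens it matches that are known names. The options and display parameters are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    # Hypothetical stand-in for the package's Token class.
    word: str
    left: str   # word immediately before this token (assumed attribute)
    right: str  # word immediately after this token (assumed attribute)


@dataclass(frozen=True)
class Rule:
    # Hypothetical stand-in for the package's Rule class.
    kind: str   # "left" or "right": which neighboring word the rule tests
    value: str  # the context word the rule looks for
    strength: float = 0.0
    iteration: int = 0


def contextual_rules_sketch(corpus, rules, names, max_rules, iteration):
    """Propose left/right context rules from name tokens and keep the strongest."""
    known = {(r.kind, r.value) for r in rules}
    # Every context seen next to an identified name is a candidate rule.
    candidates = {("left", t.left) for t in names} | {("right", t.right) for t in names}
    candidates -= known
    scored = []
    for kind, value in candidates:
        matches = [t for t in corpus if getattr(t, kind) == value]
        if not matches:
            continue  # context never occurs in the corpus, so it cannot be rated
        hits = sum(1 for t in matches if t in names)
        # Strength: fraction of matching corpus tokens that are known names.
        scored.append(Rule(kind, value, hits / len(matches), iteration))
    # Strongest first; ties broken alphabetically by (kind, value).
    scored.sort(key=lambda r: (-r.strength, r.kind, r.value))
    return set(scored[:max_rules])
```

Sorting on (strength, kind, value) keeps the max_rules cutoff deterministic when several candidates tie.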

sner.scripts.ner.namesfromrule module

Names from rule.

sner.scripts.ner.namesfromrule.main(corpus, rule)

Takes the overall corpus as input, as well as a single Rule object, and returns a set of the tokens satisfying the passed-in Rule, referred to in the code as ‘names’.

Parameters:
  • corpus (set) – Set of all Token objects in the corpus.
  • rule (Rule) – Rule object used to identify results.
Returns:

Set of Token objects that match the passed-in Rule.

Return type:

names (set)

Raises:

TypeError
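A minimal sketch of this filtering step, assuming (hypothetically) that a Rule wraps a predicate over tokens. The TypeError check mirrors the documented exception, but what actually triggers it in the package is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Token:
    # Hypothetical stand-in for the package's Token class.
    word: str


@dataclass(frozen=True)
class Rule:
    # Hypothetical: a rule is a named predicate over tokens.
    name: str
    test: Callable[[Token], bool]


def names_from_rule_sketch(corpus, rule):
    """Return the set of corpus tokens that satisfy the rule ('names')."""
    if not isinstance(rule, Rule):
        raise TypeError(f"expected a Rule, got {type(rule).__name__}")
    return {token for token in corpus if rule.test(token)}
```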

sner.scripts.ner.rulefilter module

sner.scripts.ner.rulefilter.main(ruleset, maxrules)

Determine which rules will be accepted by an iteration of the algorithm.

Sorts the rules and keeps the top n, where n = maxrules. Rules are sorted by strength, with ties broken alphabetically.

Parameters:
  • ruleset (set) – Set of all known rules.
  • maxrules (int) – Maximum number of rules that can be accepted.
Returns:

The set of rules that are to be accepted by the next iteration.

Raises:

None
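The ordering described above can be sketched as follows. The Rule class, with name and strength attributes, is a hypothetical stand-in, and "alphabetization" is assumed to mean comparing rule names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Hypothetical minimal Rule: a printable name and a numeric strength.
    name: str
    strength: float


def rule_filter_sketch(ruleset, maxrules):
    """Keep the maxrules strongest rules, breaking ties alphabetically."""
    # Highest strength first; equal strengths fall back to name order.
    ranked = sorted(ruleset, key=lambda r: (-r.strength, r.name))
    return set(ranked[:maxrules])
```

Because the result is a set, the ranking is discarded once the top rules are chosen. For a large ruleset, heapq.nsmallest(maxrules, ruleset, key=...) with the same key would avoid sorting every rule.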

sner.scripts.ner.rulesperformance module

sner.scripts.ner.rulesperformance.main(corpus, rules, options, iteration, display)

Finds all tokens that match a given rule and uses them to rate the rule's performance. Performance is rated by totalling the tokens that match the rule and comparing that total to how many of those tokens are considered to be a PN.

Parameters:
  • corpus – Set of all lines from the corpus.
  • rules – RuleSet object of all currently used rules.
  • options – Values pulled from the configuration file.
Returns:

None

Raises:

None

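A minimal sketch of the rating described above. The matches and is_pn callables are hypothetical placeholders for however the package tests a rule against a token and decides whether a token is a PN. Unlike main, which returns None, the sketch returns a dictionary of scores so the result is easy to inspect.

```python
def rules_performance_sketch(corpus, rules, matches, is_pn):
    """Map each rule to the fraction of its matching tokens that are PNs."""
    scores = {}
    for rule in rules:
        # Total up the tokens that match this rule.
        hits = [token for token in corpus if matches(rule, token)]
        # Count how many of those tokens are considered to be a PN.
        pn_hits = sum(1 for token in hits if is_pn(token))
        # A rule that matches nothing is scored 0.0 rather than dividing by zero.
        scores[rule] = pn_hits / len(hits) if hits else 0.0
    return scores
```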
sner.scripts.ner.rulesperformance.rateRulePerformance(results, rule, alpha, k, accept_threshold)

sner.scripts.ner.spellingfromnames module

sner.scripts.ner.spellingfromnames.getKgrams(names, k)

Produces lists of grams, from monograms up to k-grams. A duplicate is found in /scripts/readnames.py. Example use to get monograms, bigrams, and trigrams: getKgrams(getPNs(), 3)

Parameters:
  • names (dict) – Name occurrences.
  • k (int) – Highest order of gram to produce. Special values: k = -1 returns a dictionary of monograms, and k = -2 returns a dictionary of monograms followed by a dictionary of bigrams.
Returns:

A dictionary of all grams up to order k.

Raises:

None

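A minimal sketch of k-gram counting, assuming grams are character substrings of each name and that each gram's count is weighted by how often the name occurs. The function name is hypothetical, and the special values k = -1 and k = -2 are not handled: with a negative k this sketch simply returns an empty dictionary.

```python
from collections import Counter


def get_kgrams_sketch(names, k):
    """Count every character gram of order 1..k, weighted by name occurrences."""
    grams = Counter()
    for name, occurrences in names.items():
        for n in range(1, k + 1):
            # Slide a window of width n across the name.
            for i in range(len(name) - n + 1):
                grams[name[i:i + n]] += occurrences
    return dict(grams)
```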
sner.scripts.ner.spellingfromnames.gramsToRules(kgrams, allrules, iteration)
sner.scripts.ner.spellingfromnames.main(corpus, allrules, names, maxrules, iteration, options, display)

Generates new spelling rules from a set of name tokens.

Parameters:
  • corpus (set)
  • allrules (set)
  • names (set) – Set of tokens that will be used to generate new spelling rules.
  • maxrules (int)
  • iteration (int)
  • options (Options)
Returns:

Set of spelling rules generated from names.

Raises:

None
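A rough sketch of how spelling rules could be derived from names, under assumptions this documentation does not confirm: a spelling rule is represented by a character gram that a word must contain, and its strength is the fraction of corpus words containing the gram that are names. Names, corpus entries, and rules are plain strings here, the function name is hypothetical, and the iteration, options, and display parameters are omitted.

```python
def spelling_rules_sketch(corpus_words, allrules, names, maxrules, k=3):
    """Propose 'word contains this gram' spelling rules from a set of names."""
    # Candidate grams: every character substring of length 1..k in any name.
    candidates = set()
    for name in names:
        for n in range(1, k + 1):
            for i in range(len(name) - n + 1):
                candidates.add(name[i:i + n])
    candidates -= set(allrules)  # skip rules that are already known
    scored = []
    for gram in candidates:
        matches = [word for word in corpus_words if gram in word]
        if not matches:
            continue
        hits = sum(1 for word in matches if word in names)
        scored.append((hits / len(matches), gram))
    # Highest ratio first; ties broken alphabetically by gram.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {gram for _, gram in scored[:maxrules]}
```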

sner.scripts.ner.updatetokenstrength module

sner.scripts.ner.updatetokenstrength.main(tokens, rules)

Updates the strength of every token it is given, using the rules it is given.

Parameters:
  • tokens (set) – Set of Token objects.
  • rules (set) – Set of Rule objects.
Returns:

None

Raises:

ValueError
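A minimal sketch of the update, assuming (hypothetically) that a token's strength becomes the highest strength among the rules it satisfies, and that a ValueError signals a rule strength outside 0..1. The Token and Rule classes are stand-ins, not the package's own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(eq=False)
class Token:
    # Hypothetical mutable Token; eq=False keeps identity hashing so it fits in a set.
    word: str
    strength: float = 0.0


@dataclass(frozen=True)
class Rule:
    # Hypothetical Rule: a name, a strength, and a predicate over tokens.
    name: str
    strength: float
    test: Callable[[Token], bool]


def update_token_strength_sketch(tokens, rules):
    """Set each token's strength to the strongest rule it satisfies (assumed policy)."""
    for rule in rules:
        if not 0.0 <= rule.strength <= 1.0:
            raise ValueError(f"rule {rule.name!r} has strength {rule.strength}, expected 0..1")
    for token in tokens:
        matching = [rule.strength for rule in rules if rule.test(token)]
        # Tokens that satisfy no rule fall back to a strength of 0.0.
        token.strength = max(matching, default=0.0)
```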

Module contents