Use company suffix abbreviations to label company entities #19

@jokull

I’m writing some code to detect mentions of companies. The corpus uses the ehf/ohf/sf/etc. suffixes, so that’s a strong indicator for me, and potentially for Greynir too.

I know that the Greynir website has an entity recognizer, but it seems tightly coupled to the database. Is there a case for bintokenizer to adopt a new token type? Or perhaps for Greynir to become company-entity aware?

I have some interesting examples of company names if that’s useful. I’m currently using an imperfect regex to match company names and then using Greynir to convert them back to the indefinite form.

  • Miðbæjarhótel/Centerhotels ehf.
  • Reitir - hótel ehf.
  • 105 Miðborg slhf.
  • Faxaflóahafnir sf.
  • Bjarg íbúðafélag hses.
  • Efstaleitis Apótek ehf.
  • Íþrótta- og sýningahöllin hf.
  • V-16 ehf.

These are the suffixes I’ve come across:

  • ehf.
  • slhf.
  • sf.
  • hses.
  • hf.
  • ohf.
  • bs.
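For illustration, here is a minimal sketch of what suffix-based matching along these lines could look like. It is not the regex from my code and not part of Greynir; the names `COMPANY_RE` and `find_companies` are hypothetical, and the rules (a name starts with a capital letter or digit, runs for at most six tokens, and ends in one of the suffixes above followed by a period) are assumptions made for the example.

```python
import re

# Suffixes from the list above (without the trailing period).
SUFFIXES = ["ehf", "slhf", "sf", "hses", "hf", "ohf", "bs"]
SUFFIX_ALT = "|".join(SUFFIXES)

# Icelandic capital letters in addition to A-Z.
UPPER = "A-ZÁÐÉÍÓÚÝÞÆÖ"

# Assumption: a name starts with a capital letter or a digit, is followed by
# at most five more tokens (letters, digits, "/" or "-"), and ends in a
# known suffix plus a period.
COMPANY_RE = re.compile(
    rf"""
    (?<![\w/-])                   # do not start in the middle of a token
    (?P<name>
        [{UPPER}0-9][\w/-]*       # first token: capital letter or digit
        (?:\s+[\w/-]+){{0,5}}?    # up to five more tokens, as few as possible
    )
    \s+
    (?P<suffix>{SUFFIX_ALT})\.    # the suffix must be followed by a period
    """,
    re.VERBOSE,
)


def find_companies(text):
    """Return (name, suffix) pairs for candidate company mentions in text."""
    return [(m.group("name"), m.group("suffix")) for m in COMPANY_RE.finditer(text)]


if __name__ == "__main__":
    print(find_companies("samningur við Reitir - hótel ehf. og V-16 ehf."))
```

This kind of pattern is imperfect in the obvious way: a capitalized word shortly before the company name, such as the first word of a sentence, gets pulled into the match. That is part of why a proper token type or entity label would be more reliable than a regex.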
