ePADD fact sheet - C+J 2016.pdf


epadd.stanford.edu
Project website: library.stanford.edu/projects/ePADD
Download: github.com/ePADD/epadd/releases
Community forums: epadd.nimeyo.com
Twitter: e_padd

ePADD is software that enables the analysis of email using named entity recognition and other natural language processing algorithms. It was created to aid in the appraisal, processing, discovery, and delivery of historical email, and can also be used by journalists and others seeking to analyze and interrogate an email corpus.

Fine-grained Named Entity Recognition: ePADD uses a custom named entity recognizer that recognizes categories of entities bootstrapped from DBpedia. These include persons, organizations, locations, government entities, political parties, companies, universities, diseases, and awards. ePADD learns from these categories and can also recognize likely entities it has not come across before.

Bulk Actions and Annotation/Tagging: ePADD allows the user to apply actions (mark for future analysis or restriction) and annotations to any set of messages meeting certain criteria, including all messages associated with a given correspondent, all messages from a given date range, all messages containing certain keywords or named entities in the subject or text fields, or some combination of the above.
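The idea of combining criteria and then tagging every matching message can be sketched in a few lines of Python. This is only an illustration of the concept, not ePADD's code or data model: the `Message` class, the `bulk_tag` function, and all field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Message:
    """Hypothetical, simplified message record (not ePADD's data model)."""
    sender: str
    sent: date
    subject: str
    body: str
    tags: set = field(default_factory=set)


def bulk_tag(messages, tag, sender=None, start=None, end=None, keyword=None):
    """Add `tag` to every message that meets ALL of the supplied criteria.

    Criteria left as None are ignored. Returns the number of messages tagged.
    """
    matched = 0
    for m in messages:
        if sender is not None and m.sender.lower() != sender.lower():
            continue
        if start is not None and m.sent < start:
            continue
        if end is not None and m.sent > end:
            continue
        # Keyword is matched case-insensitively in the subject or the body.
        if keyword is not None and keyword.lower() not in (m.subject + "\n" + m.body).lower():
            continue
        m.tags.add(tag)
        matched += 1
    return matched
```

For example, `bulk_tag(messages, "restricted", sender="a@example.org", start=date(2015, 1, 1), keyword="budget")` would tag only messages from that sender, sent on or after that date, that mention "budget".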

Name Resolution: ePADD resolves names and email addresses associated with one correspondent, improving browsing and visualization by cutting down on noise. All decisions can be manually overridden using the Edit Correspondents interface. Mailing lists can similarly be flagged and optionally consolidated using this functionality.
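One simple way to picture name resolution is to treat every display name and every email address as a node, and to merge nodes that appear together in a message header. The sketch below does this with a small union-find structure. It is an assumption made for illustration, not a description of how ePADD resolves correspondents, and the function name `resolve_correspondents` is hypothetical.

```python
from collections import defaultdict


def resolve_correspondents(pairs):
    """Group names and addresses that likely belong to one correspondent.

    `pairs` is an iterable of (display_name, email_address) tuples, as they
    might appear in message headers. Names and addresses that co-occur are
    merged transitively into the same group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for name, address in pairs:
        # Normalize case and whitespace so trivial variants collapse.
        name_key = ("name", " ".join(name.lower().split()))
        addr_key = ("addr", address.lower())
        union(name_key, addr_key)

    groups = defaultdict(set)
    for key in list(parent):
        groups[find(key)].add(key)
    return list(groups.values())
```

A purely automatic merge like this can over-merge (for instance, two people sharing one address), which is why a manual override such as the Edit Correspondents interface matters.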

Regular Expression Search: ePADD includes a customizable regular expression search, enabling the user to quickly search a collection for sensitive information such as social security numbers or credit card numbers, or any other expressions.
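As a rough illustration of this kind of search, the Python sketch below scans text for strings shaped like U.S. social security numbers and 16-digit card numbers. The patterns and the function names are assumptions made for this example, not the expressions that ship with ePADD. The Luhn checksum is added here only to reduce false positives on card-like digit runs.

```python
import re

# Illustrative patterns only; these are not ePADD's built-in expressions.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> dict:
    """Return SSN-like strings and Luhn-valid 16-digit card-like strings."""
    ssns = SSN_PATTERN.findall(text)
    cards = [m.group(0) for m in CARD_PATTERN.finditer(text) if luhn_valid(m.group(0))]
    return {"ssn": ssns, "card": cards}
```

A pattern match only flags a candidate; a reviewer would still need to confirm whether the string is actually sensitive.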

Lexicons: ePADD includes tiered thematic keyword searches out of the box, geared towards broad analysis of a variety of email collections, including email associated with political or literary figures. These lexicons can be edited and tuned, or the user can create all-new lexicons to suit their research goals.
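Conceptually, a lexicon maps thematic categories to lists of keywords, and a search reports which messages hit which category. The Python sketch below shows that idea under simple assumptions: the `LEXICON` contents, the category names, and the function names are all hypothetical, and the structure does not reflect ePADD's lexicon file format or its tiering.

```python
import re

# Hypothetical lexicon: category name -> keywords or phrases.
LEXICON = {
    "health": ["hospital", "diagnosis", "surgery"],
    "finance": ["invoice", "salary", "tax return"],
}


def compile_lexicon(lexicon):
    """Build one case-insensitive, whole-word pattern per category."""
    return {
        category: re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        for category, terms in lexicon.items()
    }


def categorize(messages, lexicon):
    """Return, for each category, the indexes of messages that match it."""
    patterns = compile_lexicon(lexicon)
    hits = {category: [] for category in lexicon}
    for index, body in enumerate(messages):
        for category, pattern in patterns.items():
            if pattern.search(body):
                hits[category].append(index)
    return hits
```

Editing or tuning a lexicon in this picture simply means changing the keyword lists and rerunning the search.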

FAQ

What formats does ePADD ingest? Email is ingested in MBOX format or through an IMAP connection.

How much does it cost? Nothing! ePADD is completely free.

Can I modify the program for my own needs? Yes! ePADD is open source and licensed under the Apache License, Version 2.0.

Additional functionality: ePADD also offers account- and folder-level browsing, as well as built-in visualization tools, including a scrollable wall of image attachments.

About the Project

ePADD development is managed by Stanford University’s Department of Special Collections & University Archives, part of Stanford University Libraries, in collaboration with partners at Harvard University, the Metropolitan New York Library Council (METRO), University of Illinois at Urbana-Champaign, and University of California, Irvine.

Where can I get it? github.com/ePADD/epadd/releases

What are the software requirements to run ePADD?
OS: 64-bit; Windows 7 SP1 / 10, Mac OS X 10.10 / 10.11
Memory: 8 GB RAM (4 GB allocated by default)
Browser: Chrome 50/51, Firefox 47/48
Windows installations: Java Runtime Environment 64-bit, 8u101

Any other questions about the software? Contact the ePADD development team at [email protected].

Funding for current ePADD development is provided through an Institute of Museum and Library Services (IMLS) National Leadership Grant (NLG) for Libraries, which supports projects that address challenges faced by the library and archive fields and that have the potential to advance practice in those fields. Development for the initial 2015 release of ePADD was primarily funded by the National Historical Publications and Records Commission (NHPRC).