We need a program that can do the following:
1) Take as input an English-language HTML or plaintext email.
2) Detect the email signature in the email
3) Parse the email signature for all available data, including:
- Name
- Position
- Company
- Email address
- Phone Numbers (cell, office, fax)
- Personal/Company website address
- Skype, AIM, Google Chat, Twitter or other Social contact methods
4) Output a vCard with all available fields filled in and with the entire email signature as the Note section of the vCard.
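
As a rough illustration of step 4, the sketch below builds a vCard 3.0 string from already-parsed fields. The function names (`escape_vcard`, `build_vcard`), the shape of the `fields` dictionary and the phone-type mapping are assumptions made for this example, not part of the requirements. Line folding for long values is left out.

```python
def escape_vcard(value: str) -> str:
    """Escape backslash, newline, comma and semicolon in a vCard text value."""
    return (value.replace("\\", "\\\\")
                 .replace("\n", "\\n")
                 .replace(",", "\\,")
                 .replace(";", "\\;"))


# Hypothetical mapping from the phone kinds in this brief to vCard TYPE values.
PHONE_TYPES = {"cell": "CELL", "office": "WORK", "fax": "FAX"}


def build_vcard(fields: dict, signature: str) -> str:
    """Build a vCard 3.0 string; `fields` is an assumed dict of parsed values."""
    name = fields.get("name", "")
    # Naive split: everything before the last space is treated as the given name.
    given, _, family = name.rpartition(" ")
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{escape_vcard(family)};{escape_vcard(given)};;;",
        f"FN:{escape_vcard(name)}",
    ]
    if fields.get("position"):
        lines.append(f"TITLE:{escape_vcard(fields['position'])}")
    if fields.get("company"):
        lines.append(f"ORG:{escape_vcard(fields['company'])}")
    if fields.get("email"):
        lines.append(f"EMAIL;TYPE=INTERNET:{fields['email']}")
    for kind, number in fields.get("phones", {}).items():
        lines.append(f"TEL;TYPE={PHONE_TYPES.get(kind, kind.upper())}:{number}")
    if fields.get("website"):
        lines.append(f"URL:{fields['website']}")
    # The whole signature goes into NOTE, as the brief asks.
    lines.append(f"NOTE:{escape_vcard(signature)}")
    lines.append("END:VCARD")
    return "\r\n".join(lines) + "\r\n"
```

Fields that were not found are simply omitted, so the card only contains what the parser actually recovered.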
## Deliverables
The only input into the program will be a plaintext and/or HTML email, with complete header and body information.

The email will have been forwarded from another source, so it will contain a New section at the top and a Forwarded section below. [login to view URL] states that email programs add some sort of separator, which differs from program to program.

Examples:
1. Google Mail: "---------- Forwarded message ----------"
2. Outlook:
   1. "_____" in plaintext
   2. "<hr size=3D2 width=3D"100%" align=3Dcenter tabindex=3D-1>" in HTML format (the content of the <HR> tag may differ, but an <HR> tag generally separates the New part from the Forwarded part)
3. Yahoo:
   1. "--- On Sat, 6/12/10, John Nyaradi <john@[login to view URL]> wrote:" in plaintext
   2. "--- On <b>Sat, 6/12/10, John Nyaradi <i><john@[login to view URL]></i></b> wrote:" in HTML

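
The separators listed above could be matched with a small set of regular expressions, roughly as sketched here. The patterns, the `FORWARD_SEPARATORS` list and the `split_forwarded` function are illustrative assumptions; they cover only the Gmail, Outlook and Yahoo examples given in this brief and would need extending for other mail programs.

```python
import re

# One hypothetical pattern per separator style mentioned in the brief.
FORWARD_SEPARATORS = [
    # Gmail: "---------- Forwarded message ----------"
    re.compile(r"^-{5,}[ \t]*Forwarded message[ \t]*-{5,}[ \t]*$", re.I | re.M),
    # Outlook plaintext: a line made of underscores
    re.compile(r"^[ \t]*_{5,}[ \t]*$", re.M),
    # Outlook HTML: any <hr ...> tag
    re.compile(r"<hr\b[^>]*>", re.I),
    # Yahoo: "--- On <date>, <sender> wrote:"
    re.compile(r"^--- On .+ wrote:[ \t]*$", re.I | re.M),
]


def split_forwarded(body: str):
    """Return (new_part, forwarded_part), or None if no separator is found.

    When several patterns match, the match closest to the top of the
    email is used as the split point.
    """
    best = None
    for pattern in FORWARD_SEPARATORS:
        match = pattern.search(body)
        if match and (best is None or match.start() < best.start()):
            best = match
    if best is None:
        return None
    return body[:best.start()], body[best.end():]
```

A caller would typically decode the message body first (for example, quoted-printable content such as `=3D`) and only then run the split.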
The program will attempt to identify a single signature block in the Forwarded section of the email, below the Forward separator. In the case of multiple signature blocks, the signature block closest to the beginning of the email will be the one used.

A signature block traditionally consists of some or all of the following information:
* Full Name
* Position
* Company
* Address
* Phone numbers
  * Office/Work
  * Direct
  * Cell/Mobile
  * Fax
* Email Address
* Personal/Company website address
* Other contact information:
  * Facebook Profile Link
  * Twitter URL or Name
  * Google Chat Name
  * Skype Name
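
Several of the elements listed above have recognisable shapes, so a first pass at pulling them out of a signature block might look like the sketch below. The regular expressions, the `LABELS` mapping and the `extract_fields` function are assumptions for illustration; the phone pattern in particular is naive and would also match things like ZIP+4 codes.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s<>\"|]+", re.I)
# Optional label (e.g. "Cell:") followed by a run of digits and punctuation.
PHONE_RE = re.compile(
    r"(?:(?P<label>office|work|direct|cell|mobile|fax)\b[\s:.]*)?"
    r"(?P<number>\+?\(?\d[\d\s().-]{5,}\d)",
    re.I,
)
# Hypothetical normalisation of labels to the phone kinds in this brief.
LABELS = {"office": "office", "work": "office", "direct": "direct",
          "cell": "cell", "mobile": "cell", "fax": "fax"}


def extract_fields(signature: str) -> dict:
    """Collect emails, URLs and labelled phone numbers from a signature block."""
    fields = {"emails": [], "urls": [], "phones": []}
    for line in signature.splitlines():
        # Items sharing a line are often separated by "|".
        for segment in re.split(r"\s*\|\s*", line):
            fields["emails"] += EMAIL_RE.findall(segment)
            fields["urls"] += URL_RE.findall(segment)
            # Blank out emails and URLs so their digits are not read as phones.
            rest = URL_RE.sub(" ", EMAIL_RE.sub(" ", segment))
            for match in PHONE_RE.finditer(rest):
                label = LABELS.get((match.group("label") or "").lower(), "unknown")
                fields["phones"].append((label, match.group("number")))
    return fields
```

Name, position and company have no fixed pattern, so they would need separate heuristics (for instance, matching the name against the From line of the forwarded message).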

Some other methods to consider in order to identify the signature block:
* Signature blocks are usually separated from the rest of the document by one or more line breaks and possibly some sort of horizontal spacer (-, _, =, *, etc).
* Signature blocks may also immediately follow a complimentary close: "Thanks,", "Best Regards", "Sincerely", "Best", etc.
* Signature blocks usually have more than one of the previously mentioned elements in close proximity, with a few (Name, Phone, Email) almost always standard.
* The name and/or email in the signature block may match the From/To/Subject/Sent information at the beginning of the forwarded part of the email. Once again, the format in which this is written changes according to the email program, but it generally contains 4 elements: From, Sent, To and Subject. For example:
  * From: Lena Dander [mailto:lena@[login to view URL]]
  * Sent: Saturday, June 12, 2010 11:59
  * To: Mike Roesh
  * Subject: my address
* Signature blocks may have uncommon separators in the block; for example, if two phone numbers are on one line, they may be separated by "|" or "-".
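
The heuristics above could be combined along these lines: look for a complimentary close or a spacer line, collect the non-blank lines that follow, and accept the block only if it has several lines and at least one contact-like element. Everything here (`CLOSING_RE`, `SPACER_RE`, `CONTACT_RE`, `find_signature`, the 10-line limit) is an assumed starting point, not a tested detector.

```python
import re

# Complimentary closes named in the brief, plus a few common variants.
CLOSING_RE = re.compile(
    r"^\s*(thanks|thank you|best regards|regards|sincerely|best|cheers)"
    r"\s*[,.!]?\s*$",
    re.I,
)
# A line consisting only of spacer characters (-, _, =, *).
SPACER_RE = re.compile(r"^\s*[-_=*]{2,}\s*$")
# Rough hint that a line holds an email, phone number or web address.
CONTACT_RE = re.compile(r"@|\d{3}[\s.-]\d{4}|www\.|https?://", re.I)


def find_signature(forwarded: str, max_lines: int = 10):
    """Return the first plausible signature block as a list of lines, or None."""
    lines = forwarded.splitlines()
    for i, line in enumerate(lines):
        if not (CLOSING_RE.match(line) or SPACER_RE.match(line)):
            continue
        block = []
        for candidate in lines[i + 1 : i + 1 + max_lines]:
            if not candidate.strip():
                if block:
                    break  # a blank line ends the block
                continue  # skip blank lines before the block starts
            block.append(candidate.strip())
        # Require several elements in close proximity, one of them a contact.
        if len(block) >= 2 and any(CONTACT_RE.search(item) for item in block):
            return block
    return None
```

Because the scan runs top to bottom and returns on the first hit, it picks the signature block closest to the beginning of the text it is given.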

Once the signature block is identified, the information should be parsed and output in XML format with all of the available data that has been identified. We can discuss the actual format of the XML elements later, but they will be mostly based on the elements of the signature block discussed earlier.
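
Since the XML format is still to be agreed, the following is only a placeholder sketch of what the output step might look like using Python's standard `xml.etree.ElementTree`. The element names (`signature`, `name`, `phone`, and so on), the `type` attribute and the `signature_to_xml` function are all hypothetical.

```python
import xml.etree.ElementTree as ET


def signature_to_xml(fields: dict) -> str:
    """Serialise parsed signature fields to an XML string (assumed schema)."""
    root = ET.Element("signature")
    for key in ("name", "position", "company", "email", "website"):
        if fields.get(key):
            ET.SubElement(root, key).text = fields[key]
    # One <phone> element per number, with the kind as an attribute.
    for kind, number in fields.get("phones", {}).items():
        ET.SubElement(root, "phone", type=kind).text = number
    return ET.tostring(root, encoding="unicode")
```

Only fields that were actually identified produce elements, which matches the requirement to output all of the available data and nothing more.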

If no signature block is found, the program will attempt to identify the Full Name and Email of the sender in the forwarded email via the From/To/Sent/Subject lines in the body of the email. The program should output an XML document with those details.
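
For this fallback, a single pattern can cover the two From-line shapes shown in this brief ("Name [mailto:address]" and "Name <address>"). The `FROM_RE` pattern and `sender_from_header` function are assumptions for illustration; HTML-escaped angle brackets and other header layouts are not handled.

```python
import re

# Matches "From: Name [mailto:addr]" and "From: Name <addr>" at a line start.
FROM_RE = re.compile(
    r"^\s*From:\s*(?P<name>[^<\[\n]*?)\s*(?:\[mailto:|<)(?P<email>[^\]>\s]+)[\]>]",
    re.I | re.M,
)


def sender_from_header(forwarded: str):
    """Return (full_name, email) from the first From line, or None."""
    match = FROM_RE.search(forwarded)
    if match is None:
        return None
    # Strip surrounding quotes that some mail programs put around the name.
    return match.group("name").strip("\"' "), match.group("email")
```

A `None` result here, combined with no signature block, would correspond to the case where the program reports that neither task could be accomplished.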

If neither task can be accomplished, the program should return an event stating so.

Given that intelligent parsing is never 100% accurate, we will work with the chosen developer to set success targets for correctly identifying, parsing and outputting the signature block information from a large sample of emails (several hundred or thousand emails).