Make Word List with Page Numbers (an Index) from PDF

$30-5000 USD

Closed
Posted almost 14 years ago

Paid on delivery
I want a program that takes a PDF file, goes through it, makes a list of words, and identifies which pages each word is on. This is like an index, but for every word.

Next, it should have a list of words to skip, which I will provide. For example, I would not want the words "I" or "and" indexed, so I would put those in the skip list.

It should also combine consecutive page sequences. For example, if a word appears on pages 12, 13, 14, and 15, it would not list those individually but would list 12-15.

It should pay attention to capitalization, so "James" and "james" would be treated as two different words.

The output file must be in alphabetical order. I will provide an example.

Note that the page numbers should be the printed page numbers, which may differ from the PDF page numbers. For example, a PDF numbers its first page as page 1, but the document may have an unnumbered title page and a blank page and then start numbering at 1 on PDF page 3. I believe the best way to handle this is to ask the user to enter an offset number. The program should then try to determine the page number by looking for an isolated, sequential number at the top or bottom of the page (or in the header). So if you find a 6 on one page and a 7 at the top of the next, you know that is the page number. If you cannot find an isolated number at the top or bottom of the page, use the offset and the PDF page number. For example, if the user enters -3 as the offset and the word JAMES appears on the seventh page of the PDF, that page is actually numbered 4 (7 - 3 = 4), so the final listing would be "JAMES - 4", because the author of the document did not number the first 3 pages.

Handling pages numbered in Roman numerals (i, iv, vii, ...) would be good but is not required; standard numbers are good enough.

The program is for Windows XP/Vista/7. The final deliverable must be a fully working program that includes an installer.
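The page-numbering rule described above can be sketched as follows. This is a minimal sketch, not the client's specification: it assumes each page's text has already been extracted in PDF order (for example with a PDF library such as pypdf, not shown), and the names `find_candidate` and `page_labels` are hypothetical. A number is trusted as the printed page number only when it sits alone on the first or last non-empty line and a neighbouring page carries the adjacent number (the "6 then 7" check); every other page falls back to the 1-based PDF page number plus the offset.

```python
import re

# An "isolated" page number: a line consisting only of ASCII digits.
NUMBER_LINE = re.compile(r"[0-9]+")


def find_candidate(text):
    """Return an isolated number on the first or last non-empty line, else None."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for line in lines[:1] + lines[-1:]:  # top line, then bottom line
        if NUMBER_LINE.fullmatch(line):
            return int(line)
    return None


def page_labels(pages, offset):
    """Map each page's text (in PDF order) to a printed page number.

    A detected number is used only if the previous page's candidate is one
    less or the next page's candidate is one more. Otherwise the label is
    the 1-based PDF page number plus the user-supplied offset.
    """
    cands = [find_candidate(text) for text in pages]
    labels = []
    for i, c in enumerate(cands):
        prev_c = cands[i - 1] if i > 0 else None
        next_c = cands[i + 1] if i + 1 < len(cands) else None
        sequential = c is not None and (prev_c == c - 1 or next_c == c + 1)
        labels.append(c if sequential else i + 1 + offset)
    return labels
```

For example, `page_labels(["Title", "", "Preface", "Chapter one\n1", "Text\n2"], -3)` returns `[-2, -1, 0, 1, 2]`: the first three pages have no usable number and get their PDF page number plus the offset, while pages 4 and 5 confirm each other. Footers such as "- 5 -" and Roman numerals are not recognised by this sketch.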
I will provide a license agreement for the installer to display. The final deliverables must also include the source code, the installer, and a second version of the installer for a shareware version that processes only 10 pages (the first 10 pages of a PDF). The program will be named Elite Concordance and Index Creator. Thanks.

## Deliverables

EXAMPLE

The output can be a text file that might look like this:

alpha - 1, 17, 204
James - 12-18, 112
james - 12-13, 119
Yesterday - 26, 110

In the above example, the word "alpha" appears on pages 1, 17, and 204. The word "James" with a capital J appears on each of pages 12, 13, 14, 15, 16, 17, and 18, and then again on page 112.
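A minimal sketch of the indexing and output steps, assuming each page is given as a `(page_label, text)` pair whose printed page number has already been resolved. The tokenizer pattern, the case-sensitive matching of the skip list, and the names `make_index` and `collapse_ranges` are assumptions for illustration, not part of the posting. The sort key orders words case-insensitively and breaks ties by plain string order, which reproduces the order of the example (alpha, James, james, Yesterday); `max_pages=10` models the shareware limit.

```python
import re

# Hypothetical tokenizer: runs of ASCII letters, optionally joined by one
# internal apostrophe ("don't"). Digits and punctuation are never words.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def collapse_ranges(numbers):
    """Turn page numbers into '12-18, 112' style text."""
    nums = sorted(set(numbers))
    parts = []
    i = 0
    while i < len(nums):
        j = i
        # Extend the run while the next page number is consecutive.
        while j + 1 < len(nums) and nums[j + 1] == nums[j] + 1:
            j += 1
        parts.append(str(nums[i]) if i == j else f"{nums[i]}-{nums[j]}")
        i = j + 1
    return ", ".join(parts)


def make_index(pages, skip_words, max_pages=None):
    """Build the word index text from (page_label, text) pairs in PDF order.

    Words and skip_words are compared case-sensitively, so 'James' and
    'james' are separate entries. max_pages=10 keeps only the first 10
    PDF pages (shareware edition).
    """
    if max_pages is not None:
        pages = pages[:max_pages]
    index = {}
    for label, text in pages:
        for word in WORD_RE.findall(text):
            if word not in skip_words:
                index.setdefault(word, set()).add(label)
    # Case-insensitive alphabetical order; 'James' sorts just before 'james'.
    ordered = sorted(index, key=lambda w: (w.lower(), w))
    return "\n".join(f"{w} - {collapse_ranges(index[w])}" for w in ordered)
```

For example, `make_index([(12, "James and james"), (13, "James said I am james"), (14, "James")], {"I", "and", "am", "said"})` returns `"James - 12-14\njames - 12-13"`.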
Project ID: 3618899

About the project

2 proposals
Remote project
Active 14 years ago

Awarded to:
See private message.
$42.50 USD in 14 days
4.9 (89 reviews)
5.9
2 freelancers are bidding an average of $106 USD for this job
See private message.
$170 USD in 14 days
4.5 (41 reviews)
4.6

About the client

Camarillo, United States
5.0
167
Payment method verified
Member since Oct 29, 2008

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)