Parsing Text File (Python) (1. Locate Table Based on Keywords, 2. Extract Table Info)
$15-25 USD / hour
Completed
Posted more than 7 years ago
I would like to obtain a program that extracts specific table data from text files. Most of the text content is HTML; the rest is not.
To achieve that, you need to:
1) Locate the table that I want. The table I want is the "Security Ownership For certain beneficial owners". However, the name of the table can change. The program will need to search for ("ownership" and "security") or ("ownership" and "stock") to locate the table.
The keywords ownership, security/stock/securities, and beneficial/beneficiary sometimes do not appear in the same row.
2) Extract the table data to CSV (preferably using Python. You could do it manually as well, but there will be about 2000 files.)
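A minimal sketch of the keyword test described in step 1, assuming the file has already been split into lines of plain text. The function names and the `window` parameter are illustrative and not part of the posting; the window joins a few consecutive lines so that keywords split across rows can still match.

```python
import re
from typing import List, Optional

# Keyword group taken from the posting.
SECURITY_WORDS = ("security", "securities", "stock")


def matches_title(text: str) -> bool:
    """True if text contains "ownership" plus one of the security words.

    Matching is on whole words, so "stockholders" does not count as "stock".
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "ownership" in words and any(w in words for w in SECURITY_WORDS)


def find_title_line(lines: List[str], window: int = 3) -> Optional[int]:
    """Return the start index of the first run of `window` consecutive
    lines that together contain the keywords, or None if nothing matches.

    The title itself may begin up to window - 1 lines after that index.
    """
    for i in range(len(lines)):
        if matches_title(" ".join(lines[i:i + window])):
            return i
    return None
```

A larger `window` tolerates titles wrapped over more rows, at the cost of more false positives in running text.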
I have attached 5 text files as well as the output file. Please see the attachment. The output for the 5 text files is also pasted below:
1st Example: none
2nd Example: none
3rd Example Input
Name and Address | Number of Shares of Common Stock Beneficially Owned | Shares which may be Acquired within 60 Days(1) | Percent Owned(1),(2)
Genstar Capital LLC(3) | 3,534,074 | 1,335,000 | 31.5 %
Jean-Pierre L. Conte(4) | 3,473,407 | 1,311,000 | 31.1 %
Oxford BioScience Partners IV L.P.(5) | 717,293 | ? | 7.3 %
Bio-Rad Laboratories, Inc.(6) | 665,639 | ? | 6.7 %
Gabelli Asset Management Inc.(7) | 537,521 | ? | 5.4 %
Terrance J. Bieker | 160,498 | 142,498 | 1.6 %
Kevin J. Reagan | 116,832 | 111,331 | 1.2 %
John L. Zabriskie, Ph.D. | 60,500 | 45,500 | *
David J. Moffa, Ph.D.(8) | 56,350 | 48,500 | *
John R. Overturf, Jr. | 43,600 | 36,000 | *
Alan I. Edrick | 40,916 | 35,416 | *
Robert J. Weltman(9) | 27,333 | 24,000 | *
All directors and executive officers as a group (eight persons)(10) | 3,979,436 | 1,754,245 | 34.2 %
4th Example
Name and Address of Beneficial Owner | Number of Shares | Percentage of Class(1)
Larry S. Flax(2) | 2082053 | 8.1 %
Richard L. Rosenfield(3) | 2118017 | 8.3 %
Leslie E. Bider(4) | 8852 | 0 %
Marshall S. Geller(5) | 16152 | 0.1 %
Charles G. Phillips(6) | 133378 | 0.5 %
Alan I. Rothenberg(7) | 46520 | 0.2 %
Thomas P. Beck(8) | 138750 | 0.6 %
Susan M. Collyns(9) | 463203 | 1.9 %
Sarah A. Goldsmith-Grover(10) | 160211 | 0.6 %
Steven E. Rich(11) | 18794 | 0.1 %
BlackRock Inc.(12) | 1723416 | 7 %
40 East 52nd Street | |
New York, NY 10022 | |
Fisher Investments(13) | 1249015 | 5.1 %
13100 Skyline Boulevard | |
Woodside, CA 94062-4527 | |
The TCW Group, Inc.(14) | 2144619 | 8.7 %
865 South Figueroa Street | |
Los Angeles, CA 90017 | |
Thompson, Siegel & Walmsley, LLC(15) | 1685519 | 6.9 %
6806 Paragon Place, Suite 300 | |
Richmond, VA 23230 | |
All directors and executive officers as a group (10 persons)(16) | 5185930 | 18.9 %
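If the extracted cells are later needed as numbers, the values in the sample output above could be normalized along these lines. This is only a sketch: the helper names are hypothetical, and the posted output keeps footnote markers such as "(3)" in the names, so stripping them is an optional assumption rather than a requirement of the job.

```python
import re
from typing import Optional


def strip_footnotes(name: str) -> str:
    """Remove trailing numeric footnote markers such as "(3)" from a name.

    Parenthesized text that is not purely digits, e.g. "(eight persons)",
    is left in place.
    """
    return re.sub(r"(\(\d+\))+\s*$", "", name).strip()


def parse_number(cell: str) -> Optional[float]:
    """Convert "3,534,074" or "31.5 %" to a float.

    Placeholders such as "*", "?" or an empty cell become None.
    """
    cleaned = cell.replace(",", "").replace("%", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None
```

Returning `None` for "*" keeps "less than 1%" rows distinguishable from a real zero.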
Hi,
I have gone through the files. I am good at data entry and Excel, and I can do this manually by copying and pasting the tables. Looking forward to working on this.
Experienced Python expert freelancer here to work on your project.
Let's discuss further and finalize the project and cost. Feel free to ask me any questions. I look forward to working with you.
You can also contact me through Skype. Have a good day and stay fine :-)
Sincere regards,
Jubair
Hello
I'm very interested in your project.
I'm a good Python, scraping, Excel, math, and algorithm expert.
I am quite experienced in these jobs.
Let's go ahead together.
I would like to work for you on an ongoing basis.
Thanks
Hi, I have a team of 8 members, expert in web scraping and Excel work. I understand the requirements of your project and I can assure you of completion with the desired quality of work. I have good skills and experience in web scraping, finding contact information, and searching for phone numbers and e-mail addresses through Google, social media, or a given URL. I can do this project for you quickly and successfully. I'll work for the lowest price because I want to build a reputation on freelancer.com. Please give me a chance to show my quality and help me build a good reputation for my future jobs. I am a new freelancer but I have long experience with Microsoft Office (Word, Excel), data mining, web search, etc.
Hey there. I had a look at your examples and the corresponding output tables. I can do this in Java or C# (not Python!). Let's agree on a fixed price instead of hourly? $120 in 3 days. Deal? Please reply. We can discuss further and hopefully get it started soon. Thank you!
Hi,
I am a Python developer with proven and extensive experience writing Python scripts that parse HTML markup, with a demonstrated quick turnaround.
This is normally done as part of web scraping projects using the Beautiful Soup library (Python).
My APPROACH
I can write Python code that can:
1. Locate relevant table based on given keywords:
-- Case 1 (HTML files - 3 & 4): Use paragraphs (elements with tag <p>) to search for keywords
(so that search is done on text instead of table rows)
Then, locate next sibling table
-- Case 2 (Text files - 5): Use elements with tag <PAGE> to search for keywords
Then, locate element with tag <TABLE> inside.
2. Extract Table Info:
-- Case 1 (HTML file): Regardless of table format or number of columns, there is actually a consistent structure inside each <tr> element (table row) for both column names and data rows (i.e. the same number of <td> elements). The script will exclude the non-breaking space (&nbsp;) character.
-- Case 2 (Text file): Read data rows line by line.
3. Generate Excel sheet with relevant rows as output.
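Case 1 of the approach above could be sketched roughly as follows. This is not the bidder's code: it uses the standard library's `html.parser` as a stand-in for Beautiful Soup, takes the next table in document order rather than a strict sibling, and does not handle nested tables. The class name and the substring-based keyword check are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TableAfterKeyword(HTMLParser):
    """Collect the rows of the first <table> that follows a matching <p>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rows = []           # extracted rows, each a list of cell strings
        self._p_text = None      # text chunks of the <p> being read, else None
        self._armed = False      # a matching paragraph has been seen
        self._in_table = False   # currently inside the target table
        self._done = False       # the target table has been fully read
        self._row = None         # cells of the current <tr>
        self._cell = None        # text chunks of the current <td>/<th>

    @staticmethod
    def _matches(text):
        t = text.lower()
        return "ownership" in t and ("securit" in t or "stock" in t)

    def _finish_paragraph(self):
        # Also called when a <p> is never closed explicitly.
        if self._p_text is not None:
            if self._matches("".join(self._p_text)):
                self._armed = True
            self._p_text = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if not self._in_table and tag in ("p", "table"):
            self._finish_paragraph()
            if tag == "p":
                self._p_text = []
            elif self._armed:
                self._in_table = True
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)
        elif self._p_text is not None:
            self._p_text.append(data)

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag == "p":
            self._finish_paragraph()
        elif tag in ("td", "th") and self._cell is not None:
            # &nbsp; arrives as "\xa0"; turn it into a space, then collapse.
            text = "".join(self._cell).replace("\xa0", " ")
            self._row.append(" ".join(text.split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):   # drop rows whose cells are all empty
                self.rows.append(self._row)
            self._row = None
        elif tag == "table" and self._in_table:
            self._in_table = False
            self._done = True
```

Typical use would be `parser = TableAfterKeyword(); parser.feed(html_text); parser.close()`, after which `parser.rows` holds the header and data rows ready to be written out.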
Deliverable is a Python script that can be run on schedule or on demand.
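For Case 2 (plain-text files), a rough sketch of the same idea is shown below. It assumes the files mark pages with `<PAGE>` and wrap fixed-width tables in `<TABLE>...</TABLE>`, as the approach describes, and that columns are separated by at least two spaces. That last point is an assumption about the layout, and the function name is hypothetical.

```python
import re

PAGE_SPLIT = re.compile(r"<PAGE>", re.IGNORECASE)
TABLE_RE = re.compile(r"<TABLE>(.*?)</TABLE>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def rows_from_text_filing(text):
    """Return the rows of the first <TABLE> on the first page whose text
    mentions "ownership" together with "securit..." or "stock"."""
    for page in PAGE_SPLIT.split(text):
        lowered = page.lower()
        if "ownership" not in lowered:
            continue
        if "securit" not in lowered and "stock" not in lowered:
            continue
        match = TABLE_RE.search(page)
        if match is None:
            continue
        rows = []
        for line in match.group(1).splitlines():
            # Drop layout tags such as <CAPTION>, <S>, <C>.
            line = TAG_RE.sub("", line).strip()
            if not line or set(line) <= set("-= "):
                continue  # skip blank lines and ruler lines
            # Assumption: two or more spaces separate columns.
            rows.append(re.split(r"\s{2,}", line))
        return rows
    return []
```

Names containing a double space, or columns separated by a single space, would need a position-based split instead.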
Hours of work: 8 Hr
Project Duration: Max. 3 days
Total Cost: 190 USD
Look forward to hearing from you.
Kind regards,
Yordan B
**Fast & Efficient Delivery**
Greetings!
Hi, I'm a computer science graduate with more than 2 years of experience in application development. I've read all the details and the files that you attached (input and output). I will do this task by first extracting the data from the files, then searching each entire HTML file for the required table and its entries, removing duplicates if found. After that, I will write the data to an XLS file.
I will do this task in C#; the application interface can be a desktop or console application.
Note: I have already worked on document parsing, so this will be an easy task for me.
My work will speak for itself.
Looking forward to your consideration.
My name is Mike and I'm from the UK. I work with individual clients and also provide outsourcing services for a number of UK and USA based agencies. Your project description sounds interesting to me and I have the skills and experience required to complete this project. I can show you some examples of my work. Please contact me to discuss your project.
This looks like a fun project.
All tables seem to be within some html code. The files do contain some extra text. The extra text seems to not be relevant. The plan would be:
1. Strip extra text using regex
2. Convert html to tree using lxml
3. Use xpath to locate tables
4. Extract table information using xpath
5. Use csv module to write to csv file (one for each processed file)
6. Merge all files into one (if necessary)
7. Convert final file to xlsx (if necessary)
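The plan above might look roughly like this in code. The sketch swaps in the standard library's `xml.etree.ElementTree` (with its limited XPath support) for lxml, so it only handles tables that happen to be well-formed markup with lowercase tags and no HTML-only entities such as `&nbsp;`; real filings would likely need lxml's HTML parser as the plan says. The function names and the extra source-file column are assumptions.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)


def tables_to_rows(raw):
    """Steps 1-4: cut out each table with a regex, parse it into a tree,
    and read the cell text of every row."""
    all_rows = []
    for fragment in TABLE_RE.findall(raw):
        try:
            table = ET.fromstring(fragment)
        except ET.ParseError:
            continue  # not well-formed; lxml would be more forgiving
        for tr in table.findall(".//tr"):
            cells = ["".join(cell.itertext()).strip()
                     for cell in tr if cell.tag in ("td", "th")]
            if any(cells):
                all_rows.append(cells)
    return all_rows


def merge_to_csv(files, handle):
    """Steps 5-6: write the rows of every file to one CSV stream,
    prefixing each row with the (hypothetical) source file name."""
    writer = csv.writer(handle)
    for name, raw in files.items():
        for row in tables_to_rows(raw):
            writer.writerow([name] + row)
```

Passing an `io.StringIO()` or an open file as `handle` keeps the merge step independent of where the result is stored.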
Milestone 1: Result for first 100 files
Milestone 2: Result for all files
All files to be provided.
Best,
Tammo
Hello,
I have read your description very carefully. I am very good at parsing and Python scripting.
I can deliver the result as per your requirements.
Price for the whole project (both tasks included): 200 USD
Let's discuss more over chat.
Looking forward to working with you.
-Viral Parekh
You pay me after checking the work.
Hi
I have read all the details given in your project and I am fully capable of delivering this project with 100% accuracy. I have completed many related projects in the past. Why not message me here for further details? You can release the milestone after checking the work.
Hello,
I'm a recent graduate about to begin a program in data science. For the past year I have been working extensively in Python, performing a lot of research analysis. This required me to learn to parse text files and extract the information I need both quickly and cleanly. Using these skills, combined with some regex and my familiarity with HTML, I could easily do this job.
I look forward to hearing from you,
Charlie
I have almost 5 years of experience writing engineering tools in Python.
During this time I had to parse many files, so I am well acquainted with your problem.
Thanks
Please see the summary about myself. I am very easy to work with and am detail-oriented. I don't need a lot of guidance, mostly just an outline of what needs to be done. At my last job, which I left for a new one, all I did was write scripts in Python, and I did work similar to what this project requires. For my hourly wage, I just put down what I was previously paid for this kind of work, and I am more experienced now than before.
I have been working as a Quantitative Researcher in the finance industry for 7 years and have done lots of projects like this. For example, I worked on an ETF strategy before and had to scrape ETF websites, i.e. parse the HTML file, locate the data table, extract the data and finally store the data in our database.
I'm a Python expert and very skillful with libraries like pandas, requests, BeautifulSoup and so on. I will deliver effectively and efficiently.