Build data assessment model using Python

Closed | Posted May 2, 2015 | Paid on delivery

I want to create a model in R or Python that lets me assess the quality of the variables (columns) in a dataset.

As a start, I want to see whether the data is complete, valid and unique. Per column, I want to see the number of missing values, the number of valid values and the number of unique values. Completeness is based simply on the number of missing values (such as blank spaces, N.A., etc.) per column. Validity shows how many values fall within a specified range, so there should be an option to set a range or a list of allowed values per column. Uniqueness should indicate the number of unique and duplicate values per column, with an option to see which values are duplicated and how many times each one recurs.
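The three checks described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not a finished tool; the names `MISSING_TOKENS`, `is_missing` and `assess_column` are made up for the example, the list of "missing" placeholders is an assumption, and "unique" is interpreted here as the number of distinct non-missing values. In pandas the same counts would map onto `isna`, `nunique` and `value_counts`.

```python
from collections import Counter

# Assumption: these strings count as "missing"; adjust to your data.
MISSING_TOKENS = {"", "na", "n.a.", "n/a", "nan", "null", "none"}


def is_missing(value):
    """True for None or a blank/placeholder string."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_TOKENS


def assess_column(values, valid_range=None, allowed=None):
    """Count missing, valid and distinct values in one column.

    valid_range: optional inclusive (low, high) tuple for numeric columns.
    allowed: optional set of permitted values.
    """
    present = [v for v in values if not is_missing(v)]

    def is_valid(v):
        if allowed is not None and v not in allowed:
            return False
        if valid_range is not None:
            low, high = valid_range
            try:
                return low <= float(v) <= high
            except (TypeError, ValueError):
                return False
        return True

    counts = Counter(present)
    return {
        "total": len(values),
        "missing": len(values) - len(present),
        "valid": sum(1 for v in present if is_valid(v)),
        "unique": len(counts),  # distinct non-missing values
        # Values that occur more than once, with how often they occur.
        "duplicates": {v: n for v, n in counts.items() if n > 1},
    }


# Made-up sample column for illustration.
ages = [34, 27, None, 27, 150, "N.A.", 41, 27]
result = assess_column(ages, valid_range=(0, 120))
print(result)
```

The `duplicates` entry covers the "which values are duplicated and how often" part of the request; whether missing values should also count as invalid is a design choice left open here.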

The important thing is that I want to choose which columns are assessed myself, rather than assessing the entire dataset. Both completeness and uniqueness should be shown as percentages. So, for example: column X is 95% complete, 80% valid and contains 70% unique values. It would be nice to have a dashboard that shows this.
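A per-column percentage report for a chosen subset of columns might look something like the sketch below. It is standard-library Python only and rests on several assumptions: rows arrive as dicts (for example from `csv.DictReader`), only `None` and the empty string count as missing, every percentage is taken relative to the total number of rows, and the function name `column_report` and its `rules` argument are hypothetical.

```python
def column_report(rows, columns, rules=None):
    """Return {column: {"complete": %, "valid": %, "unique": %}}.

    rows: list of dicts, one per record.
    columns: names of the columns to assess (others are ignored).
    rules: hypothetical mapping of column name -> function that
           returns True when a value is valid.
    """
    rules = rules or {}
    total = len(rows)
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        # Assumption: only None and "" are treated as missing here.
        present = [v for v in values if v not in (None, "")]
        check = rules.get(col, lambda v: True)
        valid = sum(1 for v in present if check(v))
        distinct = len(set(present))
        # Assumption: all percentages use the total row count as denominator.
        report[col] = {
            "complete": 100.0 * len(present) / total if total else 0.0,
            "valid": 100.0 * valid / total if total else 0.0,
            "unique": 100.0 * distinct / total if total else 0.0,
        }
    return report


# Made-up sample data for illustration.
rows = [
    {"id": "1", "country": "NL", "age": "34"},
    {"id": "2", "country": "NL", "age": ""},
    {"id": "3", "country": "XX", "age": "27"},
    {"id": "4", "country": "DE", "age": "27"},
]
rules = {
    "country": lambda v: v in {"NL", "DE", "BE"},
    "age": lambda v: v.isdigit() and 0 <= int(v) <= 120,
}

# Only the chosen columns are assessed; "id" is left out on purpose.
report = column_report(rows, ["country", "age"], rules)
for col, m in report.items():
    print(f"{col:<10}{m['complete']:>7.1f}% complete"
          f"{m['valid']:>7.1f}% valid{m['unique']:>7.1f}% unique")
```

A plain-text table like this is a stand-in for the dashboard; the same dictionary could feed a chart or a spreadsheet export.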

I have some knowledge of R, so if you can do it in R that would be great. However, I realize that Python is probably the more suitable tool for this. I was planning to do this myself using pandas, so that is the tool I prefer. But if you can argue that a different Python tool is better, that is fine with me as well.

Data Mining Python

Project ID: #7596903

About the project

8 proposals | Remote project | Active Jun 10, 2015

8 freelancers are bidding an average of €126 for this job

kodneviscom

Hi, I can write an application in Python in which you can select the columns; the completeness and uniqueness will then be calculated and written to an Excel file.

€155 EUR in 3 days
(10 reviews)
5.9
hectorenavarrou

I have been using Python for years, and I do this kind of data processing all the time. I can say, without any doubt, that I can finish this job. I would like to know how the datasets are provided. XLS? CSV? text f...

€88 EUR in 5 days
(10 reviews)
4.3
apurvsinghal

Hi, I have been working with R and Python for quite some time now. From the problem description I understand that you will need the functions in an API-style format, wherein you can pass the required details like c...

€138 EUR in 1 day
(0 reviews)
0.0
deedwood

Hi! Check my profile and contact me.

€222 EUR in 5 days
(0 reviews)
0.0
vjerankenda

Hi, I have more than 20 years of programming experience, with languages including Python, C, R and Visual Basic, and with many database systems such as Oracle, MS Access, MySQL and others. As you mentioned, the task can be done both...

€111 EUR in 3 days
(0 reviews)
0.0
santoshmungle

Hi, hope you are doing great! I can do it in R or Python, since I use both in my work. Cheers, Santosh

€155 EUR in 3 days
(0 reviews)
0.0
subashfree

I promise that I will not break or invent any new statistical theory. Moreover, I would be happy not only to be part of your project but also to earn some money. To be frank with you, I just searched freelance for "R Program" and...

€55 EUR in 3 days
(0 reviews)
0.0
danielunderwood

I can do the column analysis code very quickly in either R or Python, although I would prefer Python. I could only do a graphical interface in Python. The interface would allow you to load a delimited file (comma, spac...

€83 EUR in 2 days
(0 reviews)
0.0