This project aims to impute the missing values in the given datasets. You have to write code in the programming language of your choice (e.g., MATLAB, Python, R, FORTRAN, C, or C++) to read some Excel data (step 1), identify the missing data (step 2), impute the missing values based on the technique given in the reference proposed for this project (step 3), and finally return the imputed data and compare it with the complete data to measure the accuracy and reliability of your results (step 4).
In step 1, do not limit your code to a specific data size or dimension: it must be able to read or load data of different sizes and dimensions. You will receive some datasets with numerical and/or categorical attributes in XLS and/or CSV format, which I will upload later.
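As a minimal sketch of step 1 in Python (standard library only), the function below reads a CSV file with a header row and any number of rows and columns. Numeric text becomes a float, categorical text stays a string, and missing cells become `None`. The set of missing-value tokens (`MISSING_TOKENS`) and the names `parse_cell` and `load_table` are hypothetical choices, not part of the task. XLS files would need an extra library such as pandas (`pandas.read_excel`).

```python
import csv
import io
from typing import List, Optional, Tuple, Union

# Hypothetical set of tokens treated as "missing"; adjust it to the real datasets.
MISSING_TOKENS = {"", "na", "nan", "null", "?"}

Cell = Optional[Union[float, str]]


def parse_cell(raw: str) -> Cell:
    """Return None for a missing token, a float for numeric text, else the stripped string."""
    text = raw.strip()
    if text.lower() in MISSING_TOKENS:
        return None
    try:
        return float(text)  # numerical attribute
    except ValueError:
        return text  # categorical attribute


def load_table(stream: io.TextIOBase) -> Tuple[List[str], List[List[Cell]]]:
    """Read a CSV with a header row and any number of rows/columns; return (header, rows)."""
    reader = csv.reader(stream)
    header = next(reader)
    rows: List[List[Cell]] = []
    for record in reader:
        # Pad short records so every row has exactly len(header) cells.
        record = record + [""] * (len(header) - len(record))
        rows.append([parse_cell(c) for c in record[: len(header)]])
    return header, rows
```

For a file on disk, something like `with open("data.csv", newline="") as f: header, rows = load_table(f)` should work, assuming the first line is a header. Nothing in the loader depends on the number of rows or columns.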
In step 2, you determine the number and the locations of the missing data. For instance, if your code returns the indices of the missing entries, you can identify the missing-data pattern (univariate, monotone, or arbitrary). This not only helps you handle the next step successfully but also earns you more points.
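A sketch of step 2, assuming missing cells are stored as `None` in a list of rows: `missing_indices` returns the (row, column) position of every missing entry, and `missing_pattern` classifies the pattern. Here "univariate" is taken to mean that only one column has missing values, and "monotone" that the columns' sets of missing rows are nested, so the columns can be ordered such that whenever one is missing, all later ones are missing too. Both helper names are hypothetical, and the reference paper may use a looser definition of univariate (e.g., a block of columns that are always missing together).

```python
from typing import List, Sequence, Set, Tuple


def missing_indices(rows: Sequence[Sequence[object]]) -> List[Tuple[int, int]]:
    """Return the (row, column) indices of every missing (None) cell."""
    return [(i, j) for i, row in enumerate(rows) for j, v in enumerate(row) if v is None]


def missing_pattern(rows: Sequence[Sequence[object]]) -> str:
    """Classify the pattern as 'none', 'univariate', 'monotone' or 'arbitrary'."""
    n_cols = len(rows[0]) if rows else 0
    # For each column, the set of row indices where that column is missing.
    miss_sets: List[Set[int]] = [
        {i for i, row in enumerate(rows) if row[j] is None} for j in range(n_cols)
    ]
    incomplete = [s for s in miss_sets if s]  # complete columns never break monotonicity
    if not incomplete:
        return "none"
    if len(incomplete) == 1:
        return "univariate"
    # Monotone iff the missing-row sets are nested; sorting by size gives the candidate order.
    ordered = sorted(incomplete, key=len)
    if all(a <= b for a, b in zip(ordered, ordered[1:])):
        return "monotone"
    return "arbitrary"
```

Sorting the incomplete columns by their number of missing rows is enough to test nesting: if a monotone order exists, it must match that sorted order (ties can only occur between identical sets).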
In step 3, you have to read the reference paper given for the proposed method, understand the algorithm, and write code to impute the missing data (single or multiple imputation) based on that approach.
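The reference paper's algorithm is not specified in this description, so the sketch below is only a stand-in baseline for step 3, not the proposed method. It performs single imputation by filling each numeric column with its observed mean and each categorical column with its most frequent value. It assumes missing cells are `None`, and `impute_baseline` is a hypothetical name. A baseline like this is mainly useful as a point of comparison for whatever the paper's method achieves.

```python
from collections import Counter
from statistics import mean
from typing import List, Optional, Sequence


def impute_baseline(rows: Sequence[Sequence[object]]) -> List[List[object]]:
    """Mean/mode single imputation; a placeholder, not the reference paper's method."""
    n_cols = len(rows[0]) if rows else 0
    fills: List[Optional[object]] = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            fills.append(None)  # column entirely missing: nothing to learn from
        elif all(isinstance(v, (int, float)) for v in observed):
            fills.append(mean(observed))  # numerical attribute -> column mean
        else:
            fills.append(Counter(observed).most_common(1)[0][0])  # categorical -> mode
    # Build new rows, replacing only the missing cells; the input is left unchanged.
    return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in rows]
```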
In step 4, your code should return the imputed values so that you can compare them with the original complete data and compute the error (NRMS). You can generate diagrams, automatically or manually, to present your results and compare them with the original complete datasets.
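The description does not define NRMS, so the sketch below assumes one common form, the Frobenius-norm ratio NRMS = ||X_imp - X_true||_F / ||X_true||_F, computed over numeric cells only. Because observed cells are identical in both matrices, only the imputed cells contribute to the numerator. Categorical cells are skipped and would need a separate measure, such as the proportion of correctly imputed categories. The function name `nrms` is hypothetical, and the definition should be checked against the reference paper.

```python
import math
from typing import Sequence


def nrms(imputed: Sequence[Sequence[object]], complete: Sequence[Sequence[object]]) -> float:
    """Assumed NRMS: ||X_imp - X_true||_F / ||X_true||_F over numeric cells."""
    num = 0.0  # sum of squared imputation errors
    den = 0.0  # sum of squared true values
    for row_imp, row_true in zip(imputed, complete):
        for v_imp, v_true in zip(row_imp, row_true):
            # Skip categorical (non-numeric) cells and any cell left unimputed.
            if isinstance(v_true, (int, float)) and isinstance(v_imp, (int, float)):
                num += (v_imp - v_true) ** 2
                den += v_true ** 2
    if den == 0.0:
        raise ValueError("complete data has zero norm; NRMS is undefined")
    return math.sqrt(num) / math.sqrt(den)
```

For example, `nrms([[0.0, 4.0]], [[3.0, 4.0]])` returns 0.6 (that is, 3/5), and a perfect imputation returns 0.0.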
Every step has its own credit, and both successful and unsuccessful projects will be taken into account. However, I expect clear and reasonably commented code that we can execute easily, so that we can see and check your results (preferably by means of a visualization technique of your choice) and obtain trustworthy, reliable results.
I have read your attached description and I can handle your project. I also have past experience with a similar project, so I already have some code snippets. Please don't hesitate to contact me if you are interested, and provide me with full details on what you need me to do. Thanks.
Best regards, Vasily A.
I will do your project using Python; this seems to me the easiest and quickest way of doing it. Personally, I think the CSV format would be suitable in this case.
Relevant Skills and Experience
I have 2 years of experience implementing mathematical concepts in analysis and linear algebra and experimenting with machine learning algorithms using Python in my current studies.
Proposed Milestones
$45 CAD - Predicting the missing values
Impute data using supervised learning methods or the method proposed in the research paper.
Implement this method as an R script.
Relevant Skills and Experience
R language
File handling
Research
Statistics
Machine learning
Proposed Milestones
$100 CAD - Implementation of data imputation algorithm
Happy to work with you