Closed

Write some Software

This project was awarded to SabidHabib for $80 USD.

Project budget
$10 - $30 USD
Number of bids
4
Project description

There are three datasets for the assignments. The same data will be used for the project. The project will be a group project. The assignments are individual assignments.

Dataset 1: This dataset is taken from the UCI public datasets.

This data contains cellphone messages labelled as spam or good. The dataset is well described. Please read the description before you start working on it.

Link to download dataset: [url removed, login to view]+Spam+Collection

Dataset 2: This dataset is taken from the UCI public datasets.

It contains sentences labelled with positive or negative sentiment, extracted from reviews of products, movies, and restaurants. Please read the description before you start working with it.

Link to download dataset: [url removed, login to view]+Labelled+Sentences

Dataset 3: Wikipedia data!

I chose the category CLASSIFICATION_Algorithms. It has 3 categories listed under it: Artificial Intelligence, Decision Tree, Ensemble Learning. We will use these categories as class labels. From each of these categories, sample 14 pages. Do not sample pages under CLASSIFICATION_Algorithms!! Use these pages for the assignments and the project.
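The sampling step could be sketched as follows. This is only an illustration: the page titles are placeholders, and `sample_pages` and the fixed seed are assumptions made here so that the same 14 pages per category can be reproduced across the assignments and the project.

```python
import random

def sample_pages(pages_by_category, k=14, seed=0):
    """Pick k pages per category; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    # sorted() makes the result independent of the input ordering.
    # rng.sample raises ValueError if a category has fewer than k pages.
    return {cat: rng.sample(sorted(pages), k)
            for cat, pages in pages_by_category.items()}

# Placeholder titles only; the real lists come from the Wikipedia category pages.
pages_by_category = {
    "Artificial Intelligence": [f"AI page {i}" for i in range(30)],
    "Decision Tree": [f"DT page {i}" for i in range(20)],
    "Ensemble Learning": [f"EL page {i}" for i in range(25)],
}

sampled = sample_pages(pages_by_category)
for category, pages in sampled.items():
    print(category, len(pages))
```

The category name then serves as the class label for every page sampled from it.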

Task Description:

----------------------------------------------------------

There are two sets of Wikipedia articles. The first set consists of Wikipedia featured articles of a certain type and becomes class Featured. The second set consists of non-featured Wikipedia articles of a similar type and becomes class Non-Featured. We are dealing with a binary classification problem.

To create attributes, extract all possible tokens from the entire dataset after stemming and stop-word removal. Create 1-grams, 2-grams and 3-grams from these tokens. Use these n-grams as the attributes for the ARFF files.
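A minimal sketch of the n-gram and ARFF step, assuming each article has already been reduced to a list of stemmed tokens. The helper names (`ngrams`, `to_arff`), the relation name, the use of n-gram counts as numeric attributes, and the class attribute name `doc_class` are all assumptions made for illustration; the task itself only fixes the class labels Featured and Non-Featured.

```python
def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings, in order."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def to_arff(docs, n, relation="wiki"):
    """docs: list of (tokens, label) pairs. Returns ARFF text of n-gram counts."""
    per_doc = [ngrams(tokens, n) for tokens, _ in docs]
    vocab = sorted({g for grams in per_doc for g in grams})
    labels = sorted({label for _, label in docs})
    lines = [f"@relation {relation}_{n}gram", ""]
    # Attribute names containing spaces must be quoted in ARFF.
    lines += [f"@attribute '{g}' numeric" for g in vocab]
    # Named doc_class (an assumption) so it cannot clash with a token "class".
    lines.append("@attribute doc_class {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for grams, (_, label) in zip(per_doc, docs):
        counts = [str(grams.count(g)) for g in vocab]
        lines.append(",".join(counts + [label]))
    return "\n".join(lines) + "\n"

# Toy documents with already-stemmed tokens (hypothetical content).
docs = [
    (["decis", "tree", "learn"], "Featured"),
    (["tree", "learn", "model"], "Non-Featured"),
]
arff_text = to_arff(docs, 2)
print(arff_text)
```

Calling `to_arff(docs, 1)`, `to_arff(docs, 2)` and `to_arff(docs, 3)` would give one file per n-gram size, which matches the per-size attribute selection asked for in the task.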

Perform attribute selection on each of the 1-gram, 2-gram and 3-gram attribute sets using information gain and gain ratio. Perform classification using a decision tree and naïve Bayes.
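To make the two selection criteria concrete, here is a small pure-Python sketch of their textbook definitions for a nominal attribute: information gain is H(class) - H(class | attribute), and gain ratio divides that by the entropy of the attribute itself. It is meant only as an illustration of what the scores measure, not as a reproduction of WEKA's evaluators, and the toy attribute vectors are made up.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of nominal values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())

def info_gain(attr, labels):
    """H(labels) minus the weighted entropy of labels within each attribute value."""
    total = len(labels)
    conditional = 0.0
    for value in set(attr):
        subset = [l for a, l in zip(attr, labels) if a == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional

def gain_ratio(attr, labels):
    """Information gain normalised by the entropy of the attribute (split info)."""
    split_info = entropy(attr)
    return info_gain(attr, labels) / split_info if split_info > 0 else 0.0

labels = ["Featured", "Featured", "Non-Featured", "Non-Featured"]
separating = [1, 1, 0, 0]      # n-gram present only in Featured articles
uninformative = [1, 0, 1, 0]   # n-gram equally common in both classes

print(info_gain(separating, labels), gain_ratio(separating, labels))
print(info_gain(uninformative, labels), gain_ratio(uninformative, labels))
```

Ranking the n-gram attributes by either score and keeping the top ones is the usual way such scores are turned into a selection.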

Make a Wiki report on your findings, including the various statistical evaluation measures given by WEKA for each classifier.
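For reference when reading the report, a few of the common measures for a binary problem can be computed by hand as sketched below. The function name `binary_metrics` and the example predictions are hypothetical; Featured is treated as the positive class here purely as an assumption.

```python
def binary_metrics(actual, predicted, positive="Featured"):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Made-up labels for six articles, not real classifier output.
actual = ["Featured"] * 3 + ["Non-Featured"] * 3
predicted = ["Featured", "Featured", "Non-Featured",
             "Featured", "Non-Featured", "Non-Featured"]
metrics = binary_metrics(actual, predicted)
print(metrics)
```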

Link: Classification_algorithms: [url removed, login to view]:Classification_algorithms

Link: Artificial Intelligence: [url removed, login to view]:Artificial_neural_networks

Link: Decision Tree: [url removed, login to view]:Decision_trees

Link: Ensemble Learning: [url removed, login to view]:Ensemble_learning

NOTE:

Stemming and Stop-Word removal: You can use NLTK!!

Stemming: Convert a word to its root form. Running --> Run

Stop words: High-frequency words that carry little meaning
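A rough sketch of this preprocessing with NLTK's `PorterStemmer`. The stop-word list below is a deliberately tiny, hypothetical one so that the example needs no corpus download; NLTK's own list (`nltk.corpus.stopwords`) is much more complete but requires a one-time `nltk.download("stopwords")`. The fallback stemmer is an assumption added only so the sketch still runs where NLTK is not installed, and it is far cruder than a real stemmer.

```python
import re

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "was"}

try:
    from nltk.stem import PorterStemmer
    stem = PorterStemmer().stem
except ImportError:
    def stem(word):
        """Crude stand-in used only when NLTK is missing."""
        for suffix in ("ing", "ed"):
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                # Undo consonant doubling: "runn" -> "run".
                if word[-1] == word[-2]:
                    word = word[:-1]
                return word
        if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
            return word[:-1]
        return word

def preprocess(text):
    """Lowercase, keep alphabetic tokens, drop stop words, then stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

tokens = preprocess("The cat is running in the trees")
print(tokens)
```

The resulting token lists are what the n-gram attributes would be built from.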

[url removed, login to view]
