R or Weka lab

  

Laboratory I:

         

To download additional .arff data sets go to:

http://www.hakank.org/weka/

or search the Internet for the required .arff files.

· What’s the difference between a “training set” and a “test set”?

· Why might a pruned decision tree that doesn’t fit the data so well be better than an un-pruned one?

· What’s the first thing that 1R does when making a rule based on a numeric attribute?

· How does 1R avoid overfitting when making a rule based on an enumerated and/or numeric attribute?

· What is the difference between Attribute, Instance and Training set? 

· What is the difference between ID3 and C4.5?

1. Use the following learning schemes to analyze the iris data (in iris.arff):

  

OneR

– weka.classifiers.OneR

 

Decision table

– weka.classifiers.DecisionTable -R

 

C4.5

– weka.classifiers.j48.J48

· Do the decisions made by the classifiers make sense to you? Why?

· What can you say about the accuracy of these classifiers when classifying iris instances that have not been used for training?

· How did each one of the methods perform?
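To make the question about unseen instances concrete, here is a minimal sketch of a holdout split in plain Python rather than Weka. The function name `holdout_split`, the 66% training fraction, and the seed are assumptions chosen for illustration, and the integers merely stand in for the 150 iris instances.

```python
import random


def holdout_split(instances, train_fraction=0.66, seed=1):
    """Shuffle a copy of the instances and cut it into training and test parts."""
    rng = random.Random(seed)          # fixed seed makes the split repeatable
    shuffled = list(instances)         # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-ins for the 150 iris instances.
instances = list(range(150))
train, test = holdout_split(instances)
print(len(train), "training instances,", len(test), "test instances")
```

A classifier would be built from `train` only, and its accuracy on `test` estimates how it behaves on iris instances it has never seen.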

2. Use the following learning schemes to analyze the bolts data (bolts.arff without the TIME attribute):

  

Decision Tree

– weka.classifiers.j48.J48

 

Decision table

– weka.classifiers.DecisionTable -R

 

Linear regression

– weka.classifiers.LinearRegression

 

M5′ 

– weka.classifiers.m5.M5Prime

· The dataset describes the time needed by a machine to produce and count 20 bolts. (More details can be found in the file containing the dataset.) 

· Analyze the data. What adjustments have the greatest effect on the time to count 20 bolts? 

· According to each classifier, how would you adjust the machine to get the shortest time to count 20 bolts?
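As a rough illustration of how a linear regression model can be read for "which adjustment matters most", the sketch below fits ordinary least squares in plain Python. It is not Weka's LinearRegression, the two attributes are hypothetical, and the data are made up so that the target follows 1 + 2*x1 - 3*x2 exactly. Coefficient sizes are only comparable when the attributes are on similar scales.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    # Prepend 1.0 to every row so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("attributes are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Made-up settings (x1, x2) and times generated from 1 + 2*x1 - 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
targets = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]
intercept, coef_x1, coef_x2 = fit_linear_regression(rows, targets)
print(intercept, coef_x1, coef_x2)
```

In this toy case a negative coefficient means that raising the attribute lowers the predicted time, which is the kind of reading the question asks for.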

3. Produce a model for both the Weather and Weather.nominal data sets. Which method(s) did you use? What did the tree(s) look like?
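One possible method for the nominal weather data is 1R. The sketch below is a plain-Python illustration of the basic 1R idea for nominal attributes only; it is not Weka's OneR class and it omits numeric attributes and missing values. The 14 instances are retyped from the standard weather.nominal data set, so check them against your own copy of the file.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]
# (outlook, temperature, humidity, windy, play); retyped, verify against your file.
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def one_r(rows, attribute_names):
    """Return (attribute, rule, errors) for the attribute with the fewest training errors."""
    best = None
    for col, name in enumerate(attribute_names):
        # Count how often each class occurs with each value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[col]][row[-1]] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Every instance outside the majority class of its value is an error.
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:   # ties keep the earlier attribute
            best = (name, rule, errors)
    return best


best_attribute, best_rule, best_errors = one_r(WEATHER, ATTRIBUTES)
print(best_attribute, best_rule, best_errors, "errors out of", len(WEATHER))
```

Comparing this one-attribute rule with the tree produced by a decision tree learner shows how much the extra structure of the tree actually buys on such a small data set.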

Laboratory II:

 

To download additional .arff data sets go to:

the Weka data folder for BreastTumor.arff

http://www.hakank.org/weka/ for zoo.arff, wine.arff, bodyfat.arff, sleep.arff, pollution.arff

4. Use the following learning schemes to analyze the zoo data (in zoo.arff):

  

OneR

– weka.classifiers.OneR

 

Decision table

– weka.classifiers.DecisionTable -R

 

C4.5

– weka.classifiers.j48.J48

 

K-means

– weka.clusterers.SimpleKMeans

Try using reduced-error pruning for C4.5. Did it change the produced model? Why?

For K-means, set k=10 for the first run and adjust as needed. What was the final value of k? Why?

5. Use the following learning schemes to analyze the breast tumor data.

  

Linear regression

– weka.classifiers.LinearRegression

 

M5′ 

– weka.classifiers.m5.M5Prime

 

Regression Tree

– weka.classifiers.m5.M5Prime

 

K-means clustering

– weka.clusterers.SimpleKMeans

A) How many leaves did the model tree produce? How many did the regression tree produce? What happens if you change the pruning factor?

How many clusters did you choose for the K-means method? Was that a good choice? Did you try a different value for k?

B) Now perform the same analysis on the bodyfat.arff data set.

6. Use a k-means clustering technique to analyze the iris data set. What did you set the k value to be? Try several different values. What was the random seed value? Experiment with different random seed values. How did changing these values influence the produced models?
7. Produce a hierarchical clustering (COBWEB) model for the iris data. How many clusters did it produce? Why? Does it make sense? What did you expect?

Change the acuity and cutoff parameters in order to produce a model similar to the one obtained in the book. Use the classes-to-clusters evaluation. What does that tell you?
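For the k-means exercise on the iris data, the role of k and the random seed can be seen in a small plain-Python sketch. This is a basic version of the algorithm, not Weka's SimpleKMeans; the function names and the nine 2-D points are made up for illustration. The seed only controls which points are picked as the initial centroids.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, seed, max_iter=100):
    """Basic k-means; returns (centroids, assignment, within-cluster squared error)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # the seed decides the starting centroids
    assignment = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_assignment == assignment:   # nothing moved, so we have converged
            break
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                    # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centroids[a]) for p, a in zip(points, assignment))
    return centroids, assignment, sse


# Made-up 2-D points forming three loose groups.
POINTS = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
]

for k in (2, 3, 4):
    for seed in (1, 2, 3):
        _, _, sse = k_means(POINTS, k, seed)
        print("k =", k, "seed =", seed, "squared error =", round(sse, 3))
```

Running the same k with several seeds and comparing the squared error is one simple way to see whether a particular clustering is stable or just an accident of the starting points.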

Laboratory III:

 

To download additional .arff data sets go to:

http://www.hakank.org/weka/

zoo.arff, wine.arff, soybean.arff, zoo2_x.arff, sunburn.arff, disease.arff

8. Use the following learning schemes to compare the training set and 10-fold stratified cross-validation scores of the disease data (in disease.arff): 

  

Decision table

– weka.classifiers.DecisionTable -R

 

C4.5

– weka.classifiers.j48.J48

 

Id3

– weka.classifiers.Id3

A) What does the training set evaluation score tell you? 

B) What does the cross-validation score evaluate? 

C) Which one of these models would you say is the best? Why?
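The gap between a training set score and a stratified cross-validation score can be sketched in plain Python. Nothing here is Weka code: the "memorizer" classifier, the function names, and the 12-instance data set are all hypothetical, chosen because a model that simply memorizes its training data makes the difference easy to see.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Deal instance indices into k folds, class by class, so class ratios stay similar."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_memorizer(rows, labels):
    """Hypothetical classifier: look up seen rows, otherwise predict the majority class."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[row][label] += 1
    default = Counter(labels).most_common(1)[0][0]
    return lambda row: table[row].most_common(1)[0][0] if row in table else default


def accuracy(model, rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def cross_validate(rows, labels, k):
    """Train on k-1 folds, test on the held-out fold, and pool the results."""
    correct = 0
    for fold in stratified_folds(labels, k):
        held_out = set(fold)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = train_memorizer([rows[i] for i in train_idx],
                                [labels[i] for i in train_idx])
        correct += sum(model(rows[i]) == labels[i] for i in fold)
    return correct / len(rows)


# Made-up data: 12 distinct attribute pairs, 6 per class.
ROWS = [(a, b) for a in range(4) for b in range(3)]
LABELS = ["yes" if a >= 2 else "no" for a, b in ROWS]

training_score = accuracy(train_memorizer(ROWS, LABELS), ROWS, LABELS)
cv_score = cross_validate(ROWS, LABELS, k=3)
print("training set accuracy:", training_score)
print("3-fold stratified cross-validation accuracy:", cv_score)
```

The training set score only shows how well the model reproduces data it has already seen, while the cross-validation score is computed entirely on held-out instances.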

9. Use the following learning schemes to analyze the wine data (in wine.arff). 

  

C4.5

– weka.classifiers.j48.J48

 

Decision List

– weka.classifiers.PART

A) What is the most important descriptor (attribute) in wine.arff?

B) How well were these two schemes able to learn the patterns in the dataset? How would you quantify your answer?

C) Compare the training set and 10-fold cross-validation scores of the two schemes.

D) Would you trust these two models? Did they really learn what is important for proper classification of wine?

E) Which one would you trust more, even if just very slightly?

10. Perform the same analysis on sunburn.arff as in 9, but use 5-fold instead of 10-fold cross-validation.

A)-E) Same as in 9.

F) Why could we not use 10-fold cross-validation in this example?

11. Choose one of the following three files: soybean.arff, zoo.arff, or zoo2_x.arff, and use any two schemes of your choice to build and compare the models.
