Project 3 – Ensemble Methods and Unsupervised Learning

In this project you will explore some techniques in unsupervised learning
as well as ensemble methods. It is important to realize that understanding
an algorithm or technique requires understanding how it behaves under a
variety of circumstances. You will go through the process of choosing and
exploring two classification datasets, tuning the algorithms you have
learned about, writing a thorough analysis of your findings, and presenting
them. The most crucial part of this assignment is the analysis and your
ability to explain and justify your results.

I. Choosing Datasets

The first task in this assignment is choosing two interesting classification
datasets; these can be binary or multiclass. The features can be of any
type, and it is recommended that you choose datasets with diverse feature
sets. I don't care where you get the data from. You can download some,
take some from your own research, or make some up on your own. What I
do care about is that the datasets are interesting. They should contain a
decent number of features and a sufficiently large number of examples.
Do not choose an "easy" dataset; however, don't go crazy trying to find the
perfect one either. Your two datasets should also differ in some way so that
you can compare and contrast your results between the two. You should also
follow standard machine learning practice by splitting each dataset into
training and testing sets, and only touching the testing set at the very end
when you are ready to report results. (Cross-validation is highly
recommended.)
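As a rough illustration only (not a required approach), here is a minimal scikit-learn sketch of handling mixed feature types, holding out a test set, and cross-validating on the training portion. It assumes a hypothetical pandas DataFrame with made-up columns `age`, `income` (with some missing values), and `color`, and a made-up label; substitute your own data. Putting the imputer, scaler, and encoder inside the `Pipeline` means they are re-fit on each training fold, so no statistics leak from the validation folds or the test set.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for one of your datasets:
# two numeric columns (one with missing values) and one categorical column.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "age": rng.normal(40, 12, n),
    "income": rng.lognormal(10, 1, n),
    "color": rng.choice(["red", "green", "blue"], n),
})
X.loc[rng.choice(n, 30, replace=False), "income"] = np.nan
y = (X["age"] > 40).astype(int).to_numpy()  # made-up binary label

# Numeric columns: fill missing values with the median, then standardize.
# Categorical columns: one-hot encode, ignoring unseen categories at test time.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])
model = Pipeline([("prep", preprocess),
                  ("clf", RandomForestClassifier(random_state=0))])

# Hold out a stratified test set once and leave it alone until the end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 5-fold stratified cross-validation on the training portion only.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X_train, y_train, cv=cv)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```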

II. Coding (10%)

After choosing your datasets, you will write code to apply the machine
learning algorithms you have learned about. Your code must be written in
Python, but you may use any libraries that already implement the machine
learning algorithms (e.g., scikit-learn). You are not expected to code the
algorithms from scratch, and in fact I strongly discourage it. What you
may not do is copy code from the internet. Below are the analyses you are
required to run.

1) Run K-means and Hierarchical Clustering on your datasets and analyze
what you observe.

2) Run two dimensionality reduction algorithms (PCA and UMAP) on your
datasets. Observe and analyze the results.

3) Re-run K-means and Hierarchical Clustering on your dimensionality-reduced
datasets and compare the results to part (1).

4) Tune and train two ensemble models (AdaBoost and Random Forests) on
both your original and dimensionality-reduced datasets. Compare and
analyze the results.
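For parts (1) to (3), one possible shape of the analysis is sketched below with scikit-learn. It uses the bundled `load_wine` data purely as a placeholder dataset, Ward-linkage `AgglomerativeClustering` as the hierarchical method, and a 90% explained-variance cutoff for PCA; all three are illustrative choices, not requirements. UMAP is not part of scikit-learn; the separate umap-learn package provides `umap.UMAP`, which also has a `fit_transform` method and could be swapped in for PCA. Note that the adjusted Rand index (ARI) compares clusters to the true labels, so it is comparable before and after reduction, while silhouette scores are computed in different feature spaces and should be compared with care.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Placeholder dataset; both clustering methods are distance-based, so scale first.
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)


def cluster_report(features, labels_true, k, seed=0):
    """Fit K-means and Ward hierarchical clustering with k clusters and return
    the silhouette score (internal) and ARI against the true labels (external)."""
    models = {
        "kmeans": KMeans(n_clusters=k, n_init=10, random_state=seed),
        "ward": AgglomerativeClustering(n_clusters=k, linkage="ward"),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(features)
        results[name] = {
            "silhouette": silhouette_score(features, labels),
            "ari": adjusted_rand_score(labels_true, labels),
        }
    return results


# Part 1: cluster the original (scaled) features.
original = cluster_report(X, y, k=3)

# Part 2: PCA keeping enough components to explain ~90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full", random_state=0)
X_pca = pca.fit_transform(X)
print("components kept:", pca.n_components_)

# Part 3: re-run the same clustering on the reduced data and compare.
reduced = cluster_report(X_pca, y, k=3)
for name in original:
    print(name, "original:", original[name], "PCA:", reduced[name])
```

In a real analysis you would also vary `k` rather than fixing it to the known number of classes, since choosing `k` is part of what clustering makes you reason about.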

Your code does not have to be pretty or well written. However, it must be
written in Python, and I must be able to run a single script (main.py) that
produces all the results and figures in your report.
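A hypothetical main.py skeleton for part (4) is sketched below, assuming scikit-learn's `GridSearchCV`, the bundled `load_breast_cancer` data as a placeholder, a 95% explained-variance PCA step for the reduced variant, and deliberately tiny hyperparameter grids. Wrapping scaling, the optional PCA, and the classifier in one `Pipeline` keeps the reduction inside each CV fold. `GridSearchCV` refits the best configuration on the full training split by default, and the test set is scored only once per model, after tuning.

```python
"""Hypothetical main.py skeleton: tune AdaBoost and Random Forest on the
original features and on PCA-reduced features."""
import time

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_search(clf, grid, reduce):
    """Build a grid search over scaling, optional PCA, and the classifier."""
    steps = [("scale", StandardScaler())]
    if reduce:
        steps.append(("pca", PCA(n_components=0.95, svd_solver="full")))
    steps.append(("clf", clf))
    # Pipeline parameters are addressed as "<step name>__<parameter>".
    param_grid = {f"clf__{k}": v for k, v in grid.items()}
    return GridSearchCV(Pipeline(steps), param_grid, cv=5)


def main():
    X, y = load_breast_cancer(return_X_y=True)  # placeholder dataset
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    # Small illustrative grids; real tuning would explore more values.
    models = {
        "adaboost": (AdaBoostClassifier(random_state=0),
                     {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]}),
        "random_forest": (RandomForestClassifier(random_state=0),
                          {"n_estimators": [100], "max_depth": [None, 5]}),
    }
    results = {}
    for name, (clf, grid) in models.items():
        for reduce in (False, True):
            search = build_search(clf, grid, reduce)
            start = time.perf_counter()
            search.fit(X_train, y_train)
            fit_seconds = time.perf_counter() - start
            key = f"{name}{'+pca' if reduce else ''}"
            results[key] = {
                "best_params": search.best_params_,
                "cv_accuracy": search.best_score_,
                # The test set is touched exactly once, after tuning.
                "test_accuracy": search.score(X_test, y_test),
                "fit_seconds": fit_seconds,
            }
            print(key, results[key])
    return results


if __name__ == "__main__":
    main()
```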

III. Report (80%)

You will then produce a report describing and analyzing your methods and
results. Here you will describe the datasets you have chosen and why they are
interesting. You will then provide an analysis of how the different machine
learning algorithms performed on each dataset. The report is limited to a
maximum of 10 pages. Plots and figures are highly recommended. It is up to
you how you demonstrate your understanding of the machine learning
algorithms you have explored, but below I have listed some potential ideas for
analysis and items you may wish to include in the report.

• A description of your two datasets and why you feel that they are interesting.

• Hypotheses on how you believe the learning algorithms will perform on each
dataset and why.

• How you dealt with different feature types, missing data, and different
feature scalings.

• Training and testing error rates you obtained for your various learning
algorithms (some sort of cross-validation is highly recommended)

• The effect of hyperparameters on performance

• Comparing and contrasting results between datasets

• Comparing and contrasting results between learning algorithms

• Training and testing error rates as a function of training dataset size

• Timing analysis of how long it takes to train/test each algorithm

• Conclusions

• Ideas for future analyses

• What you may have done differently

• References
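Two of the items above, error as a function of training-set size and timing, can both come from scikit-learn's `learning_curve` with `return_times=True`, which also reports per-fold fit and scoring times. The sketch below is one possible way to do it, assuming the bundled `load_digits` data as a placeholder and a Random Forest as the example model; it runs only on the training split, so the test set stays untouched. Wall-clock times depend on your machine, so report them relative to each other rather than as absolute figures.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve, train_test_split

X, y = load_digits(return_X_y=True)  # placeholder dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=0)

# 5 training-set sizes (10% to 100% of each CV training fold), 5-fold CV.
sizes, train_scores, val_scores, fit_times, score_times = learning_curve(
    clf, X_train, y_train, train_sizes=np.linspace(0.1, 1.0, 5), cv=5,
    shuffle=True, random_state=0, return_times=True)

# Average over folds and convert accuracy to error rate.
for n, tr, va, ft, st in zip(sizes, train_scores.mean(axis=1),
                             val_scores.mean(axis=1), fit_times.mean(axis=1),
                             score_times.mean(axis=1)):
    print(f"n={n:5d}  train err={1 - tr:.3f}  val err={1 - va:.3f}  "
          f"fit={ft:.3f}s  score={st:.4f}s")
```

A widening gap between training and validation error as `n` grows, or a validation error that is still falling at the largest size, is the kind of observation the report should then explain.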

You are NOT being graded on how well the algorithms perform on your datasets.
What is most important is WHY they perform the way they do. You should be
explaining and justifying all of your figures and results, and demonstrating
that you understand the intricate details of the machine learning process and
of the machine learning algorithms you are using.

IV. Presentation (10%)

Finally, you will give a presentation of your results lasting at most 7 minutes
(you will be cut off exactly at the 7-minute mark). In this presentation you will
describe your datasets, your methods, and any interesting results you found!

What to Turn In

Below is a list of items you are required to turn in via Canvas. Please make
sure all documents are named as described below.

• report.pdf – Your report of at most 10 pages, in PDF format. Do not use a
very small or very large font. No specific formatting is required, but use
common sense.

• presentation.pptx or presentation.key – Your presentation slides as either a
PowerPoint or a Keynote document.

• code.zip – A zip file with all of the code you have written. Within the folder
there should be a file called README.txt that contains instructions on how to
run your code, and a Python file called main.py that produces all figures and
plots in your report/presentation. I should be able to reproduce your results
easily.

• data.zip – A zip file that contains the two datasets you have chosen.

Grading

You are being scored on your analysis more than anything else. Roughly
speaking, implementing everything and getting it to run is worth very little for
this assignment. Of course, analysis without proof of working code makes the
analysis suspect. The key thing is that your explanations should be both
thorough and concise, and your analysis should prove to me that you have a
deep understanding of the machine learning process and the machine learning
algorithms you are using.
