In this Module 3 Discussion, we will discuss how to use R to conduct logistic regression on datasets with a binary dependent variable. Please answer the questions by filling in the blanks on the right-hand side of the table.
To answer these questions, please go over the four examples (Examples 1, 2, 3, and 4) in Data Mining and Business Analytics with R Chapter 7 and Data Mining for Business Analytics: Concepts, Techniques, and Applications in R Chapter 10 (all found in this week’s Readings & Resources), and then fill in the blanks in the above table with the R functions we can use to handle those specific steps. You may also consult open resources to find relevant answers.
What are the percentages of the training and test sets in those examples?
What percentages do you think will generate better prediction outcomes and why?
What qualifies as a good “lift curve”?
Do you think the “lift curve” for Example 3 is a good one? Explain why or why not.
What are the R functions we use to conduct logistic regression?
What are the automated variable selection heuristics we can use for optimal model selection in logistic and multiple regression? Please also list as many relevant R functions as you can.
Can you show how to calculate the accuracy rate (the proportion of correct predictions) from Table 10.8 of the reference textbook (Ch. 10)?
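As a starting point for the questions above, here is a minimal R sketch of the general workflow: split the data, fit a logistic regression, and compute the accuracy rate from a confusion matrix. It is an illustration only. The built-in `mtcars` data, the 60/40 split, the single predictor `wt`, and the 0.5 cutoff are all assumptions chosen for the example and are not taken from the textbook examples or from Table 10.8.

```r
# Illustrative sketch: dataset, split ratio, predictor, and cutoff are assumptions.
set.seed(1)                                  # make the random split reproducible

n         <- nrow(mtcars)                    # 32 observations
train_idx <- sample(n, size = round(0.6 * n))
train     <- mtcars[train_idx, ]             # 60% training set
test      <- mtcars[-train_idx, ]            # 40% test set

# Logistic regression: binary response am (0 = automatic, 1 = manual)
fit <- glm(am ~ wt, data = train, family = binomial)

# Predicted probabilities on the test set, classified with a 0.5 cutoff
prob <- predict(fit, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

# Confusion matrix; fixed factor levels keep it 2 x 2 even if a class is absent
conf_mat <- table(
  Predicted = factor(pred, levels = c(0, 1)),
  Actual    = factor(test$am, levels = c(0, 1))
)

# Accuracy = correct predictions (the diagonal) / all predictions
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The same accuracy calculation applies to any 2 x 2 classification table: add the two diagonal counts and divide by the total number of cases.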
In your responses to other students, suggest changes that you think would strengthen their answers, or ask clarifying questions if anything was missing or confusing.