This assignment assesses your understanding of the material covered in Module 3. In Module 3, we discussed big data in the classroom and analytics in the workshops. While big data is the “talk of the town”, the reality is that most of us deal with small to medium data sets, that is, data sets that can be processed on a single computer rather than on a “big data” setup. The more important skill is therefore the ability to apply analytics to data sets, which is what this assessment focuses on.
Case Study
The banking industry has long been at the forefront of using data analytics in a globally competitive world, for purposes such as marketing and predicting churn and loan defaults. In this assignment, you will act as the data analyst for USB Bank, a new competitor to the other banks in the country. USB rolled out one of its marketing campaigns through phone calls, in which bank clients were contacted, usually more than once, in order to assess whether they would subscribe to the product, namely a term deposit.
The data has been collected and you have been given the data set on CloudDeakin. You have also been told the following about the data set.
- This data set is multivariate with attributes that have categorical and numeric values.
- There are 17 different attributes in the data set and there are 500 cases (or instances). There are some missing values in the data set.
- The data set covers aspects such as: (a) client related information, (b) campaign related information, (c) subscription to term deposit
o The client related information describes various characteristics such as balance in bank account, age, job, marital status, education, etc.
o The campaign related information describes communication types, contacts made with the clients during this phone campaign, the last contact duration and also the history of contacts with clients in previous campaigns.
o The attribute ‘subscribe’ is the goal for classification, used to identify the likelihood of a client subscribing to a term deposit. It takes the values “yes” or “no”.
Task 1
Download “bankingdata.csv” and “banking-header-description.txt” from CloudDeakin. Produce an Orange loadable file based on your experience in processing the data in the last workshop. If you are unsure of how to do this, reflect on the activity of the last workshop and review the following link
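As a rough illustration of what an "Orange loadable file" can look like, the sketch below converts CSV text into Orange's tab-delimited layout with a three-row header (attribute names, types, flags). It is only a sketch under stated assumptions: the helper names `is_number` and `csv_to_orange_tab`, the `missing_tokens` list, the type-guessing rule and the sample rows are all hypothetical, and the real column types should come from the header description file rather than from guessing.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_orange_tab(csv_text, class_column, missing_tokens=("", "NA")):
    """Convert CSV text into tab-delimited text with a three-row header.

    Row 1: attribute names, row 2: types, row 3: flags ("class" marks the target).
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    names, data = rows[0], rows[1:]

    types, flags = [], []
    for i, name in enumerate(names):
        known = [r[i] for r in data if r[i] not in missing_tokens]
        # Assumption: a column is continuous only if every known value is numeric.
        numeric = bool(known) and all(is_number(v) for v in known)
        types.append("continuous" if numeric else "discrete")
        flags.append("class" if name == class_column else "")

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(names)
    writer.writerow(types)
    writer.writerow(flags)
    for r in data:
        # Assumption: "?" is used here as the marker for a missing value.
        writer.writerow(["?" if v in missing_tokens else v for v in r])
    return out.getvalue()


# Made-up rows for illustration only; they are not taken from bankingdata.csv.
sample = "age,job,balance,subscribe\n30,admin,1200,no\n45,,300,yes\n"
tab_text = csv_to_orange_tab(sample, class_column="subscribe")
print(tab_text)
```

Writing the result to a file with a `.tab` extension and opening it in Orange is the quickest way to check that the types and the class flag were picked up as intended.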
Task 2
Using your knowledge of classification, regression and data exploration, answer the following questions.
· Produce a model that can classify whether the client subscribes to the term deposit. That is, given a client’s information and campaign information, the classifier should produce a class label showing the likely subscription to the term deposit.
· What are the top five attributes that best determine subscription to the term deposit? In listing the top five attributes, discuss how you determined them and supplement your answer with appropriate Orange files if available. Your discussion should be about 500 words.
· Produce a model to predict the likely balance of a client, given the client’s information and campaign information.
· Can we predict the balance for clients who are “widowed” and “defacto”? If no, explain why. If yes, explain how you would do so. Discuss in about 500 words.
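One common way to rank attributes against a class label is information gain. The sketch below computes it by hand for categorical attributes so the idea is visible outside Orange. It is an illustration, not the expected answer: the function names, the `contact` attribute and the toy rows are hypothetical, and numeric attributes such as balance or age would need to be discretised before a score like this applies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in target entropy obtained by splitting rows on attribute."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_attributes(rows, target, top=5):
    """Return up to `top` attribute names, highest information gain first."""
    attributes = [a for a in rows[0] if a != target]
    scored = [(information_gain(rows, a, target), a) for a in attributes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top]]


# Made-up toy rows; attribute names and values are for illustration only.
rows = [
    {"marital": "single", "contact": "cellular", "subscribe": "yes"},
    {"marital": "married", "contact": "cellular", "subscribe": "yes"},
    {"marital": "single", "contact": "telephone", "subscribe": "no"},
    {"marital": "married", "contact": "telephone", "subscribe": "no"},
]
ranking = rank_attributes(rows, target="subscribe")
print(ranking)
```

In this toy data the hypothetical `contact` attribute separates the classes perfectly and `marital` carries no information, so `contact` is ranked first. Different scoring measures can produce different orderings, which is worth addressing in the discussion.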
Task 3
Discuss how you would ensure that the models you produced in Task 2 are reliable and accurate. Discuss this in about 500 words.
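One widely used check on reliability is k-fold cross-validation, compared against a trivial baseline. The sketch below splits the data into k folds and scores a majority-class predictor on each held-out fold. The function names and the toy labels are hypothetical, and the majority-class baseline merely stands in for whatever model is actually being evaluated.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(train_labels):
    """Baseline 'model': always predict the most frequent training label."""
    return Counter(train_labels).most_common(1)[0][0]


def cross_validated_accuracy(labels, k=5, seed=0):
    """Mean accuracy of the majority-class baseline over k held-out folds."""
    folds = k_fold_indices(len(labels), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [labels[i] for i in range(len(labels)) if i not in held_out]
        prediction = majority_class(train)
        correct = sum(1 for i in test_idx if labels[i] == prediction)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Made-up labels for illustration: 8 "no" and 2 "yes".
labels = ["no"] * 8 + ["yes"] * 2
folds = k_fold_indices(len(labels), k=5)
baseline = cross_validated_accuracy(labels, k=5)
print(baseline)
```

A classifier is only interesting if its cross-validated accuracy clearly beats this baseline. With an imbalanced class, accuracy alone can look good while the minority class is never predicted.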
Task 4
As noted in the introduction, there are missing values in the data set. Discuss what you would do with these missing values. Would you remove them, attempt to provide values for these unknowns, or attempt a combination of different techniques? Your discussion should be about 500 words.
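The two basic options named above, removing incomplete cases and filling in values, can be sketched as follows. This is a minimal illustration under assumptions: missing values are represented as `None`, numeric columns are filled with the mean and categorical columns with the most frequent value, and the function names and toy rows are hypothetical. Other imputation strategies exist and may suit the data better.

```python
from collections import Counter
from statistics import mean

MISSING = None  # Assumption: missing values have already been parsed to None.


def drop_incomplete(rows):
    """Keep only rows with no missing values."""
    return [r for r in rows if all(v is not MISSING for v in r.values())]


def impute(rows):
    """Return copies of rows with gaps filled: mean for numeric, mode otherwise."""
    filled = [dict(r) for r in rows]
    for column in rows[0]:
        known = [r[column] for r in rows if r[column] is not MISSING]
        if not known:
            continue  # Nothing to learn a fill value from.
        if all(isinstance(v, (int, float)) for v in known):
            fill = mean(known)
        else:
            fill = Counter(known).most_common(1)[0][0]
        for r in filled:
            if r[column] is MISSING:
                r[column] = fill
    return filled


# Made-up rows for illustration only.
rows = [
    {"age": 30, "job": "admin", "balance": 1000},
    {"age": None, "job": "admin", "balance": 500},
    {"age": 50, "job": None, "balance": 1500},
]
complete = drop_incomplete(rows)
filled = impute(rows)
print(len(complete), filled[1]["age"], filled[2]["job"])
```

Dropping rows here keeps only one of three cases, which shows how quickly removal can shrink a small data set, while imputation keeps every case at the cost of introducing estimated values.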
Deliverables
There are a few artifacts that you will need to submit for this assignment. Zip them into a single archive and upload your submission to CloudDeakin.
· The Orange formatted file that can be loaded into Orange without errors.
· A Word document containing any discussion of your approach to Task 2 and the written discussions for Task 3 and Task 4. Referenced material must be appropriately cited in these discussions.
· The Orange files to load and test your models in Task 2.
Marking rubrics
There are four tasks in this assignment. A grade will be awarded to each task and then an overall mark determined for the entire assessment task. The rubric below gives you an idea of what you must achieve to earn a certain ‘grade’.
As a general rule, to meet a ‘credit’, you must first satisfy the requirements of a ‘pass’. And for a ‘high distinction’, you must first satisfy the requirements of a ‘distinction’, which must of course first meet the requirements of a ‘credit’, and so on.
Your final grade will be decided based on the grades for each component in the assessment. It is not a simple average but rather a final consideration of how you performed as a whole, informed by the grades of each component.
| Component | Pass | Credit | Distinction | High Distinction |
| Task 1: Loading of data into Orange | File could load with some errors but can proceed with some intervention by marker | File loads with some errors but can proceed without intervention by the marker | File loads without errors | File loads without errors and on inspection of its formatting has all the required Orange meta-data |
| Task 2: Models (bullets 1 and 2) | Appropriate model chosen, works with minimal intervention from marker, produces sensible predictions | Good model chosen, works without marker intervention, produces good predictions | Excellent model chosen, works without intervention, produces accurate predictions and accompanied by a good relevant discussion | Excellent model chosen, works without intervention, produces accurate predictions and accompanied by excellent well-researched, thoughtful and relevant discussion |
| Task 2: Discussions (bullets 3 and 4), Tasks 3 and 4 | Correct and logical discussion to sufficiently support the answer | Correct and logical discussion that well supports the answer | Correct, logical and researched supported discussion that strongly supports the answer | Correct, logical, thoughtful and well-researched discussion that strongly supports the answer |