Enterprise Information Management Assignment

This assignment will build your understanding of the material covered in Module 3. In Module 3, we discussed big data in the classroom and analytics in the workshops. While big data is the “talk of the town”, the reality is that most of us deal with small to medium data sets, or data sets that can be crunched on a single computer rather than on a “big data” setup. The more important skill is therefore the ability to apply analytics to data sets, which is what this assessment focuses on.

Case Study

The banking industry has long been at the forefront of using data analytics in a globally competitive world, for purposes such as marketing and predicting churn and loan defaults. In this assignment, you will act as the data analyst for USB Bank, which is a new competitor to other banks in the country. USB rolled out one of its marketing campaigns through phone calls, in which bank clients were usually contacted more than once to assess whether they would subscribe to the product, a term deposit.

The data has been collected and you have been given the data set on CloudDeakin. You have also been told the following about the data set.
  •  This data set is multivariate with attributes that have categorical and numeric values.
  •  There are 17 different attributes in the data set and there are 500 cases (or instances). There are some missing values in the data set.
  •  The data set covers three aspects: (a) client-related information, (b) campaign-related information, and (c) subscription to the term deposit.

o    The client-related information describes characteristics such as bank account balance, age, job, marital status and education.

o    The campaign-related information describes communication types, contacts made with the clients during this phone call campaign, the duration of the last contact, and the history of contacts with clients in previous campaigns.

o    The attribute ‘subscribe’ is the classification target: it identifies the likelihood that a client subscribes to the term deposit. It takes the values “yes” or “no”.

Task 1
Download “bankingdata.csv” and “banking-header-description.txt” from CloudDeakin. Produce an Orange-loadable file based on your experience in processing the data in the last workshop. If you are unsure how to do this, reflect on the activity of the last workshop and review the following link.
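
As a rough sketch of what "Orange loadable" can look like: Orange's tab-delimited format uses three header rows (attribute names, attribute types, and optional flags such as the class marker). The Python below writes that header for a made-up three-column excerpt. The column names, the type choices and the helper `to_orange_tab` are illustrative assumptions, not the actual contents of bankingdata.csv, and the approach from your workshop may differ.

```python
import csv
import io

def to_orange_tab(header, rows, types, class_attr):
    """Return tab-delimited text that starts with Orange's three header rows.

    Row 1: attribute names
    Row 2: attribute types ("continuous" or "discrete" here)
    Row 3: flags ("class" marks the target attribute, blank otherwise)
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerow([types[name] for name in header])
    writer.writerow(["class" if name == class_attr else "" for name in header])
    writer.writerows(rows)
    return out.getvalue()

# Hypothetical excerpt: the real file has 17 attributes and 500 cases.
header = ["age", "marital", "subscribe"]
types = {"age": "continuous", "marital": "discrete", "subscribe": "discrete"}
rows = [["35", "married", "no"], ["41", "single", "yes"]]

tab_text = to_orange_tab(header, rows, types, class_attr="subscribe")
print(tab_text)
```

Saving `tab_text` to a file with a `.tab` extension gives Orange the type and class information explicitly, rather than leaving it to be guessed from the values.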

Task 2
Using your knowledge of classification, regression and data exploration, answer the following questions.

·         Produce a model that can classify whether a client subscribes to the term deposit. That is, given a client’s information and campaign information, the classifier should produce a class label showing the likely subscription to the term deposit.

·         What are the top five attributes that best determine subscription to the term deposit? In listing the top five attributes, discuss how you determined them and supplement your answer with appropriate Orange files if available. Your discussion should be about 500 words.

·         Produce a model to predict the likely balance of a client, given the client’s information and campaign information.

·         Can we predict the balance for clients who are “widowed” and “defacto”? If no, explain why. If yes, explain how you would do so. Discuss in about 500 words.
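
One possible way to reason about which attributes "best determine" the class is to score each attribute by information gain, that is, how much knowing the attribute reduces the entropy of ‘subscribe’. The sketch below is a minimal stdlib Python illustration on made-up records. The attribute names `contact` and `marital` and their values are assumptions for the example, and information gain is only one of several scoring methods you could justify.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction in `labels` after splitting on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Made-up toy records, not taken from bankingdata.csv.
subscribe = ["yes", "yes", "no", "no"]
contact = ["cellular", "cellular", "telephone", "telephone"]  # separates the classes
marital = ["married", "single", "married", "single"]          # tells us nothing

scores = {
    "contact": information_gain(contact, subscribe),
    "marital": information_gain(marital, subscribe),
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

Numeric attributes such as balance would need to be discretised before a score like this applies, which is itself a choice worth discussing.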

Task 3
Discuss how you would ensure that the models you produced in Task 2 are reliable and accurate. Discuss this in about 500 words.
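
One common way to check reliability is k-fold cross-validation: every case is held out exactly once, and the model is scored only on cases it did not see during training. The sketch below is a stdlib Python illustration under stated assumptions. It uses made-up labels and a deliberately trivial "predict the majority class" baseline, and the helper names are hypothetical. A useful classifier should clearly beat such a baseline on the same folds.

```python
from collections import Counter

def k_fold_indices(n, k):
    """Yield (train, test) index lists so each index is tested exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def majority_baseline_accuracy(labels, k=5):
    """Mean accuracy of a majority-class baseline under k-fold cross-validation."""
    accuracies = []
    for train, test in k_fold_indices(len(labels), k):
        # "Train": find the most frequent label in the training folds only.
        majority = Counter(labels[j] for j in train).most_common(1)[0][0]
        correct = sum(labels[j] == majority for j in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)

# Made-up labels: eight "no" and two "yes".
labels = ["no"] * 8 + ["yes"] * 2
baseline = majority_baseline_accuracy(labels, k=5)
print(baseline)
```

With class labels this skewed, plain accuracy can look high even for a model that never predicts “yes”, so measures such as precision and recall are worth reporting alongside it.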

Task 4
As noted in the introduction, there are missing values in the data set. Discuss what you would do with these missing values. Do you remove them, attempt to provide values to these unknowns, or attempt a combination of different techniques? Your discussion should be in about 500 words.
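
To make the options concrete, the sketch below contrasts two simple strategies on made-up records: dropping every row that has a missing value, and imputing numeric gaps with the column mean and categorical gaps with the most frequent value. It is a minimal stdlib Python illustration. Missing values are represented as `None` by assumption, and the job value "technician" and the helper names are hypothetical.

```python
from statistics import mean, mode

def drop_incomplete(rows):
    """Listwise deletion: keep only rows with no missing (None) values."""
    return [r for r in rows if None not in r.values()]

def impute(rows, numeric, categorical):
    """Fill numeric gaps with the column mean, categorical gaps with the mode."""
    fills = {c: mean(r[c] for r in rows if r[c] is not None) for c in numeric}
    fills.update({c: mode(r[c] for r in rows if r[c] is not None)
                  for c in categorical})
    # Build new dicts so the original rows are left untouched.
    return [{c: fills[c] if v is None and c in fills else v
             for c, v in r.items()} for r in rows]

# Made-up records, not taken from bankingdata.csv.
rows = [
    {"balance": 1200, "job": "technician"},
    {"balance": None, "job": "technician"},
    {"balance": 800, "job": None},
]

complete = drop_incomplete(rows)
filled = impute(rows, numeric=["balance"], categorical=["job"])
print(len(complete), filled)
```

Here deletion keeps one of three rows while imputation keeps all three, which illustrates the trade-off between losing cases and introducing estimated values.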

Deliverables
There are a few artifacts that you will need to submit for this assignment. Zip them into a single archive and upload your submission on CloudDeakin.

·         The Orange formatted file that can be loaded into Orange without errors.

·         A Word document containing any discussion of your approach to Task 2, together with the written discussions for Task 3 and Task 4. Apply appropriate referencing and citation of any sourced material in these discussions.

·         The Orange files to load and test your models in Task 2.


Marking rubrics
There are four tasks in this assignment. A grade will be awarded to each task and then an overall mark determined for the entire assessment task. The rubric below gives you an idea of what you must achieve to earn a certain ‘grade’.

As a general rule, to meet a ‘credit’, you must first satisfy the requirements of a ‘pass’. For a ‘high distinction’, you must first satisfy the requirements of a ‘distinction’, which must of course first meet the requirements of a ‘credit’, and so on.

Your final grade will be decided based on the grades for each component in the assessment. It is not a simple average, but rather a final consideration of how you performed as a whole, informed by the grades of each component.



Task 1: Loading of data into Orange
·         Pass: File could load with some errors but can proceed with some intervention by marker
·         Credit: File loads with some errors but can proceed without intervention by the marker
·         Distinction: File loads without errors
·         High Distinction: File loads without errors and on inspection of its formatting has all the required Orange meta-data

Task 2: Models (bullets 1 and 2)
·         Pass: Appropriate model chosen, works with minimal intervention from marker, produces sensible predictions
·         Credit: Good model chosen, works without marker intervention, produces good predictions
·         Distinction: Excellent model chosen, works without intervention, produces accurate predictions and accompanied by a good relevant discussion
·         High Distinction: Excellent model chosen, works without intervention, produces accurate predictions and accompanied by excellent well-researched, thoughtful and relevant discussion

Task 2: Discussions (bullets 3 and 4), Tasks 3 and 4
·         Pass: Correct and logical discussion to sufficiently support the answer
·         Credit: Correct and logical discussion that well supports the answer
·         Distinction: Correct, logical and researched supported discussion that strongly supports the answer
·         High Distinction: Correct, logical, thoughtful and well-researched discussion that strongly supports the answer
