tempData <- read.csv(url("https://laurencipriano.github.io/IveyBusinessStatistics/Datasets/SubscribeBank.csv"), header = TRUE)
summary(tempData)
>      age            job                marital            education
> Min.   :18.0    Length:45211       Length:45211       Length:45211
> 1st Qu.:33.0    Class :character   Class :character   Class :character
> Median :39.0    Mode  :character   Mode  :character   Mode  :character
> Mean   :40.9
> 3rd Qu.:48.0
> Max.   :95.0
>   default            balance           housing            loan
> Length:45211       Min.   : -8019   Length:45211       Length:45211
> Class :character   1st Qu.:    72   Class :character   Class :character
> Mode  :character   Median :   448   Mode  :character   Mode  :character
>                    Mean   :  1362
>                    3rd Qu.:  1428
>                    Max.   :102127
>   contact              day           month              duration
> Length:45211       Min.   : 1.0   Length:45211       Min.   :   0
> Class :character   1st Qu.: 8.0   Class :character   1st Qu.: 103
> Mode  :character   Median :16.0   Mode  :character   Median : 180
>                    Mean   :15.8                      Mean   : 258
>                    3rd Qu.:21.0                      3rd Qu.: 319
>                    Max.   :31.0                      Max.   :4918
>    campaign         pdays         previous        poutcome
> Min.   : 1.0   Min.   : -1   Min.   :  0.0   Length:45211
> 1st Qu.: 1.0   1st Qu.: -1   1st Qu.:  0.0   Class :character
> Median : 2.0   Median : -1   Median :  0.0   Mode  :character
> Mean   : 2.8   Mean   : 40   Mean   :  0.6
> 3rd Qu.: 3.0   3rd Qu.: -1   3rd Qu.:  0.0
> Max.   :63.0   Max.   :871   Max.   :275.0
>  subscribe
> Length:45211
> Class :character
> Mode  :character
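For the character columns, `summary()` reports only the length, class, and mode, so a frequency table is often more informative. The following is a minimal sketch of that idea. It uses a small made-up data frame (`toyData`, a hypothetical name with invented values, not rows from the real file) that borrows two column names from the output above; with the real data one would presumably substitute `tempData`.

```r
# Hypothetical toy rows that mimic two character columns of the dataset;
# the values are invented for illustration only.
toyData <- data.frame(
  marital   = c("married", "single", "married", "divorced", "single", "married"),
  subscribe = c("no", "no", "yes", "no", "yes", "no"),
  stringsAsFactors = FALSE
)

# Counts per category for one character column
maritalCounts <- table(toyData$marital)
print(maritalCounts)

# Share of each outcome, as proportions that sum to 1
subscribeShare <- prop.table(table(toyData$subscribe))
print(round(subscribeShare, 3))
```

The same two calls can be applied to any of the categorical columns, such as `job` or `poutcome`.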
The data relate to the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact with the same client was required in order to assess whether the product (a bank term deposit) would be subscribed to or not.
Variable | Description |
---|---|
Dependent variable | |
subscribe | Has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’) |
Predictor variables | |
age | Age of the client (numeric) |
job | Type of job (categorical: ‘admin.’, ‘unknown’, ‘unemployed’, ‘management’, ‘housemaid’, ‘entrepreneur’, ‘student’, ‘blue-collar’, ‘self-employed’, ‘retired’, ‘technician’, ‘services’) |
marital | Marital status (categorical: ‘married’, ‘divorced’, ‘single’; ‘divorced’ includes divorced or widowed) |
education | Education level (categorical: ‘unknown’, ‘secondary’, ‘primary’, ‘tertiary’) |
default | Has credit in default? (binary: ‘yes’, ‘no’) |
balance | Average yearly balance, in euros (numeric) |
housing | Has housing loan? (binary: ‘yes’, ‘no’) |
loan | Has personal loan? (binary: ‘yes’, ‘no’) |
contact | Contact communication type (categorical: ‘unknown’, ‘telephone’, ‘cellular’) |
day | Last contact day of the month (numeric) |
month | Last contact month of year (categorical: ‘jan’, ‘feb’, …, ‘nov’, ‘dec’) |
duration | Last contact duration, in seconds (numeric) |
campaign | Number of contacts during this campaign (numeric, includes last contact) |
pdays | Number of days since last contact from a previous campaign (numeric, -1 if not previously contacted) |
previous | Number of contacts before this campaign (numeric) |
poutcome | Outcome of the previous marketing campaign (categorical: ‘unknown’, ‘other’, ‘failure’, ‘success’) |
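Two points in the table matter when preparing the data: the categorical variables are read in as plain character strings, and `pdays` uses -1 as a code for "not previously contacted" rather than as a number of days, so statistics such as its mean mix the two meanings. The sketch below shows one possible way to handle both. It runs on a made-up data frame (`toyData`, invented values), and the new column names `prevContacted` and `pdaysClean` are hypothetical choices for illustration, not part of the dataset.

```r
# Hypothetical toy rows using column names from the table above;
# the values are invented for illustration only.
toyData <- data.frame(
  job       = c("admin.", "technician", "unknown", "retired"),
  pdays     = c(-1, 92, -1, 180),
  subscribe = c("no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Convert every character column to a factor so that modelling
# functions treat it as categorical
isChar <- vapply(toyData, is.character, logical(1))
toyData[isChar] <- lapply(toyData[isChar], factor)

# pdays uses -1 as a sentinel for "not previously contacted";
# keep that information as a flag and replace the sentinel with NA
toyData$prevContacted <- toyData$pdays != -1
toyData$pdaysClean <- ifelse(toyData$pdays == -1, NA, toyData$pdays)

str(toyData)
```

Whether to recode the sentinel as `NA`, keep a separate flag, or do both depends on the analysis; this is only one reasonable option.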
This dataset is publicly available for research. The details are described in [Moro et al., 2011]. Please include this citation if you plan to use this dataset:
[Moro et al., 2011] S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM’2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.
Available at: [pdf] http://hdl.handle.net/1822/14838 [bib] http://www3.dsi.uminho.pt/pcortez/bib/2011-esm-1.txt
Title: Bank Marketing
Sources: Created by Paulo Cortez (Univ. Minho) and Sérgio Moro (ISCTE-IUL) @ 2012
Past Usage:
The full dataset was described and analyzed in [Moro et al., 2011].
Relevant Information:
There are two datasets: bank-full.csv (45211 instances) and bank.csv (4521 instances).
The classification goal is to predict whether the client will subscribe to a term deposit (variable y, which corresponds to subscribe in the table above).
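As a rough illustration of the classification goal, the sketch below fits a logistic regression with `glm()`. To stay self-contained it runs on simulated stand-in data, not the bank data: the data frame `simData`, the coefficients used to generate outcomes, and the 0.5 cutoff are all assumptions made for this example. Only the column names `balance`, `housing`, and `subscribe` are borrowed from the variable table.

```r
set.seed(1)  # reproducible simulated data

# Simulated stand-in data (NOT the bank data): n hypothetical clients
# with two predictors named after columns in the dataset
n <- 500
simData <- data.frame(
  balance = round(rnorm(n, mean = 1400, sd = 3000)),
  housing = factor(sample(c("yes", "no"), n, replace = TRUE))
)

# Assumed, made-up relationship used only to generate outcomes
logOdds <- -1.5 + 0.0004 * simData$balance - 0.5 * (simData$housing == "yes")
simData$subscribe <- factor(
  ifelse(runif(n) < plogis(logOdds), "yes", "no"),
  levels = c("no", "yes")
)

# Logistic regression: with a factor response, glm() models the
# probability of the second level ("yes")
fit <- glm(subscribe ~ balance + housing, data = simData, family = binomial)

# Predicted subscription probabilities and a 0.5 cutoff classification
simData$pHat <- predict(fit, type = "response")
simData$predicted <- ifelse(simData$pHat > 0.5, "yes", "no")

# Confusion table of actual versus predicted outcome
print(table(actual = simData$subscribe, predicted = simData$predicted))
```

With the real data, the formula would presumably include more of the predictors listed in the table, and the cutoff would be chosen to suit the cost of contacting a client who does not subscribe.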