Direct Marketing dataset

tempData <- read.csv(url("https://laurencipriano.github.io/IveyBusinessStatistics/Datasets/SubscribeBank.csv"), header = TRUE)

summary(tempData)
>       age           job              marital           education        
>  Min.   :18.0   Length:45211       Length:45211       Length:45211      
>  1st Qu.:33.0   Class :character   Class :character   Class :character  
>  Median :39.0   Mode  :character   Mode  :character   Mode  :character  
>  Mean   :40.9                                                           
>  3rd Qu.:48.0                                                           
>  Max.   :95.0                                                           
>    default             balance         housing              loan          
>  Length:45211       Min.   : -8019   Length:45211       Length:45211      
>  Class :character   1st Qu.:    72   Class :character   Class :character  
>  Mode  :character   Median :   448   Mode  :character   Mode  :character  
>                     Mean   :  1362                                        
>                     3rd Qu.:  1428                                        
>                     Max.   :102127                                        
>    contact               day          month              duration   
>  Length:45211       Min.   : 1.0   Length:45211       Min.   :   0  
>  Class :character   1st Qu.: 8.0   Class :character   1st Qu.: 103  
>  Mode  :character   Median :16.0   Mode  :character   Median : 180  
>                     Mean   :15.8                      Mean   : 258  
>                     3rd Qu.:21.0                      3rd Qu.: 319  
>                     Max.   :31.0                      Max.   :4918  
>     campaign        pdays        previous       poutcome        
>  Min.   : 1.0   Min.   : -1   Min.   :  0.0   Length:45211      
>  1st Qu.: 1.0   1st Qu.: -1   1st Qu.:  0.0   Class :character  
>  Median : 2.0   Median : -1   Median :  0.0   Mode  :character  
>  Mean   : 2.8   Mean   : 40   Mean   :  0.6                     
>  3rd Qu.: 3.0   3rd Qu.: -1   3rd Qu.:  0.0                     
>  Max.   :63.0   Max.   :871   Max.   :275.0                     
>   subscribe        
>  Length:45211      
>  Class :character  
>  Mode  :character  
>                    
>                    
> 
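
The summary above reports only Length, Class, and Mode for the text columns because read.csv() imported them as character vectors. A minimal sketch of converting such columns to factors, so that summary() shows a count per level, is given below. The data frame toyData and its values are made up for illustration; with the real data the same lapply() step would presumably be applied to tempData.

```r
# Hypothetical toy rows standing in for tempData (values are illustrative only)
toyData <- data.frame(
  age       = c(30, 45, 52, 38),
  marital   = c("married", "single", "married", "divorced"),
  subscribe = c("no", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Identify the character columns and convert each one to a factor
isChar <- sapply(toyData, is.character)
toyData[isChar] <- lapply(toyData[isChar], factor)

# summary() now reports counts per level instead of Length/Class/Mode
summary(toyData)
```

Numeric columns such as age are left untouched, so their five-number summaries are unchanged.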

Summary of dataset

The data relate to the direct marketing campaigns of a Portuguese banking institution. The campaigns were conducted by phone. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed to.

Data dictionary

List of Variables and Definitions
Variable    Description
Dependent variable
subscribe   Has the client subscribed to a term deposit? (binary: ‘yes’, ‘no’)
Predictor variables
age         Age of the client (numeric)
job         Type of job (categorical: ‘admin.’, ‘unknown’, ‘unemployed’, ‘management’, ‘housemaid’, ‘entrepreneur’, ‘student’, ‘blue-collar’, ‘self-employed’, ‘retired’, ‘technician’, ‘services’)
marital     Marital status (categorical: ‘married’, ‘divorced’, ‘single’; ‘divorced’ includes divorced or widowed)
education   Education level (categorical: ‘unknown’, ‘secondary’, ‘primary’, ‘tertiary’)
default     Has credit in default? (binary: ‘yes’, ‘no’)
balance     Average yearly balance, in euros (numeric)
housing     Has a housing loan? (binary: ‘yes’, ‘no’)
loan        Has a personal loan? (binary: ‘yes’, ‘no’)
contact     Contact communication type (categorical: ‘unknown’, ‘telephone’, ‘cellular’)
day         Day of the month of the last contact (numeric)
month       Month of the year of the last contact (categorical: ‘jan’, ‘feb’, …, ‘nov’, ‘dec’)
duration    Duration of the last contact, in seconds (numeric)
campaign    Number of contacts during this campaign (numeric, includes last contact)
pdays       Number of days since the last contact from a previous campaign (numeric, -1 if not previously contacted)
previous    Number of contacts before this campaign (numeric)
poutcome    Outcome of the previous marketing campaign (categorical: ‘unknown’, ‘other’, ‘failure’, ‘success’)
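
Two codings in the dictionary above deserve care before modelling: pdays uses -1 as a sentinel rather than a real number of days (which is why its mean of 40 in the summary output sits far above its median of -1), and subscribe is stored as text. One possible way to handle both is sketched below. The toy data frame and the new column names (prevContacted, pdaysClean, subscribeNum) are assumptions for illustration and are not part of the dataset.

```r
# Hypothetical toy rows (values are illustrative only)
toyData <- data.frame(
  pdays     = c(-1, 92, -1, 180),
  subscribe = c("no", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Flag clients who were contacted in a previous campaign
toyData$prevContacted <- toyData$pdays != -1

# Replace the -1 sentinel with NA so it does not distort averages
toyData$pdaysClean <- ifelse(toyData$pdays == -1, NA, toyData$pdays)

# Code the outcome as 1 for "yes" and 0 for "no"
toyData$subscribeNum <- as.integer(toyData$subscribe == "yes")

# Mean days since last contact, among previously contacted clients only
mean(toyData$pdaysClean, na.rm = TRUE)
```

Keeping the indicator and the cleaned day count as separate columns lets a model distinguish "never contacted" from "contacted long ago".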

Dataset access details

This dataset is publicly available for research. The details are described in [Moro et al., 2011]. Please include this citation if you plan to use this dataset:

[Moro et al., 2011] S. Moro, R. Laureano and P. Cortez. Using Data Mining for Bank Direct Marketing: An Application of the CRISP-DM Methodology. In P. Novais et al. (Eds.), Proceedings of the European Simulation and Modelling Conference - ESM’2011, pp. 117-121, Guimarães, Portugal, October, 2011. EUROSIS.

Available at: [pdf] http://hdl.handle.net/1822/14838 [bib] http://www3.dsi.uminho.pt/pcortez/bib/2011-esm-1.txt

  1. Title: Bank Marketing

  2. Sources: Created by Paulo Cortez (Univ. Minho) and Sérgio Moro (ISCTE-IUL) @ 2012

  3. Past Usage:

The full dataset was described and analyzed in [Moro et al., 2011].

  4. Relevant Information:

    The data relate to the direct marketing campaigns of a Portuguese banking institution. The campaigns were conducted by phone. Often, more than one contact with the same client was required to assess whether the product (a bank term deposit) would be subscribed to.

    There are two datasets:

    1. bank-full.csv with all examples, ordered by date (from May 2008 to November 2010).
    2. bank.csv with 10% of the examples (4521), randomly selected from bank-full.csv. The smaller dataset is provided to test more computationally demanding machine learning algorithms (e.g. SVM).
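
A 10% random subset of the kind described for bank.csv could be drawn along the following lines. This is only a sketch: the seed and the placeholder data frame fullData are assumptions, and the original authors' exact sampling procedure is not documented here.

```r
# Placeholder with the same number of rows as bank-full.csv
fullData <- data.frame(id = seq_len(45211))

set.seed(1)  # arbitrary seed, chosen only for reproducibility

# Sample 10% of the row indices without replacement
nSmall    <- floor(0.10 * nrow(fullData))
rowsKept  <- sample(nrow(fullData), size = nSmall)
smallData <- fullData[rowsKept, , drop = FALSE]

nrow(smallData)
```

Rounding 10% of 45211 down gives the 4521 rows quoted for bank.csv.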

    The classification goal is to predict whether the client will subscribe to a term deposit (variable y in the original files; named subscribe in the data loaded above).
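
Because the target is binary, logistic regression is one common starting point for a classification goal of this kind. The sketch below uses simulated data, not the bank data, purely to show the shape of such a model; the choice of balance as the single predictor, the coefficients, and the seed are all assumptions.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Simulated stand-in data (NOT the bank data): higher balance makes "yes" more likely
n <- 500
simData <- data.frame(balance = round(runif(n, 0, 5000)))
probYes <- plogis(-3 + 0.0005 * simData$balance)
simData$subscribe <- factor(ifelse(runif(n) < probYes, "yes", "no"),
                            levels = c("no", "yes"))

# Logistic regression: with a factor response, glm() models the
# probability of the second level ("yes")
fit <- glm(subscribe ~ balance, data = simData, family = binomial)

# Predicted subscription probabilities for each simulated client
predProb <- predict(fit, type = "response")
summary(predProb)
```

With the real data, the formula would typically include several of the predictors from the data dictionary, and the model would be evaluated on held-out rows.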

  5. Number of Instances: 45211 for bank-full.csv (4521 for bank.csv)