/***********************************************************************************************************/ /***********************************************************************************************************/ *This do-file reads, processes and merges data from CapitalIQ and Compustat * *Input *a) CapitalIQ data via WRDS (for 2002-2018 data). Note: WRDS does not have 2019 data as of October 2020 *b) CapitalIQ vis online interface (for 2019 data). Note: Online interface only contains last 10 years of data *c) Compustat data via WRDS *d) CapitalIQ/Compustat link from CapitalIQ *e) GDP data from FRED *f) LIBOR annual data /***********************************************************************************************************/ /***********************************************************************************************************/ /****************************************************************/ /********** 0. Intros ***************************************/ /****************************************************************/ set more off global path "C:\Users\tobia\Dropbox\6_WIP Papers\Trends in Corporate Borrowing" /****************************************************************/ /********** 1. Read various innput files **********************/ /****************************************************************/ /*1a) Capital IQ from WRDS (2002-208/2019)*/ use "$path/0_Data/CapitalIQ/Capital IQ - Capital Structure Summary (2002_2019)_20200621.dta", clear *Keep only annnual reports and latest for financial period keep if periodtypename == "Annual" /*22,564,229 obs. deleted */ drop if latestforfinancialperiodflag == 0 /*1,398,379 obs. deleted */ *Sample period gen year = year(periodenddate) gen month = month(periodenddate) keep if year >= 2002 & year <= 2019 *For multiple annual reports within a year: Rule 1) keep latest periodenddate (affects <1% of firm-year observations), Rule 2) keep latest filingdate (affects <0.01% of firm-year observations) gsort companyid year -periodenddate -filingdate duplicates drop companyid year, force *Keep only relevant variables keep companyid year periodenddate filingdate outstandingbalrrevolvingcredit srbondsandnotes undrawncrdtportionrevolvingcrdt undrawncreditportionoftermloans generalotherborrowings outstandingbaltermloans totoutstbalcommercialpaper outstandingbalcapitalleases subordinatedbondsandnotes totadjustments order companyid year periodenddate filingdate outstandingbalrrevolvingcredit srbondsandnotes undrawncrdtportionrevolvingcrdt undrawncreditportionoftermloans generalotherborrowings outstandingbaltermloans totoutstbalcommercialpaper outstandingbalcapitalleases subordinatedbondsandnotes totadjustments *Reformat variables + set to zero if missing foreach var of varlist outstandingbalrrevolvingcredit-totadjustments { destring `var', force replace replace `var' = 0 if `var' == . } *Source = WRDS gen wrds = 1 save "$path/2_IntermediateFiles/CapitalIQ_WRDS.dta", replace /*1b) Capital IQ from online version (2019) - USA*/ import excel using "$path/0_Data/CapitalIQ/USA 2019 public.xls", sheet("Screening") cellrange(A8) firstrow clear *Rename to make names consistent with WRDS version rename ExcelCompanyID companyid replace companyid = subinstr(companyid, "IQ", "", .) destring companyid, force replace rename PeriodDateBalanceSheetCY2 periodenddate destring periodenddate, force replace replace periodenddate = periodenddate - 21916 format periodenddate %td rename FilingDateBalanceSheetCY2 filingdate destring filingdate, force replace replace filingdate = filingdate - 21916 format filingdate %td rename TotalRevolvingCreditCY2019 outstandingbalrrevolvingcredit rename TotalSrBondsandNotesCY20 srbondsandnotes rename UndrawnRevolvingCreditCY201 undrawncrdtportionrevolvingcrdt rename UndrawnTermLoansCY2019U undrawncreditportionoftermloans rename GeneralOtherBorrowingsCY201 generalotherborrowings rename TotalTermLoansCY2019USD outstandingbaltermloans rename TotalCommercialPaperCY2019 totoutstbalcommercialpaper rename TotalLeasesCY2019USDmm outstandingbalcapitalleases rename TotalSubBondsandNotesCY2 subordinatedbondsandnotes rename TotalAdjustmentsCY2019US totadjustments *Sample period gen year = year(periodenddate) gen month = month(periodenddate) keep if year == 2019 *For multiple annual reports within a year: Rule 1) keep latest periodenddate (affects <1% of firm-year observations), Rule 2) keep latest filingdate (affects <0.01% of firm-year observations) gsort companyid year -periodenddate -filingdate duplicates drop companyid year, force *Keep only relevant variables keep companyid year periodenddate filingdate outstandingbalrrevolvingcredit srbondsandnotes undrawncrdtportionrevolvingcrdt undrawncreditportionoftermloans generalotherborrowings outstandingbaltermloans totoutstbalcommercialpaper outstandingbalcapitalleases subordinatedbondsandnotes totadjustments order companyid year periodenddate filingdate outstandingbalrrevolvingcredit srbondsandnotes undrawncrdtportionrevolvingcrdt undrawncreditportionoftermloans generalotherborrowings outstandingbaltermloans totoutstbalcommercialpaper outstandingbalcapitalleases subordinatedbondsandnotes totadjustments *Reformat variables + set to zero if missing foreach var of varlist outstandingbalrrevolvingcredit-totadjustments { destring `var', force replace replace `var' = 0 if `var' == . } *Source = Online (=not WRDS) gen wrds = 0 save "$path/2_IntermediateFiles/CapitalIQ_ONLINE_US.dta", replace /*1c) Capital IQ from online version (2019) - Europe*/ import excel using "$path/0_Data/CapitalIQ/EU 2019 public.xls", sheet("Screening") cellrange(A8) firstrow clear *Rename to make names consistent with WRDS version rename ExcelCompanyID companyid replace companyid = subinstr(companyid, "IQ", "", .) destring companyid, force replace rename PeriodDateBalanceSheetCY2 periodenddate destring periodenddate, force replace replace periodenddate = periodenddate - 21916 format periodenddate %td rename FilingDateBalanceSheetCY2 filingdate destring filingdate, force replace replace filingdate = filingdate - 21916 format filingdate %td rename TotalRevolvingCreditFY2019 outstandingbalrrevolvingcredit rename TotalSrBondsandNotesFY20 srbondsandnotes rename UndrawnRevolvingCreditFY201 undrawncrdtportionrevolvingcrdt rename UndrawnTermLoansFY2019U undrawncreditportionoftermloans rename GeneralOtherBorrowingsFY201 generalotherborrowings rename TotalTermLoansFY2019USD outstandingbaltermloans rename TotalCommercialPaperFY2019 totoutstbalcommercialpaper rename TotalLeasesFY2019USDmm outstandingbalcapitalleases rename TotalSubBondsandNotesFY2 subordinatedbondsandnotes rename TotalAdjustmentsFY2019US totadjustments *Sample period gen year = year(periodenddate) gen month = month(periodenddate) keep if year == 2019 keep if year == 2019 *For multiple annual reports within a year: Rule 1) keep latest periodenddate (affects <1% of firm-year observations), Rule 2) keep latest filingdate (affects <0.01% of firm-year observations) gsort companyid year -periodenddate -filingdate duplicates drop companyid year, force *Keep only relevant variables keep companyid year periodenddate filingdate outstandingbalrrevolvingcredit srbondsandnotes undrawncrdtportionrevolvingcrdt undrawncreditportionoftermloans generalotherborrowings outstandingbaltermloans totoutstbalcommercialpaper outstandingbalcapitalleases subordinatedbondsandnotes totadjustments order companyid year periodenddate filingdate outstandingbalrrevolvingcredit srbondsandnotes undrawncrdtportionrevolvingcrdt undrawncreditportionoftermloans generalotherborrowings outstandingbaltermloans totoutstbalcommercialpaper outstandingbalcapitalleases subordinatedbondsandnotes totadjustments *Reformat variables + set to zero if missing foreach var of varlist outstandingbalrrevolvingcredit-totadjustments { destring `var', force replace replace `var' = 0 if `var' == . } *Source = Online (=not WRDS) gen wrds = 0 save "$path/2_IntermediateFiles/CapitalIQ_ONLINE_Europe.dta", replace *1d) Linktable CapitalIQ -- gvkey use "$path/0_Data/CapitalIQ/CapitalIQ_Linktable_20200621.dta", clear sort companyid save "$path/2_IntermediateFiles/Linktable_CapIQ_Compustat.dta", replace /*1e) Industry SIC codes from Compustat file */ use "$path/0_Data/Compustat/1a3d113b8494fafb.dta", clear *Keep only relevant variables keep gvkey fyear indfmt fic sic at dt xint che dlc dltt *Sample period and minimum restrictions keep if fyear >=2002 & fyear <= 2019 keep if at != . & at > 0 *SIC - reformat as number destring sic, replace *Remove duplicate gvkey-fyear observations bys gvkey fyear: egen INDL = max(indfmt == "INDL") drop if INDL == 1 & indfmt == "FS" drop indfmt INDL *Rename for merge later on rename fyear year sort gvkey year compress save "$path/2_IntermediateFiles/SIC.dta", replace /*1f) US Nominal GDP data (annualy, in USD bn) from the St. Louis Federal Reserve Bank database (FRED) */ import excel "$path/0_Data/FRED/GDP.xls", sheet("FRED Graph") firstrow clear keep if year >= 2002 & year <=2019 replace GDP = GDP * 1000 /*scaling to USD mn*/ sort year save "$path/2_IntermediateFiles/GDP.dta", replace /*1g) Euro Area Nominal GDP data (annually, in EUR bn), from Eurostat*/ import delimited using "$path\0_Data\ECB\data(1).csv", rowrange(3:) delimiter(";") varnames(3) clear destring ,dpcomma force replace rename v1 year keep if year >= 2002 & year <= 2019 egen GDP_Euro12 = rowtotal(austria belgium germany spain finland france greece ireland italy luxembourg netherlands portugal) keep year GDP_Euro12 save "$path/2_IntermediateFiles/GDP_Europe.dta", replace /*1h) 3m LIBOR annual data*/ import excel "$path/0_Data/LIBOR/3m LIBOR annual.xls", sheet("FRED Graph") firstrow clear keep if year != . rename USD3MTD156N LIBOR rename EUR3MTD156N LIBOR_EUR replace LIBOR_EUR = LIBOR_EUR/100 save "$path/2_IntermediateFiles/LIBOR.dta" /*******************************************************************************/ /**** 2. Create master data by merging the adjusted data sets ****************/ /******************************************************************************/ *2a) Capital IQ from WRDS use "$path/2_IntermediateFiles/CapitalIQ_WRDS.dta", clear *2b) Capital IQ from online version (US and Europe) append using "$path/2_IntermediateFiles/CapitalIQ_ONLINE_US.dta" append using "$path/2_IntermediateFiles/CapitalIQ_ONLINE_Europe.dta" *Drop 2019 observations from WRDS if 2019 available in online version duplicates tag companyid year, gen(tag) drop if year == 2019 & tag == 1 & wrds == 1 drop tag /*2c) Merge with CapitalIQ-Compustat Linktable (joins via gvkey)*/ joinby companyid using "$path/2_IntermediateFiles/Linktable_CapIQ_Compustat.dta", unmatched (none) /*Note: drops companyids that are not in the linking table*/ drop if periodenddate < startdate & startdate != . drop if periodenddate > enddate & enddate != . /*Note: These drops make data set unique on companid/year-level*/ gsort gvkey year -periodenddate duplicates drop gvkey year, force *2d) Merge with SIC codes from Compustat sort gvkey year merge gvkey year using "$path/2_IntermediateFiles/SIC.dta" *Keep only firms with available CapitalIQ and Compustat data drop if inlist(_merge, 1, 2) drop _merge *Drop financials and utilities drop if inrange(sic, 6000, 6999) /*drop financials*/ drop if inrange(sic, 4900, 4949) /*drop utilities */ *2e) Merge with nominal GDP data sort year merge m:1 year using "$path/2_IntermediateFiles/GDP.dta" keep if _merge == 3 /*Sanity check: should not delete any observations*/ drop _merge sort year merge m:1 year using "$path/2_IntermediateFiles/GDP_Europe.dta" keep if _merge == 3 drop _merge *2f) Merge with LIBOR data joinby year using "$path/2_IntermediateFiles/LIBOR.dta", unmatched (none) *2g) Checks + Save data set duplicates drop companyid year, force /*Sanity check: there should be no duplicates on companyid/year-level*/ *2h) Calculate variables *Total debt gen total_debt = outstandingbalrrevolvingcredit + srbondsandnotes + generalotherborrowings + outstandingbaltermloans + totoutstbalcommercialpaper + outstandingbalcapitalleases + totadjustments + subordinatedbondsandnotes drop if total_debt < 0 /*drops < 0.01% of observations*/ compress save "$path/2_IntermediateFiles/Master_CapitalIQ_Compustat.dta", replace