*****************************************
* Cluster Analysis Paper ****************
* MMP DATA PREP, 2009 *******************
* Data created: October 4, 2009 *********
* Last update: December 21, 2009 12pm ***
*****************************************

* NOTE - The code here is the same as mmp_cluster_data9_final.do ********************
* The only difference is that we are keeping migrants and non-migrants in ***********
* the sample on the survey year. We are essentially comparing those who've migrated *
* at least once to those who have never migrated. ***********************************
* We also correct the computation of hh migrants in the sample **********************
*************************************************************************************

* IMPORTANT DECISIONS *
*************************

* 1. SAMPLE - We start with the 124 communities and eliminate the 4 interviewed in 1982 (Kandel's
*    recommendation). We also do not include the sample surveyed in the U.S. (Munshi 2000) - non-random.
*    I am not sure about the latter - WE COULD INCLUDE THE U.S. SAMPLE ALTHOUGH IT IS NOT RANDOM
*    a. MIGRANTS - To study migrants alone, we use PERS data (instead of LIFE) because it contains
*       information on all household members, and their first moves. (LIFE contains information on all
*       moves made by household heads alone.)
*    b. MIGRANTS AND NON-MIGRANTS - To study migrants and non-migrants (i.e., to use clustering to
*       discriminate between them), we use LIFE data, and include all migration moves made by hh heads.
*       i. First strategy could be Gary King's idea - discover clusters using data from the first
*          few periods, then apply them to the rest of the data. Then discover clusters using data from
*          the last period, and apply retrospectively to earlier periods. Show the different results.
*       ii.
*          Second, similar to Nagin et al., we can study migration from a life-course perspective,
*          and discover different trajectories (i.e., non-migrants, those who migrate once, and those
*          who migrate repeatedly). Again, the LIFE data would be of interest.
* 2. HOUSEHOLD WEALTH - We use the data set house_wealth.dta generated in mmp_hhwealth_data124.do.
* 3. REMITTANCE DATA - We create a restricted data set with household heads on their last trips (to use
*    the remittance information).

clear
set mem 2000m
set matsize 5000
set more off

cd "/Users/fgarip/Documents/Data/MMP Data"
* cd "/Volumes/s-vol$/Home/Faculty/fgarip/Documents/Data/MMP Data"
* cd "M:\Documents\Data\MMP Data\"

* Sort Data To Be Merged *
**************************

use "Clustering\usjanwage.dta", clear
sort year
save "Clustering\usjanwage.dta", replace

use "Clustering\init_data.dta", clear
keep commun hhnum persnum room lnland business
ren room room2
ren lnland lnland2
ren business business2
sort commun hhnum persnum
save "Clustering\dataid_temp.dta", replace

**********************************
***** (1) NATIONAL-YEAR DATA *****
**********************************

use "Data\natlyear.dta", clear
sort year

* Get the US wage data from BLS - Massey and Espinosa data in natlyear.dta is not
* reliable. It is not clear if it is measured in real or nominal terms. So, I use data
* on the average hourly earnings of production workers (nominal) and convert to real
* dollars ($US in 2000)

* Fill-in the missing years' data (2007 and 2008)
replace exchrate = 10.928 if year==2007
replace exchrate = 11.156 if year==2008
replace const00 = 0.83051 if year==2007
replace const00 = 0.79980 if year==2008

merge year using "Clustering\usjanwage.dta"
keep if _merge==3
drop _merge

replace usjanwage = usjanwage*const00
lab var usjanwage "Average hourly wages (prod workers) in the U.S. ($ in 2000)"

* Generate a constant peso indicator for Mexico
gen const00_mx = 1
gsort -year
replace const00_mx = const00_mx[_n-1]*(1+infrate[_n-1]/100) if year<2000
gsort year
replace const00_mx = const00_mx[_n-1]/(1+infrate[_n-1]/100) if year>2000

* Other National Indicators *
* These variables are typically available until year 2002.
replace lwhrs = . if lwhrs==8888
replace probapp = . if probapp==8888
replace visaaccs = . if visaaccs==8888
replace insbudgt = . if insbudgt==8888
replace bpebudgt = . if bpebudgt==8888
replace mexrlint = . if mexrlint==8888
replace mxunemp = . if mxunemp==8888 | mxunemp==9999
replace usunemp = . if usunemp==8888
replace totusemp = . if totusemp==8888
replace dfinvest = . if dfinvest==8888
replace exports = . if exports==8888
replace imports = . if imports==8888
replace tradebal = . if tradebal==8888

* Convert exports, imports, trade bal to real dollars
replace exports = exports*const00
replace imports = imports*const00
replace tradebal = tradebal*const00
gen lnexports = ln(exports)
gen lnimports = ln(imports)
lab var exports "exports to Mexico in real 2000 US$"
lab var imports "imports from Mexico in real 2000 US$"
lab var lnexports "log of exports in real 2000 US$"
lab var lnimports "log of imports in real 2000 US$"

* Trade growth (Import + Export)
gen trade = exports + imports
lab var trade "trade in real 2000 US$"
gen lntrade = ln(trade)
lab var lntrade "log of trade in real 2000 US$"
gen tradegr = .
sort year
replace tradegr = (trade[_n]-trade[_n-1])/trade[_n-1] if _n>1
lab var tradegr "trade (exp + imp) growth - annual"
gen tradegr_ma = tradegr
sort year
replace tradegr_ma = (tradegr[_n-2] + tradegr[_n-1] + tradegr[_n])/3 if _n>2
lab var tradegr_ma "trade (exp + imp) growth - mov av (3yr)"

* Convert INS and BPE budget to real dollars
replace insbudgt = insbudgt*const00
lab var insbudgt "ins budget in real 2000 US$"
replace bpebudgt = bpebudgt*const00
lab var bpebudgt "border patrol enforcement budget in real 2000 US$"
gen lnins = ln(insbudgt)
gen lnbpe = ln(bpebudgt)
gen lnlw = ln(lw)
lab var lnins "log of ins budget in real 2000 US$"
lab var lnbpe "log of border patrol enforcement budget in real 2000 US$"
lab var lnlw "log of line watch hours"

* Convert FDI to real dollars (I am not sure if it is in real or nominal dollars)
replace dfinvest = dfinvest*const00
gen dfigr = .
sort year
replace dfigr = (dfinvest[_n]-dfinvest[_n-1])/dfinvest[_n-1] if _n>1
lab var dfigr "FDI growth annual"

* Compute Moving Average (in 3 year window) for FDI to Mexico
gen dfigr_ma = dfigr
sort year
replace dfigr_ma = (dfigr[_n-2] + dfigr[_n-1] + dfigr[_n])/3 if _n>2
lab var dfigr_ma "FDI growth - mov av (3yr)"

* Compute US employment growth
gen usempgr = .
sort year
replace usempgr = (totusemp[_n] - totusemp[_n-1])/totusemp[_n-1] if _n>1
lab var usempgr "U.S. employment growth annual"

* Compute Moving Average (in 3 year window) for US employment growth
gen usempgr_ma = usempgr
sort year
replace usempgr_ma = (usempgr[_n-2] + usempgr[_n-1] + usempgr[_n])/3 if _n>2
lab var usempgr_ma "U.S. employment growth - mov av (3yr)"

save "Clustering\natlyear_temp.dta", replace

*****************************
* (2) HOUSEHOLD WEALTH DATA *
*****************************

use "Clustering\house_wealth.dta", clear
sort commun hhnum year

* Compute commun inequality by land & property *
************************************************
bys commun year: gen ssize = _N

gsort commun year -troom
bys commun year: gen rank = _n
by commun year: egen mnroom = mean(troom)
gen wrank = rank*troom
by commun year: egen swrank = sum(wrank)
gen rgini = (ssize+1)/(ssize-1)
replace rgini = rgini - 2/(ssize*(ssize-1)*mnroom)*swrank
drop rank wrank swrank
lab var rgini "rooms gini by community-year"

gsort commun year -vland
bys commun year: gen rank = _n
by commun year: egen mnland = mean(vland)
gen wrank = rank*vland
by commun year: egen swrank = sum(wrank)
gen lgini = (ssize+1)/(ssize-1)
replace lgini = lgini - 2/(ssize*(ssize-1)*mnland)*swrank
drop rank wrank swrank ssize
lab var lgini "land value gini by community-year"

* Relative deprivation by land & property *
*******************************************
* prop is the proportion of people with MORE property than person
* should be the same for all people with the same number of property
* i.e., ties are treated the same in ranking
sort commun year troom
gen prop = .
by commun year: replace prop = (_N-_n)/_N
gsort commun year troom -prop
by commun year troom: replace prop = prop[_N]

* mnex is the mean excess rooms by
* people with more number of property
gsort commun year -troom
gen mnex = troom
by commun year: replace mnex = troom[_n-1] if _n==2
by commun year: replace mnex = troom[_n-1]*1/(_n-1) + mnex[_n-1]*(_n-2)/(_n-1) if _n>2
gsort commun year troom mnex
by commun year troom: replace mnex = mnex[_N]
by commun year: replace mnex = mnex-troom

gen rdroom = prop*mnex
lab var rdroom "hh rel depr wrt room in commun-year [0,1]"
drop mnex prop

sort commun year lnvland
gen prop = .
by commun year: replace prop = (_N-_n)/_N
gsort commun year lnvland -prop
by commun year lnvland: replace prop = prop[_N]

gsort commun year -lnvland
gen mnex = lnvland
by commun year: replace mnex = lnvland[_n-1] if _n==2
by commun year: replace mnex = lnvland[_n-1]*1/(_n-1) + mnex[_n-1]*(_n-2)/(_n-1) if _n>2
gsort commun year lnvland mnex
by commun year lnvland: replace mnex = mnex[_N]
by commun year: replace mnex = mnex-lnvland

gen rdland = prop*mnex
lab var rdland "hh rel depr wrt land value in commun-year [0,1]"
drop mnex prop

sort commun hhnum year
save "Clustering\house_wealth_temp.dta", replace

***********************
* (3) REMITTANCE DATA *
***********************

* This information is only available in mig124.dta for household
* heads during the last trip. We use it as an outcome variable.
use "Data\mig124.dta", clear

* As Durand et al. (1996) argue, migrants may either send monthly remittances (most US settlers
* do this), or they may bring back their savings on their return (most temporary migrants do this).
gen year = usyrl
replace year = . if year==9999
replace remit = . if remit==9999
gen remitfl = remit
replace remitfl = 1 if remit>0 & remit~=.
lab var remitfl "sent remittances? (0/1)"

* Both remittances and savings are measured monthly
* Keep them as is

* Monthly savings are recorded in 'savings' variable. We do not know
* if these savings were all brought to Mexico. So, we use 'savretrn' variable
* which records total savings brought to Mexico upon return. We divide it by
* the U.S. last visit duration, and compute monthly values.
replace usdurl = . if usdurl==9999
replace savings = . if savings==9999
replace savretrn = . if savretrn==9999
gen savm = savretrn/usdurl
gen savfl = savm
replace savfl = 1 if savm>0 & savm~=.
lab var savm "average monthly savings brought to Mexico"
lab var savfl "brought savings to Mexico? (0/1)"

* Add remittances and savings, compute logs
* Use rowtotal() which treats missing as 0, then replace with missing only if both are missing.
egen remsavm = rowtotal(remit savm)
replace remsavm = . if remit==. & savm==.
* IMPORTANT NOTE - remsav variable is converted to constant 2000 US$ below
**************************************************************************

gen remsavfl = .
replace remsavfl = 1 if remitfl==1 | savfl==1
replace remsavfl = 0 if remitfl==0 & savfl==0
replace remsavfl = 0 if remitfl==0 & savfl==.
replace remsavfl = 0 if remitfl==. & savfl==0

* Illegal crossing information *
********************************
* Note a migrant may have crossed illegally several times within a year. Our goal is to record
* whether the migrant crossed illegally in his or her first trip to the U.S. Check for the year of the
* second or third attempt only if the prior attempts were unsuccessful (i.e., crsyes#==2)

* First determine the undocumented migrants on their first trip
gen undoc = 0
replace undoc = 1 if crsyr1==usyr1 & crsyr1<8888
replace undoc = 1 if crsyr2==usyr1 & crsyr2<8888 & crsyes1==2
replace undoc = 1 if crsyr3==usyr1 & crsyr3<8888 & crsyes1==2 & crsyes2==2
lab var undoc "Migrant (hh head) undocumented on first trip?"

* Used coyote? (All migrants that attempt to cross illegally do so by their third trip.)
gen coyote = 0 if undoc==1
replace coyote = 1 if undoc==1 & (crscoy1==1 | crscoy2==1 | crscoy3==1)
lab var coyote "Undocumented migrant (hh head) used coyote on first trip?"

* How did s/he cross the border?
gen crosshow = .
replace crosshow = crshow1 if undoc==1 & crshow1<8888
replace crosshow = crshow2 if undoc==1 & crshow2<8888 & crsyes1==2
replace crosshow = crshow3 if undoc==1 & crshow3<8888 & crsyes1==2 & crsyes2==2
gen crossfamfr = .
replace crossfamfr = 1 if crosshow>=2 & crosshow<=4
replace crossfamfr = 0 if crosshow==1 | crosshow==5
lab var crossfamfr "Undoc migrant (hh head) crossed with family or friends (or alone or w strangers)?"

* Where did s/he cross the border?
gen cstate = .
replace cstate = crsst1 if undoc==1 & crsst1<1111
replace cstate = crsst2 if undoc==1 & crsst2<1111 & crsyes1==2
replace cstate = crsst3 if undoc==1 & crsst3<1111 & crsyes1==2 & crsyes2==2
gen crosstij = 0 if cstate>2 & cstate~=.
replace crosstij = 1 if cstate==2
lab var crosstij "Undoc migrant (hh head) crossed @ Tijuana?"

gen persnum = 1
* There is one individual who is observed twice. Drop it.
bys commun hhnum persnum: drop if _n==2

keep commun hhnum persnum year remit remitfl savretrn savm savfl remsavfl remsavm undoc coyote crossfamfr crosstij
sort commun hhnum persnum
save "Clustering\mig124_temp.dta", replace

*****************************
* (4) CONTEXTUAL INDICATORS *
*****************************

* (a) Rainfall *
****************
* These data are collected at the community-level by Jessica Roman-Salazar.
* Missing observations (about half) are completed by state-level indicators.
use "Clustering\all_rain_red2.dta", clear
sort commun
merge commun using "Clustering\commun_statenum.dta"
keep if _merge==3
drop _merge
sort statenum
merge statenum using "Data\environs.dta", keep(annual*)
keep if _merge==3
drop _merge
bys commun year: keep if _n==1
rename rain crain

* Compute annual rainfall by state from environs data
* Generate a time-specific rainfall variable
gen rain = .
forvalues i=41(1)99 {
	replace rain = annual`i' if year==19`i'
}
forvalues i=0(1)5 {
	replace rain = annual0`i' if year==200`i'
}

* 2006-2008 values for rainfall are missing - assume they are equal to the 2005 value.
sort commun year
by commun: replace rain = rain[_n-1] if year>=2006
lab var rain "annual rainfall in centimeters"

* Replace the missing and incomplete values of community-level rainfall with
* state-level data
replace crain = rain if crain==. | incomplete==1

* There are 9 observations with very high values of rainfall.
* Set these to the state average
replace crain = rain if crain>2500
replace crain = crain/100
lab var crain "annual rainfall to community in meters"

* Create lagged values of rainfall
gen crain1 = crain
sort commun year
by commun: replace crain1 = crain[_n-1] if _n>1
lab var crain1 "annual rainfall to community in meters (t-1)"
gen crain2 = crain1
sort commun year
by commun: replace crain2 = crain1[_n-1] if _n>1
lab var crain2 "annual rainfall to community in meters (t-2)"
gen crain3 = crain2
sort commun year
by commun: replace crain3 = crain2[_n-1] if _n>1
lab var crain3 "annual rainfall to community in meters (t-3)"
gen recent_crain = (crain1 + crain2 + crain3)/3
lab var recent_crain "average rainfall to community in past 3 years (t-1 to t-3)"

sort commun year
save "Clustering\rain_temp.dta", replace

* (b) Distance *
****************
use "Clustering\commun_distance.dta", clear
sort commun
gen km = distance/10000
lab var km "10,000km to U.S. border"
sort commun
save "Clustering\distance_temp.dta", replace

* (c) Community-level Indicators *
**********************************
use "Data\commun124.dta", clear

* Use only the community indicators that are available over time (note - most
* community indicators apply only to the survey year, which is different than
* the year of first migration for most migrants.)
keep commun pratio* lfp* agri* manu* serv* self* ltmin* minx* metrocat compop* manprdct hctirlnd ///
	hctrnlnd polcat yrprim yrsecon yrbank1 yrpave yrpavehw ejido dtaward1 dtawardl awardsz hctejido
sort commun
save "Clustering\commun124_temp.dta", replace

* SAMPLE I - with PERS DATA *
*****************************
use "Data\pers124.dta", clear
keep commun hhnum persnum hhmemshp surveypl surveyyr relhead sex age marstat edyrs occ hhincome ldowage ///
	usborn usyr1 usyrl usdur1 usdurl usdoc1 usdocl usstate1 usstatel usplace1 usplacel usmar1 usmarl ///
	usocc1 usoccl uswage1 uswagel usby1 usbyl uscurtrp ustrips usexp legyrapp legyrrec legspon ///
	cityrapp cityrrec doyr1 dodur1 dostate1 doplace1 doocc1 doyrl dodurl dostatel doplacel dooccl ///
	dowagel dobyl docurtrp dotrips

* Initial Sample Selection *
***************************
* Compute the sample size in each community
bys commun: gen csize = _N

* Keep the migrants and the nonmigrants - all on survey year
* We are essentially comparing those who've migrated at least
* once to those who have never migrated.
gen mig = 0
replace mig = 1 if usyr1 < 8888
replace usyr1 = . if usyr1==8888 | usyr1==9999
gen year = surveyyr

* Drop individuals whose relationship to hh head is unknown
drop if relhead==8888 | relhead==9999

* Compute the ages of migrants at the time of their first migration.
replace age = . if age==9999 | age==8888
gen agem = age - (surveyyr-usyr1)
lab var agem "age at the time of first migration"

* Keep only the individuals who are between 15 and 65 at the time of first migration.
* This is to make sure that moves are not associational (children or elderly migrating with other hh members).
drop if (agem<15 | agem>65) & agem~=.

gen educ = edyrs
replace educ = . if educ==8888 | educ==9999

* There are 10 observations with the same commun hhnum persnum - drop them
bys commun hhnum persnum: keep if _n==1

* MERGE *
*********
sort commun
merge commun using "Clustering\commun124_temp.dta"
drop _merge

sort commun year
merge commun year using "Clustering\rain_temp.dta", keep(crain* recent_crain rain)
drop if _merge==2
drop _merge

sort commun
merge commun using "Clustering\distance_temp.dta", keep(km)
drop if _merge==2
drop _merge

sort year
merge year using "Clustering\natlyear_temp.dta"
drop if _merge==2
drop _merge

sort commun hhnum year
merge commun hhnum year using "Clustering/house_wealth_temp.dta"
drop if _merge==2
drop _merge

sort commun hhnum persnum
merge commun hhnum persnum using "Clustering/mig124_temp.dta"
drop if _merge==2
* Note that these observations are those outside the time frame 1965-2005
drop _merge

* VARIABLES OF INTEREST *
*************************

* Demographic Characteristics *
*******************************
replace sex = 0 if sex==2

gen educcat = 0
replace educcat = 1 if educ>=6 & educ<9
replace educcat = 2 if educ>=9 & educ<12
replace educcat = 3 if educ>=12 & educ<16
* Stata treats missing as larger than any number, so exclude missing educ explicitly
replace educcat = 4 if educ>=16 & educ~=.
replace educcat = . if educ==.
lab define educlab 0 "less than pri" 1 "pri" 2 "some sec" 3 "secondary" 4 "adv"
lab val educcat educlab

* Occupations in Origin and Destination *
*****************************************
* NOTE - Origin occupation is measured in the survey year
rename occ o
rename usocc1 u

gen mexocc = .
replace mexocc = 1 if o<=99
replace mexocc = 2 if o>=110 & o<=129
replace mexocc = 3 if o>=130 & o<=219
replace mexocc = 4 if o>=410 & o<=419
replace mexocc = 5 if o>=510 & o<=549
replace mexocc = 6 if o>=550 & o<=839

gen usocc = .
replace usocc = 1 if u<=99
replace usocc = 2 if u>=110 & u<=129
replace usocc = 3 if u>=130 & u<=219
replace usocc = 4 if u>=410 & u<=419
replace usocc = 5 if u>=510 & u<=549
replace usocc = 6 if u>=550 & u<=839

rename o occ
drop u

lab var mexocc "Occupation in Mexico in Survey year"
lab var usocc "Occupation in the US"
lab define occlab 1 "unemployed" 2 "prof/tech" 3 "educ, arts, admin" ///
	4 "agriculture" 5 "manufacturing" 6 "service"
lab val mexocc occlab
lab val usocc occlab

gen mx_none = (mexocc==1)
gen mx_agri = (mexocc==4)
gen mx_manuf = (mexocc==5)
gen mx_serv = (mexocc==6)
gen mx_oth = (mexocc==2 | mexocc==3)

gen us_none = (usocc==1)
gen us_agri = (usocc==4)
gen us_manuf = (usocc==5)
gen us_serv = (usocc==6)
gen us_oth = (usocc==2 | usocc==3)

* Domestic Migration Experience *
*********************************
gen mxmig = 0
replace mxmig = 1 if year>=doyr1 & doyr1~=8888 & doyr1~=9999
lab var mxmig "Individual migrated within Mexico?"

* We can know the total number of mexican trips only if the last mexican trip
* took place before the first U.S. migration trip
gen mxtrip = .
replace mxtrip = dotrips if year>=doyrl & doyrl~=8888 & doyrl~=9999 & dotrips~=8888 & dotrips~=9999
lab var mxtrip "No of domestic trips (if last trip took place b4 first U.S. trip)"

* Prior migrants in the household *
***********************************
* For each year, compute the total number of migrants from a household.
* This variable, miginy, should equal the total for only one migrant from each
* year so that the cumulative sums for household over time are not inflated.
* See the example of commun==3 & hhnum==166
* list commun hhnum year persnum mig if commun==3 & hhnum==166

* Create a dummy for migration until last year (i.e., exclude individuals whose
* first migration is the survey year) - We do this to be consistent with the prior
* definition of hhmig - number of migrants until last year
gen mig_temp = 0
replace mig_temp = 1 if year >= usyr1+1
bys commun hhnum: egen hhmig = total(mig_temp)

* Exclude the index individual
replace hhmig = hhmig - mig_temp
* list commun hhnum year persnum mig hhmig if commun==3 & hhnum==166
lab var hhmig "ever migrants from hh (excl. ind) prior to survey year"

* Prior LEGAL migrants in the household *
*****************************************
* Flag individuals who have been legalized until last year
gen leg = 0
replace leg = 1 if year >= legyrrec+1
replace leg = 1 if year >= cityrrec+1
bys commun hhnum: egen nhhleg = total(leg)

* Exclude the index individual
replace nhhleg = nhhleg - leg
lab var nhhleg "number of hh mems legalized prior to survey year"
gen hhleg = nhhleg>0
lab var hhleg "any hh mems legalized prior to survey year"

* Compute the total number of not-legalized migrants in hh *
************************************************************
gen nhhnleg = 0
replace nhhnleg = hhmig - nhhleg
* For about 200 observations we have a negative number - possibly measurement error
* Correct
replace nhhnleg = 0 if nhhnleg < 0
gen hhnleg = nhhnleg>0
lab var hhnleg "any hh migs not legalized prior to survey year"
lab var nhhnleg "number of hh migrants not legalized prior survey year"

* Prior migrants in the community sample *
******************************************
sort commun year
by commun: egen cmig = total(mig_temp)
gen pcmig = cmig/csize
lab var pcmig "proportion of migrants in community sample upto t-1"

* Household Wealth *
********************
* Lag wealth variables - Land, Property, Business *
gen tlandlag = tland
gen vlandlag = vland
gen lnvlandlag = lnvland
gen tproplag = tprop
gen troomlag = troom
gen lntroomlag = lntroom
gen tbuslag = tbus

sort commun hhnum year
by commun hhnum: replace tlandlag = tland[_n-1] if _n>1
by commun hhnum: replace vlandlag = vland[_n-1] if _n>1
by commun hhnum: replace lnvlandlag = lnvland[_n-1] if _n>1
by commun hhnum: replace tproplag = tprop[_n-1] if _n>1
by commun hhnum: replace troomlag = troom[_n-1] if _n>1
by commun hhnum: replace lntroomlag = lntroom[_n-1] if _n>1
by commun hhnum: replace tbuslag = tbus[_n-1] if _n>1

* Migration Prevalence in Community *
*************************************
gen prev = .
replace prev = pratio50 if year>=1950 & year<1960
replace prev = pratio60 if year>=1960 & year<1970
replace prev = pratio70 if year>=1970 & year<1980
replace prev = pratio80 if year>=1980 & year<1990
replace prev = pratio90 if year>=1990 & year<2000
replace prev = pratio00 if year>=2000
lab var prev "Mig prevalence in community in decade"

* Community Indicators *
************************
gen prisch = 0
replace prisch = 1 if year>yrprim
lab var prisch "pri school in commun in year?"

gen secsch = 0
replace secsch = 1 if year>yrsecon
lab var secsch "sec school in commun in year?"

gen bank = 0
replace bank = 1 if year>yrbank1
lab var bank "any bank in community in year?"

gen road = 0
replace road = 1 if year>yrpave
lab var road "paved roads in community in year?"

gen roadhw = 0
replace roadhw = 1 if year>yrpavehw
lab var roadhw "paved road from commun to highway in year?"

* Indicator for ejido - a program that gives land - usually taken as an
* incentive to migrate to obtain financing to work the land (new economics)
* Ejido indicator applies to the survey year - to determine whether ejido was
* established at the year of first migration for each individual, we use 'dtaward1'
* variable. This is missing for about 4000 observations.
* If ejido = 1 for these observations,
* assume ejido existed through the period the community is observed (the assumption is that,
* if informants cannot recall it, it was many years ago.) If ejido = 9999 (missing), assume
* ejido was NOT established.
gen ejidoy = 0
replace ejidoy = 1 if year>=dtaward1 & dtaward1~=8888 & dtaward1~=9999
replace ejidoy = 1 if ejido==1 & dtaward1==9999
lab var ejidoy "ejido in community-year?"

* Community Economic Indicators *
*********************************
gen compop = .
replace compop = compop60 if year>=1960 & year<1970
replace compop = compop70 if year>=1970 & year<1980
replace compop = compop80 if year>=1980 & year<1990
replace compop = compop90 if year>=1990 & year<2000
replace compop = compop00 if year>=2000
lab var compop "population of community 50-00"
gen lncompop = ln(compop)
lab var lncompop "log of population of community 50-00"

lab def metrolab 1 "metropolitan" 2 "smaller urban" 3 "town" 4 "rancho"
lab val metrocat metrolab

gen minx2 = .
replace minx2 = minx270 if year>=1965 & year<1980 & minx270~=8888
replace minx2 = minx280 if year>=1980 & year<1990 & minx280~=8888 & minx280~=9999
replace minx2 = minx290 if year>=1990 & year<2000
replace minx2 = minx200 if year>=2000
lab var minx2 "prop. lf earning 2x min wage 70-00"

gen self = .
replace self = self60 if year>=1960 & year<1970 & self60~=8888
replace self = self70 if year>=1970 & year<1980 & self70~=8888
replace self = self80 if year>=1980 & year<1990 & self80~=8888
replace self = self90 if year>=1990 & year<2000 & self90~=8888
replace self = self00 if year>=2000
lab var self "prop. lf self-employed 50-00"

gen manuf = .
replace manuf = manuf60 if year>=1960 & year<1970 & manuf60~=8888
replace manuf = manuf70 if year>=1970 & year<1980 & manuf70~=8888
replace manuf = manuf80 if year>=1980 & year<1990 & manuf80~=8888
replace manuf = manuf90 if year>=1990 & year<2000
replace manuf = manuf00 if year>=2000
lab var manuf "prop. lf in manuf 50-00 females"

gen manum = .
replace manum = manum60 if year>=1960 & year<1970 & manum60~=8888
replace manum = manum70 if year>=1970 & year<1980 & manum70~=8888
replace manum = manum80 if year>=1980 & year<1990 & manum80~=8888
replace manum = manum90 if year>=1990 & year<2000
replace manum = manum00 if year>=2000
lab var manum "prop. lf in manuf 50-00 males"

gen manu = (manuf + manum)/2
lab var manu "prop. lf in manuf 50-00"

* Each decade's missing-value check uses that decade's own female and male variables
gen manu_lag = .
replace manu_lag = (manum50 + manuf50)/2 if year>=1960 & year<1970 & manum50~=8888 & manuf50~=8888
replace manu_lag = (manum60 + manuf60)/2 if year>=1970 & year<1980 & manum60~=8888 & manuf60~=8888
replace manu_lag = (manum70 + manuf70)/2 if year>=1980 & year<1990 & manum70~=8888 & manuf70~=8888
replace manu_lag = (manum80 + manuf80)/2 if year>=1990 & year<2000 & manum80~=8888 & manuf80~=8888
replace manu_lag = (manum90 + manuf90)/2 if year>=2000
lab var manu_lag "prop. lf in manuf - lagged by a decade"

gen dmanu = (manu - manu_lag)/manu_lag*100
lab var dmanu "change in the prop. of lf in manuf in the last decade"

gen agrif = .
replace agrif = agrif60 if year>=1960 & year<1970 & agrif60~=8888
replace agrif = agrif70 if year>=1970 & year<1980 & agrif70~=8888
replace agrif = agrif80 if year>=1980 & year<1990 & agrif80~=8888
replace agrif = agrif90 if year>=1990 & year<2000
replace agrif = agrif00 if year>=2000
lab var agrif "prop. lf in agriculture 50-00 females"

gen agrim = .
replace agrim = agrim60 if year>=1960 & year<1970 & agrim60~=8888
replace agrim = agrim70 if year>=1970 & year<1980 & agrim70~=8888
replace agrim = agrim80 if year>=1980 & year<1990 & agrim80~=8888
replace agrim = agrim90 if year>=1990 & year<2000
replace agrim = agrim00 if year>=2000
lab var agrim "prop. lf in agriculture 50-00 males"

gen agri = (agrif + agrim)/2
lab var agri "prop. lf in agriculture 50-00"

gen agri_lag = .
* Each decade's missing-value check uses that decade's own female and male variables
replace agri_lag = (agrim50 + agrif50)/2 if year>=1960 & year<1970 & agrim50~=8888 & agrif50~=8888 & agrim50~=9999 & agrif50~=9999
replace agri_lag = (agrim60 + agrif60)/2 if year>=1970 & year<1980 & agrim60~=8888 & agrif60~=8888
replace agri_lag = (agrim70 + agrif70)/2 if year>=1980 & year<1990 & agrim70~=8888 & agrif70~=8888
replace agri_lag = (agrim80 + agrif80)/2 if year>=1990 & year<2000 & agrim80~=8888 & agrif80~=8888
replace agri_lag = (agrim90 + agrif90)/2 if year>=2000
lab var agri_lag "prop. lf in agriculture - lagged by a decade"

gen dagri = (agri - agri_lag)/agri_lag*100
lab var dagri "change in the prop. of lf in agriculture in the last decade"

gen ltmin = .
replace ltmin = ltmin70 if year<1980 & ltmin70~=8888 & ltmin70~=9999
replace ltmin = ltmin80 if year>=1980 & year<1990 & ltmin80~=8888 & ltmin80~=9999
replace ltmin = ltmin90 if year>=1990 & year<2000 & ltmin90~=8888 & ltmin90~=9999
replace ltmin = ltmin00 if year>=2000 & ltmin00~=8888 & ltmin00~=9999
lab var ltmin "prop. lf w/ less than min wage"

gen lfpf = .
replace lfpf = lfpf60 if year>=1960 & year<1970 & lfpf60~=8888
replace lfpf = lfpf70 if year>=1970 & year<1980 & lfpf70~=8888
replace lfpf = lfpf80 if year>=1980 & year<1990 & lfpf80~=8888
replace lfpf = lfpf90 if year>=1990 & year<2000 & lfpf90~=8888
replace lfpf = lfpf00 if year>=2000 & lfpf00~=8888
lab var lfpf "lf participation rate - females"

gen lfpm = .
replace lfpm = lfpm60 if year>=1960 & year<1970 & lfpm60~=8888
replace lfpm = lfpm70 if year>=1970 & year<1980 & lfpm70~=8888
replace lfpm = lfpm80 if year>=1980 & year<1990 & lfpm80~=8888
replace lfpm = lfpm90 if year>=1990 & year<2000 & lfpm90~=8888
replace lfpm = lfpm00 if year>=2000 & lfpm00~=8888
lab var lfpm "lf participation rate - males"

gen lfp = (lfpf + lfpm)/2
lab var lfp "lf participation rate"

gen lfp_lag = .
replace lfp_lag = (lfpf50 + lfpm50)/2 if year>=1960 & year<1970 & lfpf50~=8888 & lfpm50~=8888
replace lfp_lag = (lfpf60 + lfpm60)/2 if year>=1970 & year<1980 & lfpf60~=8888 & lfpm60~=8888
replace lfp_lag = (lfpf70 + lfpm70)/2 if year>=1980 & year<1990 & lfpf70~=8888 & lfpm70~=8888
replace lfp_lag = (lfpf80 + lfpm80)/2 if year>=1990 & year<2000 & lfpf80~=8888 & lfpm80~=8888
replace lfp_lag = (lfpf90 + lfpm90)/2 if year>=2000 & lfpf90~=8888 & lfpm90~=8888
lab var lfp_lag "lf participation rate - lagged by a decade"

gen dlfp = (lfp-lfp_lag)/lfp_lag*100
lab var dlfp "change in lf participation rate in the last decade"

gen lnmanp = ln(manprdct)
lab var lnmanp "log of annual value of manufacturing in municipio"

* Mexican Economic Indicators *
*******************************
replace usavwage = . if usavwage==8888
replace mxunemp = . if mxunemp==9999
replace infrate = infrate/100

* Mexican Minimum Wage *
************************
replace mxminwag = . if mxminwag==8888

* IMP NOTE - In year 1993, three zeros were taken out of the Mexican peso.
replace mxminwag = mxminwag/1000 if year<1993

* Convert to constant pesos
replace mxminwag = mxminwag*const00_mx

* Convert to U.S. dollars - note you need to use the exchange rate in 2000
* since the values are in 2000 pesos
* Note the codebook reports the exchange rate as USD/Peso but in
* fact it is Peso/USD
replace mxminwag = mxminwag/9.572

* Convert to hourly wages (currently daily) - the trends observed over time are consistent
* with other studies
replace mxminwag = mxminwag/8
lab var mxminwag "Mexican hourly wages in constant 2000 pesos converted to U.S.$"

* U.S. Average Wage *
*********************
* I use usjanwage (from BLS) rather than usavwage provided by Massey and Espinosa. It is not
* clear whether the latter is in real or nominal $.
gen wratio = usjanwage/mxminwag
lab var wratio "Hourly wage ratio (US/Mexico)"

* OUTCOME VARIABLES *
*********************
* These are the outcomes that cluster membership should predict.
* Remittances *
***************
* Note - remittance information is only available for household heads on their
* last trip - here, we have merged remittance data based on individual id, not year.
* Therefore, we have remittance information for all hh heads on their last trip, although
* they are observed on their first trip for the purposes of this study. We need to include
* remittance information only for individuals who are on their first AND last observed trip.
replace remit = . if usyr1~=usyrl
replace savm = . if usyr1~=usyrl
replace remsavm = . if usyr1~=usyrl
replace remsavfl = . if usyr1~=usyrl
gen remc = remit*const00
gen savmc = savm*const00
gen remsavmc = remsavm*const00
gen logremc = ln(remc+1)
gen logsavmc = ln(savmc+1)
gen logremsavm = ln(remsavm+1)
gen logremsavmc = ln(remsavmc+1)
lab var remc "heads only - monthly remittance during (first and) last trip (2000 US$)"
lab var savmc "heads only - monthly savings during (first and) last trip (2000 US$)"
lab var remsavm "heads only - monthly remittances and savings during (first and) last trip"
lab var remsavmc "heads only - monthly remittances and savings during (first and) last trip (2000 US$)"
lab var logremc "heads only - logged monthly remittances during (first and) last trip (2000 US$)"
lab var logsavmc "heads only - logged monthly savings during (first and) last trip (2000 US$)"
lab var logremsavm "heads only - logged monthly remittances and savings during (first and) last trip"
lab var logremsavmc "heads only - logged monthly remittances and savings during (first and) last trip (2000 US$)"

* U.S. Wages *
**************
* Note that uswage1=8888 means the migrant is not employed for wages. S/he may have his or her own
* business. For now I am setting the wages of such individuals to missing. Alternatively, we can assume
* the wages to be zero.
replace uswage1 = . if uswage1==8888 | uswage1==9999
replace usby1 = . if usby1==8888 | usby1==9999
* Correct for potential measurement errors (e.g.
* migrants making more than 100$ hourly - 4 obs)
* There are possible inconsistencies in weekly or monthly measures (e.g. migrants making 2.5$ weekly),
* but few observations. We ignore those for now.
replace uswage1 = . if uswage1>=100 & usby1==1
* Compute yearly wages using the frequency of wage information
* (the multipliers assume a 40-hour week, a 5-day week, and 52 paid weeks per year)
gen uswageyr = .
replace uswageyr = uswage1 * 40 * 52 if usby1==1
replace uswageyr = uswage1 * 5 * 52 if usby1==2
replace uswageyr = uswage1 * 52 if usby1==3
replace uswageyr = uswage1 * 26 if usby1==4
replace uswageyr = uswage1 * 12 if usby1==5
replace uswageyr = uswage1 if usby1==6
lab var uswageyr "Estimated yearly wages of migrant in 2000 U.S.$ (using uswage1)"
* Convert to constant dollars
replace uswageyr = uswageyr*const00
* tabstat uswageyr, by(year) stat(p25 p50 p75)
gen uswagem = uswageyr/12
lab var uswagem "Estimated monthly wages of migrant in 2000 U.S.$ (using uswage1)"
gen loguswageyr = ln(uswageyr+1)
gen loguswagem = ln(uswagem+1)

* U.S. Destination *
********************
rename usstate1 s
gen usdiv = 1 if s==107 | s==120 | s==122 | s==130 | s==140 | s==146
replace usdiv = 2 if s==131 | s==133 | s==139
replace usdiv = 3 if s==115 | s==114 | s==123 | s==136 | s==150
replace usdiv = 4 if s==116 | s==117 | s==124 | s==126 | s==128 | s==135 | s==142
replace usdiv = 5 if s==108 | s==109 | s==110 | s==111 | s==121 | s==134 | s==141 | s==147 | s==149
* East South Central = Alabama, Kentucky, Mississippi, Tennessee. Codes 101 and 118 assume the
* alphabetical 101-151 state coding implied by the other divisions (check against the MMP codebook).
* Code 124 belongs to West North Central above and must not be reassigned here.
replace usdiv = 6 if s==101 | s==118 | s==125 | s==143
replace usdiv = 7 if s==104 | s==119 | s==137 | s==144
replace usdiv = 8 if s==103 | s==106 | s==113 | s==132 | s==127 | s==145 | s==129 | s==151
replace usdiv = 9 if s==102 | s==105 | s==112 | s==138 | s==148
gen usreg = 1 if usdiv==1 | usdiv==2
replace usreg = 2 if usdiv==3 | usdiv==4
replace usreg = 3 if usdiv==5 | usdiv==6 | usdiv==7
replace usreg = 4 if usdiv==8 | usdiv==9
rename s usstate1
lab define usdivlab 1 "new england" 2 "middle atlantic" 3 "east north central" ///
	4 "west north central" 5 "south atlantic" 6 "east south central" ///
	7 "west south central" 8 "mountain" 9 "pacific"
lab define usreglab 1 "northeast" 2 "midwest" 3 "south" 4 "west"
lab var usdiv "Mig is in US Division"
lab var usreg "Mig is in US Region"
* MATLAB cannot read-in labeled values
*lab val usdiv usdivlab
*lab val usreg usreglab
gen calif = 0 if usdiv~=.
gen texas = 0 if usdiv~=.
gen illin = 0 if usdiv~=.
gen othdest = 0 if usdiv~=.
replace calif = 1 if usstate1==105
replace texas = 1 if usstate1==144
replace illin = 1 if usstate1==114
* Restrict to observations with a known division so that othdest stays missing
* wherever calif, texas and illin are missing
replace othdest = 1 if usstate1~=105 & usstate1~=144 & usstate1~=114 & usdiv~=.

* Migration Patterns *
**********************
gen ttrip = ustrips
replace ttrip = . if ttrip==9999
lab var ttrip "total number of US trips"
gen texp = usexp
replace texp = . if texp==9999
lab var texp "total months of US experience"
gen logtexp = ln(texp)
lab var logtexp "log of total months of US experience"
* Leave repmig missing when ttrip is missing (Stata treats missing as larger than any number)
gen repmig = (ttrip>=2) if ttrip<.
lab var repmig "Individual migrated again?"
gen resid = .
replace resid = 0 if legyrrec==8888
replace resid = 1 if legyrrec<8888
lab var resid "received legal residency (b4 or after first mig)"
gen legmig = .
replace legmig = 0 if undoc==1
replace legmig = 1 if undoc==0
replace legmig = 1 if legyrrec<=usyr1 & usyr1<8888 & legyrrec<8888
lab var legmig "Migrant had legal docs (hh heads only) or residency on first trip?"

* LIST of OUTCOME VARS: undoc coyote crossfamfr crosstij remsavfl remsavmc logremsavmc
* us_none us_agri us_manuf us_serv us_oth uswagem loguswagem usdiv usreg calif texas illin othdest
* ttrip texp logtexp repmig resid legmig

******************************
******* FINAL SAMPLE *********
******************************
* Options: (1) Keep only the individuals who were interviewed in Mexico (NOT necessary for now)
* (2) Keep only the individuals who are members of the households they were interviewed
* in. Note that the individuals this would drop are the children of the hh heads, who have moved out of the household.
* About 30% of such individuals are migrants (10K cases).
* Dropping them leads to significant information
* loss. The downside of including them is that hh wealth may not apply to those individuals. It is reasonable
* to assume that at the year of their first migration, these individuals were residing in the household.
* drop if surveypl==2
* keep if hhmemshp==2
* drop if hhmemshp>=8888
gen hhmem = 0
replace hhmem = 1 if hhmemshp==2

drop rain usempgr tradegr ejido
ren ejidoy ejido
ren mx_none ocnone
ren mx_agri ocagri
ren mx_manuf ocmanuf
ren mx_serv ocserv
ren troomlag room
ren lntroomlag lnroom
ren vlandlag land
ren lnvlandlag lnland
ren tbuslag business
ren mxminwag mxwage
ren usjanwage uswage
ren compop pop
ren lncompop lnpop
ren recent_crain rain
ren infrate inf
ren minx2 min2
ren lwhrs lw
ren probapp app
ren visaaccs visa
ren insbudgt ins
ren bpebudgt bpe
ren mexrlint mxint
ren totusemp usemp
ren dfinvest fdi
ren dfigr_ma fdigr
ren usempgr_ma usempgr
ren tradegr_ma tradegr

* Leave bus missing when business is missing (a missing value would otherwise count as >0)
gen bus = business>0 if business<.
gen metro = metrocat
gen head = relhead==1
gen met = metrocat<=2

* Drop all missing observations
* IMPORTANT NOTE - If we drop observations with missing lw, app, visa, mxint, etc. as in mmp_cluster_data9_final.do,
* we end up with more missing observations (i.e., fewer migrants) than init_data.dta used in clustering. The reason
* for this is that we keep migrants on the survey year in this data set. So, when we drop observations with missing
* contextual variables (mostly from 2002-2005) we lose more migrants. (In the other data set, migrants are observed
* on their first trip, which is typically in much earlier years with no missing contextual information.)
********************************************************************************************************************
* To make sure we have the exact same sample, we merge id's from init_data.dta, and drop the rest of the migrants,
* while keeping all of the non-migrants.
* The goal is to have 17,049 migrants exactly.
sort commun hhnum persnum
merge commun hhnum persnum using "Clustering\dataid_temp.dta"
drop if mig==1 & _merge~=3

* For 159 migrant obs wealth measures are missing, while they are available in init_data.
* Take values from that data set.
* codebook room lnland business educ agri self lnpop if mig==1
replace room = room2 if mig==1 & room==.
replace lnland = lnland2 if mig==1 & lnland==.
replace business = business2 if mig==1 & business==.
drop room2 lnland2 business2

* For non-migrants drop all missing observations
drop if mig==0 & (educ==. | sex==9999 | room==. | lnland==. | business==. | agri==. | self==. | lnpop==.)

save "Clustering\init_data_mig_nonmig.dta", replace

ren prisch pri
ren secsch sec
outsheet relhead age sex head educ pri sec mxmig ocagri ocmanuf ocserv room lnroom land lnland business ///
	hhmig nhhleg nhhnleg hhnleg hhleg pcmig prev agrim self ltmin met commun hhnum persnum mig ///
	using "Clustering\mig_nonmig_data.raw", replace
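
* Optional sanity checks (a sketch, not part of the original pipeline). The note in the
* FINAL SAMPLE section sets a target of exactly 17,049 migrants. The lines below report the
* migrant count and flag a mismatch without halting the run (capture noisily prints the
* assertion failure and continues). They assume mig and _merge are still in memory, as
* they are at this point in the file.
count if mig==1
capture noisily assert r(N)==17049
* Records found only in dataid_temp.dta (_merge==2) would have mig missing and so would
* survive both drop statements above. This count is expected to be zero if every id in
* dataid_temp.dta matched a record in the master data.
count if _merge==2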