This week, I am blogging from the Biological Control and Spatial Ecology group at the Université libre de Bruxelles, where I am spending about eight days learning how to optimise my code to automate the creation of databases.
Sounds like gibberish? Attentive followers will have noticed the many blog posts about big data lately. If big data means bringing many different, unorthodox data sets together to explore correlations, then we need someone who can integrate the different data sources into one database for analysis...
So let us look at my challenge: I want to extract DHS (Demographic and Health Survey) data from about 50 countries. The surveys contain similar variables that have been coded slightly differently, which means that if you simply extract the data, you cannot put it together easily: one survey will write Poorest, another poorest, and a third might spell it differently altogether... So is my only choice to do everything manually, i.e. repeat the procedure 50 times (which would itself be a source of error)? Or are there tricks and tips for automating data extraction and integration into one big database?
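To give a flavour of what such automation could look like, here is a minimal sketch in base R. It assumes each country's survey sits in its own CSV file with a shared categorical column (I call it "wealth" here; the folder, file, and column names are hypothetical, and the spelling variants are just examples):

```r
# Normalise label variants like "Poorest ", "poorest", "porest"
# to one canonical spelling.
harmonize_labels <- function(x) {
  x <- tolower(trimws(x))          # "Poorest " -> "poorest"
  x[x == "porest"] <- "poorest"    # fold in an assumed spelling variant
  x
}

# Read one country's file, clean the shared variable, and tag
# each row with its source country.
read_country <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  d$wealth <- harmonize_labels(d$wealth)
  d$country <- sub("\\.csv$", "", basename(path))
  d
}

# Apply the same cleaning to every file instead of repeating it
# 50 times by hand, then stack everything into one data frame:
# files <- list.files("dhs_data", pattern = "\\.csv$", full.names = TRUE)
# all_dhs <- do.call(rbind, lapply(files, read_country))
```

The point is that the cleaning rules live in one function, so a fix made once applies to all 50 countries at the same time.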
Wanna follow my R journey? Follow the R! page here on my blog. It will collect all the small and big tips for making use of R with big and small data!