Database design and analysis of Yelp Challenge Dataset

Recently I have done a lot of work writing SQL scripts and building ETL pipelines for various projects. A common problem I ran into was that reading data from the databases sometimes took so long that generating even a single report became time-consuming. In addition, some databases lock the session until a read or update completes, which can cause problems when the database is an OLTP database where transactions take place frequently. This is what motivated me to learn how to design a good database. I was also exploring options to replace our current database, which runs on local distributed machines. Thanks to Metis, I got in touch with the AWS ecosystem, and Redshift was the first thing I wanted to play with.

The dataset I used for this project was published by Yelp for its newest round of the Yelp Challenge. In Yelp's own words, it is "a subset of our businesses, reviews, and user data for use in connection with academic research." The original purpose was to incentivize research into image classification models; for more details, see the Yelp website. Since this is a very practical data source, I use it as an example to explain how I design tables optimized for analytics.

The raw dataset contains five JSON files, just like what you would get by calling Yelp's APIs, so the data was most likely stored in a NoSQL database. This is a very typical industry practice: dump everything into NoSQL, then build ETL pipelines to transform the data into a SQL database for analytics. This is exactly what I will simulate in later sections. The first step is to build a naive database in which every table is designed exactly like its data source.

[Figure: Naive Tables]

The first thing I did to improve on this naive design was to normalize the tables according to the normal forms. According to the first normal form (1NF), each column should contain only a single, atomic value.
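To make the naive-then-normalize idea concrete, here is a minimal sketch. It assumes a business record whose `categories` field holds a comma-separated string, which would violate 1NF if copied into a single column. The sample records, the table names (`business_raw`, `business`, `business_category`), and the helper functions are all hypothetical. SQLite stands in for Redshift only so the example runs locally.

```python
import json
import sqlite3

# Hypothetical sample records, shaped loosely like lines of a Yelp business JSON file.
# Field names and values are illustrative assumptions, not taken from the dataset.
RAW_LINES = [
    '{"business_id": "b1", "name": "Pizza Place", "categories": "Pizza, Restaurants"}',
    '{"business_id": "b2", "name": "Corner Cafe", "categories": "Coffee & Tea, Cafes, Restaurants"}',
]


def build_naive_table(conn: sqlite3.Connection) -> None:
    """Naive design: one table that mirrors the JSON source field for field."""
    conn.execute(
        "CREATE TABLE business_raw (business_id TEXT PRIMARY KEY, name TEXT, categories TEXT)"
    )
    rows = [
        (rec["business_id"], rec["name"], rec["categories"])
        for rec in map(json.loads, RAW_LINES)
    ]
    conn.executemany("INSERT INTO business_raw VALUES (?, ?, ?)", rows)


def normalize_to_1nf(conn: sqlite3.Connection) -> None:
    """Split the multi-valued 'categories' string so each row holds one value."""
    conn.execute("CREATE TABLE business (business_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE business_category ("
        " business_id TEXT REFERENCES business(business_id),"
        " category TEXT,"
        " PRIMARY KEY (business_id, category))"
    )
    # Single-valued attributes stay in the business table.
    conn.execute("INSERT INTO business SELECT business_id, name FROM business_raw")
    # Each (business, category) pair becomes its own row in a separate table.
    raw = conn.execute("SELECT business_id, categories FROM business_raw").fetchall()
    pairs = []
    for business_id, categories in raw:
        for category in categories.split(","):
            category = category.strip()
            if category:
                pairs.append((business_id, category))
    conn.executemany("INSERT INTO business_category VALUES (?, ?)", pairs)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_naive_table(conn)
    normalize_to_1nf(conn)
    # Counting businesses per category is now a plain GROUP BY, with no string parsing.
    for row in conn.execute(
        "SELECT category, COUNT(*) FROM business_category GROUP BY category ORDER BY category"
    ):
        print(row)
```

With the two sample records, `Restaurants` is counted for both businesses and every other category for one. In the naive table, the same question would require substring matching on `categories`.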