Forecasting Airbnb Rental Prices in New York City with TensorFlow

Time: 2021-05-12

Author: timothy102
Compiled by: VK
Source: Analytics Vidhya

Introduction

Airbnb is an online marketplace that allows people to rent out their properties or spare rooms to guests. It charges hosts a 3% commission on every booking and guests between 6% and 12%.

Since its establishment in 2009, the company has helped 21,000 guests find accommodation and 6 million people take vacations every year. It currently lists an impressive 800,000 properties across 34,000 cities in 90 different countries.

In this article, I will use the open New York City Airbnb dataset from Kaggle and try to build a neural network model with TensorFlow to predict prices.

The goal is to build a suitable machine learning model that can predict accommodation prices on unseen data.

In this article, I’ll show you the Jupyter notebook I created. You can find it on GitHub: https://github.com/Timothy102/Tensorflow-for-Airbnb-Prices

Loading data

First, let’s look at how to load the data. We use wget to fetch the data directly from Kaggle. Note that the -O flag sets the output file name.
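A minimal sketch of this step (the download URL is a placeholder here; on Kaggle the file is called AB_NYC_2019.csv):

```python
# Download the CSV; the -O flag sets the output file name.
# (Placeholder URL -- substitute the actual dataset location.)
# !wget <dataset-url> -O data.csv

import pandas as pd

df = pd.read_csv("data.csv")
print(df.shape)  # expected: (48895, 16)
df.head()
```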

The dataset should look like this. There are 48895 rows and 16 columns.

Data analysis and preprocessing

Seaborn has a very simple API that can produce all kinds of plots for all kinds of data. If you are not familiar with its syntax, you can check out this article: https://www.analyticsvidhya.com/blog/2019/09/comprehensive-data-visualization-guide-seaborn-python/

After calling corr() on the pandas DataFrame, we pass the result to Seaborn's heatmap function. The results are as follows.
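Roughly, the code for this step looks something like this (selecting the numeric columns explicitly so it works across pandas versions):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation matrix of the numeric columns, rendered as a heatmap.
plt.figure(figsize=(10, 8))
sns.heatmap(df.select_dtypes("number").corr(), annot=True, cmap="coolwarm")
plt.show()
```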

Now that we have latitude, longitude, and neighborhood data, let's create a scatter plot:
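A sketch of the scatter plot, assuming the dataset's standard column names (latitude, longitude, neighbourhood_group):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Each listing plotted by its coordinates, colored by borough.
plt.figure(figsize=(10, 8))
sns.scatterplot(data=df, x="longitude", y="latitude",
                hue="neighbourhood_group", s=10)
plt.show()
```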

In addition, I dropped the duplicates and some unnecessary columns, and filled in reviews_per_month, since it had by far the most missing values. The data now looks like this: it has 10 columns and no null values.
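The exact columns dropped are not shown here; the sketch below assumes a plausible set of identifier and free-text columns, and fills reviews_per_month with zeros:

```python
# Drop duplicate rows and columns that don't help with price prediction
# (assumed set -- the notebook may drop slightly different ones).
df = df.drop_duplicates()
df = df.drop(columns=["id", "name", "host_id", "host_name",
                      "last_review", "neighbourhood"])

# reviews_per_month had the most missing values; fill them with zeros.
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

print(df.shape)           # 10 columns remain
print(df.isnull().sum())  # no null values left
```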

Good, right?

First of all, computers work with numbers. That's why we need to convert the categorical columns into a numeric encoding. Here this is done with pandas' factorize method, though there are many other tools you could use (such as one-hot encoding):
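A sketch using pandas.factorize, assuming neighbourhood_group and room_type are the categorical columns still in the DataFrame:

```python
# Replace each categorical column with integer codes.
for col in ["neighbourhood_group", "room_type"]:
    df[col], _ = pd.factorize(df[col])
```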

To keep the loss function in a stable range, let's normalize some of the data so that it has a mean of 0 and a standard deviation of 1.
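A minimal z-score normalization sketch; I leave latitude, longitude, and the target (price) untouched here, which is my own assumption, so the coordinates can still be bucketized into the grid below:

```python
# Standardize the remaining numeric feature columns to mean 0, std 1.
cols_to_scale = [c for c in df.columns
                 if c not in ("latitude", "longitude", "price")]
df[cols_to_scale] = (df[cols_to_scale] - df[cols_to_scale].mean()) / df[cols_to_scale].std()
```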

Feature crosses

There is one more change we have to make, and it is essential. In order to relate longitude and latitude to the model output, we must create a feature cross. The following links should give you enough background to develop a good intuition for feature crosses:

Our goal is to cross latitude and longitude, one of the oldest tricks in the book. If we simply fed these two columns into the model as raw values, the model would assume they are linearly related to the output.

Instead, we'll use a feature cross, which means we'll split the latitude × longitude map into a grid. Fortunately, TensorFlow makes this easy.

I iterate in steps of (max − min) / 100 to generate evenly spaced bucket boundaries.

I use a 100 × 100 grid:
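A sketch of generating the bucket boundaries, stepping from the minimum to the maximum in (max − min) / 100 increments (the helper name is my own):

```python
import numpy as np

def make_boundaries(series, n_buckets=100):
    """Evenly spaced bucket boundaries between the series' min and max."""
    lo, hi = series.min(), series.max()
    step = (hi - lo) / n_buckets
    return list(np.arange(lo + step, hi, step))

lat_boundaries = make_boundaries(df["latitude"])
lon_boundaries = make_boundaries(df["longitude"])
```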

Essentially, what we're doing here is defining bucketized columns with the previously defined boundaries, crossing them, creating a DenseFeatures layer, and passing it to the Sequential API.
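A sketch using the tf.feature_column API; lat_boundaries and lon_boundaries come from the previous snippet, and the hash bucket size is my own choice:

```python
import tensorflow as tf

latitude = tf.feature_column.numeric_column("latitude")
longitude = tf.feature_column.numeric_column("longitude")

# Bucketize each coordinate with the boundaries computed above.
lat_buckets = tf.feature_column.bucketized_column(latitude, boundaries=lat_boundaries)
lon_buckets = tf.feature_column.bucketized_column(longitude, boundaries=lon_boundaries)

# Cross the two bucketized columns into a single 100x100 grid feature.
lat_x_lon = tf.feature_column.crossed_column([lat_buckets, lon_buckets],
                                             hash_bucket_size=100 * 100)
crossed = tf.feature_column.indicator_column(lat_x_lon)

# DenseFeatures turns feature columns into a Keras layer we can use in a Sequential model.
feature_layer = tf.keras.layers.DenseFeatures([lat_buckets, lon_buckets, crossed])
```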

If you are not familiar with the TensorFlow feature-column syntax, check the documentation: https://www.tensorflow.org/api_docs/python/tf/feature_column/

Now we are finally ready for model training. Well, almost: we still need to split the data.

Obviously, we have to create two sets: one containing the input data and the other containing the target we want to predict. Because a size mismatch here would cause problems for our model, I decided to truncate whichever set was too long.
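A hedged sketch of the split, assuming the target is the price column and an 80/20 train/test split:

```python
from sklearn.model_selection import train_test_split

# Separate the inputs from the prediction target.
y = df["price"]
X = df.drop(columns=["price"])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```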

Creating the model

Finally, we build the Keras Sequential model.

We compile the model with the Adam optimizer, mean squared error loss, and two metrics.
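A minimal sketch of the model, assuming the feature layer from above and hidden-layer sizes of my own choosing; the two metrics here are mean absolute error and mean squared error. In the full notebook the remaining numeric columns would be added to the same DenseFeatures layer as additional numeric_columns.

```python
model = tf.keras.Sequential([
    feature_layer,                                # lat/lon feature cross
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),                     # predicted price
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              loss="mse",
              metrics=["mae", "mse"])
```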

In addition, we use two callbacks (sketched after this list):

  • Early stopping, which goes without saying

  • Reducing the learning rate on a plateau.
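A sketch of the two callbacks; the exact patience and factor values are assumptions:

```python
callbacks = [
    # Stop training once the validation loss stops improving.
    tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                     restore_best_weights=True),
    # Lower the learning rate when the validation loss plateaus.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.2,
                                         patience=3),
]
```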

After training for 50 epochs with a batch size of 64, our model performs quite well.
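A sketch of the training call; the DenseFeatures layer expects its inputs as a dict of columns, so the DataFrame is converted accordingly:

```python
history = model.fit(
    dict(X_train),            # dict of column name -> values for DenseFeatures
    y_train,
    validation_data=(dict(X_test), y_test),
    epochs=50,
    batch_size=64,
    callbacks=callbacks,
)
```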

Conclusion

We used New York City's Airbnb data to build a fully connected neural network that predicts future prices. Pandas and Seaborn made it easy to visualize and examine the data. We introduced the idea of crossing longitude and latitude as a feature in the model. And thanks to Kaggle's open dataset, we end up with a fully operational machine learning model.

Link to the original article: https://www.analyticsvidhya.com/blog/2020/10/predicting-nyc-airbnb-rental-prices-tensorflow/

Welcome to the PanChuang AI blog:
http://panchuang.net/

Official Chinese documentation for scikit-learn machine learning:
http://sklearn123.com/

Welcome to the PanChuang blog Resource Hub:
http://docs.panchuang.net/