======================
Analysis of the booking data of a hotel chain / hotel cooperative with five properties in different locations.
The dataset contains bookings made at five different hotels of one hotel chain. Each row corresponds to one person; for example, a double room is usually represented by two rows with the same booking number (it could also be two single rooms). The customer ID is likewise assigned per person and shows whether a customer books at several properties or more than once at the same property.
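Because each row is one person and the customer ID is assigned per person, repeat customers can be found by counting distinct booking numbers per customer ID. A minimal pandas sketch on toy data, assuming the hypothetical column names `booking_number`, `customer_id` and `hotel` (the real column names are not documented here):

```python
import pandas as pd

# Toy data; column names are hypothetical stand-ins for the real (NDA) dataset.
# One row per person, so several rows can share a booking number.
bookings = pd.DataFrame({
    "booking_number": [1, 1, 2, 3, 4, 5],
    "customer_id": ["A", "B", "A", "C", "A", "C"],
    "hotel": ["Linz", "Linz", "Linz", "Düsseldorf", "Viana do Castelo", "Düsseldorf"],
})

# Count distinct bookings and distinct properties per customer ID.
per_customer = bookings.groupby("customer_id").agg(
    n_bookings=("booking_number", "nunique"),
    n_hotels=("hotel", "nunique"),
)

# Repeater: appears in more than one booking (same or different property).
per_customer["is_repeater"] = per_customer["n_bookings"] > 1
# Multi-property guest: has stayed at more than one of the hotels.
per_customer["multi_property"] = per_customer["n_hotels"] > 1
print(per_customer)
```

Using `nunique` rather than a plain row count guards against the same customer appearing in several rows of one booking.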
Dimensions:
- 98 features (57 original + 41 new)
- 245k records
- Identify repeat customers, as well as the features used to predict them.
- Recognize guests who are willing to spend above-average amounts of money (referred to as VIPs).
- Predict in which quarter and at which destination a customer will book their next stay.
- Jupyter notebooks following PEP 8, aimed at a data science / technical audience.
- Slide deck (PDF, 10-minute presentation) pushed to GitHub for non-technical stakeholders, outlining findings, recommendations and future work.
- How can repeat customers be identified?
- What distinguishes VIPs from other guests?
- What do repeat customers and VIPs have in common?
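The VIP definition and the next-stay prediction target described above can be made concrete in a few lines of pandas. This is a sketch under stated assumptions, not the method used in the notebooks: the columns `customer_id`, `hotel`, `arrival` and `revenue` are hypothetical, "above average" is taken to mean above the arithmetic mean of total spend per customer, and the target is the quarter and property of each customer's following stay.

```python
import pandas as pd

# Toy data with hypothetical columns: one row per stay of a customer.
stays = pd.DataFrame({
    "customer_id": ["A", "A", "B", "C", "C", "C"],
    "hotel": ["Linz", "Düsseldorf", "Linz", "Viana do Castelo", "Linz", "Linz"],
    "arrival": pd.to_datetime(["2019-01-10", "2019-05-02", "2019-03-15",
                               "2019-02-01", "2019-08-20", "2019-11-30"]),
    "revenue": [120.0, 150.0, 900.0, 80.0, 90.0, 100.0],
})

# VIPs: customers whose total spend exceeds the mean total spend per customer.
# (Assumption: "above average" = arithmetic mean; a median or a high
# percentile would be equally defensible thresholds.)
total_spend = stays.groupby("customer_id")["revenue"].sum()
vip = total_spend > total_spend.mean()
print(vip[vip].index.tolist())  # ['B']

# Next-stay target: sort each customer's stays chronologically, then look one
# stay ahead within the same customer to get the following quarter and hotel.
stays = stays.sort_values(["customer_id", "arrival"]).reset_index(drop=True)
stays["quarter"] = stays["arrival"].dt.quarter
stays["next_quarter"] = stays.groupby("customer_id")["quarter"].shift(-1)
stays["next_hotel"] = stays.groupby("customer_id")["hotel"].shift(-1)
print(stays[["customer_id", "arrival", "next_quarter", "next_hotel"]])
```

The last stay of each customer has no successor, so its target is missing (NaN); such rows cannot serve as training examples for the next-stay model.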
This notebook:
- Merging the three datasets
- Cleaning:
  - Handling missing values
  - Inconsistency checks
  - Changing variable types
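The merging and cleaning steps listed above can be sketched in pandas. The three tables, their key columns (`customer_id`, `hotel_id`) and the date columns are hypothetical stand-ins, since the real data is under NDA; `lkz` is an original column name, but placing it in a guest table is an assumption.

```python
import pandas as pd

# Three hypothetical source tables; the real files and keys may differ.
bookings = pd.DataFrame({
    "booking_number": [1, 1, 2, 3],
    "customer_id": [10, 11, 10, 12],
    "hotel_id": [1, 1, 2, 3],
    "arrival": ["2019-01-10", "2019-01-10", "2019-03-01", "2019-04-05"],
    "departure": ["2019-01-12", "2019-01-12", "2019-02-27", "2019-04-07"],
})
guests = pd.DataFrame({"customer_id": [10, 11, 12], "lkz": ["AT", None, "DE"]})
hotels = pd.DataFrame({"hotel_id": [1, 2, 3],
                       "hotel": ["Linz", "Düsseldorf", "Viana do Castelo"]})

# Merging: left joins keep every booking row even if a lookup is missing;
# validate="many_to_one" raises if a key is duplicated in a lookup table.
df = (bookings
      .merge(guests, on="customer_id", how="left", validate="many_to_one")
      .merge(hotels, on="hotel_id", how="left", validate="many_to_one"))

# Changing variable types: parse dates, store low-cardinality text as category.
df["arrival"] = pd.to_datetime(df["arrival"])
df["departure"] = pd.to_datetime(df["departure"])
df["hotel"] = df["hotel"].astype("category")

# Handling missing values: make the gap explicit instead of dropping the row.
df["lkz"] = df["lkz"].fillna("unknown").astype("category")

# Inconsistency check: a departure must not precede its arrival.
inconsistent = df["departure"] < df["arrival"]
print(f"{int(inconsistent.sum())} inconsistent row(s) removed")
df = df.loc[~inconsistent].reset_index(drop=True)
```

Whether inconsistent rows should be removed or corrected is a judgment call; removal is shown only because it is the simplest option.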
Next notebooks:
- Feature engineering
- EDA
- Basic and advanced modelling
- Time series
- Conclusion
- The columns reise_special_event and flag_old add no value to the analysis.
- The columns lkz and sprache_deutsch add little value to the analysis.
- Guests of the Viana do Castelo hotel show a different booking behaviour from the rest.
- Guests from Linz and Düsseldorf are very similar.
- AdaBoost is best suited for predicting regular guests.
- To identify high-spending customers (VIPs), logistic regression with dummy variables delivered the best results.
- Hotels should focus on persuading customers to book less via travel agencies.
- An advertising ban does not have a negative impact on follow-up bookings.
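The two modelling findings rely on dummy variables (one 0/1 column per category) and on comparing classifiers. The sketch below shows that workflow with scikit-learn on synthetic data; the features `channel`, `hotel` and `nights` and the target rule are invented for illustration and will not reproduce the results reported above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: column names, categories and the target rule are
# purely hypothetical (the real features are under NDA).
rng = np.random.default_rng(42)
n = 500
X_raw = pd.DataFrame({
    "channel": rng.choice(["direct", "travel_agency", "online"], size=n),
    "hotel": rng.choice(["Linz", "Düsseldorf", "Viana do Castelo"], size=n),
    "nights": rng.integers(1, 10, size=n),
})
y = ((X_raw["channel"] == "direct") & (X_raw["nights"] >= 5)).astype(int)

# Dummy variables: one 0/1 column per category; drop_first removes one level
# per feature to avoid perfect collinearity between its dummies.
X = pd.get_dummies(X_raw, columns=["channel", "hotel"], drop_first=True, dtype=int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    scores[name] = roc_auc_score(y_test, proba)
    print(f"{name}: ROC AUC = {scores[name]:.3f}")
```

ROC AUC is used here because repeat guests and VIPs are presumably a minority class, where plain accuracy can look good even for a model that never predicts the positive class.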
- Adding further information, such as:
  - Events at the destination
  - Cancellations
  - Revenue per available room (RevPAR; requires the room capacity of each destination)
  - Feedback from TripAdvisor, Yelp, etc., as well as from internal hotel surveys
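RevPAR is the standard hotel metric of room revenue divided by the number of available room-nights in a period, which is equivalent to average daily rate (ADR) times occupancy. The number of available room-nights depends on each property's room count, which is why capacity data would be needed. A minimal sketch; the function and parameter names are hypothetical:

```python
def revpar(room_revenue: float, rooms: int, nights: int) -> float:
    """Revenue per available room (RevPAR) for a period.

    room_revenue: total room revenue in the period
    rooms: room capacity of the property (assumed input, not in the dataset)
    nights: number of nights in the period
    """
    available_room_nights = rooms * nights
    if available_room_nights <= 0:
        raise ValueError("rooms and nights must be positive")
    return room_revenue / available_room_nights


# Example: 100 rooms over a 30-night month, 2,100 room-nights sold at an ADR of 120.
rooms, nights, sold, adr = 100, 30, 2100, 120.0
print(revpar(sold * adr, rooms, nights))  # 84.0

# Cross-check: ADR x occupancy should give the same value.
occupancy = sold / (rooms * nights)
print(adr * occupancy)
```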
- The dataset cannot be published due to an NDA.
- data/glossary.xlsx: list of all features
- slides/slides.pdf: slide deck visualising the findings (10-minute presentation)
- 1-Cleaning.ipynb: data cleansing for the subsequent EDA
- 2-EDA.ipynb: Jupyter notebook with exploratory data analysis (EDA), visualizations and further documentation
- 3-Modelling_Repeater.ipynb: predicting repeat customers
- 4-Modelling_VIPs.ipynb: predicting high-spending guests (VIPs)
Pandas / NumPy / Matplotlib / Seaborn / sklearn
This code is licensed under the GNU General Public License v3.0. For more details, please take a look at the LICENSE file.