Projects completed while studying in the "Data Scientist Plus" program by Yandex.Practicum (2021-2023)
State-recognized Diploma of professional retraining (PDF): English / Russian
No. | Project | Description | Tools / Libraries | Status |
---|---|---|---|---|
01 | Music preferences in big cities | Test several statistical hypotheses about the music preferences of people living in Moscow and Saint Petersburg | pandas | Done |
02 | Analysis of bank's borrower reliability | Analyze whether a client's marital status and number of children affect on-time loan repayment | pandas | Done |
03 | Analysis of advertisements for the sale of apartments | Based on data from Yandex.Estate, determine the market value of real estate in Saint Petersburg | pandas, matplotlib | Done |
04 | Study of data on film distribution | Analyze the film distribution market and identify current trends | pandas, numpy, matplotlib, seaborn | Done |
05 | Determination of a prospective tariff for a telecom company | Analyze which tariff brings in more money, as requested by the telecom company's commercial department | pandas, numpy, scipy, matplotlib, seaborn | Done |
06 | Mobile tariff recommendation for a client | Build a classification model that selects the appropriate mobile tariff for a client | pandas, numpy, matplotlib, seaborn, sklearn | Done |
07 | Bank's customer churn modelling | Using historical data on client activity, predict whether a given client will leave the bank in the near future | pandas, numpy, matplotlib, seaborn, sklearn | Done |
08 | Choosing a location for an oil well | Using data on oil samples from three geographic regions, build a model to choose the most profitable oil well location | pandas, numpy, matplotlib, seaborn, sklearn | Done |
09 | Predicting rejection of hotel reservation | Develop a model that predicts hotel booking rejection and determine whether the profit from the model would cover its development expenses | pandas, numpy, matplotlib, seaborn, sklearn | Done |
10 | SQL Basics | Practice writing basic SQL queries | SQL | Done |
11 | Git and Command-Line | Practice using Git and Linux command-line workflow | git, cmd | Done |
12 | California housing cost prediction | Build a linear regression model on California housing data from the 1990s to predict the median cost of a house in a residential area | pandas, numpy, pyspark | Done |
13 | Linear algebra to protect personal data | Protect the data of an insurance company's clients by developing a data transformation method that makes personal information difficult to recover | pandas, numpy, sklearn | Done |
14 | Determining car costs | Using historical data on the technical characteristics, trim levels, and prices of cars, build a model to determine car costs | pandas, numpy, matplotlib, seaborn, sklearn, catboost, lightgbm | Done |
15 | Advanced SQL | Write 10 more advanced SQL queries from a Jupyter environment | SQL | Done |
16 | Star temperature prediction | Using the characteristics of 240 already-studied stars, create a neural network to determine the surface temperature of discovered stars | pandas, numpy, matplotlib, seaborn, sklearn, pytorch | Done |
17 | Assessing the risk of a road accident | Create a system that assesses the risk of a road accident along a selected route; determine whether an accident can be predicted from the historical data of one of the regions | SQL, pandas, numpy, matplotlib, seaborn, sklearn, sqlalchemy, lightgbm, catboost | Done |
18 | Forecasting taxi orders | Build a model to predict the number of taxi orders for the next hour to attract more drivers during the peak period | pandas, numpy, matplotlib, seaborn, sklearn, statsmodels, lightgbm | Done |
19 | Classification of comments as toxic or non-toxic | Using an English comments dataset labeled for toxicity, create a model that classifies comments as positive or negative | pandas, numpy, matplotlib, seaborn, nltk, spacy, lightgbm, afinn, nrclex | Done |
20 | Determining the age of buyers from their photos | Build a model that determines a person's approximate age from a photograph, using a labeled dataset of photographs of people | pandas, matplotlib, seaborn, keras | Done |
21 | Search images by text query | Build a model that takes a textual description of a scene and returns several photos showing the same or a similar scene | pandas, numpy, matplotlib, seaborn, sklearn, pytorch, transformers, torchvision | Done |
22 | Predict telecom contract termination | Build a model that predicts whether a subscriber will terminate their contract with the telecom company | SQL, pandas, numpy, matplotlib, seaborn, sqlalchemy, sklearn, lightgbm, catboost, pytorch | Done |