Visualization Team Project #51

Closed
wants to merge 23 commits into from
23 commits
62fc171
adding mortgage.csv
movcha Aug 8, 2024
9035533
Updated dashboard
krishnakishore163 Aug 9, 2024
a29c470
the is my team project 2_Jinrong Liu
Aug 10, 2024
2b9e076
Add files via upload
KristinaTalalaievska Aug 10, 2024
7f04b39
Add files via upload
KristinaTalalaievska Aug 10, 2024
49fd9ea
Add files via upload
KristinaTalalaievska Aug 10, 2024
705ee0d
README_Kristina.md
KristinaTalalaievska Aug 11, 2024
5f69792
4 visuals from movcha
movcha Aug 14, 2024
f2393a9
Merge pull request #4 from movcha/project_Kristina
movcha Aug 14, 2024
040928c
team_project
dogagzm Aug 18, 2024
58f7b8f
Video_upload
dogagzm Aug 19, 2024
8f50eda
this my update project with video record
Aug 19, 2024
f9e5346
Merge pull request #2 from movcha/project2_Krishna
krishnakishore163 Aug 20, 2024
c4b2df2
Merge pull request #3 from movcha/project2_Jinrong
krishnakishore163 Aug 20, 2024
4fcbf52
Merge pull request #6 from movcha/team_project_2_doga
krishnakishore163 Aug 20, 2024
55a0e3c
Merge pull request #5 from movcha/team_project_2_movcha
krishnakishore163 Aug 20, 2024
55d0ef1
Rename README_Jinrong to README_Team Project 2
liujinro Aug 20, 2024
8b3223f
Update README_Team Project 2
liujinro Aug 20, 2024
24625f6
Update README_Team Project 2
krishnakishore163 Aug 20, 2024
7be12f0
Update README_Team Project 2
krishnakishore163 Aug 20, 2024
76f14f5
Update README_Team Project 2
krishnakishore163 Aug 20, 2024
7dd7c2b
README_Team Project 2
KristinaTalalaievska Aug 21, 2024
c5d1f10
README_Team Project 2
KristinaTalalaievska Aug 21, 2024
16,932 changes: 16,932 additions & 0 deletions 4_visuals_from_movcha.ipynb

Large diffs are not rendered by default.

116 changes: 116 additions & 0 deletions Project2_Jinrong.ipynb

Large diffs are not rendered by default.

138 changes: 138 additions & 0 deletions README_Team Project 2
@@ -0,0 +1,138 @@
Mortgage Risk Assessment Project

Project Overview
Welcome to the Mortgage Risk Assessment Project! This project aims to analyze and predict mortgage loan amounts and assess loan conditions (Good Loan vs. Bad Loan) using various machine learning techniques. Our team has designed, implemented, and tested regression and classification models to gain insights into the factors affecting loan amounts and conditions.

Team Members
• Doga Gizem (dogagzm)
• Krishna Kishore (krishnakishore163)
• Oleg Movchaniuk (movcha)
• Jinrong Liu (@liujinro)
• Kristina Talalaievska (@KristinaTalalaievska)

Visualizations

App.ipynb
1. An interactive risk profile dashboard built in Python with Dash by Plotly, providing interactive elements such as filters and sliders.
2. A correlation matrix heatmap, which visualizes the relationships between different variables and their influence on loan conditions.

Detailed interpretation of the risk profile dashboard:
Income vs. Loan Amount:
o Observations: Loan amounts rise with annual income; higher incomes are associated with higher loan amounts.
o Insights: Higher-income individuals tend to have good loan conditions even with large loan amounts, which suggests that higher income is a good predictor of loan quality.
Loan Condition Distribution:
o Observations: Different colors represent loan conditions. By analyzing the distribution of colors in relation to income and loan amount, we can identify which income groups or loan amounts are associated with higher rates of bad loans.
o Insights: Most bad loans are clustered in certain income brackets or loan amounts, which indicates risky segments that need closer monitoring.

Filtering Effects
Income Category Filter: Selecting specific income categories will update the scatter plot to show only loans from the chosen income brackets. This helps in analyzing how loan conditions change with different income levels.
Home Ownership Filter: Applying this filter shows how loan conditions vary based on home ownership status. For example, loans where the home ownership status is Rent show a higher rate of bad loans, which points to elevated risk in that segment.
Loan Term Filter: Filtering by loan term will display how loan conditions differ across various loan durations.

Users can interact with dropdowns to explore different scenarios and combinations. This interactivity allows for dynamic exploration of data, revealing how various factors like income, home ownership, and loan term interact to impact loan conditions.
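
The filtering behaviour described above can be sketched as a plain pandas helper, the kind of function a Dash callback might call when a dropdown changes. This is only an illustration: the dashboard code in app.ipynb is not rendered here, and the column names income_category, home_ownership and term, as well as the sample rows, are assumptions.

```python
import pandas as pd


def filter_loans(df, income_category=None, home_ownership=None, term=None):
    """Return the rows matching every filter that is set; None means 'no filter'."""
    mask = pd.Series(True, index=df.index)
    for column, value in [
        ("income_category", income_category),  # assumed column name
        ("home_ownership", home_ownership),    # assumed column name
        ("term", term),                        # assumed column name
    ]:
        if value is not None:
            mask &= df[column] == value
    return df[mask]


def bad_loan_rate(df):
    """Share of rows labelled 'Bad Loan'."""
    return (df["loan_condition"] == "Bad Loan").mean()


# Made-up rows for illustration only; not taken from mortgage.csv
sample = pd.DataFrame({
    "income_category": ["Low", "Low", "High", "High"],
    "home_ownership": ["RENT", "OWN", "RENT", "MORTGAGE"],
    "term": ["36 months", "60 months", "36 months", "36 months"],
    "loan_condition": ["Bad Loan", "Good Loan", "Good Loan", "Good Loan"],
})

renters = filter_loans(sample, home_ownership="RENT")
print(len(renters), bad_loan_rate(renters))
```

Comparing bad_loan_rate across filtered subsets gives a number to go with what the scatter plot shows visually.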

Correlation Matrix Heatmap
A correlation matrix shows the correlation coefficients between pairs of variables. The heatmap uses colors to represent correlation values. Darker colors often indicate stronger correlations.
• interest_rate has a high negative correlation with loan_condition_encoded.
• dti and loan_amount are strongly correlated with bad loan conditions.
By visualizing the correlation matrix heatmap, we gain a clearer understanding of how different variables interact and their influence on loan conditions. This is essential for effective risk management and improving predictive models.
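
As a rough sketch of how such a matrix could be computed, the snippet below encodes the loan condition as a number and takes Pearson correlations with pandas. The sample rows are made up, and the encoding (1 = Bad Loan, 0 = Good Loan) is an assumption: the README does not say how loan_condition_encoded is coded, and the sign of every coefficient in that column flips if the encoding is reversed.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from mortgage.csv
loans = pd.DataFrame({
    "interest_rate": [7.5, 9.0, 12.5, 15.0, 18.5, 21.0],
    "dti": [8.0, 12.0, 15.0, 22.0, 25.0, 30.0],
    "loan_amount": [5000, 8000, 12000, 15000, 20000, 25000],
    "loan_condition": ["Good Loan", "Good Loan", "Good Loan",
                       "Bad Loan", "Bad Loan", "Bad Loan"],
})

# Assumed encoding: 1 = Bad Loan, 0 = Good Loan
loans["loan_condition_encoded"] = (loans["loan_condition"] == "Bad Loan").astype(int)

# Pearson correlation between every pair of numeric columns
corr = loans.drop(columns="loan_condition").corr()
print(corr["loan_condition_encoded"].round(2))

# To draw the heatmap with seaborn (already used in data/Graph.py):
#   import seaborn as sns
#   sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
```

Fixing vmin and vmax at -1 and 1 keeps the color scale symmetric, so the same color intensity means the same correlation strength in both directions.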



Project2_Jinrong.ipynb

Topic: How the loan conditions (Good Loan vs. Bad Loan) affect the total payment and loan amounts over time
Objective: The primary goal is to visualize the relationship between the loan amount and total payment over the years, differentiated by loan conditions (Good Loan vs. Bad Loan). The aim is to track trends and observe how loan amounts and payments have evolved over time.
Goal: To provide insights into the financial performance and risk associated with different loan conditions across years. This can help in understanding patterns in loan repayment behavior and the financial impact of good and bad loans.
Analysis:
-By observing the bar heights and the trend line, we can analyze whether there's an increase in good or bad loans over the years. For instance, if the bar for Good Loans significantly outweighs Bad Loans in the later years, it could indicate an improvement in the lending criteria or economic conditions.
-By comparing the relative sizes of bars representing Good Loans vs. Bad Loans, one could assess the financial risk. If Bad Loans increase over time, this may indicate growing risk.
Audience:
-Financial analysts who need loan data to support decision-making; stakeholders who need to see loan trends from year to year.
Visualization:
-Bar charts are well-suited for this type of data because they allow for easy comparison between different categories (Good vs. Bad Loans) over time. The trend line adds predictive value, helping to identify long-term trends in the data.
Frameworks:
-Using Matplotlib is appropriate for this task as it provides flexibility and customization, which is essential when dealing with complex financial data.
Design:
-Use color-coding and clear legends to make it easy for the audience to differentiate between Good and Bad Loans.
-The current choice—a bar chart with a trend line—is effective for comparing different categories over time. The bars allow for a clear comparison of total payments and loan amounts, while the trend line indicates general movement or predictions.
-Use clear, non-technical language in labels and legends to make the chart accessible to people with varying levels of financial knowledge.
Data:
-The visualization should accurately reflect the underlying data without exaggerating trends or omitting important information. This can be done by carefully choosing the scales and ensuring that the trend line accurately reflects the data.
-Provide context in the title or through annotations to guide the audience in interpreting the data correctly. Consider adding notes or a legend to explain any assumptions or calculations.
Summary:
The visualization effectively meets the primary objective of showing trends in loan amounts and payments over time, segmented by loan condition. By carefully selecting colors, ensuring accuracy, and considering feedback, this visualization can be refined to communicate its message clearly to the target audience. Privacy considerations should also be kept in mind, especially if dealing with sensitive financial data.
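
A minimal sketch of a grouped bar chart with a trend line, in the spirit of the design described above. It is not the code from Project2_Jinrong.ipynb, which is not rendered here: the year column and the sample figures are assumptions, and the trend line is a simple least-squares straight line, which is only one possible choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up figures for illustration only; "year" is an assumed column name
loans = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015, 2016, 2016],
    "loan_condition": ["Good Loan", "Bad Loan"] * 3,
    "total_pymnt": [100.0, 20.0, 140.0, 30.0, 180.0, 25.0],
})

# One row per year, one column per loan condition
totals = loans.pivot_table(index="year", columns="loan_condition",
                           values="total_pymnt", aggfunc="sum")

fig, ax = plt.subplots(figsize=(10, 6))
totals.plot(kind="bar", ax=ax)  # grouped bars: Good vs. Bad per year

# Least-squares straight line through the yearly Good Loan totals.
# pandas draws bar groups at x = 0, 1, 2, ..., so the line uses the same positions.
x = np.arange(len(totals))
slope, intercept = np.polyfit(x, totals["Good Loan"].to_numpy(), 1)
ax.plot(x, slope * x + intercept, color="black", linestyle="--",
        label="Good Loan trend")

ax.set_xlabel("Year")
ax.set_ylabel("Total payment")
ax.set_title("Total Payment by Year and Loan Condition")
ax.legend()
# Call fig.savefig("loan_trend.png") or plt.show() to view the result.
```

A straight-line fit over a handful of yearly totals describes past movement; reading it as a forecast would need more data and a stated model.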

Team Project 2_Kristina Talalaievska

- Visualization #1 in Tableau:
Growing confidence in the lending market during the decade from 2006 to 2016
https://public.tableau.com/views/Mortgagemarketinvestigation/Sheet1?:language=en-US&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

- Topic:
Growing Confidence in the Lending Market: Analysis of Trends from 2006 to 2016 Using Tableau

- Objective:
To utilize Tableau for a detailed exploration of the lending market trends between 2006 and 2016, focusing on how visual data representation can reveal insights into the increasing confidence within the market.

- Goal:
To provide a visual and data-driven understanding of the lending market’s evolution over the decade, identifying key factors that contributed to the growth in lender and borrower confidence, and to create an interactive dashboard for stakeholders to explore these trends dynamically.
- Analysis:
- Yearly Trends: the sum of loan amounts from 2006 to 2016, highlighting growth or decline over the years.
- Regional Performance: comparing loan amounts across different regions to identify areas of high and low activity.
- Interest Rate Correlations: visualizing the relationship between interest rates and loan volumes, offering insights into how rates influenced lending activity.
- Hotspot Identification: showing regional loan amounts over the years to identify lending hotspots and areas with lower activity.

- Audience:
The primary audience includes financial analysts, bank executives, policy makers, and investors interested in understanding the dynamics of the lending market. The interactive nature of the Tableau visualizations also makes it accessible to business stakeholders who benefit from exploring data in a visual format.
- Design:
It is designed for clarity and ease of use, with consistent color schemes and clear labeling. Each visualization is interactive, allowing users to drill down into specific years, regions, or other relevant data points. The design ensures that complex data is presented in an accessible and engaging manner.
- Data:
- Loan Amounts: Summed annually and regionally.
- Years: Spanning from 2006 to 2016, used to track trends over time.
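
The yearly and regional sums behind these Tableau views can be approximated in pandas. The sketch below is illustrative only: the year and region column names and all the figures are assumptions, and the Tableau workbook itself may aggregate differently.

```python
import pandas as pd

# Made-up rows for illustration only; "year" and "region" are assumed column names
loans = pd.DataFrame({
    "year": [2006, 2006, 2007, 2007, 2007],
    "region": ["North", "South", "North", "South", "South"],
    "loan_amount": [1000, 2000, 1500, 2500, 3000],
})

# Yearly trend: total loan amount per year
yearly = loans.groupby("year")["loan_amount"].sum()

# Regional performance: one row per region, one column per year
regional = loans.groupby(["region", "year"])["loan_amount"].sum().unstack("year")

# Year-over-year change in total lending (first year has no predecessor, so NaN)
growth = yearly.pct_change()

print(yearly, regional, growth, sep="\n\n")
```

The region-by-year table is the same shape a highlight table or map layer would consume when looking for lending hotspots.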

- Methodologies used:
1. KNN classifies data points based on their proximity to other points in the dataset.
2. Logistic Regression predicts the probability of a binary outcome (in this project, loan condition: Good Loan vs. Bad Loan) by modeling the relationship between the dependent variable and the independent variables.
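
To make the proximity idea behind KNN concrete, here is a small from-scratch sketch using NumPy. It is not the team's model: the two features, their values and the choice of k = 3 with Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3):
    """Label x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]                  # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up, already-scaled features (for example interest rate and dti)
X_train = np.array([
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
    [0.80, 0.90], [0.90, 0.80], [0.85, 0.70],
])
y_train = np.array(["Good Loan", "Good Loan", "Good Loan",
                    "Bad Loan", "Bad Loan", "Bad Loan"])

print(knn_predict(X_train, y_train, np.array([0.2, 0.2])))
print(knn_predict(X_train, y_train, np.array([0.8, 0.8])))
```

Because the vote depends on raw distances, features on very different scales (loan amounts versus percentages) would normally be standardized first.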

- Visualization #2 in Tableau:
Identifying regions higher on loan amounts and showing more economic growth
https://public.tableau.com/app/profile/kristina.talalaievska/viz/Identifyingregionshigheronloanamountsandshowingmoreeconomicgrowth/Sheet2

- Topic:
Identifying Regions with Higher Loan Amounts and Correlating Economic Growth

- Objective:
To analyze and visualize the relationship between higher loan amounts in specific regions and their correlation with economic growth, using Tableau as the primary tool for data-driven insights.

- Goal:
To provide a clear understanding of which regions have received the highest loan amounts and how this correlates with indicators of economic growth, offering valuable insights for stakeholders in finance, policy-making, and regional development.
- Analysis:
- Regional Loan Distribution: Identifying which regions have higher loan amounts.
- Economic Growth Correlation: Analyzing how these loan amounts correlate with economic growth indicators such as GDP, employment rates, or other relevant economic metrics.
- Comparison Across Regions: Comparing economic performance across regions with varying loan amounts to assess the impact of lending on economic development.
- Audience:
The primary audience includes economic analysts, regional planners, financial institutions, and government officials interested in understanding the impact of lending on regional economic growth. Investors and businesses looking to make informed decisions about regional markets may also benefit from these insights.
- Design:
The Tableau dashboard is designed to be intuitive and visually engaging, with interactive elements that allow users to filter data by region or economic indicator. Consistent color schemes and clear labeling help ensure that the data is easily interpretable.
- Data:
The data used in the Tableau visualizations includes:
- Loan Amounts: Summed regionally to identify areas with higher lending.
- Regions: Categorized for geographical analysis, making it easy to compare economic performance across different areas.

Rules of Engagement:
1. Transparent and clear communication within the team members.
2. Collaboration and teamwork.
3. Respect and professionalism in sharing different viewpoints and constructive feedback.
4. Identifying each team member's strengths so that everyone can contribute effectively to the project.
5. Conducting team meetings and summarizing the key points for discussion on the common shared Google Doc.
6. Transparency in sharing ideas openly.
7. Supporting each other and clarifying if one of the team members has difficulties in understanding the project process.
8. Continuous improvement in developing the model.


Video Links

Video Link_Jinrong Liu: https://vimeo.com/1000558863/d0e722b4f2?share=copy
Video Link_Krishna Kishore: https://vimeo.com/1000959222/4afd8dce93?share=copy
Video Link_Kristina Talalaievska: https://drive.google.com/file/d/1hgNK0Y32LRxVpXNiHaegi-lkWQNTQ1SH/view?usp=share_link
1 change: 1 addition & 0 deletions app.ipynb

Large diffs are not rendered by default.

54 changes: 54 additions & 0 deletions data/Graph.py
@@ -0,0 +1,54 @@
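from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load mortgage.csv from this script's own directory (data/) rather than a user-specific absolute path
data = pd.read_csv(Path(__file__).resolve().parent / 'mortgage.csv')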

# 1. Loan Amount vs Interest Rate (Scatter Plot)
plt.figure(figsize=(10, 6))
sns.scatterplot(x='loan_amount', y='interest_rate', data=data)
plt.title('Loan Amount vs Interest Rate')
plt.xlabel('Loan Amount')
plt.ylabel('Interest Rate (%)')
plt.grid(True)
plt.show()

# 2. Income vs DTI (Debt-to-Income Ratio) (Scatter Plot)
plt.figure(figsize=(10, 6))
sns.scatterplot(x='annual_inc', y='dti', data=data)
plt.title('Income vs Debt-to-Income Ratio')
plt.xlabel('Annual Income')
plt.ylabel('DTI (%)')
plt.grid(True)
plt.show()

# 3. Grade vs Interest Rate (Box Plot)
plt.figure(figsize=(10, 6))
sns.boxplot(x='grade', y='interest_rate', data=data)
plt.title('Grade vs Interest Rate')
plt.xlabel('Grade')
plt.ylabel('Interest Rate (%)')
plt.grid(True)
plt.show()

# 4. Loan Condition vs Total Payment (Bar Plot)
loan_condition_payment = data.groupby('loan_condition')['total_pymnt'].mean().reset_index()

plt.figure(figsize=(10, 6))
sns.barplot(x='loan_condition', y='total_pymnt', data=loan_condition_payment)
plt.title('Loan Condition vs Average Total Payment')
plt.xlabel('Loan Condition')
plt.ylabel('Average Total Payment')
plt.grid(True)
plt.show()

# 5. Histogram of Interest Rates
plt.figure(figsize=(10, 6))
sns.histplot(data['interest_rate'], bins=30, kde=True)
plt.title('Histogram of Interest Rates')
plt.xlabel('Interest Rate (%)')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
1 change: 1 addition & 0 deletions data/Video
@@ -0,0 +1 @@
https://drive.google.com/file/d/1fSrtiI2qsf8S9FBjv8rG0reGXyvNAIt-/view?usp=drive_link