| description |
| --- |
| Coming up with features is difficult, time-consuming, requires expert knowledge. "Applied machine learning" is basically feature engineering. — Andrew Ng |
- Featuretools/featuretools: automated feature engineering
- Best Practices for Feature Engineering
- Recommended Articles on Feature Engineering
- Common Methods in Feature Engineering
- Feature Engineering - Handling Cyclical Features
- Single-Machine Feature Engineering with sklearn - jasonfreak - cnblogs
- Discover Feature Engineering, How to Engineer Features and How to Get Good at It
- What Engineering Methods Are There for Feature Selection in Machine Learning? - Zhihu
- target encoding for categorical features
- Python target encoding for categorical features | Kaggle
- How to Effectively Handle Data Whose Features Differ Widely in Range and Type? - Zhihu
- An Introduction to Deep Learning for Tabular Data · fast.ai
- How to deal with Features having high cardinality
- Transform anything into a vector – Insight Data
- Dimensionality Reduction Algorithms: Strengths and Weaknesses
- Three Effective Feature Selection Strategies – AI³ | Theory, Practice, Business – Medium
- Feature Engineering and Selection: A Practical Approach for Predictive Models
- Feature Engineering for Machine Learning | Udemy
- Open Machine Learning Course. Topic 6. Feature Engineering and Feature Selection
- A Complete Machine Learning Walk-Through in Python: Part One
- Automated Feature Engineering in Python – Towards Data Science
- A Feature Selection Tool for Machine Learning in Python
- Pitfalls in Data Science: Handling Qualitative Variables | 机器之心
- plasticityai/magnitude: A fast, efficient universal vector embedding utility package.
- featuretools-workshop/featuretools-workshop.ipynb at master · fred-navruzov/featuretools-workshop
- Featuretools for Good | Kaggle
- Recommended Articles on Feature Engineering – Rick Liu – Medium
- Why Automated Feature Engineering Will Change the Way You Do Machine Learning
- Deep Feature Synthesis: How Automated Feature Engineering Works | Feature Labs
- Featurizing Log Data Before XGBoost – DataRobot: https://www.slideshare.net/DataRobot/featurizing-log-data-before-xgboost
- A Hands on Guide to Automated Feature Engineering using Featuretools
- automated-feature-engineering/Automated_Feature_Engineering.ipynb at master · WillKoehrsen/automated-feature-engineering
- How to Do Automated Feature Engineering with Python | 机器之心
Indeed, many estimators are designed with the assumption that each feature takes values close to zero or, more importantly, that all features vary on comparable scales. In particular, metric-based and gradient-based estimators often assume approximately standardized data (centered features with unit variances). A notable exception is decision-tree-based estimators, which are robust to arbitrary scaling of the data.
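As a minimal sketch of that standardization step, the snippet below uses scikit-learn's `StandardScaler` on a small made-up matrix (the column meanings and values are illustrative assumptions, not from the original text):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix with very different scales (e.g. age in years, income in dollars).
X = np.array([[25, 40_000.0],
              [32, 85_000.0],
              [47, 120_000.0],
              [51, 62_000.0]])

# StandardScaler centers each feature to zero mean and rescales it to unit variance,
# which is roughly what metric-based and gradient-based estimators expect.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

Tree-based models could typically be fed `X` directly; the scaling mainly matters for models such as linear/logistic regression, SVMs, and k-NN.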
Imagine you have a categorical attribute, such as “Item_Color”, that can be Red, Blue, or Unknown.
Unknown may be special, but to a model it looks like just another color choice. It might be beneficial to expose this information more explicitly.
You could create a new binary feature called “Has_Color” and assign it a value of “1” when an item has a color and “0” when the color is unknown.
Going a step further, you could create a binary feature for each value that Item_Color has. This would be three binary attributes: Is_Red, Is_Blue and Is_Unknown.
These additional features could be used instead of the Item_Color feature (if you wanted to try a simpler linear model) or in addition to it (if you wanted to get more out of something like a decision tree).
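A minimal sketch of both ideas with pandas, assuming a hypothetical `Item_Color` column where the string "Unknown" marks a missing color:

```python
import pandas as pd

# Hypothetical data; "Unknown" stands for a missing color.
df = pd.DataFrame({"Item_Color": ["Red", "Blue", "Unknown", "Red", "Unknown"]})

# Binary flag that exposes "missingness" as its own signal.
df["Has_Color"] = (df["Item_Color"] != "Unknown").astype(int)

# One indicator column per observed value: Is_Red, Is_Blue, Is_Unknown.
dummies = pd.get_dummies(df["Item_Color"], prefix="Is").astype(int)
df = pd.concat([df, dummies], axis=1)

print(df)
```

The indicator columns can then replace `Item_Color` for a linear model, or sit alongside it for a tree-based model, as described above.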
From the 2nd-place solution of a Kaggle competition:
I calculated the lag between "date_first_booking" and "date_account_created" and divided this lag feature into four categories: 0, [1, 365], [-349, 0), and NA.
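A rough sketch of that kind of lag feature with pandas, assuming datetime columns named as in the quote (the sample data and the `bucket` helper are illustrative, not the competitor's actual code):

```python
import pandas as pd

# Hypothetical sample rows; None means the user never booked.
df = pd.DataFrame({
    "date_account_created": pd.to_datetime(
        ["2014-01-01", "2014-01-01", "2014-06-01", "2014-06-01"]),
    "date_first_booking": pd.to_datetime(
        ["2014-01-01", "2014-03-15", "2014-04-10", None]),
})

# Lag in days between first booking and account creation (NaN when no booking).
lag = (df["date_first_booking"] - df["date_account_created"]).dt.days

# Bucket the lag into the four categories quoted above: 0, [1, 365], [-349, 0), NA.
def bucket(d):
    if pd.isna(d):
        return "NA"
    if d == 0:
        return "0"
    if 1 <= d <= 365:
        return "[1, 365]"
    if -349 <= d < 0:
        return "[-349, 0)"
    return "other"  # anything outside the stated ranges

df["booking_lag_bucket"] = lag.apply(bucket)
print(df)
```

The resulting categorical bucket can then be one-hot encoded like any other categorical feature.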