From 25537c2d3f83422790fde50c6c7971e906f238e4 Mon Sep 17 00:00:00 2001 From: chielP Date: Wed, 24 Jan 2024 11:15:48 +0100 Subject: [PATCH] Revert "docs: Improve structure of user guide" (#13945) --- docs/_build/overrides/404.html | 2 +- docs/api/index.md | 2 +- docs/index.md | 58 ++++++ .../python/user-guide/basics/expressions.py | 23 ++- .../user-guide/basics/reading-writing.py | 7 +- .../src/rust/user-guide/basics/expressions.rs | 25 +-- .../rust/user-guide/basics/reading-writing.rs | 6 +- docs/user-guide/basics/expressions.md | 130 ++++++++++++ docs/user-guide/basics/index.md | 18 ++ docs/user-guide/basics/joins.md | 26 +++ docs/user-guide/basics/reading-writing.md | 45 +++++ docs/user-guide/concepts/index.md | 11 -- docs/user-guide/expressions/index.md | 18 -- docs/user-guide/getting-started.md | 186 ------------------ docs/user-guide/index.md | 39 ++++ docs/user-guide/io/index.md | 12 -- docs/user-guide/lazy/index.md | 10 - docs/user-guide/overview.md | 69 ------- docs/user-guide/transformations/index.md | 8 - mkdocs.yml | 19 +- 20 files changed, 360 insertions(+), 354 deletions(-) create mode 100644 docs/index.md create mode 100644 docs/user-guide/basics/expressions.md create mode 100644 docs/user-guide/basics/index.md create mode 100644 docs/user-guide/basics/joins.md create mode 100644 docs/user-guide/basics/reading-writing.md delete mode 100644 docs/user-guide/concepts/index.md delete mode 100644 docs/user-guide/expressions/index.md delete mode 100644 docs/user-guide/getting-started.md create mode 100644 docs/user-guide/index.md delete mode 100644 docs/user-guide/io/index.md delete mode 100644 docs/user-guide/lazy/index.md delete mode 100644 docs/user-guide/overview.md delete mode 100644 docs/user-guide/transformations/index.md diff --git a/docs/_build/overrides/404.html b/docs/_build/overrides/404.html index a216b32dfc5f..ee9b8faa2aba 100644 --- a/docs/_build/overrides/404.html +++ b/docs/_build/overrides/404.html @@ -217,6 +217,6 @@

404 - You're lost.

How you got here is a mystery. But you can click the button below to go back to the homepage or use the search bar in the navigation menu to find what you are looking for.

- Home + Home {% endblock %} diff --git a/docs/api/index.md b/docs/api/index.md index 485b59923ad1..004799cae1b4 100644 --- a/docs/api/index.md +++ b/docs/api/index.md @@ -11,7 +11,7 @@ It's the best place to look if you need information on a specific function. ## Python The Python API reference is built using Sphinx. -It's available in [our docs](https://docs.pola.rs/py-polars/html/reference/index.html). +It's available on [GitHub Pages](https://docs.pola.rs/py-polars/html/reference/index.html). ## Rust diff --git a/docs/index.md b/docs/index.md new file mode 100644 index 000000000000..2c72f776edbb --- /dev/null +++ b/docs/index.md @@ -0,0 +1,58 @@ +--- +hide: + - navigation +--- + +# Polars + +![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg) + +

Blazingly Fast DataFrame Library

+
+ + rust docs + + + + + + PyPI Latest Release + + + DOI Latest Release + +
+ +Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are: + +- **Fast**: Polars is written from the ground up, designed close to the machine and without external dependencies. +- **I/O**: First class support for all common data storage layers: local, cloud storage & databases. +- **Easy to use**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer. +- **Out of Core**: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time +- **Parallel**: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration. +- **Vectorized Query Engine**: Polars uses [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner. It uses [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) to optimize CPU usage. + +## Performance :rocket: :rocket: + +Polars is very fast, and in fact is one of the best performing solutions available. +See the results in h2oai's [db-benchmark](https://duckdblabs.github.io/db-benchmark/), revived by the DuckDB project. + +Polars [TPC-H Benchmark results](https://www.pola.rs/benchmarks.html) are now available on the official website. + +## Example + +{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}} + +## Community + +Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project: + +--8<-- "docs/people.md" + +## Contributing + +We appreciate all contributions, from reporting bugs to implementing new features. 
Read our [contributing guide](development/contributing/index.md) to learn more. + +## License + +This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE). diff --git a/docs/src/python/user-guide/basics/expressions.py b/docs/src/python/user-guide/basics/expressions.py index 12c6ea2170ec..041b023f27c4 100644 --- a/docs/src/python/user-guide/basics/expressions.py +++ b/docs/src/python/user-guide/basics/expressions.py @@ -6,16 +6,19 @@ df = pl.DataFrame( { - "a": range(5), - "b": np.random.rand(5), + "a": range(8), + "b": np.random.rand(8), "c": [ - datetime(2025, 12, 1), - datetime(2025, 12, 2), - datetime(2025, 12, 3), - datetime(2025, 12, 4), - datetime(2025, 12, 5), + datetime(2022, 12, 1), + datetime(2022, 12, 2), + datetime(2022, 12, 3), + datetime(2022, 12, 4), + datetime(2022, 12, 5), + datetime(2022, 12, 6), + datetime(2022, 12, 7), + datetime(2022, 12, 8), ], - "d": [1, 2.0, float("nan"), -42, None], + "d": [1, 2.0, float("nan"), float("nan"), 0, -5, -42, None], } ) # --8<-- [end:setup] @@ -33,12 +36,12 @@ # --8<-- [end:select3] # --8<-- [start:exclude] -df.select(pl.exclude(["a", "c"])) +df.select(pl.exclude("a")) # --8<-- [end:exclude] # --8<-- [start:filter] df.filter( - pl.col("c").is_between(datetime(2025, 12, 2), datetime(2025, 12, 3)), + pl.col("c").is_between(datetime(2022, 12, 2), datetime(2022, 12, 8)), ) # --8<-- [end:filter] diff --git a/docs/src/python/user-guide/basics/reading-writing.py b/docs/src/python/user-guide/basics/reading-writing.py index 68c0ab235fd1..dc8a54ebd18f 100644 --- a/docs/src/python/user-guide/basics/reading-writing.py +++ b/docs/src/python/user-guide/basics/reading-writing.py @@ -6,12 +6,11 @@ { "integer": [1, 2, 3], "date": [ - datetime(2025, 1, 1), - datetime(2025, 1, 2), - datetime(2025, 1, 3), + datetime(2022, 1, 1), + datetime(2022, 1, 2), + datetime(2022, 1, 3), ], "float": [4.0, 5.0, 6.0], - "string": ["a", "b", "c"], } ) diff --git 
a/docs/src/rust/user-guide/basics/expressions.rs b/docs/src/rust/user-guide/basics/expressions.rs index 757c52e3939f..ea6cae3c84af 100644 --- a/docs/src/rust/user-guide/basics/expressions.rs +++ b/docs/src/rust/user-guide/basics/expressions.rs @@ -6,16 +6,19 @@ fn main() -> Result<(), Box> { let mut rng = rand::thread_rng(); let df: DataFrame = df!( - "a" => 0..5, - "b"=> (0..5).map(|_| rng.gen::()).collect::>(), + "a" => 0..8, + "b"=> (0..8).map(|_| rng.gen::()).collect::>(), "c"=> [ - NaiveDate::from_ymd_opt(2025, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2025, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2025, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2025, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2025, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 5).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 6).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 7).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 12, 8).unwrap().and_hms_opt(0, 0, 0).unwrap(), ], - "d"=> [Some(1.0), Some(2.0), None, Some(-42.), None] + "d"=> [Some(1.0), Some(2.0), None, None, Some(0.0), Some(-5.0), Some(-42.), None] ) .unwrap(); @@ -43,17 +46,17 @@ fn main() -> Result<(), Box> { let out = df .clone() .lazy() - .select([col("*").exclude(["a", "c"])]) + .select([col("*").exclude(["a"])]) .collect()?; println!("{}", out); // --8<-- [end:exclude] // --8<-- [start:filter] - let start_date = NaiveDate::from_ymd_opt(2025, 12, 2) + let start_date = 
NaiveDate::from_ymd_opt(2022, 12, 2) .unwrap() .and_hms_opt(0, 0, 0) .unwrap(); - let end_date = NaiveDate::from_ymd_opt(2025, 12, 3) + let end_date = NaiveDate::from_ymd_opt(2022, 12, 8) .unwrap() .and_hms_opt(0, 0, 0) .unwrap(); diff --git a/docs/src/rust/user-guide/basics/reading-writing.rs b/docs/src/rust/user-guide/basics/reading-writing.rs index dad5e8713d24..44c1a335428d 100644 --- a/docs/src/rust/user-guide/basics/reading-writing.rs +++ b/docs/src/rust/user-guide/basics/reading-writing.rs @@ -9,9 +9,9 @@ fn main() -> Result<(), Box> { let mut df: DataFrame = df!( "integer" => &[1, 2, 3], "date" => &[ - NaiveDate::from_ymd_opt(2025, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2025, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), - NaiveDate::from_ymd_opt(2025, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(), + NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(), ], "float" => &[4.0, 5.0, 6.0] ) diff --git a/docs/user-guide/basics/expressions.md b/docs/user-guide/basics/expressions.md new file mode 100644 index 000000000000..0277d3da72f6 --- /dev/null +++ b/docs/user-guide/basics/expressions.md @@ -0,0 +1,130 @@ +# Expressions + +`Expressions` are the core strength of Polars. The `expressions` offer a versatile structure that both solves easy queries and is easily extended to complex ones. Below we will cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: + +- `select` +- `filter` +- `with_columns` +- `group_by` + +To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](../concepts/contexts.md) and [Expressions](../concepts/expressions.md). + +### Select statement + +To select a column we need to do two things. 
Define the `DataFrame` we want the data from. And second, select the data that we need. In the example below you see that we select `col('*')`. The asterisk stands for all columns. + +{{code_block('user-guide/basics/expressions','select',['select'])}} + +```python exec="on" result="text" session="getting-started/expressions" +--8<-- "python/user-guide/basics/expressions.py:setup" +print( + --8<-- "python/user-guide/basics/expressions.py:select" +) +``` + +You can also specify the specific columns that you want to return. There are two ways to do this. The first option is to pass the column names, as seen below. + +{{code_block('user-guide/basics/expressions','select2',['select'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:select2" +) +``` + +The second option is to specify each column using `pl.col`. This option is shown below. + +{{code_block('user-guide/basics/expressions','select3',['select'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:select3" +) +``` + +If you want to exclude an entire column from your view, you can simply use `exclude` in your `select` statement. + +{{code_block('user-guide/basics/expressions','exclude',['select'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:exclude" +) +``` + +### Filter + +The `filter` option allows us to create a subset of the `DataFrame`. We use the same `DataFrame` as earlier and we filter between two specified dates. + +{{code_block('user-guide/basics/expressions','filter',['filter'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:filter" +) +``` + +With `filter` you can also create more complex filters that include multiple columns. 
+ +{{code_block('user-guide/basics/expressions','filter2',['filter'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:filter2" +) +``` + +### With_columns + +`with_columns` allows you to create new columns for your analyses. We create two new columns `e` and `b+42`. First we sum all values from column `b` and store the results in column `e`. After that we add `42` to the values of `b`. Creating a new column `b+42` to store these results. + +{{code_block('user-guide/basics/expressions','with_columns',['with_columns'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:with_columns" +) +``` + +### Group by + +We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by. + +{{code_block('user-guide/basics/expressions','dataframe2',['DataFrame'])}} + +```python exec="on" result="text" session="getting-started/expressions" +--8<-- "python/user-guide/basics/expressions.py:dataframe2" +print(df2) +``` + +{{code_block('user-guide/basics/expressions','group_by',['group_by'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:group_by" +) +``` + +{{code_block('user-guide/basics/expressions','group_by2',['group_by'])}} + +```python exec="on" result="text" session="getting-started/expressions" +print( + --8<-- "python/user-guide/basics/expressions.py:group_by2" +) +``` + +### Combining operations + +Below are some examples on how to combine operations to create the `DataFrame` you require. 
+ +{{code_block('user-guide/basics/expressions','combine',['select','with_columns'])}} + +```python exec="on" result="text" session="getting-started/expressions" +--8<-- "python/user-guide/basics/expressions.py:combine" +``` + +{{code_block('user-guide/basics/expressions','combine2',['select','with_columns'])}} + +```python exec="on" result="text" session="getting-started/expressions" +--8<-- "python/user-guide/basics/expressions.py:combine2" +``` diff --git a/docs/user-guide/basics/index.md b/docs/user-guide/basics/index.md new file mode 100644 index 000000000000..af73c7967574 --- /dev/null +++ b/docs/user-guide/basics/index.md @@ -0,0 +1,18 @@ +# Introduction + +This chapter is intended for new Polars users. +The goal is to provide a quick overview of the most common functionality. +Feel free to skip ahead to the [next chapter](../concepts/data-types/overview.md) to dive into the details. + +!!! rust "Rust Users Only" + + Due to historical reasons, the eager API in Rust is outdated. In the future, we would like to redesign it as a small wrapper around the lazy API (as is the design in Python / NodeJS). In the examples, we will use the lazy API instead with `.lazy()` and `.collect()`. For now you can ignore these two functions. If you want to know more about the lazy and eager API, go [here](../concepts/lazy-vs-eager.md). + + To enable the Lazy API ensure you have the feature flag `lazy` configured when installing Polars + ``` + # Cargo.toml + [dependencies] + polars = { version = "x", features = ["lazy", ...]} + ``` + + Because of the ownership ruling in Rust, we can not reuse the same `DataFrame` multiple times in the examples. For simplicity reasons we call `clone()` to overcome this issue. Note that this does not duplicate the data but just increments a pointer (`Arc`). 
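The "Rust Users Only" note above says that calling `clone()` on a `DataFrame` does not duplicate the data but only increments an `Arc` pointer. A minimal sketch of that idea, using only the standard library, is shown here. `SharedColumn` and `clone_report` are hypothetical names made up for this illustration and are not Polars types or functions; the sketch assumes only that the underlying buffer is held behind an `Arc`, as the note describes.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a column whose buffer is shared through an `Arc`.
/// This is not Polars' real type; it only mirrors the "data behind an Arc" idea.
#[derive(Clone)]
struct SharedColumn {
    values: Arc<Vec<f64>>,
}

/// Clones the column and reports:
/// (reference count before the clone, count while the clone is alive,
///  whether both handles point at the same buffer).
fn clone_report(col: &SharedColumn) -> (usize, usize, bool) {
    let before = Arc::strong_count(&col.values);
    // The derived `Clone` calls `Arc::clone`, which bumps the reference count
    // and leaves the `Vec<f64>` itself untouched.
    let copy = col.clone();
    let after = Arc::strong_count(&col.values);
    let same_buffer = Arc::ptr_eq(&col.values, &copy.values);
    (before, after, same_buffer)
}
```

Calling `clone_report` on a freshly built `SharedColumn` should show the count rising by one while the clone is alive, with both handles pointing at the same allocation. That is the sense in which such a clone is cheap.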
diff --git a/docs/user-guide/basics/joins.md b/docs/user-guide/basics/joins.md new file mode 100644 index 000000000000..21cb927164a9 --- /dev/null +++ b/docs/user-guide/basics/joins.md @@ -0,0 +1,26 @@ +# Combining DataFrames + +There are two ways `DataFrame`s can be combined depending on the use case: join and concat. + +## Join + +Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look on how to `join` two `DataFrames` into a single `DataFrame`. Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example. + +{{code_block('user-guide/basics/joins','join',['join'])}} + +```python exec="on" result="text" session="getting-started/joins" +--8<-- "python/user-guide/basics/joins.py:setup" +--8<-- "python/user-guide/basics/joins.py:join" +``` + +To see more examples with other types of joins, go the [User Guide](../transformations/joins.md). + +## Concat + +We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of an horizontal concatenation of our two `DataFrames`. + +{{code_block('user-guide/basics/joins','hstack',['hstack'])}} + +```python exec="on" result="text" session="getting-started/joins" +--8<-- "python/user-guide/basics/joins.py:hstack" +``` diff --git a/docs/user-guide/basics/reading-writing.md b/docs/user-guide/basics/reading-writing.md new file mode 100644 index 000000000000..8999f601e823 --- /dev/null +++ b/docs/user-guide/basics/reading-writing.md @@ -0,0 +1,45 @@ +# Reading & writing + +Polars supports reading and writing to all common files (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). In the following examples we will show how to operate on most common file formats. 
For the following dataframe + +{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:dataframe" +``` + +#### CSV + +Polars has its own fast implementation for csv reading with many flexible configuration options. + +{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:csv" +``` + +As we can see above, Polars made the datetimes a `string`. We can tell Polars to parse dates, when reading the csv, to ensure the date becomes a datetime. The example can be found below: + +{{code_block('user-guide/basics/reading-writing','csv2',['read_csv'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:csv2" +``` + +#### JSON + +{{code_block('user-guide/basics/reading-writing','json',['read_json','write_json'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:json" +``` + +#### Parquet + +{{code_block('user-guide/basics/reading-writing','parquet',['read_parquet','write_parquet'])}} + +```python exec="on" result="text" session="getting-started/reading" +--8<-- "python/user-guide/basics/reading-writing.py:parquet" +``` + +To see more examples and other data formats go to the [User Guide](../io/csv.md), section IO. diff --git a/docs/user-guide/concepts/index.md b/docs/user-guide/concepts/index.md deleted file mode 100644 index 63a2ebeabe44..000000000000 --- a/docs/user-guide/concepts/index.md +++ /dev/null @@ -1,11 +0,0 @@ -# Concepts - -The `Concepts` chapter describes the core concepts of the Polars API. Understanding these will help you optimise your queries on a daily basis. 
We will cover the following topics: - -- [Data Types: Overview](data-types/overview.md) -- [Data Types: Categoricals](data-types/categoricals.md) -- [Data structures](data-structures.md) -- [Contexts](contexts.md) -- [Expressions](expressions.md) -- [Lazy vs eager](lazy-vs-eager.md) -- [Streaming](streaming.md) diff --git a/docs/user-guide/expressions/index.md b/docs/user-guide/expressions/index.md deleted file mode 100644 index 3724e09ce15e..000000000000 --- a/docs/user-guide/expressions/index.md +++ /dev/null @@ -1,18 +0,0 @@ -# Expressions - -In the `Contexts` sections we outlined what `Expressions` are and how they are invaluable. In this section we will focus on the `Expressions` themselves. Each section gives an overview of what they do and provide additional examples. - -- [Operators](operators.md) -- [Column selections](column-selections.md) -- [Functions](functions.md) -- [Casting](casting.md) -- [Strings](strings.md) -- [Aggregation](aggregation.md) -- [Null](null.md) -- [Window](window.md) -- [Folds](folds.md) -- [Lists](lists.md) -- [Plugins](plugins.md) -- [User-defined functions](user-defined-functions.md) -- [Structs](structs.md) -- [Numpy](numpy.md) diff --git a/docs/user-guide/getting-started.md b/docs/user-guide/getting-started.md deleted file mode 100644 index 3ae743114cf8..000000000000 --- a/docs/user-guide/getting-started.md +++ /dev/null @@ -1,186 +0,0 @@ -# Getting started - -This chapter is here to help you get started with Polars. It covers all the fundamental features and functionalities of the library, making it easy for new users to familiarise themselves with the basics from initial installation and setup to core functionalities. If you're already an advanced user or familiar with Dataframes, feel free to skip ahead to the [next chapter about installation options](installation.md). 
- -## Installing Polars - -=== ":fontawesome-brands-python: Python" - - ``` bash - pip install polars - ``` - -=== ":fontawesome-brands-rust: Rust" - - ``` shell - cargo add polars -F lazy - - # Or Cargo.toml - [dependencies] - polars = { version = "x", features = ["lazy", ...]} - ``` - -## Reading & writing - -Polars supports reading and writing for common file formats (e.g. csv, json, parquet), cloud storage (S3, Azure Blob, BigQuery) and databases (e.g. postgres, mysql). Below we show the concept of reading and writing to disk. - -{{code_block('user-guide/basics/reading-writing','dataframe',['DataFrame'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:dataframe" -``` - -In the example below we write the DataFrame to a csv file called `output.csv`. After thatread it back with `read_csv` and `print` the result for inspection. - -{{code_block('user-guide/basics/reading-writing','csv',['read_csv','write_csv'])}} - -```python exec="on" result="text" session="getting-started/reading" ---8<-- "python/user-guide/basics/reading-writing.py:csv" -``` - -For more examples on the CSV file format and other data formats, start with the [IO section](io/index.md) of the User Guide. - -## Expressions - -`Expressions` are the core strength of Polars. The `expressions` offer a modular structure that allows you to combine simple concepts into complex queries. Below we cover the basic components that serve as building block (or in Polars terminology contexts) for all your queries: - -- `select` -- `filter` -- `with_columns` -- `group_by` - -To learn more about expressions and the context in which they operate, see the User Guide sections: [Contexts](concepts/contexts.md) and [Expressions](concepts/expressions.md). - -### Select - -To select a column we need to do two things: - -1. Define the `DataFrame` we want the data from. -2. Select the data that we need. 
- -In the example below you see that we select `col('*')`. The asterisk stands for all columns. - -{{code_block('user-guide/basics/expressions','select',['select'])}} - -```python exec="on" result="text" session="getting-started/expressions" ---8<-- "python/user-guide/basics/expressions.py:setup" -print( - --8<-- "python/user-guide/basics/expressions.py:select" -) -``` - -You can also specify the specific columns that you want to return. There are two ways to do this. The first option is to pass the column names, as seen below. - -{{code_block('user-guide/basics/expressions','select2',['select'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:select2" -) -``` - -Follow these links to other parts of the User guide to learn more about [basic operations](expressions/operators.md) or [column selections](expressions/column-selections.md). - -### Filter - -The `filter` option allows us to create a subset of the `DataFrame`. We use the same `DataFrame` as earlier and we filter between two specified dates. - -{{code_block('user-guide/basics/expressions','filter',['filter'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:filter" -) -``` - -With `filter` you can also create more complex filters that include multiple columns. - -{{code_block('user-guide/basics/expressions','filter2',['filter'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:filter2" -) -``` - -### Add columns - -`with_columns` allows you to create new columns for your analyses. We create two new columns `e` and `b+42`. First we sum all values from column `b` and store the results in column `e`. After that we add `42` to the values of `b`. Creating a new column `b+42` to store these results. 
- -{{code_block('user-guide/basics/expressions','with_columns',['with_columns'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:with_columns" -) -``` - -### Group_by - -We will create a new `DataFrame` for the Group by functionality. This new `DataFrame` will include several 'groups' that we want to group by. - -{{code_block('user-guide/basics/expressions','dataframe2',['DataFrame'])}} - -```python exec="on" result="text" session="getting-started/expressions" ---8<-- "python/user-guide/basics/expressions.py:dataframe2" -print(df2) -``` - -{{code_block('user-guide/basics/expressions','group_by',['group_by'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:group_by" -) -``` - -{{code_block('user-guide/basics/expressions','group_by2',['group_by'])}} - -```python exec="on" result="text" session="getting-started/expressions" -print( - --8<-- "python/user-guide/basics/expressions.py:group_by2" -) -``` - -### Combination - -Below are some examples on how to combine operations to create the `DataFrame` you require. - -{{code_block('user-guide/basics/expressions','combine',['select','with_columns'])}} - -```python exec="on" result="text" session="getting-started/expressions" ---8<-- "python/user-guide/basics/expressions.py:combine" -``` - -{{code_block('user-guide/basics/expressions','combine2',['select','with_columns'])}} - -```python exec="on" result="text" session="getting-started/expressions" ---8<-- "python/user-guide/basics/expressions.py:combine2" -``` - -## Combining DataFrames - -There are two ways `DataFrame`s can be combined depending on the use case: join and concat. - -### Join - -Polars supports all types of join (e.g. left, right, inner, outer). Let's have a closer look on how to `join` two `DataFrames` into a single `DataFrame`. 
Our two `DataFrames` both have an 'id'-like column: `a` and `x`. We can use those columns to `join` the `DataFrames` in this example. - -{{code_block('user-guide/basics/joins','join',['join'])}} - -```python exec="on" result="text" session="getting-started/joins" ---8<-- "python/user-guide/basics/joins.py:setup" ---8<-- "python/user-guide/basics/joins.py:join" -``` - -To see more examples with other types of joins, see the [Transformations section](transformations/joins.md) in the user guide. - -### Concat - -We can also `concatenate` two `DataFrames`. Vertical concatenation will make the `DataFrame` longer. Horizontal concatenation will make the `DataFrame` wider. Below you can see the result of an horizontal concatenation of our two `DataFrames`. - -{{code_block('user-guide/basics/joins','hstack',['hstack'])}} - -```python exec="on" result="text" session="getting-started/joins" ---8<-- "python/user-guide/basics/joins.py:hstack" -``` diff --git a/docs/user-guide/index.md b/docs/user-guide/index.md new file mode 100644 index 000000000000..442029472d80 --- /dev/null +++ b/docs/user-guide/index.md @@ -0,0 +1,39 @@ +# Introduction + +This user guide is an introduction to the [Polars DataFrame library](https://github.com/pola-rs/polars). +Its goal is to introduce you to Polars by going through examples and comparing it to other solutions. +Some design choices are introduced here. The guide will also introduce you to optimal usage of Polars. + +The Polars user guide is intended to live alongside the API documentation ([Python](https://docs.pola.rs/py-polars/html/reference/index.html) / [Rust](https://docs.rs/polars/latest/polars/)), which offers detailed descriptions of specific objects and functions. + +Even though Polars is completely written in [Rust](https://www.rust-lang.org/) (no runtime overhead!) 
and uses [Arrow](https://arrow.apache.org/) -- the [native arrow2 Rust implementation](https://github.com/jorgecarleitao/arrow2) -- as its foundation, the examples presented in this guide will be mostly using its higher-level language bindings. +Higher-level bindings only serve as a thin wrapper for functionality implemented in the core library. + +For [pandas](https://pandas.pydata.org/) users, our [Python package](https://pypi.org/project/polars/) will offer the easiest way to get started with Polars. + +### Philosophy + +The goal of Polars is to provide a lightning fast `DataFrame` library that: + +- Utilizes all available cores on your machine. +- Optimizes queries to reduce unneeded work/memory allocations. +- Handles datasets much larger than your available RAM. +- Has an API that is consistent and predictable. +- Has a strict schema (data-types should be known before running the query). + +Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts +in a query engine. + +As such Polars goes to great lengths to: + +- Reduce redundant copies. +- Traverse memory cache efficiently. +- Minimize contention in parallelism. +- Process data in chunks. +- Reuse memory allocations. + +!!! rust "Note" + + The Rust examples in this guide are synchronized with the main branch of the Polars repository, rather than the latest Rust release. + You may not be able to copy-paste code examples and use them with the latest release. + We aim to solve this in the future. diff --git a/docs/user-guide/io/index.md b/docs/user-guide/io/index.md deleted file mode 100644 index 5a3548871e8a..000000000000 --- a/docs/user-guide/io/index.md +++ /dev/null @@ -1,12 +0,0 @@ -# IO - -Reading and writing your data is crucial for a DataFrame library. In this chapter you will learn more on how to read and write to different file formats that are supported by Polars. 
- -- [CSV](csv.md) -- [Excel](excel.md) -- [Parquet](parquet.md) -- [Json](json.md) -- [Multiple](multiple.md) -- [Database](database.md) -- [Cloud storage](cloud-storage.md) -- [Google Big Query](bigquery.md) diff --git a/docs/user-guide/lazy/index.md b/docs/user-guide/lazy/index.md deleted file mode 100644 index be731390f09c..000000000000 --- a/docs/user-guide/lazy/index.md +++ /dev/null @@ -1,10 +0,0 @@ -# Lazy - -The Lazy chapter is a guide for working with `LazyFrames`. It covers the functionalities like how to use it and how to optimise it. You can also find more information about the query plan or gain more insight in the streaming capabilities. - -- [Using lazy API](using.md) -- [Optimisations](optimizations.md) -- [Schemas](schemas.md) -- [Query plan](query-plan.md) -- [Execution](execution.md) -- [Streaming](streaming.md) diff --git a/docs/user-guide/overview.md b/docs/user-guide/overview.md deleted file mode 100644 index f76eb0e1a1d3..000000000000 --- a/docs/user-guide/overview.md +++ /dev/null @@ -1,69 +0,0 @@ -# Overview - -![logo](https://raw.githubusercontent.com/pola-rs/polars-static/master/logos/polars_github_logo_rect_dark_name.svg) - -

Blazingly Fast DataFrame Library

-
-
-Polars is a blazingly fast DataFrame library for manipulating structured data. The core is written in Rust, and available for Python, R and NodeJS.
-
-## Key features
-
-- **Fast**: Written from scratch in Rust, designed close to the machine and without external dependencies.
-- **I/O**: First class support for all common data storage layers: local, cloud storage & databases.
-- **Intuitive API**: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
-- **Out of Core**: The streaming API allows you to process your results without requiring all your data to be in memory at the same time
-- **Parallel**: Utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
-- **Vectorized Query Engine**: Using [Apache Arrow](https://arrow.apache.org/), a columnar data format, to process your queries in a vectorized manner and SIMD to optimize CPU usage.
-
-
-
-!!! info "Users new to DataFrames"
-    A DataFrame is a 2-dimensional data structure that is useful for data manipulation and analysis. With labeled axes for rows and columns, each column can contain different data types, making complex data operations such as merging and aggregation much easier. Due to their flexibility and intuitive way of storing and working with data, DataFrames have become increasingly popular in modern data analytics and engineering.
-
-
-
-## Philosophy
-
-The goal of Polars is to provide a lightning fast DataFrame library that:
-
-- Utilizes all available cores on your machine.
-- Optimizes queries to reduce unneeded work/memory allocations.
-- Handles datasets much larger than your available RAM.
-- A consistent and predictable API.
-- Adheres to a strict schema (data-types should be known before running the query).
-
-Polars is written in Rust which gives it C/C++ performance and allows it to fully control performance critical parts in a query engine.
-
-## Example
-
-{{code_block('home/example','example',['scan_csv','filter','group_by','collect'])}}
-
-A more extensive introduction can be found in the [next chapter](getting-started.md).
-
-## Community
-
-Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:
-
---8<-- "docs/people.md"
-
-## Contributing
-
-We appreciate all contributions, from reporting bugs to implementing new features. Read our [contributing guide](../development/contributing/index.md) to learn more.
-
-## License
-
-This project is licensed under the terms of the [MIT license](https://github.com/pola-rs/polars/blob/main/LICENSE).
diff --git a/docs/user-guide/transformations/index.md b/docs/user-guide/transformations/index.md
deleted file mode 100644
index cd673786643c..000000000000
--- a/docs/user-guide/transformations/index.md
+++ /dev/null
@@ -1,8 +0,0 @@
-# Transformations
-
-The focus of this section is to describe different types of data transformations and provide some examples on how to use them.
-
-- [Joins](joins.md)
-- [Concatenation](concatenation.md)
-- [Pivot](pivot.md)
-- [Melt](melt.md)
diff --git a/mkdocs.yml b/mkdocs.yml
index e5d6c26769b6..9918d5c2e8f3 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -2,18 +2,23 @@
 
 # Project information
 site_name: Polars
-site_url: https://docs.pola.rs/
+site_url: https://docs.pola.rs
 repo_url: https://github.com/pola-rs/polars
 repo_name: pola-rs/polars
 
 # Documentation layout
 nav:
+  - Home: index.md
+
   - User guide:
-    - user-guide/overview.md
-    - user-guide/getting-started.md
+    - user-guide/index.md
     - user-guide/installation.md
+    - Basics:
+      - user-guide/basics/index.md
+      - user-guide/basics/reading-writing.md
+      - user-guide/basics/expressions.md
+      - user-guide/basics/joins.md
     - Concepts:
-      - user-guide/concepts/index.md
       - Data types:
         - user-guide/concepts/data-types/overview.md
         - user-guide/concepts/data-types/categoricals.md
@@ -23,7 +28,6 @@ nav:
       - user-guide/concepts/lazy-vs-eager.md
       - user-guide/concepts/streaming.md
     - Expressions:
-      - user-guide/expressions/index.md
       - user-guide/expressions/operators.md
       - user-guide/expressions/column-selections.md
       - user-guide/expressions/functions.md
@@ -39,7 +43,6 @@ nav:
       - user-guide/expressions/structs.md
       - user-guide/expressions/numpy.md
     - Transformations:
-      - user-guide/transformations/index.md
       - user-guide/transformations/joins.md
       - user-guide/transformations/concatenation.md
       - user-guide/transformations/pivot.md
@@ -51,7 +54,6 @@ nav:
         - user-guide/transformations/time-series/resampling.md
         - user-guide/transformations/time-series/timezones.md
     - Lazy API:
-      - user-guide/lazy/index.md
       - user-guide/lazy/using.md
       - user-guide/lazy/optimizations.md
       - user-guide/lazy/schemas.md
@@ -59,7 +61,6 @@ nav:
       - user-guide/lazy/execution.md
       - user-guide/lazy/streaming.md
     - IO:
-      - user-guide/io/index.md
       - user-guide/io/csv.md
       - user-guide/io/excel.md
       - user-guide/io/parquet.md
@@ -133,7 +134,6 @@ theme:
     - navigation.tabs
     - navigation.tabs.sticky
     - navigation.footer
-    - navigation.indexes
     - content.tabs.link
   icon:
     repo: fontawesome/brands/github
@@ -144,7 +144,6 @@ extra:
   analytics:
     provider: plausible
     domain: guide.pola.rs,combined.pola.rs
-  homepage: /user-guide/overview/
 
 # Preview controls
 strict: true