7. Accessing rows and columns

Oleksandr Zaytsev edited this page Jan 10, 2018 · 1 revision

Rows and columns of a data frame can be accessed either by their names or by their numeric indexes. For example, you can access row 'C' and column 'Population' of the data frame created in the previous sections by writing:

df row: 'C'.
df column: 'Population'.

Alternatively, you can use numeric indexes. Here is how to ask a data frame for its third row or its second column:

df rowAt: 3.
df columnAt: 2.
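The snippet below rebuilds a three-city data frame and reads the same row and column both by name and by position, so you can run it on its own. It is a sketch that assumes the fromRows:, rowNames: and columnNames: messages used to create the data frame in the earlier sections. Adjust the constructor if your version of DataFrame names it differently. Row names are given as strings here so that they match the string 'B' used for lookup.

```smalltalk
| df byName byIndex |
"Assumed setup: the sample data frame from the earlier sections,
 built with fromRows:, rowNames: and columnNames:."
df := DataFrame fromRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df rowNames: #('A' 'B' 'C').
df columnNames: #(City Population BeenThere).

"Row 'B' is the second row, so both expressions answer the Dubai row."
byName := df row: 'B'.
byIndex := df rowAt: 2.
Transcript show: byName printString; cr.
Transcript show: byIndex printString; cr.

"Column #Population is the second column."
Transcript show: (df column: #Population) printString; cr.
Transcript show: (df columnAt: 2) printString; cr.
```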

An important feature of DataFrame is that when you ask it for a specific row or column, it responds with a DataSeries object that preserves the same indexing. For example, if you extract row 'B' from the data frame, the resulting series keeps the column names as its index, so it still remembers that 'Dubai' is a city with a population of 2.789 million people:

            |      B  
------------+-------
      City  |  Dubai  
Population  |  2.789  
 BeenThere  |   true 

You can access multiple rows or columns at the same time by providing an array of names or indexes, or by specifying a numeric range. DataFrame provides the messages rows:, columns:, rowsAt:, columnsAt:, rowsFrom:to:, and columnsFrom:to: for this purpose:

df columns: #(City BeenThere).
df rowsAt: #(3 1).
df columnsFrom: 2 to: 3.
df rowsFrom: 3 to: 1.

The result is a data frame with the requested rows and columns in the given order. For example, the last line gives you a data frame "flipped upside-down" (with row indexes in descending order).
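As a quick check of the ordering, the sketch below compares a descending range with an explicit list of the same indexes. It assumes the fromRows: constructor used in the earlier sections to build the three-city frame. It also assumes that at:at: takes the row index first.

```smalltalk
| df reversed picked |
"Assumed setup: the three-city frame from the earlier sections."
df := DataFrame fromRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df columnNames: #(City Population BeenThere).

"A descending range answers rows 3, 2, 1 in that order..."
reversed := df rowsFrom: 3 to: 1.
"...which should match listing the same indexes explicitly."
picked := df rowsAt: #(3 2 1).

Transcript show: reversed printString; cr.
Transcript show: picked printString; cr.
```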

You can change the values of a specific row or column by passing an array or series of the same size to one of the messages row:put:, column:put:, rowAt:put:, or columnAt:put:. Be careful, though: these messages modify the data frame in place and overwrite its previous values, so data can be lost.

df column: #BeenThere put: #(false true false).

As mentioned earlier, a single cell of a data frame can be read and written with the at:at: and at:at:put: messages, which take the row index first and the column index second:

df at: 3 at: 2.
df at: 3 at: 2 put: true.
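Because these setters overwrite data in place, it can help to read the affected cells back after changing them. The sketch below assumes the three-city frame from the earlier sections, built with fromRows:. It also assumes that at:at: takes the row index first. The replacement values are only for illustration.

```smalltalk
| df |
"Assumed setup: the three-city frame from the earlier sections."
df := DataFrame fromRows: #(
    ('Barcelona' 1.609 true)
    ('Dubai' 2.789 true)
    ('London' 8.788 false)).
df rowNames: #('A' 'B' 'C').
df columnNames: #(City Population BeenThere).

"Replace a whole column; the array must hold one value per row."
df column: #BeenThere put: #(false true false).

"Overwrite a single cell: row 3 (London), column 3 (BeenThere)."
df at: 3 at: 3 put: true.

"Read the cells back to confirm the changes."
Transcript show: (df at: 1 at: 3) printString; cr.
Transcript show: (df at: 3 at: 3) printString; cr.
```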

Head & tail

When working with larger datasets, it is often useful to look at only the first or the last 5 rows. This can be done with the head and tail messages. To see how they work, let's load the Housing dataset:

df := DataFrame loadHousing.

This dataset has 489 entries. Printing all of these rows just to see what the data looks like is unnecessary, and on larger datasets it can also be time-consuming. To take a quick look at your data, use df head or df tail. Here is what df head returns:

   |     RM  LSTAT  PTRATIO      MDEV  
---+---------------------------------
1  |  6.575   4.98     15.3  504000.0  
2  |  6.421   9.14     17.8  453600.0  
3  |  7.185   4.03     17.8  728700.0  
4  |  6.998   2.94     18.7  701400.0  
5  |  7.147   5.33     18.7  760200.0  

The result is another data frame. The head and tail messages are just shortcuts for df rowsFrom: 1 to: 5 and df rowsFrom: (df numberOfRows - 4) to: df numberOfRows. But what if you want a different number of rows? You can use the parameterized messages head: and tail:, which take the number of rows as an argument:

df head: 10.
df tail: 3.
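The relationship between head/tail and rowsFrom:to: can be checked directly on the Housing data. This is a sketch that assumes rowsFrom:to: includes both ends of the range, so the last five rows start at numberOfRows - 4. It also assumes that head and tail answer five rows, as described above.

```smalltalk
| df n firstFive lastFive |
df := DataFrame loadHousing.
n := df numberOfRows.

"Both ends of the range are included, so rows 1 to 5 are five rows..."
firstFive := df rowsFrom: 1 to: 5.
"...and the last five rows start at n - 4."
lastFive := df rowsFrom: n - 4 to: n.

Transcript show: firstFive printString; cr.
Transcript show: lastFive printString; cr.
```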

You can also look at the head or tail of a specific column, because DataSeries supports all of these messages as well:

(df column: #LSTAT) head: 2.

The result is another series:

   |  LSTAT  
---+-------
1  |   4.98  
2  |   9.14
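The same messages work on any single column. The sketch below takes the first two and the last three values of the MDEV column. It assumes that a DataSeries answers size the way other Pharo collections do.

```smalltalk
| df prices |
df := DataFrame loadHousing.
prices := df column: #MDEV.

"head: and tail: on a DataSeries answer another, shorter DataSeries."
Transcript show: (prices head: 2) printString; cr.
Transcript show: (prices tail: 3) printString; cr.
```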
