Add Myket Android Application Install Dataset
erfanloghmani committed Aug 18, 2023
1 parent a9da280 commit 8a8ab92
Showing 4 changed files with 83 additions and 1 deletion.
4 changes: 4 additions & 0 deletions README.md
@@ -174,6 +174,9 @@ These datasets contain measurements of clothing fit from ModCloth.
* [RentTheRunway](https://cseweb.ucsd.edu/~jmcauley/datasets.html#clothing_fit):
These datasets contain measurements of clothing fit from [RentTheRunway](https://www.renttherunway.com).

### Android Applications
- [Myket](https://github.com/erfanloghmani/myket-android-application-market-dataset):
This dataset contains application install interactions of users in the [Myket](https://myket.ir/) Android application market. It has 694,121 install interactions for 10,000 anonymized users and 7,988 applications. It also includes application features such as the approximate number of installs, average rating, and category.

## Datasets information statistics

Expand Down Expand Up @@ -215,6 +218,7 @@ These datasets contain measurements of clothing fit from [RentTheRunway](https:/
| 32 | RateBeer | 29,265 | 110,369 | 2,924,163 | 99\.9095% | Overall Rating<br/> \[0,20\] || |||
| 33 | RentTheRunway | 105,571 | 5,850 | 192,544 | 99\.9688% | Rating<br/> \[0,10\] |||||
| 34 | [Twitch](https://github.com/RUCAIBox/RecommenderSystems-Datasets/tree/master/dataset_info/Twitch) | 15,524,309 | 6,161,666 | 474,676,929 | 99\.9995% | Click | | | ||
| 35 | [Myket](https://github.com/erfanloghmani/myket-android-application-market-dataset) | 10,000 | 7,988 | 694,121 | 99\.1312% | Install || || |

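The sparsity column appears to follow the usual definition, one minus the ratio of observed interactions to all possible user-item pairs. That definition is an assumption here, since the commit does not state how the figure was obtained. A minimal sketch using the counts from the Myket row:

```python
def sparsity(num_users: int, num_items: int, num_interactions: int) -> float:
    """Fraction of the user-item matrix with no recorded interaction."""
    return 1.0 - num_interactions / (num_users * num_items)


# Counts taken from the Myket row of the table.
myket_sparsity = sparsity(10_000, 7_988, 694_121)
print(f"{myket_sparsity:.4%}")
```

Any small difference from the listed percentage would depend on how interactions were counted, which this commit does not specify.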

### CTR Datasets
37 changes: 37 additions & 0 deletions conversion_tools/src/extended_dataset.py
@@ -1857,6 +1857,43 @@ def load_inter_data(self):
        return pd.read_csv(self.inter_file, delimiter=self.sep, header=None, engine='python')


class MyketDataset(BaseDataset):
    def __init__(self, input_path, output_path):
        super(MyketDataset, self).__init__(input_path, output_path)
        self.dataset_name = 'myket'

        # input path
        self.inter_file = os.path.join(self.input_path, 'myket.csv')
        self.item_file = os.path.join(self.input_path, 'app_info_sample.csv')

        self.sep = ','

        # output path
        self.output_inter_file, self.output_item_file, self.output_user_file = self.get_output_files()

        # selected feature fields
        self.inter_fields = {
            0: 'user_id:token',
            1: 'item_id:token',
            2: 'timestamp:float',
        }

        self.item_fields = {
            0: 'item_id:token',
            1: 'installs:float',
            2: 'rating:float',
            3: 'rating_count:float',
            5: 'category:token_seq',
        }

    def load_inter_data(self):
        return pd.read_csv(self.inter_file, delimiter=self.sep, engine='python', index_col=False)

    def load_item_data(self):
        return pd.read_csv(self.item_file, delimiter=self.sep, engine='python', index_col=False)


class JESTERDataset(BaseDataset):
    def __init__(self, input_path, output_path):
        super(JESTERDataset, self).__init__(input_path, output_path)
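
The `item_fields` mapping pairs a source column index with an atomic-file header of the form `name:type`. The sketch below illustrates one way such a mapping could be applied. It is not the repository's `BaseDataset` code: the `convert_rows` helper, the sample column names and the sample values are hypothetical, and the standard `csv` module stands in for pandas to keep the example dependency-free. Tab-separated output is also an assumption.

```python
import csv
import io

# Index -> "name:type" mapping copied from MyketDataset.item_fields.
ITEM_FIELDS = {
    0: 'item_id:token',
    1: 'installs:float',
    2: 'rating:float',
    3: 'rating_count:float',
    5: 'category:token_seq',
}


def convert_rows(csv_text, fields, sep=','):
    """Keep only the mapped columns and return tab-separated text."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=sep)
    next(reader)  # skip the header row of the source CSV
    columns = sorted(fields)
    lines = ['\t'.join(fields[i] for i in columns)]
    for row in reader:
        lines.append('\t'.join(row[i] for i in columns))
    return '\n'.join(lines) + '\n'


# Hypothetical sample; the real column names and values may differ.
SAMPLE = (
    "app,installs,rating,rating_count,unused,category\n"
    "com.example.a,1000,4.5,120,x,Tools\n"
    "com.example.b,50000,3.9,2400,y,Games\n"
)

atomic_text = convert_rows(SAMPLE, ITEM_FIELDS)
print(atomic_text)
```

Column 4 has no entry in the mapping, so it is dropped from the output.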
3 changes: 2 additions & 1 deletion conversion_tools/src/utils.py
@@ -63,7 +63,8 @@
     'mind_large_dev': 'MINDLargeDevDataset',
     'mind_small_train': 'MINDSmallTrainDataset',
     'mind_small_dev': 'MINDSmallDevDataset',
-    'cosmetics': 'CosmeticsDataset'
+    'cosmetics': 'CosmeticsDataset',
+    'myket': 'MyketDataset',
 }

click_dataset = {
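
The new `'myket': 'MyketDataset'` entry maps the value passed to `--dataset` to a converter class name. The sketch below shows how such a name-to-class table might be used for dispatch. It is an illustration only: the `name_to_class` and `available_classes` dictionaries, the `build_converter` helper and the stand-in classes are hypothetical, and the actual lookup in `run.py` may work differently.

```python
class BaseDataset:
    """Stand-in for the real base class (hypothetical, for illustration)."""

    def __init__(self, input_path, output_path):
        self.input_path = input_path
        self.output_path = output_path


class MyketDataset(BaseDataset):
    """Stand-in for the converter class added in this commit."""


# Same shape as the mapping in utils.py: dataset name -> class name.
name_to_class = {'myket': 'MyketDataset'}

# Classes that can be instantiated, keyed by their class name.
available_classes = {cls.__name__: cls for cls in (MyketDataset,)}


def build_converter(name, input_path, output_path):
    """Resolve a dataset name to its converter class and instantiate it."""
    try:
        class_name = name_to_class[name.lower()]
    except KeyError:
        raise ValueError(f"unknown dataset: {name!r}") from None
    return available_classes[class_name](input_path, output_path)


converter = build_converter('myket', 'myket-data', 'output_data/myket-data')
print(type(converter).__name__)
```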
40 changes: 40 additions & 0 deletions conversion_tools/usage/Myket.md
@@ -0,0 +1,40 @@
# Myket

1. Clone the repository and install requirements.
(If you have already done this, skip to step 2.)

```
git clone https://github.com/RUCAIBox/RecDatasets
cd RecDatasets/conversion_tools
pip install -r requirements.txt
```

2. Download the Myket dataset and move the dataset files.
(If you have already done this, skip to step 3.)

```
wget https://raw.githubusercontent.com/erfanloghmani/myket-android-application-market-dataset/main/myket.csv
wget https://raw.githubusercontent.com/erfanloghmani/myket-android-application-market-dataset/main/app_info_sample.csv
mkdir myket-data
mv myket.csv ./myket-data/
mv app_info_sample.csv ./myket-data/
```
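
Before running the conversion, it can help to confirm that both files ended up in the `myket-data` directory created by `mkdir`. The following is a minimal sketch of such a check. The `missing_inputs` helper is hypothetical and not part of the conversion tools; the file names are the ones used in the download commands.

```python
from pathlib import Path

# File names taken from the download commands in step 2.
EXPECTED_FILES = ('myket.csv', 'app_info_sample.csv')


def missing_inputs(input_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from input_dir."""
    root = Path(input_dir)
    return [name for name in expected if not (root / name).is_file()]


missing = missing_inputs('myket-data')
if missing:
    print('Missing files:', ', '.join(missing))
else:
    print('All input files are in place.')
```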

3. Go to the ``conversion_tools/`` directory
and run the following command to get the atomic files of the Myket dataset.

```
python run.py --dataset myket \
--input_path myket-data --output_path output_data/myket-data \
--convert_inter --convert_item
```

`input_path` is the path of the directory containing the downloaded Myket files

`output_path` is the path to store converted atomic files

`convert_inter`, `convert_item` convert the Myket data to the 'myket.inter' and 'myket.item' atomic files
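
To inspect the converted output, an atomic file can be read as tab-separated text whose header cells have the form `name:type`. The sketch below assumes that layout and uses the field names from `inter_fields`. The `read_atomic` helper and the sample rows are hypothetical and only illustrate the parsing.

```python
import csv
import io


def read_atomic(text):
    """Parse atomic-file text into (schema, rows).

    schema maps each field name to its declared type;
    rows is a list of dicts keyed by field name (values kept as strings).
    """
    reader = csv.reader(io.StringIO(text), delimiter='\t')
    header = next(reader)
    names, types = zip(*(col.split(':', 1) for col in header))
    schema = dict(zip(names, types))
    rows = [dict(zip(names, row)) for row in reader]
    return schema, rows


# Hypothetical contents of myket.inter; real values will differ.
SAMPLE_INTER = (
    "user_id:token\titem_id:token\ttimestamp:float\n"
    "u1\tcom.example.a\t1592000000.0\n"
    "u2\tcom.example.b\t1592000100.0\n"
)

schema, rows = read_atomic(SAMPLE_INTER)
print(schema)
print(rows[0]['item_id'])
```

To read a real file, pass the contents of the generated `myket.inter` to `read_atomic` instead of the sample string.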
