Notice! This page is still in progress!

What’s the best strategy to win in PUBG?

Objective:

Predict each player's finish placement (winPlacePerc) from their in-game statistics.

Data:

65,000 games’ worth of anonymized player data

Directory Structure

.                         # root folder
├── data                  # folder containing the data sets
│   ├── train.csv         # training set
│   └── test.csv          # test set
├── pubg.ipynb            # main file
└── data_utils.py         # contains some utility functions

Let's get it started

PUBG Start

%matplotlib inline

import pandas as pd
import numpy as np

from data_utils import df_info

We will start by importing the required packages and loading the training set. data_utils is a file in which I collected some useful utility functions in advance, especially for working with tabular data. You can view the code of the file here.

Tip! The pd.read_csv() parameter nrows limits the number of rows that are read to the given number. This can be practical for performance reasons when you don't need the whole dataset, for example for a quick data analysis or for testing functions. With nrows=None (the default), the whole file is loaded.
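A minimal sketch of the tip, assuming a small made-up CSV held in memory (via io.StringIO) instead of data/train.csv; the column names mirror the data set, but the values are invented for illustration:

```python
import io

import pandas as pd

# tiny in-memory CSV standing in for data/train.csv (made-up values)
csv_text = """Id,kills,winPlacePerc
0,2,0.8571
1,1,0.04
2,0,0.25
3,5,1.0
"""

# nrows=2 reads only the first two data rows (the header line is not counted)
sample = pd.read_csv(io.StringIO(csv_text), nrows=2)

# nrows=None (the default) reads the whole file
full = pd.read_csv(io.StringIO(csv_text), nrows=None)

print(len(sample), len(full))  # 2 4
```

On the real file, only the path changes, e.g. pd.read_csv("data/train.csv", nrows=10000) for a quick look at the first 10,000 rows.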

train = pd.read_csv("data/train.csv", nrows=None)

In the following, I will use two helper functions I prepared in advance, get_cats() and df_info(); their code is listed below.

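import numpy as np
import pandas as pd
from IPython.display import display  # display() is only predefined inside Jupyter/IPython


def get_cats(df):
    # number of distinct values for each string (object) column, NaN for all other columns
    cats = []
    for col in df:
        if pd.api.types.is_string_dtype(df[col]):
            cats.append(len(df[col].unique()))
        else:
            cats.append(np.nan)
    return pd.DataFrame(cats, index=df.columns, columns=['categories'])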


def df_info(df: pd.DataFrame, show_rows: int=2, horizontal: bool=True, percentiles: list=[.25, .5, .75], selected_cols: list=None, includes: str=None):
    df = df.copy()

    # in case you just want information on certain columns, specify those columns in selected_cols
    # (applied before the type filter, so a selected column is never dropped before it is looked up)
    if selected_cols:
        df = df[selected_cols]

    # keep only string columns ('objects') or only numeric columns ('numeric'); any other value keeps all columns
    if includes == 'objects':
        df = df[[col for col in df if pd.api.types.is_string_dtype(df[col])]]
    elif includes == 'numeric':
        df = df[[col for col in df if pd.api.types.is_numeric_dtype(df[col])]]
    if df.empty:
        raise ValueError(f'No "{includes}" type columns!')

    # data types
    types = pd.DataFrame(df.dtypes, columns=["dtype"])

    # description of the dataframe (mean, median, std, min-max values and the requested percentiles)
    descr = df.describe(percentiles=percentiles).drop(["count"])

    # count missing values
    nans = pd.DataFrame(df.isnull().sum(), columns=["missings"])

    # count how many categories are contained in each string type column
    cats = get_cats(df)

    # show the first few rows depending on how many you want to show
    head = df.head(show_rows)

    # display it either horizontally (one row per feature) or vertically (one column per feature)
    if horizontal:
        info_df = pd.concat([types, nans, cats, descr.T, head.T], axis=1)
    else:
        info_df = pd.concat([types.T, nans.T, cats.T, descr, head], axis=0)

    # show all rows and columns, no matter how large the dataframe is
    with pd.option_context("display.max_rows", None, "display.max_columns", None):
        display(info_df)
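The "categories" row produced by get_cats() can also be computed with pandas built-ins. A minimal sketch on a made-up frame; the column name "category" is hypothetical and does not appear in the PUBG data. It also shows one subtlety: len(unique()) counts a missing value as its own category (which is what get_cats() counts for string columns), while nunique() skips missing values by default.

```python
import pandas as pd

# made-up frame: one string column (dtype=object, with a missing value) and one numeric column
df = pd.DataFrame({
    "category": pd.Series(["solo", "duo", "solo", None], dtype=object),
    "kills": [2, 1, 0, 3],
})

# keep only the object (string) columns
obj_cols = df.select_dtypes(include="object")

# len(unique()) treats the missing value as a separate category
cats_unique = obj_cols.apply(lambda s: len(s.unique()))

# nunique() drops NaN/None by default, so it can be one lower
cats_nunique = obj_cols.nunique()

print(cats_unique["category"], cats_nunique["category"])  # 3 2
```

For this data set the difference does not matter, since the df_info() output shows no missing values in any column.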

df_info(train, horizontal=False, percentiles=[.5], includes="all")

	Id	groupId	matchId	assists	boosts	damageDealt	DBNOs	headshotKills	heals	killPlace	killPoints	kills	killStreaks	longestKill	maxPlace	numGroups	revives	rideDistance	roadKills	swimDistance	teamKills	vehicleDestroys	walkDistance	weaponsAcquired	winPoints	winPlacePerc
dtype	int64	int64	int64	int64	int64	float64	int64	int64	int64	int64	int64	int64	int64	float64	int64	int64	int64	float64	int64	float64	int64	int64	float64	int64	int64	float64
missings	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
categories	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
mean	499.5	1.75983e+06	499.5	0.371	1.134	174.757	0.964	0.341	1.391	42.427	1116.12	1.322	0.699	25.3916	41.064	39.613	0.199	387.059	0.003	3.97792	0.014	0.006	1087.49	3.717	1506.99	0.486571
std	288.819	877624	288.819	0.817329	1.70908	230.343	1.7337	0.809552	2.77954	28.4318	150.125	2.17372	0.806877	51.0182	23.7698	23.1409	0.547447	1095.86	0.0948683	21.0137	0.133498	0.0772656	1142.88	2.97755	39.9435	0.316501
min	0	24	0	0	0	0	0	0	0	1	908	0	0	0	3	3	0	0	0	0	0	0	0	0	1349	0
50%	499.5	1.98371e+06	499.5	0	0	100	0	0	0	38	1057.5	1	1	1.406	29	28	0	0	0	0	0	0	581.55	3	1500	0.4792
max	999	2.7006e+06	999	7	10	2285	22	8	29	98	1792	26	4	415.4	100	99	5	8197	3	251.8	2	1	5176	37	1744	1
0	0	24	0	0	5	247.3	2	0	4	17	1050	2	1	65.32	29	28	1	591.3	0	0	0	0	782.4	4	1458	0.8571
1	1	440875	1	1	0	37.65	1	1	0	45	1072	1	1	13.55	26	23	0	0	0	0	0	0	119.6	3	1511	0.04

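def trn_val_split(df, size=0.8):
    # shuffle the row positions and use the first `size` fraction for training, the rest for validation
    idxs = np.random.permutation(len(df))
    n_trn = int(len(df) * size)
    trn, val = df.iloc[idxs[:n_trn]], df.iloc[idxs[n_trn:]]
    return trn, val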
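A random row split can put players from the same match into both the training and the validation set. If that leakage matters, one option (an assumption on my part, not part of the original notebook) is to split by matchId instead, so that every match lands entirely on one side. A minimal sketch with a hypothetical helper trn_val_split_by_match() and made-up toy data:

```python
import numpy as np
import pandas as pd


def trn_val_split_by_match(df, size=0.8, seed=42):
    """Hypothetical helper: all rows of one match end up on the same side of the split."""
    rng = np.random.default_rng(seed)
    # shuffle the distinct match ids and keep the first `size` fraction of matches for training
    matches = rng.permutation(df["matchId"].unique())
    trn_matches = matches[: int(len(matches) * size)]
    in_trn = df["matchId"].isin(trn_matches)
    return df[in_trn], df[~in_trn]


# toy data (made-up values): 10 matches with 3 players each
toy = pd.DataFrame({
    "matchId": np.repeat(np.arange(10), 3),
    "kills": np.arange(30) % 4,
})
trn, val = trn_val_split_by_match(toy, size=0.8)
print(len(trn), len(val))  # 24 6
```

Note that size now refers to the fraction of matches rather than rows; with matches of different sizes the row fractions will only be approximately 80/20.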

PUBG Finish Placement
