Big Data Analysis Engineer / Task Type 1 Problem Solutions

replace, cumulative sum (cumsum), the method='bfill' parameter, standardization

유방울 2023. 6. 18. 20:08

T1-7

df[df['f4']=='ESFJ']

The .str accessor does not work on a DataFrame. It only exists on a Series, e.g. df['f4'].str

df['f4'] = df['f4'].replace('ESFJ','ISFJ')   # replace() returns a new Series, so assign it back
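A minimal sketch on hypothetical toy data (made-up values, not the exam dataset) showing both points: replace() only sticks if the result is assigned back, and .str is available on a Series of strings but not on a DataFrame.

```python
import pandas as pd

# Hypothetical toy data standing in for the exam's df
df = pd.DataFrame({"f4": ["ESFJ", "INTP", "ESFJ", "ISFJ"]})

# replace() returns a new Series, so assign it back to keep the change
df["f4"] = df["f4"].replace("ESFJ", "ISFJ")
print(df["f4"].tolist())  # ['ISFJ', 'INTP', 'ISFJ', 'ISFJ']

# The .str accessor lives on a Series of strings, not on a DataFrame
print(hasattr(df["f4"], "str"))  # True
print(hasattr(df, "str"))        # False
```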

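If you get "unsupported operand type(s) for &: 'str' and 'str'", check whether you wrote = instead of == in a condition.

-> A stray = also overwrites the column itself, so reload the data and rerun everything from the start.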
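Why this error appears, and why the data has to be reloaded: a sketch on hypothetical toy data. With = instead of ==, Python reads the line as a chained assignment, so con1 becomes a plain string and the whole column gets overwritten.

```python
import pandas as pd

# Hypothetical toy data; column names follow the notes
df = pd.DataFrame({"city": ["경기", "서울"], "f4": ["ISFJ", "ENFP"]})

# Typo: '=' instead of '=='. This is a chained assignment:
# con1 becomes the string '경기' AND every row of df['city'] is overwritten.
con1 = df["city"] = "경기"
con2 = df["f4"] = "ISFJ"

msg = ""
try:
    con1 & con2  # str & str is not defined
except TypeError as err:
    msg = str(err)
print(msg)  # unsupported operand type(s) for &: 'str' and 'str'

print(df["city"].tolist())  # ['경기', '경기'] -> the data itself is now corrupted
```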

con1 = df['city']=='경기'
con2 = df['f4']=='ISFJ'
result = df[con1&con2]['age'].max()
print(int(result))
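The same filter can also be written inline. A sketch on hypothetical toy data: when the conditions are not stored in con1/con2 first, each comparison needs its own parentheses, because & binds more tightly than == in Python.

```python
import pandas as pd

# Hypothetical toy data standing in for the exam's df
df = pd.DataFrame({
    "city": ["경기", "경기", "서울", "경기"],
    "f4": ["ISFJ", "ENFP", "ISFJ", "ISFJ"],
    "age": [31.0, 45.0, 52.0, 27.0],
})

# Inline form: parenthesize each comparison before combining with &
result = df[(df["city"] == "경기") & (df["f4"] == "ISFJ")]["age"].max()
print(int(result))  # 31 (rows 0 and 3 match; ages 31 and 27)
```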

T1-8

df2 = df[df['f2']==1]['f1'].cumsum()
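Why a fillna step follows the cumsum: a sketch on hypothetical toy data. By default cumsum skips missing values, so each NaN position stays NaN in the running total while the sum carries on past it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: f1 has missing values, f2 is a flag
df = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0, 4.0, np.nan],
    "f2": [1, 1, 0, 1, 1],
})

# Keep rows where f2 == 1 (index 0, 1, 3, 4), then take the running sum of f1
df2 = df[df["f2"] == 1]["f1"].cumsum()
print(df2.tolist())  # [1.0, nan, 5.0, nan]
```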

fillna(method='bfill'): fill each gap with the next (following) value

pad: fill each gap with the previous (preceding) value

-> The parameters are hard to memorize, so just use help(df.fillna)!!!

# s.fillna(method='bfill') # fill each gap with the next valid value
# s.fillna(method='pad') # fill each gap with the previous valid value

df2 = df2.fillna(method='bfill')
print(int(df2.mean()))
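bfill and ffill side by side on a small made-up Series. Since pandas 2.1 the method= argument of fillna is deprecated in favor of the dedicated .bfill() and .ffill() methods, which give the same result and also exist in older versions. Note that a trailing NaN has no next value, so bfill leaves it as NaN, and mean() then simply skips it.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 5.0, np.nan])

# bfill / backfill: take the NEXT valid value (same as fillna(method='bfill'))
back = s.bfill()
# ffill / pad: take the PREVIOUS valid value (same as fillna(method='pad'))
fwd = s.ffill()

print(back.tolist())  # [1.0, 5.0, 5.0, nan] <- trailing NaN has no next value
print(fwd.tolist())   # [1.0, 1.0, 5.0, 5.0]
print(back.mean())    # mean() skips the remaining NaN: (1 + 5 + 5) / 3
```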

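fillna is a DataFrame method, not a Python built-in, so Python does not know the bare name; you have to attach df: help(df.fillna)

Built-ins such as sum are already known to Python, so help(sum) works without df. (count is not a built-in but a method, so it also needs a prefix, e.g. help(df.count).)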
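A quick way to check the point above: a built-in name is an attribute of Python's builtins module, while fillna only exists on the pandas DataFrame class. A minimal sketch:

```python
import builtins
import pandas as pd

print(hasattr(builtins, "sum"))         # True  -> help(sum) works on its own
print(hasattr(builtins, "fillna"))      # False -> not a built-in
print(hasattr(pd.DataFrame, "fillna"))  # True  -> reach it via df or pd.DataFrame

# The bare name fails before help() is even called
msg = ""
try:
    help(fillna)
except NameError as err:
    msg = str(err)
print(msg)  # name 'fillna' is not defined
```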

help(df.fillna)

Help on method fillna in module pandas.core.frame:

fillna(value: 'object | ArrayLike | None' = None, method: 'FillnaOptions | None' = None, axis: 'Axis | None' = None, inplace: 'bool' = False, limit=None, downcast=None) -> 'DataFrame | None' method of pandas.core.frame.DataFrame instance
    Fill NA/NaN values using the specified method.
    
    Parameters
    ----------
    value : scalar, dict, Series, or DataFrame
        Value to use to fill holes (e.g. 0), alternately a
        dict/Series/DataFrame of values specifying which value to use for
        each index (for a Series) or column (for a DataFrame).  Values not
        in the dict/Series/DataFrame will not be filled. This value cannot
        be a list.
    method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
        Method to use for filling holes in reindexed Series
        pad / ffill: propagate last valid observation forward to next valid
        backfill / bfill: use next valid observation to fill gap.
    axis : {0 or 'index', 1 or 'columns'}
        Axis along which to fill missing values.
    inplace : bool, default False
        If True, fill in-place. Note: this will modify any
        other views on this object (e.g., a no-copy slice for a column in a
        DataFrame).
    limit : int, default None
        If method is specified, this is the maximum number of consecutive
        NaN values to forward/backward fill. In other words, if there is
        a gap with more than this number of consecutive NaNs, it will only
        be partially filled. If method is not specified, this is the
        maximum number of entries along the entire axis where NaNs will be
        filled. Must be greater than 0 if not None.
    downcast : dict, default is None
        A dict of item->dtype of what to downcast if possible,
        or the string 'infer' which will try to downcast to an appropriate
        equal type (e.g. float64 to int64 if possible).

Standardization

To get a DataFrame instead of a Series, index with double brackets: df[['f5']]

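Not df[df['f5']]!!! -> that uses the values of f5 as column labels and fails with a KeyError. Also note that scaler.fit_transform() returns a NumPy array, not a DataFrame.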

No need to create a separate f5 variable; assign the result straight back into df['f5'].
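Single versus double brackets, and what the scaler returns: a sketch on a hypothetical toy column. scikit-learn scalers expect 2D input, which is why df[['f5']] (a one-column DataFrame) is passed rather than the 1D Series df['f5'].

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy column standing in for the exam's f5
df = pd.DataFrame({"f5": [10.0, 20.0, 30.0, 40.0]})

print(type(df["f5"]).__name__, df["f5"].ndim)      # Series 1
print(type(df[["f5"]]).__name__, df[["f5"]].ndim)  # DataFrame 2

# fit_transform returns a NumPy array of shape (n_rows, 1), not a DataFrame
out = StandardScaler().fit_transform(df[["f5"]])
print(type(out).__name__, out.shape)  # ndarray (4, 1)
```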

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
df['f5'] = scaler.fit_transform(df[['f5']])   # call fit_transform(...) on 2D input
print(df['f5'].median())
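What StandardScaler computes, checked by hand on made-up numbers: z = (x - mean) / std, using the population standard deviation (ddof=0). pandas' .std() defaults to the sample standard deviation (ddof=1), so a manual version only matches when ddof=0 is passed. Because the transform is linear and increasing, the median of the scaled column equals (median - mean) / std of the original.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy column standing in for the exam's f5
df = pd.DataFrame({"f5": [10.0, 20.0, 30.0, 40.0, 100.0]})

scaled = StandardScaler().fit_transform(df[["f5"]]).ravel()

# Manual z-score with the population std (ddof=0) matches StandardScaler
manual = (df["f5"] - df["f5"].mean()) / df["f5"].std(ddof=0)
# pandas' default sample std (ddof=1) does not
sample = (df["f5"] - df["f5"].mean()) / df["f5"].std()

print(np.allclose(scaled, manual))  # True
print(np.allclose(scaled, sample))  # False

# Median shortcut: (median - mean) / std on the original column
shortcut = (df["f5"].median() - df["f5"].mean()) / df["f5"].std(ddof=0)
print(np.isclose(np.median(scaled), shortcut))  # True
```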