Changes

1,667 bytes added, 21:28, 12 February 2022

m

no edit summary

Line 5: Line 5:

==Read CSV==

+

<syntaxhighlight lang="python3">

+

import pandas as pd

df = pd.read_csv('news_2019.05.10.csv')

+

</syntaxhighlight>
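The call above needs the file on disk. As a self-contained sketch, the same pd.read_csv call can be tried against an in-memory buffer. The column names and rows below are made up for illustration and are not the real contents of news_2019.05.10.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real CSV file.
csv_text = """title,summary,modification_date
First story,Short summary A,2019-05-10
Second story,Short summary B,2019-05-11
"""

# read_csv accepts any file-like object, not only a path.
# parse_dates converts the named column from strings to datetimes.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['modification_date'])

print(sample.shape)
print(sample.dtypes)
```

Without parse_dates the date column would be loaded as plain strings.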

+

==DataSeries==

+

<syntaxhighlight lang="python3">

+

s = pd.Series(['banana', 42])

+

s = pd.Series(['banana', 42], index=['Fruit', 'Calories'])

+

s.values

+

s.keys()

+

s.values[0]

+

s.keys()[0]

+

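s.min()  # min(), max() and std() need comparable numeric values; they fail on the mixed str/int Series above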

+

s.max()

+

s.std()

+

</syntaxhighlight>
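The statistics methods only make sense on numeric data, so here is a minimal sketch with a separate numeric Series. The calorie figures are invented for the example.

```python
import pandas as pd

# Mixed str/int values: lookups work, but numeric statistics do not apply.
mixed = pd.Series(['banana', 42], index=['Fruit', 'Calories'])
print(mixed['Calories'])     # label-based lookup

# Hypothetical numeric data, so that min/max/std are well defined.
calories = pd.Series([89, 52, 47], index=['banana', 'apple', 'orange'])

print(calories.min(), calories.max())
print(calories.idxmax())     # index label of the largest value
print(calories.std())        # sample standard deviation (ddof=1 by default)
```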

+

==Dataframe==

+

===Create===

+

<syntaxhighlight lang="python3">

+

scientists = pd.DataFrame({

+

'Name': ['Rosalind Franklin', 'William Gosset'],

+

'Occupation': ['Chemist', 'Statistician'],

+

'Born': ['1920-07-25', '1876-06-13'],

+

'Died': ['1958-04-16', '1937-10-16'],

+

})

+

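</syntaxhighlight>Dict insertion order is preserved on Python 3.7+, so the columns above already appear in the order written. To set the order explicitly, and to use the names as the row index instead of a column, pass columns and index:<syntaxhighlight lang="python3">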

+

scientists = pd.DataFrame({

+

'Occupation': ['Chemist', 'Statistician'],

+

'Born': ['1920-07-25', '1876-06-13'],

+

'Died': ['1958-04-16', '1937-10-16'],

+

}, index=['Rosalind Franklin', 'William Gosset'], columns=['Occupation', 'Born', 'Died'])

+

</syntaxhighlight>
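To make the effect of the columns argument visible, the sketch below deliberately asks for an order different from the dict. The data is the same as in the snippets above; only the requested order is changed for illustration.

```python
import pandas as pd

scientists = pd.DataFrame(
    {
        'Occupation': ['Chemist', 'Statistician'],
        'Born': ['1920-07-25', '1876-06-13'],
        'Died': ['1958-04-16', '1937-10-16'],
    },
    index=['Rosalind Franklin', 'William Gosset'],
    # Deliberately not the dict order: columns= decides the final order.
    columns=['Died', 'Born', 'Occupation'],
)

print(list(scientists.columns))
print(scientists.loc['William Gosset', 'Occupation'])
```

A name listed in columns that is missing from the dict would produce a column of NaN values, so the argument also works as a filter.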

+

===From CSV===

<syntaxhighlight lang="python3">

news = pd.read_csv('news_2019.05.10.csv')

</syntaxhighlight>

−

==~~Dataframe~~==

+

===Info===

+

<syntaxhighlight lang="python3">

+

df.index

+

df.columns

+

df.values

+

df.shape

+

df.dtypes

+

df.head()

+

df.tail()

+

df.info()

+

df.describe()

+

</syntaxhighlight>
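As a quick orientation to what these attributes and methods return, here is a small sketch on a made-up two-column frame. The column names title and views are illustrative only.

```python
import pandas as pd

# Hypothetical data, just large enough to show each call.
df = pd.DataFrame({
    'title': ['a', 'b', 'c'],
    'views': [10, 20, 30],
})

print(df.shape)          # (rows, columns) tuple
print(list(df.columns))  # column labels
print(df.dtypes)         # one dtype per column
print(df.head(2))        # first 2 rows (5 when no argument is given)
print(df.tail(1))        # last row
print(df.describe())     # summary statistics for the numeric columns
df.info()                # prints a summary directly and returns None
```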

===Select 1 row===

<syntaxhighlight lang="python3">

−

~~texts~~.iloc[[1]]

+

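df.iloc[[1]]  # by position: the second row, returned as a one-row DataFrame

+

df.loc[0]  # by label: the row whose index label is 0, returned as a Series

+

df.iloc[-1].equals(df.loc[df.shape[0] - 1])  # last row both ways; holds only with the default RangeIndex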

+

</syntaxhighlight>
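The difference between position and label only shows up when the index is not the default 0..n-1. The sketch below uses an invented index of 10, 20, 30 to make that visible.

```python
import pandas as pd

# Hypothetical frame with a non-default integer index.
df = pd.DataFrame({'summary': ['a', 'b', 'c']}, index=[10, 20, 30])

row_as_series = df.iloc[1]    # second row by position, returned as a Series
row_as_frame = df.iloc[[1]]   # same row, kept as a one-row DataFrame
by_label = df.loc[20]         # row whose index label is 20

# Last row by position. df.loc[len(df) - 1] would raise KeyError here,
# because no row carries the label 2.
last = df.iloc[-1]

print(row_as_series['summary'], by_label['summary'], last['summary'])
```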

+

===Select specific rows===

+

<syntaxhighlight lang="python3">

+

df.loc[[9, 99, 999]]

</syntaxhighlight>

===Select 1 column===

<syntaxhighlight lang="python3">

−

sumarys = ~~news~~[['summary']]

+

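summaries = df[['summary']]  # double brackets: a one-column DataFrame

# Or, to get the values of a column as a plain Python list

+

list(df['one'])

column_values = df['one'].tolist()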

+

</syntaxhighlight>
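Single and double brackets return different types, which matters for whatever code consumes the result. A minimal sketch on made-up data:

```python
import pandas as pd

# Hypothetical frame reusing the column names from the snippets above.
df = pd.DataFrame({'summary': ['x', 'y'], 'one': [1, 2]})

as_series = df['summary']      # single brackets: a Series
as_frame = df[['summary']]     # double brackets: a one-column DataFrame
as_list = df['one'].tolist()   # plain Python list

print(type(as_series).__name__, type(as_frame).__name__, as_list)
```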

+

===Select multiple columns===

+

<syntaxhighlight lang="python3">

+

df[['column1', 'column2', 'column3']]

</syntaxhighlight>

===Select 1 cell===

<syntaxhighlight lang="python3">

−

~~texts~~.iloc[1][1]

+

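df.iloc[1].iloc[1]  # chained positional lookup; a bare [1] on the row Series is treated as a label in recent pandas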

−

~~# Or~~

+

df.iloc[1]['summary']

−

~~texts~~.iloc[1]['summary']

+

df.iloc[1, 3]

+

df.loc[1, 'summary']

+

</syntaxhighlight>
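For a single scalar, pandas also provides the iat and at accessors, which take exactly one row and one column. The sketch below, on invented data, shows that all four forms reach the same cell.

```python
import pandas as pd

# Hypothetical two-row frame with the default RangeIndex.
df = pd.DataFrame({'title': ['t0', 't1'], 'summary': ['s0', 's1']})

a = df.iloc[1, 1]           # row position 1, column position 1
b = df.loc[1, 'summary']    # row label 1, column label 'summary'
c = df.iat[1, 1]            # scalar-only positional accessor
d = df.at[1, 'summary']     # scalar-only label accessor

print(a, b, c, d)
```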

+

===Subset multiple rows and multiple columns===

+

<syntaxhighlight lang="python3">

+

df.iloc[[1,34,56],[2,4,5]]

+

df.loc[[1,34,56],['modification_date', 'content']]

</syntaxhighlight>
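With the default RangeIndex, the positional and label-based forms select the same block. A small sketch follows; the frame is invented, and only the column names modification_date and content come from the snippet above.

```python
import pandas as pd

# Hypothetical four-row frame with the default RangeIndex.
df = pd.DataFrame({
    'title': ['t0', 't1', 't2', 't3'],
    'modification_date': ['2019-05-07', '2019-05-08', '2019-05-09', '2019-05-10'],
    'content': ['c0', 'c1', 'c2', 'c3'],
})

# Rows 1 and 3, columns at positions 1 and 2.
by_position = df.iloc[[1, 3], [1, 2]]

# The same rows and columns, addressed by label.
by_label = df.loc[[1, 3], ['modification_date', 'content']]

print(by_position.equals(by_label))
```

The result keeps the original row labels (1 and 3), not a fresh 0..n-1 index.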

[[Category:Python]]

+

[[Category:DataScience]]

Rafahsolis


Pandas (edit)
