Difference between revisions of "Pandas"

Revision as of 11:09, 21 May 2019

Install

pip install pandas

Read CSV

import pandas as pd

df = pd.read_csv('news_2019.05.10.csv')

Series

s = pd.Series(['banana', 42])
s = pd.Series(['banana', 42], index=['Fruit', 'Calories'])
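
A minimal sketch of what the index argument changes: without it the values get integer labels 0, 1, and with it they can be looked up by name. The variable names below are illustrative only.

```python
import pandas as pd

# Default index: integer labels 0, 1
s_default = pd.Series(['banana', 42])

# Custom index: values can be looked up by label
s_labeled = pd.Series(['banana', 42], index=['Fruit', 'Calories'])

fruit = s_labeled.loc['Fruit']    # label-based lookup
calories = s_labeled.iloc[1]      # position-based lookup
labels = list(s_labeled.index)    # the index labels as a Python list
```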

DataFrame

Create

scientists = pd.DataFrame({
    'Name': ['Rosalind Franklin', 'William Gosset'],
    'Occupation': ['Chemist', 'Statistician'],
    'Born': ['1920-07-25', '1876-06-13'],
    'Died': ['1958-04-16', '1937-10-16'],
})

From CSV

news = pd.read_csv('news_2019.05.10.csv')
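
To try read_csv without the news file at hand, the same call can be pointed at an in-memory buffer. This is only a sketch: the CSV text and its column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "title,summary\nFirst,One\nSecond,Two\n"

# read_csv accepts a file-like object as well as a path
news = pd.read_csv(io.StringIO(csv_text))
```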

Info

df.index
df.columns
df.values
df.shape
df.dtypes
df.head()
df.tail()
df.info()

Select 1 row

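df.iloc[[1]]  # positional: row at position 1, returned as a one-row DataFrame
df.loc[0]     # label-based: row whose index label is 0, returned as a Series
df.iloc[-1]   # last row; same as df.loc[df.shape[0]-1] only with the default 0..n-1 index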
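
The difference between iloc and loc only shows up when the index labels are not 0..n-1. A small sketch, assuming a hypothetical frame whose rows are labelled 10, 20, 30:

```python
import pandas as pd

# Hypothetical frame with a non-default index (labels 10, 20, 30)
df = pd.DataFrame({'summary': ['a', 'b', 'c']}, index=[10, 20, 30])

row_as_frame = df.iloc[[1]]   # list of positions -> one-row DataFrame
row_as_series = df.iloc[1]    # single position -> Series
row_by_label = df.loc[20]     # label 20 is the same row as position 1
last_row = df.iloc[-1]        # last row, whatever its label

# df.loc[1] raises KeyError here, because no row is labelled 1
try:
    df.loc[1]
    label_one_found = True
except KeyError:
    label_one_found = False
```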

Select specific rows

df.loc[[9, 99, 999]]

Select 1 column

summaries = df[['summary']]   # double brackets: one-column DataFrame
# Or, as a plain Python list
list(df['one'])
dfToList = df['one'].tolist()
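
Single and double brackets return different types, which is easy to miss. A short sketch with made-up data (the 'views' column is an assumption for illustration):

```python
import pandas as pd

# Hypothetical data; 'summary' follows the snippets above
df = pd.DataFrame({'summary': ['first', 'second'], 'views': [10, 20]})

as_series = df['summary']          # single brackets -> Series
as_frame = df[['summary']]         # double brackets -> one-column DataFrame
as_list = df['summary'].tolist()   # plain Python list
```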

Select multiple columns

df[['column1', 'column2', 'column3']]

Select 1 cell

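df.iloc[1].iloc[1]     # chained: row at position 1, then value at position 1
df.iloc[1]['summary']  # chained: row at position 1, then value by column label
df.iloc[1, 3]          # row position 1, column position 3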

df.loc[1, 'summary']
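
For a single scalar, pandas also provides the .iat and .at accessors, which mirror iloc and loc. A sketch on a hypothetical two-column frame:

```python
import pandas as pd

# Hypothetical frame with the default 0..1 index
df = pd.DataFrame({'title': ['t0', 't1'], 'summary': ['s0', 's1']})

by_position = df.iloc[1, 1]       # row position 1, column position 1
by_label = df.loc[1, 'summary']   # row label 1, column label 'summary'

# .iat and .at are scalar-only accessors for the same two lookups
scalar_by_position = df.iat[1, 1]
scalar_by_label = df.at[1, 'summary']
```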

Subset multiple rows and multiple columns

df.iloc[[1,34,56],[2,4,5]]
df.loc[[1,34,56],['modification_date', 'content']]
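
With the default integer index, the positional and label-based forms select the same block. A sketch on a hypothetical four-row frame (the data values are made up; the column names follow the snippet above):

```python
import pandas as pd

# Hypothetical 4x3 frame with the default 0..3 index
df = pd.DataFrame({
    'title': ['t0', 't1', 't2', 't3'],
    'modification_date': ['d0', 'd1', 'd2', 'd3'],
    'content': ['c0', 'c1', 'c2', 'c3'],
})

# Rows at positions 1 and 3, columns at positions 1 and 2
by_position = df.iloc[[1, 3], [1, 2]]

# Rows labelled 1 and 3, columns selected by name
by_label = df.loc[[1, 3], ['modification_date', 'content']]
```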

