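(Note: you may want to look into using pandas for this. See the second half of this answer for an example.)
To answer your question directly:
If you want actual typed values (int for acres, float for latitude/longitude), here is one way to do it with a regular expression: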
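import re

def readParkFile(fileName="national_parks.csv"):
    parkList = []
    # The with block closes the file even if a line fails to parse.
    with open(fileName, "r") as f:
        # The header line supplies the dictionary keys.
        keys = f.readline().strip("\n").split(",")
        for line in f:
            # Seven non-greedy groups, then a final group that takes the rest of the line.
            v = re.search('(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*?),(.*)', line).groups()
            # Convert the numeric fields and strip the quotes around Description.
            Code, Name, State, Acres, Latitude, Longitude, Date, Description = v[0], v[1], v[2], int(v[3]), float(v[4]), float(v[5]), v[6], v[7][1:-1]
            parkDict = dict(zip(keys, [Code, Name, State, Acres, Latitude, Longitude, Date, Description]))
            parkList.append(parkDict)
    return parkList

parkList = readParkFile()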
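Inside the loop, re.search() uses non-greedy qualifiers for every comma-separated group except the last one (Description). The numeric fields are then converted to numbers, and the surrounding quotes are stripped from Description. These typed values are combined with keys using zip() to build a dict, which is appended to the result, parkList. After looping over all lines in the csv file, the function returns parkList.
The first dictionary in the resulting list of dicts looks like this: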
{'Code': 'ACAD', 'Name': 'Acadia National Park', 'State': 'ME', 'Acres': 47390, 'Latitude': 44.35, 'Longitude': -68.21, 'Date': '1919-02-26', 'Description': 'Covering most of Mount Desert Island and other coastal islands, Acadia features the tallest mountain on the Atlantic coast of the United States, granite peaks, ocean shoreline, woodlands, and lakes. There are freshwater, estuary, forest, and intertidal habitats.'}
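One caveat with the regular expression above is that it assumes none of the first seven fields contains a comma. As a minimal sketch, assuming the file uses standard CSV quoting, the standard library's csv module can do the splitting instead. The names read_parks and CONVERTERS are hypothetical and not part of the original code, and the sample row uses a shortened description so the example runs without the real file.

```python
import csv
import io

# Hypothetical mapping (not from the original answer): column name -> converter.
CONVERTERS = {"Acres": int, "Latitude": float, "Longitude": float}

def read_parks(lines):
    """Parse CSV from any iterable of lines (e.g. an open file) into typed dicts."""
    parks = []
    # DictReader takes the keys from the header row and handles quoted fields.
    for row in csv.DictReader(lines):
        for key, convert in CONVERTERS.items():
            row[key] = convert(row[key])
        parks.append(row)
    return parks

# In-memory sample with a shortened description, standing in for the real file.
sample = io.StringIO(
    "Code,Name,State,Acres,Latitude,Longitude,Date,Description\n"
    'ACAD,Acadia National Park,ME,47390,44.35,-68.21,1919-02-26,'
    '"Granite peaks, ocean shoreline, woodlands, and lakes."\n'
)
parks = read_parks(sample)
print(parks[0]["Acres"], parks[0]["Description"])
```

With the real file, the same function could be called as read_parks(f) inside a with open("national_parks.csv", newline="") as f: block.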
Alternatively, with pandas you can do this:
import pandas as pd
df = pd.read_csv("national_parks.csv")
print(df)
print(df.dtypes)
parkList = df.to_dict('records')
print(parkList[0])
This gives the following output:
    Code                                        Name  State   Acres  Latitude  Longitude        Date                                        Description
0   ACAD                        Acadia National Park     ME   47390     44.35     -68.21  1919-02-26  Covering most of Mount Desert Island and other...
1   ARCH                        Arches National Park     UT   76519     38.68    -109.57  1971-11-12  This site features more than 2,000 natural san...
2   BADL                      Badlands National Park     SD  242756     43.75    -102.50  1978-11-10  The Badlands are a collection of buttes, pinna...
3   BIBE                      Big Bend National Park     TX  801163     29.25    -103.25  1944-06-12  Named for the prominent bend in the Rio Grande...
4   BISC                      Biscayne National Park     FL  172924     25.65     -80.08  1980-06-28  The central part of Biscayne Bay, this mostly ...
5   BLCA  Black Canyon of the Gunnison National Park     CO   32950     38.57    -107.72  1999-10-21  The park protects a quarter of the Gunnison Ri...
6   BRCA                  Bryce Canyon National Park     UT   35835     37.57    -112.18  1928-02-25  Bryce Canyon is a geological amphitheater on t...
7   CANY                   Canyonlands National Park     UT  337598     38.20    -109.93  1964-09-12  This landscape was eroded into a maze of canyo...
8   CARE                  Capitol Reef National Park     UT  241904     38.20    -111.17  1971-12-18  The park's Waterpocket Fold is a 100-mile (160...
9   CAVE              Carlsbad Caverns National Park     NM   46766     32.17    -104.44  1930-05-14  Carlsbad Caverns has 117 caves, the longest of...
10  CHIS               Channel Islands National Park     CA  249561     34.01    -119.42  1980-03-05  Five of the eight Channel Islands are protecte...
11  CONG                      Congaree National Park     SC   26546     33.78     -80.78  2003-11-10  On the Congaree River, this park is the larges...
12  CRLA                   Crater Lake National Park     OR  183224     42.94    -122.10  1902-05-22  Crater Lake lies in the caldera of an ancient ...
13  CUVA               Cuyahoga Valley National Park     OH   32950     41.24     -81.55  2000-10-11  This park along the Cuyahoga River has waterfa...
Code            object
Name            object
State           object
Acres            int64
Latitude       float64
Longitude      float64
Date            object
Description     object
dtype: object
{'Code': 'ACAD', 'Name': 'Acadia National Park', 'State': 'ME', 'Acres': 47390, 'Latitude': 44.35, 'Longitude': -68.21, 'Date': '1919-02-26', 'Description': 'Covering most of Mount Desert Island and other coastal islands, Acadia features the tallest mountain on the Atlantic coast of the United States, granite peaks, ocean shoreline, woodlands, and lakes. There are freshwater, estuary, forest, and intertidal habitats.'}
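As you can see, with a single call to read_csv(), pandas parses the csv file, works out the data type of each column, and assembles everything into a DataFrame object. You can then convert it to a list of dicts by calling the DataFrame object's to_dict() with the argument 'records'.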
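In the output above, the Date column is still read as plain text. As a hedged sketch, assuming you would rather have real datetime values, read_csv() accepts a parse_dates argument. The in-memory sample below is a stand-in for the real file and uses a shortened description.

```python
import io

import pandas as pd

# In-memory sample with a shortened description, standing in for the real file.
sample = io.StringIO(
    "Code,Name,State,Acres,Latitude,Longitude,Date,Description\n"
    'ACAD,Acadia National Park,ME,47390,44.35,-68.21,1919-02-26,'
    '"Granite peaks, ocean shoreline, woodlands, and lakes."\n'
)

# parse_dates asks pandas to convert the listed columns to datetime values.
df = pd.read_csv(sample, parse_dates=["Date"])
print(df.dtypes)

# One dict per row, keyed by column name.
records = df.to_dict("records")
print(records[0]["Date"].year)
```

Whether to parse the dates is a design choice: leaving Date as a string keeps the dicts closer to the regular expression version, while parsing it makes date arithmetic and sorting easier.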