Andrej Kesely

Recent replies by Andrej Kesely
1 year ago
Replied to the topic created by Andrej Kesely » How to convert a string of key-values into a proper dict in Python?

Try this (regex101):

import re

s = "address: C/O John Smith @ Building X, S/W city: new york state: new york    population:        500000"

d = dict(re.findall(r"([^\s]+)\s*:\s*(.*?)\s*(?=[^\s]+:|$)", s))
print(d)

Prints:

{
    "address": "C/O John Smith @ Building X, S/W",
    "city": "new york",
    "state": "new york",
    "population": "500000",
}
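
To make the pattern easier to follow, here is a minimal sketch of the same regular expression written with re.VERBOSE and wrapped in a small helper. The helper name parse_pairs and the sample string are hypothetical and only serve as illustration.

```python
import re

# Same pattern as above, split into commented parts with re.VERBOSE.
PAIR = re.compile(
    r"""
    (\S+)         # key: a run of non-whitespace characters
    \s*:\s*       # the colon that ends the key, with optional spaces
    (.*?)         # value: as few characters as possible
    \s*           # trailing spaces are not part of the value
    (?=\S+:|$)    # stop right before the next "key:" or at the end
    """,
    re.VERBOSE,
)


def parse_pairs(text):
    # Hypothetical helper name, not part of the original answer.
    return dict(PAIR.findall(text))


print(parse_pairs("name: Ada Lovelace born: 1815 city: London"))
# {'name': 'Ada Lovelace', 'born': '1815', 'city': 'London'}
```

One caveat of this approach: the lookahead treats any non-space run followed by a colon as the next key, so a value that itself contains such a token (a time like 10:30, for example) would be split.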
2 years ago
Replied to the topic created by Andrej Kesely » Python dataframe: convert a column of lists of dicts into columns with single elements

Try:

# if the lists_of_stuff are strings, apply literal_eval
#from ast import literal_eval
#df["lists_of_stuff"] = df["lists_of_stuff"].apply(literal_eval)

df = df.explode("lists_of_stuff")
df = pd.concat([df, df.pop("lists_of_stuff").apply(pd.Series)], axis=1)
print(df)

Prints:

   other1  other2      a    b      c    d    e    f
0   Susie     123    sam  2.0    NaN  NaN  NaN  NaN
0   Susie     123  diana  NaN  grape  5.0  NaN  NaN
0   Susie     123   jody  NaN      7  NaN  foo  9.0
1  Rachel     456    joe  2.0    NaN  NaN  NaN  NaN
1  Rachel     456  steve  NaN  pizza  NaN  NaN  NaN
1  Rachel     456   alex  NaN      7  NaN  doh  NaN

EDIT: To reorder the columns, you can do:

#... code as above
df = df.reset_index(drop=True).reindex(
    [*df.columns[:1]] + [*df.columns[2:]] + [*df.columns[1:2]], axis=1
)
print(df)

Prints:

   other1      a    b      c    d    e    f  other2
0   Susie    sam  2.0    NaN  NaN  NaN  NaN     123
1   Susie  diana  NaN  grape  5.0  NaN  NaN     123
2   Susie   jody  NaN      7  NaN  foo  9.0     123
3  Rachel    joe  2.0    NaN  NaN  NaN  NaN     456
4  Rachel  steve  NaN  pizza  NaN  NaN  NaN     456
5  Rachel   alex  NaN      7  NaN  doh  NaN     456
1 year ago
Replied to the topic created by Andrej Kesely » Python: extract elements from a list with a sporadic pattern into tuples

Another solution:

from itertools import groupby

g1 = (g for v, g in groupby(arr, type) if v is float)
g2 = (g for v, g in groupby(arr, type) if v is str)

out = [(next(a), *[*b, None][:2]) for a, b in zip(g1, g2)]
print(out)

Prints:

[
    (1150.1, "James", None),
    (3323.1, "Steve", None),
    (9323.1, "John", None),
    (1233.1, "Gary", "criminal"),
    (3293.1, "Josh", None),
    (9232.1, "Daniel", "criminal"),
]
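
The snippet above relies on arr from the question. As a self-contained sketch, the variant below walks the groups in a single pass instead of zipping two generators. The contents of arr are an assumption reconstructed from the printed output, and the code assumes every float is followed by at least one string.

```python
from itertools import groupby

# Assumed input, reconstructed from the printed output above.
arr = [
    1150.1, "James",
    3323.1, "Steve",
    9323.1, "John",
    1233.1, "Gary", "criminal",
    3293.1, "Josh",
    9232.1, "Daniel", "criminal",
]

out = []
for kind, group in groupby(arr, type):
    group = list(group)
    if kind is float:
        # A float starts a new record; remember it for the next group.
        number = group[0]
    else:
        # Pad the strings with None and keep at most two of them.
        out.append((number, *(group + [None])[:2]))

print(out)
```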

The easiest way to read the first table is to use pandas.read_html:

import pandas as pd

url = "http://www.godaycare.com/child-care-cost/saskatchewan"

df = pd.read_html(url)[0]
print(df.to_markdown())

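Prints:

|    | Type       | Age Cat.     | Spot      | AVG. Cost ($) | Entries |
|---:|:-----------|:-------------|:----------|--------------:|--------:|
|  0 | Licensed   | Infant       | Full-Time |        751.02 |     717 |
|  1 | Licensed   | Infant       | Part-Time |         41.31 |     187 |
|  2 | Unlicensed | Infant       | Full-Time |        699.56 |     287 |
|  3 | Unlicensed | Infant       | Part-Time |         31.05 |      50 |
|  4 | Licensed   | Toddler      | Full-Time |        661.04 |     604 |
|  5 | Licensed   | Toddler      | Part-Time |         32.69 |     148 |
|  6 | Unlicensed | Toddler      | Full-Time |        633.01 |     342 |
|  7 | Unlicensed | Toddler      | Part-Time |         35.99 |      69 |
|  8 | Licensed   | Preschool    | Full-Time |        595.45 |     327 |
|  9 | Licensed   | Preschool    | Part-Time |         30.85 |      66 |
| 10 | Unlicensed | Preschool    | Full-Time |        602.82 |     195 |
| 11 | Unlicensed | Preschool    | Part-Time |         30.33 |      30 |
| 12 | Licensed   | Kindergarten | Full-Time |        562.87 |      87 |
| 13 | Licensed   | Kindergarten | Part-Time |         28.29 |      38 |
| 14 | Unlicensed | Kindergarten | Full-Time |        549.12 |      57 |
| 15 | Unlicensed | Kindergarten | Part-Time |         23.01 |      13 |
| 16 | Licensed   | Schoolage    | Full-Time |        605.34 |      94 |
| 17 | Licensed   | Schoolage    | Part-Time |         25.45 |      33 |
| 18 | Unlicensed | Schoolage    | Full-Time |         434.9 |      98 |
| 19 | Unlicensed | Schoolage    | Part-Time |            19 |      25 |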

EDIT: Using beautifulsoup:

import requests
from bs4 import BeautifulSoup

URL = "http://www.godaycare.com/child-care-cost/saskatchewan"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

for row in soup.find("table").find_all("tr"):
    tds = [td.text for td in row.find_all(["td", "th"])]
    print(("{:<20}" * len(tds)).format(*tds))

Prints:

Type                Age Cat.            Spot                AVG. Cost ($)       Entries             
Licensed            Infant              Full-Time           751.02              717                 
Licensed            Infant              Part-Time           41.31               187                 
Unlicensed          Infant              Full-Time           699.56              287                 
Unlicensed          Infant              Part-Time           31.05               50                  
Licensed            Toddler             Full-Time           661.04              604                 
Licensed            Toddler             Part-Time           32.69               148                 
Unlicensed          Toddler             Full-Time           633.01              342                 
Unlicensed          Toddler             Part-Time           35.99               69                  
Licensed            Preschool           Full-Time           595.45              327                 
Licensed            Preschool           Part-Time           30.85               66                  
Unlicensed          Preschool           Full-Time           602.82              195                 
Unlicensed          Preschool           Part-Time           30.33               30                  
Licensed            Kindergarten        Full-Time           562.87              87                  
Licensed            Kindergarten        Part-Time           28.29               38                  
Unlicensed          Kindergarten        Full-Time           549.12              57                  
Unlicensed          Kindergarten        Part-Time           23.01               13                  
Licensed            Schoolage           Full-Time           605.34              94                  
Licensed            Schoolage           Part-Time           25.45               33                  
Unlicensed          Schoolage           Full-Time           434.90              98                  
Unlicensed          Schoolage           Part-Time           19.00               25                  
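2 years ago
Replied to the topic created by Andrej Kesely » How to combine a list of row values and a list of column names in Python? [duplicate]

You can create an np.array and then use .reshape:

import numpy as np
import pandas as pd

# data (flat list of values) and columns (list of column names) come from the question
df = pd.DataFrame(np.array(data).reshape((-1, len(columns))), columns=columns)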
print(df)

Prints:

      Name Age     City  Score
0     jack  34   Sydney    155
1     Riti  31    Delhi  177.5
2     Aadi  16   Mumbai     81
3    Mohit  31    Delhi    167
4    Veena  12    Delhi    144
5  Shaunak  35   Mumbai    135
6    Shaun  35  Colombo    111
3 years ago
Replied to the topic created by Andrej Kesely » Python - Web scraping category names from Walmart

Using only beautifulsoup:

import json
import requests
from bs4 import BeautifulSoup

url = "https://www.walmart.com/browse/snacks-cookies-chips/cookies/976759_976787_1001391"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0",
    "Accept-Language": "en-US,en;q=0.5",
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
data = json.loads(soup.select_one("#searchContent").contents[0])

# uncomment to see all data:
# print(json.dumps(data, indent=4))


def find_departments(data):
    if isinstance(data, dict):
        if "name" in data and data["name"] == "Departments":
            yield data
        else:
            for v in data.values():
                yield from find_departments(v)
    elif isinstance(data, list):
        for v in data:
            yield from find_departments(v)


departments = next(find_departments(data), {})

for d in departments.get("values", []):
    print(
        "{:<30} {}".format(
            d["name"], "https://www.walmart.com" + d["baseSeoURL"]
        )
    )

Prints:

Chocolate Cookies              https://www.walmart.com/browse/food/chocolate-cookies/976759_976787_1001391_4007138
Cookies                        https://www.walmart.com/browse/food/cookies/976759_976787_1001391_8331066
Butter Cookies                 https://www.walmart.com/browse/food/butter-cookies/976759_976787_1001391_7803640
Shortbread Cookies             https://www.walmart.com/browse/food/shortbread-cookies/976759_976787_1001391_8026949
Coconut Cookies                https://www.walmart.com/browse/food/coconut-cookies/976759_976787_1001391_6970757
Healthy Cookies                https://www.walmart.com/browse/food/healthy-cookies/976759_976787_1001391_7466302
Keebler Cookies                https://www.walmart.com/browse/food/keebler-cookies/976759_976787_1001391_3596825
Biscotti                       https://www.walmart.com/browse/food/biscotti/976759_976787_1001391_2224095
Gluten-Free Cookies            https://www.walmart.com/browse/food/gluten-free-cookies/976759_976787_1001391_4362193
Molasses Cookies               https://www.walmart.com/browse/food/molasses-cookies/976759_976787_1001391_3338971
Peanut Butter Cookies          https://www.walmart.com/browse/food/peanut-butter-cookies/976759_976787_1001391_6460174
Pepperidge Farm Cookies        https://www.walmart.com/browse/food/pepperidge-farm-cookies/976759_976787_1001391_2410932
Snickerdoodle Cookies          https://www.walmart.com/browse/food/snickerdoodle-cookies/976759_976787_1001391_8926167
Sugar-Free Cookies             https://www.walmart.com/browse/food/sugar-free-cookies/976759_976787_1001391_5314659
Tate's Cookies                 https://www.walmart.com/browse/food/tate-s-cookies/976759_976787_1001391_9480535
Vegan Cookies                  https://www.walmart.com/browse/food/vegan-cookies/976759_976787_1001391_8007359
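2 years ago
Replied to the topic created by Andrej Kesely » Convert DMS values stored in a list to a CSV file using Python

If lst is the list from your question, you can do:

import csv

# newline="" keeps the csv module from writing extra blank lines on Windows
with open("data.csv", "w", newline="") as f_out: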
    csv_writer = csv.writer(f_out)
    i = iter(lst)
    csv_writer.writerow(["deg", "min", "sec", "direction"])
    for t in zip(*[i] * 4):
        csv_writer.writerow(t)

This writes data.csv:

deg,min,sec,direction
9,22,26.9868,N
118,23,48.876,E
9,22,18.6132,N
118,23,5.2188,E
9,19,41.4804,N
118,19,23.1852,E
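
The zip(*[i] * 4) idiom is what groups the flat list into rows of four. Below is a minimal sketch of it in isolation. The short lst is an assumed sample in the same shape as the question's data, and io.StringIO is used as an in-memory stand-in for data.csv.

```python
import csv
import io

# Assumed input: a flat list where every four items form one row.
lst = [9, 22, 26.9868, "N", 118, 23, 48.876, "E"]

it = iter(lst)
# [it] * 4 holds the same iterator four times, so zip() pulls four
# consecutive items for each row instead of four parallel copies.
rows = list(zip(*[it] * 4))
print(rows)

buffer = io.StringIO()  # in-memory stand-in for data.csv
writer = csv.writer(buffer)
writer.writerow(["deg", "min", "sec", "direction"])
writer.writerows(rows)
print(buffer.getvalue())
```

Note that zip() stops at the shortest input, so trailing items that do not fill a complete group of four are silently dropped.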

2 years ago
Replied to the topic created by Andrej Kesely » Dataframe with multiple values for the same key and one-hot encoding (Python, Pandas)?

IIUC, .pivot_table() with aggfunc="size" produces your result:

x = df.pivot_table(index="id", columns="val", aggfunc="size").reset_index()
x.columns.name = None
print(x)

Prints:

   id  admin  fin_dep_ds  local_usr
0   0      1           1          1
2 years ago
Replied to the topic created by Andrej Kesely » Iterating through a list of lists of lists in Python

Try:

print(*[i for v in zip(*a) for i in v], sep="\n")

Prints:

[0, 0]
[1, 0]
[2, 0]
[3, 0]
[4, 0]
[6, 0]
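
The one-liner depends on a from the question. As a rough sketch, the example below uses an assumed a (chosen so that it reproduces the output above) and splits the expression into two steps to show what zip(*a) does.

```python
# Assumed input: three inner lists of [x, y] pairs, chosen to
# reproduce the output above.
a = [
    [[0, 0], [3, 0]],
    [[1, 0], [4, 0]],
    [[2, 0], [6, 0]],
]

# zip(*a) yields the first pair of every inner list, then the second, ...
columns = list(zip(*a))
# Flatten the tuples back into one list of pairs.
flat = [pair for column in columns for pair in column]

print(*flat, sep="\n")
```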
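2 years ago
Replied to the topic created by Andrej Kesely » Create a list of values from a nested dictionary in Python

You can use three for loops in a list comprehension: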

listionary = [
    {"a": [{"id": 30, "name": "bob"}, {"id": 50, "name": "mike"}]},
    {"b": [{"id": 99, "name": "guy"}, {"id": 77, "name": "hal"}]},
]

lst = [d["id"] for d in listionary for v in d.values() for d in v]
print(lst)

Prints:

[30, 50, 99, 77]

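Explanation:

for d in listionary - this iterates over all items in the list listionary ( {"a":...}, {"b":...} )

for v in d.values() - this iterates over all values of these dictionaries ( [{"id":...}, {"id":...}], [...] )

for d in v - this gets all dictionaries from these lists ( {"id":...}, {"id":...}, ... ); note that the name d is reused here

d["id"] - this gets the value stored under the key "id"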
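
The same logic can be written as plain nested loops, which may make the order of the three for clauses easier to see. This is only an equivalent sketch; the variable names outer, records and record are chosen for illustration.

```python
listionary = [
    {"a": [{"id": 30, "name": "bob"}, {"id": 50, "name": "mike"}]},
    {"b": [{"id": 99, "name": "guy"}, {"id": 77, "name": "hal"}]},
]

lst = []
for outer in listionary:            # {"a": [...]}, {"b": [...]}
    for records in outer.values():  # the list stored under each key
        for record in records:      # {"id": ..., "name": ...}
            lst.append(record["id"])

print(lst)  # [30, 50, 99, 77]
```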

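2 years ago
Replied to the topic created by Andrej Kesely » Get all values by a specific key in a deeply nested dict using Python

You can use recursion. If dct is your dictionary from the question: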

def get_ids(d):
    if isinstance(d, dict):
        for k, v in d.items():
            if k == "id":
                yield v
            else:
                yield from get_ids(v)
    elif isinstance(d, list):
        for v in d:
            yield from get_ids(v)


ids = list(get_ids(dct))
print(ids)

Prints:

[1, 2, 4, 5, 12, 14]
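
Since dct is not shown here, the sketch below repeats the generator with a small hypothetical dictionary so it can be run on its own. The keys "children" and "name" are invented for the example.

```python
def get_ids(d):
    if isinstance(d, dict):
        for k, v in d.items():
            if k == "id":
                yield v
            else:
                yield from get_ids(v)
    elif isinstance(d, list):
        for v in d:
            yield from get_ids(v)
    # Any other type (str, int, ...) yields nothing.


# Hypothetical nested data, not the dictionary from the question.
dct = {
    "id": 1,
    "children": [
        {"id": 2, "children": []},
        {"name": "no id here", "children": [{"id": 4}]},
    ],
}

print(list(get_ids(dct)))  # [1, 2, 4]
```

The value stored under "id" is yielded as-is and not searched further, so an "id" whose value is itself a dict or list would be returned whole.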
2 years ago
Replied to the topic created by Andrej Kesely » Web scraping with Python, JavaScript output

Try changing the User-Agent HTTP header when making the request to the server:

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}

url = "https://www.amiqus.com/jobs?options=,20993,20877,20876&page=1"

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
for title in soup.select(".attrax-vacancy-tile__title"):
    print(title.get_text(strip=True))

Prints:

Engine Programmer C++ AAA opportunity - Remote working
Senior Programmer
Gameplay Programmer

...
2 years ago
Replied to the topic created by Andrej Kesely » Create a dictionary from lists matching a condition in Python

Try:

main = ["dayn is the one", "styn is a main", "tyrn is the third main"]
lst2 = ["dayz", "stzn", "tyrm"]
lst3 = ["styzerwe", "tyrmadsf", "dayttt"]
lst4 = ["dayl", "styyzt", "tyrl"]


tmp = {}
for l in [lst2, lst3, lst4]:
    for v in l:
        tmp.setdefault(v[:3], []).append(v)

out = {v: tmp.get(v[:3], []) for v in main}
print(out)

Prints:

{
    "dayn is the one": ["dayz", "dayttt", "dayl"],
    "styn is a main": ["styzerwe", "styyzt"],
    "tyrn is the third main": ["tyrm", "tyrmadsf", "tyrl"],
}
2 years ago
Replied to the topic created by Andrej Kesely » Split a Python list into sublists

Try itertools.groupby:

from itertools import groupby

acq = ["A1", "A2", "D", "A3", "A4", "A5", "D", "A6"]

for v, g in groupby(acq, lambda v: v == "D"):
    if not v:
        print(list(g))

Prints:

['A1', 'A2']
['A3', 'A4', 'A5']
['A6']
3 years ago
Replied to the topic created by Andrej Kesely » Python: sorting a nested list of student courses

Use itertools.groupby to group data_list by subject:

from pprint import pprint
from statistics import mean
from itertools import groupby


data_list = [
 ["John", "Physics", 5], ["John", "PC", 7], ["John", "Math", 8],
 ["Mary", "Physics", 6], ["Mary", "PC", 10], ["Mary", "Algebra", 7],
 ["Helen", "Physics", 7], ["Helen","PC", 6], ["Helen", "Algebra", 8],
 ["Helen", "Analysis", 10], ["Bill", "PC", 10], ["Bill", "Analysis", 6],
 ["Bill", "Math", 8], ["Bill", "Biology", 6], ["Michael", "Analysis", 10]
]

out = []
for k, g in groupby(sorted(data_list, key=lambda k: k[1]), lambda k: k[1]):
    g = [*g]
    out.append([k, len(g), mean(v[2] for v in g)])

pprint(out)

[['Algebra', 2, 7.5],
 ['Analysis', 3, 8.666666666666666],
 ['Biology', 1, 6],
 ['Math', 2, 8],
 ['PC', 4, 8.25],
 ['Physics', 3, 6]]
4 years ago
Replied to the topic created by Andrej Kesely » How to generate quantities from a list of numbers in Python?

l = [2, 1, 3]

print([j for i, v in enumerate(l, 1) for j in [i] * v])

Prints:

[1, 1, 2, 3, 3, 3]

A version based on @kaya3's comment, which is more efficient (thanks!):

print( [i for i, v in enumerate(l, 1) for _ in range(v)] )
4 years ago
Replied to the topic created by Andrej Kesely » Remove all items after a specific item in a nested list in Python

You can use itertools.takewhile for this task:

from itertools import takewhile

my_lst = [['John C, CEO & Co-Funder, ABC company','Eric P, CFO, QWE company','My Profile','Herber W, CTO, PPP company'],
['Eli S, AVP, ACV Company', 'My Profile','Brian M, Analyst, LPL company'],
['Diana F, Managing Director, MS company','Alan X, Associate, JPM company','My Profile', 'Jame R, Manager, AL company']]

out = []
for i in my_lst:
    out.append([*takewhile(lambda k: k!='My Profile', i)])

from pprint import pprint
pprint(out)

[['John C, CEO & Co-Funder, ABC company', 'Eric P, CFO, QWE company'],
 ['Eli S, AVP, ACV Company'],
 ['Diana F, Managing Director, MS company', 'Alan X, Associate, JPM company']]

EDIT (list comprehension version):

out = [[*takewhile(lambda k: k!='My Profile', i)] for i in my_lst]
4 years ago
Replied to the topic created by Andrej Kesely » Python if-else condition in one-line syntax [duplicate]

You're looking for a list comprehension:

string = 'abcdea'

ab = [c for c in string if c in 'ab']

print(ab)

Prints:

['a', 'b', 'a']
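4 years ago
Replied to the topic created by Andrej Kesely » Python code does not increment by the specified amount [duplicate]

You can read about floating-point issues in the official documentation.

As a quick fix for your code, you can use the decimal module from the standard library: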

from decimal import Decimal

xNum = 0
for x in range(180):
    print(xNum)
    xNum = Decimal('0.1') + xNum

This prints:

0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
..etc.
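
To see the underlying floating-point issue directly, here is a small sketch that adds 0.1 ten times, once with float and once with Decimal. The variable names total and exact are chosen for illustration.

```python
from decimal import Decimal

# Binary floats cannot store 0.1 exactly, so the error accumulates.
total = 0.0
for _ in range(10):
    total += 0.1
print(total == 1.0)  # False
print(total)         # 0.9999999999999999

# Decimal does exact decimal arithmetic when built from strings.
exact = Decimal("0")
for _ in range(10):
    exact += Decimal("0.1")
print(exact == Decimal("1.0"))  # True
print(exact)                    # 1.0
```

The Decimal values are created from strings on purpose: Decimal(0.1) would inherit the rounding error of the float literal.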
4 years ago
Replied to the topic created by Andrej Kesely » How to improve the time efficiency of Python code?
n = '5'
i = '2 5 5 1 1'

def compute(n, i):
    s1 = set(range(1, n+1))
    yield from sorted(s1.difference(i))


for val in compute(int(n), map(int, i.split()) ):
    print(val, end=' ')

Prints:

3 4 
4 years ago
Replied to the topic created by Andrej Kesely » Split a string only after double quotes in Python

You can use ast.literal_eval and then add the '"' manually:

s = '"BLAX", "BLAY", "BLAZ, BLUBB", "BLAP"'

from ast import literal_eval

data = literal_eval('(' + s + ')')

for d in data:
    print('"{}"'.format(d))

Prints:

"BLAX"
"BLAY"
"BLAZ, BLUBB"
"BLAP"
data = '''{"messages":
[
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:51:00", "agentId": "2001-100001", "skillId": "2001-20000", "agentText": "That customer was great"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:55:00", "agentId": "2001-100001", "skillId": "2001-20001", "agentText": "That customer was stupid I hope they don't phone back"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:57:00", "agentId": "2001-100001", "skillId": "2001-20002", "agentText": "Line number 3"},
    {"timestamp": "123456789", "timestampIso": "2019-06-26 09:59:00", "agentId": "2001-100001", "skillId": "2001-20003", "agentText": ""}
]
}
'''

import json
from collections import Counter
from pprint import pprint

def words(data):
    for m in data['messages']:
        yield from m['agentText'].split()

c = Counter(words(json.loads(data)))
pprint(c.most_common())

Prints:

[('That', 2),
 ('customer', 2),
 ('was', 2),
 ('great', 1),
 ('stupid', 1),
 ('I', 1),
 ('hope', 1),
 ('they', 1),
 ("don't", 1),
 ('phone', 1),
 ('back', 1),
 ('Line', 1),
 ('number', 1),
 ('3', 1)]
4 years ago
Replied to the topic created by Andrej Kesely » Specific pattern strings in a list in Python 3
import re

ZTon = ['one-- and preferably only one --obvious', " Hello World", 'Now is better than never.', 'Although never is often better than *right* now.']

def gen(lst):
    for s in lst:
        s = ''.join(i.strip() for g in re.findall(r'(?:-([^-]+)-)|(?:\*([^*]+)\*)', s) for i in g)
        if s:
            yield s

print(list(gen(ZTon)))

Prints:

['and preferably only one', 'right']
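4 years ago
Replied to the topic created by Andrej Kesely » How to extract the HTML code for a table on the Nasdaq website using Beautiful Soup and Python

The table you see is inside an <iframe>. To load the content of this <iframe>, you can use this script: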

import requests
from bs4 import BeautifulSoup

url = "https://www.nasdaq.com/symbol/aapl/revenue-eps"
soup = BeautifulSoup(requests.get(url).text,"html.parser")

iframe_url = soup.select_one('iframe#frmMain')['src']

requests.packages.urllib3.disable_warnings()
soup = BeautifulSoup(requests.get(iframe_url, verify=False).text,"html.parser")

table = soup.select_one('table.ipos')

for tr in table.select('tr'):
    for td in tr.select('td'):
        print('{: <30}'.format(td.get_text(strip=True)), end='')
    print()

Prints:

Revenue / EPS Summary *                                     Revenue / EPS Summary *                                     
                              Revenue / EPS Summary *                                     


Fiscal Quarter                                              2019(Fiscal Year)                                           2018(Fiscal Year)                                           2017(Fiscal Year)             


December                                                                                                                
Revenue                       $84,310(m)                    $88,293(m)                    $78,351(m)                    
EPS                           4.18 (12/29/2018)             3.89 (12/30/2017)             3.36 (12/31/2016)             
Dividends                     0.73                          0.63                          0.57                          

March                                                                                                                   
Revenue                       $58,015(m)                    $61,137(m)                    $52,896(m)                    
EPS                           2.48 (3/30/2019)              2.74 (3/31/2018)              2.1 (4/1/2017)                
Dividends                     0.77                          0.73                          0.63                          

June                                                                                                                    
Revenue                       $53,809(m)                    $53,265(m)                    $45,408(m)                    
EPS                           2.2 (6/29/2019)               2.36 (6/30/2018)              1.68 (7/1/2017)               
Dividends                     0.77                          0.73                          0.63                          

September  (FYE)                                                                                                        
Revenue                                                     $62,900(m)                    $52,579(m)                    
EPS                                                         2.92 (9/29/2018)              2.07 (9/30/2017)              
Dividends                                                   0.73                          0.63                          


Totals                                                                                                                  
Revenue                       $196,134(m)                   $265,595(m)                   $229,234(m)                   
EPS                           8.86                          11.91                         9.21                          
Dividends                     2.27                          2.82                          2.46                          

Previous 3 Years   

Using html.parser.HTMLParser (doc):

from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.__tags = {}
        self.__counter = 1

        self.__result = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.__tags:
            self.__tags[tag] = '0x{:02x}'.format(self.__counter)
            self.__counter += 1
        self.__result.append(self.__tags[tag])

    def handle_endtag(self, tag):
        self.__result.append('0xff')

    def handle_data(self, data):
        self.__result.append(data.strip())

    @property
    def result(self):
        return [v for v in self.__result if v]

parser = MyHTMLParser()
parser.feed('''<body>
    <entry1> 0x12 </entry1>
    <entry2> 0x01 </entry2>
</body>''')

print(' '.join(parser.result))

Prints:

0x01 0x02 0x12 0xff 0x03 0x01 0xff 0xff