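I'm using pandas read_html to read tables from several HTML files, and pandas' ExcelWriter to put them into an Excel file.
The problem I'm running into is that each file has 14 lines of junk data above the table that I'd like to remove. I found a few threads suggesting skiprows, which does remove the data above the table, but it also removes the first 14 rows of the table itself.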
Does anyone have suggestions on how to remove the lines above the table without losing any rows from the table?
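Also, I used index_col=0 to get rid of the index on the rows, but I can't find the syntax to get rid of the index on the columns.
Any help or advice would be greatly appreciated.
Here is my read_html call: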
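import os
import pandas as pd

df_list = []
for i in os.listdir(dl):  # dl is the folder holding the exported HTML files
    if "Export" in i:
        # os.listdir returns bare file names, so join them with the folder
        for df in pd.read_html(os.path.join(dl, i), skiprows=14, index_col=0):
            df_list.append(df)
dfs = pd.concat(df_list)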
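One direction that may be worth checking, sketched under assumptions: in the HTML dump further down, the junk lines appear to sit in their own <table>, separate from the data table. pd.read_html returns one DataFrame per <table> and applies skiprows to each table it parses, which would explain why the data table loses rows too. The sketch below drops skiprows and picks one table instead. It assumes the data table is the last <table> in every file; `select_data_table` and `load_export` are hypothetical names, not pandas functions, and the two hand-built frames only stand in for what read_html would return.

```python
import pandas as pd


def select_data_table(tables):
    """Return the last DataFrame from a list like the one pd.read_html returns.

    Assumption: the data table is the last <table> in each export file.
    """
    if not tables:
        raise ValueError("no tables found")
    return tables[-1]


def load_export(path):
    """Hypothetical helper: read one export file without using skiprows."""
    # pd.read_html needs an HTML parser such as lxml to be installed.
    return select_data_table(pd.read_html(path))


# Stand-ins for the two tables read_html would return for a single file.
junk = pd.DataFrame({0: ["GPF Purchase Order Forecasts", "Generation Date: 2018-08-30"]})
data = pd.DataFrame({"Warehouse": ["XXXX", "XXXX"], "Forecast": ["XXXX", "XXXX"]})

picked = select_data_table([junk, data])

# ignore_index=True renumbers the rows instead of repeating each file's labels.
combined = pd.concat([picked, picked], ignore_index=True)
print(combined.shape)
```

If the data table is not always last, read_html's `match` or `attrs` arguments might be a more reliable way to select it, though that depends on markup not shown in the excerpt.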
This is the format of my files, with several lines of junk data followed by the table:
===============================================
GPF Purchase Order Forecasts
Generation Date: 2018-08-30
Order Date: 2018-09-08
Delivery Date 0000-00-00
Vendor No.: ALL
Warehouse: ALL
===============================================
Warehouse  Item No.  Item Description  UPC No.  Pack Size  Forecast
XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX XXXX
First 100 lines of the HTML file:
<!-- For export to excel style needs to be written on the page-->
<style type="text/css">
.Header
{
font-weight: bold;
}
.HeadUnderline
{
font-weight: bold;
text-decoration: underline;
}
</style>
</head>
<body id="portal">
<form name="frmMain" method="post" action="Export.aspx?DcNbr=0&VendorNbr=0&OrdDate=2018-09-01&GenDate=2018-08-30&DivNbr=0&DelDate=0000-00-00" id="frmMain">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/wEPDwUKLTg0NDMyMzg5OGQYAQUJZ3ZSZXN1bHRzDzwrAAwBCAIBZC77FhJcYYUB/Yk3jdfFNSAWWS9MSP5BghZFEKqOFLXh" />
<!-- c1under - to use this page as a popup window without the header change the id from rlHeader
to rlStyle. The rlFooter literal could be removed if you do not want the footer on the popup window.
-->
<div id="main-content-area" style="vertical-align: top;">
<table width="100%" border="0" bordercolor="#FFCC00" cellpadding="0" cellspacing="0" align="center" style="vertical-align: top">
<tr style="vertical-align: top" align="center">
<td style="vertical-align: top; border: solid 2 black;" align="center" colspan="8">
<span id="lblAppTitle" class="HeadUnderline">GPF Purchase Order Forecasts</span>
</td>
</tr>
<tr>
<td colspan="8">
</td>
</tr>
<tr style="height: 27px">
<td align='right' colspan="8">
<span id="lblGenDate" class="Header">Generation Date:</span>
<span id="lblGenDateValue">2018-08-30</span>
</td>
</tr>
<tr>
<td colspan="8">
<span id="lblOrderDate" class="Header">Order Date:</span>
<span id="lblOrderDateValue">2018-09-01</span>
</td>
</tr>
<tr>
<td colspan="8">
<span id="lblDeliveryDate" class="Header">Delivery Date</span>
<span id="lblDeliveryDateValue">0000-00-00</span>
</td>
</tr>
<tr>
<td colspan="8">
</td>
</tr>
<tr style="height: 27px">
<td align="right" colspan="7">
<span id="lblVendorNumber" class="Header">Vendor No.:</span>
</td>
<td align="left">
<span id="lblVendorNumberValue">ALL</span>
</td>
</tr>
<tr>
<td id="vendorAddress" align="right"></td>
<td colspan="7">
</td>
</tr>
<tr>
<td colspan="8">
</td>
</tr>
<tr style="height: 27px">
<td align='right' colspan="7">
<span id="lblWarehouse" class="Header">Warehouse:</span>
</td>
<td align="left">
<span id="lblWarehouseValue">ALL</span>
</td>
</tr>
<tr>
<td id="depotAddress" align="left" colspan="8"></td>
</tr>
<tr>
<td colspan="8">
</td>
</tr>
</table>
<table cellspacing="0" cellpadding="0" border="0">