
Learning data analysis from PNAS: a Python script that sorts the SVs parsed from a minigraph pangenome into types

小明的数据分析笔记本 • 3 months ago • 52 views

Paper

Novel functional sequences uncovered through a bovine multiassembly graph

https://www.pnas.org/doi/10.1073/pnas.2101056118

Code link

https://github.com/AnimalGenomicsETH/bovine-graphs/blob/main/scripts/get_bialsv.py

How the paper's methods section describes this step:

structural variations were classified as biallelic if two paths were observed in a bubble and multiallelic if a bubble contained more than two paths. The structural variations were further classified into:

Alternate deletion: when the nonreference path was shorter than the reference path (but the reference path has nonzero length) (Is this a typo? Should it read "the nonreference path has nonzero length"?)

Complete deletion: when the nonreference path has a length of zero

Alternate insertion: when the nonreference path was longer than the reference path (Shouldn't this also state that the reference path has nonzero length?)

Complete insertion: when the reference path has a length of zero and the nonreference path was longer than the reference path
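To make the four rules above concrete, here is a minimal sketch that classifies one two-path bubble from the two path lengths. It is not the paper's get_bialsv.py: the function name, the label strings, and the "Unclassified" fallback are my own assumptions, and the zero-length cases are checked first, which is one way to resolve the ambiguity noted in the comments above.

```python
def classify_biallelic(ref_len: int, alt_len: int) -> str:
    """Classify a two-path bubble from its reference and nonreference path lengths."""
    if ref_len < 0 or alt_len < 0:
        raise ValueError("path lengths must be non-negative")
    # Zero-length paths are tested first so they never fall into the Alt* classes.
    if alt_len == 0 and ref_len > 0:
        return "CompleteDeletion"
    if ref_len == 0 and alt_len > 0:
        return "CompleteInsertion"
    if alt_len < ref_len:
        return "AltDeletion"
    if alt_len > ref_len:
        return "AltInsertion"
    # Equal lengths are not covered by the quoted rules;
    # this label is a placeholder chosen for the sketch.
    return "Unclassified"


if __name__ == "__main__":
    # Made-up lengths, one per class.
    for ref_len, alt_len in [(500, 120), (500, 0), (120, 500), (0, 500)]:
        print(ref_len, alt_len, classify_biallelic(ref_len, alt_len))
```

With this ordering, "alternate" always means both paths have nonzero length, which matches the reading suggested in the parenthetical notes.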

There is also a yak pangenome paper in Nature Communications (NC):

Evolutionary origin of genomic structural variations in domestic yaks

https://doi.org/10.1038/s41467-023-41220-x

That paper classifies the cases where the ref and nonref lengths are both nonzero as divergent.
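For comparison, a rough sketch of that coarser scheme. Only the "divergent" rule comes from the description above; treating a zero-length ref path as an insertion and a zero-length nonref path as a deletion is my assumption, and the function name and labels are hypothetical.

```python
def classify_three_way(ref_len: int, alt_len: int) -> str:
    """Three-way split: Divergent when both paths carry sequence."""
    if ref_len > 0 and alt_len > 0:
        return "Divergent"
    # Assumed, not stated in the text: the empty side decides insertion vs deletion.
    if ref_len == 0 and alt_len > 0:
        return "Insertion"
    if alt_len == 0 and ref_len > 0:
        return "Deletion"
    return "Unclassified"
```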

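The script provided with the paper mentioned at the start of this post needs two inputs: a graph_length file and a file of biallelic bubbles.

The steps below show how to obtain both files.

First, build the graph pangenome with minigraph: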

minigraph --inv no -cxggs -L 5 -t 8 seq1.fa seq2.fa seq3.fa seq4.fa seq5.fa seq6.fa  -o LPA.gfa

Extract the variants with gfatools:

gfatools bubble LPA.gfa > LPA_bubble.tsv

Keep only the biallelic variants:

awk '$5==2 {print $1,$2,$4,$5,$12}' LPA_bubble.tsv > LPA_biallelic_bubble.tsv
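If you would rather do this filter in Python, the awk one-liner above can be mirrored roughly as follows. This is a sketch: the function name is hypothetical, column 5 is taken to be the number of paths (as the awk filter implies), column 12 is assumed to hold the comma-separated segment list, and the example row is made up.

```python
def biallelic_rows(lines):
    """Yield columns 1, 2, 4, 5 and 12 of every bubble row whose 5th column is 2."""
    for line in lines:
        fields = line.split()  # whitespace split, like awk's default
        if len(fields) < 12:
            continue  # unlike awk, skip short rows instead of printing an empty field
        if fields[4] != "2":
            continue
        yield fields[0], fields[1], fields[3], fields[4], fields[11]


if __name__ == "__main__":
    # Made-up row in the assumed gfatools bubble layout.
    example = ["chr1\t1000\t1500\t4\t2\t0\t120\t500\t.\t.\t.\ts1,s2,s3,s4"]
    for row in biallelic_rows(example):
        print(" ".join(row))
```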

Generate the graph length file:

awk '$1~/S/ {split($5,chr,":");split($6,pos,":");split($7,arr,":");print $2,length($3),chr[3],pos[3],arr[3]}' LPA.gfa > graph_len.tsv
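The same extraction can be sketched in Python, which makes the column logic easier to read. This assumes the S-lines look like the ones minigraph wrote here, with the tags in columns 5 to 7 being SN (contig name), SO (offset) and SR (rank), exactly as the awk command expects; the function name and the example line are made up.

```python
def segment_records(gfa_lines):
    """Yield (segment_id, length, chrom, offset, rank) for each S-line of a GFA."""
    for line in gfa_lines:
        fields = line.split()
        if len(fields) < 7 or fields[0] != "S":
            continue  # not a segment line, or tags missing
        seg_id, seq = fields[1], fields[2]
        # Each tag looks like NAME:TYPE:VALUE; keep the third piece, as the awk split() does.
        chrom = fields[4].split(":")[2]
        offset = int(fields[5].split(":")[2])
        rank = int(fields[6].split(":")[2])
        yield seg_id, len(seq), chrom, offset, rank


if __name__ == "__main__":
    example = ["S\ts1\tACGTACGT\tLN:i:8\tSN:Z:chr1\tSO:i:0\tSR:i:0"]
    for rec in segment_records(example):
        print(*rec)
```

Note one small difference: the awk pattern `$1~/S/` matches any first field that contains an S, while the sketch requires the field to be exactly `S`.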

Run the script provided with the paper:

python get_bialsv01.py graph_len.tsv LPA_biallelic_bubble.tsv > biallelic_sv.type
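To see how the two input files fit together, here is a rough sketch of how the ref and nonref path lengths of one bubble could be derived from them. It does not reproduce get_bialsv.py. It assumes that the segment list of a bubble starts with the source and ends with the sink, and that segments with rank 0 belong to the reference path while higher ranks belong to the nonreference path; the function name and the example values are hypothetical.

```python
def bubble_path_lengths(segment_list, seg_info):
    """Return (ref_len, alt_len) for one biallelic bubble.

    segment_list: comma-separated segment ids, source first and sink last (assumed)
    seg_info: dict mapping segment id -> (length, rank), e.g. loaded from graph_len.tsv
    """
    inner = segment_list.split(",")[1:-1]  # drop the shared source and sink
    # Assumption: rank 0 marks reference segments, rank > 0 nonreference segments.
    ref_len = sum(seg_info[s][0] for s in inner if seg_info[s][1] == 0)
    alt_len = sum(seg_info[s][0] for s in inner if seg_info[s][1] > 0)
    return ref_len, alt_len


if __name__ == "__main__":
    # Made-up segments: s2 is on the reference path, s3 on the nonreference path.
    seg_info = {"s1": (1000, 0), "s2": (500, 0), "s3": (120, 1), "s4": (800, 0)}
    print(bubble_path_lengths("s1,s2,s3,s4", seg_info))
    print(bubble_path_lengths("s1,s3,s4", seg_info))
```

A bubble with only one inner segment gives a zero length on the other side, which is exactly the "complete" insertion or deletion case from the paper's definitions.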

Here, 106420 is classified as AltDel. Let's look at this position of the graph pangenome in BandageNG.
