{"id":11509,"date":"2024-10-28T10:31:32","date_gmt":"2024-10-28T09:31:32","guid":{"rendered":"https:\/\/www.areasciencepark.it\/?p=11509"},"modified":"2024-11-25T15:36:33","modified_gmt":"2024-11-25T14:36:33","slug":"the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome","status":"publish","type":"post","link":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/","title":{"rendered":"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome"},"content":{"rendered":"<p>The <strong>Data Engineering Laboratory (LADE)<\/strong> at Area Science Park has recently published an <a href=\"https:\/\/www.nature.com\/articles\/s41597-024-03131-4\"><strong>article<\/strong><\/a> in <em>Nature &#8211; Scientific Data <\/em>on protein sequence annotation.<\/p>\n<p>Thanks to technological advances in genomic sequencing, the number of known protein sequences has grown exponentially. Many of these sequences come from metagenomic projects that analyze environmental and clinical samples. Among the most relevant datasets in this field stands the <strong>Unified Human Gastrointestinal Proteome (UHGP)<\/strong> catalog, with a variety of applications in medicine and biology. However, the limited annotation of these sequences reduces their effectiveness.<\/p>\n<p>To address this issue, the <strong><a href=\"https:\/\/dpcfam.areasciencepark.it\/uhgp\/\">DPCfam-UHGP<\/a> <\/strong>dataset was developed, classifying UHGP sequences into protein families that typically group proteins sharing the same biological function. 
The dataset contains <strong>10,778 families<\/strong>, generated through <strong>DPCfam clustering<\/strong>, an unsupervised method that organizes sequences into single- or multi-domain architectures.<\/p>\n<p>This project, part of <strong>Federico Barone<\/strong>&#8217;s doctoral research supervised by <strong>Alessio Ansuini<\/strong> and <strong>Alberto Cazzaniga<\/strong>, exemplifies the fruitful interaction between data management and data science. In this context, the construction of a curated database of gastrointestinal proteins enabled more refined cataloging through advanced machine learning algorithms, allowing continuous database updates in a feedback loop aimed at promoting new discoveries.<\/p>\n<p>The DPCfam-UHGP50 dataset, accessible through a <strong><a href=\"https:\/\/dpcfam.areasciencepark.it\/uhgp\">web server<\/a><\/strong>, was developed following the <strong>best FAIR (Findable, Accessible, Interoperable, Reusable) practices<\/strong>, with the aim of <strong>fostering new discoveries in the field of human gastrointestinal tract metagenomics<\/strong>.<\/p>\n<p>Previously, LADE had already produced the <a href=\"https:\/\/dpcfam.areasciencepark.it\/\"><strong>DPCfam-UR50 database<\/strong><\/a>, accompanied by a <a href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1010610\">publication<\/a> in <em>PLOS &#8211; Computational Biology.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Data Engineering Laboratory (LADE) at Area Science Park has recently published an article in Nature &#8211; Scientific Data on protein sequence annotation. Thanks to technological advances in genomic sequencing, the number of known protein sequences has grown exponentially. Many of these sequences come from metagenomic projects that analyze environmental and clinical samples. 
Among the [&hellip;]<\/p>\n","protected":false},"author":8,"featured_media":11422,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[360],"tags":[],"class_list":["post-11509","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technological-infrastructures"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.9 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome - Area Science Park<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome - Area Science Park\" \/>\n<meta property=\"og:description\" content=\"The Data Engineering Laboratory (LADE) at Area Science Park has recently published an article in Nature &#8211; Scientific Data on protein sequence annotation. Thanks to technological advances in genomic sequencing, the number of known protein sequences has grown exponentially. Many of these sequences come from metagenomic projects that analyze environmental and clinical samples. 
Among the [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\" \/>\n<meta property=\"og:site_name\" content=\"Area Science Park\" \/>\n<meta property=\"article:published_time\" content=\"2024-10-28T09:31:32+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-11-25T14:36:33+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"silvia.reinotti@asp\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"silvia.reinotti@asp\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\"},\"author\":{\"name\":\"silvia.reinotti@asp\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/#\/schema\/person\/f25112f3fd3cba030c0cae6d516f3110\"},\"headline\":\"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome\",\"datePublished\":\"2024-10-28T09:31:32+00:00\",\"dateModified\":\"2024-11-25T14:36:33+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\"},\"wordCount\":251,\"image\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg\",\"articleSection\":[\"Technological Infrastructures\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\",\"url\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\",\"name\":\"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome - Area Science 
Park\",\"isPartOf\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg\",\"datePublished\":\"2024-10-28T09:31:32+00:00\",\"dateModified\":\"2024-11-25T14:36:33+00:00\",\"author\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/#\/schema\/person\/f25112f3fd3cba030c0cae6d516f3110\"},\"breadcrumb\":{\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage\",\"url\":\"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg\",\"contentUrl\":\"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.
areasciencepark.it\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/#website\",\"url\":\"https:\/\/www.areasciencepark.it\/en\/\",\"name\":\"Area Science Park\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.areasciencepark.it\/en\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.areasciencepark.it\/en\/#\/schema\/person\/f25112f3fd3cba030c0cae6d516f3110\",\"name\":\"silvia.reinotti@asp\",\"url\":\"https:\/\/www.areasciencepark.it\/en\/author\/silvia-reinottiasp\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome - Area Science Park","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/","og_locale":"en_US","og_type":"article","og_title":"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome - Area Science Park","og_description":"The Data Engineering Laboratory (LADE) at Area Science Park has recently published an article in Nature &#8211; Scientific Data on protein sequence annotation. Thanks to technological advances in genomic sequencing, the number of known protein sequences has grown exponentially. Many of these sequences come from metagenomic projects that analyze environmental and clinical samples. 
Among the [&hellip;]","og_url":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/","og_site_name":"Area Science Park","article_published_time":"2024-10-28T09:31:32+00:00","article_modified_time":"2024-11-25T14:36:33+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg","type":"image\/jpeg"}],"author":"silvia.reinotti@asp","twitter_card":"summary_large_image","twitter_misc":{"Written by":"silvia.reinotti@asp","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#article","isPartOf":{"@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/"},"author":{"name":"silvia.reinotti@asp","@id":"https:\/\/www.areasciencepark.it\/en\/#\/schema\/person\/f25112f3fd3cba030c0cae6d516f3110"},"headline":"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome","datePublished":"2024-10-28T09:31:32+00:00","dateModified":"2024-11-25T14:36:33+00:00","mainEntityOfPage":{"@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/"},"wordCount":251,"image":{"@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage"},"thumbnailUrl":"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg","articleSection":["Technological 
Infrastructures"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/","url":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/","name":"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome - Area Science Park","isPartOf":{"@id":"https:\/\/www.areasciencepark.it\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage"},"image":{"@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage"},"thumbnailUrl":"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg","datePublished":"2024-10-28T09:31:32+00:00","dateModified":"2024-11-25T14:36:33+00:00","author":{"@id":"https:\/\/www.areasciencepark.it\/en\/#\/schema\/person\/f25112f3fd3cba030c0cae6d516f3110"},"breadcrumb":{"@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#primaryimage","url":"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2024\/10\/thumbnail_shutterstock_1715975092.jpg","contentUrl":"https:\/\/www.areasciencepark.it\/wp-content\/uploads\/2
024\/10\/thumbnail_shutterstock_1715975092.jpg","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/www.areasciencepark.it\/en\/the-new-dpcfam-uhgp50-dataset-a-valuable-resource-for-research-into-the-human-gastrointestinal-proteome\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.areasciencepark.it\/en\/"},{"@type":"ListItem","position":2,"name":"DPCfam-UHGP50: a dataset for research on the gastrointestinal proteome"}]},{"@type":"WebSite","@id":"https:\/\/www.areasciencepark.it\/en\/#website","url":"https:\/\/www.areasciencepark.it\/en\/","name":"Area Science Park","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.areasciencepark.it\/en\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.areasciencepark.it\/en\/#\/schema\/person\/f25112f3fd3cba030c0cae6d516f3110","name":"silvia.reinotti@asp","url":"https:\/\/www.areasciencepark.it\/en\/author\/silvia-reinottiasp\/"}]}},"_links":{"self":[{"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/posts\/11509","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/users\/8"}],"replies":[{"embeddable":true,"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/comments?post=11509"}],"version-history":[{"count":0,"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/posts\/11509\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/media\/11422"}],"wp:attachment":[{"href":"https:\/\/www.areasciencepark.it\/en\/wp-j
son\/wp\/v2\/media?parent=11509"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/categories?post=11509"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.areasciencepark.it\/en\/wp-json\/wp\/v2\/tags?post=11509"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}