{"id":10293,"date":"2023-07-05T19:09:00","date_gmt":"2023-07-06T00:09:00","guid":{"rendered":"https:\/\/www.rushworth.us\/lisa\/?p=10293"},"modified":"2023-08-28T10:47:55","modified_gmt":"2023-08-28T15:47:55","slug":"neo4j-using-apoc-to-load-html-table-data","status":"publish","type":"post","link":"https:\/\/www.rushworth.us\/lisa\/?p=10293","title":{"rendered":"NEO4J: Using APOC To Load HTML Table Data"},"content":{"rendered":"\n<p>I&#8217;ve been playing around with loading neo4j data from random tables on the web using <a href=\"https:\/\/neo4j.com\/labs\/apoc\/4.1\/import\/html\/\">apoc.load.html<\/a> from the extended APOC library. The first trick to it is knowing how to use jquery to find elements of a webpage &#8212; the table named &#8220;listtable&#8221; then the path down to the data elements (tbody tr td) and column numbers.<\/p>\n\n\n\n<p>Once you have extracted the data, you can then manipulate it, map it into fields, create relationships, etc.<\/p>\n\n\n\n<p><a href=\"https:\/\/neo4j.com\/docs\/cypher-manual\/current\/clauses\/unwind\/\">UNWIND<\/a> is used as a &#8220;for each&#8221; loop that allows us to iterate through the result set. <\/p>\n\n\n\n<p><a href=\"https:\/\/neo4j.com\/docs\/cypher-manual\/current\/clauses\/merge\/\">MERGE<\/a> creates or updates records (which, in this case, means I have a poor data model &#8230; someone could well have run in multiple elections and I am not really accommodating those cases well. Since I don&#8217;t actually <em>want<\/em> a database of presidential elections but was really just testing some new-to-me functionality &#8230; we&#8217;re going to ignore these logic problems)<\/p>\n\n\n\n<p><a href=\"https:\/\/neo4j.com\/docs\/cypher-manual\/current\/clauses\/set\/\">SET<\/a> adds (or updates) properties of the node.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: plain; title: ; notranslate\" title=\"\">\nCALL apoc.load.html(&quot;https:\/\/www.iweblists.com\/us\/government\/PresidentialElectionResults.html&quot;,\n{electionyear: &quot;#listtable tbody tr td:eq(0)&quot;\n\t, winner: &quot;#listtable tbody tr td:eq(1)&quot;\n\t, loser: &quot;#listtable tbody tr td:eq(2)&quot;\n\t, electoral_win: &quot;#listtable tbody tr td:eq(3)&quot;\n\t, electoral_lose: &quot;#listtable tbody tr td:eq(4)&quot;\n\t, popular_win: &quot;#listtable tbody tr td:eq(5)&quot;\n\t, popular_delta: &quot;#listtable tbody tr td:eq(6)&quot; }) yield value \n\nWITH value, size(value.electionyear) as rangeup\n\nUNWIND range(0,rangeup) as i WITH value.electionyear&#x5B;i].text as ElectionYear\n\t, value.winner&#x5B;i].text as Winner\n\t, value.loser&#x5B;i].text as Loser\n\t, value.electoral_win&#x5B;i].text as EC_Winner\n\t, value.electoral_lose&#x5B;i].text as EC_Loser\n\t, value.popular_win&#x5B;i].text as Pop_Vote_Winner\n\t, value.popular_delta&#x5B;i].text as Pop_Vote_Delta\n\nMERGE (ew:Candidate {name: coalesce(Winner,&quot;Unknown&quot;)}) \nMERGE (el:Record {name: coalesce(Loser,&quot;Unknown&quot;)}) \nSET ew.EC_Votes = coalesce(EC_Winner,&quot;Unknown&quot;) \nSET el.EC_Votes = coalesce(EC_Loser,&quot;Unknown&quot;)\nSET ew.Year = ElectionYear\nSET el.Year = ElectionYear\n\nWITH *, replace(Pop_Vote_Delta,&quot;,&quot;,&quot;&quot;) as Pop_Vote_Delta_Int, replace(Pop_Vote_Winner,&quot;,&quot;,&quot;&quot;) as Pop_Winner_Int\n\nSET ew.Pop_Votes = Pop_Winner_Int\nSET el.Pop_Votes = apoc.number.exact.sub(Pop_Winner_Int, Pop_Vote_Delta_Int)\n\nMERGE (ew)-&#x5B;:DEFEATED]-&gt;(el);\n\n<\/pre><\/div>","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been playing around with loading neo4j data from random tables on the web using apoc.load.html from the extended APOC library. The first trick to it is knowing how to use jquery to find elements of a webpage &#8212; the table named &#8220;listtable&#8221; then the path down to the data elements (tbody tr td) and &hellip;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1896],"tags":[1904,1902,1897],"class_list":["post-10293","post","type-post","status-publish","format-standard","hentry","category-neo4j","tag-apoc","tag-cypher","tag-neo4j"],"_links":{"self":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10293","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=10293"}],"version-history":[{"count":1,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10293\/revisions"}],"predecessor-version":[{"id":10300,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=\/wp\/v2\/posts\/10293\/revisions\/10300"}],"wp:attachment":[{"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=10293"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=10293"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.rushworth.us\/lisa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=10293"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}