Author: Lisa

July 6, 2023

Tableau PostgreSQL Query: Stats About Data Source Types

I wanted to report on the different types of data sources used in our Tableau instance — as well as show how many of each type are in use.

-- Query to found how many of each data source type
select datasources.db_class, count(datasources.db_class) as count
from datasources
left outer join users on users.id = datasources.owner_id
left outer join system_users on users.system_user_id = system_users.id
left outer join sites on datasources.site_id = sites.id
left outer join projects on datasources.project_id = projects.id
left outer join workbooks on datasources.parent_workbook_id = workbooks.id
group by datasources.db_class 
order by count desc ;

Answer: About half of them are Oracle!

July 5, 2023

NEO4J: Using APOC To Load HTML Table Data

I’ve been playing around with loading neo4j data from random tables on the web using apoc.load.html from the extended APOC library. The first trick to it is knowing how to use jquery to find elements of a webpage — the table named “listtable” then the path down to the data elements (tbody tr td) and column numbers.

Once you have extracted the data, you can then manipulate it, map it into fields, create relationships, etc.

UNWIND is used as a “for each” loop that allows us to iterate through the result set.

MERGE creates or updates records (which, in this case, means I have a poor data model … someone could well have run in multiple elections and I am not really accommodating those cases well. Since I don’t actually want a database of presidential elections but was really just testing some new-to-me functionality … we’re going to ignore these logic problems)

SET adds (or updates) properties of the node.

CALL apoc.load.html("https://www.iweblists.com/us/government/PresidentialElectionResults.html",
{electionyear: "#listtable tbody tr td:eq(0)"
	, winner: "#listtable tbody tr td:eq(1)"
	, loser: "#listtable tbody tr td:eq(2)"
	, electoral_win: "#listtable tbody tr td:eq(3)"
	, electoral_lose: "#listtable tbody tr td:eq(4)"
	, popular_win: "#listtable tbody tr td:eq(5)"
	, popular_delta: "#listtable tbody tr td:eq(6)" }) yield value 

WITH value, size(value.electionyear) as rangeup

UNWIND range(0,rangeup) as i WITH value.electionyear[i].text as ElectionYear
	, value.winner[i].text as Winner
	, value.loser[i].text as Loser
	, value.electoral_win[i].text as EC_Winner
	, value.electoral_lose[i].text as EC_Loser
	, value.popular_win[i].text as Pop_Vote_Winner
	, value.popular_delta[i].text as Pop_Vote_Delta

MERGE (ew:Candidate {name: coalesce(Winner,"Unknown")}) 
MERGE (el:Record {name: coalesce(Loser,"Unknown")}) 
SET ew.EC_Votes = coalesce(EC_Winner,"Unknown") 
SET el.EC_Votes = coalesce(EC_Loser,"Unknown")
SET ew.Year = ElectionYear
SET el.Year = ElectionYear

WITH *, replace(Pop_Vote_Delta,",","") as Pop_Vote_Delta_Int, replace(Pop_Vote_Winner,",","") as Pop_Winner_Int

SET ew.Pop_Votes = Pop_Winner_Int
SET el.Pop_Votes = apoc.number.exact.sub(Pop_Winner_Int, Pop_Vote_Delta_Int)

MERGE (ew)-[:DEFEATED]->(el);

July 1, 2023

Corn Spacing

A note so we can (hopefully) avoid over-seeding corn in the future — we want to have a foot between the plants in each row and 30″ between rows. Scott’s seen between 20″ and 30″ row spacing — so we can go denser than this if we want to increase the number of plants.

June 30, 2023

Proto Hazelnuts

Hazelnut forming — June 30

June 28, 2023

NEO4J: WITH and Scope

I have encountered another manual reading failure error — if you actually read the Cypher documentation for WITH, it clearly states that entering a WITH block creates a new scope into which previous variables are not imported. Unless you specifically include them. You can individually include variables in this new scope (WITH oldvariable1, oldvariable2, newSomething as newvariable1) or just use * to include all previous variables.

Doing neither will produce an error that a variable that you are absolutely positive exists does not, in fact, exist.

June 24, 2023

Proto-Hazelnuts (A few days later)

The proto-hazelnuts are starting to look more like hazelnuts!

Anya noticed that there is a little disc forming if you look from the bottom.

June 23, 2023

Unable to use JStat with Cassandra

We have been having some problems with a Cassandra cluster, so I wanted to look at the java heap space. Unfortunately, jstat cannot find the pid. And, yes, it is the right PID!

Looking in /tmp/hsperfdata_cassandra/, there’s no file! Reading through the whole line where Cassandra is running, I noticed +PerfDisableSharedMem … that’d do it!

It looks like they intentionally set +PerfDisableSharedMem in the Cassandra startup script. I assume their rational is still reasonable … so wouldn’t remove the parameter for day-to-day operation. But, when there’s a problem … restarting Cassandra without this parameter allows us to check how garbage collection is going.

June 23, 2023

Java Heap Stats with JStat

While there are plenty of third-party utilities for looking at the java heap space, I just use jstat (in OpenJDK, this means installing java-<Version>-openjdk-devel

JStat will display the following columns:

--------------------------------------------------------------------------------
S0C: Survivor space 0 size in K
S1C: Survivor space 1 size in K

S0U: Survivor space 0 usage in K
S1U: Survivor space 1 usage in K

--------------------------------------------------------------------------------

EC: Eden space size in K
EU: Eden space usage in K

--------------------------------------------------------------------------------

OC: Old space size in K
OU: Old space usage in K

--------------------------------------------------------------------------------

MC: Meta space size in K
MU: Meta space usage in K

--------------------------------------------------------------------------------

CCSC: CodeCache size in K
CCSU: CodeCache usage in K

--------------------------------------------------------------------------------

YGC: Young generation garbage collection count
YGCT: Young generation garbage collection total time in seconds

FGC: Full garbage collection count
FGCT: Full garbage collection total time in seconds

CGC: Concurrent garbage collection count
CGCT: Concurrent garbage collection time in seconds
GCT: Total garbage collection time in seconds

--------------------------------------------------------------------------------

https://stackoverflow.com/questions/13660871/jvm-garbage-collection-in-young-generation/13661014#13661014 does a good job of explaining the nomenclature & how stuff gets moved around in the heap space

Sample output — this command is for java PID 19356 and will list 100 lines 2 seconds apart (2000 ms)

server01:bin # jstat -gc 19356 2000 100
S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
68096.0 68096.0 0.0 64207.5 545344.0 319007.2 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 386674.5 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 457055.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 485538.8 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815
68096.0 68096.0 0.0 64207.5 545344.0 505893.4 30775744.0 19221750.2 137452.0 124322.4 18860.0 15380.6 324697 14589.985 228 45.830 14635.815

And this is a time where a third-party tool would be helpful but I never really ‘get’ what is and what is not OK to install on servers, so try not to install things — because the *useful* bit of information for any of this is really the usage / size percent utilization value.

That last grouping of stuff — I look at those v/s how long the pid has been running. If you’ve gotten a billion GC’s and the PID has only been running for eight seconds, that is a crazy amount of I/O. If I’ve only had 3 GCs and the pid has been running for seven years, it hasn’t been doing anything. In between? I don’t really find the numbers useful unless I’ve got a baseline from normal operation.

June 22, 2023

Proto-Hazelnuts

We have observable little proto-hazelnuts forming!

June 21, 2023

NEO4J: Deleting All Nodes and Relationships

Admittedly, this is not likely to be something you do a lot in production … but while playing around in a sandbox, I have found it useful to have a quick command to “empty” my database

MATCH (n)
DETACH DELETE n