Tag: ms excel

Did you know … Excel can use maps to visualize data?

I remember visiting my uncle at a NASA design lab sometime in the mid-80’s – it was a huge cavernous room that he explained used to house the computer. A computer his graphing calculator could draw circles around. It was a powerful visual reminder how quickly computing technology advances – components are smaller, more powerful, and simpler to use.

More than two decades ago, I wrote a visualization application that presented a graphical representation of the geographic distribution of records. Which is a long way of saying it showed where something happened to a lot of people. The application was part of a cooperative effort between the FBI and local law enforcement – a data mining project meant to identify serial offenders across jurisdictional boundaries I wanted to be able to visualize where different types of crime were occurring and identify anomalies, so I built a program to do so. It took months to develop and took hours to crunch values and draw a map. The first time I used Excel to visualize frequency distribution on a map, I thought of that NASA computer room. What used to take a high-end Unix server with a RISC processor and tonnes (for the time) of memory – not to mention an entire summer of code development – is clickity-click and done on my little laptop. And the results are nicer:

How do you create this type of visualization? First you need data with something that is mappable – the example here is going to show the office locations listed in PeopleSoft. Click within the data set.

On the ribbon bar, select “Insert” then select “3D Map” in the “Tours” section.

If you have not used it before, you will be asked to enable data analysis service.

A new window will be displayed – select the column you want to map. Here, I am using zip codes, which is mapped to the “Postal Code” field in my spreadsheet. If your fields do not map automatically, you will need to click the drop-down next to a location data type and select the appropriate column.

There are different types of visualization – here, I have switched to a “heat map” where the color of the blob represents how many records fall into this zip code. It is a quick way of identifying clusters – hot spots.

You can control the look of the map as well – here, I have switched to a flat map and added location labels.

If you would like to include a copy of your map in another program – say, this Word document – select “Capture Screen” from the ribbon bar. You can also create a video to show an animated view of your map (zooming in on specific locations, rotating the globe to see people over in Mongolia)

After you’ve clicked “Screen Capture”, just paste and an image of your map will be inserted into your file – see!

Going A Little Farther:

Data isn’t perfect, and even when the data looks good it may not map properly. My sister used to live on a street in New Jersey that does not exist on a map. The post office affirmed it was the correct address, but UPS and FedEx claimed it didn’t exist. It was funny to me, but I wasn’t the one trekking two kids down to the neighbor on the main road who nicely accepted packages for her. She moved before they ever got the address situation sorted, but I’ve got first-hand experience with addresses that don’t map in some systems but are perfectly fine in others. Why do I mention this? The map visualization provides a “Mapping confidence” statistic – it is the percentage that appears above the box where you select the location data to be mapped. 98% is pretty good – there are a handful of records that don’t appear on the map … but the data I am presenting is a decent representation of our employee office locations. A low percentage would indicate that your map does not accurately convey your data.

What if my map confidence level is low? Click on the map confidence value to see what didn’t map. There are some marked with a result that is questionable – spot-checking them, 03109 is Manchester NH and 10001 is New York, NY. The one with no resolution, according to the US Postal Service lookup isn’t a valid postal code. If your data is wrong, fix it 😊 In cases where the data is right but the application isn’t confident about the location, you can add additional data to make the address more specific (here, I might increase the confidence by having the zip+4, or including the street address in my data set).

You can filter data in your map – first we’ll need some field on which to filter. Here, I’ve added the employee’s department to my data set.

On the right-hand pane, expand “Filters”. Click “Add filter”.

Select the column on which to filter data. A unique list of values will be presented – you can scroll through it or start typing the value to search. Once you find what you want to display, click the check-box before the value.

Now we are visualizing where people in my department work.

If your data is hard to see – records are distributed out fairly evenly across the map – you can increase the area of influence to make smaller clusters easier to identify. Scroll to the bottom of the right-hand pane and drag the “Radius of influence” slider to the right. If you have very clustered data, you can drag the slider to the left to turn a large red blob into a more nuanced visualization.

When you have finished visualizing your data, click “File” on the ribbon bar and select “Close”.

 

Did you know … you can name ranges in Excel to make formulae clearer?

Formulae in Excel aren’t always easy to decode – even a relatively simple formula, like the volume of a right rectangular pyramid below, can be a little cryptic with the A2 type cell identifiers.

You can name ranges and use range names to make a formula easier to understand. Highlight a data set – in this case, I am highlighting the “length” values – column A. On the “Formulas” ribbon bar, click on “Define Name” (you don’t need to hit the inverted caret on the right of the button – just click the ‘define name’ text).

Supply a name for the range – in this case, I am calling it “Base_Length” (range names need to start with a letter or underscore and cannot contain spaces). Click OK to save the range name. Repeat this operation with all of the other data groups – in my case, I named Column B “Base_Width” and Column C “Height”.

Use the name instead of the cell identifier – as you type your formula, the range names matching your typed text will appear.

It is now a lot clearer what this formula means – base length times base width time height all divided by three. Which is the formula to calculate the volume of a right rectangular pyramid.

The calculated answer is the same either way – but this makes it easier to figure out what exactly you were computing when you open the spreadsheet again in six months 😊 (Or share the spreadsheet with others).

Did you know … there are ways to split data in Excel?

Applications can generate data in formats that aren’t quite useful – glomming multiple fields together to make something unusable. And asking people to type information can yield inconsistent results – is my name Lisa Rushworth, Lisa J Rushworth, or just Lisa? Excel has several functions that allow you to produce consistent, usable data (without copy/pasting or deleting things!)

Flash Fill

Flash Fill will try to figure it out for you. Add an empty column (or more) and manually type one or two values. On the “Data” ribbon bar, select “Flash Fill” and Excel will use the data you’ve entered into the row to figure out what should go in the rest of the row.

The guesses aren’t 100% accurate – especially if your information is not consistent – but it’s a lot easier to delete the handful of things that are obviously not zip codes …

Than to work out a formula that extracts the same information

Text to columns

Text to columns uses the fixed-length file and delimited file import wizard on a column of data – essentially treating that column as a file to be imported. In this example, a DateTime value is provided in a way that Excel only sees it as a string. And, frankly, I am not interested on the exact hundredth of a second the event occurred. What I really want to do is group these creation dates by day, so all I need is the date component.

If you want to retain all the data, you’ll need to insert empty columns to the right – otherwise the data being split out can overwrite existing data. In my case, I only want to keep one of the new columns.

Highlight the column that holds your data. On the “Data” ribbon, select “Text to columns”

Select if the column should be split based on a fixed width definition or a delimiter and click ‘Next’

Indicate the proper delimiter – in this case, I need to use ‘Other’ and enter the letter T. A preview of the split data will appear below – make sure it looks reasonable. Click “Next”.

For each new row, you can specify a data type. Or leave the type set to “General” and Excel will try to figure it out.

If you do not need to retain the data, select “Do not import this column (skip)”. Click “Finish” to split your column.

Voilà – I’ve got a usable date value.

Notice, though, I have lost my original data. If you want to retain the original data, create a copy of the column. In this example, I want to know how many e-mail addresses use each domain, but I want to have the e-mail addresses in a recognizable and usable format too.

Text to columns will still replace the values from the selected column. But the copy will contain the original text.

You can even use Text to columns to sort out odd data that doesn’t actually get split into multiple columns. In this example, negative values have the minus sign after the number … which isn’t actually a negative number and isn’t usable in calculations.

Pick a delimiter that doesn’t appear in your data, and you’ll only have one column. When selecting the data format, click “Advanced”

Make sure the “Trailing minus for negative numbers” checkbox is checked and click OK.

And we’ve got negative numbers

Right, Left, Mid, and Search Functions:

You can also use the Search function in conjunction with Right, Left, and Mid to extract components of column data. In this example, we have first and last names. Since there are a few middle initials in there, we cannot just split on the space character.

These formulae aren’t perfect – Mary Ann will have ‘Mary’ as a first name – but

Working out where to start the text extraction and the number of characters to extract can get complex. I’ll usually include the Substitute function to simplify things a little – the zip code, in this case, is whatever is left over after we find the city and state.

Producing columns with the city, state, and zip code from the ‘Location’ column.

Did you know … you can use Microsoft Excel to count the number of records within a range?

I’ve been generating reports to track our Microsoft Teams adoption – how many people are using Teams, how many messages are being sent in Teams, how many Teams are there. Some of these metrics have easily visualized count-per-unit-time summaries available. Some, like the number of Teams, do not.

Team Created On
Directory Services 1/19/2017
App Proxy 1/19/2017
LDAP 1/19/2017
ADFS 1/19/2017
Nagios 1/19/2017
File Cluster 1/19/2017
Exchange Online 1/19/2017
Active Directory 1/19/2017
Commvault 1/19/2017

But it’s easy to turn a list of groups and creation dates into visualizable data. Paste the data into Excel. To find the number of items where “Created On” falls in a range, we need to be able to define that range. 01 January 2017 is easy enough, but how do you get the end of January? Excel has a function, EOMONTH, that returns the last day of a month.

Date is any date object. Offset is an integer number of months prior (negative numbers) or after (positive numbers) Date for which you want the last day of the month. I can list the dates to start and end quarters with =EOMonth(Date,2). With 01 January 2017 in cell D2, the last day of January is =EOMonth(D2,0)

 I don’t want to type01 Feb, Mar, April … flash fill and the fill handle need a few values before they can figure out the rest of a sequence. But I can use the last day of the month to get the first day of the next month – just add one! With 31 January 2017 in cell E2, I want =E2 + 1 in cell D3. (Yes, there are other ways to do this – probably dozens.)

Now that we’ve got a formula for the start and end of the month, just fill down to produce the ranges we need to see how many Teams were created each month. Then we just need a formula to do the counting for us. I use the COUNTIFS function.

=COUNTIFS($B$2:$B$1000,”>=”&D2,$B$2:$B$1000,”<=”&E2)

Counts the number of items in the range $B$2:$B$1000 (the cell range is static as the formula is copied elsewhere, hence the

Fill down – you’ll see the range remains static, and the comparison is to the D and E columns on the current row.

Voila – easily visualized data. And a graph 😊

Did you know … you can use mini-charts to visualize Excel data?

Using charts and images, data visualization, clearly and efficiently communicates data. But when you’re trying to visualize statistics for several items, your chart can be anything but clear and hardly efficient to read. In this example, I’ve created a line chart depicting the monthly score for eight different people. While you can pick out obvious high or low performance, there’s not a whole lot of information being communicated here.

Did you know Excel can create mini-charts, known as “sparklines” to visualize individual statistics and compare statistics across items? Select the data that you want to compare. From the Insert ribbon bar, look for the “Sparklines” section. I am going to use a “line” style sparkline.

The data range will be selected. Enter the range where you want the mini-charts to display – this can be the row under your data or the column next to your data, or it can be some completely different location.

By default, the y-axis range for each mini-chart depends on the values of the data contained in the chart. This makes comparing the charts a little difficult – the scale is different. In the example below, scores in the 30’s don’t look different than scores in the 80’s.

Click on one of the mini-charts, and a “Design” tab will appear on the ribbon bar. Select it. Under “Axis”, change the minimum and maximum values to “Same for All Sparklines”.

Now you can see how individual performance varied as well as compare individuals.

Blank values will show up as broken lines in the mini-charts. If you do not want to display a gap, return to the “Design” ribbon bar and select “Edit data”. Select “Hidden & Empty Cells”

Select what you want instead of gaps – you can treat null values as zero or have a line drawn between the values on either side of the missing value.