Foundations of Random Number Generation in JavaScript
Tim Brock, Thu, 14 Jul 2016

Being able to generate (apparently) random numbers is an essential requirement for many areas of math, statistics, the sciences, technology and gaming. For instance, they can be used to assign participants to a group in a randomized controlled trial, to determine the output of a computer game or to find approximate solutions to otherwise intractable problems. I frequently use random numbers to check my solutions to perfectly tractable statistical problems too. Cryptographic random number generators, which we won't cover here, can be used in security.

Types of Random Number Generator

For the purpose of this article at least, we can think of there being three categories of random-number generation: "true" random numbers; pseudorandom numbers; quasirandom sequences. True (or hardware) random number generators (TRNGs) use real-world physical processes believed to be random in nature to create streams of numbers. Decay events from a radioactive source, for example, are random and uncorrelated with each other; atmospheric noise can also be used. TRNGs often aren't practical or convenient (or necessary) for many purposes. A far more common alternative is to create a stream of numbers that appear to be randomly distributed over some interval using a computer algorithm. These algorithms are not truly unpredictable and so are referred to as pseudorandom number generators (PRNGs).
Finally, quasirandom sequences are a finite collection of numbers that are meant to be representative of a sample space in some way. For example, the mean of the sequence may be the same as (or very similar to) the known mean of the population. While quasirandom sequences are interesting and can be useful, they're not the focus of this article and won't be discussed further.

The Standard Uniform Distribution

Generally, the output of a large number of values from a pseudorandom number generator is meant to approximate the standard uniform distribution (SUD) that covers the range 0 to 1. There is, however, some variation in whether either/both of the end points are included. In the conceptual world of the mathematics of continuous distributions there is basically no difference between the uniform distribution on [0,1] (includes 0 and 1) and the uniform distribution on (0,1) (excludes 0 and 1). In the real world of floating point numbers, with only a finite number of possible values between 0 and 1, the difference is real and could potentially be problematic. One might, for example, want to use the generated number inside the natural logarithm function Math.log, and Math.log(0) is -Infinity. Generating a random number for another distribution is usually "just" a matter of using one or more numbers from an SUD PRNG to produce an appropriate value from the distribution of interest. Depending on the desired final distribution, this might involve one line of code or something quite complex. For the rest of this article I will stick to discussing the generation of numbers from a SUD PRNG.

Period

A useful PRNG must have a large period, which is to say that it must be able to output a large number of numbers without repeating itself. For example, the Wichmann–Hill PRNG of 1982 (more on this later) has a period of nearly 7 trillion numbers, while the exceedingly popular Mersenne Twister PRNG has a period of 2^19937 − 1 numbers. The former is considered quite short by modern standards.
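As a brief illustration of the simple end of that "one or more numbers from an SUD PRNG" spectrum, here's a sketch of inverse-transform sampling, which turns one standard-uniform number into one exponentially distributed number. Any SUD PRNG (Math.random, say) can play the role of rng here; the function name is mine, not part of any library.

```javascript
// Inverse-transform sampling: if u is uniform on (0,1), then
// -ln(u)/lambda follows an exponential distribution with rate lambda.
// "rng" is any function returning standard-uniform numbers, e.g. Math.random.
var exponential = function(rng, lambda){
  "use strict";
  return function(){
    var u = rng();
    // Guard against u === 0, for which Math.log would give -Infinity
    while(u === 0){ u = rng(); }
    return -Math.log(u)/lambda;
  };
};

// Example: exponential variates with rate 2, driven by Math.random
var expRng = exponential(Math.random, 2);
var sample = expRng(); // a positive, finite number
```

This is the whole trick for many simple distributions; others (the normal distribution, for instance) need more elaborate transformations of one or more uniform values.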
What's wrong with Math.random?

You may use JavaScript's built-in Math.random regularly and have no problems with it. It does, however, have one big limitation: its output is not reproducible. Run code that utilizes Math.random again and again and you'll get a different set of results each time. In many cases that really isn't a problem. In fact in many (most, probably) cases it'll be exactly what you want. Unpredictability (to a human staring at the output, at least) is exactly what is required. But sometimes we do want to see the same set of results. Moving away from JavaScript for a moment, consider running a simulation experiment for a piece of work you wish to publish. Perhaps you're using C++ or Java or R or... You want your results to be reproducible. These languages (and many others besides) offer a way of "seeding" the initial state of their PRNGs. You set the seed in some way or other and you get the same sequence of "random" numbers out. Math.random requires a seed too; there's just no way of setting it yourself. The specification for Math.random is also fairly open-ended, allowing browser vendors to use "an implementation-dependent algorithm" so long as the output is approximately uniform over the range [0,1). From a personal perspective, I'm a big fan of browser-based interactive data visualization for communicating both data and concepts. This could include simulation; there are limits to what is practical in a browser but web workers can help. Simulation frequently requires random numbers. If the random numbers are not reproducible then the conditions for a simulation can't be re-run. There are plenty of other use cases too, such as a repeatable animation, and JavaScript isn't just the programming language of the browser any more.

Problematic PRNGs

Up until now I've skipped over how uniform-distribution PRNGs work.
It's all been something of a black box: you call a function one or more times, possibly setting a seed, and then some pseudorandom numbers come out. The problem is that creating a good PRNG is difficult. There are thousands of papers on the topic and multiple methods. And multiple instances where people who know much more about this topic than I do appear to have got things wrong. For instance...

RANDU

RANDU is a "linear congruential generator" (LCG) developed by IBM in the 1950s. LCGs use a recurrence relation of the following form to generate new pseudorandom numbers:

x[n+1] = (a * x[n] + c) mod m

In the case of RANDU, c is 0, a is 65,539 and m is 2^31. Because c is 0, RANDU is a member of a subset of LCGs called "multiplicative congruential generators" (MCGs). To get a number in the range 0 to 1 (as is desired for a replacement for Math.random), one just needs to divide the result of the RANDU recurrence relation by m. A JavaScript implementation (which you definitely shouldn't use!) could look something like this:

```javascript
var randu = function(seed){
  "use strict";
  if(!isFinite(seed)){
    throw new Error("Seed not a finite number");
  }
  var x = Math.round(seed);
  var a = 65539;
  var m = Math.pow(2, 31);
  if(x<1 || x>=m){
    throw new Error("Seed out of bounds");
  }
  return function(){
    x = (a*x)%m;
    return x/m;
  };
};
```

The parity of the value generated by the recurrence relation in RANDU never changes. That is to say, an odd seed gives rise only to odd values of x while an even seed gives rise only to even values of x. This isn't exactly a desirable property, but there are bigger problems. The period of an LCG is at most m, but for RANDU it is much less than that and depends on the parity of the seed. For odd seeds it's over 536 million but for even seeds it can be as little as 16,384. There's another reason not to bother with an even seed: One common, simple method for assessing the randomness of a generator is to plot pairs of successive values as a scatterplot.
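Before turning to scatterplots, the parity claim above is easy to check empirically. The sketch below runs the raw RANDU recurrence (before the division by m; the products stay well below 2^53, so ordinary JavaScript numbers remain exact) and confirms that an odd seed only ever yields odd values:

```javascript
// Raw RANDU recurrence: x -> (65539 * x) mod 2^31, without dividing by m
var randuRaw = function(seed){
  "use strict";
  var x = seed;
  var a = 65539;
  var m = Math.pow(2, 31);
  return function(){
    x = (a*x)%m;
    return x;
  };
};

var gen = randuRaw(1); // an odd seed
var allOdd = true;
for(var i = 0; i < 10000; i++){
  if(gen()%2 === 0){ allOdd = false; }
}
// allOdd stays true: odd seeds produce only odd values.
// An even seed behaves analogously, producing only even values.
```

The parity is preserved because 65,539 is odd (odd times odd is odd) and m is a power of two, so subtracting multiples of m never changes the parity.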
A good PRNG should fill a 1-by-1 square fairly evenly. With 10,000 random numbers, 5,000 points, and a seed of 1 everything looks reasonable. (You can think of "x" as referring to the values in even-index positions in a (0-indexed) array of 10,000 random numbers. That is indices 0, 2, 4, 6... 9,996, 9,998. Points on the scatterplot are made by matching up with the next odd-index (1, 3, 5, 7... 9,997, 9,999) "y" value.) With some even seeds we see something quite different. Below is a scatterplot for the seed 32,768. Clearly we don't just have a deficit of points in the case of even seeds. In some cases, we have unambiguous relationships between neighboring values. It would be simple enough to adapt the randu function above to reject even seeds, giving an appropriate error message. Unfortunately, odd seeds show structure too. To see this we just need to extend the scatterplot idea to three dimensions by looking at triplets of successive random numbers. 3D scatterplots are frequently pretty useless. RANDU data provides something of an exception to this rule. (Here the indices for "x" values are 0, 3, 6, 9... 9,993, 9,996, the indices for "y" values are 1, 4, 7, 10... 9,994, 9,997 and the indices for "z" values are 2, 5, 8, 11... 9,995, 9,998.) Rather than fill the box roughly evenly, triplets of numbers all lie in one of 15 planes, regardless of the seed! (Actually, for the even seed 32,768 it's worse than this.) We can compare this with results from Math.random (I used Chrome for this); the difference is stark. One general problem with 3D scatterplots is that the visibility of structure in the plot can depend on the viewing angle. From certain angles the structure in the RANDU plot is hidden. This could also be the case for Math.random. To help with this issue I created an interactive version that lets you choose a PRNG (and, where appropriate, one or more seeds) and visualizes the results using WebGL.
This demo can be found here and the box of random numbers can be rotated (using the mouse) and viewed from different angles. I've yet to find any obvious signs of structure when using Math.random across a number of browsers (Chrome, Firefox, Maxthon, IE, Edge and Opera on the desktop and Safari on iOS). The appearance of lattice structures in 3 or more dimensions exists for all MCGs, but it is particularly bad for RANDU. It has been known about since the 1960s. Despite this, RANDU was still used in the 1970s and some simulation results from that era should, perhaps, be viewed with skepticism.

Excel

You may be wondering whether problems with PRNGs were confined to the 1960s and '70s. The short answer is "no". After criticism of its old random number generator, Microsoft changed to the Wichmann–Hill PRNG for Excel 2003. The WH generator (first published in 1982) combines three MCGs to overcome some of the shortcomings of single MCGs (such as a relatively short period and the lattice planes and hyperplanes seen when looking at groups of neighboring values). A quick JavaScript implementation of WH could look like the following:

```javascript
var wh = function(seeds){
  "use strict";
  if(seeds.length<3){
    throw new Error("Not enough seeds");
  }
  var xA = Math.round(seeds[0]);
  var aA = 171;
  var mA = 30269;
  var xB = Math.round(seeds[1]);
  var aB = 172;
  var mB = 30307;
  var xC = Math.round(seeds[2]);
  var aC = 170;
  var mC = 30323;
  if(!isFinite(xA) || !isFinite(xB) || !isFinite(xC)){
    throw new Error("Seed not a finite number");
  }
  if(Math.min(xA,xB,xC)<1 || xA>=mA || xB>=mB || xC>=mC){
    throw new Error("Seed out of bounds");
  }
  return function(){
    xA = (aA*xA)%mA;
    xB = (aB*xB)%mB;
    xC = (aC*xC)%mC;
    return (xA/mA + xB/mB + xC/mC) % 1;
  };
};
```

Again we can look for structure when plotting sets of neighboring values in two or three dimensions. While this clearly isn't sufficient to say whether or not we have a good random number generator, the plots above at least look reasonable.
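One property we can verify quickly is the one that motivated this whole discussion: reproducibility. Two generators built from the same three seeds must produce identical streams, and every output should lie in [0, 1). Here's a compact restatement of the generator (validation stripped for brevity; the name whMini is mine):

```javascript
// Minimal Wichmann-Hill (no input validation), for a reproducibility check
var whMini = function(s1, s2, s3){
  "use strict";
  var xA = s1, xB = s2, xC = s3;
  return function(){
    xA = (171*xA)%30269;
    xB = (172*xB)%30307;
    xC = (170*xC)%30323;
    return (xA/30269 + xB/30307 + xC/30323) % 1;
  };
};

var g1 = whMini(1, 2, 3);
var g2 = whMini(1, 2, 3);
// g1 and g2 agree call-for-call: same seeds, same "random" sequence
```

That call-for-call agreement is exactly what Math.random cannot give you.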
(You can also check the three-dimensional box in the WebGL demo mentioned above.) However, the original implementation of the WH algorithm for the RAND function in Excel occasionally spat out negative numbers! The WH algorithm fails a number of more modern and stringent tests of PRNGs and it seems Microsoft has moved on to using the popular (but rather more complex) Mersenne Twister algorithm.

V8

Earlier I used output from Chrome's implementation of Math.random to illustrate what the distribution of triplets of random numbers should look like when plotted as a three-dimensional scatterplot. However, the algorithm that was used by V8, Chrome's JavaScript engine, was recently shown to be flawed. Specifically, it reproduced the same "random" character strings with far higher frequency than it should have, as this long but informative article describes. V8 developers promptly changed the algorithm used and the issue seems to have been fixed in Chrome since version 49.

Speed

Another consideration before you try to "grow your own" JavaScript PRNG is speed. One should expect Math.random to be highly optimized. For example, some very rough tests using several browsers showed Math.random to be ~3 times quicker at producing 100,000 random numbers than the simple WH implementation above. Having said this, we're still talking of the order of just tens of milliseconds (at least in Chrome and Firefox on my laptop) so this may not be a bottleneck, even if you do require a large number of random numbers. Of course, more complex PRNGs with better statistical properties may be slower.

Conclusions

I've barely scratched the surface here. I've only looked at a couple of simple PRNGs and shown that they can still be problematic. There are far more complicated algorithms for PRNGs out there, some that have passed a large number of quite stringent statistical tests. And some of these already have open source JavaScript implementations.
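As a concrete taste of what a small, seedable drop-in can look like, here is mulberry32, a compact 32-bit generator that circulates in open source JavaScript. I make no claims here about its fitness for serious statistical work; it's shown only because it's short and, crucially, reproducible:

```javascript
// mulberry32: a tiny seeded PRNG. The same seed always yields the same stream.
var mulberry32 = function(seed){
  "use strict";
  var a = seed >>> 0; // coerce the seed to an unsigned 32-bit integer
  return function(){
    a = (a + 0x6D2B79F5) | 0;
    var t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
};

var r = mulberry32(42);
// r(), r(), ... is a reproducible sequence of numbers in [0, 1)
```

Whatever generator you pick, the structure is the same as randu and wh above: a seeded closure returning a zero-argument function, so it can stand in anywhere you would have called Math.random.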
If you don't need reproducibility, Math.random may be just fine for all your random number needs. Either way, if reliability of your random number generator is critical to how your site functions then you should perform relevant checks. In the case of Math.random this means checking in all target browsers since the JavaScript specification does not specify a particular algorithm that vendors must use. Try our JavaScript HTML5 controls for your web apps and take immediate advantage of their stunning data visualization capabilities. Download Free Trial today.

SVG versus Canvas
Tim Brock, Thu, 23 Jun 2016

Suppose you want to draw something on your web page using browser-native technologies. It might be some kind of animated scene, it might be a technical diagram, it could be some kind of custom infographic. What should you use? As the title of the article suggests, there are a couple of obvious answers: a canvas element or a scalable vector graphic (SVG). In some cases, either SVG or canvas might work reasonably well and you can choose whichever you prefer working with. In other cases, practical considerations will make it almost inevitable that you favour one over the other.

Vectors versus Pixels

As the name suggests, SVG images are a vector image format. As the name also suggests, they can generally be scaled indefinitely without becoming blurry or pixelated. By contrast, an HTML canvas element is composed of pixels. If you blow up or zoom in on an image drawn on a canvas element you'll be able to see pixelation. But things are not quite as simple as just saying "use SVG if you want a vector graphic and canvas if you want a raster graphic". Firstly, an SVG image can include one or more raster graphics, either through a data URI or by linking to a file.
An embedded image won't be magically vectorized, so if you zoom in your "vector graphic" will look pixelated. And while, ultimately, the appearance of a canvas element is just a collection of variously colored pixels, if you're drawing something from scratch programmatically you can adapt what's drawn to the screen size. While you can change the colors of individual canvas pixels should you want to, much of the time you'll be drawing paths.

Stateful versus Stateless

SVG is stateful. That is, once you've added elements (a line, a circle, a rectangle, etc.) you can go back and change them (move them around, change their size or color...). Once you've drawn something on a canvas, by contrast, the only record you get for free is the color (and transparency) of the resultant pixels. If you want to change how the canvas looks, you may have to redraw the whole thing from scratch! In many cases I think this fact is far more important when comparing SVG to canvas than whether or not what you've actually drawn is a vector graphic. The next two subsections will explain why.

Ease of Event Monitoring

With SVG it's really easy to associate graphical elements with events like click, mouseover and touch. Why? Because the elements you add to your SVG become part of the page. As a result, you can add event handlers in the same way you would to other DOM elements like divs, imgs, spans and sections. If you move an element around the event handler doesn't go away (unless you tell it to). With canvas the only element you have on the page is the canvas element itself. The rectangle you drew isn't an element. There's no record you even drew a rectangle once it's been drawn unless you kept it yourself. All you have is some (presumably) colored pixels. And the only events you can fire are on the canvas element as a whole. That's not to say that all is lost for canvas.
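There is a standard workaround: keep a list of everything you draw, then convert a click's canvas coordinates into a lookup over that list, topmost shape first. The sketch below uses an invented shape-record format (axis-aligned rectangles) purely for illustration:

```javascript
// Hypothetical shape records: {x, y, width, height, name}, in draw order.
// Returns the topmost shape containing the point (px, py), or null.
var hitTest = function(shapes, px, py){
  "use strict";
  // Walk backwards so the most recently drawn (topmost) shape wins
  for(var i = shapes.length - 1; i >= 0; i--){
    var s = shapes[i];
    if(px >= s.x && px <= s.x + s.width &&
       py >= s.y && py <= s.y + s.height){
      return s;
    }
  }
  return null;
};

var shapes = [
  {x: 10, y: 10, width: 100, height: 50, name: "rect1"},
  {x: 40, y: 30, width: 60, height: 60, name: "rect2"}
];
hitTest(shapes, 50, 40); // rect2: drawn later, so it sits on top
hitTest(shapes, 0, 0);   // null: nothing was drawn there
```

In a real page you'd call this from the canvas's click handler, after translating the event's client coordinates into canvas coordinates.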
You can keep your own records of where you placed graphic elements and use the coordinates associated with an event on the canvas to work out what a cursor or finger is over. It's just a lot more work if you don't have a good library to help you along.

Coping with a Large Number of Graphical Elements

The fact that an SVG image naturally creates a record in the DOM of all the elements drawn, their positions and other attributes certainly seems to make it favourable for any kind of complex, interactive graphic. In many cases this is true. But SVGs also have a big flaw: they create a record in the DOM of all the elements drawn, their positions and other attributes. If your SVG contains a lot of elements, then so does your DOM. And this can make things slow. Very slow. If you want to draw a couple of thousand elements in your SVG you might have problems. If you want to draw them and then move them around 60 times a second to create a smooth animation you're in big trouble. Canvas is almost certainly a better idea.

A Third Option

I'll finish by pointing out that SVG and canvas aren't the only options for drawing and animation with web-native technologies. It's amazing what can be done in a modern browser with only a chunk of CSS, a little JavaScript and some regular HTML div elements. Try our jQuery HTML5 controls for your web apps and take immediate advantage of their stunning data visualization capabilities. Download Free Trial today.

The Importance of Prose in Communicating Data
Tim Brock, Thu, 02 Jun 2016

If you're a data communicator, having a good understanding of chart and table design is important. Thankfully, the art and science of creating effective charts and tables is the subject of a great number of books.
(One of my favorites is Stephen Few's Show Me the Numbers.) This doesn't, however, mean that how we use ordinary prose - spoken or written - should be ignored. About a year ago I wrote an article here titled "7 Do's and Don'ts of Dataviz". The first of those seven things was "Don't use a chart when a sentence will do". Ryan Sleeper takes the opposing view in this article: "Even when the numbers are in units, you can likely tell that the first number is smaller than the second number, but it is challenging to consider the scale of the difference." I'm not convinced by this argument for written text if the numbers are formatted consistently, with commas (or points) as thousand separators. The varying number of digits and separators makes it fairly obvious when numbers differ by a couple of orders of magnitude. (Aside: this is also why you should right-align numbers in tables (or align on the decimal point where possible and applicable).) Better still, with written prose we can be explicit about large differences: "Value 1 (4,500,000) is 150 times larger than value 2 (30,000)." In a single sentence we have expressed a pair of values precisely and provided a simple comparison that is really easy to understand: the first entity is 150 times the size of the second. Even if you don't like the use of parentheses or you don't want to type out all the 0's in 4,500,000 there are plenty of other ways to create a sentence that clearly conveys the difference between two very different numbers. In most cases you'll probably be looking at real entities or concepts rather than completely abstract numbers so we'll imagine a scenario: "With a salary of $30,000, Bob would have to work for 150 years to earn the $4.5 million that Aaron earns in just one twelve-month period." While you can see there's a massive difference between the wages of Aaron and Bob in the bar chart below, it's pretty tough to get the factor of 150 from comparing the lengths or end-points of the bars alone.
You might think "Wow, Aaron earns a lot more than Bob" but you're unlikely to get the bit where it'd take Bob a century and a half to earn as much. Of course, neither the descriptive sentence nor the bar chart tells us anything about why Aaron earns so much or why Bob's salary is more modest. Neither really tells us why we should care either. Two numbers are different. So what? Well, there might be a good story in the difference. They could be twins or friends who made slightly different life choices with huge consequences. Or we could "just" be comparing a company CEO and someone further down the company's pyramid. As I keep saying, context is key when it comes to conveying data. Even if you really insist that your two data points require a chart, chances are that to convey the proper context and make a real impact you'll need to provide some descriptive information that doesn't have a natural chart form. So it makes sense to take some time to think about prose whether you listen to my earlier advice or not. I'm not saying we can't enhance charts by adding more visual cues that help with context, if we have relevant information available. The chart below is the same as above except with a reference line for the median salary for the made-up company I've decided they work for. Now we can see that Bob's salary is much closer to the median than Aaron's (as you might expect). Let's try to put the salient details from the chart above in a sentence or two: "Bob's salary of $30,000 is 80% of the company median of $37,500; he would have to work for 150 years to earn the $4.5 million that Aaron earns in just one twelve-month period." From this we learn precisely what Bob's salary is, how Bob's salary compares to the median, precisely what the median is, how Bob's salary compares to Aaron's salary and what Aaron's salary is. The only information we're not given directly is how Aaron's salary compares to the median.
But since we know Bob's salary is similar to the median but two orders of magnitude less than Aaron's, it should be fairly evident that Aaron's salary is two orders of magnitude greater than the median. Using prose to communicate data effectively isn't just about picking the right number formatting and sentence structure. You also need to use the right units for your audience: I could tell you that the Andromeda galaxy is about 780 kiloparsecs away, but unless you've taken an astronomy course you're unlikely to feel better informed. If I told you it was two and a half million light years from Earth you might at least be inclined to think "oh, that's a really long way". More people will understand that light travels a long way in a year than will know that a parsec is the distance at which the mean radius of the Earth's orbit subtends an angle of one arcsecond and that a kiloparsec is a thousand times that distance. Don't be afraid to use unusual units of measurement, perhaps alongside conventional ones, to provide context when you expect your audience will lack domain expertise. Take the time to think about how best to express your key values, which set of units to express them in and how to make comparisons easier. Turn to charts when you have something important to convey that can't be said in a few words and when you want to highlight patterns, trends and differences rather than provide precise values (where a table should be used instead). Whichever option you go for, remember to provide context. Try one of our most wanted features - the new XAML 3D Surface Chart - and deliver fast, visually appealing and customizable 3D surface visualizations!
Download Infragistics WPF 16.1 toolset from here.

New Solutions to Old JavaScript Problems: 2) Default Values
Tim Brock, Mon, 09 May 2016

Introduction

This is the second in a series on how new JavaScript features, introduced in the ECMAScript 2015 standard (aka ES6), allow for simpler solutions to some old JavaScript problems. In part 1 I covered block scope and the new let and const keywords. Here I will look at default arguments for functions.

Default Arguments the Old Way

Here's a very simple JavaScript function for logging a greeting message to the console:

```javascript
let greet = function(name){
  console.log("Hi, I'm " + name);
}
```

And here are two examples of calling that function:

```javascript
greet("Elvis Presley"); //Hi, I'm Elvis Presley
greet(); //Hi, I'm undefined
```

Since the second call to greet doesn't pass in any arguments, the call to console.log just outputs name as "undefined". We can add a more appropriate fallback by using the logical OR operator, ||, inside the function and supplying a suitable moniker:

```javascript
let greet = function(name){
  name = name || "John Doe";
  console.log("Hi, I'm " + name);
}
greet("Elvis Presley"); //Hi, I'm Elvis Presley
greet(); //Hi, I'm John Doe
```

Sometimes the use of || doesn't work as intended. Here's the skeleton of a function for plotting the mathematical function y = x^2 + 1 using IgniteUI. (I've deliberately left out important aspects like chart labels to keep the code to a minimum.)
```javascript
let makeChart = function($selector, min, max){
  "use strict";
  min = min || -5;
  max = max || 5;
  let stepSize = (max-min)/10000;
  let data = [];
  for(let i=min; i<=max; i=i+stepSize){
    data.push({x:i, y:Math.pow(i,2)+1});
  }
  $selector.igDataChart({
    dataSource: data,
    width: "300px",
    height: "200px",
    axes: [
      {
        name: "xAxis",
        type: "numericX",
        minimumValue: min,
        maximumValue: max,
      },
      {
        name: "yAxis",
        type: "numericY",
        minimumValue: 0
      }
    ],
    series: [
      {
        name: "series1",
        type: "scatterLine",
        xAxis: "xAxis",
        yAxis: "yAxis",
        xMemberPath: "x",
        yMemberPath: "y",
      },
    ],
  });
}
```

Assuming three floating (left) div elements — with ids of "chart1", "chart2" and "chart3" — we might draw three copies of the function like this (here $ is the jQuery object):

```javascript
makeChart($("#chart1"));
makeChart($("#chart2"), -3, 3);
makeChart($("#chart3"), 0);
```

The final results may come as a surprise: rather than the x axis starting at 0, the third chart is identical to the first. This happens because 0, the value passed in for the minimum value of the x axis (min) in the third call to makeChart, is falsy. That means the line min = min || -5; changes min to -5. We can get around this easily, but with a few more keystrokes, by explicitly comparing min (and max) to undefined, rather than checking for truthiness:

```javascript
min = min!==undefined ? min : -5;
max = max!==undefined ? max : 5;
```

With this in place, the third chart does now have an x axis that starts at 0.

More Intuitive Default Arguments

ES6 makes setting defaults easier and the new syntax may seem familiar if you've ever programmed in C++ or R. Simply add an equals sign and the default value for each function argument that needs one. The makeChart example above then becomes:

```javascript
let makeChart = function($selector, min=-5, max=5){
  "use strict";
  let stepSize = (max-min)/10000;
  let data = [];
  /*rest of the function*/
}
```

In an ES6 compliant browser, this is equivalent to the example above, where we compared min and max to undefined, but with less typing.
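The behavioral difference between the || fallback and a true ES6 default parameter is easy to demonstrate in isolation:

```javascript
// Old pattern: || replaces ANY falsy argument, including a perfectly valid 0
var minWithOr = function(min){
  min = min || -5;
  return min;
};

// ES6 pattern: the default only kicks in when the argument is undefined
var minWithDefault = function(min = -5){
  return min;
};

minWithOr(0);      // -5: the legitimate 0 was silently replaced
minWithDefault(0); // 0: falsy but defined, so the default is ignored
minWithDefault();  // -5: no argument passed, so the default applies
```

The same distinction applies to the other falsy values ("", NaN, null, false): || swallows them all, while a default parameter only reacts to undefined.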
I think it also makes the code more transparent and a casual user no longer has to inspect the function body in order to find the default values. The greet function from earlier can also be simplified:

```javascript
let greet = function(name="John Doe"){
  console.log("Hi, I'm " + name);
}
greet("Elvis Presley"); //Hi, I'm Elvis Presley
greet(); //Hi, I'm John Doe
```

More Complex Examples

ES6 default values don't have to be simple strings or numbers. You can use any JavaScript expression and later arguments can refer to earlier ones. One common way to introduce object-oriented programming to new coders is through the construction of various animal classes and objects. JavaScript isn't really a class-based language but objects are still important. Object factories - functions that return objects - are an easy way to make a large number of objects quickly. They're also a place where default arguments can be very useful. Let's create some talking dog (!) objects. The following is all perfectly valid ES6 code that utilizes default values (the defaults for name are, apparently, the most popular male and female dog names as listed here at the time of writing):

```javascript
let createDog = function(gender="male",
                         name=(gender==="male")?("Bailey"):("Bella"),
                         hobby="barking"){
  return {
    name: name,
    gender: gender,
    hobby: hobby,
    sayName: function(){console.log("Hi, I'm " + this.name); return this;},
    sayGender: function(){console.log("I am a " + this.gender + " dog"); return this;},
    sayHobby: function(){console.log("I enjoy " + this.hobby); return this;},
  };
};
```

Here's an example of use with the default values:

```javascript
createDog().sayName().sayGender().sayHobby();
```

This prints out...

Hi, I'm Bailey
I am a male dog
I enjoy barking

We could specify all arguments, for example...

```javascript
createDog("female","Barbara","fetching").sayName().sayGender().sayHobby();
```

prints out...

Hi, I'm Barbara
I am a female dog
I enjoy fetching

We can also accept default values for one or more parameters by passing in undefined. For example...
```javascript
createDog("female",undefined,"fetching").sayName().sayGender().sayHobby();
```

leads to the output

Hi, I'm Bella
I am a female dog
I enjoy fetching

Note that this does mean you can't pass in undefined as a placeholder when you don't know the true value. You probably want to use null instead. The expression for the default name isn't particularly robust. For example, the following is probably not what was intended:

```javascript
createDog("Male").sayName().sayGender().sayHobby();
```

Hi, I'm Bella
I am a Male dog
I enjoy barking

In this case gender was set to "Male", which doesn't match "male", thus the default name expression results in a "Male" dog called Bella. We could just change the first argument to isMale (or isFemale of course) and assume a Boolean input. And that might be the sensible option here. But it doesn't lend itself so well to highlighting the fact that we can use even more complex expressions, so we won't. Instead we'll use the following modified function:

```javascript
createDog = function(gender="male",
                     name=(gender.charAt(0).toUpperCase()==="M")?("Bailey"):("Bella"),
                     hobby="fetching"){
  return {
    /*object unchanged*/
  };
}
```

Now any gender string beginning with "m" or "M" is taken to be male for the purposes of default-name assignment. That could mean "male", "Male", "man", "m", "M", or "Marmalade". Any other string — "female", "Female", "JavaScript" etc — will set name to "Bella" if you don't provide an alternative. This version will also throw an error (a good thing, generally) if you pass in something really silly for gender, like {} or /male/. It's important to restate that ES6 default values are assigned from the left. It may be that you'd prefer name to be the first argument of the createDog function. The following won't work when called without a name parameter because the expression for the default tries to use gender before it is defined.
let createDog = function(name=(gender.charAt(0).toUpperCase()==="M")?("Bailey"):("Bella"), gender="male", hobby="reading"){ return { /*object unchanged*/ }; } If you come to JavaScript from a language like Python or R then you may be used to using named parameters alongside default arguments in your function calls. This is still not possible in JavaScript. The closest you can get is to use an options object in place of multiple arguments. Unfortunately, combining an options object with ES6-style default parameters is not particularly easy (see here for details). Now with full support for Angular 2 Beta and Bootstrap 4, see what's new in Ignite UI 16.1. Download free trial today. When it Comes to Dataviz, Color is Complicated: Part 2 | Tim Brock | Mon, 02 May 2016 | https://www.infragistics.com/community/blogs/b/tim_brock/posts/when-it-comes-to-dataviz-color-is-complicated-part-2 This is Part 2 (in a series of 2) on why color is a complex and confusing topic. In Part 1 I looked at cases where colors might not be interpreted as expected. Here I'll cover the difficulties of picking a suitable palette. Be Subtle Even if you avoid color contrast illusions and palettes that are difficult for those with CVD to interpret, it's still easy to make something that looks bad. Strong, saturated, vibrant colors stand out... so long as they're used sparingly. If everything is strong, saturated and vibrant you'll get something unpleasant like the chart below. In general, use muted colors for anything that will take up a large area (like bars in bar charts). Use stronger colors for smaller items (such as points) and to highlight.
Using a color that's simply different to the norm, rather than significantly more vivid, can also be effective for highlighting something significant: Be Consistent If you use color to distinguish two or more categories in one chart it makes sense to repeat the color scheme when the same categories appear in another chart in the same document. If you keep swapping and switching, your audience might get confused and draw the wrong conclusions from your presentations of data. That's worse than not showing the data at all. Sometimes this advice may come into conflict with the advice above regarding object size and color vibrancy: if one chart shows bars and another points then something has to give. The best option may be a compromise set of colors that are slightly more vivid than you'd like for the bars and slightly less vivid than ideal for the points. Another option is to use the same hues (e.g. "red", "blue", "yellow") but vary the lightness depending on the chart type. (It's also plausible that a dot plot might be as good as or better than a bar chart anyway.) As discussed here, you should also try to follow common conventions where applicable. This, of course, may be at odds with my advice about traffic light colors in Part 1. I did say color was complicated! Get Some Assistance Unless we want to draw particular attention to one category, the colors we select for our categories should be of similar vividness. That is, one should not stand out more than the others. This is a tricky task. You can't, for example, just compare the sums of the red, green and blue values a color picker tool will give you. Human perception of color simply doesn't work that way. Creating color scales that encode numerical values is just as, if not more, difficult. You may be coming to the conclusion that creating a good color palette for visualizations can be hard. The easy way out is not to bother. That doesn't mean you have to accept your software's defaults though.
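To see why summing red, green and blue values fails as a measure of lightness, consider relative luminance, a standard perceptual weighting. A minimal sketch using the WCAG 2 formula (the helper name is mine, not from the article):

```javascript
// Relative luminance from 8-bit sRGB values (WCAG 2 definition).
// The weights make the point: green dominates perceived lightness,
// blue contributes very little, so a plain R+G+B sum is misleading.
function relativeLuminance(r, g, b) {
  function linearize(channel) {
    var c = channel / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  }
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

console.log(relativeLuminance(0, 255, 0)); // pure green: ~0.715
console.log(relativeLuminance(0, 0, 255)); // pure blue:  ~0.072
```

Pure green and pure blue have identical R+G+B sums yet differ in luminance by an order of magnitude.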
One of my favorite resources is ColorBrewer . It offers an interactive palette selector that was designed with maps in mind. There is, however, no inherent reason not to use it for other visualizations. You can pick from a range of "sequential", "diverging" and "qualitative" palettes. The qualitative palettes are best for encoding categorical information. Sequential and diverging palettes can be used for encoding values; the latter should be used when you wish to highlight how the high and low values differ from some middle value (perhaps the mean or median or simply a 0 point when both positive and negative values are possible). There's an option to export a chosen palette as a JavaScript array that I find particularly helpful. Printing is Problematic ColorBrewer lets you restrict palettes to only those that are CVD friendly, those that remain distinguishable when photocopied in black and white, and/or those that work well when color printed. This latter option illustrates another issue when it comes to color: The range of colors that a typical monitor can display (its "gamut") is less than a human can see but greater than can be printed on a basic CMYK printer. What you see on your laptop screen is generally not what you get on paper. But Wait! There's Much More My goal here wasn't to make you scared of using color, but to point out some of the dangers in order that you may be able to avoid them. I've skimmed over most of the underlying science, partly because it's not exactly trivial and partly because it's not really my area of expertise. Everything I have covered barely scratches the surface. I don't have space to tell you about the problems with rainbow color palettes or why brown is a bit weird or anything about opponent process theory or perceptual color models or to explain the difference between luminance, brightness, and lightness (these confuse me all the time). 
All these things and a lot more are covered in chapters 3 and 4 of Colin Ware's book Information Visualization (mentioned in Part 1). It does get quite technical at times but I highly recommend it for anyone who wants to know about the science of color and of information visualization. Try one of our most wanted features - the new XAML 3D Surface Chart and deliver fast, visually appealing and customizable 3D surface visualizations! Download Infragistics WPF 16.1 toolset from here. When it Comes to Dataviz, Color is Complicated: Part 1 | Tim Brock | Wed, 27 Apr 2016 | https://www.infragistics.com/community/blogs/b/tim_brock/posts/when-it-comes-to-dataviz-color-is-complicated-part-1 In all the articles I've written here I've covered a fairly broad range of topics related to data visualization: the use of tick marks and labels, data density, the problems with dual-axis charts and much more. I've touched upon the use of color a few times but only in passing. That's because I think, while interesting, the topic can be quite confusing and that makes writing short articles difficult. In this two-part series I'll try to bring together previous advice on the use of color, cover why I think it's a complex topic, define some relevant jargon, and provide links to a few resources that I have found useful. In this part I cover why what we see might not be what we expect to see. In Part 2 I'll look at picking suitable color palettes. What Color is That? One morning in February 2015 I awoke and checked what was going on in the world via Twitter. Everyone was talking about a white and gold dress. Or rather they were talking about what looked to me like a white and gold dress. Many felt the same as me but some strange people were claiming it was blue and black. It turns out those strange people were actually right.
The viral phenomenon that was "the dress" showcased the peculiarities of our vision system. Colin Ware describes, on page 69 of Information Visualization (third edition), how "[n]eurons processing visual information in the early stages of the retina and primary visual cortex do not behave like light meters; they act as change meters". One benefit of the complex way our visual system works is that we can usually detect a gray surface as being gray, a white surface as being white and a black surface as being black whether we're in bright sunlight or a dimly lit room and independently of the color of the illuminant. This is called "color constancy" and it shouldn't be too difficult to imagine how this could have been an evolutionary advantage in the past. Information about the light source itself is usually much less important. To achieve color constancy the brain has to make some educated guesses about the illuminant. Sometimes it gets things wrong. This would appear to be at least part of the reason for the disagreement over the dress. If you're creating a visual representation of some data it's rare you'll ever have to worry too much about the perceived colors of a dress. But it does still highlight the fact that sometimes we misinterpret color stimuli. Take the simple image below: If you've never seen this illusion before you may be surprised to learn that the small squares are the same color. You can check this using the eyedropper or color-picker tool of your favorite image editing program. If you're on a Mac it's quicker to use OSX's DigitalColor Meter app. This color contrast illusion can be significant for data visualization: if you're using the same color encoding on two different backgrounds you need to check whether they really look the same. Remember the blocks of color in your key or legend too. 
If a chart background is, say, light gray then the background in the key should also be light gray and not white or black (we're not talking about natural illuminants here so don't expect your brain to fix it for you). Not Everyone Has Perfect Color Vision The color-sensitive cells of the retina are called cones and we (most of us) have three types - millions of each - making us "trichromats". The types are frequently referred to as red, green and blue, though it's more proper to use long (L), medium (M) and short (S), describing the wavelengths of peak sensitivity. Even this is very much a relative designation: L cones are most sensitive to light at around 580 nanometers, M cones to light at around 540 nm and S cones to light at around 450 nm (Ware, page 97). (There's no relation here with the designations for radio waves!) Color blindness, or color vision deficiency (CVD), in humans is the result of a lack of, or deficiency in, one type of cone cell. It can be acquired or inherited, and the latter is fairly common in men (around one in 12 are affected). If it is L or M cones that are lacking, the resulting condition is frequently described as red-green color blindness, while blue-yellow color blindness results from defective S cones. In reality the effect is more nuanced than these common names would suggest, and a number of tools have been developed to help trichromats without a CVD ensure their work is accessible to those who have one. My favorite is ColorOracle. It's a really simple app for Windows, Mac and (some) Linux OSes that sits in the notification area (system tray) or menubar. You click on its icon, select a form of color deficiency and it instantly (temporarily!) changes the colors on the screen to simulate the deficiency. Deuteranopia, the formal name for a problem with M cones, is the most common form of CVD. As I've previously mentioned, it's a good reason to avoid using only red and green color encoding in your visualizations.
If you do want to use a "traffic light" color scheme then one option is to use a secondary encoding to reinforce the differences, for example a red circle and a green triangle (perhaps with an amber square). New Solutions to Old JavaScript Problems: 1) Variable Scope | Tim Brock | Tue, 22 Mar 2016 | https://www.infragistics.com/community/blogs/b/tim_brock/posts/new-solutions-to-old-javascript-problems-1-variable-scope Introduction I love JavaScript but I'm also well aware that, as a programming language, it's far from perfect. Two excellent books, Douglas Crockford's JavaScript: The Good Parts and David Herman's Effective JavaScript, have helped me a lot with understanding and finding workarounds for some of the weirdest behavior. But Crockford's book is now over seven years old, a very long time in the world of web development. ECMAScript 6 (aka ES6, ECMAScript 2015 and several other things), the latest JavaScript standard, offers some new features that allow for simpler solutions to these old problems. I intend to illustrate a few of these features, examples of the problems they solve, and their limitations in this and subsequent articles. Almost all the new solutions covered in this series also involve new JavaScript syntax, not just additional methods that can be polyfilled. Because of this, if you want to use them today and have your JavaScript code work across a wide range of browsers then your only real option is to use a transpiler like Babel or Traceur to convert your ES6 to valid ES5.
If you're already using a task runner like Gulp or Grunt this may not be too big a deal, but if you only want to write a short script to perform a simple task it may be easier to use old syntax and solutions. Browsers are evolving fast and you can check out which browsers support which new features here. If you're just interested in playing around with new features and experimenting to see how they work, all the code used in this series will work in Chrome Canary. In this first article I am going to look at variable scope and the new let and const keywords. The Problem with var Probably one of the most common sources of bugs in JavaScript (at least it's one I trip over regularly) is the fact that variables declared using the var keyword have global scope or function scope and not block scope. To illustrate why this may be a problem, let's first look at a very basic C++ program: #include <iostream> #include <string> using std::cout; using std::endl; using std::string; int main(){ string myVariable = "global"; cout << "1) myVariable is " << myVariable << endl; { string myVariable = "local"; cout << "2) myVariable is " << myVariable << endl; } cout << "3) myVariable is " << myVariable << endl; return 0; } This prints out 1) myVariable is global 2) myVariable is local 3) myVariable is global The myVariable declared inside the braces is a separate, block-scoped variable that temporarily shadows the outer one; when the block ends the outer variable is visible again. JavaScript's var doesn't work like this: a var declared anywhere inside a function is in scope throughout that function. This is a particularly common source of bugs when loops and callbacks are combined. Suppose we have a form containing five buttons: <form id="my-form"> <button>Button 1</button> <button>Button 2</button> <button>Button 3</button> <button>Button 4</button> <button>Button 5</button> </form> You might think the following code would make it so that clicking any of the buttons would bring up an annoying alert dialog box telling you which button number you pressed: var buttons = document.querySelectorAll("#my-form button"); for(var i=0, n=buttons.length; i<n; i++){ buttons[i].addEventListener("click", function(evt){ alert("Hi! I'm button " + (i+1)); }, false); } In fact, clicking any of the five buttons will bring up an annoying alert dialog box telling you that the button claims to be the mythical button 6. The issue is that the scope of i is not limited to the (for) block, and each callback sees the same i with the value it had when the for loop terminated.
One solution to this problem is to use an immediately invoked function expression (IIFE) to create a closure , in which the current loop index value is stored, for each iteration of the loop: for(var i=0, n=buttons.length; i<n; i++){ (function(index){ buttons[index].addEventListener("click", function(evt){ alert("Hi! I'm button " + (index+1)); }, false); })(i); } let and const ES6 offers a much more elegant solution to the for -loop problem above. Simply swap var for the new let keyword. for(let i=0, n=buttons.length; i<n; i++){ buttons[i].addEventListener("click", function(evt){ alert("Hi! I'm button " + (i+1)); }, false); } Variables declared using the let keyword are block-scoped and behave much more like variables in languages like C, C++ and Java. Outside of the for loop i doesn't exist, while inside each iteration of the loop there is a fresh binding: the value of i inside each function instance reflects the value from the iteration of the loop in which it was declared, regardless of when it is actually called. Using let works with the original problem too. The code let myVariable = "global"; console.log("1) myVariable is " + myVariable); { let myVariable = "local"; console.log("2) myVariable is " + myVariable); } console.log("3) myVariable is " + myVariable); does indeed give the output 1) myVariable is global 2) myVariable is local 3) myVariable is global Alongside let , ES6 also introduces const . Like let , const has block scope but the declaration leads to the creation of a "read-only reference to a value" . You can't change the value from 7 to 8 or from "Hello" to "Goodbye" or from a Boolean to an array. Consequently, the following throws a TypeError : for(const i=0, n=buttons.length; i<n; i++){ buttons[i].addEventListener("click", function(evt){ alert("Hi! I'm button " + (i+1)); }, false); } It's important (and perhaps confusing) to note that declaring an object with the const keyword does not make it immutable . 
You can still change the data stored in an object or array declared with const, you just can't reassign the identifier to some other entity. If you want an object or array that is immutable you need to use Object.freeze (introduced in ES5). Stacked Area Charts and Mathematical Approximations | Tim Brock | Thu, 25 Feb 2016 | https://www.infragistics.com/community/blogs/b/tim_brock/posts/stacked-area-charts-and-mathematical-approximations I've previously noted that I think stacked area charts are frequently used when a conventional line chart would be a better option. Here is the (fictional) example I used previously and the conventional line chart alternative. In short, if you want people to be able to make reasonably accurate judgments of the magnitudes of the individual components, and how they change depending on some other variable (such as time), the conventional line chart design is almost always going to be the best option. The lack of a steady baseline for all but the bottom component makes this task difficult for the stacked area chart. Stacked area charts can be useful if you want to illustrate an ordered sum of components that change with another variable. While previously I suggested that the changing cost of milk production from farm to shop might be a suitable subject, here I'd like to consider something very different: selected mathematical series. You're probably familiar with trigonometric functions like sine and cosine and you may also know about the exponential function and hyperbolic functions. It's fairly easy to draw graphs of these functions if you have a calculator of some sort. When tied up in complicated equations, these functions may become awkward to deal with. Consequently, alternative ways of approximating these functions can come in very handy.
The functions mentioned above are all analytic functions. What this means precisely is quite complicated to explain, so I won't attempt to do so here. Instead I'll just stick to the following: these functions can all be written as a sum of powers of their argument (typically denoted x), that is, as a polynomial. Being explicit helps, so here is a way of rewriting the exponential function: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ... In a similar manner, here is another way of expressing the cosine function: cos(x) = 1 − x^2/2! + x^4/4! − x^6/6! + ... And here is the hyperbolic cosine function (typically written as cosh): cosh(x) = 1 + x^2/2! + x^4/4! + x^6/6! + ... In general, to get an exact value for one of these functions using summation we need to sum to infinity. This is not the case at the origin, where all but the first term equal 0. Close to the origin we will also get a good approximation as x is small. But how close and how good? We can plot the first few terms of, for example, the exponential function expression and see. The black line in the GIF below shows the exact exponential function; the blue wedges show the result of adding more and more terms from the right-hand side of the equation (from the zeroth power of x up to the 8th) for the exponential function above. The translucent red wedge indicates the area not covered by the polynomial approximation. Below about x=1 we can see that the first three terms of the polynomial are a pretty good approximation for the exponential function. To get a good approximation around x=3 we need to go up to the sixth or seventh power of x (i.e. seven or eight terms of the polynomial). As the GIF below shows, even going to the eighth power of x isn't sufficient around x=6. We can look at the hyperbolic cosine function in a similar way, though there are no terms with odd powers of x. As you might expect, when we look at large distances from the origin, we need more and more terms of the polynomial in order to closely match the exact function.
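The partial sums plotted in the GIFs can be computed directly. A minimal sketch in JavaScript (the function name is mine), summing the exponential series up to a chosen power of x:

```javascript
// Sum the series e^x = 1 + x + x^2/2! + x^3/3! + ... up to x^maxPower.
function expPartialSum(x, maxPower) {
  var sum = 0;
  var term = 1;            // the n = 0 term: x^0/0!
  for (var n = 0; n <= maxPower; n++) {
    sum += term;
    term *= x / (n + 1);   // turn x^n/n! into x^(n+1)/(n+1)!
  }
  return sum;
}

console.log(expPartialSum(1, 2));  // 2.5, versus Math.exp(1) ≈ 2.718
console.log(expPartialSum(6, 8));  // ~341.8, well short of Math.exp(6) ≈ 403.4
```

Each frame of the stacked-chart animation corresponds to increasing maxPower by one.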
At x=±6, adding up terms up to the 8th power of x is not sufficient to get a good approximation. I think these are cases where stacked area charts can be of real use. We're genuinely interested in the progressive sums of components, not the individual parts, and that's where stacked charts excel. You probably noticed that I skipped over producing charts for the cosine function. That's because stacked charts fail there. Why? Because successive terms have opposite signs. While including more and more terms in the polynomial approximation does get you closer and closer to the exact function, you can't show this as a simple stack because some terms add to the total while others subtract. This is also a problem for the exponential function when x is negative: terms involving even powers of x will be positive while those involving odd powers of x will be negative. This is a purely visual issue that doesn't crop up when we plot lines instead of stacks. Hopefully I've shown that stacked area charts can be useful when it is the ordered sums of components that are of interest and the conditions are right. For the conditions to be right, all components of the stack must share the same sign (or be 0) at each (visible) point along the horizontal axis. Bring high volumes of complex information to life with Infragistics WPF powerful data visualization capabilities! Download free trial now! Why We Should Report More Than Just the Mean | Tim Brock | Thu, 18 Feb 2016 | https://www.infragistics.com/community/blogs/b/tim_brock/posts/why-we-should-report-more-than-just-the-mean Numbers without context are of very limited use. So it's a good thing that articles in newspapers and reports in the wider world will often compare the figures they relay to the (mean) average.
But invariably that simply isn't enough to get a gauge of what the data being reported really tells us. There's an old "joke" about a statistician who drowned in a lake of average depth a few inches (the precise average depth seems to vary depending on who is telling the joke), but over-simplifying by just reporting or comparing with an average really can be highly misleading. At the time of writing, the White Moose Café in Dublin in the Republic of Ireland has a rating of 3.8 stars (out of 5) on Facebook. From just this number, without looking at the distribution of scores, you might take that to mean something like "People generally think this is a good café which could perhaps make a few improvements to bump it above 4 stars". In fact the establishment has well over seven thousand reviews but only 42 reviewers gave it a 2-star, 3-star or 4-star rating! The overwhelming majority of ratings are either 1 or 5 stars. This rather extreme example of polarized opinions is the result of a disagreement between the proprietor and a vegan customer that led initially to a bombardment of negative reviews from many further vegans and a subsequent backlash from meat-eaters. It's safe to say most of the reviewers have never been to the café. (You can find out much more about this story here.) The average rating doesn't give us any hint of the underlying story. So hopefully you can see why it's a good idea to go beyond just reporting (mean) averages or comparing one result to the average. We have plenty of other descriptive statistics that can tell us something more about the distribution of a set of results: median, mode, standard deviation, variance, skew, kurtosis, range, interquartile range... But frequently the best option is to visualize the results. Facebook does actually do this with its review system, as the screenshot below shows: A classic example illustrating the need for visualization is Anscombe's quartet: a set of four small datasets of paired x and y values.
All four datasets have identical mean (9) and variance (11) in the x variable and almost identical mean (~7.5) and variance (~4.12) in the y variable. The correlation coefficient for each dataset is also the same (0.82) to two decimal places. Actually plotting the data as a simple set of scatter plots highlights that the four datasets are, in fact, very different. Perhaps most surprisingly, the linear regression lines for each set are (almost) the same. This is a case of garbage in, garbage out; if you try to fit a straight line to show how one variable affects another and the relationship is not even close to linear then don't expect your line to be even remotely representative of your data. Of course, we're not particularly good at absorbing and interpreting large amounts of data in tabular form, so the fact that set II isn't linear may not be entirely obvious in, say, a spreadsheet: Plot your data before trying to fit it! Scatter plots are the obvious choice for paired datasets like Anscombe's. The one-dimensional equivalent is the strip plot. Let's just use Anscombe's y values as a quick example: The strip plots nicely highlight the presence of outliers in Set III and Set IV and show that the bulk of the data points lie between 5 and 10 for all sets. Strip plots often work well when there is only a modest number of data points for each set. With larger datasets things quickly become overcrowded. One could try to get around this by giving each point a random vertical offset to clear things up a bit, essentially adding jitter to a non-existent second variable, but a more common alternative is to bin the data and create histograms. Below, for example, is a histogram made from 300,000 data points generated by a specific continuous random number generator. Picking an appropriate bin width is important. Given that the above figure shows continuous data you may be able to tell that the bin width used is really unnecessarily wide.
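Generating such a histogram is straightforward. A hedged sketch (the function names are mine; the Box-Muller transform is one standard way to produce normally distributed values from a uniform generator):

```javascript
// Draw one value from a normal distribution via the Box-Muller transform.
function normalSample(mean, sd) {
  var u1 = Math.random();
  var u2 = Math.random();
  // 1 - u1 avoids taking the log of 0
  return mean + sd * Math.sqrt(-2 * Math.log(1 - u1)) * Math.cos(2 * Math.PI * u2);
}

// Count values into fixed-width bins covering [min, max).
function binCounts(values, binWidth, min, max) {
  var counts = new Array(Math.ceil((max - min) / binWidth)).fill(0);
  values.forEach(function(v) {
    if (v >= min && v < max) {
      counts[Math.floor((v - min) / binWidth)] += 1;
    }
  });
  return counts;
}

var samples = [];
for (var i = 0; i < 300000; i++) { samples.push(normalSample(15, 2)); }
var histogram = binCounts(samples, 1, 5, 25); // 20 bins, each one unit wide
```

Changing the last argument of binCounts from a bin width of 1 to 0.1 gives the finer histogram discussed next.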
Instead of using bins one unit wide, we can decrease it to, say, 0.1 units wide. Hopefully this makes it more obvious that the random number generator was pulling numbers from a normal distribution. The mean of the specific distribution was 15 and the standard deviation 2. In the next example numbers are drawn from a different normal distribution. The normal distribution in this case has the same mean as the previous example, 15, but a much bigger standard deviation of 5. This means that the probability of getting a number below 8 or above 22 is much, much higher than for the previous example. But there's no way of telling that if you just quote the mean. Create modern Web apps for any scenario with your favorite frameworks. Download Ignite UI today and experience the power of Infragistics jQuery controls. A Step-by-step Introduction to JavaScript Sets | Tim Brock | Fri, 12 Feb 2016 | https://www.infragistics.com/community/blogs/b/tim_brock/posts/a-step-by-step-introduction-to-javascript-sets As mentioned previously, we have a new JavaScript standard commonly known as ECMAScript 6 (ES6). I've spent quite a bit of time recently reading around the new features outlined in the standard that are coming to, or have recently been implemented in, our browsers. One of my favorite additions is the new Set type. A set is somewhat like an array. It's a place to store values: numbers, strings, objects, actual arrays, Booleans, other sets etc. The most notable difference is that it will only store the same element once.
Creating and Adding Elements to a Set It's really easy to set up a new, empty set: var A = new Set(); //A is a set with nothing in it Then we can add elements to the set using the chainable add method: A.add(1).add(2).add("1").add([1, 2, 3, 4, 5]).add({}); //The set now contains 5 elements of various types Note that 1 and "1" are different: there is no type coercion so both can be added to the set. On the other hand, if we try to add the number 1 again then nothing changes: A.add(1); //The set still contains 5 elements However, we could add another empty object to the set. A.add({}); //The set contains 6 elements, 2 of them are empty objects Why does this work? Because two objects are only equal if they refer to the same space in memory. The two objects may contain the same properties and methods but they are not considered equal here. The same holds for arrays: A.add([1, 2, 3, 4, 5]); //The set contains 7 elements, including two arrays Conversely, the following only adds one array to the set, not two or three, because x and y refer to the same object in memory. var x = ['a', 'b', 'c']; var y = x; //y is an alias of x A.add(x).add(x).add(y); //Only the first call to add adds anything to the set You can also add elements to a set by passing any iterable as an argument to the original construction call. The simplest option is an array: var B = new Set([7, 8, 9]); //B is a set containing 3 elements Confusingly, strings are iterable (but numbers are not). So this results in a TypeError... B = new Set(123); //Nope, can't do this ...but the following creates a set with three elements!
B = new Set('123'); //B is a set containing three elements, the strings '1', '2' and '3' To add a single string to a new set, place it in an array: B = new Set(['123']); //B is a set containing the single element (string) '123' There's nothing to stop you making sets of sets using the add method: B.add(new Set([1, 2, 3])) //B now contains 2 elements, the string '123' and the set of numbers 1, 2 and 3 B.add(new Set([7, 8, 9])) //B now contains 3 elements, including 2 sets On the other hand, the following just creates a set containing the first three positive integers: var C = new Set(new Set([1, 2, 3])); //Same as var C = new Set([1,2,3]) You can also clone (rather than alias) another set using new Set: var D = new Set(A); //A and D are completely different sets, they just (currently) have the same members Checking for Set Membership Checking for set membership is also really easy using the has method. A.has(1); //true A.has(2); //true A.has('1'); //true A.has(75); //false Of course you can't just check for an empty object or the array [1, 2, 3, 4, 5] for the same reason that you can add more than one such object or array. A.has({}); //false A.has([1, 2, 3, 4, 5]); //false You can, however, check for specific objects or arrays (or dates or other non-primitives) that you have references to. A.has(x); //true A.has(y); //true: y is an alias for x (see above) Removing Elements from a Set The delete method removes members from a set. A.delete(1); //Removes 1 from the set, returns true A.has(1); //false A.delete(x); //true A.has(y); //false: y is an alias for the array x which is no longer a member of A As noted earlier, the add method is chainable because it returns this . By contrast, if you use pop or shift on a regular array they return the extracted element. The delete method of Set does neither of these things, it just returns a Boolean to indicate whether or not an element was deleted. Hence you can't chain delete calls. 
And of course you can't delete an object or array you don't have a reference to: A.delete({}); //false A.delete([1, 2, 3, 4, 5]); //false You can remove all elements from a set in one step using the clear method (which returns undefined regardless of whether anything was cleared or not): B.clear(); //B is now an empty set Checking the Size of a Set Like an array, you can check the number of elements contained within a Set. Unlike an array, the relevant property is called size, not length. A.size; //6: The number 2, the string '1', 2 arrays containing the numbers 1 to 5 and 2 empty objects B.size; //0 Looping Over a Set A set doesn't allow for any form of random access. For instance, you can't access the first or any other element of the set using square bracket notation like you can with an array. (Having said that, if you try, the result is undefined, not an error.) This also means you can't use an old-fashioned for loop. for-in loops don't work either. However, there are a couple of options for getting at elements. You can use the new JavaScript for-of loop for looping over any iterable object, and that includes sets. The following simply logs all elements to the console in the order they were added to the set: for(let element of A) { console.log(element); //Prints out representations of the 6 elements } And, like arrays, sets have a forEach method. This does the same as the example above: A.forEach(function(el){console.log(el);}); //Does the same as the for-of loop above Uses and Limitations I like JavaScript sets because they seem relatively easy to use. They provide a simple system for holding and accessing a collection of data when you don't want to worry about duplicates and the problems they can cause. Unsurprisingly, they also somewhat resemble the mathematical concept of a set.
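The methods covered above are enough to build the familiar mathematical operations yourself; a minimal sketch (the function names are mine):

```javascript
// Union: every element that appears in a, in b, or in both.
function union(a, b) {
  var result = new Set(a);               // clone a
  b.forEach(function(el) { result.add(el); });
  return result;
}

// Intersection: only the elements that appear in both a and b.
function intersection(a, b) {
  var result = new Set();
  a.forEach(function(el) { if (b.has(el)) { result.add(el); } });
  return result;
}

// Difference: the elements of a that do not appear in b.
function difference(a, b) {
  var result = new Set();
  a.forEach(function(el) { if (!b.has(el)) { result.add(el); } });
  return result;
}

var odds = new Set([1, 3, 5]);
var small = new Set([1, 2, 3]);
union(odds, small);        // Set of 1, 3, 5 and 2
intersection(odds, small); // Set of 1 and 3
difference(odds, small);   // Set of just 5
```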
Because of this it's perhaps slightly surprising that the ES6 specification doesn't include methods for performing common mathematical set operations such as union, intersection and symmetric difference. For the time being at least, you have to implement these for yourself (some pointers can be found here). Moreover, there's no direct way of representing, say, the set of all natural numbers or the set of all real numbers in a JavaScript set (since they're both infinite sets) as you might wish to in mathematics. Browser Support Support for sets is already good in desktop browsers. Mobile browsers are still catching up. Further Reading I found the online book Exploring ES6 by Dr Axel Rauschmayer a great reference for all things ES6. Chapter 19 covers Sets and the related "collections" WeakSets, Maps and WeakMaps. This blog post on Collections by Jason Orendorff for Mozilla is also well worth a read. Try our jQuery HTML5 controls for your web apps and take immediate advantage of their stunning data visualization capabilities. Download Free Trial today.Conveying the Right Messagehttps://www.infragistics.com/community/blogs/b/tim_brock/posts/conveying-the-right-messageMon, 08 Feb 2016 12:02:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:497320Tim Brock/community/blogs/b/tim_brock/posts/conveying-the-right-message0We communicate data and the conclusions we've drawn from our datasets in numerous ways: conversations, presentations, reports, charts shared on Twitter... Frequently, the results will be compressed. After all, we can't write a 200-page thesis on every bit of research. Instead we might publish a short article with a headline. And busy people may only read the headline. The end result can be something like a game of telephone, where the message received is a much distorted (or plain wrong) version of what the underlying data is actually telling us. 
Take the recent announcement from the World Health Organization's (WHO) "specialized cancer agency", the International Agency for Research on Cancer (IARC), regarding consumption of processed and red meats. The UK press came out with headlines like "Processed meats rank alongside smoking as cancer causes – WHO" (Guardian), and "Drop the bacon roll - processed meats including sausages 'as bad for you as SMOKING'" (Daily Express). Is this really what the data tells us? Should we be worried? The article "Processed meat and cancer – what you need to know" by Casey Dunlop of Cancer Research UK avoids the sensationalist, attention-grabbing headlines. The result is something much more informative. The IARC did conclude that there is sufficient evidence to say that processed meats do cause cancer, and placed it in the same group as smoking (group 1). Red meats were placed in group 2A. That doesn't mean a lot without group definitions, so here are the five groups alongside bars illustrating the number of entities in each group (taken from the IARC website on 4th November 2015). It seems the headline writers in the newspapers didn't take the time to understand IARC's classification system. As Dunlop explains, the group placements show "how confident IARC is that red and processed meat cause cancer, not how much cancer they cause". Moreover, the IARC are explicit about this in the Q&A document that accompanied the press release: "processed meat has been classified in the same category as causes of cancer such as tobacco smoking and asbestos (IARC Group 1, carcinogenic to humans), but this does NOT mean that they are all equally dangerous". It's easy to blame the confusion on newspapers going for eye-catching headlines. Perhaps IARC should take some responsibility too? From the chart above we can see that most things have either been classified as "Possibly carcinogenic to humans" or as not classifiable(!?). But do such classifications even make sense? 
And why not include in the press release the clear statement that the classification of processed meat doesn't mean it's as dangerous as smoking? So how do we make sure we convey results accurately? There are a number of steps we could take; here are four I think are particularly important. Be clear as to who collected the data and how I'm more inclined to trust research results from a multinational group of scientists who are experts in their field than a group of politicians with an agenda. That doesn't mean we should give scientists a free pass though; we all make mistakes from time to time. Full disclosure of the method by which data was collected is vital for the assessment of the reliability of results. It's not unreasonable to expect that of others, so expect others to expect that of you. State how much data your conclusions are based on "Seven out of ten people preferred product A to product B". There's a massive difference in the strength of that statement depending on whether it was the result of asking 10,000 people, with 7,000 preferring product A, or if that's literally the result of asking ten people. Also, if you asked 10,000 people, seven preferred product A, three preferred product B and 9,990 had no preference, then don't forget to mention the latter group! It probably says more about your products or the formulation of your survey than the other numbers do. Express numbers in a format that is easy to understand It's important to consider how we express critical numbers that pop up. As with chart design there may be effective and ineffective ways of doing so. For example, saying that action X (like eating processed meat) increases our chances of getting disease Y (eg cancer) by Z% makes for a dramatic headline but tells us nothing about our chances of actually getting the disease, which is what we probably want to know. To calculate that we need to know what the chance was in the absence of taking action X and then do some math. 
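That math is simple enough to sketch. The numbers below (a 5% baseline risk and a 20% relative increase) are made up purely for illustration:

```javascript
// Turn a "Z% increased risk" headline into absolute numbers.
// Both inputs here are hypothetical, not taken from any study.
var baselineRisk = 0.05;     // assumed chance of disease Y without action X
var relativeIncrease = 0.20; // the "Z%" headline figure

var riskWithX = baselineRisk * (1 + relativeIncrease);

// The same comparison per 1,000 people, which is what readers
// actually want to know:
var per1000Without = Math.round(baselineRisk * 1000); // 50
var per1000With = Math.round(riskWithX * 1000);       // 60

console.log(per1000Without + ' in 1,000 affected without X, ' +
            per1000With + ' in 1,000 with X');
```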
There's a strong case for using natural frequencies instead. For example, you could state the expected number of people getting disease Y in a group of 1,000 processed-meat eaters and the expected number from a group of 1,000 people who don't eat processed meat. From that kind of information most people should be able to make an informed choice about whether the increased risk is worth it. This is something the IARC failed to do in their press release, stating only that there was an 18% increase in risk of colorectal cancer for each "50 gram portion of processed meat eaten daily". This probably sounds scarier than it actually is. Check with other people If you can, take the time to check how your presentation of results comes across with at least one person not directly involved with your work. If they find your terminology confusing or misinterpret what you say, there's a good chance other people will too.Introducing JavaScript's Math Functionshttps://www.infragistics.com/community/blogs/b/tim_brock/posts/introducing-javascript-39-s-new-math-functionsTue, 02 Feb 2016 09:41:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:497060Tim Brock/community/blogs/b/tim_brock/posts/introducing-javascript-39-s-new-math-functions0As of June 2015, the world has a new JavaScript standard. Officially called ECMAScript 2015, but commonly known as ECMAScript 6 (ES6) and sometimes ECMAScript Harmony or ECMAScript.next, the specifications detail a broad range of new features you can expect to see being implemented in browsers in the coming months and years. This includes — but is not limited to — block-level scoped variables let and const, new class syntax, default parameters, modules, sets, and template strings. 
It also includes some useful additions to the Math object. Unlike most new ES6 features, these additions have, at the time of writing, already been implemented (almost in their entirety) in the desktop versions of Chrome, Firefox, Safari and Opera, as well as Microsoft Edge (though not Internet Explorer). Support in mobile browsers is weaker. None of these new additions is anything groundbreaking. In fact it's more about adding basic math functionality that is already present in most popular languages. I still think it's worth taking a look at some of them, as they will likely be overlooked in other guides to ES6. MDN has simple polyfills for all of them, so if you want to use them on your production site you can, even if you get significant mobile traffic. The idea of Math.trunc is simple — it removes the digits after the decimal point from a number. 13.2 becomes 13 and -12.6 becomes -12. This is similar to Math.ceil and Math.floor, except we're not always rounding up or always rounding down; we're rounding towards 0. That is, if the number is positive then Math.trunc rounds down, but if it is negative it'll round up. The chart below shows how Math.trunc and the other rounding function, Math.round, work over the range -5 to 5. (Plotting Math.floor, Math.ceil and Math.trunc in the same chart is confusing.) Math.sign is even simpler. Pass in a number and it will return 1 if the number is positive (even if that "number" is Infinity) and -1 if it is negative (including -Infinity). If the number is 0 then it will return 0 (or -0, which is the same). Math.cbrt returns the cube root of a number. For positive x, Math.cbrt(x) returns the same result as Math.pow(x,1/3). However, if x is negative the latter returns NaN whereas the former will return the same result as -1*Math.pow(-x,1/3). For example, Math.cbrt(-8) returns -2 while Math.pow(-8,1/3) returns NaN. This isn't a bug; it's part of the specification. 
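These behaviors are easy to verify at a console; a quick sketch:

```javascript
// Math.trunc rounds towards zero, unlike floor (always down) or ceil (always up):
console.log(Math.trunc(13.2));  // 13
console.log(Math.trunc(-12.6)); // -12: up, not down, for negative input
console.log(Math.floor(-12.6)); // -13: floor always rounds down

// Math.sign handles the infinities and zero:
console.log(Math.sign(-Infinity)); // -1
console.log(Math.sign(0));         // 0

// Math.cbrt accepts negative arguments; Math.pow(x, 1/3) does not:
console.log(Math.cbrt(-8));     // -2
console.log(Math.pow(-8, 1/3)); // NaN, per the specification
```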
At the same time, -2 is a correct solution; all non-zero real numbers, positive and negative, have three cube roots, but two of them are complex. The chart below shows curves for Math.cbrt and the (old) square root function Math.sqrt. (The square roots of all negative numbers are complex, so Math.sqrt also returns NaN when fed a negative argument.) JavaScript has had the Math.log function, which calculates the natural logarithm (that is, base-e) of its argument, since the 1st edition of ECMAScript. ES6 adds Math.log10 and Math.log2, which calculate the base-10 and base-2 (a.k.a. binary) logarithms of their argument. The former is frequently used in 2D data visualization in the sciences when the numbers being plotted span a large range in one or both dimensions. Confusingly, many math text books will use "log" to signify the base-10 logarithm, shortening the natural logarithm to "ln". ES6 also sees the introduction of Math.log1p, which is the same as Math.log(1 + x), and Math.expm1, which produces the same result as Math.exp(x)-1. The chart directly below plots the aforementioned logarithmic functions while the one below that plots Math.exp and Math.expm1. Math.hypot calculates the square root of the sum of the squares of its arguments. Obviously you can use it to find the hypotenuse of a right triangle if you know the lengths of the other two sides, but you can pass it more than two numbers, or just one, and it will still calculate the square root of the sum of the squares of its arguments. The chart below, for example, shows the result of calling Math.hypot for all non-negative integers up to x. For example, the y-value at x=1 is the result of Math.hypot(0,1), at x=2 it's the result of Math.hypot(0,1,2), at x=3 it's the result of Math.hypot(0,1,2,3) and so on. Finally, ES6 sees the addition of the hyperbolic functions sinh, cosh and tanh and their inverses asinh, acosh and atanh. 
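Before moving on to the hyperbolic functions, the logarithmic additions and Math.hypot can be checked the same way. A quick sketch (the test value 1e-10 is my own choice):

```javascript
console.log(Math.log2(8));     // 3: 2 cubed is 8
console.log(Math.log10(1000)); // 3: 10 cubed is 1000

// log1p and expm1 exist for accuracy when x is very small:
var x = 1e-10;
console.log(Math.log1p(x));   // ~1e-10, computed accurately
console.log(Math.log(1 + x)); // slightly off: precision is lost forming 1 + x
console.log(Math.expm1(x));   // ~1e-10

// Math.hypot(3, 4) is Math.sqrt(3*3 + 4*4), and it takes any number of arguments:
console.log(Math.hypot(3, 4));    // 5
console.log(Math.hypot(0, 1, 2)); // ~2.236: the y-value at x=2 in the chart
```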
Hyperbolic functions are analogs of their similarly named trigonometric counterparts (sin, cos etc) and crop up regularly in physics, engineering and architecture. The chart immediately below plots Math.sinh, Math.cosh and Math.tanh; the one below that plots their inverses. There are three further functions added to the Math object that, for reasons of brevity, I won't discuss. They are Math.imul, Math.fround, and Math.clz32.Demystifying Box-and-whisker Plots — Part 2https://www.infragistics.com/community/blogs/b/tim_brock/posts/demystifying-box-and-whisker-plots-part-2Tue, 26 Jan 2016 09:04:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:496648Tim Brock/community/blogs/b/tim_brock/posts/demystifying-box-and-whisker-plots-part-20Having shown you how to read range bars and box-and-whiskers in Part 1, I now want to use some real-world data to illustrate why they can be useful. Specifically, I'm going to use data relating to the UK general election of 2015. First, for those not familiar with the UK's political system, I'll give a brief overview of how our electoral system works. The UK is divided into 650 constituencies for the purpose of elections to the nationwide parliament. At a general election, anyone eligible to vote can vote for one candidate standing in their constituency that they want to represent them in the UK's lower house, the House of Commons. The candidate with the most votes in each constituency wins and is elected a Member of Parliament (MP). Hence there are currently 650 MPs in the House of Commons (this changes from time to time as constituency boundaries are redrawn). 
While this set-up should be easy to follow and means everybody gets a specified representative in parliament, it has led to a system where the proportion of MPs in parliament for a specific political party is not generally very representative of the share of the votes the party won. For example, around 11.3 million people voted for Conservative party candidates (about 37% of all votes cast) and they won over half the "seats" in the House of Commons, while UK Independence Party (UKIP) candidates won close to 3.9 million votes but only one of them was elected. To get a better idea of why there was such a big difference we can look at the spread in the share of votes across constituencies using the data compiled by the House of Commons Library. One option is to bin the vote share by seat and plot it for the different political parties. Here's what that looks like for the Conservatives, UKIP and Labour (who came 2nd in terms of both votes and seats won). We can see there's a large number of constituencies where the Conservatives won 40-60% of the vote and a large number where UKIP won only 10-20%. This starts to explain things but, with only three parties, this form of visualization already looks a bit of a mess. There are other significant parties that helped determine the result of the election that we haven't seen yet. We could use small multiples, but box plots also provide an elegant alternative. Let's look at "typical" box-and-whisker plots (as defined and illustrated in Part 1) for the six parties who garnered more than 100,000 votes. We've lost a lot of detailed information compared to the earlier chart but we're free from clutter and cross-party comparisons are easy. We can see that the 75th percentile (the top of the box) is lower for the Liberal Democrats (Lib Dem) than the 25th percentile is for UKIP (the bottom of the box). Despite this, the Liberal Democrats won 8 times as many seats (ie 8) as UKIP. 
There is one more thing I'd like to add: a box plot to show the distribution of share of the vote for the winning candidates: Now we can see the importance of those outliers. Not only does the Liberal Democrat distribution have more of them, it has more in the region above ~35% which, as we see from the Winner distribution on the left, is the kind of percentage you'll typically need to win a seat. (Obviously this is all complicated by the fact the distributions are not at all independent.) The Green Party also won one seat and, as with UKIP, it was won by their one candidate who won over 40% of the votes in their constituency. So far there's one party in these charts that I haven't yet mentioned, the Scottish National Party (SNP). Their box plot looks more like the Winner box plot than any of the real parties'. As their name might suggest, they only place candidates in the 59 constituencies of Scotland. The other parties shown all stood in over 570 constituencies (the Conservatives stood in all but three). The box plots above don't represent the differences in sample size at all. A common solution to highlight varying sample size is to scale the width of boxes accordingly. Frequently it's the square root of the number of data points in the distribution that is used: Another option is to plot all points, not just the outliers, and use jitter in the horizontal direction to separate them. This can be difficult both to implement and to get right (you'll likely have to deal with overplotting), but it is probably more intuitive than scaling the box width. All the SNP's candidates won in excess of 30% of the vote, putting them in the region where winning a seat becomes likely. With this information it's probably unsurprising to learn that they did, in fact, win in 56 of the 59 constituencies in which they stood. As a result, SNP MPs now make up more than 8% of MPs in the House of Commons. None of this is meant as any kind of political statement. 
I just think it's a nice collection of data for illustrating the power of box-and-whisker plots.Demystifying Box-and-whisker plots — Part 1https://www.infragistics.com/community/blogs/b/tim_brock/posts/demystifying-box-and-whisker-plots-part-1Mon, 25 Jan 2016 13:11:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:496574Tim Brock/community/blogs/b/tim_brock/posts/demystifying-box-and-whisker-plots-part-17If you browse through a large printed newspaper and pick out all the charts you find you'll probably come across some of the following: bar chart, timeseries (line chart), pie chart, donut chart, stacked area chart. You may come across the odd scatter plot too. If you've picked up the New York Times on a good day then you might even stumble into a connected scatter plot. What you're highly unlikely to find is a box-and-whisker plot (aka a "boxplot" or a "box plot"). By contrast, they're very popular in scientific circles. One of the problems with box-and-whisker plots, one of the reasons I think they haven't made it into the consciousness of the general public via the media, is that they're not particularly intuitive from a visual point of view. You can look at a bar chart and immediately get the idea that the longer the bar, the bigger the number being represented. Similarly, we can look at a pie chart and grasp the part-to-whole concept without much thought or see a sharply downsloping line in a timeseries chart and think "this thing is declining very quickly". But box plots are frequently a mix of rectangles, lines and points. However, I really don't think they're that difficult to understand. And they can be very useful when you have multiple distributions you want to compare. 
So in this post I'm going to try to demystify them. Rather than dive straight in with "proper" box-and-whisker plots, I'm going to start with something a little simpler: range bars. The diagram below shows how a range bar might look and labels the salient parts. As you can see, there's not a lot to them. You take a univariate dataset and draw a box to signify the lowest and highest values. Typically, you also add a line to indicate the position of the median relative to the lowest and highest values. Now a single range bar doesn't tell us very much. Without an accompanying axis all we can tell is whether the median is closer to the highest or lowest value. You don't really need a chart to convey that small amount of information. Stick the range bar on a scale and we can estimate absolute values of all three of these things. But the real power of range bars, and their box plot cousins, is how they enable simple comparisons between different univariate datasets. The chart below illustrates this for five arbitrary datasets I created (the precise details aren't important here), each made up of 100 data points. You could extend this layout to ten or so datasets without much problem. As a collection, the range bars look like a set of shifted bars from a bar chart. That's basically what they are. The longer the bar the bigger the range of each distribution. But, like I said, the real insight is from comparing bars. We can see, for example, that dataset E has a much larger range than the others and a much lower median. We also see that while the medians for datasets A to D lie (very) roughly halfway between their respective minimum and maximum values, the median for E is much, much closer to the minimum. Shortly I'll turn the range-bar plot above into what I'll call a simple box-and-whisker plot. But first, here's a labeled diagram illustrating the important parts of a simple box-and-whisker: Now the box only covers 50% of the data. 
Above and below the box we have "whiskers" extending out to the highest and lowest values in the dataset. 25% of data points have a value between the minimum and the bottom of the box; 25% of data points have a value between the top of the box and the maximum. Here's the data I generated earlier displayed as a set of simple box-and-whiskers. We can now see that the large range seen for dataset E comes mostly from (at most) just a quarter of the data points — the 75th percentile is much closer to the minimum than the maximum. There are a number of variations of the box-and-whisker plot that attempt to show outliers. The version I see most often (and which I was taught in school) is as follows: Rather than the whiskers necessarily extending out to the smallest and largest values, they instead extend out to the smallest/largest values that are up to 1.5 times the interquartile range (IQR) below/above the 25th/75th percentile. The interquartile range is simply the distance between the 25th and 75th percentiles. Still, all that is quite a mouthful and an explanatory diagram certainly helps: Individual points that fall outside the permitted range for the whiskers are explicitly marked and given the status of "outlier". (I find this a strange use of the term "outlier". In other circumstances "outlier" refers to a data point distant from all other data points. As you'll see below, only a couple of outliers really fit that definition in the datasets we're using here.) The chart below illustrates our 5 datasets using a "typical" box-and-whisker plot. For datasets A, C, and D there's no change from the simple box-and-whisker since no data point lies more than 1.5 times the IQR from the 25th or 75th percentile. For dataset B there is one point just below this range. Dataset E has four high-lying outliers (two data points are almost on top of each other); despite the maximum value in E being greater than 100, 96% of points lie below 70. 
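The whole recipe is short enough to code up. Here's a sketch, assuming linearly interpolated percentiles (statistical packages differ on that detail) and a made-up dataset:

```javascript
// Compute the numbers behind a "typical" box-and-whisker plot.
function percentile(sorted, p) {
  // Linear interpolation between the two nearest ranks.
  var idx = (sorted.length - 1) * p;
  var lo = Math.floor(idx), hi = Math.ceil(idx);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}

function boxStats(data) {
  var sorted = data.slice().sort(function (a, b) { return a - b; });
  var q1 = percentile(sorted, 0.25);
  var median = percentile(sorted, 0.5);
  var q3 = percentile(sorted, 0.75);
  var iqr = q3 - q1;
  // Whiskers reach the most extreme points within 1.5 * IQR of the box;
  // anything beyond gets marked as an "outlier".
  var inliers = sorted.filter(function (v) {
    return v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr;
  });
  return {
    q1: q1, median: median, q3: q3,
    whiskerLow: inliers[0],
    whiskerHigh: inliers[inliers.length - 1],
    outliers: sorted.filter(function (v) { return inliers.indexOf(v) === -1; })
  };
}

var stats = boxStats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40]);
console.log(stats.median);   // 5.5
console.log(stats.outliers); // [40]: well beyond 1.5 * IQR above the box
```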
Now that I've (hopefully) demystified box-and-whisker plots, in Part 2 I'm going to use them with some real-world data to illustrate their strengths.Image Manipulation with HTML5 <canvas> elementhttps://www.infragistics.com/community/blogs/b/tim_brock/posts/simple-image-manipulation-with-lt-canvas-gtWed, 02 Dec 2015 05:00:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:491878Tim Brock/community/blogs/b/tim_brock/posts/simple-image-manipulation-with-lt-canvas-gt0Introductions to the HTML5 canvas element too frequently concentrate on how it allows web developers to draw all manner of graphic objects, from straight lines and rectangles to complex Bezier curves, on to the screen. Here, however, I'd like to focus on another use case: photo-editing in the browser. If you're keen to see what can be done with canvas right away then skip on down to the interactive examples below and come back here when you want to find out how it's done. [custom]width="650" height="11250" src="http://www.infragistics.com/community/cfs-filesystemfile.ashx/__key/CommunityServer.Blogs.Components.WeblogFiles/tim_5F00_brock.Canvas_5F00_Blog/1512.simple_2D00_image_2D00_manipulation_2D00_with_2D00_canvas.html" [/custom]Minimalist Maps: Are They a Good Idea?https://www.infragistics.com/community/blogs/b/tim_brock/posts/minimalist-mapsTue, 01 Dec 2015 13:45:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:493735Tim Brock/community/blogs/b/tim_brock/posts/minimalist-maps0Data maps are everywhere. And it's not just the conventional ones that use Google Maps, OpenStreetMap or Bing Maps to show the underlying geographical information. Cartograms, "maps" with land masses resized based on data, are quite popular. I'm not a big fan of them because they require us to judge magnitudes based on the relative sizes of some peculiar and many-sided shapes. Generally, we're not very good at this. 
However, I do think distorted, simplified or unrealistic maps can be useful. In the recent UK general election several media outlets chose to eschew conventional choropleth maps in favor of ones in which all constituencies were equally sized hexagons. The resultant maps were still reminiscent of the United Kingdom, but the amount of any given color became directly proportional to the number of seats won by a particular political party. Maps of the USA with square states have also been used by media outlets to show data. For example here, here, and here. But what if, instead of distorted borders, we don't show any borders at all? In the last year or two I've seen an increasing number of what may be termed "minimalist maps". Specifically, I'm referring to the display of geographic or geopolitical data in such a manner that the underlying geography can be seen, perhaps roughly, without ever drawing conventional features of a map like land/sea, country or state borders. Below is a simple example I made. I'm sure you don't need me to tell you it is a "map" of the world. You may even recognize it makes use of the (in)famous Mercator projection. It shows the locations of all urban agglomerations with 300,000 or more inhabitants in 2014 (data published by the UN's population division). Each circle is scaled according to its population in 2010 (data for 2014 wasn't specifically available). I'd normally add a scale to a chart like this, but here I'm primarily concerned with the locations of large cities and so some concept of relative size is enough. From the map I'm sure you can make out the location of the USA, the thin band that is central America, the eastern protrusion of Brazil, the Cape of Good Hope at the bottom of Africa and the Indian subcontinent, without requiring any lines. Conversely, there's little detail about the shape of Canada or Australia, and the southern tip of South America is completely missing. 
There's nothing in the desert area of North Africa, while the cities on its northern coastline are hard to pick apart from those of southern Europe. It's probably fairly obvious why the map does look familiar despite the lack of sea/land borders: 1) we don't build cities in the oceans; 2) we do build cities by the sea; 3) we're familiar with maps of the Earth, particularly ones that use the Mercator projection. But it doesn't need to be the whole Earth to look familiar. The next map is clearly of East Asia. We can still pick out the Indian subcontinent easily in the map above and the eastern coast of China is fairly obvious too. How about the next example? Hopefully you identified that was the USA (plus northern Mexico and southern Canada). This last one reminds me of the night sky on a cloudless night... Some labels may be needed here to help you get your bearings: Europe has a large number of large urban agglomerations, but they're frequently not found near the sea. Of course, we could tell that many European cities weren't built near the sea if we added the land/sea borders. So one obvious question might be: "Is there really any point to this minimalist approach to mapping data?". For the maps shown here the answer to that question may well be "no". At least, probably not in terms of data visualization best practices. I did, however, find it an interesting test of my geography knowledge trying to label the cities in the last example without looking at a "proper" map. One small advantage with minimalist maps is that you don't have to worry about the size of the map files you're using, which can be large when maps are highly detailed. If you're using vector images on a website, that is certainly a positive thing. But sacrificing clarity in favor of reducing file size is never a great idea. More important than file size, however, is that other people have made more elegant and more effective minimalist maps than the ones I created above. 
This article by James Cheshire includes several great minimalist maps that also show where people live. Arthur Charpentier has created several nice examples of minimalist maps with other types of data. And in terms of letting the data speak for itself, I think Michael Pecirno's "Minimal Maps" are exceptional.Choosing the Right Way to Flatten the Earthhttps://www.infragistics.com/community/blogs/b/tim_brock/posts/choosing-the-right-way-to-flatten-the-earthWed, 18 Nov 2015 15:45:00 GMT7a8b7c76-b7ad-48e0-9694-5b04ca132ed0:493126Tim Brock/community/blogs/b/tim_brock/posts/choosing-the-right-way-to-flatten-the-earth0The art and science of drawing our 3D Earth on to a 2D sheet of paper or computer monitor is worthy of a book. Unfortunately, I've only got a few hundred words. As a result, I'm going to largely concentrate on a single, controversial choice: the Mercator projection. When evaluating map projections we're not just talking about finding a suitable representation for a sphere in Flatland, because the Earth isn't a sphere. It's an oblate spheroid (i.e. flatter at the poles than the equator). Or, at least, it's more an oblate spheroid than it is a sphere. But it's a bit bumpy too. Still, even if it was a perfect sphere, the task of representing the whole Earth completely accurately on a sheet of flat paper is entirely impossible. A flat map cannot simultaneously display area, shape, direction, bearing, distance and scale perfectly all at once. So we have different map projections which do represent one or two of these things accurately, and we pick the most appropriate or go for a compromise projection. At least that's how things probably should work. You're no doubt familiar with the Mercator projection (even if you didn't know the name). 
It's long been derided for making places near the poles (like Greenland) look very big and places near the equator (like much of Africa) look comparatively small. It has been suggested that this has led to misunderstanding in the United States about travel to and from Africa in relation to the recent Ebola outbreak. It's even been tied to racism by some! Interestingly, there have been several recent attempts to redress apparent misperceptions about the relative size of Africa by superimposing other countries, like the United States, China and India, on top (see, for example, here and here). However, Google Maps, OpenStreetMap, Bing Maps and others use a variant of the Mercator projection termed Web Mercator. Not only do we have the same area issues as with "ordinary" Mercator maps, but the Earth is also assumed to be a perfect sphere. So, given all this, why do all these popular "slippy map" applications use (a variant of) the Mercator projection? Should we be using something else? Bing Maps software architect Joe Schwartz gives an excellent and detailed answer to the first question here. In short, the projection used means north is always up and east to the right regardless of where you may be zoomed in to. Moreover, the projection is (almost) conformal, meaning "small" objects (like buildings) have the right shape. This is critical for street maps. Using OpenStreetMap and IgniteUI I've created two maps with location markers. For simplicity, I'll show one screenshot for each at an appropriate scale. The first shows the whole Earth with the locations of all urban agglomerations with populations of 300,000 or more as of 2014. The second map shows a small portion of central London. The locations of seven London Underground stations in this relatively small area are already marked, but I've added my own markers for them too. The magenta-colored circles in the bottom right (i.e. south east) corner are the locations of stations that are on the Piccadilly Line. 
The black hexagons mark stations that are not on the Piccadilly Line. (I used the geolocations given here, which obviously don't align perfectly with the markers already on the map.) The points made by Schwartz and illustrated by these two examples highlight an important issue that I've previously noted in relation to more general data visualization: context is key. The Mercator projection may not be completely ideal for showing your global dataset. But slippy maps have other uses too. Using one to walk a few blocks from A to B (e.g. to get from a station on one train line to a station on a different line) would be a lot more difficult if we had a non-conformal projection where small-scale objects were distorted and angles were all wrong. None of this is to say that the Mercator projection is the best all-round solution for every scenario. It isn't. In an ideal world we'd be using equal-area maps to show data related to areas. Having said that, if we're talking about small areas, a Mercator projection should work just fine. On the other hand, we could never use Google Maps, OpenStreetMap or Bing Maps to plot data around the South Pole because the South Pole can't be shown, such is the nature of the projection. All projected maps have limitations; some you may just need to be aware of, others are absolute. If you're planning to stick a map of the world up on a classroom wall, there are better options than Mercator. Personally I like the compromise of the Robinson projection. National Geographic abandoned Robinson in 1998 in favor of the Winkel tripel projection, having previously also used the Van der Grinten projection. There's even some pretty complicated math to back up National Geographic's choice. And when they wanted to focus on the oceans, they decided on something entirely different. The important thing to realize is that all 2D maps are wrong, but some are useful for specific purposes.
Try our jQuery HTML5 controls for your web apps and take immediate advantage of their stunning data visualization capabilities. Download Free Trial now!

Jitter - Another Solution to Overplotting
https://www.infragistics.com/community/blogs/b/tim_brock/posts/jitter-another-solution-to-overplotting
Wed, 11 Nov 2015 11:54:00 GMT
Tim Brock

Back when I discussed tricks for coping with overplotting I omitted (at least) one popular "solution": jittering the data. Jittering is the process of adding random noise to data points so that, when they are plotted, they are less likely to occupy the same space. It is most commonly used when the data being plotted is discrete. In such cases, in the absence of jitter, it's not just that the edges of data-point markers overlap; the markers actually sit perfectly on top of each other. No amount of reduction in the size of the data points can remove this problem. While jitter may be added to points in, for example, a box plot, it's most frequently used in 2D scatter plots. Chart A in the graphic below shows a contrived example dataset with no attempt to deal with overplotting. Both the x and y variables only take integer values. The dataset actually contains 2000 points, but there are only 780 unique points. Chart B shows the same data but with the addition of jitter. Specifically, for each point a random number drawn from the continuous uniform distribution between -0.5 and 0.5 is added to the x coordinate and another random number drawn from the same distribution is added to the y coordinate. As noted previously, another potential solution is to make points semi-transparent (chart C) and, of course, these two options can be combined (chart D). What conclusions are there to be drawn from these plots? Because there are only 780 unique pairs of values, 1220 (61%) of the data points are completely obscured in chart A.
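The jittering used for chart B can be sketched in a few lines. This is a hypothetical `jitter` helper, assuming the data is stored as (x, y) pairs; it is an illustration of the procedure, not the code used to make the charts:

```python
import random

def jitter(points, extent=1.0, rng=None):
    """Add independent uniform noise in [-extent/2, extent/2] to the x
    and y coordinates of every point; extent=1.0 gives the -0.5 to 0.5
    range used for chart B."""
    rng = rng or random.Random()
    half = extent / 2.0
    return [(x + rng.uniform(-half, half), y + rng.uniform(-half, half))
            for x, y in points]

# 2000 copies of the same integer-valued point become 2000 distinct points
stacked = [(3, 7)] * 2000
spread = jitter(stacked, rng=random.Random(42))
print(len(set(spread)))
```

Because the noise never exceeds half the grid spacing, jittered points stay closer to their true cell than to any neighboring integer value.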
With the addition of jitter alone, each point becomes unique, but there is still some degree of overlap among the dots used to represent them. Making the points translucent certainly helps show that there are more points than the 780 visible in chart A, but it's not always an acceptable option. Because translucency and other alternatives (2D histograms, for example) aren't always acceptable solutions, it's worth thinking about what other issues can arise with jittering data. One key concern may be that of integrity. If we move the points away from their "true" positions, are we deliberately distorting the data? While chart A above may seem like the more "correct" way to plot the data, chart B is better at showing approximately where most of the data is. In chart A, all points are in exactly the right place but they are not all equally representative of the distribution of the data; one visible dot can mark the position of anything from 1 to 12 data points. Without translucency or color or something else there's no way of knowing which is which. Despite this, I'd like to point out once again that you should consider your audience. Will they be confused by non-integer values being plotted for something they know can only be integer-valued, for example? What about points at the extremes that, in one dimension, are no longer even in the permissible range? I think it's also worth studying some real-world data that will look familiar if you've been reading my other articles here. The GIF below shows a 2D histogram of RGB image data from an 8-bit PNG image (precise details and the image from which it is extracted can be found here). As the animation progresses, the length of the uniform distribution from which jitter values are drawn (the "Jitter Extent") increases in both dimensions. Here the use of jitter does allow us to see more detail about which of the value pairs occur most frequently.
Because the data remains in square blocks, there is still the sense of there being only a modest number of discrete values in the underlying data. In that previous article I also looked at the distribution of blue and green values for low- and high-quality JPEGs of the same initial image. The GIF below shows the effect of adding jitter to these. Aside from the points spilling out beyond the confines of the axes (which looks weird if nothing else), the clear differences between the two scatter plots disappear as the Jitter Extent increases. This is highly undesirable. As with many things in data visualization, there's no clear answer to the simple question: "Should I jitter data points?". Jitter can help clarify where the bulk of the data lies, but it can also distort important patterns. Where appropriate I prefer to use translucency, but sometimes — e.g. when the color of points already tells us something important — that isn't an option.

Visualizing the Data Behind Your Images
https://www.infragistics.com/community/blogs/b/tim_brock/posts/visualizing-images
Mon, 09 Nov 2015 10:00:00 GMT
Tim Brock

I think it's an interesting exercise to visualize the data contained in a photograph as we might other datasets. Some cameras and photo-editing software do just this, constructing histograms from the red (R), green (G) and blue (B) values (the three "primaries") in the image to help expert photographers/editors judge and improve color balance. To illustrate this idea, I'll use the color photograph and its grayscale counterpart below. It's easiest to start with the grayscale image, where the R, G and B values are, by definition, identical for any given pixel. On the left we have a grayscale value of 0, meaning an (R, G, B) value of (0, 0, 0), a.k.a.
black; on the right it is (255, 255, 255), or white. In between we have 254 shades of gray. This is one of the few cases where use of a gradient in a chart may actually help clarify things, in this case by coloring each bar according to the grayscale value it represents. The only problem with this is then picking a background color that is neutral: With a color image we need to plot each of the R, G and B histograms separately. It's less cluttered to use lines rather than bars, and with a sensible color scheme they probably don't need labeling. Aside from achieving a good color balance, I think there is another issue that is interesting to explore and that is important for web development: compression. JPEG compression is a fairly complex topic and I'll confess I've yet to really get my head around it. The color photo above is a high-quality (low-compression) miniature of the original (eight-megapixel) image I shot. It is 46 kilobytes in size. The image below is the same in terms of number of pixels but is more highly compressed and of lower quality. It is, however, only nine kilobytes in size. The lower quality should be obvious in the photo itself, but does the RGB histogram look significantly different? Before trying this out I genuinely had no idea what to expect. While the histograms below are clearly different to the ones above, they don't really scream "this image has been compressed much more than the other one". An alternative to the JPEG is the PNG. Here's an 8-bit PNG version of the photograph: This image looks a lot better than the low-quality JPEG, but it's also six times bigger (in terms of file size) and a third bigger than the high-quality JPEG. The histogram is also very different and difficult to read (note also the difference in the vertical scale). This chaotic histogram is a result of the fact that only 256 different colors can be used in an 8-bit PNG (just like a GIF).
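The per-channel histograms discussed above are easy to build once you have the raw pixel values. A minimal sketch (a hypothetical `channel_histograms` helper, with pixels assumed to be (R, G, B) triples):

```python
from collections import Counter

def channel_histograms(pixels):
    """Count how often each 0-255 value occurs in the red, green and
    blue channels of an image: the data behind the kind of RGB
    histogram cameras and photo editors display."""
    reds = Counter(r for r, g, b in pixels)
    greens = Counter(g for r, g, b in pixels)
    blues = Counter(b for r, g, b in pixels)
    return reds, greens, blues

# For a grayscale image, R = G = B for every pixel, so all three
# channel histograms coincide exactly
gray_pixels = [(0, 0, 0), (128, 128, 128), (128, 128, 128), (255, 255, 255)]
r, g, b = channel_histograms(gray_pixels)
print(r == g == b)  # True
```

In practice the pixel triples would come from a real image library rather than a hand-written list, but the counting step is the same.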
The histogram data above can be extracted from (at least some versions of) Photoshop and presumably other photo-editing software. But having not found any major difference between the histograms for the high- and low-quality JPEGs, I was curious to see whether more complex representations of the RGB data would highlight the difference. This required extracting all the individual RGB values, not just sums for the different channels. I didn't know how to do this in Photoshop so I used R (other options are available). In an ideal world a 3D scatter plot would work brilliantly - one dimension for each of R, G and B. But 3D scatter plots on 2D screens rarely work. So I opted for a more conventional 2D scatter plot, using point color to encode the third channel. In the examples below I (somewhat arbitrarily) opted to plot blue value against red value, and colored the points according to the green value. For instance, a data point representing an image pixel with an RGB value of (30, 96, 92) would be plotted at the point (30, 92) on the chart and have an RGB color of (0, 96, 0). My original attempt to do this suffered from some serious overplotting issues. To reduce, though not remove, this issue I made the points much smaller and took a random sample of "just" 20,000 points (just under 20% of the data) for each image. I also added the marginal distributions (i.e. the relevant histograms) at the unlabeled extremities of the plot. These were derived from all the image pixels, not just the sample. Now we can see a difference between the high- and low-quality JPEGs: the latter has much more pronounced diagonal bands of points and gaps. Examining the data, there are around 20,000 different RGB values in the ~105,000 pixels of the high-quality JPEG but only 15,000 in the low-quality JPEG. As for the PNG plot, that is an extreme example of overplotting. There really were 20,000 points plotted; they just occupy only a few hundred different positions.
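The pixel-to-point mapping described above is a one-liner. The original used R; here is the same idea as a Python sketch (a hypothetical `blue_vs_red` helper):

```python
def blue_vs_red(pixels):
    """Turn (R, G, B) pixel triples into ((x, y), color) pairs: each
    pixel is plotted at (R, B) and drawn in the color (0, G, 0), so
    the point's position shows red and blue while its color encodes
    the green channel."""
    return [((r, b), (0, g, 0)) for r, g, b in pixels]

# The example from the text: pixel (30, 96, 92) plots at (30, 92)
# with plotting color (0, 96, 0)
print(blue_vs_red([(30, 96, 92)]))  # [((30, 92), (0, 96, 0))]
```

Counting `len(set(pixels))` on the same triples gives the number of distinct RGB values, which is how the ~20,000 versus ~15,000 comparison between the two JPEGs would be computed.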
Ultimately you can probably get through life as, say, a web designer or developer without understanding the intricacies of image compression - I'm still not all that sure how JPEG compression actually works. But I think it's interesting to know the underlying data is there to be played with and used for constructing your own dataviz experiments.

Simplifying Visual Search for Presentations
https://www.infragistics.com/community/blogs/b/tim_brock/posts/simplifying-visual-search-for-presentations
Mon, 02 Nov 2015 11:26:00 GMT
Tim Brock

When giving presentations it's very tempting to try to squeeze as much information as possible into a short time slot of perhaps only ten or fifteen minutes. Practice certainly helps when it comes to matching the talk to the time slot, but we really should also consider whether the audience is likely to have had enough time to absorb all the information we've thrown at them. One of the difficulties is that this is quite a hard thing to measure. It's relatively easy to determine the speed of a moving vehicle, the speed of sound in air or the speed of light in a vacuum. But measuring the speed of comprehension or the speed of thought is a quite different task. Nevertheless, when it comes to chart presentation, we can still help out our audience by making visual search — literally the task of identifying a specific object in a visual environment that contains other distracting objects — easier. The basic principle is simple: make the thing you want the audience to focus on stand out. The easiest way to do this (at least without resorting to big "LOOK HERE" arrows) is through judicious use of color. First, an example without any color variation.
See how quickly you can find the one square in the field of circles and triangles. The chart below is the same except the square is now colored red and "pops out" at a glance: Simple, right? And you probably already know to highlight things of interest with more vibrant colors than the less interesting "distractors". But maybe it's not just the square data point that is of interest. Maybe you want to discuss all three point types at different times in the presentation. If the data were being presented in a book or newspaper you could just use three different colors of similar salience and, with the aid of a legend, the reader could take the time to work things out for themselves: This is space-efficient. But specific points no longer pop out, and the observer may have to switch from the data area to the legend and back several times - an inefficient visual search strategy. When you're more concerned with effective use of time than of space, it can make more sense to use multiple charts that all share the same basic structure but highlight different aspects of the data. When talking about the square data point, the visible slide should have a chart like the one above, with only the square data point colored. When you move on to discuss the circles, switch to a slide with those colored instead and moved to the front. And do the same thing for the triangles too: Creating multiple charts will obviously make the presentation preparation process a little more laborious, but it's worth the effort if it helps your audience understand what you're actually saying. It also adds more structure to the talk, so some of the additional time spent creating graphics may be offset by reduced time spent memorizing the spoken part. You can use the same simple "trick" with line charts too. The chart below shows some fictitious time-series data. This graphic is perhaps acceptable for a printed page or website.
But on a slide deck we can probably do better by matching the visuals to the audio again. If we're really only concerned with sales of apples (the other lines only providing context), give that line color and make the others thin and gray. As with the scatter plot, it may be that we're interested in all categories but still focusing on one at a time. We can fade those of less interest (at a given moment) by making them semi-transparent, and thus bring focus to the category of interest, like apples (again)... ... or pears... And so on. We can also compare two at a time with subtler distractions (though I think this is somewhat less effective): You can (and sometimes should) use the simple techniques above with your printed charts too, of course. But when your main constraint becomes temporal rather than spatial, as with a short presentation, it becomes more important. Match the focus of your chart to the focus of your speech at that time and make your audience's task of following along that bit easier. Don't just assume the chart you designed for the website or a printed article will "do" for a presentation. Take the time to adapt it for the situation at hand. The one (major?) drawback here is that the audience's viewing conditions for your presentation will likely not be as good as yours when you're sitting in front of your monitor. Because of this, you probably don't want to reduce the opacity of faded lines (and labels) quite as much as you can get away with when viewing on your own screen. If at all possible, test your slides out in advance in conditions similar to those in which you will give your presentation.
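The highlight-one-series-per-slide approach described above boils down to a tiny styling rule. A sketch (a hypothetical `series_styles` helper; the color strings are placeholder values, not anything from a specific charting library):

```python
def series_styles(series_names, focus,
                  highlight="#d62728", fade="rgba(128, 128, 128, 0.35)"):
    """Return a per-series color map that fully colors only the focused
    series and fades every other series to a semi-transparent gray, so
    the distractors read as context rather than competition."""
    return {name: (highlight if name == focus else fade)
            for name in series_names}

# One style map per slide: same chart and data each time, different focus
print(series_styles(["apples", "pears", "plums"], "apples"))
```

Generating one style map per slide from the same series list keeps the charts structurally identical, which is exactly what makes the changing highlight easy for the audience to track.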