Visualization Sprint

Global Water Experiment

Visualization Sprint: an experiment in collaborative data visualization

Coders, designers, and data experts: contribute to a collaborative effort in data visualization. Show off your skills, discuss technique, or dive into a new HTML5 toolkit by participating in our first "Visualization Sprint." One lucky contributor will be selected at random to win a pass to this year's sold-out Eyeo Festival, and all contributors will be credited on the final project.

This first Visualization Sprint was conceived in response to a massive citizen science project involving 75,000 students from 80 countries around the world who have been collecting samples of water purity, pH, and salinity for the past year. The project is called the Global Water Experiment and it's organized by UNESCO and IUPAC as a central activity of the International Year of Chemistry. In keeping with the spirit of how the data was collected, we've devised this Sprint as an experiment in collaborative data visualization. Our goal is to have a finished visualization of the students' data that we can share with the world on World Water Day (March 22).

Jan Willem Tulp, the winner of our Eyeo 2011 Challenge, has kicked things off with a sketch using d3.

Data: we've posted an XLS version of the data set for exploration. URLs for JSON versions of the data are available in the Resources box when you fork a version.

How to participate:
  • Everyone should check out the edits made so far, vote for which modifications are successful, and comment on the direction of the visualization
  • Coders are encouraged to fork an existing version and improve it, or submit a new "alpha" version to start a branch from scratch
  • Data scientists can analyze the data set and send us a revised version for others to use

Comments

De Datagraaf's picture

Okay, thanks again.

Edward Lee's picture

There are two missing versions (33, 34) because I created a couple test versions when fixing the page, which have since been unpublished. The versions' identifying numbers do not adjust, since we didn't want to confuse any references made to them in the comments.

De Datagraaf's picture

Shouldn't that be version number 33?

gabrielflorit's picture

jerome great use of color! :) :)

jerome cukier's picture

(mmm for my contribution 35 to work you have to click somewhere inside the vis first before you can use the keyboard)

De Datagraaf's picture

Thanks Edward, for that info.

And thanks, Jerome, for those dinosaurs.

Edward Lee's picture

@DeDatagraaf: No, I identified a couple things that were slowing down the sprint page and would get exponentially worse as more versions were added (recursive functions, unoptimized queries, etc.). Still waiting to iron out one of the bugs, but all the versions should be visible now.

De Datagraaf's picture

@Edward That sentence was supposed to end with a question mark.

De Datagraaf's picture

@Edward Something wrong with those graphs themselves_

Edward Lee's picture

I've had to unpublish a few of the most recent versions to get the page to load. Don't worry, they'll be back soon.

Edward Lee's picture

We should have the page working again shortly...

De Datagraaf's picture

@Fangoria I already changed that in my latest version.

Fangoria's picture

When selecting different water types and sources, the frequency plot does not change. It would be nice to see how these differences affect the frequency of the values.

De Datagraaf's picture

Fixed it.

De Datagraaf's picture

@Fangoria Ah, yes. I hadn't noticed. The easiest solution would be to have the divs pop up straight underneath the pointer. But maybe it's nicer to have each one show up in a position that depends on the position of the circle it goes with.

Regards,
wim

Fangoria's picture

When visiting the schools in Australia and other Far East countries, the information is only partially shown.

De Datagraaf's picture

Okay, in version 23 I have tried to indicate the safe/unsafe dichotomy in all graphs. I have used the color green for this. This is still a bit tentative, though, because I only use one scaling routine for all the graphs. In an ideal world every dataset would be scaled to have its own safety limits as found in the scientific literature.
But, at first glance, I would say the coloring is pretty well near the mark.
Alexbrey earlier mentioned 6.5 to 8.5 as being the safe limits for the pH-value. This is pretty much where the green area in the graph lies.
On the net I found that safe salt levels are to be found between 11 and 33mg (but I have to admit this is based on one short search).
And the less bleach needed for cleaning your water the better, I'd say.

Hope you find the results informative.

If anyone has good info on the safe levels of any of the variables, please let me know.

Regards,

wim

De Datagraaf's picture

One more thing on the temperature issue:

I do realize that temperature is represented in my forks as the radius of the circles. This stems from Jan Willem Tulp's first sketch, and I did not see enough reason to change anything about that.

Any version of the chart that represents temperature both as this radius ánd as a gradient is presenting redundant information. I think this is too much attention for data that is not very relevant if at all.

Regards,

wim

De Datagraaf's picture

@artzub First of all, thanks for your positive reply to my remarks.
I take it that Dutch 'directness' is not appreciated by everyone :).

I haven't worked with gradient maps just yet. So it would take me some time to be of any help in coming up with an alternative approach.
As I said in my response to alexbrey: I don't think either the temperature data or the country borders are very relevant to the pH data or its presentation.
I do think there are discernible pH gradients between all the points of measurement. And it would be nice to animate the circles into those gradients.

Regards,

wim

De Datagraaf's picture

@alexbrey Those safe/unsafe indicators are a good idea. They would make things a lot clearer. I'll see what I can do, or someone else may beat me to it.

And I think you are right about temperature not being very important or even misleading. The subject of the experiment is pH solely. Temperature adds no relevant information.

What would be nice though maybe, is an animation turning the pH point measurements into a gradient (and back)!

Again, with the gradient being attached to their sources and not tied to (also highly irrelevant) country borders.

Regards,

wim

artzub's picture

What I meant when creating the temperature gradient: I wanted to show how the temperature measurements are distributed across countries. That is, to show that in some countries measurements were taken at higher temperatures than in others. But there are countries, like the U.S., that have both low and high temperatures. Of course, I may have started moving in the wrong direction =)
Thanks for the tips, I'll try to make a more objective picture.

artzub's picture

@De Datagraaf Thanks for the tip, I'll try to make circles around the points of measurement, or something like that, but so that they do not go beyond the borders of the country.

The temperature colors were calculated from all the temperature measurements, but each country is colored using only its own values. Russia has one color because it has only one measurement.

artzub's picture

@de-datagraaf Well, I understand you. Indeed the gradient maps are not an objective picture. My idea is to paint a specific area on the map where the measurements were made, maybe even a plasma. But I still can't figure out how to do it.
The idea of a histogram is just super!

alexbrey's picture

Temperature depends on when the sample was taken (both time of day and time of year) so unless you're going to include this additional information it's misleading to include it for comparison. You could also include a "safe pH for drinking" bar on the histogram running from 6.5 to 8.5 (at least these numbers are according to the American EPA: http://water.epa.gov/drink/contaminants/index.cfm) -- or maybe an unsafe for drinking bar below 6.5 and above 8.5 would be clearer.
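As a rough illustration of the "safe pH for drinking" band suggested above, readings could be sorted into below, inside, and above the 6.5 to 8.5 range before a bar is drawn on the histogram. This is only a sketch: the function names are hypothetical, and the limits are simply the ones quoted in the comment, not values taken from the sprint's code.

```javascript
// Limits quoted in the comment above (attributed there to the American EPA).
const SAFE_PH_MIN = 6.5;
const SAFE_PH_MAX = 8.5;

// Hypothetical helper: label one pH reading relative to the safe band (inclusive).
function classifyPh(ph) {
  if (ph < SAFE_PH_MIN) return "below";
  if (ph > SAFE_PH_MAX) return "above";
  return "safe";
}

// Hypothetical helper: count how many readings fall below, inside, and above the band.
// Non-numeric readings are skipped rather than guessed at.
function summarizePh(readings) {
  const counts = { below: 0, safe: 0, above: 0 };
  for (const ph of readings) {
    if (!Number.isFinite(ph)) continue;
    counts[classifyPh(ph)] += 1;
  }
  return counts;
}
```

The same two constants could also position a shaded bar on the histogram's pH axis, so the band and the counts stay in agreement.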

De Datagraaf's picture

@artzub I have to admit I was so focused on the pH data and the histograms I misread your chart at first, thinking you also had used the pH data. Just to make sure: representing the temperature data as well is a good idea of course. But it shouldn't compete with the pH data for attention, which it does a bit. And I still think you can't color such large areas based on so few measurements.
I haven't looked at your code long enough to fathom all the details, so I couldn't do it myself, but I wonder whether it would also be possible to limit the radius of the gradients to much smaller circles around the measurement coordinates instead of tying them to the country outlines.

Regards,

wim

De Datagraaf's picture

I added the histograms myself. Now we get an overview of measurements. Loads of fine tuning (layout wise) still needed, of course. (But I always leave that kind of stuff till last.)

Oh, please bear in mind that there are still a couple of weird (probably erroneous) measurements in the salinity data (a pair of values that got switched) and the bleaching data (specifically the one result that mentions the unlikely number of 799(!) drops).
After correcting these errors here at home, the spread of the histograms looks fine. I can't make any corrections to the online datasets, so on the website especially the bleach histogram looks totally weird.

Regards,

wim
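One possible way to spot values like the 799 drops mentioned above before they distort a histogram is a simple interquartile-range check. The sketch below is an assumption about how that could be done, not something taken from any fork; the function names and the 1.5 multiplier are illustrative choices.

```javascript
// Linear-interpolation quantile on an already sorted array (q between 0 and 1).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Hypothetical helper: return the values lying more than k * IQR
// below the first quartile or above the third quartile.
function findOutliers(values, k = 1.5) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  return values.filter((v) => v < q1 - k * iqr || v > q3 + k * iqr);
}
```

Flagged values would still need a human look: the check only says a number is unusual, not that it is wrong.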

De Datagraaf's picture

@artzub Great colors. But what do they signify? Do they reflect the actual situation, or do they misinform us? To be honest, I don't think they are a proper reflection of the underlying data. Have enough measurements been done to warrant this kind of coloring?

For instance: Alaska is now colored, but there are no measurements from that area in the dataset. Similarly Canada, China and Russia are colored all the way, but this is based on a few (or one!) very localized measurements.
Or have a look at Australia: There is a gradient going from the center of the continent to the coastal regions. But do the measurements really give us enough information to conclude that there is a gradual increase or decrease (can't tell without a key) in pH from the center of the country towards the coastal regions? Since there are no measurements done in the central regions, we have no way of knowing.

Just to be sure: I am not trying to be negative here, just being critical (I have a background in scientific methodology, so this has sort of become second nature). And since this challenge is as much about the process as about the result, I think I have to be as frank as possible.
Hope you don't mind too much.

Regards,

wim

De Datagraaf's picture

What's with the bloody spam?

De Datagraaf's picture

@Jerome Cukier Hey Jerome, good to see you around! I liked your ripples so I nicked them for my version 15. I adapted them somewhat for a smoother animation. Hope you don't mind.

Regards,

wim

De Datagraaf's picture

@Fangoria I switched the colors.

De Datagraaf's picture

@Fangoria Thanks for the feedback. Switching those colors is easy, I'll do it in the next version.
I'm not so sure a national pH average is a meaningful measure. One would have to check the dataset to explore how the pH levels vary per country and between countries. It might be that there is a lot of variance in the data for one country - which would mean an average does not give useful information.

Regards,

wim
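To check whether a national average is meaningful, the mean could be reported together with the spread and the number of measurements per country. The sketch below assumes each row has `country` and `ph` fields; those names are guesses for illustration and may not match the actual JSON files.

```javascript
// Hypothetical helper: group rows by country and compute count, mean,
// and (population) standard deviation of the pH readings.
function phStatsByCountry(rows) {
  const groups = new Map();
  for (const { country, ph } of rows) {
    if (!groups.has(country)) groups.set(country, []);
    groups.get(country).push(ph);
  }
  const stats = [];
  for (const [country, values] of groups) {
    const n = values.length;
    const mean = values.reduce((sum, v) => sum + v, 0) / n;
    const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
    stats.push({ country, n, mean, sd: Math.sqrt(variance) });
  }
  return stats;
}
```

A country with a large `sd`, or with `n` equal to 1, would be a weak candidate for coloring by its average.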

Fangoria's picture

It would be nice to show individual countries in different colours, maybe by "national average pH"

Fangoria's picture

Low pH is typically shown in red and high in blue.
http://chemteacher.chemeddl.org/services/chemteacher/index.php?option=co...

De Datagraaf's picture

What is missing from the map is a graph that does some summing up. The map alone doesn't give any insight into the spread of the data. Something like a histogram would be a great addition.
There is ample room (left bottom corner for instance) to put one in.
Any volunteers? :)

De Datagraaf's picture

Ps: there is also a lot to do in the transition department...

De Datagraaf's picture

@All
Well, I added a version that incorporates 3 of the 4 datafiles available (see the fork for why the 4th one isn't in there yet).
I used a couple of really rough and ready techniques to swap the data (for instance: all circles are removed and re-appended at switch over instead of updated, the same goes for the gradients - this works, but it is not very elegant).
The color scales are very much working versions - there is surely room for improvement there.

Also, there are a couple of things that aren't as they should be in the datasets. First of all "salinity" is "sanity" in dataset 2. This is a bit awkward.
Datasets 2 and 3 contain a couple of weird scores that stand out immediately. I left them unchanged so as not to confuse things. But we should definitely look into this. See my remarks in the code for more details. Of course there could be more, less obvious, mistakes in the data.

For any other loose ends and to-do's please see my remarks in the code.

Okay, have fun with it. And enjoy the remainder of your weekend!

Regards,

wim

De Datagraaf's picture

@Bernardo Santos You are right about the need to take the color blind into account (red/green color blindness being the most common form). So long as there is only one scale, this should not be a big problem. (My choice of red and blue is by no means meant as a final one, of course.) But there are a couple more variables in the project (see the other data files) that also need a color scale, so some sort of coherent scheme needs to be defined for all of these. This could be a bit trickier.

I like the addition of a white center color for this scale, btw.
Regards,

wim

Bernardo Santos's picture

"doubt that i'll be anywhere close to being able to implement the method proposed in the last link in d3, before this sprint is over".

;)

Bernardo Santos's picture

hello, haven't seen mentioned in the comments i've read. maybe at least one of them is planned for down the road, but still, here's my 2cts:

adjusting for colour-blindness (@ least as an option):
http://vis4.net/blog/posts/goodbye-redgreen-scales/?piwik_campaign=rss&p...
http://www.coloradd.net/index.asp >> i've met Miguel, I'm sure he'd be interested in helping, i can follow this lead and come back to you folks.

grouping overlapped nodes @ different zoom levels:
http://vis4.net/blog/posts/clean-your-symbol-maps/
by clicking on an oversized node (which can be visually coded as such), the visitor gets a list of all the different nodes that would otherwise be overlapping. i suggest doing this per zoom level, i.e., getting the distances between nodes every time zoom is changed, and acting accordingly, per case and threshold (see linked article).

i'd love to help with this, but i've limited skills with Processing and while very interested in d3, doubt that i'll be anywhere close to being able to implement the method proposed in the last link in d3.

rooting for this, it's looking good,
Bernardo, Portugal
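The per-zoom grouping described above could be approximated with a simple greedy pass over the nodes each time the zoom changes. This is not the method from the linked article, just a minimal sketch under stated assumptions: nodes carry unscaled `x` and `y` map coordinates, screen position is that coordinate multiplied by the zoom factor, and the function name and pixel threshold are hypothetical.

```javascript
// Hypothetical helper: merge nodes whose on-screen distance to an existing
// group's first member is below minDistancePx at the given zoom factor.
function groupOverlappingNodes(nodes, zoom, minDistancePx) {
  const groups = [];
  for (const node of nodes) {
    const sx = node.x * zoom; // assumed screen position = map coordinate * zoom
    const sy = node.y * zoom;
    const hit = groups.find(
      (g) => Math.hypot(g.sx - sx, g.sy - sy) < minDistancePx
    );
    if (hit) {
      hit.members.push(node);
    } else {
      groups.push({ sx, sy, members: [node] });
    }
  }
  return groups;
}
```

A group with more than one member could then be drawn as a single oversized node, with its `members` listed on click. Because the result depends on the order of the nodes, it is only a rough stand-in for a proper clustering routine.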

De Datagraaf's picture

@Jonathan Schwabish: School and country names added to the tooltips. But the layout is far from perfect yet.

rltrafael's picture

@De Datagraaf It would be easier if the datasets had the same 3-letter id's as the ones in the world countries file. But I don't know if we have any means to "unify" this data and use an improved .json with that country id. Anyone know anything about this possibility?

Jonathan Schwabish's picture

Nice work so far everyone. Could we pack more information into the menus above each point? Maybe the name of the country and the source school?

De Datagraaf's picture

Aha, in dataset 2, where it says "sanity" it should read "salinity"...
That makes a lot more sense.

De Datagraaf's picture

Okay, found it. Sorry for asking for stuff that was two clicks away. Time for bed, I guess... :)
Anyway, still eager to discuss the data with the other participants!

De Datagraaf's picture

Sorry to barge in like this, but I was wondering whether there is more background info about the exact nature of the data?
There are a couple of questions that spring to mind when I look at the data:
What are the research hypotheses and/or questions?
Why are there no times and dates mentioned?
What's it with all the different sources? Can we just chuck all this data into one graph?
What exactly is the definition of "sanity", "sanity by weight", and "stillefficiency"? What are these "drops of bleach" doing in there (I presume it is a test of some kind)?
There seem to be four separate measurements, do we need to combine the results in some meaningful way?
Etcetera.

Regards,

wim

De Datagraaf's picture

@rltrafael It would be a lot easier if the datasets had the same 3-letter id's as the ones in the world countries file. I think they are official UN abbreviations, so they should maybe have been in there already. :)

Regards,
wim

rltrafael's picture

There is one thing I noticed about the countries data: the data in gwe_experiment1.json have different country values than the ones in world-countries.json. For example, in gwe_experiment1.json, United States of America is sometimes called United or States. Maybe it even has more names. So, to work properly, we need to have a more consistent data set.
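One way to reconcile the two files would be a small alias table that maps each spelling found in the experiment data to a 3-letter country id. The sketch below is hypothetical: the table holds only the spellings mentioned in this comment, the function name is made up, and the real files would need to be inspected to build a complete list.

```javascript
// Hypothetical alias table: lower-cased spellings seen in the experiment data
// mapped to 3-letter ids. "USA" is the ISO 3166-1 alpha-3 code for the United States.
// Caution: mapping the bare word "united" to USA is an assumption based on the
// comment above; it would be wrong for rows that meant another "United ..." country.
const COUNTRY_ALIASES = new Map([
  ["united states of america", "USA"],
  ["united", "USA"],
  ["states", "USA"],
]);

// Returns the 3-letter id for a raw country value, or null when the
// spelling is unknown so it can be reviewed by hand.
function countryId(rawName) {
  const key = String(rawName).trim().toLowerCase();
  return COUNTRY_ALIASES.has(key) ? COUNTRY_ALIASES.get(key) : null;
}
```

Returning null for unknown spellings, rather than guessing, makes it easy to list every value that still needs an entry in the table.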

rltrafael's picture

@Mahir You are right. I can't look into it right now, but I will later. Also, what do you mean by "filter with zoom function would be better."?

Mahir's picture

@rltrafael I believe something is wrong in your code. When I choose USA, nothing is displayed.

@Rnhatch I agree, filter with zoom function would be better.

Rnhatch's picture

Having a country filter without the map zoom controls is not a very effective experience. I know the mouse wheel controls zoom in this Geo library - but the controls would be a nice affordance.

While we are talking features - It would be really elegant to add the coordinates for each country and have the map recenter on selection of the country item from the drop down box. (says the non coder in the room..) Otherwise great work here...

How It Works

A "visualization sprint" is a new collaborative design and coding experiment where contributors work together to find the best solution, or solutions, for visualizing a data set. Taking cues from citizen science, open source culture and code sprints, Visualizing.org has conceived of this format in order to promote community dialog, learning, and feedback about visualization techniques.

Starting from an initial sketch written in a popular scripting language, anyone is invited to "fork" the code — adding to or modifying an existing version and then posting it back to the sprint. Any version can be forked, and new "alpha" sketches can also be submitted to start a new branch from scratch.

  1. Check out the project prompt and data set at the top of the page, then look at the initial sketch. What are we trying to visualize here?
  2. Explore other contributions: What are the branches the design has taken? Where does the visualization still need work?
  3. Add your voice to the discussion by voting versions up or down and commenting as you explore.
  4. Choose a version to modify and click the "Fork This Version" button.
  5. Work on the code, testing as you go along. When you're ready, commit your contribution along with a short description.
  6. Repeat from step 2!

More Info

Is this project only for programmers?
Sort of. The purpose of this sprint is to explore and discuss techniques in interactive visualization, and it uses simple scripting languages so that we can easily see and modify the code. If you're not a coder we encourage you to participate by voting on the versions, adding your voice to the comments, analyzing the data set, or even sketching visual design directions that others could help realize.

What resources are available for my version?
When you fork a version, the top of the edit form will have a list of libraries and data sets you can use. If you need help figuring this out, just send us an email.

Can I transform the data set?
Yes! You are encouraged to transform and analyze the data set. If you modify the data in a useful way, send us the updated file so we can share it with everyone working on the sprint.

How does the sprint end?
The focus here is on process, but we also want to have a nice visualization (or several) to present on World Water Day. Hopefully, the consensus of all participants will bring the most promising branches to a state polished enough to be called "final."

Is there a prize?
Because this is a collaborative effort, no "winner" will be chosen. As a bonus for participating, however, we will give away a prize to one participant chosen at random: a pass to this year's sold-out Eyeo Festival.