Yesterday the Washington Post ran an article on the rapid progress scientists are making in understanding the coronavirus and developing possible treatments for it. The article describes how making data (in this case genomic data) available to the world has led to spontaneous research all over that has already mapped the coronavirus genome and moved on to possible drugs to treat it.

“The pace is unmatched,” said Karla Satchell, a professor of microbiology-immunology at Northwestern University Feinberg School of Medicine. “This is really new. Lots of people [in science] still try to hide what they’re doing, don’t want to talk about what they’re doing, and everybody out there is like: This is the case where we don’t worry about egos, we don’t worry about who’s first, we just care about solving the problem. The information flow has been really fast.”

Implicit in Dr. Satchell’s quote is that hiding data, whether out of ego or territoriality, impedes progress. It’s great that open data is being used for this global crisis. Restricted data is still a very large problem within aerospace companies, where data sharing practices are straight from the 20th century instead of the 21st:

20th Century Data Sharing

  • Data is shared graphically only, as plots in PowerPoint slides that are used in meetings and then stored randomly on a shared drive.
  • Data is shared graphically in a report that is released many months later, after someone gets around to writing it and someone else gets around to approving it.
  • Data is shared in the disorganized Excel sheet where it was recorded.

Modern Data Sharing

  • Data is shared quickly after it is collected, in text files with an agreed-upon schema.
  • Rather than a single final report at the end, current thoughts on the meaning of the data are shared via short web posts that are easy to modify as thinking changes.
  • Data is processed in notebooks where the steps from raw data to final post-processed data can be well documented.
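
To make the first point concrete, here is a minimal sketch of what "text files with an agreed-upon schema" could look like in practice. It assumes the data is shared as CSV, and the column names (`test_id`, `time_s`, `pressure_kpa`) and the helper `read_shared_data` are hypothetical, invented purely for illustration. The idea is that anyone receiving the file can check it against the schema before using it.

```python
import csv
import io

# Hypothetical agreed-upon schema: column name -> converter for that column.
SCHEMA = {
    "test_id": str,
    "time_s": float,
    "pressure_kpa": float,
}


def read_shared_data(text):
    """Parse CSV text, checking the header and converting each field per SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != list(SCHEMA):
        raise ValueError(
            f"expected columns {list(SCHEMA)}, got {reader.fieldnames}"
        )
    rows = []
    # Data rows start on line 2 of the file (line 1 is the header).
    for line_no, row in enumerate(reader, start=2):
        try:
            rows.append(
                {name: convert(row[name]) for name, convert in SCHEMA.items()}
            )
        except (TypeError, ValueError) as exc:
            raise ValueError(f"line {line_no}: {exc}") from exc
    return rows


# Made-up sample values, only to show the file layout.
sample = "test_id,time_s,pressure_kpa\nrun-001,0.0,101.3\nrun-001,0.5,99.8\n"
records = read_shared_data(sample)
print(records[1]["pressure_kpa"])
```

The same function could sit in the first cell of a notebook, so that the path from raw file to processed data starts with a documented check rather than a guess about what the columns mean.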