General notes on imaging

Working with medical images is more demanding than everyday computing tasks. The processes are frequently complex and involve multiple steps, and the data may be very large or spread across very many files. Here are some notes that may help you in the hands-on imaging workshop and beyond...

Rule 1. Expect to lose your data.

Disks will fail. You will delete your results by mistake. You will lose that one USB drive with everything on it. With good procedures, these will be inconveniences, not disasters.

If you don't know where everything is, it's already lost

Keep all your data in one place. If there is one top-level directory that contains everything, you know where to start looking. So be careful to save files systematically when you receive them. A good practice is to always use 'Save As' rather than 'Save': this gives you a chance to check the destination location.

Your personal input is even more valuable than the data, which usually came from another source and can be retrieved. Again, keep everything in one place.

Develop and maintain a consistent naming standard

A systematic naming convention makes it much easier to keep track of files and allows for future growth. Some suggestions:

  • Store data in levels of directories that reflect the 'ownership' of the data. For example, a research study has several subjects, each of whom has several scans; each scan has several series, and each series has multiple images. A directory structure to model this relationship would start with the study directories at the top, each containing a directory for each subject, which in turn contain directories for each scan and then each series. In this way the path to any given file shows, for example, which study and subject a particular scan belongs to.
  • Similarly, file names can be made of various components that encode information about a scan. For example, the patient code and the date and time of a scan, when combined, unambiguously identify that scan.
  • It is better to stick to 'facts' when choosing file names. 'Facts' include the subject code and the date and time. Choosing other name parts can lead to confusion: 'scan2' might be obvious at the time, but not several years later. Similarly, the words 'new' and 'old' quickly lose their meaning. Dates and other factual data may seem less convenient at the time, but they have the property of correctness, which does not change.
  • When combining parts of a file name in this manner, put the parts in order of increasing rate of change. In the above example, the order would be patient - date - time. Files sorted by name are then grouped by the slowest-changing component (by subject, then date, then time). For example: subj010_140502_103000 (see the sketch after this list).
  • Dates, if stored as YYMMDD (year, month, day) or YYYYMMDD, will sort in chronological order when listed alphabetically (the default).
  • When using numeric values in a file or directory name, pad the number with leading zeroes to the maximum number of digits you think you will need. In this way, files sorted alphabetically will also be sorted numerically: file02 will sort before file10, whereas file2 would sort after file10.
  • When using an ordinal value, starting at 0 means that each group of ten differs only in its last digit. This makes it easier to identify subgroups based on their digits: file00 ... file09 groups more easily than file01 ... file10, and much more easily than file1 ... file10.
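
As a minimal sketch of these suggestions in Python (the 'subj' prefix and the field widths are only examples, not a standard):

    from datetime import datetime

    def scan_name(subject: int, scanned: datetime) -> str:
        """Build a file name from 'facts', ordered from the
        slowest-changing component to the fastest-changing."""
        return "subj{:03d}_{}_{}".format(
            subject,                        # zero-padded ordinal
            scanned.strftime("%y%m%d"),     # YYMMDD sorts chronologically
            scanned.strftime("%H%M%S"))

    print(scan_name(10, datetime(2014, 5, 2, 10, 30)))
    # subj010_140502_103000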


Identify your actual problem

Think about the problem you're trying to solve. Does your solution address the problem, or avoid it? If your problem is converting image files, why is this necessary? Conversion is a lot of work, makes it complicated to keep track of your original data, and can easily introduce error. Perhaps there is another program available that performs the same task and can handle the format of the input data.

Find and use the right tool for the job

There is a free program to address almost any imaging need. While it might not be the perfect or final solution, using a program designed for the task at hand will usually save considerable time. Try not to stick only with the program you are familiar with, and don't treat everything as a job to be done with that one tool.

Don't view the world through 'X'-shaped glasses

This means, don't view every problem as one to be solved with program X, the one you are most familiar with. If you are a Matlab expert, you may see the world through Matlab-shaped glasses, and every problem is something to be solved with Matlab. There is quite likely an imaging program that will perform the task you need.

Try not to tolerate performing repetitive tasks by hand.

Manual tasks scale linearly: ten times the work takes ten times the effort. Automated tasks scale almost without limit, with no further effort required from you. If you know you will be performing the same task many times, look for a way to automate it. This may mean finding different software to perform the task, or asking someone with technical knowledge for assistance. Computers are there to do things for you.
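
As an illustration, a few lines of Python can apply the same step to every file in a study; the convert_image command and the study01 directory here are hypothetical stand-ins for whatever tool and data you actually have:

    from pathlib import Path
    import subprocess

    # Apply one conversion command to every DICOM file in the study,
    # instead of opening each file by hand.
    for dicom in sorted(Path("study01").rglob("*.dcm")):
        subprocess.run(
            ["convert_image", str(dicom), "-o", str(dicom.with_suffix(".nii"))],
            check=True)   # stop immediately if any file fails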

Test your software on the smallest possible case

When you're developing an imaging process, don't test it on your entire data set. Find the smallest case (one file, if possible), and test that case exhaustively. Then gradually add a few more test cases, especially the ones that cause problems: the first, the last, the biggest, the smallest. Once you're satisfied that you can handle the 'edge' cases, process your data set.
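
One way to pick those test cases automatically, sketched in Python (assuming a non-empty, hypothetical study01 directory):

    from pathlib import Path

    files = sorted(Path("study01").rglob("*.dcm"))
    by_size = sorted(files, key=lambda f: f.stat().st_size)

    # The edge cases: the first, the last, the smallest, and the biggest.
    for f in {files[0], files[-1], by_size[0], by_size[-1]}:
        print(f, f.stat().st_size, "bytes")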

Put aside some time for 'computer stuff'

If you spend 10-20% of your time on the computer itself, in tasks related to computing rather than your field, you will recover this effort many times over through increased efficiency. This time could be spent analyzing and improving your workflow, implementing a proper backup strategy, or finding and learning new software designed for your task.

It is worthwhile learning some skills on the command line and, consequently, the scripting of command-line processes. This opens the door to almost infinite scaling of your effort. These skills can be used identically on all common computing platforms, and are not subject to technological obsolescence: most Linux-like command-line utilities have been unchanged for decades and will remain so for decades to come.
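
For instance, a script can drive those long-stable utilities directly; here is a minimal sketch in Python, wrapping the classic `find ... | wc -l` one-liner so it can be repeated and extended (the study01 directory is again hypothetical):

    import subprocess

    # Count the DICOM files under study01, using the standard find utility.
    result = subprocess.run(
        ["find", "study01", "-name", "*.dcm"],
        capture_output=True, text=True, check=True)
    print(len(result.stdout.splitlines()), "DICOM files")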

Work on the process, and make it repeatable

Most image processing consists of several steps. Analyze these steps, and think about whether you are actually adding information to the system, or just manually performing work. In imaging, examples of the former include using your knowledge of the problem space to make a decision or perform a skilled task: judging if a segmentation is correct, or identifying anatomic regions. Examples of the latter include copying data around, converting images, or running a sequence of programs.

If a step in a process results in no information gain in the system, then by definition, it can be automated.
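
A sketch of what that looks like in code: each zero-information step becomes a function in a repeatable chain. The step bodies below are stubs standing in for real tools; only judging the result remains a manual, skilled task:

    from pathlib import Path

    # Stub steps: in practice each would call a conversion or analysis tool.
    def convert(path: Path) -> Path:
        return path.with_suffix(".nii")     # e.g. DICOM to NIfTI

    def segment(path: Path) -> Path:
        return path.with_suffix(".mask")    # e.g. run a segmentation tool

    def process(path: Path) -> Path:
        for step in (convert, segment):     # no information gain: automate
            path = step(path)
        return path                         # judging the output stays manual

    print(process(Path("subj010_140502_103000.dcm")))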

Identify what is important, and protect only this

What is important is anything containing your direct input: files you have created. In general computing this means your documents, diagrams, and illustrations. In imaging it means anything you have developed for image processing: source code, scripts, notes. These are very small files and can be thoroughly protected. Things that you may not need to protect as thoroughly include:

  • Original image files. Unless you created them yourself, they probably came from a PACS or other imaging source, and have already been backed up.
  • Processed image files. These are the product of the original files and your processing. If you have made your processing repeatable (or, better still, automated), the result files can easily be regenerated, and you will know exactly where they came from and what they contain.

Keep one copy only, and back it up properly

'Taking a copy' is not a backup. A proper backup strategy has several qualities:

  • It happens automatically. If you have to do it yourself, it won't get done.
  • Regular incremental backups protect from accidental deletion of a file.
  • Less-frequent full backups protect from complete loss of a disk.

If you are careful to separate the small quantity of files directly containing your input from the large quantity of image files, you can perform thorough backups on the single copy of your valuable files. Services exist to help you with this:

  • Dropbox. Even the free account (2 GB) will protect all your important files, if you are selective. This provides incremental coverage, and also synchronizes the files between all your computers. Since it is a remote location, you can't lose the backup device.
  • Time Machine (Mac), or a similar built-in real-time backup program, will give you a second level of incremental coverage. Better still, many of these programs can be configured to use free online storage (from Google, Microsoft, and many others), so you don't have the backup copies in the same location as your originals.
  • A regular full backup (monthly) guards against total loss of your disk. A large external drive will allow you to back up everything, even your image data. Most backup software can maintain a full image of your disk without having to copy all the contents every time, saving time and space. It's a good idea to keep this in a different physical location from your computer.

In the event of a total loss, restoring the monthly backup gives you a known starting point upon which to restore the incremental backups.

If you work with text files, particularly scripts and computer code, you need to use some form of version control such as Git or Mercurial. A version control system allows you to commit your files to storage as a named timepoint, and later on, revert to that timepoint. You can also branch off to develop or test a new feature without the risk of introducing errors to your main branch, and when the feature is complete, merge the new feature into the main branch. Version control also allows you to contribute to someone else's code, or accept contributions, in a safe and reversible manner.
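
For example, a minimal Git session might look like the following (the file and branch names are only illustrative; Mercurial has close equivalents):

    git init                          # start tracking this directory
    git add pipeline.py               # stage a script you have written
    git commit -m "working pipeline"  # record a named timepoint
    git checkout -b volume-feature    # branch off to develop safely
    # ...edit, test, and commit on the branch...
    git checkout main                 # return to the main branch
    git merge volume-feature          # fold the finished feature back in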

Expect your computer to fail. Expect to make mistakes.

The only promise in computing is that hardware will fail (and, more frequently, that data will be lost through mistakes). With the right preparation and procedures, hardware failure is an inconvenience, not a disaster. Getting your processed data back consists of a few steps: restore your processes from backup, retrieve the original data from its source, and re-run the processing. There, you have your results back.

The Golden Rule

Never forget Rule 1