I. Description of the Data Set

This is a lightweight version of the popular DDSM (Digital Database for Screening Mammography) data set, which is now obsolete. Why Mini-DDSM? The DDSM database has a website maintained at the University of South Florida to keep it accessible on the web. However, its image files are compressed with lossless JPEG (i.e., “.LJPEG”) encoding and were generated using broken software (or at least an outdated tool, as described on the DDSM website). CBIS-DDSM provides an alternative host for the original DDSM, but unfortunately its images are stripped of their original identification filenames and of the age attribute. It took a tremendous amount of time, coding, and machine processing power to get the data into shape. Below are some of the merits of this new Mini-DDSM version:


1. The intention is to provide easy access to the DDSM (at half resolution, though).
2. The data set comes with the age and density attributes, patient folders, original identification filenames, and lesion binary masks.
3. No complications from extracting or loading images from tfrecords. You want images, you get images! So whether you are using Python, MATLAB, Java, or C++, you have the images stored as images.
4. The lesion binary masks are constructed from the original Freeman chain coding, so this data set spares you that inconvenience.
5. This data set comes with an Excel sheet that gives you direct access to all image attributes and metadata; get the link below.
6. Free of charge and open access: no lengthy protocols and no forms to fill in or sign.
7. If your machine or internet bandwidth does not allow you to download the 45 GB of data, no problem: let me know and I can convert it to JPEG at the resolution you want.
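
For readers curious about what item 4 spares them, the following is a minimal Python sketch of how a Freeman chain code can be turned into a filled binary mask. It is an illustration only, not the code used to build Mini-DDSM. The function names `decode_chain` and `boundary_to_mask` are hypothetical, and the direction table (code 0 pointing up, then clockwise, with y growing downward) is an assumption that should be checked against the DDSM overlay documentation before use on real data.

```python
from collections import deque

# ASSUMPTION: 8-direction table as (dx, dy) with y pointing down.
# Code 0 is "up" and codes increase clockwise. Verify this ordering
# against the DDSM overlay documentation before relying on it.
DIRECTIONS = [(0, -1), (1, -1), (1, 0), (1, 1),
              (0, 1), (-1, 1), (-1, 0), (-1, -1)]


def decode_chain(start_x, start_y, codes, directions=DIRECTIONS):
    """Walk the chain code from the start pixel and return boundary points."""
    x, y = start_x, start_y
    points = [(x, y)]
    for code in codes:
        dx, dy = directions[code]
        x, y = x + dx, y + dy
        points.append((x, y))
    return points


def boundary_to_mask(points, width, height):
    """Rasterize a closed boundary and fill its interior (1 = lesion)."""
    mask = [[0] * width for _ in range(height)]
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            mask[y][x] = 1

    # Flood-fill the background on a grid padded by one pixel on each side,
    # using 4-connectivity so the fill cannot leak through diagonal steps
    # of the 8-connected boundary.
    outside = [[False] * (width + 2) for _ in range(height + 2)]
    outside[0][0] = True
    queue = deque([(0, 0)])
    while queue:
        px, py = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = px + dx, py + dy
            if not (0 <= nx < width + 2 and 0 <= ny < height + 2):
                continue
            if outside[ny][nx]:
                continue
            mx, my = nx - 1, ny - 1  # padded cell -> mask cell
            if 0 <= mx < width and 0 <= my < height and mask[my][mx] == 1:
                continue  # boundary pixel blocks the fill
            outside[ny][nx] = True
            queue.append((nx, ny))

    # Everything the background fill could not reach is boundary or interior.
    for y in range(height):
        for x in range(width):
            if not outside[y + 1][x + 1]:
                mask[y][x] = 1
    return mask


if __name__ == "__main__":
    # A 4x4 square traced right, down, left, up from pixel (1, 1).
    square = [2, 2, 2, 4, 4, 4, 6, 6, 6, 0, 0, 0]
    demo = boundary_to_mask(decode_chain(1, 1, square), 6, 6)
    for row in demo:
        print("".join(str(v) for v in row))
```

The sketch uses plain lists so it runs without extra libraries; on full-size mammograms an array library would be a more practical choice.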




Figure 1. Age and Density Data Distribution in the Mini-DDSM.

Below is the Excel file accompanying the mammography image data set (the images themselves can be downloaded from Kaggle because of their large size).