Large File Makeover


About the Large File Makeover

The goal of this makeover was to transform a very large file submitted by Air Washington into a set of submissions that would be more manageable, transparent, and easy to adopt.

Air Washington

Submission title: Electronics/Avionics
Submission link: https://www.skillscommons.org/handle/taaccct/398
Description: a zip file of almost 900MB

Overview

SkillsCommons contains a pair of Collections for each Grantee: Learning Resources and Program Support Materials.  Each Collection can contain any number of submissions.  Each submission has its own title, description, and metadata such as the material type, industry sector, language, quality metrics, etc.  A submission can contain one or more files.  The view of a submission includes some of the metadata as well as the list of files and their sizes.  Each file can be downloaded separately.  Metadata is indexed for searching; the content of the files is not indexed.  This means that a file’s purpose and content need to be described in the submission’s metadata if the file is to be found by a search.  Aggregating many files into a single zip file submission limits the utility of the metadata, since it has to cover everything in the file.

The original Air Washington submission collected a variety of content comprising several courses into a single large file.  At almost 900MB, this file might take an unacceptably long time to download, particularly on a slow connection.  Further, this large size might discourage someone who wants to just look at the material and is wary of downloading something so big.  Since the material covers a range of topics, a single description and set of metadata might not provide the detail needed to easily identify what is in the package and whether anything is relevant to a search.

Large files are not necessarily unacceptable.  For example, let’s say a grantee produces several long videos.  These files might be quite large.  Rather than have users download these files locally, a simpler solution is to host the files, perhaps on a free public service such as YouTube.  The original file could be available in the repository in case a user wants to edit the video, with the description providing the link to the hosted version for users who just want to view the video.

Covering more than one course’s worth of material in a single submission is problematic because the content and value of each separate course are obscured when multiple courses are bundled together.  If the courses belong together as a set, perhaps as a sequence, there is more value in providing a curriculum path as a separate document in each submission.  This allows a user to note that there are other courses to consider, but allows each course to be viewed on its own.  Similarly, if a course contains a number of topics that can stand alone but should be presented in a particular sequence, it may be easier to discover and adopt material if each topic stands alone and there is a course syllabus or map.  Another option is to offer each topic’s material and a single organization of all the material, perhaps as a course export in IMS Common Cartridge format.  This offers the best of both, namely fine-grained material as well as the organization.

Guidance for Future Makeovers

The section on the makeover process goes into some detail about what was in the original submission, what was reorganized, and why.  Here are some of the lessons learned that can be applied to other large submissions:

  • A submission can contain one or more files.  Each file appears in a list below the submission title and description.  Each file includes the title and size.  Place individual files in their own submission or as files that are part of a submission.  This is preferable to zipping files into a single large submission.  If there are files that are only useful as a set, those can be zipped into a single submission or submission file, assuming the file size is, say, less than 20MB.
  • Take advantage of the distinct description field and other metadata offered for each submission.  This will aid discovery, since submission metadata is indexed for searching.  Zipping files together as a single submission can hide the individual files’ content, making discovery and reuse less likely.
  • For video, make a dedicated submission or a separate file in a multi-file submission. Also consider hosting a copy of the video on YouTube or elsewhere so that the file can be referenced by link rather than downloaded.  The download would only be for archiving and for users who want the original video, perhaps to edit it.
  • For a single large file that might stand alone, make a separate submission.  For example, if there is a set of labs as well as a large lab instructor’s manual, provide the labs and the manual as individual files in a submission rather than zipping everything together.  Someone might want the labs or the manual, but not both.  This approach also goes for not mixing different content types – separate submissions have separate metadata and a user might be searching only for a specific media type.  This approach applies to almost any file that has distinct metadata.
  • If you are going to include a file in multiple formats such as one for editing (e.g. Microsoft Word) and one for viewing (PDF), be consistent in providing each format.  Provide each format as a separate file in a submission rather than zipping the files together.
  • Remove any duplicate files, temporary files, and the like from submissions.
  • Provide a separate submission for any content that is quite general and likely to be used broadly.  Similarly, separate content that is included for completeness, but is likely available in a richer form in another project.  Common examples include safety material and foundation subjects such as reading, writing, and math.
  • Help improve discovery by naming files descriptively rather than obscurely.  For example, don’t name a file for the course number at your school (e.g. Elect 247) when the subject (e.g. Microwave-CATV-Satellite Communications Lab Outline) is more indicative of the content.
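
Several of the checks above (temporary files, duplicate files, and files over a size guideline) can be automated before preparing submissions. The following is a minimal sketch, not part of the original makeover: it assumes the zip file has already been expanded to a local folder, treats any filename starting with `~` as temporary, treats byte-identical files as duplicates, and uses the 20MB figure from the first bullet as an adjustable threshold. The function names are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# The "say, less than 20MB" guideline from the list above; adjust as needed.
SIZE_LIMIT_BYTES = 20 * 1024 * 1024


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_folder(root: Path) -> dict:
    """Flag temporary files, byte-identical duplicates, and oversized files."""
    temporary, oversized = [], []
    by_digest = defaultdict(list)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.name.startswith("~"):
            # Assumption: a leading "~" marks a temporary Word file.
            temporary.append(path)
            continue
        if path.stat().st_size > SIZE_LIMIT_BYTES:
            oversized.append(path)
        by_digest[file_digest(path)].append(path)
    duplicates = [group for group in by_digest.values() if len(group) > 1]
    return {"temporary": temporary, "duplicates": duplicates, "oversized": oversized}
```

Calling `audit_folder(Path("expanded_submission"))` (a hypothetical folder name) returns three lists to review by hand. Hashing only catches exact copies; files that were re-saved or re-encoded would still need a manual comparison.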

The Makeover Process

The first step was to download the two original submissions, one at 897MB and a Part 2 at 16MB. Expanding the files yielded 6 high-level content areas and 5 folders marked as Full Courses.  Here is how each content area broke out:

AC (Alternating Current Labs)

  • This folder contained 8 documents in Microsoft Word format, the same content in PDF files, and a single PDF file containing all the material.  Having files in both an edit format (Word, for those who have it) and a view format (PDF) is a nice convenience.  There are a couple of choices for how to handle this situation: group all the files as a submission (one for PDF and one for Word), group each file-pair as a submission (one for the PDF and one for the Word format), or make a submission for every file (might be a bit lengthy).  The makeover resulted in three separate files for the Alternating Current Labs submission: one containing all the Word documents, one containing all the PDF files, and one containing a PDF of all the content.
  • Inconsistencies can puzzle the user.  For example, there are also PDF files titled “Electromagnitism” and “Electromagnitism II” without a corresponding DOC file.
  • There were also two temporary Microsoft Word files (their names begin with a ~), which were not brought forward in the makeover; there is no need to include that kind of file.
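
The consistency check described above, where each document should have both an edit format and a view format, can be sketched as follows. This is an illustrative helper rather than anything used in the makeover; it assumes that a Word file and its PDF counterpart share the same base filename, and the function name and suffix sets are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

EDIT_SUFFIXES = {".doc", ".docx"}  # editable Word formats
VIEW_SUFFIXES = {".pdf"}           # view-only format


def check_format_pairs(filenames):
    """Group names by stem and report which lack an edit or a view format."""
    suffixes_by_stem = defaultdict(set)
    for name in filenames:
        path = Path(name)
        suffixes_by_stem[path.stem].add(path.suffix.lower())
    # Stems that have a PDF but no Word file.
    missing_edit = sorted(stem for stem, found in suffixes_by_stem.items()
                          if found & VIEW_SUFFIXES and not found & EDIT_SUFFIXES)
    # Stems that have a Word file but no PDF.
    missing_view = sorted(stem for stem, found in suffixes_by_stem.items()
                          if found & EDIT_SUFFIXES and not found & VIEW_SUFFIXES)
    return missing_edit, missing_view
```

For instance, given a hypothetical listing of `["Lab 1.doc", "Lab 1.pdf", "Electromagnitism.pdf"]`, the first returned list would contain `"Electromagnitism"`, signalling a PDF without an editable counterpart.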

Avionics

  • This folder contained 11 content areas, each of which contained multiple files.  Under the makeover, each area became its own multi-file submission, also allowing more description per submission.
  • Two files, one on First Aid and one on basic math, were teased out as separate submissions.  These can stand alone and are also candidates for substitution with other materials in SkillsCommons.

DC (Direct Current)

  • This folder contained 11 movies, 5 of which were duplicates of others.  The makeover uploaded the 6 distinct movies to YouTube, using suggested tags plus “Skills Commons” and “Air Washington”, made a public playlist, and licensed each video under Creative Commons Attribution.
  • The makeover provided a submission that lists the 6 YouTube links.
  • There were individual lab files (PDF) and then all labs in a single file, which was omitted in the makeover.
  • Only the Electricity Theory file had both a Microsoft Word and PDF file.

Digital Logic

  • This folder contained 11 documents: 7 reference and 6 labs. Each became its own submission.

Electronic Theory

  • This folder contained 2 documents, each of which became its own submission.
  • Note that, for example, there is a Solid State lab with a lot of material about diodes.  This would not be found with a simple search of the repository (that would require indexing the files’ content and not just the submission metadata), but by adding these keywords to the description for this submission, someone can locate this material directly.

Fiber Optics

  • This folder contained 8 documents, which were logically divided into 2 related to syllabus, 3 related to lab, and 3 related to reference material.  The result was three submissions with multiple files in each.
  • As an example of how the large overall file size is broken out, the lab submission contains three files: a 14MB Lab Manual, a 3MB Instructor Guide, and a 95KB Student Worksheets document.  About 17MB of very specific material supporting the Fiber Optics labs is now teased out of the original, monolithic submission.

The original submission also contained 5 full courses.

The Aerospace Fiber Optics course contained 21 files including class outlines and syllabus, 6 units of materials, and many exercise handouts.  This was all converted into a single submission with each file attached separately.

The OC course contained a lot of material also present in the content areas.  For example, the OC Full Course contains the same Alternating Current lab as submitted above.  Rather than duplicate files, this submission contains only what is not already submitted elsewhere.  There were other cases of duplicate material, even within this Full Course folder.  Only distinct content was submitted here.  The original 66 files totaling 38MB were made over into 21 files totaling 8.7MB.

There was a course titled SCC, for Spokane Community College.  This turned out to contain the outlines for 36 courses.  The flaw in this approach is that the title of each document reflects the course number and not the subject.  These files were renamed.  Also note that where there is a course and its companion, say a lab or an advanced version, it is helpful to make the titles start similarly so the files appear next to each other in sorted lists.

Original Filenames

APLED 121 Outline.doc
APLED 125 Outline.doc
Elect 111 outline.docx
Elect 112 outline.docx
Elect 113 outline.doc
Elect 121 outline.docx
Elect 122 outline.docx
Elect 123 outline.doc
Elect 136 outline.docx
Elect 137 outline.docx
Elect 138 outline.docx
Elect 139 outline.docx
Elect 211 Outline-F2011.docx
Elect 212 Outline-F2011.docx
Elect 213 outline.doc
Elect 214 outline.doc
Elect 221 outline.doc
Elect 222 outline.doc
Elect 223 outline.doc
Elect 224 outline.doc
Elect 231 outline.doc
Elect 232 outline.doc
Elect 233 outline.doc
Elect 234 outline.doc
Elect 245 outline.doc
Elect 246 outline.doc
Elect 247 outline.doc
Elect 248 outline.doc
Elect 255 outline.doc
Elect 256 outline.doc
Elect 257 outline.doc
Elect 258 outline.doc
Elect 278 outline.doc
Elect 279 outline.doc
Elect 294 outline.doc
Elect 295 outline.doc

Makeover Filenames

AC Circuit Lab Outline.docx
AC Circuits Outline.docx
Advanced Communications Lab Outline.doc
Advanced Communications Outline.doc
Advanced Computer Systems Lab Outline.doc
Advanced Computer Systems Outline.doc
Applied Written Communications Outline.doc
Avionics Systems Lab Outline.doc
Avionics Systems Outline.doc
Basic Computer Systems Lab Outline.doc
Basic Computer Systems Outline.doc
Broadcast RF Communications Lab Outline.doc
Broadcast RF Communications Outline.doc
Communication Fundamentals Lab Outline.doc
Communication Fundamentals Outline.doc
DC Circuit Lab Outline.docx
DC Circuits Outline.docx
DC-AC Circuit Math Advanced Outline.doc
DC-AC Circuit Math Outline.doc
Digital Concepts Lab Outline.docx
Digital Concepts Outline.docx
Digital Data Communications Lab Outline.doc
Digital Data Communications Outline.doc
Employment Preparation Outline Outline.doc
Linear Devices-Circuits Lab Outline.docx
Linear Devices-Circuits Outline.docx
Microwave-CATV-Satellite Communications Lab Outline.doc
Microwave-CATV-Satellite Communications Outline.doc
Principles of Avionics Lab Outline.doc
Principles of Avionics Outline.doc
Solid State Devices-Circuits Lab Outline.docx
Solid State Devices-Circuits Outline.docx
Systems Troubleshooting Lab Outline.doc
Systems Troubleshooting Outline.doc
Wireless Communications Lab Outline.doc
Wireless Communications Outline.doc
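
The renaming step can be sketched in a few lines. This is an illustration only, not the procedure actually used: the mapping below is hypothetical apart from the “Elect 247” pairing, which follows the example given in the guidance section, and a real mapping would have to come from the college’s course catalog. The function name is also hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from course number to subject title. Only the
# "Elect 247" pairing follows the example in the guidance section.
SUBJECT_BY_COURSE = {
    "Elect 247": "Microwave-CATV-Satellite Communications Lab",
}


def descriptive_name(filename: str, subjects: dict) -> str:
    """Return a subject-based filename, or the original if no subject is known."""
    path = Path(filename)
    for course, subject in subjects.items():
        # Match on the course number at the start of the name, ignoring case.
        if path.stem.lower().startswith(course.lower()):
            return f"{subject} Outline{path.suffix}"
    return filename
```

Because a course and its lab companion share the same leading words, a plain alphabetical sort of the new names keeps each pair adjacent, which is the effect described above.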

The electronics capstone course was also broken out as a submission with each file attached separately.

To see the large file makeover submissions, see SkillsCommons – Large File Makeover Examples.

Batch Uploading

The makeover involved creating many submissions.  SkillsCommons has a provision for this called the batch upload process.  To prepare for a batch upload, you download a spreadsheet file and assign the metadata for each submission to a distinct row in the spreadsheet.  In the column for the filename to upload, there is an option to specify multiple files by separating each name with a pipe “|” character.  During the makeover process, there was a batch spreadsheet for each major folder.  While filling in the spreadsheets takes a bit of work, a lot of the content can be copied from row to row and sheet to sheet.
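
As a rough illustration of the one-row-per-submission layout with pipe-separated filenames, the sketch below builds rows in Python and writes them as CSV text. The column names, the CSV output format, and the function names are all assumptions made for the example; the actual columns and file format are defined by the spreadsheet downloaded from SkillsCommons.

```python
import csv
import io

# Hypothetical column names; the real ones come from the spreadsheet
# downloaded from SkillsCommons.
COLUMNS = ["title", "description", "filename"]


def batch_row(title, description, filenames):
    """Build one row, joining multiple filenames with the pipe character."""
    for name in filenames:
        if "|" in name:
            # A pipe inside a name would be read as a separator.
            raise ValueError(f"filename contains the separator: {name!r}")
    return {"title": title, "description": description, "filename": "|".join(filenames)}


def write_batch_csv(rows):
    """Serialize rows to CSV text, one submission per row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A multi-file submission such as the Fiber Optics labs would then be a single row whose filename cell holds something like `Lab Manual.pdf|Instructor Guide.pdf|Student Worksheets.pdf` (hypothetical filenames).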

There are detailed instructions on the batch upload process in the SkillsCommons Support Center.