Currently, official FDA submissions must be in XML or PDF format, with a preference for submitting at least the XML. The rationale is the need for documents that will remain readable and accessible in perpetuity, which is an important consideration. However, these formats have limitations in the review, oversight, and approval process: clinical reviewers are more effective when they can “see” a case as the set of forms the Investigator filled out, while analysis staff are more effective with the raw XML data. There is also the question of how to handle ancillary data, such as images and graphical information that were part of the process but cannot readily be embedded in XML.

Below we present a case for allowing HTML archives as one of the allowable formats for FDA submission, or as a part of a submission, fundamentally to support clinical case review. The raw XML would continue to be the favored format for data analysis.

HTML, Hypertext Markup Language, is the foundation of web-based interfaces. It is a markup language closely related to XML (both descend from SGML), and when combined with images and style sheets (CSS) it makes a web page appear as it does. The Internet has grown exponentially since its inception, transforming business and everyday life; the Internet is here to stay, and so is HTML. “Billions and billions” of HTML web pages exist in the world today. HTML can be compared to the alphabet: like the alphabet, it provides basic elements that we can combine in order to communicate, in this case via web pages. It is safe to say HTML will stand the test of time.

One might assume that HTML requires a web server to operate. In fact, an HTML page is just a document that resides on a server's hard drive, and it can equally be saved locally. Browsers support “Save As” and can save a web page for later viewing when the computer is disconnected from the Internet. All that is needed is a web browser of any brand. As such, these HTML/XML pages are as safe, secure, and perpetual as any other computer file on the hard drive.
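To illustrate that an HTML page is just a file, here is a minimal sketch that writes one form as a standalone HTML document and prints a `file://` address a browser can open with no server involved. The function names, file name, and field values are hypothetical and chosen only for illustration; they do not describe any actual archive layout.

```python
import html
import tempfile
from pathlib import Path


def render_form_page(title, fields):
    """Return a self-contained HTML page (inline CSS, no scripts) for one form."""
    rows = "\n".join(
        f"<tr><th>{html.escape(label)}</th><td>{html.escape(value)}</td></tr>"
        for label, value in fields
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en"><head><meta charset="utf-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "<style>th{text-align:left;padding-right:1em}</style>\n"
        "</head><body>\n"
        f"<h1>{html.escape(title)}</h1>\n"
        f"<table>\n{rows}\n</table>\n"
        "</body></html>\n"
    )


def write_form_page(directory, filename, title, fields):
    """Write the page to disk and return its path."""
    path = Path(directory) / filename
    path.write_text(render_form_page(title, fields), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical example data; a real archive would come from the study system.
    out_dir = tempfile.mkdtemp()
    page = write_form_page(
        out_dir,
        "patient_001_visit_1.html",
        "Visit 1: Physical Exam",
        [("Patient ID", "001"), ("Weight (kg)", "12.4")],
    )
    # Paste this address into any browser; no web server is required.
    print(page.as_uri())
```

Because the style sheet is inline and the page contains no scripts, the single file is all a reviewer needs in order to view the form offline.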

Now that we have established that HTML is a viable solution in perpetuity, let’s examine the differences between XML, PDF and HTML archives. The main differences are listed in the table below.

| | XML Archive | PDF Archive | HTML Archive |
| --- | --- | --- | --- |
| Accessing software | XML can be imported into many database software programs as well as some spreadsheet programs. It is truly non-proprietary, and readable by humans with just a text viewer. | PDFs can be read using a number of PDF (Portable Document Format) readers. The format was created by Adobe; its specification is public and has been published as ISO 32000 since 2008. | HTML is truly non-proprietary, and can be read like XML, but looks best when combined with CSS in any browser. |
| Security and virus vulnerability | No vulnerability. | PDF supports embedded scripting (JavaScript), so files may need to be scanned. | HTML can have embedded JavaScript that a scanner would need to remove. Modern browsers include comparable malware detection components. |
| Availability of software to access archive type | Most of these have to be purchased. | Some of the readers are available free, but not everyone has one installed. | All computer users have access to free browsers. |
| Navigation | XML is typically hierarchical, which reduces duplication compared to repeating table rows. Navigation is by drilling into the hierarchy, much like opening folders with subfolders. | Data is presented in lengthy documents that are difficult to navigate, but that display the forms upon which data was entered. Limited capabilities to search and find information. | Displays the data in the forms in which it was entered, giving the viewer an experience similar to that of data entry. The viewer is able to navigate through pages using the same process the data collector did. |
| Handling of uploaded documents | Not contained within the XML file. | Usually available as separate PDFs in a separate folder with no link back to the patient or form. | Documents are accessible by clicking the embedded hyperlink. Documents are in their original format or PDF. |
| Visibility of queries, field notations, and changes to data | Prelude’s XML has queries, comments, and changes co-located with the field variable to which they apply. No separate files. However, XML, while a simple structure and open format, is verbose, and reviewers are challenged to view a “case” as an entity. | The PDF, as a screen image, shows what was on the screen, but queries, notations, and changes may be on a separate page or even in a separate document. This can make searching and finding difficult. | HTML archives are saved web pages and, when done properly, allow users to click hyperlinks to view the specific query, field notation, or change to data related to a given field at the point of viewing the field. |
| Overall functionality for analysis (e.g. SAS) | XML provides data that can be freely processed by many non-proprietary systems to support a data analysis. | Does not allow data processing. | Does not allow data processing. |
| Overall functionality for case review | Most FDA personnel who review and provide study oversight are not data managers or statisticians; they are monitoring personnel who are more familiar with a paper-based (CRF) process in which they can look at the forms and see the data being collected in context, as the Investigator staff viewed the case. XML does not provide a format that is easy for them to use. | It is difficult to navigate through long PDFs of patient case report forms, and navigating between patients is nearly impossible or requires opening multiple individual patient PDFs. | Provides excellent functionality, allowing navigation from patient to patient and form to form in the same fashion as a person entering data. It makes review and oversight much faster because reviewers do not have to organize the data and can focus on reviewing it to ensure the safety and efficacy of drugs. |
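The analysis row of the table can be made concrete with a short sketch of how XML that keeps queries next to their fields might be flattened into rows for a statistics package. The element and attribute names below are invented for illustration and are not an actual export schema; only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical layout, invented for illustration; not an actual export schema.
SAMPLE_XML = """\
<study>
  <patient id="001">
    <form name="Visit 1">
      <field name="weight_kg">
        <value>12.4</value>
        <query>Confirm units</query>
      </field>
      <field name="temp_c">
        <value>38.6</value>
      </field>
    </form>
  </patient>
</study>
"""

COLUMNS = ["patient", "form", "field", "value", "query"]


def flatten(xml_text):
    """Yield one dict per field, keeping any query text next to its value."""
    root = ET.fromstring(xml_text)
    for patient in root.iter("patient"):
        for form in patient.iter("form"):
            for field in form.iter("field"):
                yield {
                    "patient": patient.get("id"),
                    "form": form.get("name"),
                    "field": field.get("name"),
                    "value": field.findtext("value", default=""),
                    "query": field.findtext("query", default=""),
                }


def to_csv(rows):
    """Serialize the flattened rows as CSV text for import elsewhere."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(flatten(SAMPLE_XML)))
```

The same hierarchy that makes the data easy to flatten for analysis is what makes it hard for a reviewer to read as a case, which is the gap an HTML archive is meant to fill.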

In conclusion, the XML archive is a natural, non-proprietary requirement for data submission because it allows quick processing of the data to analyze the safety and efficacy of a drug. Providing HTML archives in addition is superior to providing PDF files because HTML facilitates case review within the same context in which the data was originally collected by Investigators. HTML will increase FDA reviewers’ confidence in the quality of the data and its outcomes.

Prelude Dynamics has initiated conversations with FDA CVM about allowing HTML archives to be submitted. Also under discussion is the possibility that Sponsors could choose to give CVM read-only access to the server, where reviewers could interact with the study data and even run reports, statistics, and graphs within the system before final submission occurs. CVM could then guide Sponsors on any additional requirements or concerns, which would help expedite the final review process, reduce the need for clarifications, and shorten time to market.


Jim Pedzinski – VP of Business Development
jpedzinski[at] – 512-476-5100 ext. 210