Troubleshooting PHP File Download Problems

PHP has odd behavior when downloading files.

Recently, I wrote my first PHP script to download files and ran into several problems, all of which were moderately difficult to understand. A Google search reveals that many other people are seeing the same problems.

After some experimentation, I found that all my problems had one thing in common: extra data was being written to the same output stream that carries the data for the downloaded file. If you want the summarized version, read the list of guidelines below and skip the details. If you enjoy the details, they follow the summary.

Be warned: once the Content-Disposition header has been sent to Apache, every echo statement executed until the file is closed becomes part of the output stream, and therefore part of the downloaded file!

Even if you don’t read the rest of this page, pay close attention to the following guidelines:
How to Avoid Strange Errors With PHP Scripts That Download Files
  • Always make sure that your PHP files do not have empty lines after the closing tag, because the empty lines can become part of the data stream.
  • Never issue any echo statements other than the ones needed to send the data returned by fread, because anything else you echo is written to the same data stream and ends up in the downloaded file.
  • Any SQL stored procedures you call should not contain PRINT statements, because the PRINT statements can cause file corruption.
  • Any SQL triggers that fire in the course of your file download should not contain PRINT statements, for the same reason as stored procedures.
  • Make the code that does your download as small and tight as possible, which is good advice all the time.
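
As a rough illustration of these guidelines, here is a minimal sketch of a download helper. The function name send_download is hypothetical and not from my original script. It also makes two assumptions that go beyond my script: it discards anything already waiting in PHP's output buffers before sending, and it uses readfile and the "attachment" disposition type instead of the fread loop shown later. Treat it as a starting point, not a definitive implementation.

```php
<?php
// Hypothetical helper (illustrative name): sends one file and nothing else.
function send_download(string $fullPath): bool
{
    // Refuse anything that is not a readable regular file.
    if (!is_file($fullPath) || !is_readable($fullPath)) {
        return false;
    }

    // Assumption: discard stray output (blank lines from includes, debug
    // echos) that is still waiting in PHP's output buffers.
    while (ob_get_level() > 0) {
        if (!ob_end_clean()) {
            break; // a buffer that cannot be removed; stop trying
        }
    }

    // If output has already reached the client, the file would be corrupted.
    if (headers_sent()) {
        return false;
    }

    // Strip double quotes so the file name cannot break the header value.
    $name = str_replace('"', '', basename($fullPath));

    header('Content-Type: application/octet-stream');
    header('Content-Disposition: attachment; filename="' . $name . '"');
    header('Content-Length: ' . filesize($fullPath));

    // readfile() writes the file straight to the output stream.
    readfile($fullPath);
    return true;
}
// No closing tag on purpose: nothing can follow it by accident.
```

A calling script would do nothing except validate the request, call send_download($path), and exit, so that no other echo has a chance to run.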
Details

Search the internet for how to download files using PHP, and you will discover a plethora of information about strange problems with corruption and failures. So far, though, I have not found a single coherent set of guidelines about exactly what you can do and why it works.

These are the guidelines my boss and I came up with after experiencing what for a while seemed an almost intractable set of problems. In the end, I was able to cobble together bits and pieces of other people's similar problems as described on the internet, and I think I have a little insight into why it happens and how it can be prevented.

In my example, I will start with a script that works fine, show you some ways to break it, and then discuss why it broke. I am careful to “break” it by using things that “should” work, but don’t.

If I call the following script from another program and properly send the $_GET information, it works fine. (I lifted most of this from somewhere on the internet.)

This Script Works Fine:

<?php
// Path of the file to download, passed as ?id=...
// (in real use, validate it against an allowed directory first).
$fullPath = isset($_GET["id"]) ? $_GET["id"] : 'Undefined';
ignore_user_abort(true);   // keep running if the client disconnects
set_time_limit(0);         // no time limit, so large files can finish
if ($fd = fopen($fullPath, "rb")) {
    $fsize = filesize($fullPath);
    $path_parts = pathinfo($fullPath);
    header("Content-type: application/octet-stream");
    header("Content-Disposition: filename=\"".$path_parts["basename"]."\"");
    header("Content-length: $fsize");
    header("Cache-control: private");
    // Stream the file to the client in 2048-byte chunks
    while (!feof($fd)) {
        $buffer = fread($fd, 2048);
        echo $buffer;
    }
    fclose($fd);
}
?>

Three Ways to Corrupt the Download File Without Getting Any Error Messages

Scenario 1:  Include another file

If I add only this line to the script:

<?php include_once 'AnotherModule.php';?>

then the script still runs without any errors and even saves a file to my download directory, but the file is corrupted.

Comparing the file downloaded in the first case with the file downloaded in the second case reveals that the last 2048-byte chunk of data (the one containing the EOF marker) was not sent.
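
If you want to make the same kind of comparison yourself, a small helper can report where two files first differ. This is only a sketch: the name first_difference is hypothetical, and it reads both files fully into memory, which assumes they are reasonably small.

```php
<?php
// Hypothetical helper: returns the offset of the first differing byte,
// or -1 when both files are byte-for-byte identical.
function first_difference(string $pathA, string $pathB): int
{
    $a = file_get_contents($pathA);
    $b = file_get_contents($pathB);
    if ($a === false || $b === false) {
        throw new RuntimeException('Could not read one of the files');
    }
    if ($a === $b) {
        return -1;
    }
    // Walk the bytes both files have in common.
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // One file is a prefix of the other: they differ where the shorter ends.
    return $limit;
}
```

An offset of 0 suggests that extra bytes were added at the front of the download, while an offset equal to the size of the shorter file suggests that the end was cut off.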

Why?

You might think that AnotherModule.php has a problem, but it does not. I have included that file in a dozen other places with no issue, and if I test it with PHPCodeChecker.com, no errors are detected. The only time I have a problem is when I include it in the same PHP program that downloads a file. Even then, no runtime errors are generated, and the corruption only shows up when you download a file such as a .jpg or .xls and then try to open it with its respective application.

Scenario 2: Echo Statement preceding the While statement

A different form of “errorless” corruption is caused by adding the following echo statement:

echo "Content-Disposition: filename=\"".$path_parts["basename"]."\"";

In this case, when you open the downloaded file, you will see the echoed text at the very beginning. For a file named example.xls, for instance, that text would be:

Content-Disposition: filename="example.xls"

followed by the remainder of the file.

Again, no runtime errors occur, and the problem is only detected when you try to open the file using an application such as Excel.

Scenario 3:  Missing 00 bytes

I can’t recall the details of how to reproduce this problem, so I will just describe the results in case you run into it some day.

In this scenario, the file is again downloaded, but almost all bytes with the value 0x00 are stripped from the output, leaving a smaller, corrupted file. Again, no runtime errors are detected.
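
One quick way to check whether a download suffers from this symptom is to count the 0x00 bytes in the original file and in the downloaded copy. The helper below is a sketch with a hypothetical name, count_nul_bytes, and it assumes the file fits in memory.

```php
<?php
// Hypothetical helper: counts the 0x00 bytes in a file.
function count_nul_bytes(string $path): int
{
    $data = file_get_contents($path);
    if ($data === false) {
        throw new RuntimeException("Could not read $path");
    }
    // substr_count() is binary safe, so "\0" is matched as a plain byte.
    return substr_count($data, "\0");
}
```

If the original reports many such bytes and the download reports almost none, you are probably looking at this scenario rather than at extra text added to the stream.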

What is happening?

Through experimentation, I came up with a few ideas about the root cause.

Scenario 2 is the easiest to understand.

All echo statements write to the same output stream that carries the file data

$buffer = fread($fd, 2048);
echo $buffer;

When you send the header information to Apache with header("Content-Disposition: filename=\"".$path_parts["basename"]."\""); it appears that whatever PHP echoes from then on becomes part of the contents of the file.

Even if I set $buffer = '' before the first fread, the problem still happens.

An experiment to prove this would be to put the line,

echo "Content-Disposition: filename=\"".$path_parts["basename"]."\"";

before the header("Content-Disposition: … line.

I think you can safely assume that once the Content-Disposition header has been sent to Apache, every echo statement executed until the file is closed becomes part of the output stream.
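
The effect can be reproduced without a browser by capturing everything a piece of code echoes. The sketch below uses PHP's output buffering functions for that. The helper name capture_output, the file name example.xls, and the sample bytes are all made up for illustration.

```php
<?php
// Hypothetical helper: runs a callback and returns everything it echoed.
function capture_output(callable $script): string
{
    ob_start();
    $script();
    return (string) ob_get_clean();
}

// Made-up stand-in for the chunks that fread() would return.
$fileData = "\xD0\xCF\x11\xE0 pretend spreadsheet bytes";

$body = capture_output(function () use ($fileData) {
    // The stray echo from Scenario 2.
    echo 'Content-Disposition: filename="example.xls"';
    // The legitimate echo of the file data.
    echo $fileData;
});

// $body now holds the stray text first, followed by the file data,
// which is what the browser would save as the downloaded file.
```

Both echo statements land in the same place, in the order they ran, which matches what I saw at the start of the corrupted file.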

Empty Lines After the Closing PHP tag

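In Scenario 1, the culprit is empty lines after the ?> tag in the included file. Anything outside the PHP tags is sent to the browser as output, so it ends up in the download. In other words, if you have a PHP block, <?php … ?>, make sure there are no blank lines or spaces after the closing tag. The same is true if you have a closing tag followed by a <script…> tag to put script code inline with your PHP. Because the Content-length header still announces the original file size, the browser likely stops reading before the real end of the file arrives, which would explain the missing data at the end of the download.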
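
To check whether an included file leaks anything, you can include it while output buffering is active and look at what was captured. This is a sketch with a hypothetical helper name, stray_output_from_include. It is meant to be run from a small standalone test script, because including a file a second time can fail if that file defines functions or classes.

```php
<?php
// Hypothetical helper: returns whatever the included file writes to the
// output stream (blank lines after the closing tag, stray echos, etc.).
function stray_output_from_include(string $phpFile): string
{
    ob_start();
    include $phpFile;
    return (string) ob_get_clean();
}
```

For example, strlen(stray_output_from_include('AnotherModule.php')) should be 0 for a file that is safe to include in a download script. Note that PHP itself discards a single newline directly after the closing tag, so one line break there is harmless, but additional blank lines or spaces are sent. Leaving the closing tag out of PHP-only files avoids the problem entirely.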

SQL Server Can Interfere Because of Print Statements in Triggers and Stored Procedures

Never leave PRINT statements in any SQL Server stored procedures or triggers that are invoked by your PHP code. It is not uncommon to leave PRINT statements in stored procedures or triggers, since they help with debugging, but they can cause your PHP download to fail without producing any error codes.
