This article shows how to write C extension code that consumes data from any Python file-like object (e.g., normal files, StringIO objects, etc.). To consume data from a file-like object, its read() method has to be invoked repeatedly, and steps must be taken to convert the resulting data into a form that C can use.
Given below is a C extension function that merely consumes all of the data on a file-like object and dumps it to standard output.
Code #1 :
#include <Python.h>
#include <unistd.h>   /* write() */

#define CHUNK_SIZE 8192

/* Consume a "file-like" object and write bytes to stdout */
static PyObject* py_consume_file(PyObject* self, PyObject* args) {
    PyObject* obj;
    PyObject* read_meth;
    PyObject* result = NULL;
    PyObject* read_args;

    if (!PyArg_ParseTuple(args, "O", &obj)) {
        return NULL;
    }

    /* Get the read method of the passed object */
    if ((read_meth = PyObject_GetAttrString(obj, "read")) == NULL) {
        return NULL;
    }

    /* Build the argument list to read() */
    if ((read_args = Py_BuildValue("(i)", CHUNK_SIZE)) == NULL) {
        Py_DECREF(read_meth);
        return NULL;
    }

    while (1) {
        PyObject* data;
        PyObject* enc_data;
        char* buf;
        Py_ssize_t len;

        /* Call read() */
        if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
            goto final;
        }

        /* Check for EOF */
        if (PySequence_Length(data) == 0) {
            Py_DECREF(data);
            break;
        }

        /* Encode Unicode as Bytes for C */
        if ((enc_data = PyUnicode_AsEncodedString(data, "utf-8", "strict")) == NULL) {
            Py_DECREF(data);
            goto final;
        }

        /* Extract underlying buffer data */
        PyBytes_AsStringAndSize(enc_data, &buf, &len);

        /* Write to stdout (replace with something more useful) */
        write(1, buf, len);

        /* Cleanup */
        Py_DECREF(enc_data);
        Py_DECREF(data);
    }
    result = Py_BuildValue("");

final:
    /* Cleanup */
    Py_DECREF(read_meth);
    Py_DECREF(read_args);
    return result;
}
To test the code, prepare a file-like object such as a StringIO instance and pass it in:
Code #2 :
import io
import sample

f = io.StringIO('Hello\nWorld\n')
sample.consume_file(f)
Output :
Hello
World
Unlike a normal system file, a file-like object is not necessarily built around a low-level file descriptor. Thus, normal C library functions can’t be used to access it. Instead, Python’s C API is used to manipulate the file-like object much like you would in Python.
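The missing file descriptor can be seen from Python itself. The following is a minimal sketch, written in Python so that it runs without building an extension. The helper name has_file_descriptor is hypothetical and is not part of the recipe.

```python
import io
import tempfile


def has_file_descriptor(f):
    """Return True if f is backed by an OS-level file descriptor."""
    try:
        f.fileno()
    except (AttributeError, OSError):
        # io.UnsupportedOperation is a subclass of OSError
        return False
    return True


# An in-memory file-like object has no descriptor for C's read() to use
print(has_file_descriptor(io.StringIO("Hello\nWorld\n")))   # prints False

# A real (temporary) file does have one
with tempfile.TemporaryFile() as real_file:
    print(has_file_descriptor(real_file))                    # prints True
```

Because the first case is possible, the extension function cannot rely on a descriptor and has to go through the object's read() method instead.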
So, the read() method is extracted from the passed object. An argument list is built and then repeatedly passed to PyObject_Call() to invoke the method. To detect end-of-file (EOF), PySequence_Length() is used to see if the returned result has zero length.
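For comparison, the same steps can be sketched at the Python level. This is only a rough equivalent of the C loop, not part of the extension itself. The out parameter is an assumption added here so that the result can be captured; the C version always writes to file descriptor 1.

```python
import io
import sys

CHUNK_SIZE = 8192


def consume_file(obj, out=None):
    """Python-level sketch of the C read loop (text-mode input)."""
    if out is None:
        out = sys.stdout.buffer               # stands in for descriptor 1
    read_meth = getattr(obj, "read")          # PyObject_GetAttrString(obj, "read")
    while True:
        data = read_meth(CHUNK_SIZE)          # PyObject_Call(read_meth, read_args, NULL)
        if len(data) == 0:                    # PySequence_Length(data) == 0 means EOF
            break
        enc_data = data.encode("utf-8", "strict")   # PyUnicode_AsEncodedString()
        out.write(enc_data)                   # write(1, buf, len) in the C code


sink = io.BytesIO()
consume_file(io.StringIO("Hello\nWorld\n"), sink)
print(sink.getvalue())   # b'Hello\nWorld\n'
```

Each line in the loop corresponds to one C API call in the extension function, which is why the C code reads much like ordinary Python with explicit reference counting added.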
For all I/O operations, the main concern is the underlying encoding and the distinction between bytes and Unicode. This recipe shows how to read a file in text mode and encode the resulting text into bytes (here UTF-8) that can be used by C. If the file is read in binary mode instead, only minor changes are needed, as shown in the code below.
Code #3 :
/* Call read() */
if ((data = PyObject_Call(read_meth, read_args, NULL)) == NULL) {
    goto final;
}

/* Check for EOF */
if (PySequence_Length(data) == 0) {
    Py_DECREF(data);
    break;
}

if (!PyBytes_Check(data)) {
    Py_DECREF(data);
    PyErr_SetString(PyExc_IOError, "File must be in binary mode");
    goto final;
}

/* Extract underlying buffer data */
PyBytes_AsStringAndSize(data, &buf, &len);
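The effect of the binary-mode check can also be sketched in Python. This is only an illustration of the behaviour, under the assumption that the rest of the loop stays as in Code #1. The function name consume_binary_file and the out parameter are hypothetical.

```python
import io

CHUNK_SIZE = 8192


def consume_binary_file(obj, out):
    """Python-level sketch of the binary-mode variant of the read loop."""
    while True:
        data = obj.read(CHUNK_SIZE)
        if len(data) == 0:                  # EOF
            break
        if not isinstance(data, bytes):     # mirrors PyBytes_Check(data)
            raise IOError("File must be in binary mode")
        out.write(data)                     # bytes are used as-is, no encoding step


sink = io.BytesIO()
consume_binary_file(io.BytesIO(b"Hello\nWorld\n"), sink)
print(sink.getvalue())   # b'Hello\nWorld\n'

try:
    consume_binary_file(io.StringIO("Hello\n"), io.BytesIO())
except IOError as exc:
    print(exc)           # File must be in binary mode
```

Note that the EOF test comes before the type check, as in the C code, so an empty text-mode object ends the loop without raising an error.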