Loading Data into Python

From Python Wiki


This page covers some of the ways of loading data into Python.


General purpose import of data

loadmat()

The scipy.io.loadmat function allows you to import data from a Matlab .mat file. I (Paul) tried creating a very simple Matlab save file as follows:


 a=randn(10,10);
 e=eig(a);
 c.mat=a;
 c.eig=e;
 save


I was able to read that into Python using mat=scipy.io.loadmat('matlab.mat'). This creates a dictionary in Python, where the keys are the matrix names. Typing mat.keys() gives me the following:

 ['a', 'c', 'e', '__header__', '__globals__', '__version__']

I can access the data as follows:

  • a=mat['a'] - returns the 10x10 random array a into a
  • e=mat['e'] - returns the 10 element array of complex numbers e into e
  • c=mat['c'] - returns a structured array into c. To access c.mat use c[0,0]['mat'], or to access c.eig use c[0,0]['eig']
  • header=mat['__header__'] - returns 'MATLAB 5.0 MAT-file, Platform: PCWIN64, Created on: Thu Jul 14 13:20:29 2011' into header
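
The access pattern above can be sketched as a self-contained round trip: write a .mat file with scipy.io.savemat, then read it back with loadmat. The file name 'demo.mat' is made up for this sketch.

```python
# Round-trip sketch: savemat writes a .mat file, loadmat reads it back.
import numpy as np
from scipy.io import savemat, loadmat

a = np.random.randn(10, 10)
savemat('demo.mat', {'a': a})            # 'demo.mat' is a made-up name

mat = loadmat('demo.mat')                # dict keyed by MATLAB variable names
# mat also contains the '__header__', '__version__', and '__globals__' keys
a_back = mat['a']                        # the 10x10 array, as a numpy array
```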

This gets a little more interesting when importing structures from Matlab. Take for example the following Matlab data:


 a.x=linspace(0,10);
 a.y=sin(a.x);
 a.title='Sine Function'
 save

When I load this into Python the variable "a" holds three elements: the [0] element of a is a.x, the [1] element is a.y, and the [2] element is a.title.
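
The struct case can also be sketched as a round trip: savemat turns a nested dict into a Matlab struct, and loadmat returns it as a (1, 1) structured array whose fields are accessed by name, as with the c.mat/c.eig example above. The file name 'struct_demo.mat' is made up.

```python
# Struct round-trip sketch: a nested dict becomes a MATLAB struct on save,
# and comes back as a structured numpy array on load.
import numpy as np
from scipy.io import savemat, loadmat

x = np.linspace(0, 10)
savemat('struct_demo.mat',
        {'a': {'x': x, 'y': np.sin(x), 'title': 'Sine Function'}})

a = loadmat('struct_demo.mat')['a']      # structured array of shape (1, 1)
x_back = a[0, 0]['x']                    # 1-D arrays come back as 1xN rows
y_back = a[0, 0]['y']
```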


loadtxt()

The numpy.loadtxt function allows you to quickly import data from any text file with a fixed number of columns on each line. The default delimiter is any whitespace, but that can be changed to any desired value. By default any line in the file beginning with '#' is a comment, but you can use any comment character you desire. The function skips blank lines, but does require that you have the same number of columns in every non-blank/non-comment line of text. By default it reads in a single array with the number of columns equal to the number of columns of data, but you can also request that it read each column into a separate, one dimensional array. The numpy.genfromtxt function is much more general, but slower and more complex when the number of columns of data is constant.
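
A minimal sketch of the behavior described above, using a small whitespace-delimited file written on the spot (the file name 'data.txt' is made up):

```python
# loadtxt sketch: comment lines and blank lines are skipped by default.
import numpy as np

with open('data.txt', 'w') as f:
    f.write('# time  value\n')           # comment line, skipped by default
    f.write('0.0  1.0\n')
    f.write('\n')                        # blank line, also skipped
    f.write('0.5  2.5\n')

data = np.loadtxt('data.txt')            # one array, shape (rows, columns)
t, v = np.loadtxt('data.txt', unpack=True)   # or one 1-D array per column
```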

(loadtxt Examples)


read_two_columns()

A function written by Tom that uses lower level Python constructs to read two columns of data, called read_two_columns(), can be found in [1]. It is very general and can handle blank lines and multiple different delimiters. It is also an excellent demonstration of using lower level file and string editing functions in Python to read specific file formats.
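
A sketch in the same spirit, using lower level file and string handling that tolerates blank lines and several delimiters. This is an illustration of the approach, not Tom's actual function; the delimiter set and file name are assumptions.

```python
# Lower-level two-column reader sketch (not Tom's actual implementation).
def read_two_columns(filename, delimiters=(',', ';', '\t')):
    """Read two columns of numbers, skipping blank lines."""
    col1, col2 = [], []
    with open(filename) as f:
        for line in f:
            for d in delimiters:         # normalize every delimiter to a space
                line = line.replace(d, ' ')
            fields = line.split()
            if not fields:               # skip blank lines
                continue
            col1.append(float(fields[0]))
            col2.append(float(fields[1]))
    return col1, col2

# Small demonstration file with mixed delimiters and a blank line
with open('two_cols.txt', 'w') as f:
    f.write('1.0, 2.0\n\n3.0;\t4.0\n')

x, y = read_two_columns('two_cols.txt')
print(x, y)                              # [1.0, 3.0] [2.0, 4.0]
```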


Interactive importing of data in Spyder

Note that the Spyder IDE interface allows the user to interactively import data into the workspace. Go to the "Variable Explorer" tab and look on the right side for the "import data" button. Many different file types are supported through this functionality (including .spydata, .npy, .mat, .csv, .txt, and a variety of image formats). I (Paul) have found this to work well, but to be somewhat slower than the raw loadtxt function.


Importing Nastran Data

We often need to import data from Nastran. I'm hoping that others will contribute in this area. I would think that formatted OUTPUT4 files would be fairly trivial, followed by other formatted files such as .pch, but I'm looking forward to the solution for OUTPUT2 files. One of the keys here will be the format of the data once it's imported. There are many ways of representing data in Python including lists, tuples, dictionaries, arrays and matrices. I'm curious to see what types of data representation others are using, but I'm hoping that we can converge on something fairly general and consistent so we don't get into a situation where every routine reads data into a different representation.


Ben Emory's Links

Ben Emory of NASA GSFC provided the following links:

Josh Fonseca's utility scripts - Look in the pynastran folder for Python scripts that read Nastran bulk data and OUTPUT2 files.

CAELinux UNV2X and X2UNV Scripts - These are Python scripts from the CAELinux project that convert to and from Universal file formats. I don't see Nastran, but there are lots of codes that read and write Universal files, so they're a fairly good "Neutral" file format for FEM and test data. Note that the IMAT Matlab toolbox from ATA is based on Universal File type data structures, because it first started as an I-deas to Matlab translator.


Alternate Nastran Data Loader

There is another pyNastran (unrelated to Josh Fonseca's project) that reads the BDF, F06, OP2, and OP4. The BDF reader currently requires an Executive Control Deck and a Case Control Deck (which makes reading standalone DMIG cards difficult); removing this requirement is a planned enhancement.

The BDF reading/writing supports ~210 bulk data cards in v0.5. The cards are cross-linked, so you can access information about an element's property from the element card. Cards have a series of useful methods for calculating things like Mass, Centroid, Normal, Volume, Area, etc. All fields are read from every card and any Nastran defaults are set. Most card formats are supported, including small-field, large-field, CSV, and tab-formatted fields. Unrecognized cards are echoed to the BDF. A test script allows a user to validate the software for their model by reading their BDF, linking it, calculating Mass/Centroid/etc., and printing the BDF back to the user in small-field format.

The OP2 reading support covers static/transient results for displacement/temperature, eigenvalue order, eigenvectors, velocity, acceleration, static/transient stress/strain (rods, bars, beams, shells, solids), grid point forces, SPC/MPC forces, and strain energy density. Note that a static SOL 200 (made up of multiple SOL 101/103/144/etc.) is supported. Nonlinear and hyperelastic support is limited, and PSD (Power Spectral Density) and other random results are not supported. Note that elements function as linear elements (supported) or nonlinear elements (limited) depending on the solution type. A test script allows a user to validate the software for their model by reading their OP2 and printing a large table of all the results or writing them as an F06 file.

An F06 reader is also included, but its support is more limited than the OP2 reader's; the OP2 reader is the better choice when an OP2 file is available.

The OP4 reader supports sparse or dense matrices in ASCII and binary format. There is a pure Python version and a slightly less robust Cython version that is a little faster.

Additionally, the software has a basic GUI. It currently supports static stress (oxx, oyy, ozz, o1, o2, o3, and von Mises or max shear). A legend, Patran-style max/min values, and subcase information are also shown. It has options for BDF/OP2 file loading, wireframe and solid display, picture taking, background color changing, snapping to the x/y/z axes, and cycling through results. It doesn't handle nodal results (e.g. displacements, SPC forces) or nodal/element-wise picking.
