Research File Download Instructions, Formats, File Layouts, and Usage
Please note that using the research files provided at this site requires
expertise in the management of large data files. These files range from 1 MB to
130 MB or more and can take many hours to download over a low-bandwidth Internet connection.
Working with these research files requires advanced data management skills. Many
of the district and county research files are too large for spreadsheet
applications such as MS Excel. Database or statistical applications such as MS Access,
SAS, or SPSS are required to fully manage these research files.
There are on average 900 records for each entity (school, district, county, or
state). Each record represents a different combination of demographic
subgroups, grade levels, and test types. With so many records per entity,
it is critical that the desired combination of characteristics is selected.
Copying individual report pages into a spreadsheet application is possible if
the target computer is running a current operating system and spreadsheet
application version and has sufficient memory.
The Research Files contain the aggregate score data for the Smarter Balanced Summative Assessment for English language arts/literacy (ELA) and mathematics. The research files are available in two formats: fixed width and comma delimited. A statewide research file containing the state, county, district, and school data for “All Students” (no demographic subgroup data) is available in both formats. In addition, a similar statewide research file containing the data for “All Subgroups” is available in each format.
Files can also be downloaded for any single county or district. These files
contain all data (all subgroups and tests) for all entities comprising the
selected entity. For example, if a district file is selected, the data for all
schools in that district will be included in the file. The research files are
comma delimited and zipped to allow for easier download and file import management.
“School only” files are not available.
The Entities file contains all school, district, and county names.
This file must be merged with the research file to join these entity names with
the appropriate score data. A database program such as MS Access is most
appropriate for this purpose.
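For users who prefer a scripting language to a database program, the merge described above can be sketched in Python using only the standard library. This is a minimal illustration, not the layout of the actual files: the column names (county_code, district_code, school_code, name, students_tested) and the sample rows are hypothetical placeholders, and the real field names and order should be taken from the Research File Layout page.

```python
import csv
import io

# Hypothetical sample data standing in for the downloaded files.
# Real column names are listed on the Research File Layout page.
ENTITIES_CSV = """county_code,district_code,school_code,name
01,10000,0000000,Example District
01,10000,1234567,Example School
"""

SCORES_CSV = """county_code,district_code,school_code,test_id,subgroup_id,students_tested
01,10000,0000000,1,1,200
01,10000,1234567,1,1,80
"""


def cds_key(row):
    """Build the full county-district-school key for a row."""
    return (row["county_code"], row["district_code"], row["school_code"])


def attach_names(entities_file, scores_file):
    """Return score rows with a 'name' field looked up by the full CDS key."""
    names = {cds_key(r): r["name"] for r in csv.DictReader(entities_file)}
    merged = []
    for row in csv.DictReader(scores_file):
        row["name"] = names.get(cds_key(row), "")  # empty string if no match
        merged.append(row)
    return merged


merged_rows = attach_names(io.StringIO(ENTITIES_CSV), io.StringIO(SCORES_CSV))
for r in merged_rows:
    print(r["name"], r["students_tested"])
```

With real files, the two io.StringIO objects would be replaced by files opened with open(path, newline="").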
Research file layouts and value lookup tables are available on the Research File Layout page.
The Research File Layout provides the following information:
- Type Legend–data format of individual data fields
- Entities File–layout of entity file data fields
- Test Data File–layout of test data file data fields
- Table A–demographic subgroup listing
- Table B–grade listing
- Table C–test name listing
Users of comma delimited research files will find these layouts useful in confirming the sequence of elements as well as value lookup. Users may view and/or download any of the layouts and tables.
Also available from the Research File Layout page are two additional comma delimited lookup files:
- Test ID/Name Lookup Table – This table identifies the subject test name and ID for the two Smarter Balanced Assessments, ELA and mathematics.
- Subgroup ID/Name Lookup Table – This table identifies each demographic subgroup and ID reported in the CAASPP results.
Both of these lookup tables are useful when associating test and subgroup IDs and names with codes in the comma delimited or fixed width test data file.
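As an illustration of how the two lookup files might be used, the following Python sketch loads each lookup into a dictionary and translates the IDs in one test data record into names. The column names and the ID values shown here are hypothetical placeholders; the actual values are those published in the lookup files.

```python
import csv
import io

# Hypothetical stand-ins for the two comma delimited lookup files.
TEST_LOOKUP_CSV = """test_id,test_name
1,ELA
2,Mathematics
"""

SUBGROUP_LOOKUP_CSV = """subgroup_id,subgroup_name
1,All Students
"""


def load_lookup(text, key_field, value_field):
    """Read a comma delimited lookup table into an ID -> name dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[key_field]: row[value_field] for row in reader}


test_names = load_lookup(TEST_LOOKUP_CSV, "test_id", "test_name")
subgroup_names = load_lookup(SUBGROUP_LOOKUP_CSV, "subgroup_id", "subgroup_name")

# A single (hypothetical) test data record carrying only ID codes.
record = {"test_id": "2", "subgroup_id": "1"}
label = f"{test_names[record['test_id']]} / {subgroup_names[record['subgroup_id']]}"
print(label)
```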
A database “shell” is another alternative provided at this site. Once downloaded to the target computer, this application provides a powerful school, district, CDS, and ZIP code search capability as well as a formatted report containing all the data for the selected entity. This MS Access 2007 shell contains all entity data and is designed to import any of the selected state, county, or district comma delimited files. In order to use the shell, MS Access 2007 must already be installed on your computer.
Downloading Instructions for PC and Mac Users
Select the link for the file type that you will be using (CSV or TXT).
Save the compressed data file to your workstation or PC.
Uncompress the file. Each file has been compressed and will require compression software to uncompress.
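The uncompress step can also be scripted. The sketch below uses Python's standard zipfile module; so that it runs without a download, it first builds a small stand-in archive in memory. The archive member name and its contents are hypothetical and do not reflect the names of the actual research files.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def extract_archive(archive, target_dir):
    """Extract every member of a zip archive and return the extracted paths."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return [Path(target_dir) / name for name in zf.namelist()]


# Build a tiny stand-in archive in memory (hypothetical file name and data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("example_research_file.csv", "col_a,col_b\n1,2\n")
buffer.seek(0)

with tempfile.TemporaryDirectory() as tmp:
    extracted = extract_archive(buffer, tmp)
    first_line = extracted[0].read_text().splitlines()[0]
    print(extracted[0].name, first_line)
```

For a downloaded file, the path of the saved archive can be passed to extract_archive in place of the in-memory buffer.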
Downloading the Access Database Shell (Note: MS Access 2007 must already be installed on the target computer)
Under the Access Database – Main Component heading, select Access
Database – Main Component.
Save the compressed file to your computer.
Uncompress the zipped file to your computer.
Identify and download a statewide, county, or district CSV (comma delimited) file
containing the data you wish to evaluate according to the directions above. Be
sure to place the uncompressed data file in the same directory as the
Access Database – Main Component.
Open the Access Database. The program will give you the option of importing
any score data file in the same directory.
Select the file(s) to import. (Note: the Access Database – Main Component
already contains all entity data.)
Achieving accurate results when working with these research files requires an understanding of the structure and content of the two primary tables: the entities table and the test data table. The research files have many rows for each entity. There are records for each combination of grades, tests, and subgroups. This means that there are hundreds to thousands of records for each entity, with an average of approximately 900 records. In order to correctly work with the data, you must use constraints to limit the data you are reporting. These constraints are discussed below.
Entities table
This table comprises the state and all counties, districts, and schools in California. Because it contains school-level records as well as district, county, and state summary records, it is critical that any analysis select a single “Type ID” record type. This avoids the double- or triple-counting that occurs when a school count is also included in the associated district record.
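The effect of mixing record types can be shown with a small Python sketch. The field names, the type labels ("district", "school"), and the counts are hypothetical; the actual Type ID codes are defined in the Research File Layout.

```python
# Hypothetical rows: one district summary record and its two schools.
rows = [
    {"type_id": "district", "name": "Example District", "students_tested": 300},
    {"type_id": "school", "name": "School A", "students_tested": 100},
    {"type_id": "school", "name": "School B", "students_tested": 200},
]


def total_for_type(records, type_id):
    """Sum the student count over records of a single record type."""
    return sum(r["students_tested"] for r in records if r["type_id"] == type_id)


# Summing every row counts each student twice: once in the school
# record and again in the district summary record.
naive_total = sum(r["students_tested"] for r in rows)

school_total = total_for_type(rows, "school")
district_total = total_for_type(rows, "district")

print(naive_total, school_total, district_total)
```

In this made-up data the unfiltered sum is twice the true count, while restricting the sum to either record type gives the same, correct total.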
Test Data table
This table comprises the school, district, county, and state aggregate CAASPP counts and scores.
To accurately analyze and report from these research files, the appropriate constraints must be applied to the following elements:
- CDS code – The research files contain summary district and county records. A district summary record will have a “school” code of “0000000.” When working with the file, be sure to include the county, district, and school codes. Failure to include all three data codes will result in double-counting in any summary calculations.
- Test type – Identifying the desired test (ELA or mathematics) will help to provide clear query results.
- Subgroup ID – Each student will be included in both the “All Students” subgroup aggregation and each of the appropriate subgroup aggregations. Consequently, an individual subgroup must be selected to avoid duplicate counts.
- Test ID – In general, each student will take a number of tests (e.g., a grade five student may take the ELA, mathematics, and science tests). Consequently, a specific test should be selected to avoid confusion.
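The constraints listed above can be combined into a single selection step. The following Python sketch is one possible way to do so, under stated assumptions: the field names, ID values, and counts are hypothetical placeholders rather than the actual file layout.

```python
# Hypothetical test data records for one district and one of its schools.
records = [
    {"county_code": "01", "district_code": "10000", "school_code": "0000000",
     "test_id": "1", "subgroup_id": "1", "students_tested": 200},
    {"county_code": "01", "district_code": "10000", "school_code": "0000000",
     "test_id": "2", "subgroup_id": "1", "students_tested": 198},
    {"county_code": "01", "district_code": "10000", "school_code": "0000000",
     "test_id": "1", "subgroup_id": "3", "students_tested": 90},
    {"county_code": "01", "district_code": "10000", "school_code": "1234567",
     "test_id": "1", "subgroup_id": "1", "students_tested": 80},
]


def select_records(rows, county, district, school, test_id, subgroup_id):
    """Keep rows matching the full CDS code, one test, and one subgroup."""
    selected = []
    for r in rows:
        cds = (r["county_code"], r["district_code"], r["school_code"])
        if cds != (county, district, school):
            continue  # all three codes must match, not only county and district
        if r["test_id"] != test_id or r["subgroup_id"] != subgroup_id:
            continue  # exactly one test and one subgroup at a time
        selected.append(r)
    return selected


# District summary record (school code "0000000"), one test, one subgroup.
district_rows = select_records(records, "01", "10000", "0000000", "1", "1")
print(len(district_rows), district_rows[0]["students_tested"])
```

Omitting the school code from the comparison would return both the district summary record and the school record, which is the double-counting described in the CDS code item.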
Providing accurate and meaningful reports from the research files generally requires the “linking” of the 2016 Entities and Test Data tables. Additional efforts might include linking to the “lookup” tables discussed above. Working with these tables requires an understanding of “relational” data tables and their manipulation.