Session sorting by date


#1

In API v1, term objects from the Metadata API provide a list of session identifiers which appear to be reliably sorted chronologically even if the corresponding session_details objects don’t have start and end dates as attributes.

I understand that API v2 will be doing away with representing terms and that an effort is underway to bolster the start and end date data on session representations.

Do current plans involve getting to 100% with the session start/end dates before moving to API v2? If not, losing the information in the term objects would be a bit of a bummer. We are leaning on the sorting of those lists and inspection of session type to bolster a determination about which sessions are current or most recent.
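A minimal sketch of that approach, for illustration only. It assumes metadata shaped roughly like the API v1 response (a `terms` list whose entries each carry a `sessions` list, plus a `session_details` mapping with a `type` per session). The sample data and the `latest_session` helper are hypothetical, and the result is only as good as the ordering of the lists.

```python
def latest_session(metadata, session_type="primary"):
    """Return the last session id of the given type, trusting list order.

    Assumes terms, and the sessions within each term, are listed
    chronologically; no start/end dates are consulted.
    """
    details = metadata.get("session_details", {})
    # Flatten every term's session list into one ordered list.
    ordered = [
        session_id
        for term in metadata.get("terms", [])
        for session_id in term.get("sessions", [])
    ]
    # Walk backwards so the most recently listed match wins.
    for session_id in reversed(ordered):
        if details.get(session_id, {}).get("type") == session_type:
            return session_id
    return None


# Hypothetical sample data, not taken from any real state.
sample_metadata = {
    "terms": [
        {"name": "2015-2016", "sessions": ["2015", "2016", "2016s1"]},
        {"name": "2017-2018", "sessions": ["2017", "2017s1"]},
    ],
    "session_details": {
        "2015": {"type": "primary"},
        "2016": {"type": "primary"},
        "2016s1": {"type": "special"},
        "2017": {"type": "primary"},
        "2017s1": {"type": "special"},
    },
}

print(latest_session(sample_metadata))
print(latest_session(sample_metadata, session_type="special"))
```

Because nothing here looks at dates, a list that is only "mostly" sorted would silently produce the wrong answer.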

P.S. I was curious how the lists of session identifiers on term objects in API v1 are created to begin with. Specifically, if they know how to sort that list, I’m wondering why the corresponding session data wouldn’t have start and end dates. I’m just curious to understand the situation better there.


#2

The current situation has been managed manually: as scrapers break because the session list has new entries, we add new sessions. We try to research the correct dates, but they’re often an estimate, since many states end up extending their sessions or not publishing a formal end date. If a session has multiple periods (e.g. Jan 1 - Mar 1 2017, Jan 1 - Mar 1 2018), we just overwrite the old ones.

Even states with statutory end dates can be fluid or complicated to compute (e.g. “The fourth fifth legislative day excluding holidays and early adjournments”, and then they decide halfway through to start taking Fridays off and just run a few weeks later). Many commercial sources publish calendars at the start of the new year, which is helpful, although they all differ a little.

They’re “sorted” in the sense that we want to keep the most recent “live” one the default so that we can just run “pupa update ABBR” and have it work, and we tend to just add new entries onto the end of the list. This has created a “mostly correctly sorted” historical list, but isn’t always accurate right now.

Some states run multiple sessions and/or specials simultaneously (LA and UT off the top of my head), or stop and start, interspersed with specials, which breaks this a bit. Term was nice for keeping track of this, but didn’t actually tell you what sessions were “live” at any given time, in the sense that changes could (or could no longer) happen.

Just FYI, I’d strongly recommend you work off some kind of manually managed “current sessions in each state” file, as trying to map calendar dates to ‘active’ sessions is a fraught approach for a few reasons:

  1. Many states don’t give reliable dates for session end, or they pop in and out, or they add ‘special’ session info to the regular session’s website data.
  2. Actions can still happen in some states after the legislature is out. These are primarily executive actions, but in a few states hearings happen, or supplementary docs and sponsorships can change all year, even if the legislature is out of session.
  3. Prefiling, which we usually have covered because we switch the default session even before it starts, but some states also allow prefiling for the next session before the current one ends (IL, NY I believe).
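As a rough illustration of the manually managed file suggested above, here is a minimal sketch. The JSON layout, the session identifiers, and the helper names (`load_current_sessions`, `is_live`, `live_bills`) are all hypothetical; nothing like this exists in Open States. Each state maps to a list rather than a single value so that simultaneous sessions, specials, and prefiling for the next session can all be marked live at once.

```python
import json

# Hypothetical contents of a hand-maintained file: state abbreviation ->
# session identifiers currently treated as live. The identifiers are
# placeholders, not real session names.
CURRENT_SESSIONS_JSON = """
{
  "la": ["2018", "2018s1"],
  "ut": ["2018"],
  "il": ["100th", "101st"]
}
"""


def load_current_sessions(text):
    """Parse the file into {state: set of live session ids}."""
    raw = json.loads(text)
    return {state.lower(): set(sessions) for state, sessions in raw.items()}


def is_live(current, state, session):
    """True if the session is listed as live for the state."""
    return session in current.get(state.lower(), set())


def live_bills(current, bills):
    """Keep only bills whose (state, session) pair is listed as live.

    Assumes each bill is a dict with "state" and "session" keys.
    """
    return [b for b in bills if is_live(current, b["state"], b["session"])]


current = load_current_sessions(CURRENT_SESSIONS_JSON)
bills = [
    {"state": "la", "session": "2018s1", "bill_id": "HB 1"},
    {"state": "ut", "session": "2017", "bill_id": "SB 2"},
]
print([b["bill_id"] for b in live_bills(current, bills)])
```

The lookup deliberately ignores calendar dates; whoever maintains the file decides when a session stops being live.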

I’d be open to the idea of adding some kind of file/API for this to Openstates, but I’m concerned about having one more thing to keep up to date. @james is there any way to link up the scripts that actually run the scrapes at night to the API, so it would be clear what data OS is currently grabbing at least?


#3

Hi @tims,

The project I am working on will supplement the automated bits with manual administration. I’ve only recently waded into the waters of this domain, so I wanted to get a sense of how far automation could reasonably take us. Thanks for the added perspective. @EdStaub has helped me some on the side, as well. Thanks to you both and everyone else involved in this awesome project.


#4

@srasmussen any time!


#5

@srasmussen It would be really helpful to have a narrow definition of what you need here.

I doubt that anyone will go back and add session dates to all the sessions before 2017. On the other hand, a couple of months ago I ensured that all sessions from 2017 and 2018 had at least a start date. If there are new sessions since then, I don’t know what dates they have.

I just opened https://github.com/opencivicdata/pupa/issues/314


#6

For what I am working on, we wouldn’t be interested in historical data, just data going forward. So, when I asked about “getting to 100%” with the data, I was just wondering if current timelines for the project included a point past which one could expect to get start and end date attributes on all sessions.

The goal with the features I am working on is to be able to identify bills which are part of currently active sessions. We’re going to move ahead with the current data with some manual supplements and some qualifying language to make sure people understand the limitations of what we have. My main goal in this thread was just to further refine my view on the data and current development on the project.