Smooth Succession/Meeting Notes for 2016-09-19
Smooth Succession
- Date
- Monday, 19 September 2016
- Event Announcement
- http://www.meetup.com/NetSquared-Kitchener-Waterloo/events/232556568/
Sooner or later, people move on. Sometimes they leave for greener pastures and sometimes they just leave. Sysadmins tend to have a lot of knowledge about the systems they work with, and often that knowledge is in their heads and their heads alone. As responsible sysadmins, how do we transition out of our jobs without our organizations collapsing behind us? How do our replacements learn the institutional knowledge they need to keep things running? What best practices can we implement to document and share knowledge so that others know what is going on when we are hit by buses?
Future sessions
Documentation
- What do you document?
- What tools do you use?
Coming up with time/effort estimates?
- How can you be realistic but efficient?
- How do you justify unanticipated difficulties?
Questions
- Have you taken over from another person leaving? What was helpful? What was frustrating?
- What preparations have you made so that future people can successfully transition into your work?
- What barriers and challenges are there to smooth succession?
- How do you transfer institutional/oral culture?
- What best practices are there for documentation?
Meeting Notes
Our IT hats
- Schoolteachers: often one person gets picked to wear the IT hat
- 50 staff, 300 students
- He deals with tech support questions
- The board has a regular IT department but the ratio is high: 1 person for thousands of users
- Tickets sent to the IT department take a long time to resolve
- Teachers often have to pick up the slack
- The IT staff they get in now are younger
- The software stack seems to work better now
- Software used to break when deployed because of compatibility problems
- eg a network game would break everything else
- Now they test deployments better
- But this reduces spontaneity
- What about interaction with the school boards? How do documents get passed around?
- This is more centralized now
- They were going to give all kids their own email accounts
- Schools have logins for their kids now
- Some school boards do BYOD (Bring Your Own Device)
- This is cheaper for the school boards, which can't keep up (and budgets are tight)
- They use the same number of IT staff for the Catholic school board as they did for the entire high school system
- This probably implies web interfaces for everything
- Small not-for-profit, 25 staff
- Before he joined, his director was the primary IT person
- They signed a contract for hardware/software support
- Now there is an IT committee
- He made the mistake of admitting that he "knew about computers"
- The organization decided to move to a cloud-based service (SharePoint) with a data migration
- This was somewhat painful because the outside supplier did not tell them about their slow upload speeds
- He does software/hardware problem solving
- He does software upgrades: Office 2013/Office 365
- He does training for the SharePoint move
- They are trying to transfer knowledge from the director's head to the collective
- They have a local server
- They also do BYOD
- Getting information for connecting computers to the server is tough
- How can staff do their jobs day to day?
- Do people prefer Office 2013 to Office 365?
- There is more functionality in Office 2013
- eg they have a room booking spreadsheet that has pane-freezing problems
- Do people have problems with file versioning?
- Not really
- They have had communications problems with outside tech support
- Even hardware audits and setting up internet connections were tough
- Getting people up to speed in SharePoint is a big issue
- People have problems adjusting to change
- Where is the storage? It is all on the Microsoft cloud
- How do you deal with shared documents on Google Drive?
- You can map your own drive to a drive letter but cannot access shared drives
- OCaml FUSE driver for Google Drive under Linux: https://github.com/astrada/google-drive-ocamlfuse
- Approaches to succession at a large company
- There were procedures that were documented in a lot of detail
- Important for time-sensitive stuff (eg batch jobs)
- People did document well
- You could search a spreadsheet for jobs to diagnose
- Disaster recovery tests were documented in a lot of detail
- He participated in disaster recovery one year
- A coworker then started the next year, and he gave pointers
- The documents were well-written and a good guide
- Reviewing the documents well ahead of time is important
- Management was invested in making sure that documents were well done
- Another co-op job was not as smooth
- A small one-person operation was not documented well -- much of the knowledge was in this person's head
- Maybe this person should have done more documentation
- The boss was very time-conscious, so he documented only the most complex issues
- Writing things down is a good buffer for remembering what you see on screens
- Is commenting code financially efficient? There is a short-term/long-term tradeoff.
- Better error tracing helps the people who come after you
- He was working for a small startup where the emphasis was on getting things done as soon as possible, with no succession planning of any kind
- There ought to be good handoff procedures
- This can be an issue with Google Summer of Code: people hang out for four months and leave
- But sometimes there are good changelogs
- Succession horror stories (small nonprofits)
- He would like people to assign administrator access
- Most organizations are staffed by nontechnical people
- When going to new organizations
- He had to explore how things are hooked up and why
- Naming conventions were weird
- He changed some of the printer names and got into trouble because it messed up the network documentation
- Other places have been decommissioning jobs
- He had to document everything before shutting things down
- Big municipality had a good disaster recovery plan
- Nobody should have to think in order to get things back up
- Problem: systems change and then documentation goes out of date
- One-on-one training is better than no documentation at all
- He worked for an insurance company. Their disaster planning was based on insurance.
- This is called "key man insurance"
- Worked for a university press
- He kept the job for 30 years
- He had a lot of autonomy in writing his job descriptions
- Early on they had their own UNIX system and some people on Windows using UNIX tools
- User training was not difficult because typographers know how to type to get stuff done
- But in 1999 things changed. Kids these days! They only know how to use word processors
- Passing on old skills was hard
- When he went on leave he hired a friend who knew the same skills
- When he was getting closer to retiring there were a lot of meetings about the work he did. Other people were learning it, but some didn't think they could handle the whole thing.
- The people who took his job have good communication skills and could change things to their preferences
- He found that his meetings were collaborative and good for problem solving
- Things are going well but are slower
- eg there are fewer spreadsheet manipulation abilities
- There is documentation in wikis. People can read them but not write to them easily.
- Have others dismantled your work since you left?
- Yes
- They were thinking of shutting down the Linux servers
- They were going to migrate the functionality to a virtual machine
- The server ran for a year without being rebooted and continued to work
- Working with text files on local servers can be simpler than the cloud, because cloud services are black boxes
- He was very disciplined about the structure of the data
- black box: you have a promise of input and output, but you don't know what is happening inside
- If the input data changes then everything can get messed up
- Can you troubleshoot problems when they come up?
- Black boxes mean you can change the inputs and examine the outputs, but this is trial and error
- Is there good software for adding bounding box information to EPS files? He found a script that worked, written in Perl and shell script.
- At TWC
- Lots of complicated infrastructure
- Some of it is documented but documentation goes out of date
- People come and go
- Understand everything about everything
- Oral culture (both positive and negative)
- Documentation is like survivalist training
- Documentation that gets used stays up to date
- Some documents are used frequently
- Write down passwords in a shared (encrypted!) document
- Multiple people working on a door system means documentation gets written
- Documentation that is hard to write and hard to update does not get written (or gets written and is useless)
- Text only
- No screenshots unless absolutely necessary
- Trivial update mechanisms
- DRY: Don't Repeat Yourself
- Trivial to search
- OneNote
- Plain text
- Documents with good search
- Email (yes, really)
- Write documentation as you go
- Too much documentation is generally better than too little
- If you have to learn something twice, document it carefully the second time
- Some people consider lack of documentation as job insurance
- HOWTO files can be helpful
- Make things as self-documenting as feasible
- Drop README files in source folders
- Inline comments
- Documentation as file names
- Log files and version control are forms of documentation (if you have the discipline)
- etckeeper is good for Linux systems
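The "make things as self-documenting as feasible" and "drop README files in source folders" points lend themselves to a quick audit. What follows is a minimal Python sketch, not something presented at the meeting: it walks a directory tree and lists folders that contain files but no README. The set of accepted README filenames and the rule of skipping hidden folders (such as a `.git` directory) are assumptions to adjust to your own conventions.

```python
from pathlib import Path

# Assumption: any of these names (compared case-insensitively) counts as a README.
README_NAMES = {"readme", "readme.txt", "readme.md"}


def _is_hidden(path: Path, root: Path) -> bool:
    """True if any path component below root starts with a dot (e.g. .git)."""
    return any(part.startswith(".") for part in path.relative_to(root).parts)


def folders_missing_readme(root) -> list[str]:
    """Return sorted paths (relative to root, '.' for root itself) of
    non-hidden folders that contain at least one file but no README."""
    root = Path(root)
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    missing = []
    for folder in folders:
        if _is_hidden(folder, root):
            continue  # skip version-control and other dot-folders
        files = [e for e in folder.iterdir() if e.is_file()]
        if not files:
            continue  # an empty folder has nothing to explain
        if not any(f.name.lower() in README_NAMES for f in files):
            missing.append(folder.relative_to(root).as_posix())
    return sorted(missing)
```

For example, calling `folders_missing_readme("/srv/scripts")` (a hypothetical path) on a shared scripts folder gives a to-do list of places where a successor would currently find nothing to read first.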
Best Practices
- Mind the bus factor and stay away from public transportation
- Don't store documents in someone's personal folders
- Having good documentation is helpful. How does it get created?
- Never admit you know computers
- How do you keep documentation up to date as things change?
- Make documentation accessible
- Get good at trawling other people's work
- Do regular training for staff and volunteers
- Forcing people's hands can help
- Start people small if you can
- This way you can assess their skills and commitment
- Make new people do documentation as they work
- This helps them learn the systems
Worries and Challenges
- Being the person who gets hit by the bus
- How do you spread information?
- Continuous learning by staff -- raising everybody's level of knowledge
- Management may not be on board
- Do people understand that not having long-term planning leaves them vulnerable?
- You can't boss around volunteers as much
- People think that the cloud solves backups and IT administration
- How hard will it be to step into a new position?
- When we are unemployed because we don't have the tools
- Money becomes a huge issue
- Getting access to hardware is an issue
- How many times will you be called after you left?
- Will you remember your old work?
- There is a sense of liability -- who is responsible when things break?
- Choosing the wrong successor could be a disaster
- Finding time/resources to transfer knowledge
- Sometimes you need to be inefficient to be efficient
- Letting other people do the thing even though you could do it faster and more efficiently
- Letting other people do the thing in ways you would not do it
- Giving people good base levels of knowledge helps
- How do you learn the system while being careful and not destroying everything in a burning ball of flame?
- How do you make a good impression and get things done both quickly and correctly?
- Sometimes contractors win commissions by making promises they cannot keep